Assiut University
Faculty of Engineering
Electrical Engineering Department

Intrusion Detection System in Telecommunication Network

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science
Department of Electrical Engineering, Faculty of Engineering, Assiut University, Assiut, Egypt

Submitted by Eng. Mohamed Faisal Elrawy
Demonstrator at the Faculty of Engineering at MUST University

2014
Assiut University
Faculty of Engineering
Electrical Engineering Department

Intrusion Detection System in Telecommunication Network
By Mohamed Faisal Elrawy
B. Sc., Electrical Engineering (Electronics and Communications), Assiut University, Assiut, Egypt

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

Examined by:
Prof. El-Sayed M. El-Rabaie (Menoufia University)
Prof. Magdy M. Doss
Prof. Abdelfatah M. Mohamed (Assiut University)
Dr. Tarik Kamal Abdelhamid (Assiut University)

Supervised by:
Prof. Abdelfatah M. Mohamed (Assiut University)
Dr. Tarik Kamal Abdelhamid

2014
Abstract
Data security has become a very serious part of any organizational information system. Internet threats have become more intelligent, so they can deceive basic security solutions such as firewalls and antivirus scanners. To enhance the overall security of the network, an additional security layer such as an intrusion detection system (IDS) has to be added. An anomaly detection IDS is a type of IDS that can differentiate between normal and abnormal network traffic in the monitored data. It creates a model of normal network traffic and compares the monitored traffic against this model. Feature extraction is the basis of any anomaly detection IDS. The features should describe the traffic flow as fully as possible. The efficiency of intrusion detection depends on the features used in the anomaly detection IDS, so the challenge is to determine whether the chosen features are the most suitable ones. Analyzing different attacks is a good way to find suitable features that affect the traffic flow. The common attacks, namely denial of service, probing, User to Root (U2R) and Remote to User (R2U), are tested in our experiment. This thesis proposes two types of IDS. One can be used as a network intrusion detection system (NIDS), with an overall success of 0.9161 and a high detection rate of 0.9288. The other can be used as a host intrusion detection system (HIDS), with an overall success of 0.8493 and a very high detection rate of 0.9628. Both are evaluated using the NSL-KDD data set. The results depend on the features selected in the two types of IDS.
Acknowledgments
This Master’s Thesis has been written for the Master of Science Degree at Assiut University. This thesis would not have been finished without the support of my supervisors, Prof. Abdelfatah M. Mohamed and Dr. Tarik Kamal Abdel Hamid.
Special thanks go to my kind parents and my wife for their encouragement, help and continued support while I carried out this work. Finally, thanks to everyone who helped me carry out this work to the fullest.
Mohammed Faisal Elrawy September 2014
Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Symbols
List of Tables
List of Figures
Terms and Abbreviations
Chapter 1: Introduction
Chapter 2: Telecommunication Network and Its Threats
2.1 Telecommunication Network
2.2 Infrastructure of Telecommunication Network
2.2.1 Access Networks
2.2.2 Core Network
2.2.3 Service Network
2.3 Threats against Telecommunications Networks
Chapter 3: Intrusion Detection System
3.1 Definitions of Intrusion Detection System
3.1.1 Misuse Detection
3.1.2 Anomaly Detection
3.2 Anomaly Detection Approaches
3.2.1 Data Mining based Anomaly Detection
3.2.2 Threshold based Anomaly Detection
3.2.3 Statistical based Anomaly Detection
3.2.4 Machine-learning based Anomaly Detection
3.2.5 Rule-modeling based Anomaly Detection
3.3 HIDS and NIDS Functions
3.3.1 Host-based IDS (HIDS)
3.3.2 Network-based IDS (NIDS)
3.4 Intrusion Detection Architecture
3.5 IDS and IPS Functions
3.6 IDS placements in Telecommunication Network
3.7 Prior Research on Intrusion Detection
3.7.1 Denning’s Model
3.7.2 Network Intrusion Detection System Researches
3.7.3 Intrusion Detection System based on feature extraction Researches
3.7.4 Lincoln Laboratory Dataset
Chapter 4: Features for IDS
4.1 Feature Extraction
4.2 Feature Selection
4.2.1 Features Selected Depend on Denial of Service Attack Scenario
4.2.2 Features Selected Depend on Reconnaissance Attack Scenario
4.2.3 User to Root Attack Scenario
4.2.4 Remote to Local Attack Scenario
4.2.5 Data Secret Attack Scenario
4.3 Feature Reduction
4.4 Feature Extraction in Telecommunication Network
4.5 Audit Data Sources
4.5.1 Network Data
4.5.2 Host-based Security Logs
4.6 Features used in Prior Art
4.6.1 Flow-based Features
4.6.2 Packet-based Features
Chapter 5: Experimental and Statistical Analysis
5.1 Anomaly Distance metric
5.2 Principal Component Analysis (PCA)
5.2.1 Principal Component Analysis concept
5.2.2 Applying PCA to Outlier Detection
5.2.3 The offline and online detection phases for PCA
5.3 Data Set Description
5.4 Performance Measures
5.5 Experiment steps and results
Chapter 6: Conclusions and Future work
References
Appendix 1: Network Traffic Header Fields
Appendix 2: Attacks in Lincoln Data 1999
Appendix 3: Decimal Number for Protocol Type and Flag Field in KDDCup 1999 Dataset
Appendix 4: Matlab Program for Intrusion Detection
Appendix 5: Matlab Program for Measure the Performance of IDS
Appendix 6: Acceptance Letter for Paper about IDS in Telecommunication Network Using PCA
Appendix 7: Arbitration for Paper about IDS in Telecommunication Network Using PCA
List of Symbols
R: Correlation matrix
λ: Eigenvalue
e: Eigenvector
Correlation between any two variables i, j
Standard deviation
Major principal component score threshold
Minor principal component score threshold
x: An observation
List of Tables
Chapter 4
Table (4.1) Features in KDD CUP 99 dataset
Table (4.2) Features used in KDD CUP 99 studies
Table (4.3) Features Kabiri et al. used
Chapter 5
Table (5.1) Feature used in our experiment
Table (5.2) Confusion matrix
Table (5.3) Detection attacks in all steps
Table (5.4) Metrics for all steps
Table (5.5) The comparison between classifiers for normal class
Table (5.6) The comparison between classifiers for anomaly class
List of Figures
Chapter 2
Figure (2.1) Infrastructure of telecommunication network
Figure (2.2) Evolution of radio access communication systems
Figure (2.3) Distribution of countries targeted by attackers in a sample month
Figure (2.4) Distributions of attack techniques in a sample month
Chapter 3
Figure (3.1) HIDS placements in the network
Figure (3.2) NIDS placements in the network
Figure (3.3) General intrusion detection system architecture
Figure (3.4) IPS operational actions
Figure (3.4) NIDS and HIDS locations in telecommunications networks
Chapter 4
Figure (4.1) Feature selection
Figure (4.2) Feature reduction
Chapter 5
Figure (5.1) Applying PCA for intrusion detection system
Terms and Abbreviations
2G
2nd Generation Mobile Communications
3G
3rd Generation Mobile Communications
AAA
Authentication, Authorization and Accounting
AD
Anomaly Detection
AFRL
Air Force Research Laboratory
BN
Bayesian Network
BSC
Base Station Controller
BTS
Base Transceiver Station
CART
Classification and Regression Tree
CIDS
Central Intrusion Detection System
CPU
Central Processing Unit
CSP
Communications Service Provider
DARPA
Defense Advanced Research Projects Agency
DNS
Domain Name System
DoS
Denial of Service
DPI
Deep Packet Inspection
DR
Detection Rate
ePDG
Evolved Packet Data Gateway
Feature
Synonym for variable, descriptor and parameter as used in IDS.
GPRS
General Packet Radio Service
GSM
Global System for Mobile communication
HIDS
Host-based Intrusion Detection System
HLR
Home Location Register
HSGW
HRPD Serving Gateway
HSPA
High Speed Packet Access
HSS
Home Subscriber Server
ICMP
Internet Control Message Protocol
ID
Intrusion Detection
IDS
Intrusion Detection System
IMS
IP Multimedia Subsystem
IP
Internet Protocol
IPS
Intrusion Prevention System
LTE
Long Term Evolution
MME
Mobility Management Entity
N-CDMA
Narrowband Code Division Multiple Access
NIDES
Next-Generation Intrusion Detection Expert System
NIDS
Network-based Intrusion Detection System
NTRA
National Telecommunication Regulatory Authority
P-GW
Packet Data Network Gateway
PCA
Principal Component Analysis
PSO
Particle Swarm Optimization
RAN
Radio Access Network
RNC
Radio Network Controller
RST
Rough Set Theory
S-GW
Serving Gateway
SGSN
Serving GPRS Support Node
SSL
Secure Socket Layer
SVM
Support Vector Machines
TCP
Transmission Control Protocol
TDMA
Time Division Multiple Access
UE
User Equipment
VoIP
Voice over IP
VLR
Visitor Location Register
VPN
Virtual Private Network
WLAN
Wireless Local Area Network
xDSL
Digital Subscriber Line, the letter x before the abbreviation means all the DSL technologies such as ADSL, ADSL2 etc.
Chapter 1
Introduction
Due to the growing demand for data and video services and the limitations of circuit-switched technology, telecommunication operators find it expensive to expand their circuit-switched networks to meet demand. This has led to a gradual move towards packet-based switching technology. Telecommunications networks have developed from circuit-switched networks to packet-switched networks and have since changed enormously towards all-IP based networks. As a result, applications and services such as data and voice are now carried on top of the IP protocol [1]. The development of the devices used by subscribers of telecommunications networks has blurred the boundary between computers and mobile phones. With a smart phone, the subscriber can do almost everything and can dispense with a basic personal computer. This means that all the data on the Internet is now in the hands of every smart phone owner. Communication network technologies have become more advanced, and this progress has opened up new unwanted possibilities. Risks and threats that were applicable only in fixed networks are now feasible in radio access networks. Security systems have to become more intelligent because threats are becoming more advanced.
Basic security measures such as firewalls and antivirus scanners cannot keep pace with the growing number of intelligent attacks from the Internet. One way to enhance the overall security of a network is to add a further security layer in the form of an intrusion detection system (IDS). An IDS is designed to complement other security measures based on attack prevention [2]. Amparo Alonso-Betanzos et al. [3] say, ‘The aim of the IDS is to inform the system administrator of any suspicious activities and to recommend specific actions to prevent or stop the intrusion’. To understand the function of an intrusion detection system in a telecommunication network, think of intrusion detection as a security guard at the front gate of a company building. The building represents the network of a mobile operator, and the fence surrounding it is the operator’s firewall. The employees of the company represent the traffic in the operator’s network. Companies are usually well protected and do not let people without authorization inside the building. The fence is responsible for keeping all unwanted visitors outside. Like a firewall, the fence has holes (gates) in it to let employees move in and out of the building. These holes leave the company vulnerable to unwanted visitors, which is why the company has a security guard at the gate. The security guard monitors the people going in and out of the building. He notifies the head of security when he detects a suspicious-looking person walking through the gate and then takes steps to prevent this person from entering the building. The first part of the guard’s job corresponds to the basic functionality of an intrusion detection system: the IDS generates an alarm when it detects something suspicious, and the security administrator of the network then examines the cause of the alarm. The security guard depends on a set of rules and instructions to do his job.
An IDS in a telecommunication network likewise uses rules and instructions, in the form of algorithms, to analyze network traffic. The challenge is to define these rules and instructions and the criteria for deciding which features should be monitored. There are two types of intrusion detection: signature-based and anomaly-based [4]. The signature-based, or misuse detection, method uses well-defined patterns of known attacks, called signatures, to identify intrusions [5]. Anomaly-based intrusion detection compares the monitored network traffic with established normal usage patterns to determine whether the current state of the network is anomalous. Anomalous traffic can be considered an intrusion attempt. Anomaly-based detection builds a normal profile, and traffic is flagged as anomalous when its deviation from the normal model reaches a preset threshold level [6]. Anomaly-based intrusion detection depends on feature selection. A good selection of features maintains the accuracy of detection while speeding up its calculations. Therefore, any reduction in the number of features used for detection improves the overall performance of the IDS. Even if there are no useless features, focusing on the most important ones is expected to improve the execution speed of the IDS, and this increase in detection speed will not affect the accuracy of detection in a significant way. An incorrect selection of features may reduce both the speed of operation and the detection accuracy [7]. This thesis compares two different feature selections, one with 6 features and one with 10 features. One of these selections can be used in a Network Intrusion Detection System (NIDS) and the other in a Host Intrusion Detection System (HIDS) in the environment of telecommunications networks.
The rest of the thesis is organized as follows. Chapter 2 introduces the basics of telecommunication networks and the threats against them. Chapter 3 introduces the role of intrusion detection systems and discusses their types. In addition, it discusses prior research in the field of intrusion detection. Chapter 4 gives an overview of feature extraction methods for intrusion detection systems, the attack scenarios and the challenges that the environment poses for extraction. In addition, the features used in research on network-based and host-based intrusion detection systems are discussed at the end of that chapter.
Chapter 5 discusses multivariate statistical analysis and principal component analysis as a feature reduction method. The experiment and the performance results of the feature selections are discussed at the end of that chapter. Chapter 6 presents the conclusions and future work.
Chapter 2
Telecommunication Network and Its Threats
This chapter defines the telecommunication network and describes its infrastructure and sub-networks. A brief overview of the threats against telecommunications networks is given at the end of the chapter.
2.1 Telecommunication Network
The National Telecommunication Regulatory Authority of Egypt (NTRA) defines telecommunication as any means of transmitting or receiving signs, signals, messages, texts, images or sounds of whatsoever nature, whether through wired or wireless communication. The NTRA defines a telecommunication service as providing or operating telecommunication through whatsoever means, and a telecommunication network as the system or group of integrated systems for telecommunication, including any needed infrastructure [8]. The NTRA is responsible for keeping up with technical and technological advances in the telecommunication field in compliance with health and environmental standards. It also sets the rules that guarantee user protection: ensuring telecommunication confidentiality, providing the most advanced services at the most suitable prices, ensuring the high quality of these services, and setting up a system for receiving and investigating users' complaints and following them up with service providers [8]. According to the Egyptian telecommunication regulation law, the NTRA is responsible for granting licenses to establish telecommunication network infrastructure in Egypt.
2.2 Infrastructure of Telecommunication Network
The IEEE dictionary defines telecommunications as “the transmission of signals over long distance, such as by telegraph, radio or television” [9]. Telecommunication networks are today an inseparable part of social interaction and of critical national infrastructure. A telecommunication network resembles an enterprise network, with small differences. In an enterprise network, computers and fixed user equipment are connected through routers, switches and interconnected subnets. A telecommunication network has multiple radio access networks (RAN), from GSM to LTE, and a huge number of fixed and mobile users. The infrastructure of a telecommunication network can be divided into three sub-networks, as shown in figure (2.1).
Figure 2.1 Infrastructure of telecommunication network [10].
2.2.1 Access Networks
The access network is the part of the network that connects the telecommunication equipment (fixed or mobile) to the core network in order to supply the subscriber with services. Access networks can be divided into fixed-line access networks (Ethernet, xDSL and cable) and radio access networks (2G, 3G, LTE, CDMA and WLAN). Figure (2.2) shows the evolution of radio access communication systems.
Figure 2.2 Evolution of radio access communication systems [10]
The radio access network has been developing towards an all-IP based network, but older radio techniques still have to be supported. According to the global GSM incremental market analysis [11] done by ZTE, GSM (2G) was still the most commonly used technique for calling and data services globally in 2010. Over 80% of global mobile subscribers used only GSM accounts, while 3G and CDMA shared the remaining 20%. Because 3G offers a rich set of applications, the share of GSM subscribers was expected to decrease to 56.4% by the end of 2013 [11]. The radio access networks use different communication systems, so they still have different base stations and different controller stations. In 2G, the radio access network has a base transceiver station (BTS) as the base station and a base station controller (BSC) as the controller station. In 3G, it has a Node B as the base station and a radio network controller (RNC) as the controller station. In LTE, it has an evolved Node B as the base station. In LTE and CDMA, all mobility management operations are handled by the mobility management entity (MME) in the core network. User equipment such as mobile phones and laptops, combined with the access network, forms the subscriber network of a telecommunication network.
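The pairing of base stations and controller stations described above can be summarized as a small lookup table. The following Python sketch is illustrative only; the names `RAN_ELEMENTS` and `describe_ran` are hypothetical and are not taken from the thesis.

```python
# Radio access elements per generation, taken from the description above.
# The dictionary and function names are hypothetical, chosen for illustration.
RAN_ELEMENTS = {
    "2G": {"base_station": "BTS", "controller": "BSC"},
    "3G": {"base_station": "Node B", "controller": "RNC"},
    # For LTE the text names no controller station in the radio access network;
    # mobility management is handled by the MME in the core network.
    "LTE": {"base_station": "evolved Node B", "controller": None},
}


def describe_ran(generation):
    """Return a one-line description of the radio access elements of a generation."""
    elements = RAN_ELEMENTS[generation]
    controller = elements["controller"] or "none in the RAN (MME in the core network)"
    return f"{generation}: base station = {elements['base_station']}, controller = {controller}"


for generation in RAN_ELEMENTS:
    print(describe_ran(generation))
```

A table of this kind makes it easy to see why the core network must handle several access technologies side by side.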
2.2.2 Core Network
The core network consists of the network elements responsible for service delivery, for setting up the end-to-end connection and for handovers [12]. It is divided into circuit-switched and packet-switched domains. The core network includes components such as switches, the Mobile Switching Centre (MSC), the Home Location Register (HLR), the Visitor Location Register (VLR), subscriber charging, AAA services, subscriber mobility management services and the policy and charging rules function (PCRF) for quality of service.
The core network is a complex environment because of the wide variety of access networks. It has to support the 2G and 3G radio access techniques, in which voice and data are separated between circuit-switched and packet-switched networks. In a 2G network, voice calls depend on circuit-switching operations, while data transmissions depend on packet-switching operations handled by the serving GPRS support node (SGSN). In a 3G network, voice calls likewise depend on circuit-switching operations, but data transmissions are handled by the serving gateway together with the packet data network gateway. The core network also has to provide services for the newer radio access networks (LTE), in which voice and data are not separated, so the serving gateway together with the packet data network gateway handles both. The high rate packet data serving gateway provides voice and data operations for CDMA radio networks, and the evolved packet data gateway provides packet data operations for WLAN.
2.2.3 Service Network
The service network consists of end-user application servers, systems and services [12]. It is responsible for providing access to company intranets, operator-specific services and the connection to the Internet. It is also responsible for access to the IP multimedia subsystem (IMS) for multimedia and voice applications such as VoIP.
2.3 Threats against Telecommunications Networks
Both traditional circuit-switched networks and packet-based next generation networks are exposed to threats and attacks that target the various parts of the telecommunications network. These attacks may be aimed at any part of the network, such as the radio path of the access network. An attack on one part of a telecommunications network can also spread to multiple networks over the interconnection interfaces. Some of the threats to the telecommunications network are discussed in the following paragraphs.
Interception of voice traffic or of the signaling system in PSTN networks is possible because speech channels are not encrypted and the messages transmitted over the signaling system lack adequate authentication, integrity and confidentiality protection. This threat leads to unauthorized access to telecommunication network traffic [12].
Modified mobile stations or modified base stations can be used to exploit weaknesses in the authentication of messages received over the radio interface. The attacker can spoof user registration and location update requests. This threat leads to unreliable service, disruption of service (denial of service) and interception of traffic [12].
Unauthorized physical access to switching infrastructure and other critical telecommunications network equipment such as the AUC, HLR and VLR. This threat leads to misuse of telecommunication infrastructure, destruction or theft of information and equipment, illegal tapping and interception of network traffic [12].
Social engineering attacks on operator employees. This threat leads to unauthorized access to confidential information or to greater privileges on the network systems [12].
The most dangerous threat is the deployment of malicious applications on devices such as smart phones and tablets. Attackers can use these devices to target the telecommunication network or any part of it, intrude into the network and change users’ service profiles, such as billing [12]. According to [13], the United States of America (USA) was the country most targeted by attackers in September 2013. The distribution of countries targeted by attackers in the sample month is shown in figure (2.3). The distribution of attack techniques in a sample month is shown in figure (2.4).
Figure 2.3 Distribution of countries targeted by attackers in a sample month [13].
Figure 2.4 Distributions of attack techniques in a sample month [13].
According to CERT [14], attacks have grown more sophisticated during the past 30 years, while the knowledge required of the intruder has decreased over the same period. Intruder knowledge shows a downward trend because of the wide availability of freely distributed applications. One reason for the increasing complexity of attacks may be that Internet users have become more intelligent [14]. Modern communication networks are vulnerable to threats even when they are not connected to the Internet, which is a very important security problem. One example is the worm STUXNET, which was discovered in July 2010 [15]. STUXNET is designed to target control systems, especially in industrial installations such as uranium enrichment plants. Its most notable property is self-replication: STUXNET can copy itself onto USB devices and network shares and thereby spread into networks that are not directly connected to the Internet [15].
Chapter 3 Intrusion Detection System
This chapter introduces the role of the intrusion detection system and discusses its types. In addition, it discusses prior research in the field of intrusion detection.
3.1 Definitions of Intrusion Detection System
An intrusion detection system (IDS) is a network security solution that protects the availability, integrity and confidentiality of information resources. An IDS captures a copy of the monitored traffic, analyzes this copy and can then respond to threats. Threats are intrusions or anomalies in the network environment. Like a firewall or an antivirus scanner, an IDS is a security monitoring tool that tries to detect and prevent malicious activity. The two main techniques used by intrusion detection systems for detecting attacks are misuse detection and anomaly detection. A misuse detection system holds signatures for all well-known attacks. An anomaly-based intrusion detection system inspects the monitored traffic for malicious activity or anomalous behavior in networks or systems.
3.1.1 Misuse Detection
Misuse detection depends on signature methods to detect intrusions. A signature is a pattern of activity that corresponds to an intrusion. The IDS
identifies intrusions by looking for these patterns in the data being analyzed. Such patterns can be, for example, certain character strings in the contents of IP packets [16]. The accuracy of such a system depends on its signature database. Misuse detection can detect neither novel attacks nor slight variations of known attacks. It is designed to detect known attacks, so it can analyze network traffic efficiently for known intrusions. Most of the available IDSes use misuse detection because it is easier to match activities against known attack patterns than to decide, without previous knowledge, whether an ongoing activity is malicious. A misuse detection IDS needs to be updated regularly with the latest patterns and signatures.
3.1.2 Anomaly Detection
Anomaly-based (or profile-based) detection looks for network traffic that deviates from a model of normal behavior. The underlying assumption is that attack behavior differs sufficiently from normal user behavior. An anomaly-based IDS creates a standard behavior model from the normal behavior of the telecommunication network and detects intrusions when the current behavior moves statistically away from this model. An anomaly-based IDS is therefore able to detect new attacks for which no signatures have been created [16]. The main disadvantage of this method is that there is no clear-cut way to define normal behavior. Deviations from the normal model are expected to be rare, and each one is only potentially the result of intrusive activity.
Anomaly detection requires two phases to detect intrusions. In the first phase (the offline training or learning phase) a model of normal network traffic is created. This model can be derived or learned from training data using model generation algorithms or mathematical models. In the second phase traffic is monitored for deviations from the normal model. The model of normal network traffic is created from features of the traffic. In the context of anomaly detection, a feature is a value or symbol that describes the network traffic. The features should represent most of the behavior and characteristics of the traffic. The traffic used to create the model must have two properties: it should be free of malicious activity, and it should contain all the normal variations of the network environment. Network element failures and significant performance fluctuations can be considered part of normal network traffic [17]. After the model of normal network traffic is created, any deviation of the network traffic from the model can be monitored. Any action that deviates significantly from normal behavior can be considered an intrusion. The difference between an anomaly and an intrusion depends on the environment. An intrusion is always a deviation from normal behavior, so it is always an anomaly, but an anomaly is not always an intrusion. For example, a failure of a network element might cause abnormal activity in the network, but it is not an intrusion [17].
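The two phases can be illustrated with a minimal sketch. It assumes a single numeric feature (a hypothetical count of connections per minute), a normal model that consists only of the mean and standard deviation of attack-free training data, and an illustrative deviation limit of three standard deviations. None of these choices is taken from the thesis experiments.

```python
from statistics import mean, stdev

def train_normal_model(training_values):
    """Phase 1 (offline): summarize attack-free training data for one feature."""
    return {"mean": mean(training_values), "std": stdev(training_values)}

def is_anomalous(model, value, max_deviation=3.0):
    """Phase 2 (monitoring): flag values that lie far from the normal model."""
    if model["std"] == 0:
        return value != model["mean"]
    z_score = abs(value - model["mean"]) / model["std"]
    return z_score > max_deviation

# Hypothetical feature: connections per minute observed on a clean network.
clean_traffic = [98, 102, 100, 97, 103, 101, 99, 100]
model = train_normal_model(clean_traffic)

# Monitored values; only the last one deviates strongly from the model.
alerts = [v for v in [101, 96, 450] if is_anomalous(model, v)]
```

A flagged value is only an anomaly. Whether it is an intrusion or, for example, the effect of a failed network element still has to be decided from the environment.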
3.2 Anomaly Detection Approaches
There is a wide variety of approaches to anomaly detection. Some of them are discussed in the following paragraphs.
3.2.1 Data Mining based Anomaly Detection
The main goal of introducing data mining into intrusion detection is to develop an automated approach for building intrusion detection models. Data mining generally refers to the process of extracting descriptive models from large stores of data that describe the network traffic behavior of users or programs. A classifier built on features maps a data item into one of several predefined categories. Data mining combines algorithms from different fields, such as machine learning and statistics [18].
3.2.2 Threshold based Anomaly Detection
Threshold based anomaly detection defines a threshold [18]. Any case that crosses the threshold is marked as an anomaly. The threshold is a rule created from statistics. For example, if this method is applied to the CPU usage of a network element, the rule can state that CPU usage must not exceed 80 percent. An alarm is triggered once this threshold is crossed.
3.2.3 Statistical based Anomaly Detection
Statistically speaking, an anomaly is an observation which is suspected of being partially or totally irrelevant because it is not generated by the stochastic
model [19]. Statistical methods fit a statistical model of normal behavior to the given data. If the current situation deviates from this model, it is considered an anomaly. The severity grade of a deviation depends on these statistics: the more severe the anomaly, the higher its grade [19].
3.2.4 Machine-learning based Anomaly Detection
Machine learning techniques are based on establishing an explicit or implicit model that enables the analyzed patterns to be categorized [20]. These models depend on the past behavior of the network traffic. Machine-learning based detection uses a specific learning algorithm. For example, previously recorded data sets containing network traffic are used to create a model of normal behavior. After the learning period the detector monitors deviations from this model. Machine-learning based anomaly detection can change its execution strategy as it gains new information. An example of such new information is an application being distributed to all local machines in the network [20].
3.2.5 Rule-modeling based Anomaly Detection
Rule based anomaly detection techniques learn rules that capture the normal behavior of a system. A test instance that is not covered by any such rule is considered an anomaly [21]. Rule-based anomaly detection analyzes historical audit records to identify usage patterns and to automatically generate rules that describe those patterns. Rules can also be defined to identify suspicious behavior. This method, which is similar to the statistical
anomaly detection, does not require knowledge of security vulnerabilities within the system [21]. All of these methods have advantages and disadvantages that depend on the monitoring target. In some cases a combination of different methods is more suitable. The environment and its features have to be evaluated in order to choose the most efficient setup for detecting intrusions in that specific environment.
3.3 HIDS and NIDS Functions
There are network based (NIDS) and host based (HIDS) intrusion detection systems. The functions of these systems are discussed in the following paragraphs.
3.3.1 Host-based IDS (HIDS)
A HIDS monitors and analyzes the internals of a computing system rather than its external interfaces [19]. The main advantage of a HIDS is that it can monitor operating system processes and protect critical system resources, including files that may exist only on that specific host. An example is a word processor that suddenly and inexplicably starts modifying the system password database. A HIDS can combine the best features of antivirus and behavioral analysis in one package. One can think of a HIDS as an agent that monitors whether anything, internal to the computing system or arriving through its external interfaces, tries to break the security policy that the operating system enforces. HIDS placements in the network are shown in Figure (3.1).
Figure 3.1 HIDS placements in the network [22]
3.3.2 Network-based IDS (NIDS)
A NIDS deals with detecting intrusions in network data [19]. A NIDS consists of monitoring devices or sensors that capture and analyze the traffic throughout the network and detect malicious and unauthorized activity. The NIDS reads all incoming packets or flows and tries to find suspicious patterns. For example, if a large number of TCP connection requests to a very large number of different ports is observed within a short time, this could be an attack on some of the computers in the network [19]. Network-based monitoring systems monitor the packets traveling through the network for signs of intrusive activity, whereas host-based monitoring systems monitor information on the local host or operating system. NIDS placements in the network are shown in Figure (3.2).
Figure 3.2 NIDS placements in the network [22]
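The NIDS example of Section 3.3.2, many TCP connection requests to many different ports within a short time, can be sketched as a sliding-window counter. The function name, the event format and the limits (a 10 second window, 20 distinct ports) are illustrative assumptions and are not taken from [19].

```python
from collections import defaultdict

def find_port_scanners(syn_events, window_seconds=10.0, max_ports=20):
    """Return the sources that contacted more than max_ports distinct ports
    within any window of window_seconds.

    syn_events: iterable of (timestamp, src_ip, dst_port), sorted by timestamp.
    """
    recent = defaultdict(list)  # src_ip -> [(timestamp, dst_port), ...]
    scanners = set()
    for ts, src, port in syn_events:
        events = recent[src]
        events.append((ts, port))
        # Drop the events that have fallen out of the time window.
        while events and ts - events[0][0] > window_seconds:
            events.pop(0)
        if len({p for _, p in events}) > max_ports:
            scanners.add(src)
    return scanners

# Hypothetical traffic: one host probes 50 ports, another only uses port 80.
scan = [(i * 0.1, "10.0.0.9", 1 + i) for i in range(50)]
web = [(i * 1.0, "10.0.0.5", 80) for i in range(5)]
suspects = find_port_scanners(sorted(scan + web))
```

Counting distinct ports rather than connection requests keeps a busy but legitimate client, which repeatedly contacts the same service, below the limit.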
3.4 Intrusion Detection Architecture
An intrusion detection system is divided into four components: data abstraction, anomaly detection, signature detection and the intrusion arbiter. Data abstraction collects and processes data, such as network traffic in a NIDS, or log files and system trace files in a HIDS. Data abstraction is divided into three levels: packet level, connection level and feature level. At the packet level, packet data are extracted from the live packet stream. The connection level contains the pre-processing model that converts packet data to flow data. At the feature level, features are extracted from the flow data. Anomaly detection is divided into two groups: outlier based methods and statistical modeling. Signature detection depends on signature methods to
detect intrusions. The intrusion arbiter decides whether an anomaly is an intrusion by collecting the judgments of anomaly detection and signature detection [4]. The general architecture of an intrusion detection system is illustrated in Figure (3.3). In this thesis, we concentrate on anomaly detection with outlier based methods when the IDS is in passive mode. If the IDS is in reactive mode, it can take an action when an intrusion is detected, such as preventing data from passing through the network.
Figure 3.3 General intrusion detection system architecture [4]
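The three data abstraction levels of Section 3.4 can be sketched as follows. The packet dictionaries, their field names and the chosen flow features are hypothetical, and a flow is simplified here to all packets that share the same one-directional 5-tuple.

```python
from collections import defaultdict

def packets_to_flows(packets):
    """Connection level: group packet records by 5-tuple into flow records."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "start": None, "end": None})
    for p in packets:
        key = (p["src"], p["sport"], p["dst"], p["dport"], p["proto"])
        f = flows[key]
        f["packets"] += 1
        f["bytes"] += p["size"]
        f["start"] = p["ts"] if f["start"] is None else min(f["start"], p["ts"])
        f["end"] = p["ts"] if f["end"] is None else max(f["end"], p["ts"])
    return dict(flows)

def flow_features(flow):
    """Feature level: derive simple numeric features from one flow record."""
    return {
        "duration": flow["end"] - flow["start"],
        "packets": flow["packets"],
        "bytes": flow["bytes"],
        "mean_packet_size": flow["bytes"] / flow["packets"],
    }

# Packet level: hypothetical records taken from a packet stream.
packets = [
    {"ts": 0.0, "src": "10.0.0.1", "sport": 1234, "dst": "10.0.0.2", "dport": 80, "proto": "tcp", "size": 60},
    {"ts": 0.5, "src": "10.0.0.1", "sport": 1234, "dst": "10.0.0.2", "dport": 80, "proto": "tcp", "size": 1500},
    {"ts": 0.2, "src": "10.0.0.3", "sport": 5555, "dst": "10.0.0.2", "dport": 53, "proto": "udp", "size": 80},
]
flows = packets_to_flows(packets)
features = flow_features(flows[("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")])
```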
3.5 IDS and IPS Functions
Nowadays, information security technology uses both intrusion detection systems and intrusion prevention systems. The functions of the intrusion detection system (IDS) and the intrusion prevention system (IPS) are discussed in the following paragraphs. Intrusion detection is the act of detecting unwanted traffic on a network or a device. An intrusion detection system can be a piece of installed software or a physical appliance that monitors network traffic in order to detect unwanted activity and events such as illegal and malicious traffic [22]. The functions of an IDS are recording information related to observed events, notifying administrators of important observed events and producing reports. Intrusion prevention is the process of both detecting intrusion activities or threats and managing responsive actions on those detected intrusions and threats throughout the network [22]. An IPS monitors packet traffic to detect malicious activity and then generates alerts. In addition, an IPS can drop and block, in real time, the packet traffic that passes through the network. An IPS can prevent intrusive data from accessing the system by using different methods such as countermeasures. The functions of an IPS are to detect malicious attacks and take preventive action against them, to stop the attack itself and to change the security environment. The IPS sensor analyzes packets from Layer 2 to Layer 7, including the payload, for more sophisticated embedded attacks that might include malicious data. This deeper analysis lets the IPS identify, stop and block attacks that would
normally pass through a traditional firewall device [23]. The operational actions of an IPS are shown in Figure (3.4).
Figure 3.4 IPS operational actions [23]
3.6 IDS placements in Telecommunication Network
The placement of an IDS in a telecommunication network depends on the role of the IDS in this network. When the role of the IDS is to protect an element that provides important services to the network, the IDS is placed on that element; this type of IDS is called a host based IDS (HIDS). When the role of the IDS is to protect the network, including some of its elements and resources, and to strike a balance between
network coverage and allocated resources, the IDS is placed on the network path; this type of IDS is called a network based IDS (NIDS). A HIDS monitors for intrusions that try to gain access to core elements such as gateways, the HLR/VLR and subscriber charging. These elements provide important services to the network, so any intrusion that attacks them harms the overall operation of the network. A host-based IDS is the best type of IDS for important core network elements. It can also be used in user equipment such as mobile phones and computers. A HIDS located inside the core network or in subscriber equipment adds an additional security layer to these hosts. A NIDS located in the path of the subscribers' networks enhances the overall security of subscribers, while a NIDS located in the path between the Internet and the core network monitors the gateway towards the Internet and extranets. Figure (3.5) shows NIDS and HIDS locations in telecommunications networks, based on Cisco's IDS sensor deployment considerations in [24].
Figure 3.5 NIDS and HIDS locations in telecommunications networks [24]
In addition, there are two different deployment models: the centralized IDS model and the distributed IDS model. In the centralized model one dedicated IDS is used, which monitors all the incoming and outgoing links of the network. A centralized IDS is easier to operate and manage, but when the network and the amount of traffic are very large it can cause overload in the network. The distributed IDS model consists of distributed IDS agents and a central IDS. An IDS agent collects information from the network and transmits it to the central IDS. IDS agents, also known as IDS sensors, can pre-analyze network traffic and generate alarms for detected intrusions. The distributed IDS model is able to scale up to large networks and is preferred over the centralized model as the network grows.
3.7 Prior Research on Intrusion Detection
The field of intrusion detection in network security has been developing for about 30 years. A number of methods and techniques have been proposed, and many systems have been affected by a variety of intrusions. Denning introduced an intrusion-detection model, known as Denning's model, in 1987 [25].
3.7.1 Denning's Model
Denning presented the idea that malicious behavior could be perceived from system use by comparing it against a model of normal system use. The authors of [26] designed a wavelet analysis based IDS that builds on Denning's idea.
Axelsson [27] published a survey on intrusion detection systems in 2000 in which he listed 20 research projects from the years 1988 to 1998. Of the 20 studies on IDS, 14 were completely host based, three operated both in the host and in the network, and two were completely network based [27]. The first challenge is that Denning's model was designed as a model for host based IDS, so without modification it might not be usable as a basis for a network IDS. The second challenge is that Denning's model was created in 1987, when detecting system behavior on a local machine was more important than analyzing network traffic. The model itself might be too old to meet the requirements of modern environments.
3.7.2 Network Intrusion Detection System Research
A. Abraham et al. [28] have shown that, in an ensemble, the Decision Tree was suitable for Normal, LGP for Probe, DoS and R2L, and the Fuzzy classifier was good for R2L attacks. A. Abraham et al. [29] have demonstrated the ability of their suggested ensemble structure to model a lightweight distributed IDS. Gyanchandani, Manasi et al. [30] have improved the performance of the C4.5 classifier on the NSL-KDD dataset using different classifier combination techniques such as bagging, boosting and stacking. Gholam Reza Zargar [2] has shown that dimension reduction and the identification of effective network features for category-based selection can reduce the processing time in an
intrusion detection system while maintaining the detection accuracy within an acceptable range. Bhuyan, M. et al. [19] have presented a survey of many detection methods, systems and tools. In addition, they have discussed several evaluation criteria for testing the performance of a detection method or system, and they have provided a brief description of the existing datasets and their taxonomy. Snort is an IDS that combines signature, protocol and anomaly based intrusion detection methods to detect and prevent intrusions efficiently [31]. Snort has been developed by Sourcefire, which also regularly provides rule updates for Snort [32].
3.7.3 Intrusion Detection System Research Based on Feature Extraction
Chakraborty [33] has reported that the existence of irrelevant and redundant features generally degrades the performance of the machine learning part of the work, and has shown that a good selection of the feature set results in better classification performance. Sung and Mukkamala [34] have explored SVM and Neural Networks that can rank features with respect to their importance. They have used SVM and Neural Networks to detect specific kinds of attacks such as probing, DoS, Remote to Local and User to Root. They have also demonstrated that eliminating the less important and irrelevant features does not reduce the performance of the IDS.
Chebrolu et al. [35] have reported that an important advantage of combining redundant and complementary classifiers is increased accuracy and better overall generalization. They have also identified important input features for building an IDS that is computationally efficient and effective. Their work shows the performance of three feature selection algorithms: (1) Bayesian networks, (2) Classification and Regression Trees and (3) an ensemble of Bayesian networks and Classification and Regression Trees. Chebrolu et al. [35] have suggested a CART-BN approach, in which CART performs better for Normal, Probe and U2R and the ensemble approach performs better for R2L and DoS.
3.7.4 Lincoln Laboratory Dataset
The Lincoln Laboratory datasets are the first standard corpora for the evaluation of computer network intrusion detection systems. They were created under the sponsorship of the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) [36]. Lincoln Laboratory collected two datasets in the consecutive years 1998 and 1999. The 1998 dataset contains seven weeks of training data and two weeks of testing data, which consist of network traffic and operating system logs. These datasets contain labeled anomalies and network attacks mixed with normal network traffic. Similarly, the 1999 dataset contains five weeks of training data and testing data, but unlike the 1998 dataset it also contains attack-free training data. This attack-free data can be used by an IDS to create a model of normal network traffic [36].
In addition, the Lincoln Laboratory 1998 dataset has also been converted into a connection-based dataset, which is known as KDD Cup 1999 [37]. KDD (“Knowledge Discovery in Databases”) burst onto the scene to “identify new, valid, potentially useful and comprehensible patterns for data” (Fayyad et al.). Data mining techniques appeared as a particular case of KDD (Lee and Stolfo); these consisted of “learning algorithms to large data repositories with the purpose of automatically discovering useful information”. As a specific use case, KDD and data mining have been widely applied in the last few years to correlate traffic instances in network related databases. The KDD Cup dataset is discussed in Chapter 4.
Chapter 4 Features for IDS
An anomaly detection based IDS depends on creating a normal model to detect intrusions in traffic. Intrusion detection systems need predefined methods to choose, from the huge amount of data going through the network, the data that is relevant for monitoring. The choice of relevant features for the IDS depends on the feature extraction methods used.
4.1 Feature Extraction
The traditional goal of the feature extractor is to characterize an object to be recognized by measurements whose values are very similar for objects in the same category and very different for objects in different categories [38]. The basic principle in feature extraction is that fewer features make the IDS faster, whereas more features can make the IDS less accurate. The performance of an IDS depends on the feature analysis methods used. Network traffic contains redundant features, so reducing the number of features improves the overall performance and computational speed [39]. Feature extraction can be divided into two groups: feature selection and feature reduction.
4.2 Feature Selection
Feature selection is an important step in building intrusion detection models. The most effective features are extracted in order to construct suitable detection algorithms. Not all features are relevant to the learning algorithm,
and some features are redundant, so they can introduce noisy data into the learning algorithm [40]. Feature selection is a method of identifying the most relevant features from a given set of features. Its importance lies mainly in improving the detection rate and detection accuracy, in addition to reducing computation time and data size [41]. Features can be selected either manually from the monitored data or by using a specific feature selection tool. Manual selection from the feature spectrum depends on prior knowledge about the network environment. Intrusion detection systems can take either a univariate or a multivariate approach to detecting intrusions, depending on the algorithm used. The univariate approach takes a single variable of the system and uses this variable in the algorithm to detect intrusions. The multivariate approach analyzes a combination of several features and their inter-correlations. In our experiment, the multivariate approach is used. Feature selection tools reduce the number of features to a feasible subset of features that do not correlate with each other. Bayesian networks (BN) and classification and regression trees (CART) are examples of feature selection tools. A Bayesian network is a probabilistic graphical model that represents the probabilistic relationships between features [42]. CART is a technique that uses tree-building algorithms to construct tree-like prediction patterns that can be used to determine the different classes in the dataset [43].
The feature selection process is illustrated in Figure 4.1. On the left are the features (F0…FN) that are available from the monitored data; on the right are the outputs (F0…FM) of the selection tool. The number of features in the output varies with the selection tool used and with the inter-correlation of the input features. The number of features in the output is less than or equal to the number of features in the input.
Figure 4.1 Feature selection
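As an illustration of reducing the features (F0…FN) to a subset (F0…FM) whose members do not correlate with each other, the sketch below applies a simple greedy correlation filter. This filter is a stand-in chosen for brevity. It is not the Bayesian network or CART tool mentioned above, and the feature names, the sample values and the 0.95 limit are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / sqrt(vx * vy)

def select_uncorrelated(features, max_abs_corr=0.95):
    """Keep a feature only if it is not strongly correlated with a kept one."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept

# Hypothetical per-flow features; "bytes" is a multiple of "packets".
features = {
    "packets": [10, 20, 30, 40, 50],
    "bytes": [1000, 2000, 3000, 4000, 5000],
    "duration": [5, 1, 4, 2, 3],
}
selected = select_uncorrelated(features)
```

The output never contains more features than the input, which matches the relation M ≤ N in Figure 4.1.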
If the Lincoln Laboratory dataset is taken as an example, the feature selection tool will choose features from the network traffic header fields, such as the IP source address, the source port number and other features described in Appendix 1. One method of selecting features is to analyze the attacks within the Lincoln Laboratory 1999 dataset and how each attack affects the network traffic [44]. In [44] feature selection depends on prior knowledge from other IDS research; the features to be used in the anomaly detection are then decided by using the anomaly detection test bench for mobile network management (ADAI) [45]. In this thesis the feature selection is done by studying the attack scenarios within the NSL-KDD dataset (see Chapter 5). The features used in this thesis
are flow-based features. There are many advantages in using flow data instead of packet data. The major advantage is the reduced storage space needed for the data: network flows require one tenth of the storage of the original packet-based data, which is a huge difference [44]. Another advantage is that flow data do not contain payload data, so user privacy is no longer a problem. Traffic volumes, such as the number of packets and bytes exchanged between endpoints, are easily extracted from flow data, so no extra calculation is needed. The disadvantage of using flow data is the loss of individual packet information such as the size and structure of the packets. The structure and size of packets can be monitored by other methods such as misuse detection based IDSes [44].
4.2.1 Features Selected Based on the Denial of Service Attack Scenario
As one of the most common and aggressive means of attack, denial-of-service (DoS) attacks have a serious impact on computing systems [46]. DoS attacks affect the usability and reachability of network services such as web, mail, voice and data, so they affect the stability of the services supported by the communication service provider (CSP) [44]. Depending on the attack scenario, there are patterns or features that might reveal the attack [47]. According to Depren et al. [48], some DoS attacks are detectable by monitoring the traffic flows and comparing the amount of data received by the destination with the amount of data sent by the source. In the normal case the amount of sent data is around 40-50 bytes and the amount of
received data is likewise around 40-50 bytes. In a DoS case, the number of bytes sent remains at the same level of 40-50 bytes, but the number of bytes received is zero [48]. The NSL-KDD dataset contains multiple DoS attacks that use different methods and techniques to attack the targeted host or service. In the following paragraphs some of the attacks and their effect on the network traffic are discussed. SYN flooding (called Neptune in the dataset) combined with IP spoofing is an important example of a DoS attack. In SYN flooding the attacker sends multiple SYN messages with a spoofed IP source address to the targeted server. The server tries to respond to each SYN message with a SYN-ACK message and waits for an ACK message from the source address. Because this address is spoofed, the server never gets an answer to the SYN-ACK message [49]. For each connection the server reserves a transmission control block (TCB) state, which is released when the handshake completes (an ACK message is received) or when the connection is closed. If the attacker keeps on sending SYN messages, the TCB table begins to fill, so any further incoming connections are rejected. TCB entries are removed after a certain period of time, but this does not help if the attacker keeps on sending SYN messages with a spoofed IP address [49]. In a Smurf attack the targeted host is flooded with multiple ICMP response messages from multiple sources [50]. The attack requires three entities: the attacker, the middleman and the destination. The attacker sends ICMP echo request packets to the middleman with the target host as the source address. The middleman then sends response messages to the targeted host. The
attacker needs to send multiple messages to multiple middlemen to cause a DoS state at the target. These middlemen then send such a large number of response messages to the target that it cannot cope with the number of messages received. This kind of distributed attack is also known as distributed DoS. A Smurf attack can be detected when a large number of ICMP echo replies are sent to a single destination [50]. It is also possible to cause a DoS state using HTTP. This can be achieved by inserting multiple headers into a single HTTP request message [51]. In the NSL-KDD dataset the Apache2 attack sends an HTTP request that contains 10000 headers in a single message. ICMP messages with a payload larger than 64 kB might cause unpredictable reactions in the targeted systems. This attack is called ping of death. Such a malformed ICMP message can cause the destination system to freeze, reboot or crash [51]. The land attack requires only a single packet to be sent to the destination: the attacker sends a TCP SYN message that has the same address as source and destination. This attack is no longer feasible because new systems can cope with these messages [52]. A flash crowd is not among the attacks within the NSL-KDD dataset, but it is a very common root cause of a DoS state in a service. A flash crowd is based on a massive number of people requesting a connection or service from a single destination [53]. When the number of requests becomes too large for the destination to handle, it causes a DoS state. Flash crowds can be detected from the amount of network traffic and especially from the number of service requests within a short period of time [53].
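The SYN flooding mechanism described earlier in this section can be made concrete with a toy model of the TCB table. The class below is an illustrative simplification with hypothetical names and parameters (a capacity of 4 entries and a 30 second timeout). It does not reproduce the behavior of any real TCP implementation.

```python
class TcbTable:
    """Toy model of a server's table of half-open TCP connections."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity
        self.timeout = timeout
        self.half_open = {}  # (src_ip, src_port) -> time the SYN arrived

    def _expire(self, now):
        # Entries whose handshake never completed are dropped after the timeout.
        self.half_open = {k: t for k, t in self.half_open.items()
                          if now - t < self.timeout}

    def on_syn(self, now, src):
        """Return True if the SYN is accepted, False if the table is full."""
        self._expire(now)
        if len(self.half_open) >= self.capacity:
            return False
        self.half_open[src] = now
        return True

    def on_ack(self, src):
        """Handshake completed: release the half-open entry."""
        self.half_open.pop(src, None)

table = TcbTable(capacity=4, timeout=30.0)
# Spoofed sources never send the final ACK, so their entries stay half-open.
spoofed = [table.on_syn(float(i), ("198.51.100.%d" % i, 4000 + i)) for i in range(4)]
legit_during_attack = table.on_syn(5.0, ("192.0.2.10", 5000))   # table is full
legit_after_timeout = table.on_syn(40.0, ("192.0.2.10", 5000))  # entries expired
```

In this model the legitimate client is rejected while the spoofed entries occupy the table and is accepted again only after they time out. An attacker who keeps sending SYN messages refills the table before that happens.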
In general, most DoS attacks require sending multiple packets to the targeted host. To detect those attacks, the IDS should monitor the number of received and sent bytes, the number of connections from multiple sources to a single destination, the number of packets to a single destination and the number of flows to a single destination. It should also monitor the ICMP packet size and malformed packets such as HTTP messages with multiple header fields, although neither of these is detectable in flow data.
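The flow-level quantities listed above can be accumulated per destination, as in the following sketch. The flow record layout is hypothetical. The field names src_bytes and dst_bytes are borrowed from the NSL-KDD feature names, and the sample values are invented for illustration.

```python
from collections import defaultdict

def per_destination_features(flows):
    """Aggregate flow records into per-destination counters."""
    stats = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes_sent": 0,
                                 "bytes_received": 0, "sources": set()})
    for f in flows:
        s = stats[f["dst"]]
        s["flows"] += 1
        s["packets"] += f["packets"]
        s["bytes_sent"] += f["src_bytes"]      # bytes from source to destination
        s["bytes_received"] += f["dst_bytes"]  # bytes from destination to source
        s["sources"].add(f["src"])
    return {dst: {"flows": s["flows"], "packets": s["packets"],
                  "bytes_sent": s["bytes_sent"],
                  "bytes_received": s["bytes_received"],
                  "distinct_sources": len(s["sources"])}
            for dst, s in stats.items()}

# Hypothetical flows: three unanswered requests to one host, one normal exchange.
flows = [
    {"src": "198.51.100.1", "dst": "10.0.0.2", "packets": 1, "src_bytes": 40, "dst_bytes": 0},
    {"src": "198.51.100.2", "dst": "10.0.0.2", "packets": 1, "src_bytes": 40, "dst_bytes": 0},
    {"src": "198.51.100.3", "dst": "10.0.0.2", "packets": 1, "src_bytes": 40, "dst_bytes": 0},
    {"src": "10.0.0.7", "dst": "10.0.0.3", "packets": 2, "src_bytes": 45, "dst_bytes": 48},
]
summary = per_destination_features(flows)
```

A destination that receives many flows from many sources while returning zero bytes matches the DoS pattern reported in [48]. The decision limits themselves would have to come from the normal model.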
4.2.2 Features Selected Based on the Reconnaissance Attack Scenario
Attackers can plan their attack better when they first know the layout of the targeted network (which IP addresses have active hosts), the possible entry points (which port numbers are active on the active hosts) and the constitution of their victims (which operating system the active hosts are running). To gain this information, attackers must perform reconnaissance [54]. Network mapping is an attack that maps out the infrastructure of the network. Probing is an attack that tries to find out information about a single computer. Probing and network mapping are examples of reconnaissance attacks. Reconnaissance attacks are almost always the first attacks an attacker uses to gather information about the targets before attacking them by other means, such as a denial of service attack. Sending a single ICMP echo request message is the most common way to find out whether a computer exists at a given IP address. This kind of network mapping attack is easily detected by firewalls.
A reconnaissance attack can also be carried out through a port that uses the TCP protocol. Because communication through this port is allowed to pass firewalls, this can be a dangerous attack. By sending messages through this port it is possible to map every computer inside the network [55]. In order to detect these reconnaissance attacks the IDS has to keep track of the connection status. A port scanning attack scans all ports, or just a specific group of ports, on the targeted computer to find out whether there is an open port or service. This information is very important for the attacker because it reveals the known vulnerabilities of the specific version of the service [56]. An IDS can detect those attacks by monitoring the state of the connections, the number of ports accessed by a single source, the number of ICMP packets from a single source and the TCP flag combinations. A password guessing attack is a combination of DoS and probing attacks that targets services in the monitored network. In a password guessing attack, the attacker tries to gain unauthorized access to services of the communication service provider. To detect this attack, the IDS should monitor the number of service requests, the number of packets to a single service and the number of flows to a single service.
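Monitoring TCP flag combinations can be sketched as a small rule table. The combinations and labels below (no flags, SYN together with FIN, FIN with PSH and URG, FIN without ACK) are patterns commonly associated with scanning tools. They are given as illustrative assumptions and are not taken from [55] or [56].

```python
def classify_tcp_flags(flags):
    """Return a label for flag sets that do not occur in a normal TCP
    exchange, or None when the combination looks ordinary."""
    flags = frozenset(flags)
    if not flags:
        return "null scan"            # no flags set at all
    if {"SYN", "FIN"} <= flags:
        return "SYN+FIN"              # open and close requested together
    if flags == {"FIN", "PSH", "URG"}:
        return "xmas scan"
    if flags == {"FIN"}:
        return "FIN scan"             # FIN without the usual ACK
    return None

# Hypothetical observations: (source address, flags of one packet).
observed = [
    ("10.0.0.5", {"SYN"}),
    ("10.0.0.5", {"FIN", "ACK"}),
    ("10.0.0.9", set()),
    ("10.0.0.9", {"FIN", "PSH", "URG"}),
    ("10.0.0.9", {"SYN", "FIN"}),
]
suspicious = [(src, classify_tcp_flags(f)) for src, f in observed
              if classify_tcp_flags(f) is not None]
```

Such a check needs the flags of individual packets or a per-flow flag summary, so it applies only where the flow export retains that information.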
4.2.3 User to Root Attack Scenario
A User to Root (U2R) attack is a class of exploit in which the attacker starts with access to a normal user account on the system. A U2R attack is detectable from the network traffic packets with a misuse-based IDS. Attacks belonging to
this category have a special pattern or a string in the payload [36]. An anomaly detection based IDS, which does not monitor the packet payload, cannot detect most of the attacks belonging to this category.
4.2.4 Remote to Local Attack Scenario
A Remote to Local (R2L) attack occurs when an attacker, who has the ability to send packets to a machine over a network but who does not have an account on that machine, exploits some vulnerability to gain local access as a user of that machine. R2L attacks are similar to U2R attacks in that they are detectable only from the payload data by looking for specific patterns [36].
4.2.5 Data Secret Attack Scenario
The data secret category contains an attack known as “secret”, in which the attacker tries to transfer data from a legal place to an illegal place. In order to detect these actions, the system needs to know which files are secret. A host-based IDS, which monitors actions on the use of these files, is required to detect this attack [36].
4.3 Feature Reduction
Linear dimensionality reduction methods are a cornerstone of analyzing high dimensional data, due to their simple geometric interpretations and typically attractive computational properties [57]. Feature reduction means that a number of different features are monitored during a certain period of time and a new set of features is then calculated from the monitored data. Feature reduction methods reduce the total number of features used by creating
a new set of features based on the features available from the monitored data. The feature reduction tool could, for example, monitor a specific destination and a specific feature within a certain period of time, after which a new feature is available for the IDS. Principal Component Analysis (PCA) is an approach for finding optimal linear transformations and is an example of a feature reduction method. PCA converts a data set of correlated variables into a set of uncorrelated variables, known as principal components [58]. PCA looks for a projection that best represents the data in a least squares sense. The feature reduction process is illustrated in Figure 4.2. On the left are the features (F0…FN) that are available from the monitored data. On the right is the output (V0…VL) of the reduction tool. The number of features in the output is less than or equal to the number in the input. Each new feature (V0…VL) can be calculated from a single feature or from a combination of multiple features (F0…FN).
Figure 4.2 Feature reduction
4.4 Feature Extraction in Telecommunication Network
Feature extraction in a telecommunication network is difficult because the network traffic contains confidential user information. Deep packet
analysis cannot be done, so only a limited analysis of the network traffic is possible. The subscribers’ data in the payload cannot be checked, so the researcher suggests using flow data instead of packet data. A huge amount of data flows through the mobile operator’s network, so the problem is which features must be calculated or analyzed. Telecommunication network link traffic can reach rates of up to 150 Gbps, whereas Sourcefire’s IPS is capable of monitoring network traffic at speeds from 5 Mbps up to 20 Gbps [32]. In order to cover the whole bandwidth, the traffic needs to be divided and monitored by multiple IDSes. Collecting the information provided by these IDSes adds another challenge to the anomaly detection process. Prior research on IDSes offers different methods and algorithms to select and reduce features, but they still have to be developed further for more efficiency. In prior research, the features to be monitored were chosen by analyzing a particular network environment, so those features cannot be adapted to all network environments, especially telecommunication networks. Therefore, in this thesis, attack scenarios are studied in order to select the features used in the experiments.
4.5 Audit Data Sources
An intrusion detection system depends on data features to detect intrusions and anomalies. In telecommunication networks, data features come from a wide variety of data sources. The audit data sources are divided into two main groups: network data and host-based security logs [27].
4.5.1 Network Data
Routers, switches and most other network elements provide network data using hardware-based packet capturers or flow capturers. Wireshark and Tcpdump provide network data using software-based packet capturers. NetFlow is a Cisco product that can capture network traffic flows [59]. Flow capturers monitor packet data and create flow information based on the communication between two endpoints. The concept of a flow depends on the monitoring system. For example, the idle time, that is, the period of silence between two endpoints after which a flow ends and a new one begins, differs between monitoring systems.
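The role of the idle time can be illustrated with a minimal Python sketch that groups packets into flows. It assumes packets are given as (timestamp, source, destination) tuples sorted by time. The function name split_into_flows, the record layout and the 30-second timeout are hypothetical and do not describe NetFlow or any specific flow capturer.

```python
def split_into_flows(packets, idle_timeout=30.0):
    """Group packets into flows per endpoint pair using an idle timeout.

    packets: iterable of (timestamp_seconds, src, dst), sorted by timestamp.
    idle_timeout: seconds of silence after which a new flow starts
                  (illustrative value; each monitoring system has its own).
    """
    flows = []     # each flow: dict with key, start, end and packet count
    active = {}    # endpoint pair -> index of its most recent flow
    for ts, src, dst in packets:
        key = tuple(sorted((src, dst)))          # direction-independent pair
        idx = active.get(key)
        if idx is None or ts - flows[idx]["end"] > idle_timeout:
            # No flow yet, or the pair was silent for too long: start a new one.
            flows.append({"key": key, "start": ts, "end": ts, "packets": 1})
            active[key] = len(flows) - 1
        else:
            flows[idx]["end"] = ts
            flows[idx]["packets"] += 1
    return flows

packets = [(0.0, "a", "b"), (1.0, "b", "a"), (5.0, "a", "b"), (100.0, "a", "b")]
for flow in split_into_flows(packets, idle_timeout=30.0):
    print(flow)
```

With the same packets, a shorter timeout produces more and shorter flows, which is why flow-based features from different monitoring systems are not directly comparable.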
4.5.2 Host-based Security Logs
Host-based logs are designed to record events that have occurred within an organization’s systems or network. Host-based log records contain system and network management information for the host, so they can be used for security purposes. UNIX user account logs are an example of host-based security logs that are used in anomaly detection to identify user behavior patterns and changes in a user’s behavior from the normal pattern [60]. Intrusion detection systems, firewalls, proxy servers and authentication servers are host-based security software. Firewalls generate logs of events about suspicious or malicious activity. Proxy servers log the credentials of the users who access web resources, as well as the network connections and web requests associated with them. In general,
security software provides security information logs that can be used for security purposes [61]. Operating systems and applications generate information logs from system events and audit records [61]. Applications run on top of the operating system. Computers, mobile devices, servers and networking devices are all run by an operating system, so the information logs for these devices can be collected from their operating systems and applications. In a telecommunication network, the AAA server, databases, gateways, SGSN, MME, charging and HLR/VLR are network elements that could be used in intrusion and anomaly detection. These elements generate security and management logs. Their events are generated from successfully or unsuccessfully completed actions of the system and the services running on these elements [61].
4.6 Features Used in Prior Art
Features used in prior research on IDSes fall into two categories: features based on flow data and features based on packet data. These two categories are discussed in the following paragraphs.
4.6.1 Flow-based Features
The KDD Cup 1999 dataset is an example of feature reduction. It is widely used in studies that evaluate the performance and detection rate of IDSes. Most studies on IDSes describe only the achieved results of the IDS, not how these results were reached. In most of the studies,
the researchers use all or just a specific number of the 41 features in KDD Cup without describing the purpose of this selection. The researchers also select some parts of the KDD Cup database and run the experiment on these parts without describing the purpose of this selection. In general, the KDD Cup dataset is very popular because many studies available on the Internet provide detailed information about it [62]. The features extracted for the KDD Cup 1999 dataset are listed in Table (4.1). These features were converted into data flows from the packet data in the 1998 Lincoln Laboratory dataset using Bro-IDS [63]. The KDD Cup 1999 dataset can be classified into four groups when the fields with symbolic values are ignored [62]:
Group I: fields {A, B, C, D, E, F, G, H}
Group I represents the basic characteristics of a connection. The protocol type field is 3 bits wide and has only three different values, TCP, UDP and ICMP, encoded as 100 for TCP, 010 for UDP and 001 for ICMP. The service field has 70 different services, for example Auth, FTP, HTTP and Telnet. The 70 services are numbered in alphabetical order from 1 to 70, e.g. Auth is No. 2, FTP No. 17, HTTP No. 22 and Telnet No. 59. The flag field is 11 bits wide and has 11 distinct flags: OTH, REJ, RSTO, RSTOS0, RSTR, S0, S1, S2, S3, SF and SH. One of the 11 bits is set high to represent one of the flags (in the same way as in the protocol type field). In the experiment, all binary values are converted to decimal numbers, which are shown in Appendix 3.
Group II: fields {from K to T}
Group II cannot be obtained by looking at the traffic records alone, so host-based logs are needed.
Group III: fields {from W to AE}
Group III contains time-based traffic features. They are statistics of traffic features in the previous 2-second time window, calculated based on the source IP address.
Group IV: fields {from AF to AO}
Group IV contains time-based traffic features. They are statistics of traffic features in the previous 2-second time window, calculated based on the destination IP address.
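The encoding of the symbolic Group I fields can be sketched in Python as follows. The protocol codes (100, 010, 001) follow the description of Group I. The bit order of the 11 flags is an assumption (the first listed flag is mapped to the leftmost bit), because the actual mapping is given in Appendix 3, and the helper names are hypothetical.

```python
PROTOCOLS = ["tcp", "udp", "icmp"]             # tcp -> 100, udp -> 010, icmp -> 001
FLAGS = ["OTH", "REJ", "RSTO", "RSTOS0", "RSTR",
         "S0", "S1", "S2", "S3", "SF", "SH"]   # 11 flags, one bit each (order assumed)

def one_hot_bits(value, vocabulary):
    """Return a bit string with a single 1 at the position of value."""
    index = vocabulary.index(value)
    return "".join("1" if i == index else "0" for i in range(len(vocabulary)))

def one_hot_decimal(value, vocabulary):
    """Decimal value of the one-hot bit string (leftmost bit most significant)."""
    return int(one_hot_bits(value, vocabulary), 2)

print(one_hot_bits("tcp", PROTOCOLS), one_hot_decimal("tcp", PROTOCOLS))
print(one_hot_bits("SF", FLAGS), one_hot_decimal("SF", FLAGS))
```

Each symbolic value becomes a distinct power of two, so the decimal form stays unambiguous while fitting in a single numeric column.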
Table 4.1 Features in KDD CUP 99 dataset [37]
Label  Feature
A      duration
B      protocol_type
C      service
D      flag
E      src_bytes
F      dst_bytes
G      land
H      wrong_fragment
I      urgent
J      hot
K      num_failed_logins
L      logged_in
M      num_compromised
N      root_shell
O      su_attempted
P      num_root
Q      num_file_creations
R      num_shells
S      num_access_files
T      num_outbound_cmds
U      is_hot_login
V      is_guest_login
W      count
X      srv_count
Y      serror_rate
Z      srv_serror_rate
AA     rerror_rate
AB     srv_rerror_rate
AC     same_srv_rate
AD     diff_srv_rate
AE     srv_diff_host_rate
AF     dst_host_count
AG     dst_host_srv_count
AH     dst_host_same_srv_rate
AI     dst_host_diff_srv_rate
AJ     dst_host_same_src_port_rate
AK     dst_host_srv_diff_host_rate
AL     dst_host_serror_rate
AM     dst_host_srv_serror_rate
AN     dst_host_rerror_rate
AO     dst_host_srv_rerror_rate
Gyanchandani et al. [30] evaluated the performance of the C4.5 classifier and its combinations using bagging, boosting and stacking over the NSL-KDD dataset for IDS. The NSL-KDD dataset consists of selected records of the complete KDD dataset and is discussed in chapter 5. They used the 41 features in the NSL-KDD dataset to study the detection rate of the IDS. Mukkamala et al. [34] used two feature ranking and selection methods to choose feature subsets for each attack type group in the KDD Cup 1999 database. These feature selection methods were the performance-based ranking method (PBRM) and the support vector decision function ranking method (SVDFRM). In addition, they compared three selection methods: SVM, SVM (PBRM) and SVM (SVDFRM). Zainal et al. [64] used five optimal feature subsets extracted from the 41 features in the KDD Cup dataset to study the detection rate of IDSes. They used particle swarm optimization (PSO), rough set theory (RST), support vector decision function ranking (SVDF), linear genetic programming (LGP) and multivariate regression splines (MARS) to select the six most important features for each subset. They extracted two of the optimal feature subsets themselves by using PSO and RST. The remaining three subsets were chosen according to the study by Sung et al. [65], in which SVDF, LGP and MARS were used to select optimal feature subsets. Chebrolu et al. [35] evaluated the performance of two feature selection algorithms, Bayesian networks (BN) and classification and regression trees
(CART). In addition, they compared three selection methods: BN, CART and BN+CART. Their conclusion was that the detection rate changes significantly between the feature selection methods. An IDS should have modules, each of which uses a different feature subset to detect a specific group of attack categories. In the experiment, which is discussed in chapter 5, 6 features are selected in the first step and 10 features in the second step after studying the attack scenarios, as shown in Table (4.2). The features used with the KDD Cup database by previous selection methods are also shown in Table (4.2).
Table 4.2 Features used in KDD CUP 99 studies
Method           No. features  Features
Gyanchandani     41            ALL
SVDF             6             B,D,E,W,X,AG
MARS             6             E,X,AA,AG,AH,AI
LGP              6             C,E,L,AA,AE,AI
Rough set        6             D,E,W,X,AI,AJ
Rough PSO        6             B,D,X,AA,AH,AI
SVM              41            ALL
SVM (PBRM)       31            A,C,E,F,H,I,J,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,AA,AB,AC,AF,AG,AI,AJ,AL,AM,AN,AO
SVM (SVDFRM)     23            A,B,C,D,E,F,J,L,Q,W,X,Y,Z,AA,AB,AC,AE,AF,AG,AH,AJ,AL,AM
BN 1             41            ALL
BN 2             17            A,B,C,E,G,H,K,L,N,Q,V,W,X,Y,Z,AD,AF
BN 3             12            C,E,F,L,W,X,Y,AB,AE,AF,AG,AI
CART 1           41            ALL
CART 2           17            A,B,C,E,G,H,K,L,N,Q,V,W,X,Y,Z,AD,AF
CART 3           12            C,E,F,L,W,X,Y,AB,AE,AF,AG,AI
BN+CART 1        41            ALL
BN+CART 2        17            A,B,C,E,G,H,K,L,N,Q,V,W,X,Y,Z,AD,AF
BN+CART 3        12            C,E,F,L,W,X,Y,AB,AE,AF,AG,AI
Our first step   6             A,B,C,D,E,F
Our second step  10            A,B,C,D,E,F,W,X,AF,AG
Lakhina et al. [66] analyzed the events that affect the distribution of traffic features by monitoring network-wide backbone traffic using the IP packet header data listed below. The classification is done by the multiway subspace method together with the k-means clustering algorithm [66]. The multiway subspace method is able to isolate correlated changes in the four IP packet header features between traffic flows. Lakhina et al. marked the events that affect the distribution of traffic features as anomalies [66]. Anomalies are grouped into seven categories based on the type of the detected attack, such as DoS, flash crowd, port scan, network scans, outage events and worms. Features used in this method are:
I. Source IP address
II. Destination IP address
III. Source port number
IV. Destination port number.
Fontugne et al. [67] used the same four IP packet header features in their image processing-based approach to detect anomalies. They compared their proposed anomaly detection method against a statistically-based method proposed by Dewaele et al. [68]. The comparison was done using network traffic data collected from a trans-Pacific link. Fontugne et al. [67] categorized the anomalies into 15 categories based on the type of the detected attack, instead of the seven groups in the method of Lakhina et al. [66]. Gorton [69] used two detection methods to analyze router log data: single event analysis and threshold analysis. Single event analysis raises a flag of intrusive activity when a single event is discovered. In threshold analysis, intrusive activity is flagged with respect to accumulated activities. In his analysis, he collected SYSLOG messages from Cisco routers and transformed the log data into a set of features. With event analysis, Gorton was able to detect spoofed connection attempts, connection attempts to known vulnerable ports, TCP broadcasting, ICMP echo requests and UDP echo requests. With threshold analysis, he was able to detect SYN flooding, network mapping and port scans [69]. Features used in this method are:
I. Time from the SYSLOG
II. A status that can be either permitted or denied
III. Protocol identifier
IV. Type of service
V. Source IP address
VI. Source port number
VII. Destination IP address
VIII. Destination port number
IX. Number of ICMP messages
X. Number of packets.
Knuuti [70] compared the usability and performance of the Snort [31], Bro-IDS [71] and TRCNetAD IDSes in a large IP network. Snort and Bro-IDS are capable of analyzing traffic in real time, whereas TRCNetAD is a non-real-time anomaly detection based IDS. Bro-IDS detected approximately eight thousand intrusions, which were address and port scans. TRCNetAD detected 150 thousand anomalies during the same time period. Snort detected over 1.5 million intrusions during the one-week traffic capturing period. TRCNetAD was able to detect some of the port and address scans that Bro-IDS discovered, but there were no similarities between the findings of Snort and TRCNetAD. Knuuti generated time series that are 60 minutes long in order to create clusters and analyze the data with self-organizing maps [70]. Knuuti also evaluated alarm similarities between the Snort, Bro-IDS and TRCNetAD detectors. Features that Knuuti used are [70]:
I. IP address
II. Time stamps
III. Number of ICMP packets
IV. Number of UDP flows
V. Number of TCP connections
VI. Amount of receiving data
VII. Amount of sending data
VIII. Number of received packets
IX. Number of sent packets
X. Number of different port numbers used over 1024
XI. Number of port numbers used over 1024
XII. Number of different port numbers used below 1024
XIII. Number of port numbers used below 1024
XIV. Number of receiving sequences from different IP’s
XV. Number of receiving sequences
XVI. Number of sending sequences to different IP’s
XVII. Number of sending sequences.
4.6.2 Packet-based Features
Kabiri et al. [72] studied the identification of effective features for intrusion detection, building on two related studies on detecting probing attacks [73] and Smurf attacks [74], whose results are used in [72]. Kabiri et al. [72] used the 1998 Lincoln Laboratory dataset to select optimal features from the IP and TCP packet header fields. They used 32 basic features extracted from network traffic header fields, which are shown in Appendix 1. They also investigated the information value of each category and used the principal component analysis (PCA) method to select
optimal feature subsets from the 32 features for each of the five categories in the Lincoln Laboratory dataset. Their conclusion stated that, as future work, these features should be tested with an intrusion detection system. In addition, accuracy and efficiency should be compared between the feature subsets and all 32 features. The suggested feature subsets are listed in Table 4.3.
Table 4.3 Features Kabiri et al. used [72]
Columns: No., Feature, DoS, U2R, R2L, Probing, Normal
No.  Feature
1    Protocol
5    Coloring_rule_name
10   IP_Total_Lenght
12   MF_Flag_IP
13   DF_Flag_IP
16   Protocol_no
19   Stream_index
24   Urgent_flag
25   Ack_flag
26   Psh_flag
27   Rst_flag
28   Syn_flag
29   Fin_flag
[The X marks that assign each feature to the DoS, U2R, R2L, Probing and Normal columns cannot be recovered from the flattened table layout.]
Carrascal et al. [75] used a machine-learning based method to detect intrusions. They used self-organizing maps together with learning vector quantization. They combined features that have multiple parameters, such as TCP flags and TCP options, into single features. Most of the features are self-explanatory, but the coded features are not as clear. The authors did not explain in detail how the codification is done, so the exact features used can only be guessed. They used the Lincoln Laboratory data sets to test their anomaly detection efficiency. Features used in this method are:
I. Codification of TCP flags
II. IP protocol number
III. IP type of service
IV. TCP window packet size
V. Codification of source port and source IP address with destination port and destination IP address
VI. Destination port
VII. Source port
VIII. Source IP
IX. Destination IP
X. Codification of TCP options.
Chapter 5 Experimental and Statistical Analysis
The experiment is based on studying the attacks in the NSL-KDD database, described in Appendix 2, in order to select the features that best express the traffic flow, because the efficiency of intrusion detection depends on the features used in an anomaly detection IDS. In our IDS, anomaly detection is based on statistical analysis, which is described in the sections on the Principal Component Analysis method and the performance measures.
5.1 Anomaly Distance Metric
Many multivariate techniques applied to the anomaly detection problem are based upon the concept of distance. The most familiar distance metric is the Euclidean or straight-line distance. In most cases, it is used as a measure of similarity in the nearest neighbor method. Let $x = (x_1, x_2, x_3, \ldots, x_p)'$ and $y = (y_1, y_2, y_3, \ldots, y_p)'$ be two p-dimensional observations. The Euclidean distance between x and y is
$$d^2(x, y) = (x - y)'(x - y) \qquad (1)$$
Since each feature contributes equally to the calculation of the Euclidean distance, this distance is undesirable when different features are measured on different scales or have very different variability. The features that have high variability or large scales of measurement would dominate those that have less variability or smaller scales. As an alternative, a measure of variability can be incorporated into the distance metric directly. One such metric is the well-known Mahalanobis distance
$$d^2(x, y) = (x - y)' S^{-1} (x - y) \qquad (2)$$
where S is the sample covariance matrix.
5.2 Principal Component Analysis (PCA)
In intrusion detection problems, data are naturally high-dimensional. To make the data easier to explore and analyze further, their dimensionality must be reduced. PCA is often used for this purpose. PCA is a predominant linear dimensionality reduction technique and has been widely applied to datasets in many different scientific domains [76].
5.2.1 Principal Component Analysis concept
PCA is concerned with explaining the variance-covariance structure of a set of variables through a few new variables, which are linear combinations of the original variables. Principal components are particular linear combinations of the p random variables $\{X_1, X_2, X_3, \ldots, X_p\}$ with three important properties. First, the principal components are uncorrelated. Second, the first principal component has the highest variance, the second principal component has the second highest variance, and so on. Third, the total variation in all the principal components combined is equal to the total variation in the original variables $\{X_1, X_2, X_3, \ldots, X_p\}$. The new variables with such properties are easily obtained from eigen analysis of the covariance matrix or the correlation matrix of $\{X_1, X_2, X_3, \ldots, X_p\}$ [58]. Let the original
data X be an n × p data matrix of n observations on each of p variables $(X_1, X_2, \ldots, X_p)$ and let R be the p × p sample correlation matrix of $X_1, X_2, \ldots, X_p$. If $(\lambda_1, e_1), (\lambda_2, e_2), (\lambda_3, e_3), \ldots, (\lambda_p, e_p)$ are the p eigenvalue and eigenvector pairs of the matrix R, with $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \ldots \ge \lambda_p \ge 0$, then the ith sample principal component of an observation vector $x = (x_1, x_2, x_3, \ldots, x_p)'$ is
$$y_i = e_i' z = e_{i1} z_1 + e_{i2} z_2 + e_{i3} z_3 + \cdots + e_{ip} z_p, \quad i = 1, 2, 3, \ldots, p \qquad (3)$$
where $e_i = (e_{i1}, e_{i2}, e_{i3}, \ldots, e_{ip})'$ is the ith eigenvector and $z = (z_1, z_2, z_3, \ldots, z_p)'$ is the vector of standardized observations defined as
$$z_k = \frac{x_k - \bar{x}_k}{\sigma_k}, \quad k = 1, 2, 3, \ldots, p \qquad (4)$$
where $\bar{x}_k$ is the sample mean of the variable $X_k$ and $\sigma_k$ is its sample standard deviation. The ith principal component has sample variance $\lambda_i$, and the sample covariance or correlation of any pair of principal components is equal to zero. PCA produces a set of uncorrelated variables, so the total variance of a sample is the sum of all the variances accounted for by the principal components [58]. The correlation between any two variables $X_j$ and $X_k$ is
$$\rho_{jk} = \frac{\operatorname{cov}(X_j, X_k)}{\sigma_j \sigma_k} \qquad (5)$$
where $\sigma_k$ is the standard deviation of $X_k$, computed from the data sample. The principal components of the sample correlation matrix have the same properties as principal components from a sample covariance matrix. As all
principal components are uncorrelated, the total variance in all of the principal components is
$$\lambda_1 + \lambda_2 + \cdots + \lambda_p \qquad (6)$$
The principal components produced by the covariance matrix are different from those produced by the correlation matrix. With the covariance matrix, variables whose values are much larger than the others receive larger weights in the eigenvalues [58]. Because the NSL-KDD data set has many features with varying scales and ranges, the correlation matrix is the more useful choice.
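The effect of standardizing before PCA can be illustrated with a minimal Python sketch restricted to two features, where the eigen decomposition of the correlation matrix [[1, r], [r, 1]] has a closed form (eigenvalues 1 + |r| and 1 − |r|). This restriction is an assumption made to keep the example self-contained, since p features would require a numerical eigen solver. The data values and function names are hypothetical.

```python
import math
import statistics

def standardize(column):
    """z = (x - mean) / sample standard deviation, as in the standardization step."""
    mean = statistics.mean(column)
    std = statistics.stdev(column)
    return [(x - mean) / std for x in column]

def correlation(za, zb):
    """Sample correlation of two already standardized columns."""
    return sum(a * b for a, b in zip(za, zb)) / (len(za) - 1)

def pca_two_features(col_a, col_b):
    """Closed-form PCA of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    za, zb = standardize(col_a), standardize(col_b)
    r = correlation(za, zb)
    s = 1.0 if r >= 0 else -1.0
    eigenvalues = [1 + abs(r), 1 - abs(r)]          # sorted in descending order
    e1 = (1 / math.sqrt(2), s / math.sqrt(2))       # eigenvector of the larger eigenvalue
    e2 = (1 / math.sqrt(2), -s / math.sqrt(2))      # eigenvector of the smaller eigenvalue
    scores = [(e1[0] * a + e1[1] * b, e2[0] * a + e2[1] * b)
              for a, b in zip(za, zb)]
    return eigenvalues, scores

# Hypothetical data: duration in seconds and src_bytes in bytes (very different scales).
duration = [1.0, 2.0, 3.0, 4.0, 5.0]
src_bytes = [1200.0, 1900.0, 3100.0, 4200.0, 4800.0]
eigenvalues, scores = pca_two_features(duration, src_bytes)
print("eigenvalues:", eigenvalues, "sum:", sum(eigenvalues))
print("variance of first component:", statistics.variance([y[0] for y in scores]))
```

For a correlation matrix the eigenvalues sum to the number of features (here 2), and the sample variance of each component equals its eigenvalue, regardless of the original measurement scales.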
5.2.2 Applying PCA to Outlier Detection
PCA can be applied as an outlier detection method. In applying PCA, there are two main issues: (1) how to interpret the set of principal components and (2) how to calculate the notion of distance [58]. First, the eigenvalue of each principal component corresponds to the relative amount of variation it encompasses. The eigenvector corresponding to a larger eigenvalue is more significant. Therefore, the principal components are sorted from the most significant to the least significant. If a new data item is projected along the leading set of significant principal components, it is likely that the data item can be classified without projecting along all of the principal components. Second, the data sample can be represented by the axes of the eigenvectors of the principal components. Those axes are considered to describe normal behavior when the data sample is the training set of normal network
connections. If a point lies far from these axes, the connection exhibits abnormal behavior. Outliers measured using the Mahalanobis distance are presumably network connections that are anomalous [77]. Any network connection with a distance greater than the threshold value (t) is considered an outlier. In this work, any outlier represents an attack. Consider the sample principal components $y_1, y_2, \ldots, y_p$ of an observation x, where
$$y_i = e_i' z, \quad i = 1, 2, \ldots, p$$
$$z_k = \frac{x_k - \bar{x}_k}{\sigma_k}, \quad k = 1, 2, 3, \ldots, p$$
with $\bar{x}_k$ and $\sigma_k$ the sample mean and sample standard deviation of $X_k$. The principal component score is the sum of the squared principal components, each divided by its eigenvalue:
$$\sum_{i=1}^{p} \frac{y_i^2}{\lambda_i} = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} + \cdots + \frac{y_p^2}{\lambda_p} \qquad (7)$$
This sum is equal to the Mahalanobis distance of the observation x from the mean of the normal sample data set [77].
5.2.3 The offline and online detection phases for PCA
Anomaly detection methods need an offline training or learning phase, whether they are outlier models or statistical models. PCA has two clearly separate phases (the offline and online detection phases) [58]. These two separate phases are an advantage for hardware implementation. Another advantage of PCA is the reduction of features. As it is shown in the experiment in this chapter,
PCA effectively reduces the number of processed features from 41 to 10 or 6 features. The outline of the steps involved in PCA is shown in Figure (5.1). First, in the offline phase, training data are taken as input and the mean vector of the sample is calculated. Ideally, these data sets are a snapshot of connection activity in a real network environment. In addition, these data sets should contain only normal connections. Second, the correlation matrix is calculated from the training data. A correlation matrix normalizes all of the data by means of the standard deviation [58]. Next, eigen analysis is performed on the correlation matrix to obtain orthonormal eigenvalue and eigenvector pairs. These pairs form the set of principal components that is used in the online analysis. Finally, the principal components are sorted by eigenvalue in descending order. The eigenvalue is a relative measure of the variance along its corresponding eigenvector. A dimensionality-reducing method such as PCA extracts the most significant principal components, so only a subset of the most important principal components is needed to classify new data. In addition to using the most significant principal components (q) to find intrusions, it has been found helpful to look for intrusions along a number of the least significant components (r) as well. The major principal component score is calculated from the most significant principal components and the minor principal component score is calculated from the least significant principal components [58]. Major principal component
score (MajC) is used to detect severe deviations with large values of the original features. These observations follow the correlation structure of the sample data. Minor principal component score (MinC) is used to detect attacks that may not follow the same correlation model. In this work, two thresholds are therefore needed to detect attacks. If the principal components are sorted in descending order, then (q) is a subset of the highest values and (r) is a subset of the smallest components. The MajC threshold is referred to as ($t_M$) while the MinC threshold is referred to as ($t_m$). An observation (x) is an attack if
$$\sum_{i=1}^{q} \frac{y_i^2}{\lambda_i} > t_M \quad \text{or} \quad \sum_{i=p-r+1}^{p} \frac{y_i^2}{\lambda_i} > t_m \qquad (8)$$
Figure 5.1 Applying PCA for intrusion detection system
The online portion takes the major and minor principal components and maps the online data into the eigenspace of those principal components [58].
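The online decision can be sketched in Python as follows, assuming the offline phase has already produced the feature means, standard deviations and the sorted eigenvalue and eigenvector pairs. The dictionary layout of the model, the function names, the two-feature model values and both thresholds are hypothetical and are not the values used in the thesis experiments.

```python
import math

def pc_scores(x, means, stds, eigenvectors):
    """Standardize x and project it onto each eigenvector (y_i = e_i' z)."""
    z = [(xi - m) / s for xi, m, s in zip(x, means, stds)]
    return [sum(e_k * z_k for e_k, z_k in zip(e, z)) for e in eigenvectors]

def is_attack(x, model, q, r, t_major, t_minor):
    """Flag x if its major or minor component score exceeds its threshold."""
    y = pc_scores(x, model["means"], model["stds"], model["eigenvectors"])
    lam = model["eigenvalues"]               # sorted in descending order
    p = len(lam)
    major = sum(y[i] ** 2 / lam[i] for i in range(q))          # first q components
    minor = sum(y[i] ** 2 / lam[i] for i in range(p - r, p))   # last r components
    return (major > t_major or minor > t_minor), major, minor

# Hypothetical offline model for two features with correlation 0.9.
root_half = 1 / math.sqrt(2)
model = {
    "means": [3.0, 3000.0],
    "stds": [1.0, 1000.0],
    "eigenvalues": [1.9, 0.1],
    "eigenvectors": [(root_half, root_half), (root_half, -root_half)],
}
for x in [(3.5, 3500.0), (8.0, 8000.0), (5.0, 1000.0)]:
    print(x, is_attack(x, model, q=1, r=1, t_major=6.0, t_minor=6.0))
```

In this toy model the second observation is flagged by the major score (extreme values that still follow the correlation structure), while the third is flagged only by the minor score (moderate values that break the correlation structure).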
5.3 Data Set Description
Most experiments on intrusion detection are done on the KDD Cup ’99 dataset, which is a subset of the 1998 DARPA intrusion detection evaluation data set and was produced by extracting 41 features from the raw data of the DARPA 98 data set. Higher-level features are defined to help distinguish “good” normal connections from “bad” attack connections [78]. The KDD Cup 99 dataset can be used in host-based systems, network-based systems, signature systems and anomaly detection systems. A connection is a sequence of Transmission Control Protocol (TCP) packets starting and ending at well-defined times, between which data flow from a source IP address to a target IP address under some protocol. Each connection is labeled either as normal or as an attack of a specific type. Each connection record consists of about 100 bytes [37]. The KDD train and test sets contain a huge number of records, many of which are redundant: about 78% and 75% of the records are duplicated in the train and test sets, respectively. These redundant records bias the classification toward the frequent records and thus hinder the correct classification of the records that are not redundant. To solve this problem, a new dataset, NSL-KDD, was developed [79]. In NSL-KDD, the repeated records of the KDD train and test sets were removed so that only one copy of each record is kept. The resulting size makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable [79].
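The removal of redundant records can be illustrated with a minimal Python sketch that keeps one copy of each repeated record. The records below are hypothetical, shortened examples and the function name is an assumption. The sketch does not reproduce the procedure used to build NSL-KDD [79].

```python
def remove_duplicates(records):
    """Keep the first copy of each record and drop exact repeats."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical, shortened connection records ending with the class label.
records = [
    (0, "tcp", "http", "SF", 181, 5450, "normal"),
    (0, "tcp", "http", "SF", 181, 5450, "normal"),
    (0, "icmp", "ecr_i", "SF", 1032, 0, "smurf"),
    (0, "icmp", "ecr_i", "SF", 1032, 0, "smurf"),
    (0, "icmp", "ecr_i", "SF", 1032, 0, "smurf"),
]
unique = remove_duplicates(records)
share = 1 - len(unique) / len(records)
print(len(records), "->", len(unique), "records;", f"{share:.0%} were duplicates")
```

After deduplication, every distinct record carries the same weight, so a frequent attack record no longer dominates training or evaluation.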
5.4 Performance Measures
The metrics that are mainly used to evaluate the performance of classifiers are presented in [80], [81] and are given in the following text.
• True positives (TP) and true negatives (TN) are correct classifications. A true positive is an alert raised when there is an intrusion.
• A false negative (FN) occurs when the outcome is incorrectly predicted as negative when it is actually positive.
• The true positive rate (TPR) is computed as
$$TPR = \frac{TP}{TP + FN} \qquad (9)$$
• A false positive (FP) occurs when the outcome is incorrectly predicted as positive when it is actually negative. The false positive rate (FPR) is computed as
$$FPR = \frac{FP}{FP + TN} \qquad (10)$$
• Recall (R) represents the percentage of the total relevant documents in a database retrieved by the user’s search and is computed as
$$R = \frac{TP}{TP + FN} \qquad (11)$$
• Precision (P) represents the percentage of relevant documents in relation to the number of documents retrieved and is calculated as
$$P = \frac{TP}{TP + FP} \qquad (12)$$
• The F-Score scores the balance between precision and recall. It is a measure of the accuracy of a test and is calculated as
$$F = \frac{2 P R}{P + R} \qquad (13)$$
• The overall success rate is the number of correct classifications divided by the total number of classifications and is calculated as
$$\text{Overall success rate} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (14)$$
(15)
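The measures listed above can be computed from the four confusion-matrix counts, as in the following Python sketch. It uses the standard definitions of these measures. The function name and the counts in the example are hypothetical and are not results of the thesis.

```python
def performance_measures(tp, tn, fp, fn):
    """Standard confusion-matrix measures for a two-class detector."""
    tpr = tp / (tp + fn)                         # true positive rate (equals recall)
    fpr = fp / (fp + tn)                         # false positive rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * tpr / (precision + tpr)
    success = (tp + tn) / (tp + tn + fp + fn)    # overall success rate
    return {"TPR": tpr, "FPR": fpr, "recall": tpr,
            "precision": precision, "F-score": f_score,
            "overall success": success}

# Hypothetical counts: 100 attacks (90 detected) and 100 normal connections (20 false alarms).
for name, value in performance_measures(tp=90, tn=80, fp=20, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Reporting the detection rate together with the false positive rate matters because either one can be improved trivially at the expense of the other by moving the detection threshold.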
5.5 Experiment steps and results
The experiments in this chapter use the NSL-KDD database, file (KDDTrain_20Percent) [82], in both the training and testing stages. KDDTrain_20Percent contains 25192 connection records. The training data set contains records of network connections labeled either as normal or as an attack. Each connection record is made up of 41 different features related to the connection. The 41 features are divided into three categories: basic features of TCP connections (1)*, content features of the connection (2)*, and traffic features (3)*, which are derived using a 2-s time window to monitor the relationships between connections. The same service and the same host information are
63
included in the traffic-level features such as the number of connections in the past 2 s that have the same destination host as the current connection. First, 6 features are selected from the basic features of TCP connections which are used with NIDS because these features do not need any host logs. Second, 4 features are added from traffic features, which based on time window, used in HIDS is shown in Table (5.1).
Table 5.1 Features used in our experiment
Feature name | Description | Type
Duration | Number of seconds of the connection. | Continuous (1)*
Protocol Type | Type of the protocol, e.g. tcp, udp, icmp. | Discrete (1)*
Service | Network service on the destination, e.g. http, telnet, https, etc. | Discrete (1)*
Src-bytes | Number of data bytes from source to destination. | Continuous (1)*
Dst-bytes | Number of data bytes from destination to source. | Continuous (1)*
Flag | Normal or error status of the connection. | Discrete (1)*
Count | Number of connections from the same source as the current connection in the past two seconds. | Continuous (3)*
Srv-count | Number of connections to the same service as the current connection in the past two seconds from the same source. | Continuous (3)*
Dst-host-count | Number of connections to the same host as the current connection in the past two seconds. | Continuous (3)*
Dst-host-srv-count | Number of connections to the same service as the current connection in the past two seconds to the same host. | Continuous (3)*
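How a time-window traffic feature is derived can be sketched in Python (the thesis programs themselves are written in Matlab). This is only an illustration under stated assumptions: the connections are given as (timestamp, destination host) pairs sorted by time, the current connection is not counted, and the name same_host_counts and the toy data are hypothetical. The exact counting convention of the dataset may differ.

```python
from collections import deque


def same_host_counts(connections, window=2.0):
    """For each connection, count the earlier connections to the same
    destination host that fall inside the last `window` seconds."""
    recent = deque()  # (timestamp, dst_host) pairs still inside the window
    counts = []
    for timestamp, dst_host in connections:
        # Drop connections that are older than the window
        while recent and timestamp - recent[0][0] > window:
            recent.popleft()
        counts.append(sum(1 for _, host in recent if host == dst_host))
        recent.append((timestamp, dst_host))
    return counts


# Hypothetical toy trace, sorted by timestamp (seconds)
toy_trace = [
    (0.0, "10.0.0.5"),
    (0.5, "10.0.0.5"),
    (1.0, "10.0.0.9"),
    (1.8, "10.0.0.5"),
    (4.5, "10.0.0.5"),
]
counts = same_host_counts(toy_trace)
```

A burst of connections to one host inside the window raises the count quickly, which is why features of this kind react strongly to flooding and scanning traffic.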
The Matlab programs used to design and to test our IDS are shown in Appendix 4 and Appendix 5 respectively. Following [83], the researcher uses (q) major components that explain about 50 - 70 percent of the total variation in the standardized features. When the original features are uncorrelated, each principal component from the correlation matrix has an eigenvalue equal to 1. The minor components (r) are therefore taken to be those components whose variances (eigenvalues) are less than 0.20, because such small eigenvalues indicate some relationship among the features. In the first step, 6 features are selected and the researcher uses q = 3, r = 0. In the second step, 4 features are added and the researcher uses q = 3, r = 2.
In a multiclass prediction, the result on a test set is often displayed as a two-dimensional confusion matrix with a row and a column for each class. Each matrix element shows the number of test examples for which the actual class is the row and the predicted class is the column. Good results correspond to large numbers down the main diagonal and small, ideally zero, off-diagonal elements [83]. The confusion matrix is shown in Table (5.2). The performance measures are shown in Table (5.3) and Table (5.4).
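The decision rule implemented by the program in Appendix 4 can be written compactly. The notation is an assumption introduced here for illustration: z is the feature vector of a connection, \bar{x} is the mean of the normal training data, e_i and \lambda_i are the eigenvectors and eigenvalues of the correlation matrix sorted so that \lambda_1 \geq \dots \geq \lambda_p, and c_1, c_2 are the thresholds (tm1 and tm2 in the program). A connection is flagged as an intrusion when either score exceeds its threshold.

```latex
y_i = e_i^{T}\,(z - \bar{x}), \qquad
\sum_{i=1}^{q} \frac{y_i^{2}}{\lambda_i} > c_1
\quad \text{or} \quad
\sum_{i=p-r+1}^{p} \frac{y_i^{2}}{\lambda_i} > c_2
```

With r = 0, as in the first step, the second sum is empty and only the major-component score is tested.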
Table 5.2 Confusion matrix
Anomaly class
Actual class | Predicted Attack | Predicted Normal
Attack | TP | FN
Normal | FP | TN
Normal class
Actual class | Predicted Normal | Predicted Attack
Normal | TP | FN
Attack | FP | TN
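The two views of the confusion matrix can be reproduced with a short Python sketch (the thesis programs themselves are written in Matlab). The function name confusion_counts and the toy labels are hypothetical; the sketch only shows how the same predictions give different TP, FN, FP and TN counts depending on which class is treated as positive.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, FN, FP, TN) with `positive` treated as the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


# Hypothetical toy labels
actual = ["attack", "attack", "attack", "normal", "normal", "normal", "normal"]
predicted = ["attack", "attack", "normal", "attack", "normal", "normal", "normal"]

anomaly_view = confusion_counts(actual, predicted, "attack")
normal_view = confusion_counts(actual, predicted, "normal")
```

Switching the positive class swaps TP with TN and FP with FN, which is the relationship used in Appendix 5 to derive the normal-class measures from the anomaly-class counts.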
Table 5.3 Detection attacks in all steps [84]
Attacks | DoS | PROBE | R2L | U2R
Exist | 9234 | 2289 | 209 | 11
Detection from step (1) | 8666 | 2212 | 28 | 1
Detection from step (2) | 9028 | 2244 | 32 | 2
Table 5.4 Metrics for all steps [84]
Metrics | Step (1) Normal class | Step (1) Anomaly class | Step (2) Normal class | Step (2) Anomaly class
Recall and TPR | 0.9050 | 0.9288 | 0.7503 | 0.9628
FPR | 0.0712 | 0.0949 | 0.0372 | 0.2496
Precision | 0.9357 | 0.8952 | 0.9584 | 0.7719
Overall success | 91.61 % | 91.61 % | 84.93 % | 84.93 %
Error | 8.39 % | 8.39 % | 15.07 % | 15.07 %
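The anomaly-class detection rates in Table (5.4) can be checked against the counts in Table (5.3). The following Python sketch is only an arithmetic check (the thesis programs themselves are written in Matlab); the variable names are hypothetical and the numbers are copied from Table (5.3).

```python
# Counts copied from Table (5.3)
existing = {"DoS": 9234, "PROBE": 2289, "R2L": 209, "U2R": 11}
detected = {
    "step 1": {"DoS": 8666, "PROBE": 2212, "R2L": 28, "U2R": 1},
    "step 2": {"DoS": 9028, "PROBE": 2244, "R2L": 32, "U2R": 2},
}

total_attacks = sum(existing.values())

# Detection rate (TPR of the anomaly class) for each step
detection_rate = {
    step: sum(found.values()) / total_attacks
    for step, found in detected.items()
}

# Detection rate of each attack category for each step
per_category = {
    step: {name: found[name] / existing[name] for name in existing}
    for step, found in detected.items()
}
```

The overall rates agree with the Recall and TPR row of Table (5.4) for the anomaly class. The per-category rates show that most detections come from DoS and PROBE records, while only a small share of the R2L and U2R records is detected in either step.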
The results are compared with four statistical classifier models: the C4.5 algorithm alone and C4.5 with three classifier combination techniques, bagging, boosting and stacking [30]. Both recall and precision have good values in the two steps. The first step can be used as a NIDS, and the second step can be used as a HIDS, which has a better detection rate. Both steps achieve a higher F-score than the four compared classifiers, as shown in Table (5.5) and Table (5.6).
Table 5.5 The comparison between classifiers for normal class
Classifiers | Recall | Precision | F-score
Bagging | 0.719 | 0.973 | 0.8269
Boosting | 0.677 | 0.957 | 0.7930
Stacking | 0.693 | 0.974 | 0.8098
C4.5 | 0.708 | 0.973 | 0.8196
Our first step | 0.9050 | 0.9357 | 0.9201
Our second step | 0.7503 | 0.9584 | 0.8418
Table 5.6 The comparison between classifiers for anomaly class
Classifiers | Recall | Precision | F-score
Bagging | 0.972 | 0.712 | 0.8219
Boosting | 0.953 | 0.654 | 0.7756
Stacking | 0.971 | 0.674 | 0.7956
C4.5 | 0.971 | 0.696 | 0.8108
Our first step | 0.9288 | 0.8952 | 0.9117
Our second step | 0.9628 | 0.7719 | 0.8568
The experiment in this chapter follows a multivariate approach that combines feature selection and feature reduction. The features are selected according to attack scenarios, and the Principal Component Analysis (PCA) method is used as the feature reduction technique.
Chapter 6
Conclusions and Future work
Future generations of intrusion detection systems will most likely employ both signature detection and anomaly detection modules. Anomaly detection methods process a large amount of data in order to recognize anomalous behavior or new attacks. PCA is an effective tool for outlier analysis. It is particularly useful because it reduces the dimensionality of the data to a smaller set of uncorrelated variables against which new data can be classified. In this thesis PCA is used as a feature reduction technique. The features used in the experiment are selected by studying the attack scenarios in the NSL-KDD dataset.
The experiment has two steps. The first step takes six features from the basic features of TCP connections that can be used in a NIDS, and it achieves an overall success rate of 0.9161 with a high detection rate of 0.9288. The second step takes ten features (six features from the basic features of TCP connections plus four features from the traffic features) that can be used in a HIDS, and it achieves an overall success rate of 0.8493 with a very high detection rate for the anomaly class of 0.9628.
The aim of this thesis is to improve the intrusion detection system by using Principal Component Analysis as a dimension reduction technique. The thesis compares two different feature selections: one can be used in a NIDS and the other can be used in a HIDS.
For future work, the relationship between these two steps can be utilized to build an integrated intrusion detection system. This system can then be implemented on field programmable gate arrays (FPGAs) to improve the detection rate of the system on a real-time basis. The future system is divided across three FPGAs: the first FPGA is the NIDS, the second FPGA is the HIDS, and the third FPGA is a central IDS that collects information from the first and second FPGAs to prevent the most dangerous intrusions. In a first stage, while it only collects information, the third FPGA acts as a central IDS. In a second stage, it can act as an intrusion prevention system.
References
[1] Kumar, A., Maurya, H. C., Misra, R. A Research Paper on Hybrid Intrusion Detection System, International Journal of Engineering and Advanced Technology (IJEAT), Volume 2, Issue 4, ISSN: 2249-895, April 2013
[2] Zargar, G. R. Category Based Intrusion Detection Using PCA, International Journal of Information Security, 3, pp. 259-271, October 2012
[3] Amparo, A. B., Noelia, S. M., Félix, M. C., Juan, A. S. and Beatriz, P. S. Classification of Computer Intrusions Using Functional Networks a Comparative Study, Proceedings of European Symposium on Artificial Neural Networks (ESANN), Bruges, pp. 579-584, 25-27 April 2007
[4] Das, A., Nguyen, D., Zambreno, J., Memik, G. and Choudhary, A. An FPGA-Based Network Intrusion Detection Architecture, IEEE Transactions on Information Forensics and Security, Vol. 3, No. 1, pp. 118-132, 2008
[5] Ilgun, K., Kemmerer, R. A. and Porras, P. A. State Transition Analysis: A Rule-Based Intrusion Detection Approach, IEEE Transactions on Software Engineering, Vol. 21, No. 3, pp. 181-199, 1995
[6] Guyon, I. and Elisseeff, A. An Introduction to Variable and Feature Selection, Journal of Machine Learning Research, Vol. 3, pp. 1157-1182, 2003
[7] Chou, T. S., Yen, K. K. and Luo, J. Network Intrusion Detection Design Using Feature Selection of Soft Computing Paradigms, International Journal of Computational Intelligence, Vol. 4, No. 3, pp. 196-208, 2008
[8] Egypt Telecommunication Regulation Law, Law No. 10 of 2003, 2013
[9] IEEE Standard Dictionary of Electrical and Electronics Terms, 6th ed., IEEE Std. 100-1996, IEEE, New York, 1996
[10] ALTERA measurable advantage, [WWW], [Cited 2014-6-02]. Available at: http://www.altera.com/end-markets/wireless/cellular/wircellular-infrastructure.html
[11] ZTE, Global GSM Incremental Market Analysis, [WWW], [Cited 2013-10-08]. Available at: http://wwwen.zte.com.cn/endata/magazine/ztetechnologies/2010/no4
[12] Rajmohan, C., Subramanya, G. and Sharma, N. Telecommunication networks: security management, TATA Consultancy Services (TCS) white paper, pp. 1-16, 2012
[13] Hack Mageddon site, [WWW], [Cited 2013-10-15]. Available at: http://Hackmageddon.com
[14] CERT Coordination Center, Vulnerability Discovery: Bridging the Gap Between Analysis and Engineering, [PDF], [Cited 2014-6-2]. Available at: http://www.cert.org/archive/pdf/CERTCC_Vulnerability_Discovery.pdf
[15] McAfee Labs, McAfee Threats Report: Third Quarter 2010, [WWW], [Cited 2014-6-2]. Available at: http://www.mcafee.com/us/threat_center/white_paper.html
[16] Mostaque, Md. and Morshedur, H. Current Studies on Intrusion Detection System, Genetic Algorithm and Fuzzy Logic, International Journal of Distributed and Parallel Systems (IJDPS), Vol. 4, No. 2, March 2013
[17] Gates, C. and Taylor, C. Challenging the anomaly detection paradigm: a provocative discussion, In Proc. of ACM Workshop on New Security Paradigms 2006, Schloss Dagstuhl, Germany, September 2006
[18] Wang, K. and Stolfo, S. J. Anomalous Payload-based Intrusion Detection, Computer Science Department, Columbia University, New York, 2004
[19] Bhuyan, M. H., Bhattacharyya, D. K. and Kalita, J. K. Network Anomaly Detection: Methods, Systems and Tools, Communications Surveys & Tutorials, IEEE, Volume 16, Issue 1, pp. 303-336, 2014
[20] García, T. P., Díaz, V. J., Maciá, F. G. and Vázquez, E. Anomaly-based network intrusion detection: Techniques, systems and challenges, Computers & Security, Volume 28, Issues 1-2, pp. 18-28, 2009
[21] Chandola, V., Banerjee, A. and Kumar, V. Anomaly Detection: A Survey, ACM Computing Surveys, technical report, September 2009
[22] Chakraborty, N. Intrusion Detection System and Intrusion Prevention System: A Comparative Study, International Journal of Computing and Business Research (IJCBR), ISSN (Online): 2229-6166, Volume 4, Issue 2, pp. 1-8, May 2013
[23] Paquest, C. Implementing Cisco IOS Network Security (IINS), Cisco Press book, September 2010
[24] Carter, E. Cisco IDS Sensor Deployment Considerations, [WWW], [Cited 2014-6-8]. Available at: http://www.ciscopress.com/articles/article.asp?p=25327
[25] Denning, D. E. An intrusion-detection model, IEEE Transactions on Software Engineering, Volume 13, Issue 2, pp. 222-232, February 1987
[26] Lu, W., Ghorbani, A. A. Network anomaly detection based on wavelet analysis, EURASIP Journal on Advances in Signal Processing, pp. 1-16, January 2009
[27] Axelsson, S. Intrusion detection systems: a survey and taxonomy, Technical report, Department of Computer Engineering, Chalmers University of Technology, Göteborg, Sweden, March 2000
[28] Abraham, A. and Jain, R. Soft Computing Models for Network Intrusion Detection systems, Springer, Heidelberg, 2004
[29] Abraham, A., Grosan, C. and Vide, C. M. Evolutionary Design of Intrusion Detection Programs, International Journal of Network Security, Vol. 4, No. 3, pp. 328-339, 2007
[30] Gyanchandani, M., Yadav, R. N., Rana, J. L. Intrusion Detection using C4.5: Performance Enhancement by Classifier Combination, International Journal on Signal and Image Processing, Vol. 1, No. 03, pp. 46-49, December 2010
[31] Snort, Snort homepage, [WWW], [Cited 2013-11-10]. Available at: http://www.snort.org/
[32] Sourcefire IPS, Sourcefire homepages, [WWW], [Cited 2013-11-10]. Available at: http://www.sourcefire.com/content/next-generationintrusion-prevention-system-ngips
[33] Chakraborty, B. Feature Subset Selection by Neuro-Rough Hybridization, Lecture Notes in Computer Science (LNCS), Springer, Heidelberg, 2005
[34] Sung, A. H. and Mukkamala, S. Identifying Important Features for Intrusion Detection Using Support Vector Machines and Neural Networks, Proceedings of International Symposium on Applications and the Internet (SAINT), pp. 209-216, 2003
[35] Chebrolu, S., Abraham, A. and Thomas, J. Feature Deduction and Ensemble Design of Intrusion Detection Systems, Computers and Security, Elsevier Science, Vol. 24, No. 4, pp. 295-307, 2005
[36] Massachusetts Institute of Technology (MIT), Lincoln laboratory, Cyber systems and technology, [WWW], [Cited 2013-11-18]. Available at: http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/index.html
[37] KDD cup 1999, KDD cup 1999 data distribution page, [WWW], [Cited 2013-12-01]. Available at: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[38] Duda, O. R., Hart, E. P. and Stork, G. D. Pattern Classification, second edition, A Wiley-Interscience Publication (Wiley & Sons), 2001
[39] Al-Sharafat, W. S., Naoum, R. Significant of features selection for detecting network intrusions, Internet Technology and Secured Transactions, 2009. ICITST 2009, pp. 1-6, November 2009
[40] Neelakantan, N. P., Nagesh, C. and Tech, M. Role of Feature Selection In Intrusion Detection Systems For 802.11 Networks, International Journal of Smart Sensors and Ad Hoc Networks (IJSSAN), Volume 1, Issue 1, pp. 98-101, 2011
[41] Jothi, L. U. Anomaly Based Intrusion Detection using Feature Relevance and Negative Selection Algorithm, International Journal of Technological Exploration and Learning (IJTEL), Volume 2, Issue 5, pp. 223-229, October 2013
[42] Ben-Gal, I. Bayesian Networks, in Ruggeri F., Faltin F. & Kenett R. Encyclopedia of Statistics in Quality & Reliability, Wiley & Sons, 2007
[43] Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J. Classification and regression trees, Monterey, California, 1984
[44] Niemelä, A. Traffic Analysis for Intrusion Detection in Telecommunications Networks, Master's thesis, Tampere University of Technology, 2011
[45] Kumpulainen, P., Hätönen, K. Anomaly Detection Algorithm Test Bench for Mobile Network Management, In proceedings of MathWorks Matlab User Conference Nordic, Stockholm, November 2008
[46] Zhiyuan, T., Jamdagni, A., Xiangjian, H., Nanda, P. and Liu, R. P. A System for Denial-of-Service Attack Detection Based on Multivariate Correlation Analysis, Parallel and Distributed Systems, IEEE, Volume 25, Issue 2, pp. 447-456, February 2014
[47] Kumar, A. and Fernandez, E. B. Security Patterns for Intrusion Detection Systems, LACCEI International Symposium on Software Architecture and Patterns (LACCEI-ISAP-MiniPLoP'2012), Panama City, Panama, July 2012
[48] Depren, O. An intelligent intrusion detection system (IDS) for anomaly and misuse detection in computer networks, Elsevier, Expert Systems with Applications, Volume 29, pp. 713-722, November 2005
[49] CERT Advisory CA-96.21. TCP SYN flooding and IP spoofing attack, CERT, November 1996
[50] CERT Advisory CA-98.01. Smurf, CERT, January 1998
[51] CERT Advisory CA-96.26. Ping of Death, CERT, December 1996
[52] CERT Advisory CA-97.28. Teardrop Land, CERT, December 1997
[53] Jung, J., Krishnamurthy, B., Rabinovich, M. Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites, Honolulu, AT&T Labs-Research, 2002
[54] Junos OS, Reconnaissance Deterrence for Security Devices, Juniper Networks, Technical documentation, 2012
[55] Etutorials.org, TCP Port Scanning, [WWW], [Cited 2014-6-13]. Available at: http://etutorials.org/Networking/network+security+assessment/Chapter+4.+IP+Network+Scanning/4.2+TCP+Port+Scanning/
[56] CERT Advisory CA-95.06, Security Administrator Tool for Analyzing Networks (SATAN), CERT, April 1995
[57] Cunningham, J. P. and Ghahramani, Z. Unifying Linear Dimensionality Reduction, Cornell University library, New York, pp. 1-28, January 2014
[58] Jolliffe, I. T. Principal Component Analysis, second edition, Springer-Verlag, New York, 2002
[59] Cisco, Cisco NetFlow, [WWW], [Cited 2014-6-16]. Available at: http://www.cisco.com/en/US/products/ps6601/products_ios_protocol_group_home.html
[60] Linux, Linux Documentation, [WWW], [Cited 2014-6-18]. Available at: http://www.linux.com/learn/docs
[61] Kent, K., Souppaya, M. Guide to Computer Security Log Management, Recommendations of the National Institute of Standards and Technology (NIST), September 2006
[62] Ma, W., Tran, D., Sharma, D. Negative Selection with Antigen Feedback in Intrusion Detection, published in Artificial Immune Systems, 7th International Conference, ICARIS 2008, pp. 200-209, Phuket, Thailand, August 2008
[63] Lin, Y., Fang, B.-X., Guo, L., Chen, Y. TCM-KNN Algorithm for Supervised Network Intrusion Detection, Intelligence and Security Informatics, In proceedings of Pacific Asia Workshop (PAISI 2007), LNCS 4430, pp. 141-151, Chengdu, China, April 2007
[64] Zainal, A., Maarof, M. A., Shamsuddin, S. M. Features Selection Using Rough-PSO in Anomaly Intrusion Detection, Faculty of Computer Science and Information Systems, University Teknologi Malaysia, 2007
[65] Sung, A. H., Mukkamala, S. The Feature Selection and Intrusion Detection Problems, Proceedings of Advances in Computer Science - ASIAN 2004: Higher-Level Decision Making, 9th Asian Computing Science Conference, Volume 3321, pp. 468-482, 2004
[66] Lakhina, A., Crovella, M., Diot, C. Mining anomalies using traffic feature distributions, Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, pp. 22-26, Philadelphia, Pennsylvania, USA, August 2005
[67] Fontugne, R., Hirotsu, T., Fukuda, K. An image processing approach to traffic anomaly detection, Proceedings of the 4th Asian Conference on Internet Engineering, pp. 17-26, Pratunam, Bangkok, Thailand, November 2008
[68] Dewaele, G. et al. Extracting hidden anomalies using sketch and non Gaussian multiresolution statistical detection procedures, LSAD '07, pp. 145-152, 2007
[69] Gorton, D. Extending Intrusion Detection with Alert Correlation and Intrusion Tolerance, Thesis for the Degree of Licentiate of Engineering, Chalmers University of Technology, Göteborg, Sweden, 2003
[70] Knuuti, O. Intrusion detection system comparison in large IP-networks, Master's thesis, Tampere University of Technology, 2009
[71] Lawrence Berkeley National Laboratory, Bro Intrusion Detection System, [WWW], [Cited 2014-6-20]. Available at: http://bro.org
[72] Zargar, G. R., Kabiri, P. Category-Based Selection of Effective Parameters for Intrusion Detection, International Journal of Computer Science and Network Security (IJCSNS), Vol. 9, No. 9, pp. 181-188, September 2009
[73] Zargar, G. R., Kabiri, P. Identification of effective network features for probing attack detection, Networked Digital Technologies, First International Conference on Networked Digital Technologies (NDT 2009), VSB-Technical University of Ostrava, Czech Republic, pp. 405-410, 2009
[74] Zargar, G. R., Kabiri, P. Identification of Effective Network Features to Detect Smurf Attacks, Proceedings of 2009 Student Conference on Research and Development (SCOReD 2009), Malaysia, pp. 49-52, 2009
[75] Carrascal, A., Couchet, J., Ferreira, E., Manrique, D. Anomaly Detection using prior knowledge: application to TCP/IP traffic, In Artificial Intelligence in Theory and Practice, pp. 139-148, 2006
[76] Boutsidis, C., Mahoney, M. W. and Drineas, P. Unsupervised Feature Selection for Principal Components Analysis, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, pp. 61-69, 2008
[77] Jobson, J. D. Applied Multivariate Data Analysis, Volume II: Categorical and Multivariate Methods, New York: Springer Verlag, 1992
[78] Stolfo, J., Fan, W., Lee, W., Prodromidis, A. and Chan, P. K. Cost-based modeling and evaluation for data mining with application to fraud and intrusion detection, DARPA Information Survivability Conference, 2000
[79] Tavallaee, M., Bagheri, E., Lu, W. and Ghorbani, A. A Detailed Analysis of the KDD CUP 99 Data Set, Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009
[80] Srinivasulu, P., Nagaraju, D., Ramesh Kumar, P. and NagerwaraRao, K. Classifying the Network Intrusion Attacks using Data Mining Classification Methods and their Performance Comparison, International Journal of Computer Science and Network Security, Vol. 9, No. 6, pp. 11-18, June 2009
[81] Shyu, M., Chen, S., Sarinnapakorn, K. and Chang, L. A novel anomaly detection scheme based on principal component classifier, Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining (ICDM03), pp. 172-179, 2003
[82] The NSL-KDD Data set, [WWW], [Cited 2013-12-05]. Available at: http://nsl.cs.unb.ca/NSL-KDD/
[83] Shyu, M., Chen, S., Sarinnapakorn, K., Chang, L. A Novel Anomaly Detection Scheme Based on Principal Component Classifier, IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with ICDM'03, pp. 171-179, 2003
[84] Elrawy, M. F., Abdelhamid, T. K. and Mohamed, A. M. IDS in Telecommunication Network Using PCA, International Journal of Computer Networks & Communications (IJCNC), Vol. 5, No. 4, pp. 147-157, July 2013
Appendix 1
Network Traffic Header Fields
No. | Feature | Description
1 | Protocol | Type of Protocol
2 | Frame_lenght | Length of Frame
3 | Capture_lenght | Length of Capture
4 | Frame_IS_marked | Frame IS Marked
5 | Coloring_rule_name | Coloring Rule name
6 | Ethernet_type | Type of Ethernet Protocol
7 | Ver_IP | IP Version
8 | Header_lenght_IP | IP Header length
9 | Differentiated_S | Differentiated Service
10 | IP_Total_Lenght | IP total length
11 | Identification_IP | Identification IP
12 | MF_Flag_IP | More Fragment flag
13 | DF_Flag_IP | Don't Fragment flag
14 | Fragmentation_offset_IP | Fragmentation offset IP
15 | Time_to_live_IP | Time to live IP
16 | Protocol_no | Protocol number
17 | Src_port | Source Port
18 | Dst_port | Destination port
19 | Stream_index | Stream Index number
20 | Sequence_number | Sequence number
21 | Ack_number | Acknowledgment number
22 | Cwr_flag | Cwr Flag (status flag of the connection)
23 | Ecn_echo_flag | Ecn Echo flag (status flag of the connection)
24 | Urgent_flag | Urgent flag (status flag of the connection)
25 | Ack_flag | Acknowledgment flag (status flag of the connection)
26 | Psh_flag | Push flag (status flag of the connection)
27 | Rst_flag | Reset flag (status flag of the connection)
28 | Syn_flag | Syn flag (status flag of the connection)
29 | Fin_flag | Finish flag (status flag of the connection)
30 | ICMP_Type | Specifies the format of the ICMP message, such as 8 = echo request and 0 = echo reply
31 | ICMP_code | Further qualifies the ICMP message
32 | ICMP_data | ICMP data
Appendix 2
Attacks in Lincoln Data 1999
Category of attacks | Types of attacks [37]
Probe | ipsweep, nmap, portsweep, satan
Denial of Service (DoS) | back, land, neptune, pod, smurf, teardrop
User to root (U2R) | buffer_overflow, loadmodule, perl, rootkit
Remote to Local (R2L) | ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, warezmaster

Type of attack | Description [36]
back | Denial of service attack against apache webserver where a client requests a URL containing many backslashes.
dict | Guess passwords for a valid user using simple variants of the account name over a telnet connection.
eject | Buffer overflow using eject program on Solaris. Leads to a user->root transition if successful.
ffb | Buffer overflow using the ffbconfig UNIX system command leads to root shell
format | Buffer overflow using the fdformat UNIX system command leads to root shell
ftp-write | Remote FTP user creates .rhost file in world writable anonymous FTP directory and obtains local login.
guest | Try to guess password via telnet for guest account.
imap | Remote buffer overflow using imap port leads to root shell
ipsweep | Surveillance sweep performing either a port sweep or ping on multiple host addresses.
land | Denial of service where a remote host is sent a UDP packet with the same source and destination
loadmodule | Non-stealthy loadmodule attack which resets IFS for a normal user and creates a root shell
multihop | Multi-day scenario in which a user first breaks into one machine
neptune | Syn flood denial of service on one or more ports.
nmap | Network mapping using the nmap tool. Mode of exploring network will vary; options include SYN
perlmagic | Perl attack which sets the user id to root in a perl script and creates a root shell
phf | Exploitable CGI script which allows a client to execute arbitrary commands on a machine with a misconfigured web server.
pod | Denial of service ping of death
portsweep | Surveillance sweep through many ports to determine which services are supported on a single host.
rootkit | Multi-day scenario where a user installs one or more components of a rootkit
satan | Network probing tool which looks for well-known weaknesses. Operates at three different levels. Level 0 is light
smurf | Denial of service icmp echo reply flood.
spy | Multi-day scenario in which a user breaks into a machine with the purpose of finding important information where the user tries to avoid detection. Uses several different exploit methods to gain access.
syslog | Denial of service for the syslog service connects to port 514 with unresolvable source ip.
teardrop | Denial of service where mis-fragmented UDP packets cause some systems to reboot.
warez | User logs into anonymous FTP site and creates a hidden directory.
warezclient | Users downloading illegal software which was previously posted via anonymous FTP by the warezmaster.
warezmaster | Anonymous FTP upload of Warez (usually illegal copies of copyrighted software) onto FTP server
Appendix 3
Decimal Number for Protocol Type and Flag Field in KDD-Cup 1999 Dataset
Symbol | Binary bits | Decimal number
ICMP | 001 | 1
UDP | 010 | 2
TCP | 100 | 4
SH | 00000000001 | 1
SF | 00000000010 | 2
S3 | 00000000100 | 4
S2 | 00000001000 | 8
S1 | 00000010000 | 16
S0 | 00000100000 | 32
RSTR | 00001000000 | 64
RSTOS0 | 00010000000 | 128
RSTO | 00100000000 | 256
REJ | 01000000000 | 512
OTH | 10000000000 | 1024
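The coding in this appendix assigns each symbol a single set bit, so the decimal number is a power of two. A minimal Python sketch of that mapping follows (the thesis programs themselves are written in Matlab). The function name one_hot_decimal is hypothetical, and reading the flag between RSTR and RSTO as the single symbol RSTOS0 is an assumption.

```python
# Symbol order follows Appendix 3, lowest bit first.
# Treating the eighth flag as the single symbol RSTOS0 is an assumption.
PROTOCOLS = ["ICMP", "UDP", "TCP"]
FLAGS = ["SH", "SF", "S3", "S2", "S1", "S0",
         "RSTR", "RSTOS0", "RSTO", "REJ", "OTH"]


def one_hot_decimal(symbol, symbols):
    """Return 2**k, where k is the position of `symbol` in `symbols`."""
    return 1 << symbols.index(symbol.upper())


tcp_code = one_hot_decimal("tcp", PROTOCOLS)
sf_code = one_hot_decimal("SF", FLAGS)
oth_code = one_hot_decimal("OTH", FLAGS)
```

Encoding the discrete features this way turns them into numbers that can be placed in the same matrix as the continuous features before the correlation matrix is computed.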
Appendix 4
Matlab Program for Intrusion Detection
I. First step
% apply PCA technique
% offline phase
a = xlsread('my kddtrain(6).xls'); % normal data
[n,p] = size(a);
% calculate mean for normal data
for j = 1 : p
    xmean(j) = 0;
    for i = 1 : n
        xmean(j) = xmean(j) + a(i,j);
    end
    xmean(j) = xmean(j)/n;
end
% calculate correlation for normal data
r = corr(a);
[e,v] = eig(r); % find eigenvectors (e) and eigenvalues (v)
% the code assumes that eig returns the eigenvalues in ascending order,
% so columns p, p-1 and p-2 of e belong to the three largest eigenvalues
% calculate major components (q)
y(1) = 0; x(1) = 0; z(1) = 0;
for k = 1 : p
    % row 10360 is the sample of observed training data used to set the threshold
    y(1) = y(1) + (e(k,p)  *(a(10360,k)-xmean(k)));
    x(1) = x(1) + (e(k,p-1)*(a(10360,k)-xmean(k)));
    z(1) = z(1) + (e(k,p-2)*(a(10360,k)-xmean(k)));
end
% calculate the MajC threshold (tm1)
tm1 = ((y(1)^2)/v(p,p)) + ((x(1)^2)/v(p-1,p-1)) + ((z(1)^2)/v(p-2,p-2));
% online phase
b = xlsread('my kdd train attack(6).xlsx'); % test data
[t,s] = size(b);
attack = 0;
for lc = 1 : t
    yt(lc) = 0; xt(lc) = 0; zt(lc) = 0;
    % map to PC space
    % major components (q) for test data
    for k = 1 : s
        yt(lc) = yt(lc) + (e(k,p)  *(b(lc,k)-xmean(k)));
        xt(lc) = xt(lc) + (e(k,p-1)*(b(lc,k)-xmean(k)));
        zt(lc) = zt(lc) + (e(k,p-2)*(b(lc,k)-xmean(k)));
    end
    % compute PC score for major components (q) for test data (tmt1)
    tmt1 = ((yt(lc)^2)/v(p,p)) + ((xt(lc)^2)/v(p-1,p-1)) + ((zt(lc)^2)/v(p-2,p-2));
    % threshold comparison
    if (tmt1 > tm1)
        attack = attack + 1;
        line(attack) = lc; % row number of the detected record
    end
end
linet = transpose(line); % detect intrusions data
II. Second step
% apply PCA technique
% offline phase
a = xlsread('my kddtrain(10).xls'); % normal data
[n,p] = size(a);
% calculate mean for normal data
for j = 1 : p
    xmean(j) = 0;
    for i = 1 : n
        xmean(j) = xmean(j) + a(i,j);
    end
    xmean(j) = xmean(j)/n;
end
% calculate correlation for normal data
r = corr(a);
[e,v] = eig(r); % find eigenvectors (e) and eigenvalues (v)
% the code assumes that eig returns the eigenvalues in ascending order,
% so columns p, p-1, p-2 are the major and columns 1, 2 the minor components
y(1) = 0; x(1) = 0; z(1) = 0; s(1) = 0; f(1) = 0;
for k = 1 : p
    % row 13254 is the sample of observed training data used to set the thresholds
    % calculate major components (q)
    y(1) = y(1) + (e(k,p)  *(a(13254,k)-xmean(k)));
    x(1) = x(1) + (e(k,p-1)*(a(13254,k)-xmean(k)));
    z(1) = z(1) + (e(k,p-2)*(a(13254,k)-xmean(k)));
    % calculate minor components (r)
    s(1) = s(1) + (e(k,1)*(a(13254,k)-xmean(k)));
    f(1) = f(1) + (e(k,2)*(a(13254,k)-xmean(k)));
end
% calculate the MajC threshold (tm1)
tm1 = ((y(1)^2)/v(p,p)) + ((x(1)^2)/v(p-1,p-1)) + ((z(1)^2)/v(p-2,p-2));
% calculate the MinC threshold (tm2)
tm2 = ((s(1)^2)/v(1,1)) + ((f(1)^2)/v(2,2));
% online phase
b = xlsread('my kdd train attack(10).xlsx'); % test data
[t,s] = size(b);
attack = 0;
for lc = 1 : t
    yt(lc) = 0; xt(lc) = 0; zt(lc) = 0; st(lc) = 0; ft(lc) = 0;
    % map to PC space
    for k = 1 : s
        % major components (q) for test data
        yt(lc) = yt(lc) + (e(k,p)  *(b(lc,k)-xmean(k)));
        xt(lc) = xt(lc) + (e(k,p-1)*(b(lc,k)-xmean(k)));
        zt(lc) = zt(lc) + (e(k,p-2)*(b(lc,k)-xmean(k)));
        % minor components (r) for test data
        st(lc) = st(lc) + (e(k,1)*(b(lc,k)-xmean(k)));
        ft(lc) = ft(lc) + (e(k,2)*(b(lc,k)-xmean(k)));
    end
    % compute PC score for major components (q) for test data (tmt1)
    tmt1 = ((yt(lc)^2)/v(p,p)) + ((xt(lc)^2)/v(p-1,p-1)) + ((zt(lc)^2)/v(p-2,p-2));
    % compute PC score for minor components (r) for test data (tmt2)
    tmt2 = ((st(lc)^2)/v(1,1)) + ((ft(lc)^2)/v(2,2));
    % threshold comparison
    if (tmt1 > tm1 || tmt2 > tm2)
        attack = attack + 1;
        line(attack) = lc; % row number of the detected record
    end
end
linet = transpose(line); % detect intrusions data
Appendix 5
Matlab Program for Measuring the Performance of the IDS
x = xlsread('normal&attacktrainrank.xlsx'); % data rank from 1 to 5
% probe attack data == 1, DoS attack data == 2, U2R attack data == 3
% R2L attack data == 4, normal data == 5
[n,p] = size(x);
countnormal = 0; countprobe = 0; countdos = 0; countu2r = 0; countr2l = 0;
normal = 0; attackprobe = 0; attackdos = 0; attacku2r = 0; attackr2l = 0;
for i = 1 : n
    if x(i) == 5 % detect actual normal
        countnormal = countnormal + 1;
        normal(countnormal) = i;
    end
    if x(i) == 1 % detect actual probe attack
        countprobe = countprobe + 1;
        attackprobe(countprobe) = i;
    end
    if x(i) == 2 % detect actual DoS attack
        countdos = countdos + 1;
        attackdos(countdos) = i;
    end
    if x(i) == 3 % detect actual U2R attack
        countu2r = countu2r + 1;
        attacku2r(countu2r) = i;
    end
    if x(i) == 4 % detect actual R2L attack
        countr2l = countr2l + 1;
        attackr2l(countr2l) = i;
    end
end
normalfalse = 0;
detectattackprobe = 0; detectattackdos = 0; detectattacku2r = 0; detectattackr2l = 0;
% linet is the list of detected rows produced by the program in Appendix 4
[m,l] = size(linet);
[q,t] = size(normal);
[w1,s1] = size(attackprobe);
[w2,s2] = size(attackdos);
[w3,s3] = size(attacku2r);
[w4,s4] = size(attackr2l);
% classify data
for i = 1 : t % detect normal data classified as attack data
    for j = 1 : m
        if normal(i) == linet(j)
            normalfalse = normalfalse + 1;
        end
    end
end
for i = 1 : s1 % detect data classified as probe attack data
    for j = 1 : m
        if attackprobe(i) == linet(j)
            detectattackprobe = detectattackprobe + 1;
        end
    end
end
for i = 1 : s2 % detect data classified as DoS attack data
    for j = 1 : m
        if attackdos(i) == linet(j)
            detectattackdos = detectattackdos + 1;
        end
    end
end
for i = 1 : s3 % detect data classified as U2R attack data
    for j = 1 : m
        if attacku2r(i) == linet(j)
            detectattacku2r = detectattacku2r + 1;
        end
    end
end
for i = 1 : s4 % detect data classified as R2L attack data
    for j = 1 : m
        if attackr2l(i) == linet(j)
            detectattackr2l = detectattackr2l + 1;
        end
    end
end
% Performance Measures for Anomaly class
FP1 = normalfalse;
TP1 = detectattackprobe + detectattackdos + detectattacku2r + detectattackr2l;
totalcountattack = countprobe + countdos + countu2r + countr2l;
TN1 = n - (FP1 + totalcountattack);
FN1 = totalcountattack - TP1;
TPR1 = TP1 / totalcountattack;
FPR1 = FP1 / countnormal;
recall1 = TPR1;
precision1 = TP1 / (TP1 + FP1);
% Performance Measures for normal class
FP2 = FN1; TP2 = TN1; TN2 = TP1; FN2 = FP1;
TPR2 = TP2 / countnormal;
FPR2 = FP2 / totalcountattack;
recall2 = TPR2;
precision2 = TP2 / (TP2 + FP2);
% Performance Measures for system
successrate = (TP1 + TN1) / (TP1 + TN1 + FP1 + FN1);
% successrate1 == successrate2 == successrate
errorrate = 1 - successrate;
% errorrate1 == errorrate2 == errorrate
93
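The measures computed in Appendix 5 can be checked by hand on a small example. The following is a minimal sketch, not part of the original program: the ten class labels and the flagged rows are made up for illustration, and the names labels, flagged and isAttack are hypothetical. Only linet and the class codes (1 to 4 for attacks, 5 for normal) are taken from the appendix. It uses logical masks instead of nested loops, which should give the same counts as long as linet contains no repeated indices.

```matlab
% Toy check of the Appendix 5 measures (labels and flagged rows are made up).
% Class codes follow the appendix: 1 probe, 2 DoS, 3 U2R, 4 R2L, 5 normal.
labels = [5 5 5 5 1 2 2 3 4 5]';   % hypothetical true class of 10 records
linet  = [2 5 6 7 9]';             % hypothetical rows classified as attacks

flagged = false(size(labels));
flagged(linet) = true;             % logical mask of the flagged records
isAttack = labels ~= 5;            % every non-normal class counts as an attack

TP = sum( flagged &  isAttack);    % attacks that were flagged
FP = sum( flagged & ~isAttack);    % normal records flagged by mistake
FN = sum(~flagged &  isAttack);    % attacks that were missed
TN = sum(~flagged & ~isAttack);    % normal records left alone

TPR         = TP / (TP + FN);      % detection rate (recall of the attack class)
FPR         = FP / (FP + TN);      % false alarm rate
precision   = TP / (TP + FP);
successrate = (TP + TN) / numel(labels);

fprintf('TP=%d FP=%d FN=%d TN=%d\n', TP, FP, FN, TN);
fprintf('TPR=%.2f FPR=%.2f precision=%.2f success=%.2f\n', ...
        TPR, FPR, precision, successrate);
```

For this toy input, records 5, 6, 7 and 9 are attacks that were flagged, record 2 is a normal record flagged by mistake, and record 8 is a missed attack, so TP = 4, FP = 1, FN = 1, TN = 4, and the detection rate, precision and success rate are all 0.80 with a false alarm rate of 0.20.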
Appendix 6 Acceptance Letter for Paper about IDS in Telecommunication Network Using PCA
Appendix 7 Arbitration for Paper about IDS in Telecommunication Network Using PCA
PAPER REVIEW
International Journal of Computer Networks & Communications (IJCNC) ISSN: 0974-9322 (Online), 0975-2293 (Print)
Paper No: 26
Title: IDS in Telecommunication Network Using PCA
Reviewer's Name, Affiliation: David
Date: 7/15/2013
1. Type of paper: (Type "YES" in appropriate column)
Research results | Survey | Tutorial | Speculative
Yes
2. Please rate the paper on the following features. (Choose the appropriate choice.)
Rating scale: Poor | Unattractive | Acceptable | Good | Excellent
Item: mark
Potential significance to CS & CE (or Special Issue): XX
Significance of the main idea(s): XX
Originality: XX
Technical quality of the paper: XX
Awareness of related work: XX
Clarity of presentation: XX
Organization of the manuscript: XX
References: XX
Paper Length: XX
3. How confident are you in your rating of this paper?
Excellent: XX
Adequate:
Out of area:
4. COMMENTS to the Author
To assist the author(s) in revising his manuscript, please separate your remarks into two sections:
(a) Suggestions which would improve the quality of the paper but are not essential for publication.
None - Very good paper! Will make an excellent contribution to the journal - thanks for the hard work! David
(b) Changes which must be made before publication
NA
5. Comments (if any) for the Editor's use:
6. OVERALL RECOMMENDATION
Excellent & candidate for Best Paper
Acceptable
Acceptable with major revisions
XXX
Likely Reject
Definite Reject
Referee's Signature: David
The thesis is organized as follows:
Chapter 1 describes the evolution of the telecommunication network, which has led to a growing number of users of this network and consequently to more risks and threats facing those users. The chapter also introduces a new security solution, the intrusion detection system, which can detect new threats and risks. The chapter ends with the objective of the thesis and an outline of its contents.
Chapter 2 explains the concept of the telecommunication network with reference to the Egyptian Telecommunication Regulation Law. It describes the infrastructure of the telecommunication network and ends by explaining the types of threats to that network.
Chapter 3 describes the fundamentals of intrusion detection systems and how to apply them appropriately in a telecommunication network environment. The chapter ends with a review of previous research on intrusion detection systems.
Chapter 4 gives an overview of feature extraction methods for intrusion detection systems and the challenges posed by the surrounding environment. The chapter also discusses the features used in intrusion detection research. It ends by presenting the attack scenarios against telecommunication network users that are found in the NSL-KDD data set.
Chapter 5 describes the multivariate statistical analysis methods used to reduce the data that is analyzed to detect the different attacks. It also describes the experiment used to detect the different attacks, which consists of two steps, each of which can be used as a type of intrusion detector. In the first step, 6 features related to the basic features of TCP connections were selected from the data in the NSL-KDD2010 data set and then analyzed. The results showed an attack detection rate of 0.9288 with a system success rate of 0.9161, so this step can be used as a network intrusion detection system (NIDS). In the second step, 4 further features related to traffic flow were added and the data was analyzed. The results showed an attack detection rate of 0.9628 with a system success rate of 0.8493, so this step can be used as a host intrusion detection system (HIDS). The results showed how the choice of features in each step shapes the performance of each of the two proposed systems.
Finally, Chapter 6 summarizes the main conclusions and future directions.
Research Abstract
Data security is a very serious part of any information system. Internet threats have become more intelligent, so they can deceive basic security solutions such as firewalls and antivirus scanners. The overall security of the telecommunication network must therefore be strengthened by adding a further security layer such as an intrusion detection system (IDS).
The anomaly detector is a type of IDS that can differentiate between normal and abnormal data in the traffic monitored within the network. It can create a model of normal network traffic, and this model can then be compared with the monitored traffic.
Feature extraction is the basis of any anomaly detector, so the features used should express the whole traffic flow as far as possible. The efficiency of the intrusion detector depends on the features used in the anomaly detector, so the challenge is whether the features used are the most suitable ones.
Analyzing different attacks is a good way to find the suitable features that affect the traffic flow. For this reason the common attacks were used in our test: denial of service, probing, User to Root (U2R) and Remote to User (R2U).
This thesis proposes two types of intrusion detector. One can be used as a network intrusion detection system (NIDS) with a system success rate of 0.9161 and a high attack detection rate of 0.9288. The other can be used as a host intrusion detection system (HIDS) with a system success rate of 0.8493 and a very high attack detection rate of 0.9628, using the NSL-KDD data set. This result depends on the features used in the two types of IDS.
The aim of this thesis is to improve the intrusion detection system by using principal component analysis as a dimensionality reduction technique. The thesis compares two different methods of selecting suitable features: one can be used in a NIDS and the other in a HIDS.
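Assiut University
Faculty of Engineering
Electrical Engineering Department
Intrusion Detection System in Telecommunication Network
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science, Department of Electrical Engineering, Faculty of Engineering, Assiut University
Submitted by
Eng. Mohamed Faisal Elrawy Refaey, Demonstrator at the Faculty of Engineering, Misr University for Science and Technology
2014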
Assiut University
Faculty of Engineering
Electrical Engineering Department
Intrusion Detection System in Telecommunication Network
Submitted by Eng. Mohamed Faisal Elrawy Refaey, B.Sc., Electrical Engineering (Electronics and Communications), Assiut University, Assiut, Egypt
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science
Supervision committee:
Prof. Abdelfatah Mahmoud Mohamed, Professor Emeritus, Faculty of Engineering, Assiut University
Dr. Tarik Kamal Abdelhamid, Lecturer Emeritus, Faculty of Engineering, Assiut University
Examination committee:
Prof. El-Sayed Mahmoud Abdelhamid El-Rabaie, Professor Emeritus, Faculty of Electronic Engineering in Menouf, Menoufia University
Prof. Magdy Mofid Doss, Professor Emeritus, Faculty of Engineering, Assiut University
Prof. Abdelfatah Mahmoud Mohamed, Professor Emeritus, Faculty of Engineering, Assiut University
Dr. Tarik Kamal Abdelhamid, Lecturer Emeritus, Faculty of Engineering, Assiut University
2014