International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: [email protected], [email protected] Volume 3, Issue 2, March – April 2014 ISSN 2278-6856

Network Based Intrusion Detection System using Genetic Algorithm: A Study

1Purushottam Patil, 2Dr. Yogesh Sharma and 3Dr. Manali Kshirsagar

1Research Scholar (Computer Science & Engineering), Faculty of Engineering & Technology, Jodhpur National University, Jodhpur (RJ), India.

2Professor (Mathematics), Department of Applied Science, Faculty of Engineering & Technology, Jodhpur National University, Jodhpur (RJ), India.

3Professor & Dean (Student Affair), Department of Computer Technology, Yashwantrao Chavan College of Engineering, Nagpur (MS), India.

Abstract: The Internet has become a part of daily life and an essential tool. It aids people in many areas, such as business, entertainment and education. In particular, the Internet has become an important component of business models: both businesses and customers use Internet applications such as websites and e-mail in their business activities [1]. The security of information carried over the Internet therefore needs careful attention. Intrusion detection is one of the major research problems in computer and Internet security. The number of attacks has grown extensively, and many new hacking tools and intrusive methods have appeared. Using a network intrusion detection system (NIDS) is one way of dealing with intruders and suspicious activities within a network. This paper provides a study of the available literature on genetic algorithm based network intrusion detection systems (NIDS). We analyze several systems implemented using genetic algorithms and compare their accuracy, detection rate and false alarm rate on the KDD-cup dataset.

Keywords: intrusion, internet security, genetic algorithm, network intrusion detection system.

1. INTRODUCTION: Over the last few decades, information has become the most precious asset of any organization. Most of what an organization does revolves around this asset, and the Internet plays an important role in this context. Organizations are therefore taking measures to safeguard their information from intruders. The rapid development and expansion of the World Wide Web and computer networks, and their use across industries, have changed the computing world by leaps and bounds.

1.1 Intrusion detection system (IDS): An IDS is a system that identifies attacks on a network and takes corrective action to prevent them. It comprises a set of techniques used to detect suspicious activity at both the network and the host level. There are two main approaches to designing an IDS:
a. Misuse Based IDS (Signature Based)
b. Anomaly Based IDS

In a misuse based intrusion detection system, intrusions are detected by looking for activities that correspond to known signatures of intrusions or vulnerabilities (e.g. a virus or a DoS attack) [3]. An anomaly based intrusion detection system, by contrast, detects intrusions by searching for abnormal network traffic. An abnormal traffic pattern can be defined either as a violation of accepted thresholds for the frequency of events in a connection or as a user's violation of the legitimate profile developed for normal behavior. An anomaly detection technique generally consists of two steps: a training phase, in which a normal traffic profile is generated, and an anomaly detection phase, in which the learned profile is applied to current traffic to look for deviations. A number of anomaly detection mechanisms have been proposed recently to detect such deviations; they can be categorized into statistical methods, data-mining methods and machine learning based methods.
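The two phases of anomaly detection described in this section (building a profile from normal traffic, then checking current traffic for deviations) can be illustrated with a minimal statistical sketch. The monitored feature (connections per second), the three-sigma threshold and the names `train_profile` and `is_anomalous` are assumptions made for illustration only; they are not taken from any of the surveyed systems.

```python
from statistics import mean, stdev


def train_profile(normal_rates):
    """Training phase: summarise normal traffic as (mean, standard deviation)."""
    return mean(normal_rates), stdev(normal_rates)


def is_anomalous(rate, profile, k=3.0):
    """Detection phase: flag a value more than k standard deviations from the mean."""
    mu, sigma = profile
    return abs(rate - mu) > k * sigma


# Hypothetical connections-per-second samples observed during normal operation.
normal = [10, 12, 11, 9, 10, 13, 11, 10]
profile = train_profile(normal)

# One value close to the learned profile and one far outside it.
flags = [is_anomalous(r, profile) for r in (11, 95)]
```

With these sample values, the first observation falls within the learned profile and the second is flagged. A practical system would profile many features at once and would have to choose its thresholds against the false alarm rate.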

Figure 1. Types of Intrusion detection System.

1.2 Network Based IDS: A network based IDS identifies intrusions by monitoring traffic through network devices (e.g. a Network Interface Card). Its data is mainly collected from the generic stream of traffic passing through the network, such as Internet packets. Only an NIDS can detect all attacks in a LAN, including attacks that a host-based IDS cannot detect, such as DoS [23]. Some of the main points that describe the need for an NIDS are:
- It provides a greater degree of integrity to the infrastructure of an organization.
- It can trace user activity from point of entry to point of impact.
- It records alterations of data and reports them.
- It helps in monitoring the Internet for the latest attacks.
- It notifies us when the system is under attack.
- It analyzes abnormal activity patterns.

Figure 2. Network based IDS.

Most existing intrusion detection systems suffer from at least two of the following problems [22]:
1. The information used by the intrusion detection system is obtained from audit trails or from packets on a network. Data has to traverse a longer path from its origin to the IDS and in the process can potentially be destroyed or modified by an attacker. Furthermore, the intrusion detection system has to infer the behavior of the system from the data collected, which can result in misinterpretations or missed events. This is referred to as the fidelity problem.
2. The intrusion detection system continuously uses additional resources in the system it is monitoring, even when no intrusions are occurring, because its components have to be running all the time. This is the resource usage problem.
3. Because the components of the intrusion detection system are implemented as separate programs, they are susceptible to tampering. An intruder can potentially disable or modify the programs running on a system, rendering the intrusion detection system useless or unreliable. This is the reliability problem.

2. INTRUSION DATASET In the 1998 DARPA intrusion detection evaluation programme (KDD-cup dataset) [18], an environment was set up to acquire raw TCP/IP dump data for a network by simulating a typical US Air Force LAN. The LAN was operated like a real environment but was subjected to multiple attacks. For each TCP/IP connection, 41 quantitative and qualitative features were extracted. Of this database, a training subset of 494014 records was used, of which about 20% represent normal patterns. The test set was composed of 311029 data records. The categories of attack patterns are as follows [19]. It is important to mention that we have studied only papers whose systems were evaluated on the DARPA dataset. The attacks fall into five main classes, namely:
1. Probe,
2. Denial of Service (DoS),
3. Remote to Local (R2L),
4. User to Root (U2R) and
5. Data attacks.

The Probe or Scan attacks automatically scan a network of computers or a DNS server to find valid IP addresses (ipsweep, lsdomain, mscan), active ports (portsweep, mscan), host operating system types (queso, mscan) and known vulnerabilities (satan). The DoS attacks are designed to disrupt a host or network service. These include crashing the Solaris operating system (selfping), actively terminating all TCP connections to a specific host (tcpreset), corrupting ARP cache entries for a victim not in others' caches (arppoison), crashing the Microsoft Windows NT web server (crashiis) and crashing Windows NT (dosnuke). In R2L attacks, an attacker who does not have an account on a victim machine gains local access to the machine (guest, dict), exfiltrates files from the machine (ppmacro) or modifies data in transit to the machine (framespoof). In U2R attacks, a local user on a machine is able to obtain privileges normally reserved for the Unix super user or the Windows NT administrator.

Detection and identification of attack and non-attack behaviours can be generalized as follows:
1. True Positive (TP): the amount of data detected as attack when it is actually attack.
2. True Negative (TN): the amount of data detected as normal when it is actually normal.
3. False Positive (FP): the amount of data detected as attack when it is actually normal, namely false alarms.
4. False Negative (FN): the amount of data detected as normal when it is actually attack, namely the attacks that the intrusion detection system fails to detect.

An intrusion detection system requires a high detection rate and a low false alarm rate; we therefore compare accuracy, detection rate and false alarm rate.
Accuracy refers to the proportion of data classified as the correct type among all data, namely the cases TP and TN; thus accuracy is defined as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{1} \]

Detection rate refers to the proportion of attacks detected among all attack data, namely the case TP; thus detection rate is defined as follows:

\[ \text{Detection Rate} = \frac{TP}{TP + FN} \times 100\% \tag{2} \]

False alarm rate refers to the proportion of normal data that is falsely detected as attack behavior, namely the case FP; thus false alarm rate is defined as follows:

\[ \text{False Alarm Rate} = \frac{FP}{FP + TN} \times 100\% \tag{3} \]
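As a worked illustration of equations (1) to (3), the short sketch below computes the three measures from confusion-matrix counts. The function name `ids_metrics` and the sample counts are hypothetical and were chosen only to exercise the formulas; they are not results from any surveyed system.

```python
def ids_metrics(tp, tn, fp, fn):
    """Return (accuracy, detection rate, false alarm rate) as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100   # equation (1)
    detection_rate = tp / (tp + fn) * 100              # equation (2)
    false_alarm_rate = fp / (fp + tn) * 100            # equation (3)
    return accuracy, detection_rate, false_alarm_rate


# Hypothetical counts: 100 attack records and 900 normal records.
acc, dr, far = ids_metrics(tp=90, tn=880, fp=20, fn=10)
```

For these counts the accuracy is 97%, the detection rate is 90% and the false alarm rate is about 2.22%. The example also shows why accuracy alone can mislead on data dominated by one class: it stays high even though one attack in ten is missed.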

3. GENETIC ALGORITHM: A Genetic Algorithm (GA) is a programming technique that mimics biological evolution as a problem-solving strategy. It is based on Darwin's principle of evolution and survival of the fittest, and it optimizes a population of candidate solutions towards a predefined fitness. A GA applies evolution and natural selection to a chromosome-like data structure and evolves the chromosomes using selection, recombination and mutation operators. The process usually begins with a randomly generated population of chromosomes, which represent candidate solutions to the problem. Different positions of each chromosome are encoded as bits, characters or numbers; these positions are referred to as genes. An evaluation function, known as the “Fitness Function”, is used to calculate the goodness of each chromosome with respect to the desired solution. During the process of evolution, “Crossover” is used to simulate natural reproduction and “Mutation” is used to simulate the mutation of species. For survival and combination, the selection of chromosomes is biased towards the fittest chromosomes.

When a GA is used to solve a problem, three factors have a vital impact on the effectiveness of the algorithm and of the application: i) the fitness function; ii) the representation of individuals; and iii) the GA parameters. The determination of these factors often depends on the application and/or implementation [22].

Genetic algorithms can be used to evolve simple rules for network traffic. These rules are used to differentiate normal network connections from anomalous connections, where anomalous connections are events with a probability of being intrusions. The rules stored in the rule base are usually of the following form:

if { condition } then { act }

The condition usually refers to a match between the current network connection and the rules in the IDS, based on attributes such as source and destination IP addresses and port numbers (used in TCP/IP network protocols), duration of the connection, protocol used, etc., indicating the probability of an intrusion. The act field usually refers to an action defined by the security policies within an organization, such as reporting an alert to the system administrator, stopping the connection, logging a message into system audit files, or all of the above.

For example, a rule can be defined as:

if { the connection has the following information: source IP address: 124.12.5.18; destination IP address: 130.18.206.55; destination port number: 21; connection time: 10.1 seconds } then { stop the connection }

This rule can be explained as follows: if there exists a network connection request with the source IP address 124.12.5.18, destination IP address 130.18.206.55, destination port number 21 and connection time 10.1 seconds, then stop this connection establishment. This is because the IP address 124.12.5.18 is recognized by the IDS as one of the blacklisted IP addresses; therefore, any service request initiated from it is rejected.

The final goal of applying GA is to generate rules that match only the anomalous connections. These rules are tested on historical connections and are used to filter new connections to find suspicious network traffic.

3.1 Processing Steps: The process of GA usually begins with a randomly selected population of chromosomes. These chromosomes are representations of the problem to be solved. According to the attributes of the problem, different positions of each chromosome are encoded as bits, characters or numbers. These positions are sometimes referred to as genes and are changed randomly within a range during evolution. The set of chromosomes during a stage of evolution is called a population.

3.2 Common Elements and Parameters of GA
a) Fitness Function: a function that scales the value of an individual relative to the rest of the population. It identifies the best solutions among the candidates in the population.
b) GA operators: selection and crossover are the most effective operators in generating each new population, with mutation applied in addition.
Selection: the phase in which individuals with better fitness are selected from the population; the others are discarded.
Crossover: a process in which each pair of randomly selected individuals exchanges parts of their chromosomes, until an entirely new population has been generated.
Mutation: flips some bits in an individual; since any bit may be flipped, the resulting change is hard to predict.
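The if { condition } then { act } rule form used in this section, including the example rule for source address 124.12.5.18, can be sketched as a small data structure. The field names, the use of `None` as a wildcard and the default action "allow" are assumptions introduced for illustration; the text does not specify how the surveyed systems store their rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # None acts as a wildcard: the field is ignored when matching (an assumption).
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    dst_port: Optional[int] = None
    duration: Optional[float] = None
    act: str = "alert"

    def matches(self, conn: dict) -> bool:
        """Condition part: every non-wildcard field must equal the connection's value."""
        fields = ("src_ip", "dst_ip", "dst_port", "duration")
        return all(
            getattr(self, f) is None or getattr(self, f) == conn.get(f)
            for f in fields
        )


def apply_rules(rules, conn):
    """Act part: return the action of the first matching rule, else 'allow'."""
    for rule in rules:
        if rule.matches(conn):
            return rule.act
    return "allow"


# The example rule from the text, and one connection that matches it.
rule = Rule("124.12.5.18", "130.18.206.55", 21, 10.1, act="stop")
conn = {"src_ip": "124.12.5.18", "dst_ip": "130.18.206.55",
        "dst_port": 21, "duration": 10.1}
other = dict(conn, src_ip="10.0.0.7")  # same connection from a different source
```

Here `apply_rules([rule], conn)` yields "stop", while the connection from the other source address is allowed. In a GA setting, each field of such a rule would be one gene of the chromosome.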

Figure 3. Flowchart (Processing steps of GA).
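The processing steps of a GA (random initial population, fitness evaluation, selection, crossover and mutation) can be sketched as the loop below. This is a generic illustration, not the implementation of any surveyed system: tournament selection, single-point crossover, elitism and all parameter values are common choices assumed here, and the fitness function is a toy stand-in (the number of 1 bits) rather than an intrusion detection fitness.

```python
import random


def evolve(fitness, n_genes=16, pop_size=20, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Evolve bit-string chromosomes and return the fittest one found."""
    rng = random.Random(seed)
    # Random initial population of bit-string chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    def select():
        # Tournament selection: the fitter of two random individuals is chosen.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(pop, key=fitness)
        nxt = [best[:]]  # elitism: carry the best individual over unchanged
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [g ^ 1 if rng.random() < mutation_rate else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


# Toy fitness: count the 1 bits in the chromosome.
best = evolve(fitness=sum)
```

Because the best individual is copied into each new generation, the best fitness never decreases from one generation to the next. For rule evolution, the bit string would be replaced by an encoding of the rule fields and `fitness` by a measure of how well a rule separates attack from normal connections.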

3.3 NIDS Implemented Using GA

Li [14] described a method using GA to detect anomalous network intrusions. The approach includes both quantitative and categorical features of network data for deriving classification rules. The inclusion of quantitative features can increase the detection rate, but no experimental results are available.

Goyal and Kumar [17] described a GA based algorithm to classify all types of smurf attack using the training dataset, with a very low false positive rate (0.2%) and a detection rate of almost 100% [20].

Lu and Traore [15] used a historical network dataset and genetic programming (GP) to derive a set of classification rules. They used a support-confidence framework as the fitness function and accurately classified several network intrusions. However, their use of genetic programming made the implementation procedure very difficult, and the training procedure requires more data and time.

Xia et al. [18] used GA to detect anomalous network behaviors based on information theory. Network features related to attacks are identified from the mutual information between network features and types of intrusion, and a linear structure rule is then derived from these features using a GA. The approach of using mutual information and the resulting linear rule seems very effective because of the reduced complexity and higher detection rate.

Gong et al. [16] presented a software implementation of a GA based approach to network intrusion detection. The approach derives a set of classification rules and uses a support-confidence framework as the fitness function.

Noda et al. [20] used GAs to discover interesting rules in a dependence modeling task, where different rules can predict different goal attributes. Generally, attributes with high information gain are good predictors of a class when considered individually. However, attributes with low information gain could become more relevant when attribute interactions are taken into account. This phenomenon is associated with rule interestingness.

3.4 Overall accuracy, detection rate and false alarm rate comparison on test data. The following table summarizes the average values of accuracy, detection rate and false alarm rate over 10000 runs on the DARPA dataset.

Table 1: Experimental Results

Research Work    Detection Rate (%)    False Alarm Rate (%)
[14]             57.14                 5.23
[15]             40                    0
[15]             99                    0.8
[17]             99                    0.2
[18]             99.25                 1.66
[20]             95                    3

Figure 4. Experimental Results
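Several of the systems in Section 3.3 use a support-confidence framework as the fitness function. A minimal sketch of one common formulation is given below, in which the fitness of a rule "if condition then attack" is a weighted sum of its support and its confidence. The weighted-sum form, the weights `w1` and `w2`, the function name and the sample records are assumptions for illustration and may differ from the formulations used in the cited works.

```python
def support_confidence_fitness(condition, records, w1=0.2, w2=0.8):
    """Score a rule 'if condition then attack' on labelled records.

    records: iterable of (connection, is_attack) pairs.
    support    = |condition and attack| / |all records|
    confidence = |condition and attack| / |condition|
    """
    n = n_cond = n_both = 0
    for conn, is_attack in records:
        n += 1
        if condition(conn):
            n_cond += 1
            n_both += is_attack  # True counts as 1, False as 0
    if n == 0 or n_cond == 0:
        return 0.0  # a rule that matches nothing gets the lowest fitness
    support = n_both / n
    confidence = n_both / n_cond
    return w1 * support + w2 * confidence


# Hypothetical labelled connections: (features, is_attack).
records = [
    ({"dst_port": 21, "duration": 10.1}, True),
    ({"dst_port": 21, "duration": 0.5}, True),
    ({"dst_port": 80, "duration": 2.0}, False),
    ({"dst_port": 21, "duration": 1.0}, False),
]
score = support_confidence_fitness(lambda c: c["dst_port"] == 21, records)
```

For the sample data the rule matches three of four records, two of which are attacks, so the support is 0.5 and the confidence is 2/3. Weighting confidence more heavily favours precise rules, which tends to lower the false alarm rate at the cost of coverage.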

4. CONCLUSION In this paper we studied intrusions, IDS, network based IDS, GA and GA based NIDS, and reviewed the work of several researchers who applied GA to NIDS on the DARPA dataset over 10000 runs. More effort is needed to approach a 100% detection rate and a 0% false alarm rate, either through improved implementation techniques or through hybridization of GA with other soft computing techniques.

REFERENCES:
[1] Ajith Abraham, Ravi Jain, “Soft Computing Models for Intrusion Detection System”, Department of Computer Science, Oklahoma State University.
[2] Abharanidharan Shanmugam, Norbik Bashah Idris, “Anomaly Intrusion Detection based on Fuzzy Logic and Data Mining”, Proceedings of the Postgraduate Annual Research Seminar 2006, Centre for Advanced Software Engineering, University Technology Malaysia – City Campus Jalan Semarak, Kuala Lumpur.
[3] Dokas P., Ertoz L., Vipin Kumar, Srivastava J., Tan P., “Data Mining for Network Intrusion Detection”, National Science Foundation Workshop on Next Generation Data Mining, USA, 2002.
[4] Lunt T., “Detecting intruders in computer systems”, Conference on auditing and computer technology, 1993.
[5] Gomez J., Dasgupta D., “Evolving Fuzzy classifiers for Intrusion Detection”, Proceedings of 2002 IEEE Workshop in Information Assurance, USA, NY, 2002.
[6] S. Selvakani Kandeeban and Rengan S. Rajesh, “Integrated Intrusion Detection System using Soft computing”, International Journal of Network Security, Vol.10, No.2, pp. 87-92, Mar. 2010.
[7] C. Kolias, G. Kambourakis, M. Maragoudakis, “Swarm intelligence in intrusion detection: A survey”, Elsevier Ltd, 2011.
[8] Chetan Gupta, Amit Sinhal, Rachana Kamble, “Intrusion Detection based on K-Means Clustering and Ant Colony Optimization: A Survey”, International Journal of Computer Applications (0975 – 8887), Volume 79 – No.6, October 2013.
[9] Hecht-Nielsen, R., “Applications of counter propagation networks”, Neural Networks, 1, 131–139, 1988.
[10] “Application of Neural Networks to Intrusion Detection”, SANS Institute.
[11] Sailesh Kumar, “Survey of Current Network Intrusion Detection Techniques”.
[12] Sharmila Kishor Wagh, Vinod K. Pachghare, Satish R. Kolhe, “Survey on Intrusion Detection System using Machine Learning Techniques”, International Journal of Computer Applications (0975 – 8887), Volume 78 – No.16, September 2013.
[13] Ciza Thomas, Vishwas Sharma, N. Balakrishnan, “Usefulness of DARPA Dataset for Intrusion Detection System Evaluation”, Indian Institute of Science, Bangalore, India.
[14] W. Li, “Using Genetic Algorithm for Network Intrusion Detection”. “A Genetic Algorithm Approach to Network Intrusion Detection”. SANS Institute, USA, 2004.
[15] W. Lu, I. Traore, “Detecting New Forms of Network Intrusion Using Genetic Programming”, Computational Intelligence, vol. 20, no. 3, Blackwell Publishing, Malden, pp. 475-494, 2004.
[16] R. H. Gong, M. Zulkernine, P. Abolmaesumi, “A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection”, 2005.
[17] Anup Goyal, Chetan Kumar, “GA-NIDS: A Genetic Algorithm based Network Intrusion Detection System”, 2008.
[18] T. Xia, G. Qu, S. Hariri, M. Yousif, “An Efficient Network Intrusion Detection Method Based on Information Theory and Genetic Algorithm”, Proceedings of the 24th IEEE International Performance Computing and Communications Conference (IPCCC ‘05), Phoenix, AZ, USA, 2005.
[19] B. Abdullah, I. Abd-alghafar, Gouda I. Salama, A. Abd-alhafez, “Performance Evaluation of a Genetic Algorithm Based Approach to Network Intrusion Detection System”, 2009.
[20] Zadeh, L. A., “The concept of a linguistic variable and its application to approximate reasoning, Parts 1, 2, and 3”, Information Sciences, 1975, 8:199-249, 8:301-357, 9:43-80.
[21] Mukkamala, R., Gagnon J., Jaiodia S., “Integrating data mining techniques with intrusion detection methods”, Research Advances in Database and Information systems security, 33-46, 2000.
[22] Mohammad Sazzadul Hoque, Md. Abdul Mukit and Md. Abu Naser Bikas, “An implementation of intrusion detection System using genetic algorithm”, International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012.
[23] Alireza Osareh, Bita Shadgar, “Intrusion Detection in Computer Networks based on Machine Learning Algorithms”, IJCSNS International Journal of Computer Science and Network Security, Vol.8 No.11, November 2008.
