A Selective Parameter-based Evolutionary Technique for Network Intrusion Detection

Zubair A. Baig, Saad Khan, Saif Ahmed, and Mohammed H. Sqalli
Department of Computer Engineering
King Fahd University of Petroleum & Minerals
Dhahran, Saudi Arabia
{zbaig, saad, saif, sqalli}@kfupm.edu.sa

Abstract: Network intrusion detection has remained a field of rigorous research over the past few years. Advances in computing performance, in terms of processing power and storage, have allowed the use of resource-intensive intelligent algorithms to detect intrusive activities in a timely manner. Genetic Algorithms (GAs) have emerged as a powerful domain-independent technique for searching for the most effective set of rules to differentiate between normal and anomalous network traffic. The scope of research on effective GA-based intrusion detection systems has rapidly expanded to keep pace with the variant attack types increasingly witnessed from the adversary class. In this paper, we propose a GA-based technique for identifying network intrusion attempts and clearly differentiating them from normal network traffic. The performance of the proposed scheme is studied and analyzed on the KDD-99 intrusion benchmark data set through a simulation-based analysis, with results that strengthen our findings and provide directions for future work.

Keywords: Intrusion Detection Systems, Genetic Algorithm, KDD-99 Dataset.

I. INTRODUCTION

Network intrusion is defined as any malicious attempt to access a computer network infrastructure with the sole intention of inflicting loss [1]. Such attempts have increased manifold in the recent past. Rapid increases in the resources offered by bandwidth providers, although aimed at a better end-user experience, have also enabled the proliferation of network-based malicious attacks against target networking resources such as routers, switches, and host machines. As a result, the adversary class has become less constrained by network bandwidth and can produce highly sophisticated and varied network-based attacks that emerge from existing, known attack types such as Denial of Service, Root-Kits, and Ping-of-Death. Although the field of intrusion detection has progressively witnessed proposals for effective counter-techniques against such attacks, there is still a dearth of highly efficient and intelligent techniques for timely and accurate detection of variant attacks that pose an imminent danger to the smooth operation of a network infrastructure.

A network intrusion detection system (NIDS) operates on the principles of attack modeling and definition of rules, to facilitate accurate classification of network traffic, based on a given set of parameters extracted therefrom, as either normal or anomalous [1]. The NIDS may operate either as a dedicated hardware unit or as a software overlay on an existing machine. It collects information on the traffic that traverses a network and performs rule-based operations to correctly distinguish normal traffic from anomalous traffic.

The Genetic Algorithm (GA) is a powerful domain-independent search technique based on the principles of evolution and natural selection. GAs were first introduced in the 1970s [2] [3]. In the context of network intrusion detection, the technique is mainly used for formulating (and subsequently evolving) a set of rules, encoded as chromosomes, that provide the most accurate portrayal of a malicious attack. The execution of a genetic algorithm usually begins with a randomly selected population of chromosomes, which are representations of the problem to be solved. According to the attributes of the problem, the positions of each chromosome are encoded as bits, characters, or numbers. These chromosomic positions are referred to as genes. In addition, each chromosome is characterized by a fitness value: chromosomes with higher fitness lead to better solutions, which biases the selection process towards the fittest chromosomes. The fitness of a chromosome is calculated by an evaluation function. During evolution, two basic operators, crossover and mutation, are used to simulate the natural reproduction and mutation of species [2]. During the crossover process, two parents combine to produce two offspring, G × G → G, for a set of genes G. The chromosomes of the two parents may be copied unmodified as offspring, or they may be randomly combined (crossover) to form offspring.
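The crossover behaviour described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `one_point_crossover`, the `crossover_rate` parameter and its default value are assumptions introduced here, and chromosomes are assumed to be equal-length sequences of genes.

```python
import random


def one_point_crossover(parent_a, parent_b, crossover_rate=0.7, rng=random):
    """Return two offspring from two equal-length parent chromosomes.

    With probability (1 - crossover_rate) the parents are copied unmodified.
    Otherwise the tails of the parents are swapped after a random cut point.
    The rate of 0.7 is an illustrative assumption, not a value from the paper.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    # Too short to cut, or the random draw says "copy": return plain copies.
    if len(parent_a) < 2 or rng.random() >= crossover_rate:
        return list(parent_a), list(parent_b)
    # Cut strictly inside the chromosome so both parents contribute genes.
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = list(parent_a[:cut]) + list(parent_b[cut:])
    child_b = list(parent_b[:cut]) + list(parent_a[cut:])
    return child_a, child_b


if __name__ == "__main__":
    offspring = one_point_crossover([0] * 8, [1] * 8, rng=random.Random(7))
    print(offspring)
```

Passing a seeded `random.Random` instance as `rng` keeps runs reproducible, which is convenient when comparing detection results across experiments.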
The mutation process randomly changes the value of a gene from its present state to an entirely distinct one, G → G [3].

We postulate that if attack information (based on network parameter values) is known a priori, it can be encoded in the form of a GA chromosome, which can serve as a simple rule to differentiate normal from anomalous traffic. Subsequently, the evolution of the chromosomes based on certain criteria may lead to emerging and more accurate representations of all network-based attacks. In this paper, we propose the mapping of network traffic attacks, as defined in the KDD-99 dataset [4], to chromosomic representations, for emulation of network intrusion. Subsequently, we evolve the modeled chromosomes to attain variant chromosomic representations, for added accuracy in network intrusion detection. Our approach is analyzed for all four attack types in the dataset, namely, Probe attacks, User to Root (U2R) attacks, Remote to Local (R2L) attacks, and Denial of Service (DoS) attacks. Our results provide insight into the effect of the evolution process on the detection of network-based malicious attacks, and show our scheme to be effective in timely and accurate attack detection. Another reason for using the KDD-99 dataset was to ensure that the effectiveness (i.e., accuracy) of our intrusion detection technique is benchmarked against a known and frequently used dataset. The eventual goal of this exercise, however, is a scheme that can readily detect anomalous network traffic and classify it accurately for large-scale honeynet traffic.

The paper is organized as follows. In Section II, we review various Genetic Algorithm-based techniques that have been proposed in the past for network intrusion detection. In Section III, we elaborate upon the steps that constitute our proposed intrusion detection technique. In Section IV, we present our simulation results and analyze our findings. Finally, we conclude and provide future directions for our work in Section V.

II. RELATED WORK

Some of the best-performing intrusion detection methods apply GA [5], a combination of neural networks and C4.5 [6], a genetic programming (GP) ensemble [7], support vector machines [8], or fuzzy logic [9]. All of these techniques involve two steps, namely, training and testing. During the training phase, the chromosomes are modeled as rules. Testing involves verifying the accuracy of the formulated chromosomes (rules). GA- and GP-based systems are considered advantageous compared to others because they can easily be retrained at run time.
Moreover, in these schemes, it is sufficient to take the best population from the previous iteration as the initial population and repeat the process, this time including new data. This makes the system inherently adaptive [10]. However, the outcomes produced by a GA-based system are known to be accompanied by a large number of false positives. As such, most of the research in the area is centered on reducing these false positives to a minimum.

Li [11] presented an initial framework for intrusion detection using GAs. The proposed technique models simple rules for network traffic that differentiate normal connections from anomalous connections. Rules are usually stored in the form 'if {condition} then {act}'. The condition refers to a match between a current network connection and the rules defined in the IDS, such as source and destination IP addresses, TCP or UDP port numbers, connection duration, protocol used, etc., and the act field defines the action based on the security policy, such as reporting an alert to the system administrator or terminating a connection. The validity of a rule is examined by matching it against a historical data set comprised of connections marked as either anomalous or normal. On a successful match between a rule and an anomalous network connection, a bonus is given to the current chromosome, and a penalty is applied if the chromosome wrongly matches a normal connection as anomalous. The absolute difference between the outcome of the chromosome and the actual suspicion level is computed as a factor, $\Delta = |\text{Outcome} - \text{Suspicion Level}|$. The suspicion level is a threshold that indicates the degree to which two network connections are considered a 'match'. On the occurrence of a mismatch, a penalty is computed using this factor. The ranking in the equation is a measure of whether or not an intrusion is easy to identify. The penalty and fitness are given by $\text{penalty} = (\Delta \times \text{ranking}) / 100$ and $\text{fitness} = 1 - \text{penalty}$, respectively. In [11], it is suggested to use a large number of rules, so that all anomalies in large datasets can be covered and the false alarm rate reduced.

Bankovic et al. [10] proposed a serial combination of two GA-based intrusion detection systems that offers the advantage of being simple, with low computational overhead. The first IDS is used as a simple linear classifier for anomaly detection that differentiates normal connections from possible attacks. Because this system is known to exhibit a high rate of false positives, the authors implemented an additional system based on if-then rules that is trained to recognize normal connections. This second system filters the output of the first system to considerably reduce the number of false positives. The linear classifier is trained using an incremental GA, where each chromosome in the population is comprised of four genes. The first three genes characterize the coefficients of the linear classifier and the fourth signifies the threshold value. The rule-based system is similarly trained using an incremental GA, where each rule is represented by a 3-gene chromosome. However, the population chosen in this case is significantly smaller than in the first stage of the system. The division of the entire setup into two systems allows each to be trained independently of the other.

In [12], the authors propose ID-SOMGA (Intrusion Detection - Self Organizing Migrating Genetic Algorithm), which combines two algorithms, namely, the Self Organizing Migrating Algorithm (SOMA), an optimization algorithm, and the GA. The motivation behind this integration of SOMA and GA was to support both low and high population sizes, as well as to increase the system's exploration capabilities. SOMGA is a rule-based system whose goal is to develop rules that detect only the anomalous connections. These rules are tested on historical connections and are used to filter new connections to find suspicious network traffic. On successful detection of an anomalous connection, the rule is given a bonus point, and otherwise a penalty. Although ID-SOMGA exhibits a larger computational time than a typical GA-based IDS, it has a very low false positive rate.

Gong et al. [13] proposed a genetic algorithm-based intrusion detection approach composed of two modules, each working in one of two phases, namely, the training phase and the detection phase. The training phase uses the GA in an offline environment to generate rules from historical data. In the detection phase, incoming traffic connections are classified in a real-time environment using the rules generated in the training phase. The fitness function of the GA is used as a metric to select the fit individuals that undergo crossover and mutation to create the next generation of the population. Islam et al. [14] recommended a fitness function based on the accuracy-existence-occurrence structure.

In contrast to the previous work on GA-based intrusion detection outlined in this section, our proposed scheme provides the following contributions:
• The KDD-99 dataset is pre-processed based on the frequencies of occurrence of network traffic parameter values.
• Chromosome modeling is performed on the pre-processed data so as to accurately model all four attack types.
• The evolution of the initial chromosomes is performed based on bit-inversion mutation and 1-point crossover.
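The first contribution, frequency-based pre-processing of {feature, value} pairs, could be sketched in Python roughly as follows. The paper does not state the exact criterion used to mark a feature as redundant, so the `min_share` threshold, the function names and the dictionary-based row format are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def value_frequencies(rows, label_key="label"):
    """Count how often each (feature, value) pair occurs per traffic class.

    `rows` is assumed to be an iterable of dicts, one per connection,
    with the class name stored under `label_key` (a hypothetical layout).
    """
    freq = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        label = row[label_key]
        for feature, value in row.items():
            if feature != label_key:
                freq[label][feature][value] += 1
    return freq


def relevant_features(freq_for_label, min_share=0.5):
    """Keep features whose most frequent value covers at least `min_share`
    of the rows of one class; the rest are treated as "don't care".

    The threshold rule is an assumed stand-in for the paper's criterion.
    """
    kept = {}
    for feature, counts in freq_for_label.items():
        total = sum(counts.values())
        value, count = counts.most_common(1)[0]
        if count / total >= min_share:
            kept[feature] = value
    return kept
```

Under this assumed rule, a feature whose values are spread thinly across many distinct values for a class would not appear in that class's chromosome, which mirrors the role of the "don't care" markers used later in the paper.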

III. PROPOSED INTRUSION DETECTION TECHNIQUE

In this section, we present an intrusion detection technique based on the chromosome modeling and evolution steps of the GA. Our proposed technique falls into the class of rule-based network intrusion detection, wherein the fundamental property of the GA is adapted for the generation of rules (chromosomes) that depict network intrusions. Each chromosome, as a collection of genes, is a representation of the status of a network under one of two conditions, namely, attack and normalcy. A chromosome is therefore a rule, required to dichotomize normal from anomalous traffic. This rule may also be construed as a pattern of network traffic that needs to be identified in real time.

The initial chromosomes of the GA-based IDS are derived through an analysis of the KDD-99 dataset (approximately 5 million rows). Each row of the dataset consists of 41 feature values and 1 class value. Each feature represents a network connection state, such as the duration of the connection, source bytes, destination bytes, etc. In previous work, several techniques were tested on a subset of the entire KDD-99 dataset, based on the type of anomaly that was to be detected. The dataset classifies network connections into five main categories, namely, Normal, Denial of Service (DoS), Remote to Local (R2L), User to Root (U2R), and Probing. The Normal category refers to normal and trusted connections. DoS attacks aim to incapacitate the network from providing network services, by making resources such as servers and routers unavailable to intended users. Remote to Local attacks account for an unauthorized user's attempt to access a machine without having the corresponding privileges. User to Root attacks are those where a normal user tries to gain super-user privileges. Probing attacks correspond to eavesdropping on a network in search of vulnerabilities.

Based on the data set, and after exploring the characteristics of each type of attack, we model a chromosome to act as a pattern that accurately distinguishes network connections. A fitness measure is associated with each chromosome, based entirely on the ability of the chromosome to correctly classify a connection. The chromosomes are modeled in such a way that the probability of false alarms is kept low. This is ensured through careful construction of the initial chromosomes. During the evolution process, the chromosomes with higher fitness values are strong candidates to survive into the next generation, as they have correctly detected a large number of network connection states. Chromosomes are evolved, new offspring chromosomes are added to the rule base, and the process is repeated. Figure 1 illustrates the steps of execution of the proposed scheme for network intrusion detection, which are described in detail below.

Fig. 1. The Proposed Intrusion Detection Scheme. (Blocks shown: Preprocessing Data Set; Initial Chromosome Generation; Pattern Matching Values Generated (Training); Testing; Result Collation and Analysis; Chromosome Evolution.)

1) Data Set Preprocessing: During this phase, each {feature, value} pair, for each of the 23 network traffic classes, is analyzed in terms of its distinctness. For every feature under a given attack category, the occurrences of each distinct value of the feature are counted and sorted. Subsequently, all distinct chromosome types for a given attack are programmed within the intrusion detection system as a representation of the attack. Table I illustrates the outcome of the data preprocessing step, which identifies redundant (don't care) features of the KDD-99 dataset so as to reduce the total number of distinct chromosomes required for detection of a given attack type.

TABLE I
PARAMETER RELEVANCE TO CHROMOSOME MODELING, BASED ON FREQUENCIES OF OCCURRENCES IN THE DATASET. SB = Src Bytes, DB = Dest Bytes, SC = Src Count, DHC = Dest. Host Count, DHSC = Dest. Host Server Count, DHSER = Dest. Host Serv. Error Rate, DHRER = Dest. Host RError Rate.

Attack          Class  Duration  Protocol  Service  Flag  SB  DB  Count  SC  DHC  DHSC  DHSER  DHRER
satan           Probe  √         √         √        √     √   √   X      X   X    X     X      √
smurf           DoS    √         √         √        √     √   √   X      X   X    X     X      √
back            DoS    √         √         √        √     √   √   X      X   X    X     √      √
land            DoS    √         √         √        √     √   √   X      X   X    X     √      √
Guess Password  R2L    √         √         √        √     √   √   √      √   X    X     X      √
rootkit         U2R    √         √         √        √     √   X   X      X   X    X     X      X
snmpguess       R2L    √         √         √        √     √   √   √      X   X    X     X      X

2) Chromosome Modeling and Training: All parameter values obtained from Stage 1 above are subsequently adapted for modeling chromosomes for a given attack type, as illustrated in Table II. Depending on the number of distinct combinations of parameter values, each attack type has a specific number of chromosomes defined. Chromosome modeling is the most important step in any genetic algorithm-based application, and is entirely application dependent. Our chromosomes are modeled in binary format, as this eases the application of the crossover and mutation operators between two steps of evolution. For example, one chromosome for a Satan attack (see Table II) is of the form {tcp, ftp, REJ, 0, 0, 0.91, xx}, which contains the parameters protocol_type, service, flag, src_bytes, dst_bytes, dst_host_rerror_rate, and dst_host_srv_rerror_rate. The field marked 'xx' implies that this parameter can take any value, while the rest of the genes must exactly match a network connection for a Satan attack to be detected accurately.

3) Evolution: During each step of the evolution process, chromosomes are mutated and crossed over to form new chromosomes, with the new generation expected to enhance the detection rate for the attacks through more accurate differentiation of normal and anomalous traffic. A random number is selected, and mutation is performed by flipping that number of binary bits within each chromosome. In addition, we use 1-point crossover, with a random value identifying the bit position at which a pair of chromosomes is crossed over, to generate the next generation of chromosomes for each attack type.

4) Testing: Once a chromosome has been generated, the next step is to search for connections in the test set and make decisions based on it. The more connections a chromosome detects successfully, the higher its fitness. In short, the whole intrusion detection system can be described as a pattern matching exercise, which takes a chromosome as input and searches the connection set (test set) for relevant matches, so as to identify a connection as either anomalous or normal. Anomalous connections are further classified on the basis of the four distinct attack categories. In the initial phase, we kept the chromosome in binary format and successfully applied crossover and mutation to it. Table I presents the parameters for the different classes of attacks that we determined after running the experiment and performing some analysis. In this table, √ corresponds to yes, and X indicates that the particular parameter is not relevant to the corresponding chromosome representation of the attack type.

For each attack type, the fitness function f is defined as follows:

$$f = \text{Detection Rate} + (1 - \text{False Alarm Rate}) \qquad (1)$$

The fitness function is composed of the attack detection rate and the false alarm rate. A higher detection rate and a lower false alarm rate yield higher fitness values for a chromosome during a given evolution step, as illustrated through Equation 1. The detection rate is given by:

$$\text{Detection Rate} = \frac{\text{Connections Detected Correctly}}{\text{Total Connections}} \times 100 \qquad (2)$$

where, for the KDD-99 dataset, the total number of rows is approximately 5 million. In addition, the false alarm rate is given by:

$$\text{False Alarm Rate} = \mathrm{FNR} + \mathrm{FPR} \qquad (3)$$

where FNR is the False Negative Rate and FPR is the False Positive Rate.

$$\mathrm{FNR} = \frac{\text{False Negatives}}{\text{False Negatives} + \text{True Positives}} \qquad (4)$$

The False Negative Rate is defined as the number of Anomalous traffic rows in the dataset classified incorrectly as Normal. In contrast, the False Positive Rate is defined as the number of Normal traffic rows from the dataset incorrectly classified as Anomalous.

$$\mathrm{FPR} = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}} \qquad (5)$$
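To make the testing step and Equations 1 to 5 concrete, the following Python sketch matches one chromosome against labelled connections and computes the fitness. It is a sketch under stated assumptions rather than the authors' C++ simulator: chromosomes are kept as tuples of gene values instead of the binary encoding, "connections detected correctly" is read as true positives plus true negatives, FNR and FPR are treated as fractions, and the names `WILDCARD`, `matches` and `evaluate` are hypothetical.

```python
WILDCARD = "xx"  # gene that matches any value, as in the Satan example


def matches(chromosome, connection):
    """True when every non-wildcard gene equals the connection's value."""
    return len(chromosome) == len(connection) and all(
        gene == WILDCARD or gene == value
        for gene, value in zip(chromosome, connection)
    )


def evaluate(chromosome, connections, is_attack):
    """Apply Equations (1)-(5) to one chromosome over labelled connections."""
    tp = fn = fp = tn = 0
    for connection, attack in zip(connections, is_attack):
        flagged = matches(chromosome, connection)
        if attack and flagged:
            tp += 1
        elif attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    total = tp + fn + fp + tn
    # Eq. (2): assumed to count both correctly flagged attacks and
    # correctly passed normal connections, expressed as a percentage.
    detection_rate = 100.0 * (tp + tn) / total if total else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # Eq. (4)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # Eq. (5)
    false_alarm_rate = fnr + fpr                # Eq. (3)
    fitness = detection_rate + (1.0 - false_alarm_rate)  # Eq. (1)
    return {
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
        "fitness": fitness,
    }
```

For instance, the chromosome `("tcp", "ftp", "REJ", 0, 0, 0.91, "xx")` matches a connection `("tcp", "ftp", "REJ", 0, 0, 0.91, 1)` because the last gene is a wildcard, while it does not match a normal FTP transfer whose flag is SF.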

IV. SIMULATION AND ANALYSIS

The simulator for modeling the GA chromosomes associated with each attack type was implemented in C++. The chromosomes were modeled based on the pre-processed KDD-99 dataset. For data pre-processing and training, the entire dataset, consisting of 2984154 connection records, was given to the simulator as input. Testing was subsequently performed on the same dataset. In other words, we performed data pre-processing on the entire dataset, identified the key features that constitute individual chromosomes, and then presented the entire dataset to the formulated chromosomes for testing. The defined chromosomes were tested on the entire dataset because the feature set representing each attack type had been modified in order to model a chromosome. It was therefore of keen interest to learn the effect of such changes to the {feature, value} pairs on detection accuracy. All simulations were performed on a machine with a 2.8 GHz Intel Core2Duo processor, 4 GB of RAM, and a combined 6 MB two-level cache. It took nearly 13 minutes to complete the test with 31 initial chromosomes (depicting patterns of individual attacks of 5 types).

TABLE II
CHROMOSOME MODELS FOR A SATAN ATTACK AND A NORMAL NETWORK CONNECTION PROFILE. UNDER EACH CATEGORY, EACH ROW REPRESENTS A DISTINCT CHROMOSOME OF THE CORRESPONDING TRAFFIC TYPE.

Category        Protocol Type  Service  Flag  Src Bytes  Dst Bytes  DErrorRate  DSErrorRate
Satan Attack    tcp            ftp      REJ   0          0          0.91        1
Satan Attack    tcp            other    REJ   0          0          1           1
Satan Attack    tcp            other    REJ   0          0          0.83        1
Satan Attack    tcp            priv     S0    0          0          0.83        0
Satan Attack    udp            priv     SF    1          0          0           0
Satan Attack    udp            priv     SF    1          0          0.73        0
Satan Attack    udp            dom_u    SF    1          0          0.88        0
Satan Attack    udp            other    SF    1          0          0           0
Normal Profile  udp            priv     SF    105        146        0           0
Normal Profile  udp            priv     SF    105        146        0           0
Normal Profile  udp            other    SF    146        105        0           0
Normal Profile  udp            other    SF    146        105        0           0
Normal Profile  udp            dom_u    SF    45         133        0           0
Normal Profile  tcp            ftp      SF    175337     0          0           0
Normal Profile  tcp            ftp      SF    501760     0          0           0

We performed simulations for varying numbers of evolution steps. For Figure 2, a total of 200 evolution steps were run and the resulting detection rates plotted, in order to test the effect of an increasing number of evolution steps on the detection rate of the scheme. As can be observed from the figure, a detection rate of nearly 15% is seen for the initially defined chromosomes. The detection rate then steadily increases with increasing steps of evolution (i.e., generations). The results indicate that the initial definition of gene values based on data pre-processing provides only a weak modeling of the attack patterns in the dataset. Flip-bit mutation and 1-point crossover are then performed during each evolution step (generation). As a result, the chromosomes begin converging towards a more accurate representation of the attack traffic. This has a positive effect on the accuracy of the pattern matching process, and in turn on the outcome of the detection experiment. The detection rate approaches 45% at generation 60, and nearly 99% when the simulation reaches generation 200.

Fig. 2. Attack Detection Rate for Evolution Steps = 200. (x-axis: Generation Number; y-axis: Detection Rate, 0 to 100.)

With increasing numbers of generations, the False Alarm Rate of the detection scheme decreases considerably. As evident from Figure 3, the false alarm rate is nearly 80% at the time of chromosome initialization and decreases smoothly until it reaches 0% when the evolution step count is 200. The false alarms generated by the detection scheme are high when the first chromosomes, defined at the time of initialization, fail to correctly categorize normal and anomalous traffic. Most of the false alarms generated are false positives, wherein normal network connections are incorrectly classified as anomalous. The reason behind this higher number of false positives is the nature of the dataset, where the proportion of normal data rows to anomalous rows is on the lower side.

Fig. 3. False Alarm Rate for Evolution Steps = 200. (x-axis: Generation Number; y-axis: False Alarm Rate, 0.00 to 100.00.)
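The generation-by-generation procedure discussed in this section (1-point crossover, flipping a random number of bits, and survival of the fittest chromosomes) might be organised along the following lines. This is an illustrative Python sketch, not the C++ simulator used for the reported results: the survivor selection rule, the `max_flips` bound, the fixed population size and the name `evolve` are assumptions, and `fitness_fn` stands in for whatever fitness measure is applied to a bit-list chromosome.

```python
import random


def evolve(population, fitness_fn, generations=200, max_flips=2, seed=0):
    """Evolve bit-list chromosomes and record the best fitness per generation."""
    if len(population) < 2:
        raise ValueError("need at least two chromosomes")
    length = len(population[0])
    if length < 2:
        raise ValueError("chromosomes need at least two bits")
    rng = random.Random(seed)
    size = len(population)
    population = [list(c) for c in population]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        parents = ranked[: max(2, size // 2)]  # fitter half may reproduce
        offspring = []
        while len(offspring) < size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, length - 1)   # 1-point crossover position
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: flip a random number of randomly chosen bits.
            n_flips = rng.randint(0, min(max_flips, length))
            for pos in rng.sample(range(length), n_flips):
                child[pos] ^= 1
            offspring.append(child)
        # Offspring join the rule base; the fittest `size` chromosomes survive,
        # so the best fitness can never decrease between generations.
        population = sorted(ranked + offspring, key=fitness_fn, reverse=True)[:size]
        history.append(fitness_fn(population[0]))
    return population, history
```

Plotting `history` against the generation index would give a curve of the same kind as Fig. 2, although the shape obtained depends entirely on the fitness function and the assumed parameters.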

In Table III, we categorize the attack detection rate after 200 evolutionary steps of execution, based on attack type. As can be observed, Probing attacks were detected successfully 95% of the time, R2L attacks 97% of the time, and DoS attacks 99% of the time, while U2R attacks had the lowest detection rate of nearly 1%. The fewest rows in the dataset belong to the U2R category, and therefore the chromosomes modeled and evolved did not provide an accurate representation of such attacks.

TABLE III
ATTACK DETECTION RATE CATEGORIZED BASED ON ATTACK TYPE.

Attack Type  Detection Rate
Probing      95%
R2L          97%
U2R          1%
DoS          99%

A Receiver Operating Characteristic (ROC) curve provides insight into the relationship between the attack detection and false alarm rates, to statistically model the false positive and false negative rates of a given scheme. It is generated by identifying the detection rate for each false positive data point. We illustrate the effect of increasing attack detection on the false alarms generated by the scheme through the ROC curve given in Figure 4. As can be observed, there is a linear decrease in the false alarm rate as the attack detection rate of the scheme converges to 100%. Ideally, the larger the area under the ROC curve, the higher a scheme's accuracy in attack detection. The linearity depicted by this figure indicates that the proposed technique, based on evolutionary steps for modifying chromosomes that depict network traffic connections, is reasonably accurate and triggers fewer false alarms.

Table IV compares our results with those of other attack detection schemes.

TABLE IV
PERFORMANCE COMPARISON BETWEEN VARIOUS INTELLIGENT TECHNIQUES FOR INTRUSION DETECTION

Method     DR      FAR
PCC [15]   96.07%  2%
GMDH [1]   97.72%  2.25%
AODE [1]   99.54%  0.1%
NB [1]     88.55%  1.08%
ESOM [16]  99.56%  3.4%
MLP [17]   93%     3.5%

Considering that the detection rate of the proposed scheme is 95% and above for three of the four attack types of the KDD-99 dataset (Probing, R2L, and DoS), we can state that our approach is competitive with the best intrusion detection systems in the literature for those attack types.

Fig. 4. Receiver Operating Characteristic (ROC) Curve for the Detection Scheme, Evolution Steps = 200. (x-axis: False Alarm Rate, 0 to 100; y-axis: Attack Detection Rate, 0 to 100.)

V. CONCLUSIONS & FUTURE WORK

We have presented an intrusion detection technique based on the concept of natural evolution (i.e., the Genetic Algorithm). The proposed scheme pre-processes a large dataset, namely KDD-99, to ensure that all features of network traffic relevant for malicious attack detection are retained and redundancies are eliminated. The scheme operates on the pre-processed data by initializing a set of chromosomes (rules) constructed from a set of network traffic parameters (i.e., features). This type of chromosome formulation can also be termed offline training of the intrusion detection system. The algorithm is iterated based on the principles of natural evolution, with single-point crossover and single-feature bit-flip mutation executed on each chromosome during each evolution step. The results from our simulation analysis show a reasonably good attack detection rate after 200 generations. In addition, analysis of the ROC curve showed that the false alarm rate decreases with increasing attack detection rate. It is anticipated that the proposed scheme is capable of detecting new variants of known malicious attacks through rule crossover and mutation.

As part of our future work, we intend to analyze the effect of crossover on the generation of unknown attacks, on varying data sets including those acquired from honeynet traffic, wherein decoys are deployed to lure attackers and study their consequent behavior. In addition, we intend to modify the crossover and mutation strategies of the scheme, to study the corresponding effect on the attack detection and false alarm rates.

VI. ACKNOWLEDGEMENTS

The authors would like to acknowledge the support provided by the King AbdulAziz City for Science and Technology (KACST) through the Science and Technology Unit at King Fahd University of Petroleum and Minerals (KFUPM), for funding this work through Project No. 08-INF101-4, as part of the National Science, Technology, and Innovation Plan. The authors would also like to thank all Saudi Honeynet Project team members for their feedback, especially Khaled Salah, Marwan Abu-Amara, and Hakim Adiche.

REFERENCES

[1] Z. Baig, A. Shaheen, and R. Abdel-Aal, “An AODE-based intrusion detection system for computer networks,” in Proc. of the World Congress on Information Security (WorldCIS), 2011.
[2] D. Ashlock, Evolutionary Computation for Modeling and Optimization. Springer, 2006.
[3] S. M. Sait and H. Youssef, Iterative Computer Algorithms with Applications in Engineering - Solving Combinatorial Optimization Problems. IEEE Computer Society Press, 1999.
[4] N. Kayacik, G. Zincir-Heywood, and M. Heywood, “Selecting features for intrusion detection: A feature relevance analysis on KDD 1999 intrusion detection datasets,” in Proc. of the Third Annual Conf. on Privacy, Security and Trust, 2005.
[5] A. Chittur, “Model generation for an intrusion detection system using genetic algorithms,” Systems, 2001.
[6] Z. Pan, S. Chen, and G. Hu, “Hybrid neural network and C4.5 for misuse detection,” in Proc. of the 2nd Intl. Conf. on Machine Learning and Cybernetics, 2003, pp. 2–5.
[7] G. Folino, C. Pizzuti, and G. Spezzano, “An ensemble-based evolutionary framework for coping with distributed intrusion detection,” Genetic Programming and Evolvable Machines, vol. 11, no. 2, pp. 131–146, 2010.
[8] S. Zanero and G. Serazzi, “Unsupervised learning algorithms for intrusion detection,” in Proc. of the IEEE Network Operations and Management Symposium, 2008, pp. 1043–1048.
[9] Z. Bankovic, D. Stepanovic, S. Bojanic, and O. Nieto-Taladriz, “Improving network security using genetic algorithm approach,” Computers and Electrical Engineering, vol. 33, no. 5/6, pp. 438–451, 2007.
[10] Z. Bankovic, S. Bojanic, and O. Nieto-Taladriz, “Evaluating sequential combination of two genetic algorithm-based solutions for intrusion detection,” in Proc. of the Intl. Workshop on Computational Intelligence in Security for Information Systems, Advances in Soft Computing, 2009.
[11] W. Li, “Using genetic algorithm for network intrusion detection,” in Proc. of the United States Department of Energy Cyber Security Group 2004 Training Conference, 2004, pp. 24–27.
[12] O. Folorunso, O. O. Akande, A. O. Ogunde, and O. R. Vincent, “ID-SOMGA: A self organising migrating genetic algorithm-based solution for intrusion detection,” Computer and Information Science, vol. 3, no. 4, pp. 80–92, 2010.
[13] R. H. Gong, M. Zulkernine, and P. Abolmaesumi, “A software implementation of a genetic algorithm based approach to network intrusion detection,” in Proc. of the Sixth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International Workshop on Self-Assembling Wireless Networks. Washington, DC, USA: IEEE Computer Society, 2005, pp. 246–253. [Online]. Available: http://portal.acm.org/citation.cfm?id=1069813.1070290
[14] A. A. A. Islam, M. A. Azad, M. K. Alam, and M. S. Alam, “Security attack detection using genetic algorithm (GA) in policy based networks,” in Proc. of the Intl. Conf. on Information and Communication Technology (ICICT), 2007.
[15] M. Shyu, S. Chen, K. Sarinnapakorn, and L. Chang, “A novel anomaly detection scheme based on principal component classifier,” in Proc. of the IEEE Foundations and New Directions of Data Mining Workshop, 2003.
[16] A. Mitrokotsa and C. Douligeris, “Detecting denial of service attacks using emergent self-organizing maps,” in Proc. of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005, pp. 375–380.
[17] M. Sabhnani and G. Serpen, “Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context,” in Proc. of the Intl. Conf. on Machine Learning, Models, Technologies, and Applications, 2003.
