Critical study of neural networks in detecting intrusions Article in Computers & Security · October 2008 DOI: 10.1016/j.cose.2008.06.001 · Source: DBLP
computers & security 27 (2008) 168–175
available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/cose
Critical study of neural networks in detecting intrusions
Rachid Beghdad*
Faculty of Sciences, 12 Boulevard Bouaouina, Béjaïa 06000, Algeria
article info

Article history:
Received 22 September 2006
Accepted 5 June 2008

Keywords:
Intrusion detection systems
Neural networks
Misuse intrusion detection
KDD features
Attack categories

abstract

This paper presents a critical study of the use of several neural networks (NNs) to detect and classify intrusions. The aim of our research is to determine which NN classifies the attacks well and leads to the highest detection rate for each attack. The study focuses on two classification types of records: a single class (normal or attack), and a multiclass, where the category of attack is also detected by the NN. Five different types of NNs were tested: multilayer perceptron (MLP), generalized feed forward (GFF), radial basis function (RBF), self-organizing feature map (SOFM), and principal component analysis (PCA) NN. The NNs were trained on a manually chosen KDD data subset containing 18,285 records and then tested on the KDD testing set. Our simulations show that the GFF NN leads to the best confusion matrix in the multiclass case. For the same case, the RBF achieves the highest detection rate of the DoS attack category. In the single class case, the PCA NN achieves the highest detection rate.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction

An intrusion can be defined as a series of activities aiming at compromising the security of a computer network system (Innella and McMillan, 2001). Intrusions may take many forms: external attacks, internal misuses, network-based attacks, information gathering, denial of service, and so on. Intrusion detection is an important step in protecting a computer network system from intrusions. Intrusion detection systems (IDSs) are used to detect, identify and stop intruders. Administrators can rely on them to find out about successful attacks and to prevent future use of known exploits. IDSs are also considered a complementary solution to firewall technology, since they recognize attacks against the network that are missed by the firewall. There are two basic types of intrusion detection: host-based and network-based. Each has a distinct approach to monitoring and securing data, and each has distinct advantages and disadvantages. In short, host-based IDSs examine data held on individual computers that serve as hosts, while network-based IDSs examine data exchanged between computers.
In addition, intrusion detection techniques can be mapped into four classes: anomaly detection, misuse detection, specification-based detection, and model-based detection. Anomaly detection consists of establishing a normal behavior profile for user and system activity and observing significant deviations of actual user activity from the established habitual pattern. Misuse detection refers to intrusions that follow well-defined attack patterns that exploit weaknesses in system and application software. In specification-based detection, the correct behaviors of critical objects are manually abstracted and crafted as security specifications, which are compared with the actual behavior of the objects. Intrusions, which usually cause an object to behave in an incorrect manner, can be detected without exact knowledge about them. Model-based intrusion detection compares a process's execution against a program model to detect intrusion attempts.

Neural networks (NNs) have been identified since the beginning as a very promising technique for addressing the intrusion detection problem. Much research has been performed to this end, and the results have varied from inconclusive to extremely promising. The property of neural networks that initially made them attractive was their generalization ability, which makes them suitable for detecting day-0 attacks. In addition, neural networks also possess the ability to classify patterns, and this property can be used in other aspects of intrusion detection systems such as attack classification and alert validation. That is why an artificial neural network is so successful in detecting network intrusions; it is also capable of identifying new attacks.

We introduce here a misuse intrusion detection method based on NNs. Our research aims to detect and classify attacks in the single-class and multiclass cases, after training on 18,285 records of the KDD data set. To do so, MLP, GFF, RBF, SOFM, and PCA NNs are tested and compared in both the learning and the testing steps. The rest of this paper is organized as follows. Section 2 presents a brief survey of the NNs used in intrusion detection. All the tested NNs are defined in Section 3. Section 4 describes the evaluation data sets. Our experiments and their results are detailed in Section 5. Section 6 concludes the paper.

* Tel.: +213 34 22 33 14. E-mail address: [email protected]
0167-4048/$ – see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.cose.2008.06.001
2. Neural networks and intrusion detection

2.1. Neural networks

A neural network (NN) is an information processing system that is inspired by the way biological nervous systems, such as the brain, process information. It is composed of a large number of highly interconnected processing elements (PEs) working with each other to solve specific problems. Each processing element (or neuron) is basically a summing element followed by an activation function. The output of each PE (after applying the weight parameter associated with the connection) is fed as the input to all of the PEs in the next layer. The learning process is essentially an optimization process in which the best set of connection coefficients (weights) for solving a problem is found. It includes the following basic steps (Theodorios and Koutroumbas, 1999):
- Present the neural network with a number of inputs (vectors each representing a pattern).
- Check how closely the actual output generated for a specific input matches the desired output.
- Change the neural network parameters (weights) to better approximate the outputs.
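The three learning steps above can be sketched for a single processing element. This is only a minimal illustration, not the training procedure used in the paper: the logistic activation, the learning rate, the toy AND patterns, and the names `train_pe` and `predict` are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pe(patterns, targets, lr=1.0, epochs=10000, seed=0):
    """Train one processing element (weighted sum + activation) by gradient descent."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in patterns[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, desired in zip(patterns, targets):
            # Step 1: present an input pattern and compute the actual output.
            actual = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Step 2: check how far the actual output is from the desired one.
            error = actual - desired
            # Step 3: change the weights to reduce the squared error.
            delta = error * actual * (1.0 - actual)
            weights = [w - lr * delta * xi for w, xi in zip(weights, x)]
            bias -= lr * delta
    return weights, bias

def predict(weights, bias, x):
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5 else 0

# Toy patterns (logical AND), purely illustrative.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0.0, 0.0, 0.0, 1.0]
w, b = train_pe(X, y)
predictions = [predict(w, b, x) for x in X]
```

A full network repeats the same three steps for every layer, with the error propagated backwards from the output layer.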
2.2. Related research

One project following this approach resulted in a system called Hyperview (Debar et al., 1992). It is built on two components. The first is an ordinary expert system that monitors logs and searches for intrusions according to the defined policy; it is a signature-based IDS. The second is a neural network that observes the behavior of a user and raises an alarm if the observed behavior deviates from the expected one. This work shows how a neural network can be used in combination with an expert system to improve the quality of intrusion detection.

In Ghosh and Schwartzbard (1999), it is shown how neural networks can be employed for both anomaly and misuse detection. The work presents an application of a neural network that learns previous behavior, which can then be used to detect future intrusions against systems. Experimental results indicate that neural networks are "suited to perform intrusion state of art detection and can generalize from previously observed behavior", according to the authors.

Horeis (2003) concludes that the combination of RBF and SOM is convenient to use as an intrusion detection model, and that the "evaluation of human integration" is necessary to reduce the classification error. Experimental results are promising and show that RBF–SOM achieves results similar to, or even better than, RBF alone.

Lin et al. (1998) design an intrusion detection system, NNID (neural network intrusion detector), based on a neural network and the back-propagation algorithm. The experimental results show that NNID can identify users by which commands they use and how often.

In Charron et al. (1998), work on using neural networks to detect misuse of programs is described. The authors conclude that their work makes two contributions to the community: a demonstration of how misuse of programs can be detected with the help of neural networks, and a result showing "the benefit of applying anomaly detection to the process level such that an abnormal process behavior can be detected irrespective of individual users' behavior".

Heywood et al. (2002) describe an approach to dynamic intrusion detection using SOM. The authors estimate that "hierarchically built unsupervised neural network approach is able to produce encouraging results". Binh Viet (2002) presents a machine learning approach to the anomaly detection problem; according to the author, SOM is a powerful mechanism for modeling network traffic.

Ryan et al. (1997) described an offline anomaly detection system (NNID) that used a back-propagation MLP neural network. The MLP was trained to identify users' profiles, and at the end of each log session it evaluated the users' commands for possible intrusions (offline). The authors carried out their research on a small computer network with 10 users. Each feature vector described the connections of a single user during a whole day. The one hundred most important commands were used to describe a user's behavior. They used a three-layer MLP (two hidden layers). The MLP identified the user correctly in 22 cases out of 24.

Cannady (1998) used a three-layer neural network for offline classification of connection records into normal and misuse classes. The system designed in this study was intended to work as a standalone system (not as a preliminary classifier whose result may be used in a rule-based system). The feature vector was composed of nine features, all describing the current connection and the commands used in it. A data set of 10,000 connection records, including 1000 simulated attacks, was used. The training set included 30% of the data. The final result is a two-class classifier that correctly classified normal and attack records in 89–91% of the cases.
Fig. 1 – Neural network steps to detect and classify attacks. [Flowchart: the KDD data set (22 attack types + 1 normal connection type) and the KDD testing set (37 attack types + 1 normal connection type) pass through data codification, NN learning, and the NN model. When the attacks are merged in one or more classes, the test is possible and the output is <Attack, Normal> or <Normal, DoS, PRB, R2L, U2R>; with no attack classification, the test is not possible and the process ends.]
In yet another study (Mukkamala, 2002), the authors used three- and four-layer neural networks and reported about 99.25% correct classification for their two-class (normal and attack) problem. Cunningham and Lippmann (1999) used NNs in misuse detection: they used an MLP to detect Unix-host attacks by searching for attack-specific keywords in the network traffic. Other groups have used self-organizing maps (SOM) for intrusion detection (Lichodzijewski et al., 2002).
2.3. Critique

In most of the previous studies, the implemented systems were neural networks with two possible outputs: normal or anomaly. Some types of attacks and a set of normal records were included in the data set; however, the output of the neural network was 1 or 0 for normal or attack conditions (the attack type was not determined by the neural network). On the one hand, the behavior of some of the studied NNs in the learning step is not described, so we do not know exactly which attacks are learned and what the classification rate of each attack is. On the other hand, the attack class cannot be deduced. That is why the present study aims to analyze the performance of five NNs on the single-class and multiclass problems, in which not only are the attack records distinguished from normal ones, but the attack type is also identified. It will also allow us to identify the attacks that are learned by each NN.

Table 1 – Distribution of normal and attack connections (records) in our data set

| Connection type | KDD data set (records) | KDD data set (%) | Our data set (records) | Our data set (%) |
|---|---|---|---|---|
| Normal | 97,278 | 19.69 | 3000 | 16.4 |
| DoS | 391,458 | 79.24 | 10,000 | 54.68 |
| PRB | 4107 | 0.83 | 4107 | 22.46 |
| R2L | 1126 | 0.22 | 1126 | 6.15 |
| U2R | 52 | 0.01 | 52 | 0.28 |
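The shares in Table 1 follow directly from the record counts. The short check below recomputes them; the variable and function names are illustrative only, and the recomputed percentages agree with the table to within about 0.01 percentage point (the table appears to mix rounding and truncation).

```python
# Record counts taken from Table 1.
kdd_counts = {"Normal": 97278, "DoS": 391458, "PRB": 4107, "R2L": 1126, "U2R": 52}
subset_counts = {"Normal": 3000, "DoS": 10000, "PRB": 4107, "R2L": 1126, "U2R": 52}

def shares(counts):
    """Percentage of the total represented by each connection type."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}

kdd_total = sum(kdd_counts.values())
subset_total = sum(subset_counts.values())
subset_shares = shares(subset_counts)

for name, share in subset_shares.items():
    print(f"{name}: {share:.2f}%")
```

Keeping every PRB, R2L, and U2R record while sampling the Normal and DoS records raises the share of the three rare categories from about 1% of the KDD data set to about 29% of the subset.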
3. The studied neural networks

3.1. Multilayer perceptron

Multilayer perceptrons (MLPs) are layered feed forward networks typically trained with static back propagation. These networks have found their way into countless applications requiring static pattern classification. Their main advantages are that they are easy to use and that they can approximate any input/output map. Their key disadvantages are that they train slowly and require a lot of training data (typically three times more training samples than network weights).
Table 2 – Attacks contained in our data set

| DoS | PRB | R2L | U2R |
|---|---|---|---|
| smurf = 4354 | ipsweep = 1247 | ftp_write = 8 | buffer_overflow = 30 |
| back = 2203 | nmap = 231 | guess_passwd = 53 | loadmodule = 9 |
| land = 21 | satan = 1589 | imap = 12 | perl = 3 |
| neptune = 2177 | portsweep = 1040 | multihop = 7 | rootkit = 10 |
| pod = 266 | | spy = 2 | |
| teardrop = 979 | | phf = 4 | |
| | | warezclient = 1020 | |
| | | warezmaster = 20 | |

Table 4 – Classification rate (%) of each attack

| Attack category | Attack type | MLP | GFF | RBF/GR/P | SOFM | PCA |
|---|---|---|---|---|---|---|
| U2R | buffer_overflow | 0 | 53.33 | 0 | 0 | 0 |
| U2R | rootkit | 0 | 0 | 0 | 0 | 0 |
| U2R | loadmodule | 0 | 11.11 | 0 | 0 | 0 |
| U2R | perl | 0 | 0 | 0 | 0 | 0 |
| Normal | normal | 97.93 | 98.43 | 97.8 | 62.67 | 0 |
| DoS | smurf | 99.95 | 99.97 | 100 | 99.19 | 100 |
| DoS | neptune | 100 | 100 | 99.72 | 99.67 | 0 |
| DoS | back | 0 | 99.77 | 0 | 94.68 | 0 |
| DoS | teardrop | 100 | 99.89 | 0 | 93.05 | 0 |
| DoS | pod | 0 | 0 | 0 | 0 | 0 |
| DoS | land | 0 | 0 | 0 | 0 | 0 |
| R2L | warezclient | 55.39 | 0 | 0 | 93.62 | 0 |
| R2L | warezmaster | 0 | 0 | 0 | 0 | 0 |
| R2L | spy | 0 | 0 | 0 | 0 | 0 |
| R2L | guess_passwd | 0 | 0 | 0 | 0 | 0 |
| R2L | ftp_write | 0 | 0 | 0 | 0 | 0 |
| R2L | imap | 0 | 0 | 0 | 0 | 0 |
| R2L | multihop | 0 | 0 | 0 | 0 | 0 |
| R2L | phf | 0 | 0 | 0 | 0 | 0 |
| PRB | ipsweep | 0.24 | 99.59 | 0 | 98.87 | 0 |
| PRB | satan | 99.37 | 88.73 | 0 | 97.86 | 0 |
| PRB | portsweep | 98.84 | 99.03 | 0 | 93.94 | 0 |
| PRB | nmap | 0 | 53.67 | 0 | 0 | 0 |

3.2. Generalized feed forward

Generalized feed forward (GFF) networks are a generalization of the MLP such that connections can jump over one or more layers. In theory, an MLP can solve any problem that a generalized feed forward network can solve. In practice, however, generalized feed forward networks often solve the problem much more efficiently. A classic example of this is the two spiral problem. Without describing the problem, it suffices to say that a standard MLP requires hundreds of times more training epochs than the generalized feed forward network containing the same number of processing elements.
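The idea of a connection that jumps over a layer, as described for GFF networks in Section 3.2, can be illustrated with a forward pass in which the output layer receives both the hidden activations and the raw inputs. This is a hedged sketch: the tanh activation, the single skipped layer, and the names `dense` and `gff_forward` are assumptions for the example, not the NeuroSolutions implementation used in the paper.

```python
import math

def dense(weights, biases, inputs):
    """One weighted sum per row of `weights`, plus the matching bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def gff_forward(x, hidden_w, hidden_b, out_w_hidden, out_w_skip, out_b):
    """Forward pass with a skip connection from the inputs to the output layer."""
    hidden = [math.tanh(z) for z in dense(hidden_w, hidden_b, x)]
    from_hidden = dense(out_w_hidden, out_b, hidden)
    # Skip connection: the inputs jump over the hidden layer.
    from_input = dense(out_w_skip, [0.0] * len(out_b), x)
    return [h + s for h, s in zip(from_hidden, from_input)]

x = [1.0, 2.0]
hidden_w, hidden_b = [[0.0, 0.0]], [0.0]      # hidden unit outputs tanh(0) = 0
out_w_hidden, out_b = [[1.0]], [0.1]

with_skip = gff_forward(x, hidden_w, hidden_b, out_w_hidden, [[0.5, 0.25]], out_b)
without_skip = gff_forward(x, hidden_w, hidden_b, out_w_hidden, [[0.0, 0.0]], out_b)
```

Setting the skip weights to zero, as in `without_skip`, recovers an ordinary MLP, which is the sense in which the GFF network generalizes it.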
3.3. Radial basis function/generalized regression/probabilistic (RBF/GR/P)

Radial basis function (RBF) networks are nonlinear hybrid networks typically containing a single hidden layer of processing elements (PEs). This layer uses Gaussian transfer functions, rather than the standard sigmoidal functions employed by MLPs. The centers and widths of the Gaussians are set by unsupervised learning rules, and supervised learning is applied to the output layer. These networks tend to learn much faster than MLPs. If a "generalized regression (GR)/probabilistic (P)" net is chosen, all the weights of the network can be calculated analytically. In this case, the number of cluster centers is by definition equal to the number of exemplars, and they are all set to the same variance.
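A hedged sketch of the two variants described above: a Gaussian hidden layer followed by a linear output layer, and a generalized regression net with one center per exemplar and a single shared width. The kernel-weighted average used for the latter is the usual formulation of a generalized regression network and is assumed here; the paper does not state which formula its software applies. All names and values are illustrative.

```python
import math

def gaussian(x, center, width):
    """Gaussian transfer function of the distance between x and a center."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist2 / (2.0 * width ** 2))

def rbf_forward(x, centers, width, out_weights, out_bias):
    """Gaussian hidden layer followed by a linear (supervised) output layer."""
    hidden = [gaussian(x, c, width) for c in centers]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias

def grnn_predict(x, exemplars, targets, width):
    """Generalized regression: one center per exemplar, no iterative training."""
    kernels = [gaussian(x, e, width) for e in exemplars]
    return sum(k * t for k, t in zip(kernels, targets)) / sum(kernels)

centers = [[0.0, 0.0], [1.0, 1.0]]
rbf_out = rbf_forward([0.0, 0.0], centers, 0.5, [1.0, -1.0], 0.0)
grnn_out = grnn_predict([0.0, 0.0], centers, [0.0, 1.0], 0.1)
```

In the generalized regression case nothing is fitted iteratively: the exemplars themselves act as centers, which is why the text says the weights can be calculated analytically.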
Table 3 – Distribution of normal and attack connections (records) in KDD testing set

| Connection type | KDD testing set (records) | KDD testing set (%) |
|---|---|---|
| Normal | 60,593 | 19.48 |
| DoS | 229,853 | 73.90 |
| PRB | 4166 | 1.34 |
| R2L | 16,189 | 5.2 |
| U2R | 228 | 0.07 |

3.4. Self-organizing feature map

Self-organizing feature maps (SOFMs) transform the input of arbitrary dimension into a one or two dimensional discrete map subject to a topological (neighborhood preserving) constraint. The feature maps are computed using Kohonen unsupervised learning. The output of the SOFM can be used as input to a supervised classification neural network such as the MLP. This network's key advantage is the clustering produced by the SOFM, which reduces the input space into representative features using a self-organizing process. Hence the underlying structure of the input space is kept, while the dimensionality of the space is reduced.
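One step of Kohonen unsupervised learning on a one-dimensional map can be sketched as follows. The Gaussian neighborhood function, the learning rate, and the three-unit map are assumptions chosen for illustration, and `kohonen_step` is a hypothetical helper name.

```python
import math

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def kohonen_step(weights, x, lr, radius):
    """Move the winning unit and its map neighbors toward the input x."""
    bmu = best_matching_unit(weights, x)
    updated = []
    for j, w in enumerate(weights):
        # Units close to the winner on the map are pulled more strongly.
        influence = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
        updated.append([wi + lr * influence * (xi - wi) for wi, xi in zip(w, x)])
    return updated, bmu

som = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
som_after, winner = kohonen_step(som, [0.9, 1.0], lr=0.5, radius=1.0)
```

Because neighbors on the map are updated together, nearby units end up representing nearby inputs, which is the neighborhood-preserving constraint mentioned above.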
3.5. Principal component analysis

Principal component analysis networks (PCAs) combine unsupervised and supervised learning in the same topology. Principal component analysis is an unsupervised linear procedure that finds a set of uncorrelated features, the principal components, from the input. An MLP is then trained under supervision to perform the nonlinear classification from these components.
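The unsupervised stage can be illustrated by extracting the first principal component of a small data set. The sketch below uses power iteration on the covariance matrix, which is a choice made for this example and not necessarily the learning rule used by the software in the paper; the experiments keep eight components of the 41 features, whereas this toy example extracts a single one from two features.

```python
import math

def first_principal_component(rows, iterations=100):
    """Direction of largest variance and the projection of each row onto it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance matrix and renormalize.
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Toy data lying exactly on the line y = 2x.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_principal_component(data)
```

The `scores` (one value per record here, eight in the paper's setting) are what the supervised MLP stage would receive as input in place of the original features.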
4. Evaluation data sets

The 1999 version of the MIT Lincoln Laboratory – DARPA (Defense Advanced Research Projects Agency) intrusion detection evaluation data was used in this research (KDD data set, 1999). The sample version of the data set includes 494,021 connection records. In this version, there are four known categories of computer attacks: Denial of Service (DoS) attacks, User to Root (U2R) attacks, Remote to Local (R2L) attacks, and probing (PRB) attacks. These four categories contain 22 training attack types, alongside one normal connection category (KDD data set, 1999). In addition, the KDD testing set contains 311,029 records belonging to 37 attack types and one normal connection category, so this set contains 15 more attack types than the KDD data set. In the DARPA data set, each event (connection) is described with 41 features. Twenty-two of these features describe the connection itself and 19 describe the properties of connections to the same host in the last two seconds.

Using NNs to detect attacks leads, in our study, to training on a portion of the KDD data set and testing the result on the KDD testing set, which contains some new attacks. This test is possible if and only if the data set used is classified. In fact, as mentioned previously, the KDD data set contains fewer attack types than the testing set. If the attacks are merged into one or more classes, then testing is possible, as shown in Fig. 1; otherwise it is not.

A subset of the data that contained the desired attack types and a reasonable number of normal events was selected manually. The final data set used in this study included 18,285 records. The PRB, R2L, and U2R attack classes were selected in full because of their low proportion in the KDD data set. Three thousand normal connections (records) and 10,000 DoS connections were randomly selected. Table 1 shows detailed information about the number of records included in the considered data set. The attack types contained in our data set are detailed in Table 2.

For the testing step, the KDD testing set was used (Table 3). The KDD testing set contains 311,029 records, some of which describe 15 new attack types (KDD data set, 1999).

Table 5 – Attack detection rates (%)

| Category | Attack type | MLP | GFF | RBF/GR/P | SOFM | PCA |
|---|---|---|---|---|---|---|
| U2R | buffer_overflow | 0 | 0 | 0 | 0 | 0.001 |
| U2R | rootkit | 0 | 0 | 0 | 0 | 0 |
| U2R | loadmodule | 0.0003 | 0 | 0 | 0 | 0 |
| U2R | perl | 0 | 0 | 0 | 0 | 0 |
| Normal | normal | 16.9 | 16.96 | 37.93 | 8.78 | 0 |
| DoS | smurf | 53.3 | 53.02 | 53.68 | 42.67 | 99.99 |
| DoS | neptune | 5.61 | 5.67 | 8.38 | 5.51 | 0 |
| DoS | back | 0.008 | 8.61 | 0 | 7.92 | 0 |
| DoS | teardrop | 8.07 | 0.05 | 0 | 0.37 | 0 |
| DoS | pod | 0 | 0 | 0 | 0 | 0 |
| DoS | land | 0.001 | 0 | 0 | 0 | 0 |
| R2L | warezclient | 0.41 | 0 | 0 | 0.15 | 0 |
| R2L | warezmaster | 0 | 0.0006 | 0 | 0 | 0 |
| R2L | spy | 0 | 0.0003 | 0 | 0 | 0 |
| R2L | guess_passwd | 0 | 0.02 | 0 | 0 | 0 |
| R2L | ftp_write | 0 | 0 | 0 | 0 | 0 |
| R2L | imap | 0 | 0 | 0 | 0 | 0 |
| R2L | multihop | 0 | 0.0003 | 0 | 0 | 0 |
| R2L | phf | 0 | 0 | 0 | 0 | 0 |
| PRB | ipsweep | 0.03 | 0.63 | 0 | 0.64 | 0 |
| PRB | satan | 1.49 | 6.60 | 0 | 33.76 | 0 |
| PRB | portsweep | 14.15 | 7.46 | 0 | 0.18 | 0 |
| PRB | nmap | 0 | 0.75 | 0 | 0 | 0 |

Table 6 – The GFF confusion matrix

| Class | Normal | U2R | DoS | R2L | PRB | DR (%) |
|---|---|---|---|---|---|---|
| Normal | 61.87 | 0 | 35.5 | 0.97 | 1.67 | 10.19 |
| U2R | 0 | 0 | 34.61 | 65.38 | 0 | 0 |
| DoS | 0 | 0 | 98.11 | 0.07 | 1.82 | 74.83 |
| R2L | 0 | 0 | 7.9 | 91.56 | 0.53 | 0.34 |
| PRB | 0.05 | 0 | 1.02 | 0 | 98.93 | 15.02 |

PCC: 91.66%

Table 7 – The RBF/GR/P confusion matrix

| Class | Normal | U2R | DoS | R2L | PRB | DR (%) |
|---|---|---|---|---|---|---|
| Normal | 67.33 | 0 | 32.66 | 0 | 0 | 7.86 |
| U2R | 0 | 0 | 100 | 0 | 0 | 0 |
| DoS | 6.88 | 0 | 93.06 | 0 | 0.05 | 76.67 |
| R2L | 0 | 0 | 99.91 | 0 | 0.08 | 0 |
| PRB | 0 | 0 | 44.07 | 0 | 55.9 | 15.45 |

PCC: 74.49%

Table 8 – MLP confusion matrix

| Class | Normal | U2R | DoS | R2L | PRB | DR (%) |
|---|---|---|---|---|---|---|
| Normal | 96 | 0 | 2.2 | 0 | 1.8 | 16.18 |
| U2R | 17.3 | 0 | 65.38 | 0 | 17.31 | 0 |
| DoS | 0.07 | 0 | 99.49 | 0 | 0.44 | 70.21 |
| R2L | 53.19 | 0 | 32.77 | 0.09 | 13.94 | 0 |
| PRB | 0.02 | 0 | 0.36 | 0 | 99.61 | 13.6 |

PCC: 92.54%

5. Experiments

NeuroSolutions (Lefebvre and Principe, 1994–2005) software was used for the implementation of all the studied NNs, on a Pentium 4 (2.88 GHz) with 512 MB of memory. This section presents three experiments and discusses the results according to the following performance measures.

To measure the performance of an intrusion detection system, two rates are used: the true positive rate (detection rate), which depends on the threshold value of the neural network, and the false positive rate (false alarm rate). The system performs best when the detection rate is high and the false alarm rate is low. A good detection system must strike a compromise between the two.

A confusion matrix (CM) is defined by associating classes as labels for the rows and columns of a square matrix: in the KDD data set, there are five classes, {Normal, PRB, DoS, U2R, R2L}, and therefore the matrix has dimensions 5 × 5. An entry at row i and column j, CM(i, j), represents the patterns that originally belong to class i and were identified as members of class j; the off-diagonal entries (i ≠ j) are therefore the misclassified patterns. The percent of correct classification (PCC) is used to evaluate the classification efficiency on the instances belonging to our data set.

5.1. First experiment

At first, we tried to implement an intrusion classification system that assigns each intrusion to one of the learned attacks (KDD data set, 1999), but the results show that a poor classification rate is obtained in this case. In addition, none of the new attacks is detected. This can be explained by the fact that the power of the neural network approach lies in its ability to discriminate normal behavior from intrusive behavior, whereas discriminating between attacks remains a hard task and gives limited performance. Moreover, as mentioned in the previous section, the KDD data set contains fewer attack types than the testing set. Nevertheless, this experiment allowed us to analyze the behavior of each NN in the training step. This first intrusion detector was a four-layer NN: an input layer of 41 neurons, three hidden layers containing, respectively, 28, 23, and 23 neurons, and an output layer of 23 neurons for 22 attack types and 1 normal connection. This structure is referred to as [41; 28; 23; 23; 23]. The mean square error (MSE) in the training step is 0.001. For the PCA NN, eight components were used for the unsupervised learning step.
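The size of the [41; 28; 23; 23; 23] structure can be related to the rule of thumb quoted in Section 3.1 (about three training samples per network weight). The count below assumes fully connected consecutive layers with one bias per non-input neuron; the paper does not state whether biases are used, and the count applies to the plain MLP only. `count_weights` is an illustrative helper.

```python
def count_weights(layer_sizes, with_bias=True):
    """Number of trainable parameters in a fully connected layered network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out            # connection weights between two layers
        if with_bias:
            total += n_out               # one bias per receiving neuron (assumed)
    return total

structure = [41, 28, 23, 23, 23]
n_connections = count_weights(structure, with_bias=False)
n_weights = count_weights(structure)
training_records = 18285
samples_per_weight = training_records / n_weights
```

Under these assumptions the network has 2850 connection weights (2947 parameters with biases), so the 18,285 training records give roughly six samples per weight, above the factor of three mentioned for MLPs.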
5.1.1. Results

In this experiment there are 23 outputs, and therefore the dimension of the confusion matrices is 23 × 23. That is why we present here only the classification rate (CR) of each attack for each NN, as detailed in Table 4.

According to Table 4, some attacks are not classified at all by the five studied NNs, and some are classified only by some NNs. The GFF and SOFM NNs classify the largest number of attacks, highly classifying up to eight of them. The smurf attack is highly classified by all the NNs, and the neptune attack by all of them except the PCA NN. The loadmodule and nmap attacks are classified, even if poorly, only by the GFF NN. The warezclient attack is classified only by the MLP and SOFM NNs. The PCA NN classifies only the smurf attack. The buffer_overflow attack is classified only by the GFF. Only one attack belonging to the R2L category was classified. U2R attacks are not classified at all by the MLP, RBF/GR/P, SOFM, and PCA NNs. The ipsweep attack is poorly classified by the MLP, and highly classified by both the GFF and SOFM NNs. To conclude, only DoS attacks are highly classified by all the NNs.

After the testing step, all the studied NNs detect some of the classified attacks, with different detection rates. All the results of this step are summarized in Table 5. Unfortunately, none of the new attacks appears there: on the one hand, the considered NN structure has 23 outputs for 22 attacks and one normal connection; on the other hand, there are 15 new attacks in the KDD testing set. According to Table 5, only the smurf detection rate (99.99%) of the PCA NN is significant. All the remaining attack detection rates are either poor or very poor. This can be explained by the fact that the NNs learned all the KDD features, and some of these features are redundant and not useful in the NN training step. The NN therefore learns many useless features, which not only contribute little (if anything) to the detection process, but also prevent the NN from learning other useful features.

The MLP, GFF, RBF, SOFM, and PCA NNs were trained in, respectively, 33 min and 11 s, 52 min and 30 s, 1 h 7 min and 37 s, 33 min and 11 s, and 25 min and 7 s.

Table 9 – SOFM confusion matrix

| Class | Normal | U2R | DoS | R2L | PRB | DR (%) |
|---|---|---|---|---|---|---|
| Normal | 56.83 | 0 | 40.1 | 0.37 | 2.7 | 9.92 |
| U2R | 0 | 0 | 9.61 | 19.23 | 71.15 | 0 |
| DoS | 2.39 | 0 | 94.07 | 0.02 | 3.58 | 73.53 |
| R2L | 0 | 0 | 3.81 | 82.68 | 13.49 | 0.15 |
| PRB | 0 | 0 | 0.46 | 0.58 | 98.95 | 16.38 |

PCC: 88.08%
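The detection rate and false alarm rate introduced at the start of Section 5 depend on where the network output is thresholded. The sketch below uses the standard definitions (true positive rate and false positive rate); the paper does not spell out its formulas, so these definitions, the toy outputs, and the function names are assumptions. Labels follow the coding used for the single-output network: 1 for attack, 0 for normal.

```python
def threshold(outputs, cutoff=0.5):
    """Turn continuous network outputs into attack (1) / normal (0) decisions."""
    return [1 if o >= cutoff else 0 for o in outputs]

def detection_and_false_alarm(labels, predictions):
    """Detection rate and false alarm rate, both in percent."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # attacks flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # normal flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # normal passed
    return 100.0 * tp / (tp + fn), 100.0 * fp / (fp + tn)

labels = [1, 1, 1, 0, 0, 0]
outputs = [0.9, 0.8, 0.3, 0.6, 0.1, 0.2]

dr_mid, far_mid = detection_and_false_alarm(labels, threshold(outputs, 0.5))
dr_low, far_low = detection_and_false_alarm(labels, threshold(outputs, 0.25))
dr_high, far_high = detection_and_false_alarm(labels, threshold(outputs, 0.7))
```

On this toy data a cutoff of 0.25 detects every attack but still flags one normal record, while a cutoff of 0.7 removes the false alarm at the cost of a missed attack, which is the compromise the text refers to.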
5.2. Second experiment

This second intrusion detector was another four-layer NN: an input layer of 41 neurons, three hidden layers containing, respectively, 39, 19, and 13 neurons, and an output layer of five neurons for four attack categories and one normal connection. This structure is referred to as [41; 39; 19; 13; 5]. The mean square error (MSE) in the training step is 0.001. For the PCA NN, eight components were used for the unsupervised learning step.

5.2.1. Results

The primary results of this second experiment are a set of five confusion matrices of the five connection categories (four attack categories and one normal category), corresponding to the five studied NNs. The detection rate (DR) column and the PCC are added to these matrices.

From Tables 6–10, we can deduce that even though the MLP NN has the highest PCC, the GFF confusion matrix seems to be the best one. In fact, only the GFF NN highly classifies (CR > 90%) three attack categories: DoS, PRB, and R2L. The PCC of the GFF is slightly lower than that of the MLP. The U2R attack category is not classified at all (0%) by any of the NNs. The R2L attack category is classified significantly only by the GFF and SOFM NNs. In addition, the MLP NN reaches the highest CR of the normal connection (96%).

For detecting attacks, the RBF/GR/P NN leads to the highest DR of the DoS category (76.67%). All the other attacks are poorly detected by all the NNs. Even though the RBF/GR/P DoS detection rate is the highest, it is not a high detection rate (DR < 90%). This may be due to the same reasons cited for the first experiment (see Section 5.1.1).

The learning times of the GFF, RBF, MLP, SOFM, and PCA NNs are, respectively, 1 h and 6 min, 1 h and 7 min, 42 min and 4 s, 25 min and 25 s, and 20 min and 5 s.

Table 10 – PCA confusion matrix

| Class | Normal | U2R | DoS | R2L | PRB | DR (%) |
|---|---|---|---|---|---|---|
| Normal | 90.6 | 0 | 7.53 | 0 | 1.87 | 15.37 |
| U2R | 75 | 0 | 5.77 | 0 | 19.23 | 0 |
| DoS | 18.56 | 0 | 77.84 | 0 | 3.59 | 69.58 |
| R2L | 74.06 | 0 | 15.63 | 0 | 10.3 | 0 |
| PRB | 0.53 | 0 | 1.7 | 0 | 97.76 | 15.03 |

PCC: 79.39%

Table 11 – Comparison between the studied NNs in the normal/attack behavior case

| Neural network | Learning time | Attack CR (%) | Normal CR (%) | Detection rate (%) | False alarm (%) | PCC (%) |
|---|---|---|---|---|---|---|
| MLP | 26 min 56 s | 99.93 | 97.3 | 84.25 | 15.74 | 99.41 |
| GFF | 39 min 7 s | 99.88 | 97.16 | 83.99 | 16 | 99.34 |
| PCA | 20 min 7 s | 97.07 | 54.1 | 93.83 | 6.16 | 88.6 |
| SOFM | 28 min 21 s | 98.48 | 61.57 | 91.28 | 8.71 | 91.21 |
| RBF/GR/P | 1 h 14 min 35 s | 95.56 | 65.97 | 90.11 | 9.88 | 89.73 |
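The PCC reported with each confusion matrix in Tables 6–10 can be cross-checked against the class sizes of Table 1. The check below treats PCC as the average of the diagonal entries (per-class classification rates) weighted by the number of records of each class in our data set. This reading is an inference from the published numbers, not a formula stated in the paper, and the dictionary names are illustrative.

```python
# Class sizes of the 18,285-record data set (Table 1).
class_counts = {"Normal": 3000, "DoS": 10000, "PRB": 4107, "R2L": 1126, "U2R": 52}

# Diagonal entries (%) of the confusion matrices in Tables 6-10.
diagonals = {
    "GFF":      {"Normal": 61.87, "U2R": 0.0, "DoS": 98.11, "R2L": 91.56, "PRB": 98.93},
    "RBF/GR/P": {"Normal": 67.33, "U2R": 0.0, "DoS": 93.06, "R2L": 0.0,   "PRB": 55.9},
    "MLP":      {"Normal": 96.0,  "U2R": 0.0, "DoS": 99.49, "R2L": 0.09,  "PRB": 99.61},
    "SOFM":     {"Normal": 56.83, "U2R": 0.0, "DoS": 94.07, "R2L": 82.68, "PRB": 98.95},
    "PCA":      {"Normal": 90.6,  "U2R": 0.0, "DoS": 77.84, "R2L": 0.0,   "PRB": 97.76},
}

# PCC values as reported under each table.
reported_pcc = {"GFF": 91.66, "RBF/GR/P": 74.49, "MLP": 92.54, "SOFM": 88.08, "PCA": 79.39}

def weighted_pcc(rates, counts):
    """Average of per-class rates weighted by the number of records per class."""
    total = sum(counts.values())
    return sum(rates[name] * counts[name] for name in counts) / total

computed_pcc = {nn: weighted_pcc(rates, class_counts) for nn, rates in diagonals.items()}
```

For all five networks the recomputed value agrees with the reported PCC to within 0.01, which also shows why the MLP can have the highest PCC while almost never classifying R2L correctly: R2L and U2R together account for under 7% of the records.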
5.3. Third experiment

This third intrusion detector was a four-layer NN: an input layer of 41 neurons, three hidden layers composed, respectively, of 43, 21, and 14 neurons, and an output layer of one neuron for an "intrusion" or "normal" connection. During the learning phase, the output value is set to "0" for normal records and to "1" for attack records. This structure is referred to as [41; 43; 21; 14; 1]. The mean square error (MSE) in the training step is 0.001. For the PCA NN, eight components were used for the unsupervised learning step. The results of this experiment are detailed in Table 11.

According to Table 11, the PCA NN achieves the highest detection rate (93.83%) and the lowest false alarm rate (6.16%), in a minimum learning time (20 min and 21 s). This result may be due to the fact that the PCA NN detects only the smurf attack, and detects it highly (see Table 5). In addition, the highest classification rates of the normal and attack patterns (97.3% and 99.93%, respectively) are reached by the MLP NN. These results are competitive with those found in some previous works (Ben Amor et al., 2004; Faraoun and Boukelif, 2006), as shown in Table 12.

6. Conclusion

A critical study of five NNs used to classify normal and attack patterns, and the category of attack, has been presented. The main contribution of our approach is to analyze the performance of the MLP, GFF, RBF, SOFM, and PCA NNs while detecting attacks and classifying them in one or more classes. Three "four-layer" NN structures were used in our tests for the classification of records.

Firstly, a [41; 28; 23; 23; 23] NN structure was used. This experiment shows that:
- some tested NNs are able to highly classify only some attacks;
- after the testing step, all the studied NNs detect some of the classified attacks, with different detection rates;
- the maximum number of classified attacks is reached by both the GFF and SOFM NNs (eight attacks are highly classified);
- only the smurf attack is highly detected (99.99%), by the PCA NN;
- unfortunately, none of the new attacks is detected with this NN structure, because there are more attack types in the testing set (37) than in the data subset (the 23 neurons of the output layer correspond to only 22 attacks and one normal connection).

Secondly, a [41; 39; 19; 13; 5] NN structure was used. The results were five confusion matrices of the five connection categories (four attack categories and one normal category), corresponding to the five studied NNs:
- the GFF NN is the best NN in the classification phase, with a high classification rate of DoS, R2L, and PRB attacks;
- the RBF/GR/P NN is the best NN for detecting attacks: it detects DoS attacks with a 76.67% detection rate, although this DR is still poor (DR < 90%);
- except for the DoS attacks, all the other attacks are very poorly detected by all the NNs.

Finally, a [41; 43; 21; 14; 1] NN structure was used. This experiment shows that the PCA NN achieves the highest detection rate (93.83%) and the lowest false alarm rate (6.16%), in a minimum learning time (20 min and 21 s).

Table 12 – Comparison between some previous works and our approach

| Performance parameter | Naive Bayes (Ben Amor et al., 2004) | Classification tree (Ben Amor et al., 2004) | K-means NN (Faraoun and Boukelif, 2006) | Our approach |
|---|---|---|---|---|
| Normal CR | 98.59% | 99.39% | – | 97.3% (MLP) |
| Attack CR | 89.72% | 91.47% | – | 99.93% (MLP) |
| PCC | 91.45% | 93.02% | – | 99.41% (MLP) |
| Detection rate | – | – | 92% | 93.83% (PCA) |
| False alarm | – | – | 6.21% | 6.16% (PCA) |
| Learning time | – | – | 28 min 21 s | 26 min 56 s (PCA) |

As future work, we will focus on the study of another set of NNs to detect and classify attacks, and on how to improve the detection rate of U2R, PRB, and R2L attacks using NNs.
references

Ben Amor N, et al. Naïve Bayes vs decision trees in intrusion detection systems. In: The proceeding of the ACM symposium on applied computing, Cyprus; 2004. p. 420–4.
Binh Viet N. Self organizing map (SOM) for anomaly detection; 2002 [accessed August 2006].
Cannady J. Artificial neural networks for misuse detection. In: The proceedings of the 1998 national information systems security conference (NISSC'98), Arlington, VA; 1998 [accessed August 2006].
Charron F, Ghosh A, Wanken J. Detecting anomalous and unknown intrusions against programs. In: 14th Annual Computer Security Applications Conference; 1998. p. 259–67 [accessed August 2006].
Cunningham R, Lippmann R. Improving intrusion detection performance using keyword selection and neural networks. In: The proceedings of the international symposium on recent advances in intrusion detection, Purdue, IN; 1999 [accessed August 2006].
Debar H, Becker M, Siboni D. A neural network component for an intrusion detection system. In: The proceedings of the 1992 IEEE symposium on research in computer security and privacy, Oakland, CA; May 1992. p. 240, 250.
Faraoun KM, Boukelif A. Neural networks learning improvement using the K-means clustering algorithm to detect network intrusions. International Journal of Computational Intelligence 2006;3(2):161–8.
Ghosh AK, Schwartzbard A. A study in using neural networks for anomaly and misuse detection. In: The proceeding on the 8th USENIX security symposium; 1999 [accessed August 2006].
Heywood M, Lichodzijewski P, Zincir-Heywood N. Dynamic intrusion detection using self-organizing maps. In: The annual Canadian information technology security symposium; 2002 [accessed August 2006].
Horeis T. Intrusion detection with neural networks-combination of self-organizing maps and radial basis function networks for human expert integration. Computational Intelligence Society, Research report; 2003 [accessed August 2006].
Innella P, McMillan O. An introduction to intrusion detection systems. Tetrad Digital Integrity, LLC; 2001 [accessed August 2006].
KDD data set; 1999 [accessed July 2006].
Lefebvre Curt, Principe Jose. NeuroSolutions, version 5.03. Copyright ©. NeuroDimension Inc., www.nd.com; 1994–2005.
Lichodzijewski P, Zincir Heywood AN, Heywood MI. Host-based intrusion detection using self-organizing maps. In: The proceedings of the 2002 IEEE world congress on computational intelligence, Honolulu, HI; 2002. p. 1714–9.
Lin M, Miikkulainen R, Ryan J. Intrusion detection with neural networks. Advances in Neural Information Processing Systems 1998:943–9.
Mukkamala S. Intrusion detection using neural networks and support vector machine. In: The proceedings of the 2002 IEEE international joint conference on neural networks, Honolulu, HI; 2002 [accessed August 2006].
Ryan J, Lin M, Miikkulainen R. Intrusion detection with neural networks. In: AI approaches to fraud detection and risk management: papers from the 1997 AAAI workshop, Providence, RI; 1997. p. 72–9.
Theodorios S, Koutroumbas K. Pattern recognition. Cambridge: Academic Press; 1999.
Rachid Beghdad received his computer science engineer degree in 1991 from the Polytechnical school of engineers, Algiers, Algeria. He received his Master computer science degree from Clermont-Ferrand University, France in 1994. He earned his Ph.D. computer science degree from Toulouse University, France in 1997. He is a reviewer for some journals, such as the Advances in Engineering Software journal, Elsevier, UK, the Computer Communications journal, Elsevier, UK, the WESEAS transactions on computer journal, Greece, and the IJCSSE journal, UK. He was also a reviewer for the CCCT’04 and CCCT’05 International Conferences, Austin, Texas, USA. His main current interest is in the area of computer communication systems including intrusion detection methods, unicast and multicast routing protocols, real-time protocols, and wireless LAN protocols.