International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) ISSN 2249-6831 Vol.2, Issue 3, Sep 2012 1-25 © TJPRC Pvt. Ltd.,
A STUDY OF FEATURE SELECTION METHODS IN INTRUSION DETECTION SYSTEM: A SURVEY
AMRITA & P. AHMED
Department of CSE, Sharda University, Greater Noida, India
ABSTRACT
Nowadays, the detection of security threats, commonly referred to as intrusions, has become a very important and critical issue in network, data and information security. An intrusion detection system (IDS) has therefore become an essential component of computer and network security, and prevention of such intrusions depends entirely on the detection capability of the IDS. As network speeds become faster, there is an urgent need for IDSs to be lightweight while maintaining high detection rates. Consequently, many feature selection approaches/methods have been proposed in the literature. There are three broad categories of approaches for selecting a good feature subset: the filter, wrapper and hybrid approaches. The aim of this paper is to present a survey of various feature selection methods for IDS on the KDD CUP'99 benchmark dataset based on these three categories and different evaluation criteria.
KEYWORDS : Feature selection, intrusion detection systems, filter method, wrapper method, hybrid method.
INTRODUCTION
In the last three decades computer networks have grown drastically in size and complexity. This tremendous growth has posed challenging issues in network and information security, and the detection of security threats, commonly referred to as intrusions, has become a very important and critical issue in network, data and information security. Security attacks can cause severe disruption to data and networks. Therefore, the Intrusion Detection System (IDS) has become an important part of every computer or network system. An IDS monitors computer or network traffic, identifies malicious activities that compromise the integrity, confidentiality and availability of information resources, and alerts the system or network administrator against malicious attacks. An IDS needs to examine very large, high-dimensional data even for a small network; because of this, it has to meet the challenges of low detection rate and large computation. Feature selection therefore plays a key role in intrusion detection in order to achieve maximal performance. It is one of the important and frequently used techniques in data preprocessing for selecting a subset of relevant features to build a robust IDS. Feature selection is the selection of the minimal-cardinality subset of the original feature set that retains detection accuracy as high as the original feature set [1]. An efficient feature subset can improve the training and testing time, which helps to build a lightweight IDS guaranteeing high detection rates and makes the IDS suitable for real-time and on-line detection of attacks.
This survey paper categorizes the feature selection algorithms that have been developed for IDS building, critically evaluates their usefulness, and recommends ways of enhancing the quality of feature selection algorithms. The paper is organized into the following sections. The intrusion detection system is reviewed in Section 2. Section 3 gives the details of the datasets and performance evaluation used in this survey. In Section 4, different methodologies of feature selection in IDSs are discussed. Related research in the literature for feature selection methods, together with their performance, is addressed in Section 5. Section 6 summarizes the different results reported in the literature in tabular form. Section 7 concludes and discusses future research.
INTRUSION DETECTION SYSTEM
An intrusion is defined as an attempt to compromise the confidentiality, integrity or availability of a computer system or network, to make unauthorized use of its resources, or to bypass its security mechanisms. James P. Anderson introduced Intrusion Detection (ID) in the early 1980s [2], and Dorothy Denning proposed several models for IDS in 1987 [3]. Ideally, intrusion detection should be an intelligent process of monitoring the events occurring in a system and analyzing them for violations of security policies. An IDS is required to have a high attack Detection Rate (DR) with a low False Alarm Rate (FAR). Refer to [4] for the organization of a generalized IDS. Based on the detection approach, IDSs are either anomaly based or misuse based. In the anomaly-based intrusion detection approach [5], the system first learns the normal behavior or activity of the system or network and detects intrusions as deviations from it. In the misuse- or signature-based intrusion detection approach [6], the system first defines the attack and the characteristics that distinguish it from normal data or traffic, and uses these to detect the intrusion. Based on the location of monitoring, IDSs are divided into network-based intrusion detection systems (NIDS) [7] and host-based intrusion detection systems (HIDS) [8]. A NIDS detects intrusions by monitoring network traffic in terms of IP packets. A HIDS is installed locally on a host machine and detects intrusions by examining system calls, application logs, file system modifications and other host activities made by each user on that particular machine.
DATASETS AND PERFORMANCE EVALUATION
This section summarizes the popular benchmark dataset and the performance evaluation measures used in the intrusion detection domain to evaluate the different feature selection methods for intrusion detection systems.
DATASETS
The KDD CUP 1999 [9] benchmark dataset is used to evaluate the different feature selection methods for IDS. It consists of about 4,940,000 connection records in the training set and 311,029 connection records in the test set. The training set contains 24 attacks and the test set contains 38 attacks. Since the full training and test sets are prohibitively large, a 10% subset of the KDD Cup'99 dataset is frequently used [9]. Each connection record is labeled either as normal or as exactly one specific attack type, and every attack type falls into one of four attack categories [10]: Denial of Service (DoS), User to Root (U2R), Remote to Local (R2L) and Probing. Each connection record consists of 41 features, numbered 1 to 41 as shown in Table 1, which fall into four categories:
Category 1 (1-9): basic features of individual TCP connections
Category 2 (10-22): content features within a connection suggested by domain knowledge
Category 3 (23-31): traffic features computed using a two-second time window
Category 4 (32-41): traffic features computed over connections to the same destination host

Table 1: List of the 41 features in the KDD Cup 99 dataset

 #  Feature Name          #  Feature Name           #  Feature Name
 1  Duration             15  Su-attempted          29  Same-srv-rate
 2  Protocol-type        16  Num-root              30  Diff-srv-rate
 3  Service              17  Num-file-creations    31  Srv-diff-host-rate
 4  Flag                 18  Num-shells            32  Dst-host-count
 5  Src-bytes            19  Num-access-files      33  Dst-host-srv-count
 6  Dst-bytes            20  Num-outbound-cmds     34  Dst-host-same-srv-rate
 7  Land                 21  Is-hot-login          35  Dst-host-diff-srv-rate
 8  Wrong-fragment       22  Is-guest-login        36  Dst-host-same-src-port-rate
 9  Urgent               23  Count                 37  Dst-host-srv-diff-host-rate
10  Hot                  24  Srv-count             38  Dst-host-serror-rate
11  Num-failed-logins    25  Serror-rate           39  Dst-host-srv-serror-rate
12  Logged-in            26  Srv-serror-rate       40  Dst-host-rerror-rate
13  Num-compromised      27  Rerror-rate           41  Dst-host-srv-rerror-rate
14  Root-shell           28  Srv-rerror-rate
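As a practical illustration of how this dataset is typically handled, the following is a minimal sketch that loads the 10% training file and maps each record's label to one of the four attack categories or to normal. The file name, the snake_case column names (derived from Table 1) and the small label-to-category mapping are assumptions for illustration only, not part of the surveyed papers.

```python
# Minimal sketch: load the KDD Cup'99 10% training file and map each record's
# label to one of the four attack categories (DoS, Probe, R2L, U2R) or normal.
# File name and the (non-exhaustive) attack-to-category mapping are assumptions.
import pandas as pd

FEATURES = [  # the 41 features of Table 1, in order, plus the class label
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_hot_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "label",
]

# Example mapping of a few label values to the four categories (not exhaustive).
CATEGORY = {"normal.": "normal", "smurf.": "DoS", "neptune.": "DoS",
            "ipsweep.": "Probe", "portsweep.": "Probe",
            "guess_passwd.": "R2L", "buffer_overflow.": "U2R"}

df = pd.read_csv("kddcup.data_10_percent", names=FEATURES)
df["category"] = df["label"].map(CATEGORY).fillna("other")
print(df["category"].value_counts())
```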
Performance Evaluation
The effectiveness of an IDS is evaluated by its ability to make correct predictions. According to the real nature of a given event compared with the prediction from the IDS, the four possible outcomes shown in Table 2 are known as the confusion matrix [4]. True Positive Rate (TPR) or Detection Rate (DR), True Negative Rate (TNR), False Positive Rate (FPR) or False Alarm Rate (FAR) and False Negative Rate (FNR) are measures that can be applied to quantify the performance of IDSs [4] based on this confusion matrix.
Table 2: Confusion Matrix

                                      Predicted
Actual                     Negative Class (Normal)   Positive Class (Attack)
Negative Class (Normal)    True Negative (TN)        False Positive (FP)
Positive Class (Attack)    False Negative (FN)       True Positive (TP)
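For concreteness, the four rates used throughout this survey can be computed directly from the counts of Table 2. The following is a minimal sketch (the numeric counts in the usage line are illustrative only and are not taken from any paper surveyed here).

```python
# Minimal sketch: the four rates used throughout this survey, computed from the
# confusion-matrix counts of Table 2 (attack = positive class, normal = negative).
def ids_rates(tp, tn, fp, fn):
    return {
        "TPR/DR":  tp / (tp + fn),   # detection rate: attacks correctly flagged
        "TNR":     tn / (tn + fp),   # normal traffic correctly passed
        "FPR/FAR": fp / (fp + tn),   # false alarm rate: normal flagged as attack
        "FNR":     fn / (fn + tp),   # attacks missed
    }

# Illustrative counts only (not taken from any surveyed paper).
print(ids_rates(tp=950, tn=9800, fp=200, fn=50))
```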
FEATURE SELECTION
Real-time intrusion detection is nearly impossible due to the huge amount of data flowing on the Internet. Feature selection can reduce the computation and model complexity. Research on feature selection started in the early 1960s [11]. Feature selection is the technique of selecting a subset of relevant features by removing the most irrelevant and redundant features [12] from the data in order to build robust learning models [13].
Process of Feature Selection
A typical feature selection method involves four basic steps [13], shown in Figure 1: a generation procedure to generate the next candidate subset; an evaluation function to evaluate the subset under examination; a stopping criterion to decide when to stop; and a validation procedure to check whether the subset is valid. Figure 1 illustrates how a best feature subset is determined and validated.
Figure 1 : Feature selection process with validation [13].
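The four steps of Figure 1 can be expressed as a generic loop. The sketch below is an illustration only: the generation strategy (random subsets), the evaluation callable and the iteration budget are placeholder assumptions, not the procedure of any specific surveyed method.

```python
# Minimal sketch of the four-step loop of Figure 1: subset generation,
# subset evaluation, stopping criterion, and final validation. The concrete
# evaluate function (filter measure or classifier score) is supplied by the caller.
import random

def select_features(all_features, evaluate, max_iters=100):
    best_subset, best_score = None, float("-inf")
    for _ in range(max_iters):                      # stopping criterion: iteration budget
        k = random.randint(1, len(all_features))    # generation: random candidate subset
        candidate = random.sample(all_features, k)
        score = evaluate(candidate)                 # evaluation: filter measure or classifier
        if score > best_score:
            best_subset, best_score = candidate, score
    return best_subset, best_score                  # validate best_subset on held-out data
```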
METHODS FOR FEATURE SELECTION
Blum and Langley [14] divide feature selection methods into three categories: the filter, wrapper and hybrid (embedded) methods. All three are currently used in intrusion detection. The filter method [15][16] selects feature subsets based on the general characteristics of the data and is independent of any classification algorithm; a filter algorithm [18] does not rely on an external learning algorithm to evaluate the performance of the selected features. The wrapper method [19] "wraps around" the learning algorithm: it uses one predetermined classifier to evaluate features or feature subsets. A wrapper algorithm [18] uses a search algorithm to search through the space of possible feature subsets and evaluates each subset by running a model on it; many feature subsets are evaluated based on classification performance and the best one is selected. This method is more computationally expensive than the filter method [17][19]. The hybrid method [17][20] combines the wrapper and filter approaches to achieve the best possible performance with a particular learning algorithm. In hybrid algorithms [18], more efficient search strategies and evaluation criteria are needed for feature selection with large dimensionality in order to achieve a time complexity similar to that of filter algorithms. These methods are discussed in detail in Section 5 and summarized in Section 6.
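The contrast between the filter and wrapper approaches can be made concrete with a small sketch. This uses scikit-learn on synthetic data purely as an illustration; it is not the tooling or configuration of any paper surveyed below.

```python
# Minimal sketch contrasting the two main approaches on an arbitrary (X, y):
# a filter ranks features from data statistics alone, while a wrapper scores
# candidate subsets with a predetermined classifier.
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                        mutual_info_classif)
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Filter: keep the 5 features with highest mutual information with the class label.
filt = SelectKBest(mutual_info_classif, k=5).fit(X, y)
print("filter picks:", sorted(filt.get_support(indices=True)))

# Wrapper: greedy forward selection, each candidate subset scored by a decision tree.
wrap = SequentialFeatureSelector(DecisionTreeClassifier(random_state=0),
                                 n_features_to_select=5, direction="forward", cv=3).fit(X, y)
print("wrapper picks:", sorted(wrap.get_support(indices=True)))
```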
RELATED WORKS
In this section, we thoroughly discuss the different feature selection methods used in intrusion detection, organized by filter, wrapper and hybrid method, together with the number of features selected, the feature numbers (according to Table 1), the performance on the KDD Cup'99 dataset, and the strengths, limitations and future work reported in the literature.
Filter Method
A feature selection algorithm, FSMDB, based on the DB index criterion is proposed in [21] (Zhang et al., 2004). The criterion function is constructed according to the characteristics of the DB index. 24 features {feature no.: 6, 5, 1, 34, 33, 36, 32, 8, 27, 29, 28, 30, 26, 38, 39, 35, 13, 24, 23, 11, 3, 10, 12 and 4} are selected and tested using two classifiers, a BP network and SVM. The classification accuracies of the FSMDB algorithm reported for the BP network and SVM classifiers are 0.1017 and 0.056 respectively. This method can be used for supervised or unsupervised classification problems but has high computational complexity in the unsupervised learning mode. Future Work: to find a better approach to reduce the high computational complexity in the unsupervised learning mode. Two neural network methods, (1) neural network principal component analysis (NNPCA) and (2) nonlinear component analysis (NLCA), are presented in [22] (Kuchimanchi et al., 2004). The numbers of significant features extracted by the PCA, NNPCA and NLCA methods are 19, 19 and 12. The first 19 selected features, based on the results of the Scree test and critical eigenvalues test, are {feature no.: 5, 6, 1, 22, 21, 31, 30, 3, 4, 2, 16, 10, 13, 34, 32, 27, 24, 37, 23 and 36}. The performance of the non-linear classifier (NC) and the CART decision tree classifier (DC) is tested on four datasets (Table 3). DC has relatively high detection accuracies and low false positive rates. Future Work: this work can be extended with quantitative measures to find optimal combinations of classifiers and feature extractors for IDS.
Table 3: False Positive Rates (FPR) and Detection Accuracies (DA) for NC and DC on the Four Datasets

Dataset    | #Features | FPR (NC) | FPR (DC) | DA (NC)  | DA (DC)
ORIGDATA   | 41        | 8.2821   | 0.2268   | 99.0198  | 99.9428
PCADATA    | 19        | 29.4105  | 0.2609   | 99.1161  | 99.9167
NNPCADATA  | 19        | 50.5463  | 0.4922   | 98.8206  | 99.7516
NLDATA     | 12        | 51.2756  | 0.8227   | 97.2306  | 99.6359
RICGA (ReliefF Immune Clonal Genetic Algorithm), a combined feature subset selection method based on the ReliefF algorithm, the immune clonal selection algorithm and GA, is proposed in [23] (Zhu et al., 2005). A BP network is used as the classifier. RICGA has higher classification accuracy (86.47%) for small feature subsets (8 features) than ReliefF-GA. The features are not mentioned in the paper. The paper [24] (Zainal et al., 2006) investigated the effectiveness of Rough Set (RS) theory in identifying important features and used it as a classifier. The 6 significant features obtained are {feature no.: 41, 32, 24, 4, 5 and 3}. The classification results obtained by Rough Set are compared with Multivariate Adaptive Regression Splines (MARS), Support Vector Decision Function (SVDF) and Linear Genetic Programming (LGP). The classification accuracy of RS is ranked second for the normal category and is almost the same as MARS and SVDF for the attack category. Future Work: this work can be extended in terms of accuracy by focusing on fusion of classifiers after an optimum feature subset is obtained. Wong and Lai (2006) [25] combined Discriminant Analysis (DA) and Support Vector Machine (SVM) to detect intrusions for an anomaly-based network IDS. Nine features {feature no.: 12, 23, 32, 2, 24, 36, 31, 29 and 39} are extracted by Discriminant Analysis and evaluated by SVM. The TN, FP, FN and TP of the proposed method are 99.58%, 0.42%, 9.93% and 90.07% respectively. Future Work: Multiple Discriminant Analysis (MDA) can be applied to find the optimal feature set for each type of attack. Li et al. (2006) [26] proposed a lightweight intrusion detection model. Information Gain and the Chi-Square approach are used to extract important features, and the classic Maximum Entropy (ME) model is used to learn and detect intrusions. The top 12 important features selected by both methods are {feature no.: 3, 5, 6, 10, 13, 23, 24, 27, 28, 37, 40 and 41}. Experimental results are shown in Table 4. Future Work: this model can be applied in a realistic environment to verify its real-time performance and effectiveness.
Table 4: Detection Results

        | All 41 features           | Selected features
Class   | Testing Time(s) | Acc.(%) | Testing Time(s) | Acc.(%)
Normal  | 1.28            | 99.75   | 0.78            | 99.73
Probe   | 2.09            | 99.8    | 1.25            | 99.76
DoS     | 1.93            | 100     | 1.03            | 100
U2R     | 1.05            | 99.89   | 0.7             | 99.87
R2L     | 1.02            | 99.78   | 0.68            | 99.75
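Filter rankings of the kind used by [26] (and later by [29]) combine two univariate scores and keep the features that rank highly under both. The sketch below illustrates that idea on synthetic data; the data, the cut-off k and the use of mutual information as a stand-in for information gain are assumptions for illustration, not the papers' exact procedure.

```python
# Minimal sketch of a two-criterion filter in the spirit of [26]: rank features by
# information gain (mutual information) and by chi-square, then keep features that
# appear in the top-k of both rankings. Note: chi2 requires non-negative inputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=20, n_informative=6, random_state=1)
X = np.abs(X)                                   # chi2 needs non-negative feature values
k = 8

chi_scores, _ = chi2(X, y)
ig_scores = mutual_info_classif(X, y, random_state=1)

top_chi = set(np.argsort(chi_scores)[-k:])      # k best features per chi-square
top_ig = set(np.argsort(ig_scores)[-k:])        # k best features per information gain
print("selected by both criteria:", sorted(top_chi & top_ig))
```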
Tamilarasan et al. (2006) [27] applied different feature selection and ranking methods to the KDD Cup'99 dataset. Chi-square analysis, logistic regression, normal distribution and beta distribution experiments were performed for feature selection. The 25 most significant features ranked by the chi-square test are {feature no.: 35, 27, 41, 28, 40, 30, 34, 3, 33, 12, 37, 24, 29, 2, 13, 8, 36, 10, 26, 39, 22, 25, 5, 1 and 38}. Experiments are performed for normal, Probe, DoS, U2R and R2L using a resilient back propagation neural network. The overall classification accuracy is 97.04% with an FPR of 2.76% and an FNR of 0.20%. Fadaeieslam et al. (2007) [28] proposed a feature selection method based on Decision Dependent Correlation (DDC). The mutual information of each feature with the decision is calculated and the top 20 important features {feature no.: 3, 5, 40, 24, 2, 10, 41, 36, 8, 13, 27, 28, 22, 11, 14, 17, 18, 7, 9 and 15} are selected and evaluated by an SVM classifier. The classification accuracy is 93.46%, outperforming Principal Component Analysis (PCA). Sheen and Rajesh (2008) [29] considered different methods, namely chi-square, Information Gain and ReliefF, for feature selection. The top 20 features {feature no.: 2, 3, 4, 5, 12, 22, 23, 24, 27, 28, 30, 31, 32, 33, 34, 35, 37, 38, 40 and 41} are selected and evaluated using a decision tree (C4.5). The classification accuracies with chi-square, Info Gain and ReliefF are 95.8506%, 95.8506% and 95.6432% respectively. In [30] (Kiziloren and Germen, 2009), Principal Component Analysis (PCA) is used for feature selection to increase the quality of the extracted feature vectors, and a Self-Organizing Map (SOM) is used as the classifier to detect network anomalies. The highest success rate of the system, 98.83%, is obtained when the feature vector size equals 10. The features are not mentioned in the paper. The average success rate of the system without PCA is 97.76%; PCA provides faster classification, which is important for a real-time system. Suebsing and Hiransakolwong (2009) [31] proposed a combination of Euclidean Distance and Cosine Similarity to select robust feature subsets of smaller size. Euclidean Distance is used to select the features for detecting known attacks and Cosine Similarity is used to select the features for detecting unknown attacks. The known-attack detection method extracts 30 important features {feature no.: 1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40, 41, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19
and 22}. The unknown-attack detection method extracts 24 important features {feature no.: 1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40, 41, 3, 4, 23, 24, 29, 32, 33, 34 and 36}. 15 features {feature no.: 1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40 and 41} are selected by both methods. The C5.0 method is used as a classifier. The experimental results are shown in Table 5.

Table 5: Results for Known and Unknown Attacks

                        | Known attack                                | Unknown attack
Parameter               | Full Set (41) | Known detection method (30) | Full Set (41) | Unknown detection method (24)
Overall TP %            | 97.95         | 98.12                       | 53.31         | 68.28
Overall FP %            | 2.04          | 1.87                        | 46.69         | 31.72
Time to Build Model (s) | 75            | 51                          | 75            | 45
A new approach named Quantitative Intrusion Intensity Assessment (QIIA) is proposed in [32] (Lee et al., 2009). QIIA evaluates the proximity of each instance of audit data using proximity metrics based on Random Forests (RF), and uses the numerical feature importance of RF to select important features. Two approaches, QIIA1 and QIIA2, are proposed to determine the threshold parameter values. The top 5 important features selected are {feature no.: 23, 32, 10, 6 and 3}. Only DoS attacks are used, since the other attack types have very small numbers of instances. The experimental results show that the detection rates (DR) of QIIA1 and QIIA2 are 97.94 and 99.37 respectively. An entropy-based traffic profiling scheme for detecting security attacks is presented in [33] (Lee and He, 2009). This paper focuses only on denial-of-service (DoS) attacks. The top six features ranked by accuracy are {feature no.: 5, 6, 31, 32, 36 and 37}. The true positive rate (TPR) of this scheme is 91%. [34] (Xiao et al., 2009) presented a two-step feature selection algorithm that eliminates two kinds of features: irrelevant features in the first step and redundant features in the second step. 21 features {feature no.: 1, 3, 4, 5, 6, 8, 11, 12, 13, 23, 25, 26, 27, 28, 29, 30, 32, 33, 34, 36 and 39} are selected and evaluated using the C4.5 algorithm and Support Vector Machine (SVM). The Detection Rate (%), False Alarm Rate (%) and processing time with the selected features (all features) are 86.3 (87.0), 1.89 (1.85) and 15.163 s (21.891 s) respectively. A novel approach for selecting features and comparing the performance of various BN classifiers is proposed in [35] (Khor et al., 2009). Two feature selection algorithms, the Correlation-based Feature Selection Subset Evaluator (CFSE) and the Consistency Subset Evaluator (CSE), together with domain experts, are utilised to form the proposed feature set. This feature set contains 7 features: {feature no.: 3, 6, 12, 23, 32*, 14* and 40*}. A Bayesian Network (BN) is employed as the classifier. The classification accuracies (%) of the BN for the Normal, DoS, Probe, R2L and U2R types are 99.8, 99.9, 89.4, 91.5 and 69.2 respectively. *: features that were selected based on domain knowledge.
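Ranking features by Random Forest importance, the mechanism that QIIA [32] relies on for its selection step, can be sketched as follows. The synthetic data and the top-5 cut-off are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch: rank features with Random Forest importances and keep the top few,
# as a stand-in for the RF-based selection step used by QIIA [32].
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=41, n_informative=8, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]   # most important first
print("top-5 features by RF importance:", ranking[:5].tolist())
```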
Bahrololum et al. (2009) [36] used three machine learning methods, Decision Tree (DT), Flexible Neural Tree (FNT) and Particle Swarm Optimization (PSO), for feature selection. The five important features {feature no.: 10, 17, 14, 13 and 11} are selected depending on the contribution of the variables to the construction of the decision tree. The experimental results are shown in Table 6.

Table 6: Detection Performance using DT, FNT and PSO Methods

Attack Class | DT      | FNT     | PSO
Normal       | 9.96%   | 99.19%  | 95.69%
DoS          | 100%    | 98.75%  | 90.41%
R2L          | 99.02%  | 99.09%  | 98.10%
U2R          | 88.33%  | 99.70%  | 100%
Probe        | 99.66%  | 98.39%  | 95.53%
An automatic feature selection method based on the filter approach is proposed by Nguyen et al. (2010) [37]. The globally optimal subset of relevant features is found by Correlation Feature Selection (CFS) and evaluated by C4.5 and BayesNet. The selected features for Normal&DoS are 3 {5, 6 and 12}; for Normal&Probe 6 {5, 6, 12, 29, 37 and 41}; for Normal&U2R 1 {14}; and for Normal&R2L 2 {10 and 22}. The average classification accuracies of C4.5 and BayesNet are 99.41% and 98.82% respectively. Chen et al. (2010) [38] proposed a novel inconsistency-based feature selection method. Data consistency is applied to find the optimal features, which are evaluated by the decision tree method (C4.5). The proposed method is compared with CFS (Table 7).

Table 7: Performance Comparison (CC: Classification Correctness)

            | All features    | Proposed Method                        | CFS Method
Attack Type | CC(%) | Time(s) | Features                 | CC(%) | Time(s) | Features                         | CC(%) | Time(s)
Probe       | 99.85 | 0.66    | 4 (3,5,35,36)            | 99.77 | 0.16    | 4 (5,6,25,37)                    | 94.35 | 0.27
DoS         | 99.94 | 1.08    | 4 (3,4,10,23)            | 99.81 | 0.22    | 4 (2,5,16,22)                    | 99.32 | 0.33
U2R         | 100   | 0.11    | 2 (3,41)                 | 100   | 0.09    | 9 (3,10,24,29,31,32,33,34,40)    | 100   | 0.08
R2U         | 98.99 | 0.22    | 5 (3,5,12,32,35)         | 99.13 | 9.13    | 5 (3,5,10,24,33)                 | 98.05 | 0.11
All         | 99.5  | 3.72    | 8 (1,3,5,25,32,34,36,40) | 99.45 | 0.48    | 11 (2,3,4,5,6,10,23,24,25,36,37) | 99.67 | 6.28
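Since CFS appears repeatedly in the surveyed work ([35], [37], the comparison in [38], and later [54]), it is worth recalling what it optimizes. In the standard formulation (a general reminder, not a quotation from the surveyed papers), the merit of a subset S of k features is

\[
\mathrm{Merit}_S \;=\; \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}},
\]

where \(\overline{r_{cf}}\) is the average feature-class correlation of the features in S and \(\overline{r_{ff}}\) is the average feature-feature correlation, so subsets whose features are highly predictive of the class yet mutually uncorrelated score highest.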
A novel unsupervised statistical approach, varGDLF (a variational framework for the GD mixture model with localized feature selection), is proposed in [39] (Fan et al., 2011) for detecting network-based attacks. Eleven features {feature no.: 1, 5, 12, 15, 18, 21, 22, 29, 33, 38 and 41} are selected. The performance of the varGDLF approach is compared with four other variational mixture models and it outperforms them, with the highest accuracy rate (85.2%), the lowest FP rate (7.3%) and the most accurately detected number of components (4.95). The accuracy rates for Normal, DoS, R2L, U2R and Probing are 99.5, 96.5, 75.4, 69.6 and 85.1% respectively; the FP rates are 11.5, 0.8, 1.4, 11.5 and 11.3% respectively. An improved information gain (IIG) algorithm based on feature redundancy is proposed in [40] (Xian et al., 2011). Twenty-two features are selected after applying the Information Gain (IG) algorithm, and then 12 features {feature no.: 2, 3, 5, 6, 8, 10, 12, 23, 25, 36, 37 and 38} are selected after applying IIG. Naive Bayes (NB) is used to carry out the experiment on three feature sets: the original feature set (41 features), feature subset 1 (22 features) and feature subset 2 (12 features). The processing times (s) of the three feature sets are 8.34, 4.16 and 2.08; the Detection Rates (DR) (%) are 96.187, 96.407 and 96.801; and the False Positive Rates (FPR) (%) are 5.22, 2.58 and 1.02 respectively.
WRAPPER METHOD
In [41] (Middlemiss and Dick, 2003), a simple Genetic Algorithm (GA) is used to evolve weights for the features, and a k-nearest neighbour (KNN) classifier is used both as the fitness function of the GA and as the classifier. The top five ranked features for each class are {DoS: 23, 29, 1, 11, 24; R2U: 24, 3, 12, 23, 36; U2R: 24, 6, 31, 41, 17; Probe: 2, 37, 30, 3, 6}. The results indicate an increase in intrusion detection accuracy. Mukkamala and Sung (2003) [42] presented two methods to rank the important features: (1) the Performance-Based Ranking Method (PBRM) and (2) the Support Vector Decision Function Ranking Method (SVDFRM). Thirty-one features are selected as the union of the important features of each of the 5 classes ranked by PBRM; for SVDFRM, the union of the important features of the 5 classes contains 23 features. The 8 important features identified by both ranking methods are {feature no.: 1, 3, 5, 6, 23, 24, 32 and 33}. Experiments are performed with both methods using an SVM classifier (Table 8). Future Work: ongoing experiments include 23-class (22 attack classes plus normal) feature identification using SVMs.
Table 8: Performance of SVMs

        | Ranked by PBRM (31)                          | Ranked by SVDFRM (23)
Class   | Training Time(s) | Testing Time(s) | Acc.(%) | Training Time(s) | Testing Time(s) | Acc.(%)
Normal  | 7.67             | 1.02            | 99.51   | 4.85             | 0.82            | 99.55
Probe   | 44.38            | 2.07            | 99.67   | 36.23            | 1.4             | 99.71
DOS     | 18.64            | 1.41            | 99.22   | 7.77             | 1.32            | 99.2
U2R     | 3.23             | 0.98            | 99.87   | 1.72             | 0.75            | 99.87
R2L     | 9.81             | 1.01            | 99.78   | 5.91             | 0.88            | 99.78
An Ant Colony Optimization (ACO) based intrusion feature selection algorithm is proposed in [43] (Gao et al., 2005). The Fisher discrimination rate is adopted as the heuristic information for the ants' traversal. A least-squares SVM classifier is adopted as the base classifier to evaluate the generated feature subsets. The number of features selected by the ACO-SVM method is 11 for Probe, 9 for DoS, and 14 for U2R & R2L. Feature names are not mentioned in the paper. Table 9 shows the experimental results.
Table 9: Performance of ACO-SVM

Type      | #Features | Correct Classification Rate | False Positive Rate | Average Detection Time
Probe     | 11        | 99.40%                      | 0.35%               | 0.074
DoS       | 9         | 95.20%                      | 3.24%               | 0.031
U2R&R2L   | 14        | 98.70%                      | 1.60%               | 0.078
The paper [44] (Banković et al., 2007) investigated the possibility of increasing the detection rate (DR) of U2R attacks in misuse detection. The features extracted using Principal Component Analysis (PCA) and Multi Expression Programming (MEP) are {U2R: 14, 33; DoS: 1, 5, 39; Normal: 3, 10, 12}. A genetic algorithm is employed to derive rules for detecting the various types of attacks, and two additional rule sets are deployed to re-check the decision of the rule set for detecting U2R attacks. The experiments show (Table 10) that this system outperforms the best-performing model reported in the literature.
Table 10: Performance of the System

        | DR                              | FPR
#Rules  | Total System | U2R Rule System  | Total System | U2R Rule System
50      | 50           | 46.3             | 0.0055       | 0.007
75      | 77.8         | 77.8             | 7.2          | 10.2
100     | 100          | 100              | 16.54        | 27.4
Chen et al. (2007) [45] presented a wrapper-based feature selection method. A random search method named modified random mutation hill climbing (MRMHC) is introduced as the search strategy to select feature subsets, with Support Vector Machines (SVMs) as the classifier. The experimental results are shown in Table 11. Future Work: this method can be improved with respect to the search strategy and the evaluation criterion.
Table 11: Selected feature subsets, time for the selection process for different feature selection algorithms, and average time of the building and testing processes for ALL attacks, DOS, PROBE, R2L and U2R

Attack Type                                  | ALL          | DOS        | PROBE       | R2L   | U2R
#Features                                    | 5            | 4          | 5           | 3     | 5
Selected features                            | 3,5,23,33,34 | 5,12,23,34 | 1,3,5,23,37 | 1,5,6 | 1,3,6,14,33
Time of Selection Process (h): GA-SVMs       | 1.3          | 0.5        | 4           | 1.5   | 1.5
Time of Selection Process (h): MRMHC-SVMs    | 0.4          | 0.2        | 2.2         | 0.8   | 0.6
Avg. Time of Building Process (s): All       | 78           | 136        | 245         | 317   | 193
Avg. Time of Building Process (s): Selected  | 30           | 31         | 96          | 24    | 78
Avg. Time of Testing Process (s): All        | 18           | 22         | 49          | 55    | 50
Avg. Time of Testing Process (s): Selected   | 6            | 5          | 17          | 7     | 15
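The core idea behind wrappers such as MRMHC ([45], and later [48]) can be sketched as a hill climb over feature bit-masks scored by a classifier. The sketch below is an illustration on synthetic data, with a plain (not modified) hill-climbing loop, a generic linear SVM and an arbitrary iteration budget; it is not the papers' algorithm or configuration.

```python
# Minimal sketch of a random-mutation hill-climbing wrapper: flip one randomly
# chosen feature bit per step and keep the move only if the classifier's
# cross-validated accuracy does not drop.
import random
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=800, n_features=20, n_informative=6, random_state=0)

def score(mask):
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    return cross_val_score(LinearSVC(dual=False), X[:, cols], y, cv=3).mean()

rng = random.Random(0)
mask = np.array([rng.random() < 0.5 for _ in range(X.shape[1])])   # random start
best = score(mask)
for _ in range(60):                      # iteration budget (stopping criterion)
    j = rng.randrange(X.shape[1])        # random mutation: flip one feature bit
    mask[j] = not mask[j]
    new = score(mask)
    if new >= best:
        best = new                       # keep the improving (or equal) move
    else:
        mask[j] = not mask[j]            # otherwise undo the flip
print("selected features:", np.flatnonzero(mask).tolist(), "cv accuracy:", round(best, 4))
```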
A multi-objective genetic fuzzy intrusion detection system (MOGFIDS) is proposed by Tsang et al. (2007) [46]. The MOGFIDS is used as a genetic wrapper to search for a near-optimal feature subset. The 27 features selected by MOGFIDS are {feature no.: 2 (tcp, udp, icmp), 5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 22, 23, 25, 30, 32, 33, 34, 35, 36, 37, 38, 39 and 40}. MOGFIDS has the second highest accuracy (99.24%) and the lowest FPR (1.1%) among the wrappers compared in the paper. Future Work: this can be applied to other complex problem domains such as face recognition and DNA computing. The paper [47] (Wang and Gombault, 2008) proposed a system that extracts important features from raw network traffic for DDoS attacks only in real computer networks. The first 9 important features by rank, {feature no.: 23, 32, 37, 33, 5, 24, 31, 39 and 3}, are selected by the Information Gain and Chi-square methods and evaluated with Bayesian Networks and decision trees (C4.5), as shown in Table 12. Future Work: a practical real-time system for fast detection of DDoS attacks can be developed.
Table 12: Detection Rate, False Positive Rate and Construction Time Results

                  | C4.5 (9 feat.) | C4.5 (41 feat.) | BN (9 feat.) | BN (41 feat.)
DR (%)            | 99.8           | 99.8            | 99.6         | 99.0
FPR (%)           | 0.3            | 0.3             | 1.6          | 1.5
Training time (s) | 1.7            | 15.3            | 0.7          | 4.4
Testing time (s)  | 0.2            | 0.9             | 0.2          | 0.9
Feature construction time: 237 s with 9 features, 2043 s with 41 features.
Li et al. (2009) [48] proposed a wrapper-based feature selection method to build a lightweight intrusion detection system. A modified Random Mutation Hill Climbing (RMHC) method is applied as the search strategy to find candidate feature subsets, and modified linear Support Vector Machines (SVMs) are used to evaluate the candidates. A classification algorithm based on a decision tree whose nodes consist of linear SVMs is used to build the IDS from the selected feature subsets. The experiments show that the resulting systems have higher ROC (Receiver Operating Characteristic) scores than all 41 features in terms of detecting known attacks, detecting new attacks and computational cost (Table 13).
Table 13: Selected feature subsets, and average time of the building and testing processes with all and selected features for ALL attacks, DOS, PROBE, R2L and U2R

            |                   | Building time(s)        | Testing time(s)
Attack Type | Features          | All features | Selected | All features | Selected
ALL         | 4 (3,5,23,32)     | 78           | 36       | 18           | 8
DOS         | 4 (2,5,23,34)     | 136          | 41       | 22           | 9
PROBE       | 6 (1,3,5,6,23,35) | 245          | 123      | 49           | 29
R2L         | 3 (1,3,5)         | 317          | 35       | 55           | 8
U2R         | 5 (1,3,5,14,32)   | 193          | 85       | 50           | 18
The paper [49] (Ali et al., 2010) improves the accuracy of a Signature Detection Classification (SDC) model by applying feature extraction based on customized features. Features are extracted from the customized features using a GA (Genetic Algorithm), a two-second time window and a Hidden Markov model. Eleven features {feature no.: 5, 6, 13, 23, 24, 25, 26, 33, 36, 37 and 38} are extracted, and the best signature detection classification model is developed using JRip, Ridor, PART and decision trees. The extracted features increased the detection rates by between 0.4% and 9% and reduced the false alarm rates by between 0.17% and 0.5%. Gong (2011) [50] proposed a novel approach for feature selection based on Genetic Quantum Particle Swarm Optimization (GQPSO) for network intrusion detection. A Support Vector Machine (SVM) is used as the classification algorithm. The selected features and experimental results are shown in Table 14.

Table 14: Selected Features and Performance of SVM with the GQPSO Algorithm

Attack Type | Features                                 | Training Time(ms) | Detecting Time(ms) | DR    | Error Report Rate(%)
DoS         | 10 (2, 6, 3, 12, 21, 22, 31, 26, 28, 30) | 0.0627            | 0.0581             | 99.98 | 0
Probe       | 5 (5, 12, 26, 32, 34)                    | 0.0431            | 0.0478             | 91.77 | 0.001
R2L         | 7 (10, 23, 25, 29, 26, 33, 35)           | 0.053             | 0.014              | 98.26 | 0
U2R         | 5 (2, 3, 17, 32, 36)                     | 0.0006            | 0.0016             | 100   | 0.0003
Li et al. (2012) [51] proposed an effective wrapper-based feature reduction method called the gradually feature removal (GFR) method. The GFR method extracted 19 critical features {feature no.: 2, 4, 8, 10, 14, 15, 19, 25, 27, 29, 31, 32, 33, 34, 35, 36, 37, 38 and 40}. The SVM classifier achieves an accuracy of 98.6249% and an MCC (Matthews correlation coefficient) of 0.861161, and its training and testing times are greatly reduced. An advanced intelligent system using ensemble soft computing techniques is proposed by Sindhu et al. (2012) [52] for a lightweight IDS to detect anomalies in networks. A GA (Genetic Algorithm) is used to extract the feature subset and a neurotree paradigm is proposed as the classifier. Sixteen features are extracted by this method: {feature no.: 2, 3, 4, 5, 6, 8, 10, 12, 24, 25, 29, 35, 36, 37, 38 and 40}. The detection rate is 98.4%, which is superior to the other methods compared.
HYBRID METHOD
In the paper [53] (Ng et al., 2003), a feature importance ranking methodology based on the stochastic radial basis function neural network output sensitivity measure (RBFNN-SM) is presented. RBFNN-SM is used to evaluate the features for only the normal class and six classes of denial of service (DoS) attack. The experiments show that the 8 most significant sensitive features, {feature no.: 2, 24, 23, 29, 32, 34, 33 and 36}, are enough to classify normal and DoS traffic. The computation time is reduced from 23 seconds to 9 seconds. The classification accuracies for normal and DoS attacks are 99.77% and 99.06%; the FARs with 8 (41) features are 0.18% (0.01%) and 0.27% (0.03%); and the FPRs for training and testing are 0.93% (0.70%) and 0.94% (0.71%) respectively. Shazzad and Park (2005) [54] proposed a fast hybrid feature selection method to determine an optimal feature set. The method is a fusion of Correlation-based Feature Selection (CFS), Support Vector Machine (SVM) and Genetic Algorithm (GA): subsets of features are generated by the Genetic Algorithm and evaluated by CFS and SVM. The 12 selected features are {feature no.: 1, 6, 12, 14, 23, 24, 25, 31, 32, 37, 40 and 41}. The optimal subset has an average DR of 99.56% and FPR of 37.5%. Chebrolu, Abraham and Thomas (2005) [7] investigated the performance of two feature selection techniques, Bayesian Networks (BN) and Classification and Regression Trees (CART), and developed an ensemble classifier of both techniques for building an IDS, which is best at classifying R2L and DoS. Seventeen important features {feature no.: 1, 2, 3, 5, 7, 8, 11, 12, 14, 17, 22, 23, 24, 25, 26, 30 and 32} are selected by the Markov blanket model and a BN classifier is constructed and tested. Twelve features {feature no.: 3, 5, 6, 12, 23, 24, 25, 28, 31, 32, 33 and 35} are selected by the decision tree and a classifier using CART is constructed and tested. The normal class is classified 100% correctly, and the accuracies of the U2R and R2L classes increase when the 12-variable reduced data set is used. It is observed that CART classifies small data sets accurately. In the ensemble approach, the BN and CART models are first constructed individually and then combined for the 12-, 17- and 41-variable data sets. Using the ensemble model, Normal, Probe and DoS could be detected with 100% accuracy, and U2R and R2L with 84% and 99.47% accuracy respectively.
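The ensemble idea of [7], training two different learners on a reduced feature set and combining their outputs, can be sketched as follows. Gaussian Naive Bayes stands in for the Bayesian-network learner and a CART decision tree for the tree learner; the synthetic data and the placeholder feature indices are assumptions for illustration, not the 12/17 features or the exact combination scheme of the paper.

```python
# Minimal sketch of an ensemble of two classifiers trained on a reduced feature set,
# illustrating (not reproducing) the ensemble approach of [7].
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1500, n_features=41, n_informative=10, random_state=0)
reduced = [2, 4, 5, 11, 22, 23, 24, 27, 30, 31, 32, 34]       # placeholder subset

ensemble = VotingClassifier(
    estimators=[("nb", GaussianNB()), ("cart", DecisionTreeClassifier(random_state=0))],
    voting="soft",                      # average predicted class probabilities
)
print("ensemble cv accuracy:", cross_val_score(ensemble, X[:, reduced], y, cv=5).mean())
```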
In the paper [55] (Chen et al., 2007), a new hybrid approach named C4.5-PCA-C4.5 is proposed. It uses PCA (Principal Component Analysis) and the decision tree classifier C4.5 as the feature selection method and C4.5 as the classifier. The important features extracted are {feature no.: 33, 34, 4, 1, 3, 10 and 22}. The performance of C4.5-PCA-C4.5 is compared with four other systems: C4.5-ALL, C4.5-PCA, SVM-CFS and SVM-CFS-SVM. The experimental results show that C4.5-PCA-C4.5 has lower testing time, fast training and testing processes, the highest TPR and the lowest FPR. The average building time for C4.5-PCA-C4.5 is 6 sec. Lee et al. (2007) [56] use two machine learning algorithms: Random Forests (RF) for feature selection and the Minimax Probability Machine (MPM) for intrusion detection. The top 5 important features, {feature no.: 23, 6, 29, 3 and 5}, are selected. Only Denial of Service (DoS) attacks are used. The detection rate is 99.84% and the average simulation time is 0.1039 sec. Wei Wang et al. (2008) [57] used both filter and wrapper schemes for feature selection. An information gain (IG) based filter model and wrapper models based on Bayesian networks (BN) and decision trees (C4.5) are employed to select features for network intrusion detection, with BN and C4.5 as classifiers. The experimental results and the 10 features selected for each class are shown in Table 15.

Table 15: Results Comparison using 41 Features and 10 Features

Attacks | Features Selected                  | Method | Using 41 Features                 | Using 10 Features
        |                                    |        | DR    | FPR  | Train(s) | Test(s)  | DR  | FPR  | Train(s) | Test(s)
DoS     | 3, 4, 5, 6, 8, 10, 13, 23, 24, 37  | BN     | 98.73 | 0.08 | 4.7      | 2.1      | 100 | 0    | 0.8      | 0.6
        |                                    | C4.5   | 99.96 | 0.15 | 16.3     | 1.2      | 100 | 0.14 | 4.6      | 0.5
DDoS    | 3, 4, 5, 6, 8, 10, 13, 23, 24, 37  | BN     | 99.03 | 1.53 | -        | -        | 99  | 1.92 | -        | -
        |                                    | C4.5   | 99.8  | 0.26 | -        | -        | 100 | 0.34 | -        | -
Probe   | 3, 4, 5, 6, 29, 30, 32, 35, 39, 40 | BN     | 92.89 | 6.08 | 3.1      | 2.8      | 83  | 3.06 | 0.5      | 0.4
        |                                    | C4.5   | 82.59 | 0.04 | 14.5     | 1.1      | 83  | 0.05 | 1.2      | 0.3
R2L     | 1, 3, 5, 6, 12, 22, 23, 31, 32, 33 | BN     | 92.22 | 0.33 | 2.6      | 1.8      | 89  | 0.32 | 0.5      | 0.4
        |                                    | C4.5   | 80.29 | 0.02 | 10.5     | 0.8      | 87  | 0.01 | 0.5      | 0.2
U2R     | 1, 2, 3, 5, 10, 13, 14, 32, 33, 36 | BN     | 75.86 | 0.29 | 2.6      | 1.8      | 66  | 0.12 | 0.4      | 0.4
        |                                    | C4.5   | 24.14 | 0    | 9.9      | 0.7      | 24  | 0    | 0.6      | 0.2
Hong and Haibo (2009) [58] proposed a new hybrid selection algorithm to build a lightweight network IDS. Chi-square and an enhanced C4.5 algorithm are used for feature selection in the preprocessing phase. The top fifteen most important features extracted by the chi-square algorithm are {feature no.: 5, 3, 23, 35, 4, 8, 30, 34, 36, 6, 33, 38, 24, 25 and 2}. The top five features extracted by the C4.5 and C4.5-Chi2 methods are {feature no.: 25, 4, 2, 5 and 29} and {feature no.: 5, 3, 4, 8 and 25} respectively. The experimental results are shown in Table 16.

Table 16: Detection and False Positive Rate Results based on C4.5-Chi2

Attack Type | DR    | FPR
Normal      | 99.9  | 1.6
DOS         | 99.3  | 1.48
Probe       | 93.87 | 1.82
U2R         | 50.01 | 28.32
R2L         | 61.55 | 12.17
Training Time: 0.02 sec; Testing Time: 0.03 sec.
In the paper [59] (Xiang et al., 2009), a hybrid method named the Robust Artificial Intelligence Selection algorithm (RAIS) is presented. Mutual information and an artificial intelligence method are used for feature subset selection, with SVMs as the classifier. The selected features are not mentioned in the paper. The experimental results show that the RAIS algorithm has the lowest false alarm rate, 3.49%, the highest accuracy, 99.01%, and a detection rate of 99.27%. Zaman and Karray (2009) [60] proposed a novel and simple method named the Enhanced Support Vector Decision Function (ESVDF) for feature selection. The method utilizes the Support Vector Machines (SVMs) approach based on Forward Selection Ranking (FSR) and Backward Elimination Ranking (BER) algorithms: ESVDF (SVDF/FSR or SVDF/BER) applies the SVDF in the FSR and BER approaches to select the most effective feature set. Two classifiers, Neural Networks (NNs) and SVMs, are used to evaluate the features. The experimental results are shown in Table 17. Feature names are not mentioned.

Table 17: Comparison of ESVDF/FSR, ESVDF/BER and All 41 Features using NN and SVM Classifiers

Classifier | Algorithm  | #Features | Accuracy | FPR    | Training Time | Testing Time
NN         | ESVDF/FSR  | 6         | 99.55%   | 0.0032 | 217.57        | 0.047
NN         | ESVDF/BER  | 9         | 99.57%   | 0.003  | 255.047       | 0.053
NN         | None (all) | 41        | 99.65%   | 0.0036 | 911.68        | 0.075
SVM        | ESVDF/FSR  | 6         | 99.46%   | 0.0033 | 2.039         | 0.052
SVM        | ESVDF/BER  | 9         | 99.58%   | 0.0031 | 2.1           | 0.046
SVM        | None (all) | 41        | 99.71%   | 0.0032 | 5.182         | 0.17
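SVM-driven backward elimination of the kind the BER variant of ESVDF [60] performs can be approximated with recursive feature elimination. The sketch below uses scikit-learn's RFE on synthetic data as an analogous illustration only; it is not the paper's SVDF ranking, and the target subset size of 9 simply mirrors the count reported for ESVDF/BER.

```python
# Minimal sketch: backward elimination driven by a linear SVM. RFE repeatedly drops
# the feature with the smallest |weight| until the requested number remain.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=41, n_informative=9, random_state=0)

rfe = RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=9).fit(X, y)
print("retained features:", sorted(rfe.get_support(indices=True)))
```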
Ming-Yang Su (2011) [61] proposed a feature selection method for detecting DoS/DDoS attacks in real time when designing an anomaly-based NIDS. A genetic algorithm (GA) combined with KNN (k-nearest-neighbor) is used for feature selection and weighting: the result of the KNN classification is used as the fitness function in the genetic algorithm to evolve the weight vectors of the features. An initial set of 35 features is weighted in the training phase; the top 19 features are considered for known attacks and the top 28 features for unknown attacks. The extracted features are not mentioned in the paper. An overall accuracy rate of 97.42% is obtained for known attacks and 78% for unknown attacks.
A SYSTEMATIC REVIEW OF RELATED WORK
The afore-mentioned work on feature selection is summarized in a systematic way by approach: filter in Table 18, wrapper in Table 19 and hybrid in Table 20. These tables give, for each proposed method, the literature reference, the method name, the number of features selected, the feature numbers according to Table 1, the classifier used to evaluate the method, the evaluation criteria and the results.

Table 18: Summary of Filter Method
(Lit. Ref. | Method Name | No. of Features | Feature Nos. Used | Classifier | Evaluation Criteria | Result)

[21] 2004 | FSMDB | 24 | 6,5,1,34,33,36,32,8,27,29,28,30,26,38,39,35,13,24,23,11,3,10,12,4 | BP Network, SVM | Classification Accuracy | BP: 0.1017, SVM: 0.056
[22] 2004 | NNPCA & NLCA | 19, 12 | 5,6,1,22,21,31,30,3,4,2,16,10,13,34,32,27,24,37,23 | NC & DC | FPR, Detection Accuracies | Table 3
[23] 2005 | RICGA | 12 | Not mentioned | BP Network | Classification Accuracy | 88.15%
[24] 2006 | Rough Set | 6 | 41,32,24,4,5,3 | Rough Set | Classification Accuracy | 99.743
[25] 2006 | Combined DA and SVM | 9 | 12,23,32,2,24,36,31,29,39 | SVM | TN(%), FP(%), FN(%), TP(%) | 99.58, 0.42, 9.93, 90.07
[26] 2006 | Information Gain and Chi-Square approach | 12 | 3,5,6,10,13,23,24,27,28,37,40,41 | ME | Accuracy, Testing Time | Table 4
[27] 2006 | Artificial Neural Networks and Statistical Methods | 25 | 35,27,41,28,40,30,34,3,33,12,37,24,29,2,13,8,36,10,26,39,22,25,5,1,38 | RBP Neural Network | Accuracy, FPR, FNR | 97.04%, 2.76%, 0.20%
[28] 2007 | Decision Dependent Correlation (DDC) | 20 | 3,5,40,24,2,10,41,36,8,13,27,28,22,11,14,17,18,7,9,15 | SVM | Classification Accuracy | 93.46%
[29] 2008 | Chi Square, Info Gain and ReliefF | 20 | 2,3,4,5,12,22,23,24,27,28,30,31,32,33,34,35,37,38,40,41 | Decision Tree (C4.5) | Classification Accuracy | 95.8506%, 95.8506%, 95.6432%
[30] 2009 | PCA-SOM | 10 | Not mentioned | SOM | Avg. Success Rate | 98.83%
[31] 2009 | Euclidean Distance & Cosine Similarity | 15 | 1,2,12,25,26,27,28,30,31,35,37,38,39,40,41 | C5.0 | Table 5 | Table 5
[32] 2009 | (1) QIIA1 (Max value), (2) QIIA2 (Center Data) | 5 | 23,32,10,6,3 | RF proximity | DR | (1) 97.94, (2) 99.37
[33] 2009 | Entropy-Based Scheme with Chi-Square | 6 | 5,6,31,32,36,37 | Chi-Square Test | TPR | 91%
[34] 2009 | Mutual Information based Algorithm | 21 | 1,3,4,5,6,8,11,12,13,23,25,26,27,28,29,30,32,33,34,36,39 | C4.5 & SVM | DR, FAR, Processing Time | 86.3, 1.89, 15.163 s
[35] 2009 | Proposed feature set using CFSE and CSE | 7 | 3,6,12,23,32*,14*,40* | BN | Classification Accuracy (%) | Normal 99.8, DoS 99.9, Probe 89.4, R2L 91.5, U2R 69.2
[36] 2009 | Based on DT, FNT and PSO | 5 | 10,17,14,13,11 | DT, FNT and PSO | Detection Accuracy | Table 6
[37] 2010 | M01LP from CFS | 3, 6, 1, 2 | Normal&DoS: 5,6,12; Normal&Probe: 5,6,12,29,37,41; Normal&U2R: 14; Normal&R2L: 10,22 | C4.5, BayesNet | Classification Accuracy | 99.41%, 98.82%
[38] 2010 | Inconsistency-based feature selection method | Table 7 | Table 7 | C4.5 | Classification Correctness, Time(s) | Table 7
[39] 2011 | varGDLF | 11 | 1,5,12,15,18,21,22,29,33,38,41 | varGDLF | Accuracy Rate, FPR, No. of Components | 85.2%, 7.3%, 4.95
[40] 2011 | IIG (Improved Information Gain) | 12 | 2,3,5,6,8,10,12,23,25,36,37,38 | NB | DR, FPR, Processing Time | 96.801, 1.02, 2.08 s
*: Features that were selected based on domain knowledge.
Table 19: Summary of Wrapper Method
(Lit. Ref. | Method Name | No. of Features | Feature Nos. Used | Classifier | Evaluation Criteria | Result)

[41] 2003 | GA combined with a k-nearest neighbour classifier | 5 per class | DoS: 23,29,1,11,24; R2U: 24,3,12,23,36; U2R: 24,6,31,41,17; Probe: 2,37,30,3,6 | KNN | Detection Accuracy | Increase in ID accuracy
[42] 2003 | PBRM and SVDFRM | 8 | 1,3,5,6,23,24,32,33 | SVM | Table 8 | Table 8
[43] 2005 | ACO-SVM | Table 9 | Not mentioned | SVM | Table 9 | Table 9
[44] 2007 | PCA & MEP | 8 | 14,33,1,5,39,3,10,12 | GA | DR, FPR | Table 10
[45] 2007 | MRMHC-SVMs | Table 11 | Table 11 | SVM | Table 11 | Table 11
[46] 2007 | MOGFIDS | 27 | 2 (tcp, udp, icmp),5,6,7,8,9,11,12,13,14,17,18,22,23,25,30,32,33,34,35,36,37,38,39,40 | MOGFIDS | Accuracy, FPR | 99.24%, 1.1%
[47] 2008 | Information Gain and Chi-square | 9 | 23,32,37,33,5,24,31,39,3 | C4.5 & BN | Table 12 | Table 12
[48] 2009 | Modified RMHC and modified linear SVM | Table 13 | Table 13 | Decision Tree | Table 13 | Table 13
[49] 2010 | Feature Selection based on Customized Features | 11 | 5,6,13,23,24,25,26,33,36,37,38 | JRip, Ridor, PART & Decision tree | DR, FAR | Increased, Decreased
[50] 2011 | GQPSO | Table 14 | Table 14 | SVM | Table 14 | Table 14
[51] 2012 | GFR (Gradually Feature Removal) | 19 | 2,4,8,10,14,15,19,25,27,29,31,32,33,34,35,36,37,38,40 | SVM | Training time(s), Testing time(s), Accuracy(%), MCC | 0.118356, 4.63227, 98.6249, 0.861161
[52] 2012 | A combined GA and neurotree method | 16 | 2,3,4,5,6,8,10,12,24,25,29,35,36,37,38,40 | Neurotree | DR | 98.38
Table 20: Summary of Hybrid Method
(Lit. Ref. | Method Name | No. of Features | Feature Nos. Used | Classifier | Evaluation Criteria | Result)

[53] 2003 | RBFNN-SM | 8 | 2,24,23,29,32,34,33,36 | RBFNN | Class. Acc., FAR, FPR | 99.415%, 0.065%, 0.935%
[54] 2005 | A fusion of CFS, SVM & GA | 12 | 1,6,12,14,23,24,25,31,32,37,40,41 | SVM | DR, FPR | 99.56%, 37.5%
[7] 2005 | Markov blanket model and Decision Tree for feature selection | 17 (BN), 12 (CART) | {1,2,3,5,7,8,11,12,14,17,22,23,24,25,26,30,32}; {3,5,6,12,23,24,25,28,31,32,33,35} | Ensemble of BN and CART | Accuracy (%) | 100% Normal, DoS, Probe; 84% U2R; 99.47% R2L
[55] 2007 | C4.5-PCA-C4.5 | 7 | 33,34,4,1,3,10,22 | C4.5 | Testing Time, TPR, FPR | 6 sec, -, -
[56] 2007 | RF | 5 | 23,6,29,3,5 | MPM | DR, Avg. Simulation Time | 99.84%, 0.1039 s
[57] 2008 | Information gain & BN and C4.5 | 10 | Table 15 | BN & C4.5 | DR, FPR | Table 15
[58] 2009 | C4.5-Chi2 | 5 | 5,3,4,8,25 | Enhanced C4.5 | Table 16 | Table 16
[59] 2009 | RAIS | - | Not mentioned | SVM | DR, FAR, Accuracy | 99.17%, 3.49%, 98.60%
[60] 2009 | ESVDF/FSR, ESVDF/BER | 6, 9 | Not mentioned | NN, SVM | Table 17 | Table 17
[61] 2011 | GA/KNN Hybrid | 19, 28 | Not mentioned | GA/KNN | Accuracy Rate | 97.42%, 78.00%
CONCLUSIONS & FUTURE RESEARCH DIRECTIONS
Intrusion Detection Systems (IDS) have become a vital and necessary component of almost every computer and network security setup. As network speeds become faster, there is an urgent need for IDSs to be lightweight, efficient and accurate, with high detection rates (DR) and low false alarm rates (FAR). Other difficulties faced by intrusion detection systems are the curse of feature dimensionality and emerging data complexities, which is why feature selection has become a very important part of intrusion detection systems. Feature selection selects a subset of relevant features and removes irrelevant and redundant features from the dataset in order to build a robust, efficient, accurate and lightweight intrusion detection system and to ensure timeliness for real-time operation. Plenty of feature selection methods have been proposed by researchers in intrusion detection to deal with these problems. This paper has surveyed this fast-developing field and addressed the main contributions of feature selection research proposed for intrusion detection. We showed why feature selection is vital in IDS. We surveyed the existing feature selection methods for IDS, categorized as filter, wrapper and hybrid. We also presented the performance of these methods based on different metrics on the KDD Cup'99 dataset, and listed the extracted feature sets, the classifiers used to evaluate them, and the strengths, limitations and future work of the proposed methods in Sections 5 and 6. The following are useful future research issues:
FUTURE RESEARCH
A single classifier for evaluating an extracted feature set may no longer be a good solution for building a robust IDS. Designing more sophisticated classifiers by combining multiple classifiers, or by combining ensemble [7] and hybrid classifiers, may therefore enhance the robustness and performance of IDS. After comparing the existing feature selection methods in intrusion detection, we found that finding an optimal and best feature set still needs further research. Feature selection algorithms always need improvement in their search strategies and evaluation criteria for building efficient and lightweight intrusion detection systems. The robustness of the extracted features can be enhanced by using an ensemble of feature selection methods combined with appropriate evaluation criteria. After surveying these many feature selection methods, we cannot say which method performs best under which classifier for intrusion detection (to the best of our knowledge). Most of the proposed methods work on two-class classification (normal and attack) (to the best of our knowledge); very little work has been done on multi-class classification (five classes: four attack classes and one normal class) [62][63]. Therefore, the research in many papers can be extended in the future to multi-class classification. The classes in KDD Cup'99 are unbalanced in both the training and test sets: the Normal and DoS classes have enough instances, whereas Probe and R2L, and particularly U2R, have few instances. These classes (Probe, R2L, U2R) do not have good classification rates due to the small number of instances in the training set [56][31][39]. Developing methods, combined with appropriate evaluation criteria, that alleviate the small number of instances in the dataset is therefore a direction for future research. We can conclude that there are features that are really significant in classifying normal traffic and the attack types, as reported in the literature. Also, there is no specific generic classifier that can best classify all the attack types, as seen in this survey; different researchers use different classifiers to evaluate their feature sets. This paper has systematically summarized the contributions of each researcher and also highlighted a number of significant research problems in this field. We hope that this survey will provide useful insights, a broad overview and new research directions to readers in this field.
REFERENCES
[1] Mitra, P. et al. (2002). Unsupervised Feature Selection Using Feature Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 301-312.
[2] Anderson, J. P. (1980). Computer security threat monitoring and surveillance. Technical Report 98-17, James P. Anderson Co., Fort Washington, Pennsylvania, USA.
[3] Denning, D. E. (1987). An intrusion detection model. IEEE Transactions on Software Engineering, 13(2), 222-232.
[4] Wu, S. X. & Banzhaf, W. (2010). The use of computational intelligence in intrusion detection systems: A review. Applied Soft Computing Journal, 10, 1-35.
[5] Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A. & Srivastava, J. (2003). A comparative study of anomaly detection schemes in network intrusion detection. In Proc. of the SIAM Conference on Data Mining.
[6] Kumar, S. & Spafford, E. H. (1994). A pattern matching model for misuse intrusion detection. In Proceedings of the 17th National Computer Security Conference, 11-21.
[7] Chebrolu, S. et al. (2005). Feature deduction and ensemble design of intrusion detection systems. Computers & Security, 24(4), 295-307.
[8] Yeung, D. Y. & Ding, Y. (2003). Host-based intrusion detection using dynamic and static behavioral models. Pattern Recognition, 36, 229-243.
[9] KDD Cup 1999 Intrusion detection dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[10] Mukkamala, S. et al. (2005). Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications, 28(2), 167-182.
[11] Lewis, P. M. (1962). The characteristic selection problem in recognition systems. IRE Transactions on Information Theory, 8, 171-178.
[12] John, G. H. et al. (1994). Irrelevant Features and the Subset Selection Problem. Proc. of the 11th Int. Conf. on Machine Learning, Morgan Kaufmann Publishers, 121-129.
[13] Dash, M. & Liu, H. (1997). Feature Selection for Classification. Intelligent Data Analysis, 1(3), 131-156.
[14] Blum, A. L. & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2), 245-271.
[15] Dash, M. et al. (2002). Feature Selection for Clustering - a Filter Solution. Proc. 2nd Int'l Conf. Data Mining, 115-122.
[16] Włodzisław, W., Tomasz, et al. (2003). Feature Selection and Ranking Filters.
[17] Das, S. (2001). Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection. Proc. 18th Int'l Conf. Machine Learning, 74-81.
[18] Liu, H. & Yu, L. (2005). Towards integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491-502.
[19] Kohavi, R. & John, G. H. (1997). Wrappers for Feature Subset Selection. Artificial Intelligence, 97(1-2), 273-324.
[20] Xing, E. et al. (2001). Feature Selection for High-Dimensional Genomic Microarray Data. Proc. 15th Int'l Conf. Machine Learning, 601-608.
[21] Zhang, L. et al. (2004). Feature Selection for Pattern Classification Problems. Proceedings of the Fourth International Conference on Computer and Information Technology (CIT'04).
[22] Kuchimanchi, G. K. et al. (2004). Dimension Reduction Using Feature Extraction Methods for Real-time Misuse Detection Systems. Proceedings of the 2004 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 195-202.
[23] Zhu, Y. et al. (2005). Modified Genetic Algorithm based Feature Subset Selection in Intrusion Detection System. Proceedings of ISCIT 2005, 9-12.
[24] Zainal, A. et al. (2006). Feature selection using rough set in intrusion detection. In Proc. IEEE TENCON, 1-4.
[25] Wong, W.-T. & Lai, C.-Y. (2006). Identifying Important Features for Intrusion Detection Using Discriminant Analysis and Support Vector Machine. Proceedings of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, 3563-3567.
[26] Yang, L. et al. (2006). A Lightweight Intrusion Detection Model Based on Feature Selection and Maximum Entropy Model. International Conference on Communication Technology (ICCT '06), 1-4.
[27] Tamilarasan, A. et al. (2006). Feature Ranking and Selection for Intrusion Detection Using Artificial Neural Networks and Statistical Methods. Int'l Joint Conf. on Neural Networks (IJCNN'06), 4754-4761.
[28] Fadaeieslam, M. J. et al. (2007). Comparison of two feature selection methods in Intrusion Detection Systems. Seventh International Conference on Computer and Information Technology, 83-86.
[29] Sheen, S. & Rajesh, R. (2008). Network Intrusion Detection using Feature Selection and Decision tree classifier. IEEE Region 10 Conference, TENCON 2008, 1-4.
[30] Kiziloren, T. & Germen, E. (2009). Anomaly Detection with Self-Organizing Maps and Effects of Principal Component Analysis on Feature Vectors. Fifth Int'l Conf. on Natural Computation, 509-513.
[31] Suebsing, A. & Hiransakolwong, N. (2009). Feature Selection Using Euclidean Distance and Cosine Similarity for Intrusion Detection Model. Asian Conf. on Intelligent Info. and Database Systems, 86-91.
[32] Lee, S. M. et al. (2009). Quantitative Intrusion Intensity Assessment using Important Feature Selection and Proximity Metrics. 15th IEEE Pacific Rim Int'l Symposium on Dependable Computing, 127-134.
[33] Lee, T.-H. & He, J.-D. (2009). Entropy-Based Profiling of Network Traffic for Detection of Security Attack. TENCON, 1-5.
[34] Xiao, L. et al. (2009). A Two-step Feature Selection Algorithm Adapting to Intrusion Detection. International Joint Conference on Artificial Intelligence, 618-622.
[35] Khor, K.-C. et al. (2009). From Feature Selection to Building of Bayesian Classifiers: A Network Intrusion Detection Perspective. American Journal of Applied Sciences, 6(11), 1948-1959.
[36] Bahrololum, M. et al. (2009). Machine Learning Techniques for Feature Reduction in Intrusion Detection Systems: A Comparison. Fourth International Conference on Computer Sciences and Convergence Information Technology (ICCIT), 1091-1095.
[37] Nguyen, H. et al. (2010). Improving Effectiveness of Intrusion Detection by Correlation Feature Selection. 2010 International Conference on Availability, Reliability and Security, 17-24.
[38] Chen, T. et al. (2010). A Naive Feature Selection Method and Its Application in Network Intrusion Detection. 2010 International Conference on Computational Intelligence and Security (CIS), 416-420.
[39] Fan, W. et al. (2011). Unsupervised Anomaly Intrusion Detection via Localized Bayesian Feature Selection. 2011 11th IEEE International Conference on Data Mining, 1032-1937.
[40] Xian, J. et al. (2011). An Algorithm Application in Intrusion Forensics Based on Improved Information Gain. Web Society (SWS), 3rd Symposium on, 100-104.
[41] Middlemiss, M. J. & Dick, G. (2003). Weighted Feature Extraction using a Genetic Algorithm for Intrusion Detection. IEEE, 1669-1675.
[42] Mukkamala, S. & Sung, A. H. (2003). Feature Selection for Intrusion Detection Using Neural Networks and Support Vector Machines. Journal of the Transportation Research Board of the National Academies, Transportation Research Record No. 1822, 33-39.
[43] Gao, H.-H. et al. (2005). Ant Colony Optimization based network intrusion feature selection and detection. Proc. of the Fourth Int'l Conf. on Machine Learning and Cybernetics, Guangzhou, 3871-3875.
[44] Banković, Z. et al. (2007). Increasing Detection Rate of User-to-Root Attacks Using Genetic Algorithms. Int'l Conf. on Emerging Security Information, Systems and Technologies, 48-53.
[45] Chen, Y. et al. (2007). Toward Building Lightweight Intrusion Detection System Through Modified RMHC and SVM. ICON, 83-88.
[46] Tsang, C.-H. et al. (2007). Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection. Pattern Recognition, 40, 2373-2391.
[47] Wang, W. & Gombault, S. (2008). Efficient Detection of DDoS Attacks with Important Attributes. Third International Conference on Risks and Security of Internet and Systems (CRiSIS 2008), 61-67.
[48] Li, Y. et al. (2009). Building lightweight intrusion detection system using wrapper-based feature selection mechanisms. Computers & Security, 28(6), 466-475.
[49] Zulaiha, A. O. et al. (2010). Improving Signature Detection Classification Model Using Features Selection based on Customized Features. 10th Int'l Conf. on Intelligent Systems Design and Applications, 1026-1031.
[50] Gong, S. (2011). Feature Selection Method for Network Intrusion Based on GQPSO Attribute Reduction. International Conference on Multimedia Technology (ICMT), 6365-6368.
[51] Li, Y. et al. (2012). An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Systems with Applications, 39, 424-430.
[52] Sindhu, S. S. et al. (2012). Decision tree based light weight intrusion detection using a wrapper approach. Expert Systems with Applications, 39, 129-141.
[53] Ng, W. W. Y. et al. (2003). Dimensionality Reduction for Denial of Service Detection Problems using RBFNN Output Sensitivity. Proc. of 2nd Int'l Conf. on Machine Learning and Cybernetics, 1293-1298.
[54] Shazzad, K. M. & Park, J. S. (2005). Optimization of Intrusion Detection through Fast Hybrid Feature Selection. Proc. of 6th Int'l Conf. on Parallel and Distributed Computing, Applications and Technologies.
[55] Chen, Y. et al. (2007). Building Lightweight Intrusion Detection System Based on Principal Component Analysis and C4.5 Algorithm. ICACT 2007, 2109-2112.
[56] Lee, S. M. et al. (2007). A Hybrid Approach for Real-Time Network Intrusion Detection Systems. International Conference on Computational Intelligence and Security, 712-715.
[57] Wang, W. et al. (2008). Towards fast detecting intrusions: using key attributes of network traffic. The Third International Conference on Internet Monitoring and Protection, 86-91.
[58] Hong, D. & Haibo, L. (2009). A Lightweight Network Intrusion Detection Model Based on Feature Selection. 15th IEEE Pacific Rim International Symposium on Dependable Computing, 165-168.
[59] Xiang, C. et al. (2009). Robust Observation Selection for Intrusion Detection. Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 269-272.
[60] Zaman, S. & Karray, F. (2009). Features Selection for Intrusion Detection Systems Based on Support Vector Machines. 6th IEEE Consumer Communications and Networking Conference (CCNC), 1-8.
[61] Su, M.-Y. (2011). Real-time anomaly detection systems for Denial-of-Service attacks by weighted k-nearest-neighbor classifiers. Expert Systems with Applications, 38, 3492-3498.
[62] Bruzzone, L. & Serpico, S. B. (2000). A technique for feature selection in multiclass problems. International Journal of Remote Sensing, 21(3), 549-563.
[63] Chiblovskii, B. & Lecerf, L. (2008). Scalable feature selection for multiclass problems. In Proc. of the European Conf. on Machine Learning and Knowledge Discovery in Databases (ECML PKDD'08), 227.