International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR), ISSN 2249-6831, Vol. 2, Issue 3, Sep 2012, 1-25. © TJPRC Pvt. Ltd.

A STUDY OF FEATURE SELECTION METHODS IN INTRUSION DETECTION SYSTEM: A SURVEY

AMRITA & P. AHMED
Department of CSE, Sharda University, Greater Noida, India

ABSTRACT

Nowadays, the detection of security threats, commonly referred to as intrusions, has become a very important and critical issue in network, data and information security. An intrusion detection system (IDS) has therefore become an essential component of computer and network security, and prevention of such intrusions depends entirely on the detection capability of the IDS. As network speeds become faster, there is an urgent need for IDSs to be lightweight while maintaining high detection rates. Many feature selection approaches have consequently been proposed in the literature. There are three broad categories of approaches for selecting a good feature subset: the filter, wrapper and hybrid approaches. The aim of this paper is to present a survey of various feature selection methods for IDS on the KDD Cup'99 benchmark dataset, organized by these three categories and by different evaluation criteria.

KEYWORDS : Feature selection, intrusion detection systems, filter method, wrapper method, hybrid method.

INTRODUCTION

In the last three decades, computer networks have grown drastically in size and complexity. This tremendous growth has posed challenging issues in network and information security, and the detection of security threats, commonly referred to as intrusions, has become a very important and critical issue in network, data and information security. Security attacks can cause severe disruption to data and networks. An Intrusion Detection System (IDS) has therefore become an important part of every computer or network system. An IDS monitors computer or network traffic, identifies malicious activities that compromise the integrity, confidentiality and availability of information resources, and alerts the system or network administrator about malicious attacks. An IDS needs to examine very large, high-dimensional data even for a small network; because of this, it has to meet the challenges of low detection rates and heavy computation. Feature selection is therefore a very important issue and plays a key role in intrusion detection for achieving maximal performance. It is one of the most important and frequently used data preprocessing techniques for selecting a subset of relevant features to build a robust IDS. Feature selection is the selection of a minimal-cardinality feature subset of the original feature set that retains the high detection accuracy of the original feature set [1]. An efficient feature subset can improve the training and testing time, which helps to build a lightweight IDS that guarantees high detection rates and makes the IDS suitable for real-time and on-line detection of attacks.

This survey paper categorizes the feature selection algorithms that have been developed for building IDSs, critically evaluates their usefulness, and recommends ways of enhancing the quality of feature selection algorithms. The paper is organized into the following sections. Intrusion detection systems are reviewed in Section 2. Section 3 gives the details of the datasets and performance evaluation measures used in this survey. In Section 4, different methodologies of feature selection in IDSs are discussed. Related research in the literature on feature selection methods, together with their performance, is addressed in Section 5. Section 6 summarizes the different results reported in the literature in tabular form. Section 7 concludes and discusses future research.

INTRUSION DETECTION SYSTEM

An intrusion is defined as an attempt to compromise the confidentiality, integrity or availability of a computer system or network, to make unauthorized use of its resources, or to bypass its security mechanisms. James P. Anderson introduced Intrusion Detection (ID) in the early 1980s [2], and Dorothy Denning proposed several models for IDS in 1987 [3]. Ideally, intrusion detection should be an intelligent process of monitoring the events occurring in a system and analyzing them for violations of security policies. An IDS is required to have a high attack Detection Rate (DR) with a low False Alarm Rate (FAR). Refer to [4] for the organization of a generalized IDS. Based on the detection approach, IDSs are classified as anomaly based or misuse based. In the anomaly-based intrusion detection approach [5], the system first learns the normal behavior or activity of the system or network in order to detect intrusions. In the misuse or signature-based intrusion detection approach [6], the system first defines the attack and the characteristics that distinguish it from normal data or traffic in order to detect the intrusion. Based on the location of monitoring, IDSs are classified as Network-based Intrusion Detection Systems (NIDS) [7] and Host-based Intrusion Detection Systems (HIDS) [8]. A NIDS detects intrusions by monitoring network traffic in terms of IP packets. A HIDS is installed locally on a host machine and detects intrusions by examining system calls, application logs, file system modifications and other host activities made by each user on that particular machine.

DATASETS AND PERFORMANCE EVALUATION

This section summarizes the popular benchmark dataset and the performance evaluation measures used in the intrusion detection domain to evaluate the different feature selection methods for intrusion detection systems.

DATASETS

The KDD CUP 1999 [9] benchmark dataset is used to evaluate the different feature selection methods for IDS. It consists of 4,940,000 connection records in the training set and 311,029 connection records in the test set. The training set contains 24 attack types and the test set contains 38 attack types. Since the full training and test sets are prohibitively large, the 10% subset of the KDD Cup'99 dataset is frequently used [9]. Each connection record is labeled either as normal or with exactly one specific attack type, where each attack type falls into one of four attack categories [10]: Denial of Service Attack (DoS), User to Root Attack (U2R), Remote to Local Attack (R2L) and Probing Attack. Each connection record consists of 41 features, numbered 1 to 41, which fall into the four categories shown in Table 1:

Category 1 (1-9): Basic features of individual TCP connections
Category 2 (10-22): Content features within a connection suggested by domain knowledge
Category 3 (23-31): Traffic features computed using a two-second time window
Category 4 (32-41): Traffic features from destination to host computed using a two-second time window

Table 1: List of features in the KDD Cup'99 dataset

#   Feature Name         #   Feature Name          #   Feature Name
1   Duration             15  Su-attempted          29  Same-srv-rate
2   Protocol-type        16  Num-root              30  Diff-srv-rate
3   Service              17  Num-file-creations    31  Srv-diff-host-rate
4   Flag                 18  Num-shells            32  Dst-host-count
5   Src-bytes            19  Num-access-files      33  Dst-host-srv-count
6   Dst-bytes            20  Num-outbound-cmds     34  Dst-host-same-srv-rate
7   Land                 21  Is-host-login         35  Dst-host-diff-srv-rate
8   Wrong-fragment       22  Is-guest-login        36  Dst-host-same-src-port-rate
9   Urgent               23  Count                 37  Dst-host-srv-diff-host-rate
10  Hot                  24  Srv-count             38  Dst-host-serror-rate
11  Num-failed-logins    25  Serror-rate           39  Dst-host-srv-serror-rate
12  Logged-in            26  Srv-serror-rate       40  Dst-host-rerror-rate
13  Num-compromised      27  Rerror-rate           41  Dst-host-srv-rerror-rate
14  Root-shell           28  Srv-rerror-rate
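For readers who wish to reproduce experiments on this dataset, the following is a minimal sketch of loading the 10% KDD Cup'99 subset with pandas. The local file name "kddcup.data_10_percent" and the binary normal/attack relabeling are our assumptions for illustration, not part of the survey.

```python
# Minimal sketch: loading the 10% KDD Cup'99 subset with the 41 features of Table 1.
import pandas as pd

KDD_FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]

def load_kdd(path="kddcup.data_10_percent"):
    # Each record carries the 41 features of Table 1 plus a trailing class label.
    df = pd.read_csv(path, header=None, names=KDD_FEATURES + ["label"])
    # Collapse specific attack names into normal vs. attack, as most of the
    # surveyed papers do for two-class experiments.
    df["is_attack"] = (df["label"].str.rstrip(".") != "normal").astype(int)
    return df
```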

Performance Evaluation

The effectiveness of an IDS is evaluated by its ability to make correct predictions. Comparing the real nature of a given event with the prediction made by the IDS yields the four possible outcomes shown in Table 2, known as the confusion matrix [4]. True Positive Rate (TPR) or Detection Rate (DR), True Negative Rate (TNR), False Positive Rate (FPR) or False Alarm Rate (FAR) and False Negative Rate (FNR) are the measures derived from this confusion matrix that are used to quantify the performance of IDSs [4].

Table 2: Confusion Matrix

                                  Predicted: Negative Class (Normal)   Predicted: Positive Class (Attack)
Actual: Negative Class (Normal)   True Negative (TN)                   False Positive (FP)
Actual: Positive Class (Attack)   False Negative (FN)                  True Positive (TP)
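As a quick reference, the sketch below computes these rates directly from the confusion-matrix counts of Table 2; the function and variable names are ours, chosen for illustration.

```python
# Minimal sketch: IDS performance measures from confusion-matrix counts.
# tp, tn, fp, fn follow Table 2 (attack = positive class, normal = negative class).
def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    tpr = tp / (tp + fn)                   # True Positive Rate / Detection Rate (DR)
    tnr = tn / (tn + fp)                   # True Negative Rate
    fpr = fp / (fp + tn)                   # False Positive Rate / False Alarm Rate (FAR)
    fnr = fn / (fn + tp)                   # False Negative Rate
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    return {"DR": tpr, "TNR": tnr, "FAR": fpr, "FNR": fnr, "accuracy": acc}

# Example with made-up counts: 9,007 of 10,000 attacks detected, 42 false alarms
# raised on 10,000 normal records.
print(ids_metrics(tp=9007, tn=9958, fp=42, fn=993))
```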

FEATURE SELECTION

Real-time intrusion detection is nearly impossible due to the huge amount of data flowing on the Internet. Feature selection can reduce the computation and the model complexity. Research on feature selection started in the early 1960s [11]. Feature selection is the technique of selecting a subset of relevant features by removing the most irrelevant and redundant features [12] from the data in order to build robust learning models [13].

Process of Feature Selection

A typical feature selection method involves four basic steps [13], shown in Figure 1: a generation procedure to generate the next candidate subset; an evaluation function to evaluate the subset under examination; a stopping criterion to decide when to stop; and a validation procedure to check whether the selected subset is valid. Figure 1 illustrates this process of determining and validating a best feature subset.

Figure 1: Feature selection process with validation [13].
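The four steps map naturally onto a generic search loop. The sketch below is our own illustration of that loop, not an algorithm from any of the surveyed papers; the candidate generator and the evaluation function are left as pluggable callables.

```python
# Generic feature-selection loop following the four steps of Figure 1:
# generation, evaluation, stopping criterion and (subsequent) validation.
from typing import Callable, FrozenSet, Iterable, Optional

def select_features(
    all_features: Iterable[int],
    generate: Callable[[Optional[FrozenSet[int]]], Optional[FrozenSet[int]]],
    evaluate: Callable[[FrozenSet[int]], float],
    max_rounds: int = 100,
) -> FrozenSet[int]:
    full_set = frozenset(all_features)
    best_subset, best_score = full_set, evaluate(full_set)
    current: Optional[FrozenSet[int]] = None
    for _ in range(max_rounds):          # stopping criterion: a round budget
        current = generate(current)      # generation procedure
        if current is None:              # generator exhausted: second stopping criterion
            break
        score = evaluate(current)        # evaluation function
        if score > best_score:
            best_subset, best_score = current, score
    return best_subset                   # validate afterwards on held-out data
```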


METHODS FOR FEATURE SELECTION

Blum and Langley [14] divide feature selection methods into three categories: filter, wrapper and hybrid (embedded) methods. All three are currently used in intrusion detection. The filter method [15][16] selects feature subsets based on the general characteristics of the data and is independent of any classification algorithm; a filter algorithm [18] does not rely on a learning algorithm to evaluate the performance of the selected features. The wrapper method [19] "wraps around" the learning algorithm: it uses one predetermined classifier to evaluate features or feature subsets. A wrapper algorithm [18] uses a search procedure to explore the space of possible feature subsets and evaluates each candidate by running a model on it; many feature subsets are evaluated on their classification performance and the best one is selected. This method is more computationally expensive than the filter method [17][19]. The hybrid method [17][20] combines the wrapper and filter approaches to achieve the best possible performance with a particular learning algorithm. For feature selection with large dimensionality, hybrid algorithms [18] need more efficient search strategies and evaluation criteria to achieve a time complexity similar to that of filter algorithms. These methods are discussed in detail in Section 5 and summarized in Section 6.
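To make the filter/wrapper distinction concrete, the sketch below applies both styles with scikit-learn to an arbitrary feature matrix; the chi-squared filter score and the linear-SVM-driven recursive elimination are our illustrative choices, not methods prescribed by the survey.

```python
# Filter vs. wrapper feature selection, sketched with scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=41, n_informative=8, random_state=0)
X = np.abs(X)  # chi2 requires non-negative feature values

# Filter: rank features by a statistic of the data alone (no classifier involved).
filt = SelectKBest(score_func=chi2, k=10).fit(X, y)
print("filter-selected features:", np.flatnonzero(filt.get_support()) + 1)

# Wrapper: let a classifier drive the search by recursively eliminating features.
wrap = RFE(estimator=LinearSVC(dual=False), n_features_to_select=10).fit(X, y)
print("wrapper-selected features:", np.flatnonzero(wrap.get_support()) + 1)
```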

RELATED WORKS

In this section, we discuss in detail the different feature selection methods used in intrusion detection, grouped into filter, wrapper and hybrid methods, together with the number of features selected, the feature numbers (according to Table 1), the performance on the KDD Cup'99 dataset, and the strengths, limitations and future work reported in the literature.

Filter Method

A feature selection algorithm, FSMDB, based on the DB index criterion is proposed in [21] (Zhang et al., 2004). The criterion function is constructed according to the characteristics of the DB index. Twenty-four features {feature no.: 6, 5, 1, 34, 33, 36, 32, 8, 27, 29, 28, 30, 26, 38, 39, 35, 13, 24, 23, 11, 3, 10, 12 and 4} are selected and tested using two classifiers, a BP network and SVM. The classification results reported for the FSMDB algorithm with the BP network and SVM classifiers are 0.1017 and 0.056 respectively. The method can be used for supervised or unsupervised classification problems, but has high computational complexity in the unsupervised learning mode. Future Work: To find a better approach to reduce the high computational complexity in the unsupervised learning mode.

Two neural network methods, (1) neural network principal component analysis (NNPCA) and (2) nonlinear component analysis (NLCA), are presented in [22] (Kuchimanchi et al., 2004). The numbers of significant features extracted by PCA, NNPCA and NLCA are 19, 19 and 12 respectively. The first 19 features selected on the basis of the Scree test and the critical eigenvalues test are {feature no.: 5, 6, 1, 22, 21, 31, 30, 3, 4, 2, 16, 10, 13, 34, 32, 27, 24, 37, 23 and 36}. The performance of the Non-linear Classifier (NC) and the CART decision tree classifier (DC) is tested on four datasets (Table 3). DC has relatively high detection accuracies and low false positive rates. Future Work: This work can be extended with quantitative measures to find optimal combinations of classifiers and feature extractors for IDS.

Table 3: False Positive Rates (FPR) and Detection Accuracies (DA) for NC and DC on the Four Datasets

Dataset      #Features   FPR (NC)   FPR (DC)   DA (NC)    DA (DC)
ORIGDATA     41          8.2821     0.2268     99.0198    99.9428
PCADATA      19          29.4105    0.2609     99.1161    99.9167
NNPCADATA    19          50.5463    0.4922     98.8206    99.7516
NLDATA       12          51.2756    0.8227     97.2306    99.6359
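Relating to the PCA-based dimension reduction used in [22] (and later in [30] and [55]), the sketch below shows the generic pattern of projecting the 41 KDD features onto a smaller number of principal components before classification. The choice of 19 components mirrors the paper, but the synthetic data and the tree classifier are placeholders of ours.

```python
# Generic PCA-then-classify pipeline, echoing the dimension-reduction step of [22].
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=41, n_informative=10, random_state=0)

# Standardize, project onto 19 principal components, then fit a CART-style decision tree.
pipe = make_pipeline(StandardScaler(), PCA(n_components=19), DecisionTreeClassifier(random_state=0))
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```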

RICGA (ReliefF Immune Clonal Genetic Algorithm), a combined feature subset selection method based on the ReliefF algorithm, the Immune Clonal selection algorithm and GA, is proposed in [23] (Zhu et al., 2005). A BP network is used as the classifier. RICGA achieves higher classification accuracy (86.47%) for a small feature subset (8 features) than ReliefF-GA. The selected features are not mentioned in the paper.

The paper [24] (Zainal et al., 2006) investigated the effectiveness of Rough Set (RS) theory in identifying important features and also used it as a classifier. The 6 significant features obtained are {feature no.: 41, 32, 24, 4, 5 and 3}. The classification results obtained with Rough Set are compared with Multivariate Adaptive Regression Splines (MARS), Support Vector Decision Function (SVDF) and Linear Genetic Programming (LGP). The classification accuracy of RS ranks second for the normal category and is almost the same as MARS and SVDF for the attack categories. Future Work: This work can be extended in terms of accuracy by focusing on a fusion of classifiers once an optimum feature subset has been obtained.

Wong and Lai (2006) [25] combined Discriminant Analysis (DA) and Support Vector Machine (SVM) to detect intrusions for an anomaly-based network IDS. Nine features {feature no.: 12, 23, 32, 2, 24, 36, 31, 29 and 39} are extracted by Discriminant Analysis and evaluated by SVM. The TN, FP, FN and TP of the proposed method are 99.58%, 0.42%, 9.93% and 90.07% respectively. Future Work: Multiple Discriminant Analysis (MDA) can be applied to find the optimal feature set for each type of attack.

Li et al. (2006) [26] proposed a lightweight intrusion detection model. Information Gain and a Chi-Square approach are used to extract important features, and a classic Maximum Entropy (ME) model is used to learn and detect intrusions. The top 12 important features selected by both methods are {feature no.: 3, 5, 6, 10, 13, 23, 24, 27, 28, 37, 40 and 41}. Experimental results are shown in Table 4. Future Work: This model can be applied in a realistic environment to verify its real-time performance and effectiveness.

Table 4: Detection Results

          All 41 features               Selected features
Class     Testing Time(s)   Acc.(%)     Testing Time(s)   Acc.(%)
Normal    1.28              99.75       0.78              99.73
Probe     2.09              99.80       1.25              99.76
DoS       1.93              100         1.03              100
U2R       1.05              99.89       0.70              99.87
R2L       1.02              99.78       0.68              99.75
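As an illustration of the Information Gain style of ranking used in [26] (and again in [27], [40] and [47]), the sketch below scores features by mutual information with the class label and keeps the top 12. It is a generic scikit-learn recipe on synthetic data, not the authors' exact implementation.

```python
# Information-gain-style feature ranking via mutual information (scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=3000, n_features=41, n_informative=12, random_state=0)

scores = mutual_info_classif(X, y, random_state=0)
top12 = np.argsort(scores)[::-1][:12] + 1   # 1-based feature numbers, as in Table 1
print("top 12 features by information gain:", sorted(top12.tolist()))
```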

Tamilarasan et al. (2006) [27] applied different feature selection and ranking methods to the KDD Cup'99 dataset. Chi-Square analysis, logistic regression, normal distribution and beta distribution experiments are performed for feature selection. The 25 most significant features ranked by the Chi-square test are {feature no.: 35, 27, 41, 28, 40, 30, 34, 3, 33, 12, 37, 24, 29, 2, 13, 8, 36, 10, 26, 39, 22, 25, 5, 1 and 38}. Experiments are performed for Normal, Probe, DoS, U2R and R2L using a resilient back-propagation neural network. The overall classification accuracy is 97.04% with an FPR of 2.76% and an FNR of 0.20%.

Fadaeieslam et al. (2007) [28] proposed a feature selection method based on Decision Dependent Correlation (DDC). The mutual information between each feature and the decision is calculated, and the top 20 important features {feature no.: 3, 5, 40, 24, 2, 10, 41, 36, 8, 13, 27, 28, 22, 11, 14, 17, 18, 7, 9 and 15} are selected and evaluated with an SVM classifier. The classification result is 93.46%, outperforming Principal Component Analysis (PCA).

Shina Sheen and R. Rajesh (2008) [29] considered three different methods for feature selection: Chi square, Information Gain and ReliefF. The top 20 features {feature no.: 2, 3, 4, 5, 12, 22, 23, 24, 27, 28, 30, 31, 32, 33, 34, 35, 37, 38, 40 and 41} are selected and evaluated using a decision tree (C4.5). The classification accuracies of Chi Square, Info Gain and ReliefF are 95.8506%, 95.8506% and 95.6432% respectively.

In [30] (Kiziloren and Germen, 2009), Principal Component Analysis (PCA) is used for feature selection to increase the quality of the extracted feature vectors, and a Self-Organizing Map (SOM) is used as the classifier to detect network anomalies. The highest success rate of the system, 98.83%, is obtained when the feature vector size equals 10. The features are not mentioned in the paper. The average success rate of the system without PCA is 97.76%. PCA provides faster classification, which is important for a real-time system.

Suebsing and Hiransakolwong (2009) [31] proposed a combination of Euclidean Distance and Cosine Similarity to select robust feature subsets of smaller size. Euclidean Distance is used to select the features for detecting known attacks and Cosine Similarity to select the features for detecting unknown attacks. The known detection method extracts 30 important features {feature no.: 1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40, 41, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19 and 22}. The unknown detection method extracts 24 important features {feature no.: 1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40, 41, 3, 4, 23, 24, 29, 32, 33, 34 and 36}. Fifteen features {feature no.: 1, 2, 12, 25, 26, 27, 28, 30, 31, 35, 37, 38, 39, 40 and 41} are selected by both methods. The C5.0 method is used as the classifier. The experimental results are shown in Table 5.

Table 5: Results for Known and Unknown Attacks

                          Known attack                                  Unknown attack
Parameter                 Full Set (41)   Known detection method (30)   Full Set (41)   Unknown detection method (24)
Overall TP %              97.95           98.12                         53.31           68.28
Overall FP %              2.04            1.87                          46.69           31.72
Time to Build Model (s)   75              51                            75              45

A new approach named Quantitative Intrusion Intensity Assessment (QIIA) is proposed in [32] (Lee et al., 2009). QIIA evaluates the proximity of each instance of the audit data using proximity metrics based on Random Forests (RF), and uses the numerical feature importance of RF to select important features. Two approaches, QIIA1 and QIIA2, are proposed to determine the threshold parameter values. The top 5 important features selected are {feature no.: 23, 32, 10, 6 and 3}. Only DoS attacks are used, since the other attack types have very small numbers of instances. The experimental results show that the detection rates (DR) of QIIA1 and QIIA2 are 97.94 and 99.37 respectively.

An entropy-based traffic profiling scheme for detecting security attacks is presented in [33] (Lee and He, 2009). Only the denial-of-service (DoS) attack is considered in this paper. The top six features ranked by accuracy are {feature no.: 5, 6, 31, 32, 36 and 37}. The true positive rate (TPR) of this scheme is 91%.

[34] (Xiao et al., 2009) presented a two-step feature selection algorithm that eliminates two kinds of features: irrelevant features in the first step and redundant features in the second step. Twenty-one features {feature no.: 1, 3, 4, 5, 6, 8, 11, 12, 13, 23, 25, 26, 27, 28, 29, 30, 32, 33, 34, 36 and 39} are selected and evaluated using the C4.5 algorithm and a Support Vector Machine (SVM). The detection rate (%), false alarm rate (%) and processing time of the selected features (all features) are 86.3 (87.0), 1.89 (1.85) and 15.163 s (21.891 s) respectively.

A novel approach for selecting features and comparing the performance of various BN classifiers is proposed in [35] (Khor et al., 2009). Two feature selection algorithms, the Correlation-based Feature Selection Subset Evaluator (CFSE) and the Consistency Subset Evaluator (CSE), together with domain experts, are utilised to form the proposed feature set. This feature set contains 7 features: {feature no.: 3, 6, 12, 23, 32*, 14* and 40*}. A Bayesian Network (BN) is employed as the classifier. The classification accuracies (%) of the BN for the Normal, DoS, Probe, R2L and U2R types are 99.8, 99.9, 89.4, 91.5 and 69.2% respectively. *: Features that were selected based on domain knowledge.

Bahrololum et al. (2009) [36] used three machine learning methods, Decision Tree (DT), Flexible Neural Tree (FNT) and Particle Swarm Optimization (PSO), for feature selection. The five important features {feature no.: 10, 17, 14, 13 and 11} are selected according to the contribution of the variables to the construction of the decision tree. The experimental results are shown in Table 6.

Table 6: Detection Performance using DT, FNT and PSO Methods

Attack Class   DT       FNT      PSO
Normal         9.96%    99.19%   95.69%
DoS            100%     98.75%   90.41%
R2L            99.02%   99.09%   98.10%
U2R            88.33%   99.70%   100%
Probe          99.66%   98.39%   95.53%

An automatic feature selection method based on the filter approach is proposed by Nguyen et al. (2010) [37]. A globally optimal subset of relevant features is found by Correlation Feature Selection (CFS) and evaluated by C4.5 and BayesNet. The selected features are: 3 for Normal&DoS {5, 6 and 12}; 6 for Normal&Probe {5, 6, 12, 29, 37 and 41}; 1 for Normal&U2R {14}; and 2 for Normal&R2L {10 and 22}. The average classification accuracies of C4.5 and BayesNet are 99.41% and 98.82% respectively.

Chen et al. (2010) [38] proposed a novel inconsistency-based feature selection method. Data consistency is applied to find the optimal features, which are evaluated with the decision tree method (C4.5). The proposed method is compared with CFS (Table 7).

Table 7: Performance Comparison (CC: Classification Correctness)

             All features         Proposed Method                               CFS Method
Attack Type  CC(%)    Time(s)     Features                   CC(%)   Time(s)    Features                           CC(%)   Time(s)
Probe        99.85    0.66        4 (3,5,35,36)              99.77   0.16       4 (5,6,25,37)                      94.35   0.27
DoS          99.94    1.08        4 (3,4,10,23)              99.81   0.22       4 (2,5,16,22)                      99.32   0.33
U2R          100      0.11        2 (3,41)                   100     0.09       9 (3,10,24,29,31,32,33,34,40)      100     0.08
R2U          98.99    0.22        5 (3,5,12,32,35)           99.13   9.13       5 (3,5,10,24,33)                   98.05   0.11
All          99.5     3.72        8 (1,3,5,25,32,34,36,40)   99.45   0.48       11 (2,3,4,5,6,10,23,24,25,36,37)   99.67   6.28

A novel unsupervised statistical approach, varGDLF, a variational framework for the GD mixture model with localized feature selection (GDLF), is proposed in [39] (Fan et al., 2011) for detecting network-based attacks. Eleven features {feature no.: 1, 5, 12, 15, 18, 21, 22, 29, 33, 38 and 41} are selected. The performance of the varGDLF approach is compared with four other variational mixture models, and it outperforms them with the highest accuracy rate (85.2%), the lowest FP rate (7.3%) and the most accurately detected number of components (4.95). The accuracy rates for Normal, DoS, R2L, U2R and Probing are 99.5, 96.5, 75.4, 69.6 and 85.1% respectively; the FP rates are 11.5, 0.8, 1.4, 11.5 and 11.3% respectively.

An improved information gain (IIG) algorithm based on feature redundancy is proposed in [40] (Xian et al., 2011). Twenty-two features are selected after applying the Information Gain (IG) algorithm, and then 12 features {feature no.: 2, 3, 5, 6, 8, 10, 12, 23, 25, 36, 37 and 38} are selected after applying IIG. Naive Bayes (NB) is used to carry out the experiment on three feature sets: the original feature set (41 features), feature subset 1 (22 features) and feature subset 2 (12 features). The processing times (s) of the three feature sets are 8.34, 4.16 and 2.08; the detection rates (DR) are 96.187%, 96.407% and 96.801%; and the false positive rates (FPR) are 5.22%, 2.58% and 1.02% respectively.
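The redundancy-aware ranking idea behind IIG [40] can be sketched with a generic greedy criterion that rewards relevance to the class label and penalizes redundancy with already-chosen features (an mRMR-style rule). The code below is our illustrative approximation of that idea, not the authors' IIG algorithm.

```python
# Greedy relevance-minus-redundancy selection (mRMR-style sketch), in the spirit
# of redundancy-aware information-gain methods such as IIG [40].
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def greedy_mrmr(X, y, k=12, random_state=0):
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected, remaining = [], list(range(n_features))
    while len(selected) < k:
        best, best_score = None, -np.inf
        for f in remaining:
            # Redundancy: average mutual information with features already selected.
            redundancy = 0.0
            if selected:
                redundancy = np.mean([
                    mutual_info_regression(X[:, [s]], X[:, f], random_state=random_state)[0]
                    for s in selected
                ])
            score = relevance[f] - redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
        remaining.remove(best)
    return [f + 1 for f in selected]   # 1-based feature numbers, as in Table 1

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
print("selected features:", greedy_mrmr(X, y, k=8))
```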

WRAPPER METHOD

In [41] (Middlemiss and Dick, 2003), a simple Genetic Algorithm (GA) is used to evolve weights for the features; a k-nearest neighbour (KNN) classifier is used both as the fitness function of the GA and as the classifier. The top five ranked features for each class are selected: DoS {23, 29, 1, 11, 24}; R2U {24, 3, 12, 23, 36}; U2R {24, 6, 31, 41, 17}; Probe {2, 37, 30, 3, 6}. The results indicate an increase in intrusion detection accuracy.

Mukkamala and Sung (2003) [42] presented two methods to rank the important features: (1) the Performance-Based Ranking Method (PBRM) and (2) the Support Vector Decision Function Ranking Method (SVDFRM). The union of the important features ranked by PBRM for each of the 5 classes contains 31 features; for SVDFRM the union contains 23 features. The 8 important features identified by both ranking methods are {feature no.: 1, 3, 5, 6, 23, 24, 32 and 33}. Experiments are performed with both methods using an SVM classifier (Table 8). Future Work: Ongoing experiments include 23-class (22 attack classes plus normal) feature identification using SVMs.

Table 8: Performance of SVMs

          Ranked by PBRM (31)                                 Ranked by SVDFRM (23)
Class     Training Time(s)   Testing Time(s)   Acc.(%)        Training Time(s)   Testing Time(s)   Acc.(%)
Normal    7.67               1.02              99.51          4.85               0.82              99.55
Probe     44.38              2.07              99.67          36.23              1.40              99.71
DoS       18.64              1.41              99.22          7.77               1.32              99.20
U2R       3.23               0.98              99.87          1.72               0.75              99.87
R2L       9.81               1.01              99.78          5.91               0.88              99.78
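Most wrapper methods in this section pair a search strategy (GA, random mutation hill climbing, ant colony optimization) with a classifier that scores each candidate subset. The sketch below uses scikit-learn's greedy SequentialFeatureSelector with a linear SVM as a stand-in for those search strategies; it illustrates the wrapper pattern only and is not a reimplementation of any surveyed method.

```python
# Wrapper-style selection: a search procedure scored by a classifier's CV accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=41, n_informative=8, random_state=0)

svm = LinearSVC(dual=False)
sfs = SequentialFeatureSelector(svm, n_features_to_select=8, direction="forward", cv=3)
sfs.fit(X, y)
print("wrapper-selected features:", np.flatnonzero(sfs.get_support()) + 1)
```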

An Ant Colony Optimization (ACO) based intrusion feature selection algorithm is proposed in [43] (Gao et al., 2005). The Fisher discrimination rate is adopted as the heuristic information for the ants' traversal, and a Least Squares SVM classifier is adopted as the base classifier to evaluate the generated feature subsets. The numbers of features selected by the ACO-SVM method are 11 for Probe, 9 for DoS, and 14 for U2R & R2L; the feature names are not mentioned in the paper. Table 9 shows the experimental results.

Table 9: Performance of ACO-SVM

Type       #Features   Correct Classification Rate   False Positive Rate   Average Detection Time
Probe      11          99.40%                        0.35%                 0.074
DoS        9           95.20%                        3.24%                 0.031
U2R&R2L    14          98.70%                        1.60%                 0.078

The paper [44] (Banković et al., 2007) investigated the possibility of increasing the detection rate (DR) of U2R attacks in misuse detection. The features extracted using Principal Component Analysis (PCA) and Multi Expression Programming (MEP) are {U2R: 14, 33; DoS: 1, 5, 39; Normal: 3, 10, 12}. A genetic algorithm is employed to derive rules for detecting various types of attacks, and two additional rule sets are deployed to re-check the decision of the rule set for detecting U2R attacks. The experiments (Table 10) show that this system outperforms the best-performing model reported in the literature.

Table 10: Performance of the System

          DR                                    FPR
#Rules    Total System   U2R Rule System        Total System   U2R Rule System
50        50             46.3                   0.0055         0.007
75        77.8           77.8                   7.2            10.2
100       100            100                    16.54          27.4

Chen et al. (2007) [45] presented a wrapper-based feature selection method. A random search method named modified random mutation hill climbing (MRMHC) is introduced as the search strategy to select feature subsets, with Support Vector Machines (SVMs) as the classifier. The experimental results are shown in Table 11. Future Work: This method can be improved in terms of the search strategy and the evaluation criterion.


Table 11: Selected feature subsets, time of the selecting process for different feature selection algorithms, and average time of the building and testing processes for ALL attacks, DoS, Probe, R2L and U2R

                                               ALL            DoS          Probe         R2L       U2R
#Features                                      5              4            5             3         5
Selected features                              3,5,23,33,34   5,12,23,34   1,3,5,23,37   1,5,6     1,3,6,14,33
Time of selecting process (h): GA-SVMs         1.3            0.5          4             1.5       1.5
Time of selecting process (h): MRMHC-SVMs      0.4            0.2          2.2           0.8       0.6
Avg. time of building process (s): All         78             136          245           317       193
Avg. time of building process (s): Selected    30             31           96            24        78
Avg. time of testing process (s): All          18             22           49            55        50
Avg. time of testing process (s): Selected     6              5            17            7         15

A multi-objective genetic fuzzy intrusion detection system (MOGFIDS) is proposed by Tsang et al. (2007) [46]. MOGFIDS is used as a genetic wrapper to search for a near-optimal feature subset. The 27 features selected by MOGFIDS are {feature no.: 2 (tcp, udp, icmp), 5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 22, 23, 25, 30, 32, 33, 34, 35, 36, 37, 38, 39 and 40}. MOGFIDS has the second highest accuracy (99.24%) and the lowest FPR (1.1%) among the wrappers compared in the paper. Future Work: The approach can be applied to other complex problem domains such as face recognition and DNA computing.

The paper [47] (Wang and Gombault, 2008) proposed a system that extracts important features from raw network traffic for DDoS attacks in real computer networks. The first 9 important features by rank, {feature no.: 23, 32, 37, 33, 5, 24, 31, 39 and 3}, are selected by the Information Gain and Chi-square methods and evaluated with Bayesian Networks and decision trees (C4.5), as shown in Table 12. Future Work: A practical real-time system for fast detection of DDoS attacks can be developed.

Table 12: Detection Rate, False Positive Rate and Construction Time Results

                      C4.5 (9 feat.)   C4.5 (41 feat.)   BN (9 feat.)   BN (41 feat.)
DR (%)                99.8             99.8              99.6           99.0
FPR (%)               0.3              0.3               1.6            1.5
Training time (s)     1.7              15.3              0.7            4.4
Testing time (s)      0.2              0.9               0.2            0.9

Feature construction time: 237 s with the 9 selected features, 2043 s with all 41 features.

Li et al. (2009) [48] proposed a wrapper-based feature selection method to build a lightweight intrusion detection system. A modified Random Mutation Hill Climbing (RMHC) method is applied as the search strategy to find candidate feature subsets, and modified linear Support Vector Machines (SVMs) are used to evaluate the candidates. A classification algorithm based on a decision tree whose nodes consist of linear SVMs is used to build the IDS from the selected feature subsets. The experiments show that the resulting systems have higher ROC (Receiver Operating Characteristic) scores than using all 41 features in terms of detecting known attacks, detecting new attacks and computational cost (Table 13).

Table 13: Selected feature subsets and average time of the building and testing processes with all and selected features for ALL attacks, DoS, Probe, R2L and U2R

                                    Building time(s)                  Testing time(s)
Attack Type   Features              All features   Selected feat.     All features   Selected feat.
ALL           4 (3,5,23,32)         78             36                 18             8
DoS           4 (2,5,23,34)         136            41                 22             9
Probe         6 (1,3,5,6,23,35)     245            123                49             29
R2L           3 (1,3,5)             317            35                 55             8
U2R           5 (1,3,5,14,32)       193            85                 50             18

The paper [49] (Ali et al., 2010) improves the accuracy of a Signature Detection Classification (SDC) model by applying feature extraction based on customized features. Features are extracted from the customized features using a GA (Genetic Algorithm), a two-second time window and Hidden Markov models. Eleven features {feature no.: 5, 6, 13, 23, 24, 25, 26, 33, 36, 37 and 38} are extracted, and the best signature detection classification model is developed using JRip, Ridor, PART and a decision tree. The extracted features increased the detection rates by between 0.4% and 9% and reduced false alarm rates by between 0.17% and 0.5%.

Gong et al. (2011) [50] proposed a novel approach for feature selection based on Genetic Quantum Particle Swarm Optimization (GQPSO) for network intrusion detection. A Support Vector Machine (SVM) is used as the classification algorithm. The selected features and experimental results are shown in Table 14.

Table 14: Selected Features and Performance of SVM with the GQPSO Algorithm

Attack Type   Features                                   Training Time(ms)   Detecting Time(ms)   DR      Error Report Rate(%)
DoS           10 (2, 6, 3, 12, 21, 22, 31, 26, 28, 30)   0.0627              0.0581               99.98   0
Probe         5 (5, 12, 26, 32, 34)                      0.0431              0.0478               91.77   0.001
R2L           7 (10, 23, 25, 29, 26, 33, 35)             0.053               0.014                98.26   0
U2R           5 (2, 3, 17, 32, 36)                       0.0006              0.0016               100     0.0003


Li et al. (2012) [51] proposed an effective wrapper-based feature reduction method, called the gradually feature removal (GFR) method. The GFR method extracted 19 critical features {feature no.: 2, 4, 8, 10, 14, 15, 19, 25, 27, 29, 31, 32, 33, 34, 35, 36, 37, 38 and 40}. The SVM classifier achieves an accuracy of 98.6249% and an MCC (Matthews correlation coefficient) of 0.861161, and its training and testing times are greatly reduced.

An advanced intelligent system using ensemble soft computing techniques is proposed by Sindhu et al. (2012) [52] for a lightweight IDS to detect anomalies in networks. A GA (Genetic Algorithm) is used to extract the feature subset and a neurotree paradigm is proposed as the classifier. The 16 features extracted by this method are {feature no.: 2, 3, 4, 5, 6, 8, 10, 12, 24, 25, 29, 35, 36, 37, 38 and 40}. The detection rate is 98.4%, which is superior to the other methods compared.

HYBRID METHOD

In [53] (Ng et al., 2003), a feature importance ranking methodology based on the stochastic radial basis function neural network output sensitivity measure (RBFNN-SM) is presented. RBFNN-SM is used to evaluate the features for the normal class and six classes of denial-of-service (DoS) attack only. The experiments show that the 8 most sensitive features {feature no.: 2, 24, 23, 29, 32, 34, 33 and 36} are enough to classify normal traffic and DoS attacks, and the computational cost is reduced from 23 seconds to 9 seconds. The classification accuracies for normal and DoS attacks are 99.77% and 99.06%; the FARs with 8 (41) features are 0.18% (0.01%) and 0.27% (0.03%); and the FPRs for training and testing are 0.93% (0.70%) and 0.94% (0.71%) respectively.

Shazzad and Park (2005) [54] proposed a fast hybrid feature selection method to determine an optimal feature set. The method is a fusion of Correlation-based Feature Selection (CFS), Support Vector Machine (SVM) and a Genetic Algorithm (GA): subsets of features are generated by the GA and evaluated by CFS and SVM. The 12 selected features are {feature no.: 1, 6, 12, 14, 23, 24, 25, 31, 32, 37, 40 and 41}. On average, the optimal subset achieves a DR of 99.56% and an FPR of 37.5%.

Chebrolu, Abraham and Thomas (2005) [7] investigated the performance of two feature selection techniques, Bayesian Networks (BN) and Classification and Regression Trees (CART), and developed an ensemble of both techniques for building an IDS, which performs best in classifying R2L and DoS. Seventeen important features {feature no.: 1, 2, 3, 5, 7, 8, 11, 12, 14, 17, 22, 23, 24, 25, 26, 30 and 32} are selected by the Markov blanket model, and a classifier is constructed using BN and tested. Twelve features {feature no.: 3, 5, 6, 12, 23, 24, 25, 28, 31, 32, 33 and 35} are selected by the decision tree, and a classifier using CART is constructed and tested. The Normal class is classified 100% correctly, and the accuracies of the U2R and R2L classes increase with the 12-variable reduced dataset. It is observed that CART classifies smaller datasets accurately. In the ensemble approach, the BN and CART models are first constructed individually; the ensemble is then applied to the 12, 17 and 41-variable datasets. Using the ensemble model, Normal, Probe and DoS can be detected with 100% accuracy, and U2R and R2L with 84% and 99.47% accuracy respectively.
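A common hybrid pattern in the methods above is to let a cheap filter criterion prune the feature space before a classifier-driven wrapper refines the final subset. The sketch below chains a mutual-information filter with a recursive-elimination wrapper in one scikit-learn pipeline; it is an illustrative composition of ours, not the specific algorithm of [53], [54] or [7].

```python
# Hybrid (filter-then-wrapper) selection sketched as a scikit-learn pipeline.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=41, n_informative=8, random_state=0)

hybrid = Pipeline([
    # Filter stage: keep the 20 features with highest mutual information (cheap).
    ("filter", SelectKBest(mutual_info_classif, k=20)),
    # Wrapper stage: refine to 8 features using the classifier itself (expensive).
    ("wrapper", RFE(LinearSVC(dual=False), n_features_to_select=8)),
    ("clf", LinearSVC(dual=False)),
])
print("CV accuracy of hybrid pipeline:", cross_val_score(hybrid, X, y, cv=5).mean())
```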


In [55] (Chen et al., 2007), a new hybrid approach named C4.5-PCA-C4.5 is proposed. It uses PCA (Principal Component Analysis) and the decision tree classifier C4.5 for feature selection, with C4.5 as the classifier. The important features extracted are {feature no.: 33, 34, 4, 1, 3, 10 and 22}. The performance of C4.5-PCA-C4.5 is compared with four other systems: C4.5-ALL, C4.5-PCA, SVM-CFS and SVM-CFS-SVM. The experimental results show that C4.5-PCA-C4.5 has a lower testing time, fast training and testing processes, the highest TPR and the lowest FPR. The average building process time for C4.5-PCA-C4.5 is 6 seconds.

Lee et al. (2007) [56] use two machine learning algorithms, Random Forests (RF) for feature selection and the Minimax Probability Machine (MPM) for intrusion detection. The top 5 important features selected are {feature no.: 23, 6, 29, 3 and 5}. Only Denial of Service (DoS) attacks are used. The detection rate is 99.84% and the average simulation time is 0.1039 seconds.

Wei Wang et al. (2008) [57] used both filter and wrapper schemes for feature selection. An information gain (IG) based filter model and wrapper models based on Bayesian networks (BN) and decision trees (C4.5) are employed to select features for network intrusion detection, with BN and C4.5 as the classifiers. The experimental results and the 10 features selected for each class are shown in Table 15.

Table 15: Results comparison using 41 features and 10 features

                                                            Using 41 Features                     Using 10 Features
Attack   Features Selected (10)          Method   DR      FPR    Train(s)  Test(s)    DR    FPR    Train(s)  Test(s)
DoS      3,4,5,6,8,10,13,23,24,37        BN       98.73   0.08   4.7       2.1        100   0      0.8       0.6
DoS      3,4,5,6,8,10,13,23,24,37        C4.5     99.96   0.15   16.3      1.2        100   0.14   4.6       0.5
DDoS     3,4,5,6,8,10,13,23,24,37        BN       99.03   1.53   -         -          99    1.92   -         -
DDoS     3,4,5,6,8,10,13,23,24,37        C4.5     99.8    0.26   -         -          100   0.34   -         -
Probe    3,4,5,6,29,30,32,35,39,40       BN       92.89   6.08   3.1       2.8        83    3.06   0.5       0.4
Probe    3,4,5,6,29,30,32,35,39,40       C4.5     82.59   0.04   14.5      1.1        83    0.05   1.2       0.3
R2L      1,3,5,6,12,22,23,31,32,33       BN       92.22   0.33   2.6       1.8        89    0.32   0.5       0.4
R2L      1,3,5,6,12,22,23,31,32,33       C4.5     80.29   0.02   10.5      0.8        87    0.01   0.5       0.2
U2R      1,2,3,5,10,13,14,32,33,36       BN       75.86   0.29   2.6       1.8        66    0.12   0.4       0.4
U2R      1,2,3,5,10,13,14,32,33,36       C4.5     24.14   0      9.9       0.7        24    0      0.6       0.2

(Train/Test = training and testing time in seconds.)

Hong and Haibo (2009) [58] proposed a new hybrid selection algorithm to build a lightweight network IDS. Chi-Square and an enhanced C4.5 algorithm are used for feature selection in the preprocessing phase. The top fifteen most important features extracted by the Chi-Square algorithm are {feature no.: 5, 3, 23, 35, 4, 8, 30, 34, 36, 6, 33, 38, 24, 25 and 2}. The top five features extracted by the C4.5 and C4.5-Chi2 methods are {feature no.: 25, 4, 2, 5 and 29} and {feature no.: 5, 3, 4, 8 and 25} respectively. The experimental results are shown in Table 16.

Table 16: Detection and False Positive Rate Results based on C4.5-Chi2

Attack Type   DR      FPR
Normal        99.9    1.6
DoS           99.3    1.48
Probe         93.87   1.82
U2R           50.01   28.32
R2L           61.55   12.17

Training Time: 0.02 sec    Testing Time: 0.03 sec

In [59] (Xiang et al., 2009), a hybrid method named the Robust Artificial Intelligence Selection algorithm (RAIS) is presented. Mutual information and an artificial intelligence method are used for feature subset selection, with SVMs as the classifier. The selected features are not mentioned in the paper. The experimental results show that the RAIS algorithm has the lowest false alarm rate (3.49%), the highest accuracy (99.01%) and the highest detection rate (99.27%).

Zaman and Karray (2009) [60] proposed a novel and simple method named Enhanced Support Vector Decision Function (ESVDF) for feature selection. The method utilizes the Support Vector Machines (SVMs) approach based on the Forward Selection Ranking (FSR) and Backward Elimination Ranking (BER) algorithms: ESVDF (SVDF/FSR or SVDF/BER) applies SVDF within the FSR and BER approaches to select the most effective feature set. Two classifiers, Neural Networks (NNs) and SVMs, are used to evaluate the selected features. The experimental results are shown in Table 17; the feature names are not mentioned.

Algorithm

#Features

Accuarcy

FPR

Training Time

Testing Time

NN

ESVDF/FSR

6

99.55%

0.0032

217.57

0.047

ESVDF/BER

9

99.57%

0.003

255.047

0.053

Non

41

99.65%

0.0036

911.68

0.075

ESVDF/FSR

6

99.46%

0.0033

2.039

0.052

ESVDF/BER

9

99.58%

0.0031

2.1

0.046

Non

41

99.71%

0.0032

5.182

0.17

SVM

Ming-Yang Su (2011) [61] proposed a feature selection method for detecting DoS/DDoS attacks in real time in an anomaly-based NIDS. A genetic algorithm (GA) combined with KNN (k-nearest-neighbor) is used for feature selection and weighting: the result of the KNN classification is used as the fitness function of the genetic algorithm to evolve the weight vectors of the features. An initial set of 35 features is weighted in the training phase; the top 19 features are considered for known attacks and the top 28 features for unknown attacks. The extracted features are not mentioned in the paper. An overall accuracy rate of 97.42% is obtained for known attacks and 78% for unknown attacks.

A SYSTEMATIC REVIEW OF RELATED WORK

The afore-mentioned work on feature selection is summarized systematically by approach: filter methods in Table 18, wrapper methods in Table 19 and hybrid methods in Table 20. These tables list the literature reference, the name of the proposed method, the number of features selected, the feature numbers according to Table 1, the classifier used to evaluate the proposed method, the evaluation criteria and the results of the proposed method.

Table 18: Summary of Filter Methods

Lit. Ref. | Method Name | No. of Features | Feature No. | Classifier Used | Evaluation Criteria | Result
[21] 2004 | FSMDB | 24 | 6,5,1,34,33,36,32,8,27,29,28,30,26,38,39,35,13,24,23,11,3,10,12,4 | BP Network, SVM | Classification accuracy | BP: 0.1017, SVM: 0.056
[22] 2004 | NNPCA & NLCA | 19, 12 | 5,6,1,22,21,31,30,3,4,2,16,10,13,34,32,27,24,37,23 | NC & DC | FPR, detection accuracy | Table 3
[23] 2005 | RICGA | 12 | Not mentioned | BP Network | Classification accuracy | 88.15%
[24] 2006 | Rough Set | 6 | 41,32,24,4,5,3 | Rough Set | Classification accuracy | 99.743
[25] 2006 | Combined DA and SVM | 9 | 12,23,32,2,24,36,31,29,39 | SVM | TN(%), FP(%), FN(%), TP(%) | 99.58, 0.42, 9.93, 90.07
[26] 2006 | Information Gain and Chi-Square approach | 12 | 3,5,6,10,13,23,24,27,28,37,40,41 | ME | Accuracy, testing time | Table 4
[27] 2006 | Artificial Neural Networks and Statistical Methods | 25 | 35,27,41,28,40,30,34,3,33,12,37,24,29,2,13,8,36,10,26,39,22,25,5,1,38 | RBP Neural Network | Accuracy, FPR, FNR | 97.04%, 2.76%, 0.20%
[28] 2007 | Decision Dependent Correlation (DDC) | 20 | 3,5,40,24,2,10,41,36,8,13,27,28,22,11,14,17,18,7,9,15 | SVM | Classification accuracy | 93.46%
[29] 2008 | Chi Square, Info Gain and ReliefF | 20 | 2,3,4,5,12,22,23,24,27,28,30,31,32,33,34,35,37,38,40,41 | Decision Tree (C4.5) | Classification accuracy | 95.8506%, 95.8506%, 95.6432%
[30] 2009 | PCA-SOM | 10 | Not mentioned | SOM | Avg. success rate | 98.83%
[31] 2009 | Euclidean Distance & Cosine Similarity | 15 (common) | 1,2,12,25,26,27,28,30,31,35,37,38,39,40,41 | C5.0 | Table 5 | Table 5
[32] 2009 | (1) QIIA1 (max value), (2) QIIA2 (center data) | 5 | 23,32,10,6,3 | RF proximity-based | DR | (1) 97.94, (2) 99.37
[33] 2009 | Entropy-Based Scheme with Chi-Square | 6 | 5,6,31,32,36,37 | Chi-Square test | TPR | 91%
[34] 2009 | Mutual Information based two-step algorithm | 21 | 1,3,4,5,6,8,11,12,13,23,25,26,27,28,29,30,32,33,34,36,39 | C4.5 & SVM | DR, FAR, processing time | 86.3, 1.89, 15.163 s
[35] 2009 | Proposed feature set using CFSE and CSE | 7 | 3,6,12,23,32*,14*,40* | BN | Classification accuracy (%) | Normal 99.8, DoS 99.9, Probe 89.4, R2L 91.5, U2R 69.2
[36] 2009 | Based on DT, FNT and PSO | 5 | 10,17,14,13,11 | DT, FNT and PSO | Detection accuracy | Table 6
[37] 2010 | M01LP from CFS | 3, 6, 1, 2 | Normal&DoS: 5,6,12; Normal&Probe: 5,6,12,29,37,41; Normal&U2R: 14; Normal&R2L: 10,22 | C4.5, BayesNet | Classification accuracy | 99.41%, 98.82%
[38] 2010 | Inconsistency-based feature selection method | Table 7 | Table 7 | C4.5 | Classification correctness, time (s) | Table 7
[39] 2011 | varGDLF | 11 | 1,5,12,15,18,21,22,29,33,38,41 | varGDLF | Accuracy rate, FPR, no. of components | 85.2%, 7.3%, 4.95
[40] 2011 | IIG (Improved Information Gain) | 12 | 2,3,5,6,8,10,12,23,25,36,37,38 | NB | DR, FPR, processing time | 96.801, 1.02, 2.08 s

*: Features that were selected based on domain knowledge.

Table 19: Summary of Wrapper Methods

Lit. Ref. | Method Name | No. of Features | Feature No. | Classifier Used | Evaluation Criteria | Result
[41] 2003 | GA combined with a k-nearest neighbour classifier | 5 for each class | DoS: 23,29,1,11,24; R2U: 24,3,12,23,36; U2R: 24,6,31,41,17; Probe: 2,37,30,3,6 | KNN | Detection accuracy | Increase in ID accuracy
[42] 2003 | PBRM and SVDFRM | 8 (common) | 1,3,5,6,23,24,32,33 | SVM | Table 8 | Table 8
[43] 2005 | ACO-SVM | Table 9 | Not mentioned | SVM | Table 9 | Table 9
[44] 2007 | PCA & MEP | 8 | 14,33,1,5,39,3,10,12 | GA | DR, FPR | Table 10
[45] 2007 | MRMHC-SVMs | Table 11 | Table 11 | SVM | Table 11 | Table 11
[46] 2007 | MOGFIDS | 27 | 2 (tcp, udp, icmp), 5,6,7,8,9,11,12,13,14,17,18,22,23,25,30,32,33,34,35,36,37,38,39,40 | MOGFIDS | Accuracy, FPR | 99.24%, 1.1%
[47] 2008 | Information Gain and Chi-square | 9 | 23,32,37,33,5,24,31,39,3 | C4.5 & BN | Table 12 | Table 12
[48] 2009 | Modified RMHC and modified linear SVM | Table 13 | Table 13 | Decision tree | Table 13 | Table 13
[49] 2010 | Feature selection based on customized features | 11 | 5,6,13,23,24,25,26,33,36,37,38 | JRip, Ridor, PART & decision tree | DR, FAR | Increased, decreased
[50] 2011 | GQPSO | Table 14 | Table 14 | SVM | Table 14 | Table 14
[51] 2012 | GFR (Gradually Feature Removal) | 19 | 2,4,8,10,14,15,19,25,27,29,31,32,33,34,35,36,37,38,40 | SVM | Training time (s), testing time (s), accuracy (%), MCC | 0.118356, 4.63227, 98.6249, 0.861161
[52] 2012 | A combined GA and neurotree method | 16 | 2,3,4,5,6,8,10,12,24,25,29,35,36,37,38,40 | Neurotree | DR | 98.38


Table 20: Summary of Hybrid Methods

Lit. Ref. | Method Name | No. of Features | Feature No. | Classifier Used | Evaluation Criteria | Result
[53] 2003 | RBFNN-SM | 8 | 2,24,23,29,32,34,33,36 | RBFNN | Classification accuracy, FAR, FPR | 99.415%, 0.065%, 0.935%
[54] 2005 | A fusion of CFS, SVM & GA | 12 | 1,6,12,14,23,24,25,31,32,37,40,41 | SVM | DR, FPR | 99.56%, 37.5%
[7] 2005 | Markov blanket model and decision tree for feature selection | 17 (BN), 12 (CART) | BN: 1,2,3,5,7,8,11,12,14,17,22,23,24,25,26,30,32; CART: 3,5,6,12,23,24,25,28,31,32,33,35 | Ensemble of BN and CART | Accuracy (%) | 100% Normal, DoS, Probe; 84% U2R; 99.47% R2L
[55] 2007 | C4.5-PCA-C4.5 | 7 | 33,34,4,1,3,10,22 | C4.5 | Testing time, TPR, FPR | 6 sec, -, -
[56] 2007 | RF | 5 | 23,6,29,3,5 | MPM | DR, avg. simulation time | 99.84%, 0.1039 s
[57] 2008 | Information gain & BN and C4.5 | 10 | Table 15 | BN & C4.5 | DR, FPR | Table 15
[58] 2009 | C4.5-Chi2 | 5 | 5,3,4,8,25 | Enhanced C4.5 | Table 16 | Table 16
[59] 2009 | RAIS | - | Not mentioned | SVM | DR, FAR, accuracy | 99.17%, 3.49%, 98.60%
[60] 2009 | ESVDF/FSR, ESVDF/BER | 6, 9 | Not mentioned | NN, SVM | Table 17 | Table 17
[61] 2011 | GA/KNN hybrid | 19, 28 | Not mentioned | GA/KNN | Accuracy rate | 97.42%, 78.00%

CONCLUSIONS & FUTURE RESEARCH DIRECTIONS

Intrusion Detection Systems (IDS) have become a vital and necessary component of almost every computer and network security setup. As network speeds become faster, there is an urgent need for IDSs to be lightweight, efficient and accurate, with high detection rates (DR) and low false alarm rates (FAR). Other difficulties faced by intrusion detection systems are the curse of feature dimensionality and emerging data complexities; feature selection has therefore become a very important part of intrusion detection systems. Feature selection chooses a subset of relevant features and removes irrelevant and redundant features from the dataset in order to build a robust, efficient, accurate and lightweight intrusion detection system that ensures timeliness for real-time operation. Plenty of feature selection methods have been proposed by researchers in intrusion detection to deal with these problems. This paper has surveyed this fast-developing field and addressed the main contributions of the feature selection research proposed for intrusion detection. We showed why feature selection is vital in IDS. We surveyed the existing feature selection methods for IDS, categorised as filter, wrapper and hybrid. We also presented, in Sections 5 and 6, the performance of these methods based on different metrics on the KDD Cup'99 dataset, the extracted feature sets and the classifiers used to evaluate them, and the strengths, limitations and future work of these proposed methods. The following are useful future research issues.

FUTURE RESEARCH

A single classifier for evaluating the extracted feature set may no longer be a good solution for building a robust IDS. Designing more sophisticated classifiers by combining multiple classifiers, or by combining ensemble [7] and hybrid classifiers, may enhance the robustness and performance of IDS.

After comparing the existing feature selection methods in intrusion detection, we observe that finding an optimal and best feature set still needs further research. Feature selection algorithms always need improvements in search strategy and evaluation criteria for building efficient and lightweight intrusion detection systems. The robustness of the extracted features can be enhanced by using an ensemble of feature selection methods combined with appropriate evaluation criteria.

After surveying these many feature selection methods, we cannot say which method performs best under which classifier for intrusion detection (to the best of our knowledge). Most of the proposed methods work on two-class classification (normal and attack) (to the best of our knowledge). Very little work has been done on multi-class classification (five classes: four attack classes and one normal class) [62][63]. Therefore, the research in many papers can be extended in the future to multi-class classification.

The classes in KDD Cup'99 are unbalanced in both the training and test sets. The Normal and DoS classes have enough instances, whereas Probe and R2L have few instances, and U2R particularly few. These classes (Probe, R2L, U2R) do not achieve good classification rates due to the small number of instances in the training set [56][31][39]. A future research direction is therefore to develop methods, combined with appropriate evaluation criteria, that alleviate the problem of classes with few instances.

We can conclude that there are features that are really significant in classifying the normal and attack types, as reported in the literature. Also, there is no specific generic classifier that best classifies all the attack types, as seen in this survey; different researchers use different classifiers to evaluate their feature sets. This paper has systematically summarized the contributions of each researcher and also highlighted a number of significant research problems in this field. We hope that this survey will provide readers with useful insights, a broad overview and new research directions in this field.

REFERENCES

[1] Mitra, P. et al. (2002). Unsupervised Feature Selection Using Feature Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 301-312.
[2] Anderson, J. P. (1980). Computer security threat monitoring and surveillance. Technical Report 98-17, James P. Anderson Co., Fort Washington, Pennsylvania, USA.
[3] Denning, D. E. (1987). An intrusion detection model. IEEE Transactions on Software Engineering, 13(2), 222-232.
[4] Wu, S. X. & Banzhaf, W. (2010). The use of computational intelligence in intrusion detection systems: A review. Applied Soft Computing Journal, 10, 1-35.
[5] Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A. & Srivastava, J. (2003). A comparative study of anomaly detection schemes in network intrusion detection. In Proc. of the SIAM Conference on Data Mining.
[6] Kumar, S. & Spafford, E. H. (1994). A pattern matching model for misuse intrusion detection. In Proceedings of the 17th National Computer Security Conference, 11-21.
[7] Chebrolu, S. et al. (2005). Feature deduction and ensemble design of intrusion detection systems. Computers & Security, 24(4), 295-307.
[8] Yeung, D. Y. & Ding, Y. (2003). Host-based intrusion detection using dynamic and static behavioral models. Pattern Recognition, 36, 229-243.
[9] KDD Cup 1999 Intrusion detection dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[10] Mukkamala, S. et al. (2005). Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications, 28(2), 167-182.
[11] Lewis, P. M. (1962). The characteristic selection problem in recognition systems. IRE Transactions on Information Theory, 8, 171-178.
[12] John, G. H. et al. (1994). Irrelevant Features and the Subset Selection Problem. Proc. of the 11th Int. Conf. on Machine Learning, Morgan Kaufmann Publishers, 121-129.
[13] Dash, M. & Liu, H. (1997). Feature Selection for Classification. Intelligent Data Analysis, 1(3), 131-156.
[14] Blum, A. L. & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2), 245-271.
[15] Dash, M. et al. (2002). Feature Selection for Clustering - a Filter Solution. Proc. 2nd Int'l Conf. Data Mining, 115-122.
[16] Duch, W., Winiarski, T. et al. (2003). Feature Selection and Ranking Filters.
[17] Das, S. (2001). Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection. Proc. 18th Int'l Conf. Machine Learning, 74-81.
[18] Liu, H. & Yu, L. (2005). Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491-502.
[19] Kohavi, R. & John, G. H. (1997). Wrappers for Feature Subset Selection. Artificial Intelligence, 97(1-2), 273-324.
[20] Xing, E. et al. (2001). Feature Selection for High-Dimensional Genomic Microarray Data. Proc. 15th Int'l Conf. Machine Learning, 601-608.
[21] Zhang, L. et al. (2004). Feature Selection for Pattern Classification Problems. Proceedings of the Fourth International Conference on Computer and Information Technology (CIT'04).
[22] Kuchimanchi, G. K. et al. (2004). Dimension Reduction Using Feature Extraction Methods for Real-time Misuse Detection Systems. Proceedings of the 2004 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 195-202.
[23] Zhu, Y. et al. (2005). Modified Genetic Algorithm based Feature Subset Selection in Intrusion Detection System. Proceedings of ISCIT 2005, 9-12.
[24] Zainal, A. et al. (2006). Feature selection using rough set in intrusion detection. In Proc. IEEE TENCON, 1-4.
[25] Wong, W.-T. & Lai, C.-Y. (2006). Identifying Important Features for Intrusion Detection Using Discriminant Analysis and Support Vector Machine. Proceedings of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, 3563-3567.
[26] Yang, L. et al. (2006). A Lightweight Intrusion Detection Model Based on Feature Selection and Maximum Entropy Model. International Conference on Communication Technology (ICCT '06), 1-4.
[27] Tamilarasan, A. et al. (2006). Feature Ranking and Selection for Intrusion Detection Using Artificial Neural Networks and Statistical Methods. Int'l Joint Conf. on Neural Networks (IJCNN'06), 4754-4761.
[28] Fadaeieslam, M. J. et al. (2007). Comparison of two feature selection methods in Intrusion Detection Systems. Seventh International Conference on Computer and Information Technology, 83-86.
[29] Sheen, S. & Rajesh, R. (2008). Network Intrusion Detection using Feature Selection and Decision tree classifier. IEEE Region 10 Conference, TENCON 2008, 1-4.
[30] Kiziloren, T. & Germen, E. (2009). Anomaly Detection with Self-Organizing Maps and Effects of Principal Component Analysis on Feature Vectors. Fifth Int'l Conf. on Natural Computation, 509-513.
[31] Suebsing, A. & Hiransakolwong, N. (2009). Feature Selection Using Euclidean Distance and Cosine Similarity for Intrusion Detection Model. Asian Conf. on Intelligent Info. and Database Systems, 86-91.
[32] Lee, S. M. et al. (2009). Quantitative Intrusion Intensity Assessment using Important Feature Selection and Proximity Metrics. 15th IEEE Pacific Rim Int'l Symposium on Dependable Computing, 127-134.
[33] Lee, T.-H. & He, J.-D. (2009). Entropy-Based Profiling of Network Traffic for Detection of Security Attack. TENCON, 1-5.
[34] Xiao, L. et al. (2009). A Two-step Feature Selection Algorithm Adapting to Intrusion Detection. International Joint Conference on Artificial Intelligence, 618-622.
[35] Khor, K.-C. et al. (2009). From Feature Selection to Building of Bayesian Classifiers: A Network Intrusion Detection Perspective. American Journal of Applied Sciences, 6(11), 1948-1959.
[36] Bahrololum, M. et al. (2009). Machine Learning Techniques for Feature Reduction in Intrusion Detection Systems: A Comparison. Fourth International Conference on Computer Sciences and Convergence Information Technology (ICCIT), 1091-1095.
[37] Nguyen, H. et al. (2010). Improving Effectiveness of Intrusion Detection by Correlation Feature Selection. 2010 International Conference on Availability, Reliability and Security, 17-24.
[38] Chen, T. et al. (2010). A Naive Feature Selection Method and Its Application in Network Intrusion Detection. 2010 International Conference on Computational Intelligence and Security (CIS), 416-420.
[39] Fan, W. et al. (2011). Unsupervised Anomaly Intrusion Detection via Localized Bayesian Feature Selection. 2011 11th IEEE International Conference on Data Mining, 1032-1037.
[40] Xian, J. et al. (2011). An Algorithm Application in Intrusion Forensics Based on Improved Information Gain. Third Symposium on Web Society (SWS), 100-104.
[41] Middlemiss, M. J. & Dick, G. (2003). Weighted Feature Extraction using a Genetic Algorithm for Intrusion Detection. IEEE, 1669-1675.
[42] Mukkamala, S. & Sung, A. H. (2003). Feature Selection for Intrusion Detection Using Neural Networks and Support Vector Machines. Journal of the Transportation Research Board of the National Academies, Transportation Research Record No. 1822, 33-39.
[43] Gao, H.-H. et al. (2005). Ant Colony Optimization based network intrusion feature selection and detection. Proc. of the Fourth Int'l Conf. on Machine Learning and Cybernetics, Guangzhou, 3871-3875.
[44] Banković, Z. et al. (2007). Increasing Detection Rate of User-to-Root Attacks Using Genetic Algorithms. Int'l Conf. on Emerging Security Information, Systems and Technologies, 48-53.
[45] Chen, Y. et al. (2007). Toward Building Lightweight Intrusion Detection System Through Modified RMHC and SVM. ICON, 83-88.
[46] Tsang, C.-H. et al. (2007). Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection. Pattern Recognition, 40, 2373-2391.
[47] Wang, W. & Gombault, S. (2008). Efficient Detection of DDoS Attacks with Important Attributes. Third International Conference on Risks and Security of Internet and Systems (CRiSIS 2008), 61-67.
[48] Li, Y. et al. (2009). Building lightweight intrusion detection system using wrapper-based feature selection mechanisms. Computers & Security, 28(6), 466-475.
[49] Zulaiha, A. O. et al. (2010). Improving Signature Detection Classification Model Using Features Selection based on Customized Features. 10th Int'l Conf. on Intelligent Systems Design and Applications, 1026-1031.
[50] Gong, S. (2011). Feature Selection Method for Network Intrusion Based on GQPSO Attribute Reduction. International Conference on Multimedia Technology (ICMT), 6365-6368.
[51] Li, Y. et al. (2012). An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Systems with Applications, 39, 424-430.
[52] Sindhu, S. S. et al. (2012). Decision tree based light weight intrusion detection using a wrapper approach. Expert Systems with Applications, 39, 129-141.
[53] Ng, W. W. Y. et al. (2003). Dimensionality Reduction for Denial of Service Detection Problems using RBFNN Output Sensitivity. Proc. of 2nd Int'l Conf. on Machine Learning and Cybernetics, 1293-1298.
[54] Shazzad, K. M. & Park, J. S. (2005). Optimization of Intrusion Detection through Fast Hybrid Feature Selection. Proc. of 6th Int'l Conf. on Parallel and Distributed Computing, Applications and Technologies.
[55] Chen, Y. et al. (2007). Building Lightweight Intrusion Detection System Based on Principal Component Analysis and C4.5 Algorithm. ICACT 2007, 2109-2112.
[56] Lee, S. M. et al. (2007). A Hybrid Approach for Real-Time Network Intrusion Detection Systems. International Conference on Computational Intelligence and Security, 712-715.
[57] Wang, W. et al. (2008). Towards fast detecting intrusions: using key attributes of network traffic. The Third International Conference on Internet Monitoring and Protection, 86-91.
[58] Hong, D. & Haibo, L. (2009). A Lightweight Network Intrusion Detection Model Based on Feature Selection. 15th IEEE Pacific Rim International Symposium on Dependable Computing, 165-168.
[59] Xiang, C. et al. (2009). Robust Observation Selection for Intrusion Detection. Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 269-272.
[60] Zaman, S. & Karray, F. (2009). Features Selection for Intrusion Detection Systems Based on Support Vector Machines. 6th IEEE Consumer Communications and Networking Conference (CCNC), 1-8.
[61] Su, M.-Y. (2011). Real-time anomaly detection systems for Denial-of-Service attacks by weighted k-nearest-neighbor classifiers. Expert Systems with Applications, 38, 3492-3498.
[62] Bruzzone, L. & Serpico, S. B. (2000). A technique for feature selection in multiclass problems. International Journal of Remote Sensing, 21(3), 549-563.
[63] Chiblovskii, B. & Lecerf, L. (2008). Scalable feature selection for multiclass problems. In Proc. of the European Conf. on Machine Learning and Knowledge Discovery in Databases (ECML PKDD'08), 227.
