Intrusion Detection Model Using Fusion of PCA and Optimized SVM
I. Sumaiya Thaseen
School of Computing Science and Engineering, VIT University, Chennai, Tamil Nadu
[email protected]
Ch. Aswani Kumar
School of Information Technology and Engineering, VIT University, Vellore, Tamil Nadu
[email protected]
978-1-4799-6629-5/14/$31.00 ©2014 IEEE
Abstract—Intrusion detection systems (IDS) play a major role in detecting attacks that occur on computers or networks. Anomaly intrusion detection models detect new attacks by observing deviations from a profile. However, traditional IDS have many problems, such as a high false alarm rate, low detection capability against new network attacks and insufficient analysis capacity. Machine learning allows intrusion detection models to improve their performance automatically with experience. This paper proposes a novel method that integrates principal component analysis (PCA) and support vector machine (SVM), optimizing the kernel parameters with an automatic parameter selection technique. This technique reduces the training and testing time needed to identify intrusions while improving accuracy. The proposed method was tested on the KDD data set. The data sets were carefully divided into training and testing sets, with minority attacks such as U2R and R2L placed in the testing set so that the occurrence of unknown attacks can be identified. The results indicate that the proposed method is successful in identifying intrusions. The experimental results show that the classification accuracy of the proposed method exceeds that of other classification techniques that use SVM as the classifier together with other dimensionality reduction or feature selection techniques. Minimum resources are consumed because the classifier input requires a reduced feature set, which minimizes the training and testing overhead time.
Keywords—Cross Validation; Dimensionality Reduction; Intrusion Detection System; Principal Component Analysis; Radial Basis Function Kernel; Support Vector Machine
I. INTRODUCTION
Much attention has been paid to the development of intrusion detection systems because end users require safe use of network services. IDSs build defensive classification models that distinguish normal behavior from abnormal behavior in a network. The two assumptions in the area of intrusion detection are that 1) user and program activities are monitored by computer systems and 2) normal and intrusion activities behave differently. Many existing IDS cannot deal with new types of attack or environment, so the IDS has to be continuously updated, which is very time consuming for security analysts. Data mining techniques help network administrators learn the behavior automatically by analyzing the data. The advantage of using a data mining technique is that new attacks can be detected automatically and IDS can be built for a wide range of computing environments. Data mining has also gained a lot of scope in the IT industry. The major reasons are that it 1) can analyze information from heavy volumes of noisy data and 2) identifies the data of user interest and predicts results that can be used in the future. Artificial intelligence algorithms provide an automatic mechanism to improve the performance of IDS. Many artificial intelligence algorithms, such as artificial neural networks (ANN) [3], fuzzy logic [5], k-nearest neighbor [6], decision tree [1], genetic algorithm [2], principal component analysis [4] and support vector machine (SVM) [8], have improved the performance of intrusion detection models. Each method is based on a separate view and can achieve varying performance for anomaly intrusion detection models, but it is hard to establish that one method performs better than another. SVM is an efficient choice among these techniques because the distribution of attacks is not balanced: the number of samples for low-frequency attacks is very small compared to high-frequency attacks. SVM is very popular due to its ability to generalize from a small sample of learning data.
The rest of the paper is organized as follows. Section 2 reviews the machine learning techniques used for intrusion detection and the importance of SVM for classification, along with hybrid techniques that are combined with SVM. Section 3 details the background of the techniques used in the model. Section 4 discusses the proposed methodology. Section 5 reports the experiments and results of the model. Section 6 contains the conclusion.
II. RELATED WORK
In this section we analyze the literature on traditional intrusion detection models, intrusion models using machine learning techniques, intrusion models using SVM classifiers and integrated intrusion models using SVM and dimensionality reduction techniques.
A number of payload-based anomaly IDS have been proposed that monitor the payload of a packet to identify anomalies. NETAD [33] monitors the first 48 bytes of an IP packet. A number of models are developed, corresponding to the network protocols, and an anomaly score is computed to detect rare behavior. PAYL [35] determines the frequency of the n-grams in the payload and develops one model of normal traffic depending on the packet length. A new version of PAYL [36] adds new functionality to the original version. In this model, inbound and outbound traffic correlation is performed to identify worm propagation. Wang et al. [34] proposed a new payload-based anomaly detection system called ANAGRAM, which uses multiple one-class SVM classifiers [32] to construct the anomaly detector.
The curse of dimensionality has been a major issue for data sets with a large number of features. Hence many feature selection methods have been proposed by researchers in recent years. Liu et al. [27] performed feature selection using PCA, and only 22 features out of the entire 38-feature set were considered for classification. Aswani et al. [28] analyzed different supervised dimensionality reduction techniques such as Linear Discriminant Analysis (LDA), Maximum Margin Criterion (MMC) and Orthogonal Centroid (OC). The author also compared the performance of KPCA, PCA and ICA for feature extraction with respect to SVM. Sumaiya et al. [18] analyzed different supervised tree-based classifiers along with feature selection for building an intrusion detection model. Feature selection provided an optimized record set for identifying whether a packet is of normal or anomaly type.
Hybrid models are often used to improve prediction accuracy by deploying several supervised learning methods. Bhavesh et al. [29] constructed a hybrid model by integrating Latent Dirichlet Allocation (LDA) and genetic algorithm (GA). LDA is used to identify the optimal set of attributes for classification, and GA computes the initial score of data items and performs breeding, fitness evaluation and finally filtering to produce a new generation. Zeon et al. [38] integrated Self Organizing Map with consistency-based feature selection for identifying attacks in the network. Kuang et al. [12] deployed KPCA (Kernel Principal Component Analysis) and a multi-layer SVM classifier for building an intrusion detection model. This model resulted in better performance and reduced training time. GA performs optimization of the kernel parameters and the punishment factor C. Experimental results show that the model has greater accuracy and better generalization. Shih et al. [15] developed an intelligent intrusion detection algorithm combining SVM, Decision Tree (DT) and Simulated Annealing (SA). SVM and SA can identify the best features to improve the accuracy. The parameters of SVM and DT are adjusted automatically by SA.
The literature therefore makes clear that optimizing SVM parameters together with dimensionality reduction yields good accuracy, improving the classification rate and shortening detection time. In this paper we propose an optimized model that reduces the dimensionality and improves the classification rate by combining PCA and SVM with automatic parameter selection for optimizing the kernel parameters.
III. BACKGROUND
In this section we briefly review the data mining techniques that are employed in our proposed model.
A. Scaling
Network traffic is voluminous and contains a large number of features with different ranges of values. Processing the data directly is time consuming, and classification may not be accurate. Hence data packets undergo a normalization process before dimensionality reduction. Many methods are available for normalization; the commonly used ones are z-score, min-max normalization and decimal scaling. Min-max normalization is chosen for the proposed model because it is the simplest normalization technique when the limits (maximum and minimum) of the values are known. Min-max normalization maps a value d of attribute p to d_1 in the range [new_min(p), new_max(p)], where d is the original value of the attribute and d_1 is the new value after normalization. It is calculated by the formula given below:
$$ d_1 = \frac{d - \min(p)}{\max(p) - \min(p)} \left[ \mathrm{new\_max}(p) - \mathrm{new\_min}(p) \right] + \mathrm{new\_min}(p) \qquad (1) $$
where min(p) is the minimum value of the attribute and max(p) is the maximum value of the attribute. In the proposed model the value d is mapped to d_1 in the range [0, 1], so new_min(p) = 0 and new_max(p) = 1 in equation (1). Hence the simplified formula is as follows:
$$ d_1 = \frac{d - \min(p)}{\max(p) - \min(p)} \qquad (2) $$
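Equation (2) can be illustrated with a short sketch. The Python function below is an illustrative assumption and not the authors' MATLAB code; the name min_max_normalize, the sample values and the treatment of a constant attribute are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Map each value d to d1 in [new_min, new_max], following equation (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: the formula would divide by zero, so every
        # value is mapped to new_min (an assumption, not from the paper).
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(d - lo) * scale + new_min for d in values]


# Hypothetical attribute column, e.g. connection durations in seconds.
duration = [0.0, 2.0, 5.0, 10.0]
scaled = min_max_normalize(duration)  # defaults give equation (2)
```

With the default arguments the function reduces to equation (2). A common convention, not stated in the paper, is to take min(p) and max(p) from the training set and reuse them for the test set.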
Min-max normalization preserves the relationships among the original data values.
B. Principal Component Analysis (PCA)
Network traffic has a large number of attributes, many of which may be irrelevant for identifying the attack type. Hence PCA is chosen to overcome the curse of dimensionality. An optimal attribute subset of network traffic improves the detection rate and thereby increases accuracy. PCA transforms a set of possibly correlated variables into a set of uncorrelated variables called the principal components. Principal component analysis is a variable reduction technique that results in a smaller number of components that account for most of the variance in a set of observed variables [41] [42]. Reliable results can be obtained if the minimum number of subjects providing usable data for analysis is higher than 100 or five times the number of variables being analyzed. PCA makes no assumption about an underlying causal structure responsible for the covariation in the data.
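As a minimal sketch of how PCA turns correlated variables into uncorrelated components, the Python example below handles the two-feature case, where the eigenvalues of the 2x2 covariance matrix have a closed form. The function name pca_2d and the sample points are hypothetical, and the restriction to two features is an assumption made only to keep the example self-contained.

```python
import math


def pca_2d(points):
    """PCA for two-feature data via the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]

    # Entries of the covariance matrix [[a, b], [b, d]], dividing by n.
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    d = sum(y * y for _, y in centered) / n

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    lam1, lam2 = mid + rad, mid - rad

    # Unit eigenvector belonging to the largest eigenvalue.
    if b != 0.0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    v1 = (vx / norm, vy / norm)

    # Projection of the centered data onto the first principal component.
    scores = [x * v1[0] + y * v1[1] for x, y in centered]
    return (lam1, lam2), v1, scores


# Hypothetical, strongly correlated two-feature sample.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
(lam1, lam2), v1, scores = pca_2d(points)
explained = lam1 / (lam1 + lam2)  # share of variance kept by one component
```

Because the two features are almost linearly related, a single component carries nearly all of the variance, which is the effect PCA exploits when reducing the feature set.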
2014 International Conference on Contemporary Computing and Informatics (IC3I)
C. Support Vector Machine Classification Model
Support Vector Machines (SVM), also known as support vector networks, are supervised learning models that analyze data and recognize patterns for classification and regression analysis. SVM is widely used for multi-class classification [21]. It predicts which category a new data point falls into by computing a hyperplane from the given training sample set [39]. The training data D is a set of n points of the form
$$ D = \{ (x_j, y_j) \mid x_j \in \mathbb{R}^p,\; y_j \in \{-1, 1\} \}_{j=1}^{n} \qquad (3) $$
where y_j is either -1 or 1, specifying the class to which the point x_j belongs. Each x_j is a p-dimensional vector. We have to determine the maximum-margin hyperplane that separates the points with y_j = 1 from those with y_j = -1. The hyperplane can be written as the set of points x satisfying w · x − b = 0, where w is the normal vector of the hyperplane and b is the offset, and the training points must satisfy
$$ y_j \,( w \cdot x_j - b ) \ge 1 \quad \text{for all } 1 \le j \le n \qquad (4) $$
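The constraint in equation (4) can be checked directly for a candidate hyperplane. The following Python sketch is illustrative only: the function name satisfies_margin, the sample points and the two candidate pairs (w, b) are hypothetical, and the check does not train an SVM.

```python
def satisfies_margin(w, b, samples, labels):
    """Return True if y_j * (w . x_j - b) >= 1 holds for every sample."""
    for x, y in zip(samples, labels):
        decision = sum(wi * xi for wi, xi in zip(w, x)) - b
        if y * decision < 1:
            return False
    return True


# Hypothetical two-dimensional samples: label 1 = normal, -1 = intrusion.
samples = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

ok = satisfies_margin((0.5, 0.5), 1.0, samples, labels)          # constraint met
too_flat = satisfies_margin((0.1, 0.1), 0.2, samples, labels)    # margin violated
```

Training an SVM amounts to finding, among all pairs (w, b) that pass this check, the one with the smallest norm of w, since the margin width is 2/||w||.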
D. Binary Tree SVM model
Multiple types of intrusion data are present in the network, and hence multi-class SVM classifiers are applied to such models [19]. Popular multi-class SVM methods are 'One-against-one', 'One-against-all' and 'Binary tree'. For 'm' classes, the 'Binary tree' method requires only m-1 two-class SVM classifiers; the 'One-against-all' method needs 'm' two-class SVM classifiers, each trained with all the samples; and the 'One-against-one' method needs m(m-1)/2 two-class SVM classifiers, each trained with the data of two classes [20]. Training and recognition are faster when there are fewer two-class classifiers. Hence the 'Binary tree' SVM technique is chosen for constructing the intrusion detection model. SVM classifiers are constructed to identify five different states: the normal state (NS) and the abnormal states (DoS, R2L, U2R and probe). The training sample is a combination of normal traffic and majority attacks such as DoS and probe. The test sample is a combination of normal traffic and minority attacks not present in the training sample, in order to identify unknown attacks. The normal state is labeled 1 and the intrusion state is labeled -1.
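The classifier counts quoted above can be compared for the five traffic states with a few lines of Python. This is a small worked example; the function name two_class_svm_count and the strategy labels are hypothetical.

```python
def two_class_svm_count(m, strategy):
    """Number of two-class SVMs needed for m classes under each strategy."""
    if m < 2:
        raise ValueError("need at least two classes")
    if strategy == "binary_tree":
        return m - 1
    if strategy == "one_against_all":
        return m
    if strategy == "one_against_one":
        return m * (m - 1) // 2
    raise ValueError(f"unknown strategy: {strategy}")


# The five states used in the model.
classes = ["normal", "DoS", "R2L", "U2R", "probe"]
strategies = ("binary_tree", "one_against_all", "one_against_one")
counts = {s: two_class_svm_count(len(classes), s) for s in strategies}
```

For m = 5 the binary tree needs 4 classifiers, one-against-all needs 5 and one-against-one needs 10, which is why the binary tree is the cheapest of the three to train and evaluate.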
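[Figure 1: flowchart. Box labels: Dataset; Scaling of the data; Training set; Testing set; Selected feature subset using PCA; Training set with selected feature subset; Testing set with selected feature subset; Training SVM classifier; Trained SVM classifier; decision "Average accuracy achieved?"; on No, automatic parameter selection by calculating the mean of samples belonging to the same and different classes; on Yes, optimized (C, σ) and feature subset; Classify using Binary Tree SVM. The flow is divided into Stage 1 and Stage 2.]
Fig 1: Proposed optimized SVM model of intrusion detection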
IV. PROPOSED WORK
In this section we propose a novel hybrid model for intrusion identification.
A. Proposed Methodology
The proposed model integrates PCA with SVM after automatic selection of the kernel parameters. Figure 1 shows the block diagram of the proposed system. Normalization is performed as the initial preprocessing step, followed by dimensionality reduction using principal component analysis. The proposed approach employs two stages. In the first stage, PCA finds an optimal subset of all attributes and removes irrelevant and redundant attributes. The variance threshold is usually set to a high value, and the principal components whose cumulative variance reaches the threshold are selected to form the feature vectors. The second stage uses the optimal subset obtained from PCA as the training and test data set for SVM classification. The RBF kernel is adopted in this model, and the optimal SVM parameters are obtained using grid search with automatic parameter selection, as depicted in Figure 1. The remainder of this section describes how the optimal subset is obtained from PCA and how the SVM kernel parameters are optimized for classification.
Algorithm for obtaining the optimal feature subset using PCA
Input: (Training Set, Test Set)
Output: (Optimal Training Set, Optimal Test Set)
Step 1: Calculate the size of the training and test data.
Step 2: Scale the training and test data.
Step 3: Subtract off the mean for each row,
$$ m = \frac{1}{n} \sum_{k=1}^{n} x_k $$
where x_k denotes the individual elements and n denotes the number of samples.
Step 4: Calculate the covariance matrix
$$ C = \frac{X X^{T}}{n} $$
where X is the matrix after subtracting the mean, X^T is its transpose and n is the number of elements.
Step 5: Calculate the eigenvectors and eigenvalues of the covariance matrix:
$$ C v = \lambda v $$
Step 6: Form a feature vector (eig_1, eig_2, …, eig_p), where eig_1 is the principal component and p [...]
[...] So σ should be calculated such that b(σ) is close to 0. It is easy to see that 0 < w(σ) ≤ 1 and 0 < b(σ) ≤ 1. The optimal parameters are then deployed in training the model. This parameter optimization avoids misclassification of training samples. The cross validation accuracy before optimization is 85%, with C and the kernel parameter both initially set to 1. After many iterations of varying C while keeping the kernel parameter constant, and vice versa, the best accuracy of 99% is achieved when C and the kernel parameter are 4. This result is achieved after nearly 600 iterations of modifying the kernel parameter values.
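The automatic parameter selection step can be sketched as follows. The definitions of w(σ) and b(σ) did not survive in the text above, so this Python sketch assumes that w(σ) is the mean RBF kernel value over pairs of samples from the same class and b(σ) is the mean over pairs from different classes, consistent with Figure 1 ("calculating the mean of samples belonging to same and different class"). The selection rule that maximizes w(σ) − b(σ), the function names and the sample data are all hypothetical.

```python
import math


def rbf(x, z, sigma):
    """RBF kernel value exp(-||x - z||^2 / (2 sigma^2)), always in (0, 1]."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def within_between(samples, labels, sigma):
    """Mean kernel value for same-class pairs (w) and cross-class pairs (b)."""
    same, diff = [], []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            k = rbf(samples[i], samples[j], sigma)
            (same if labels[i] == labels[j] else diff).append(k)
    return sum(same) / len(same), sum(diff) / len(diff)


def select_sigma(samples, labels, candidates):
    """Pick the candidate with w(sigma) near 1 and b(sigma) near 0."""
    def score(sigma):
        w, b = within_between(samples, labels, sigma)
        return w - b  # assumed criterion, not taken from the paper
    return max(candidates, key=score)


# Hypothetical normalized samples: label 1 = normal, -1 = intrusion.
X = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]
y = [1, 1, 1, -1, -1, -1]
candidates = [0.01, 0.1, 0.3, 1.0, 10.0]

best_sigma = select_sigma(X, y, candidates)
w_best, b_best = within_between(X, y, best_sigma)
```

A very small σ drives both means toward 0 and a very large σ drives both toward 1, so the useful values lie in between. In the paper this selection is combined with grid search and cross validation over C, which the sketch does not reproduce.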
V. IMPLEMENTATION AND RESULTS
The experiments were conducted in MATLAB R2012a integrated with the libSVM package, which supports support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM) [45]. It also supports multi-class classification. Experiments were performed on the KDD CUP99 [17] dataset. The data set contains five categories of network traffic, namely normal, denial of service (DoS), unauthorized access from a remote machine (Remote to Local, R2L), unauthorized access to local supervisor privileges (User to Root, U2R) and probe. There are 41 attributes in each network record, of which 34 are continuous and 7 are discrete. Normalization is performed before the experiments: values are converted into numerical attributes by counting their frequency, so that all attributes are transformed into the normalized format. A description of the KDD cup data set and its attacks can be obtained from [19]. Sampling is performed on the original dataset to obtain a 10% subset of the data, which is around 20,000 records. The data is divided into two separate sets, D1 and D2. D1 contains an equal number of training (normal and DoS) and testing (normal and probe) records. The testing data contains new data so that unknown attacks can be detected. Similarly, D2 contains normal and DoS attack data in the training set and normal and R2L attack data in the test set. Figure 2 shows the variance estimation of the ten principal components retained in the data set. The components with higher variance are retained in the parameter selection results after dimensionality reduction using PCA.
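Retaining the components with the highest variance can be expressed as a cumulative variance rule. The Python sketch below is an assumption about how such a rule might be coded; the function name components_for_threshold, the 0.95 threshold and the eigenvalues are hypothetical and are not the values behind Figure 2.

```python
def components_for_threshold(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative variance
    ratio reaches the threshold, together with that ratio."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count, cumulative / total
    return len(ordered), 1.0


# Hypothetical eigenvalues of a covariance matrix with ten components.
eigenvalues = [5.0, 2.0, 1.2, 0.8, 0.4, 0.3, 0.15, 0.1, 0.03, 0.02]
kept, ratio = components_for_threshold(eigenvalues, threshold=0.95)
```

With these illustrative eigenvalues the first six components are kept, because the first five account for 94% of the variance and the sixth raises the share to 97%.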
Fig 2: Variance estimation of the principal components.
A. Performance Analysis
The performance of our model is measured using metrics derived from the following values: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), where TP specifies the normal behavior that is correctly predicted, FP denotes the normal behavior wrongly assumed as abnormal and TN indicates the abnormal behavior that is misdetected as correct.
i) Detection rate (DR):
$$ DR = \frac{TN}{TN + FP} \qquad (7) $$
ii) False alarm rate (FAR):
$$ FAR = \frac{FN}{TN + FP} \qquad (8) $$
iii) Correlation coefficient (cc):
$$ cc = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (9) $$
These parameters play a crucial role in evaluating the performance of the intrusion detection model. The parameter cc denotes the correlation between the forecast result and the actual result. It ranges from -1 to 1, where 1 specifies that the forecast result fully agrees with the actual result, 0 corresponds to a random prediction and -1 indicates complete disagreement. Tables 1 and 2 show the performance metrics of the proposed model.
VI. CONCLUSION
This paper proposes an intrusion detection model based on principal component analysis (PCA) and support vector machines (SVM) using the RBF kernel. Dimensionality reduction using PCA improves accuracy. SVMs construct classification models based on the training data obtained from PCA. Optimizing the SVM parameter C and the RBF kernel parameter with the proposed automatic parameter selection, together with cross validation, reduces the training and testing time and produces better accuracy. Constructing equal amounts of data samples for training and testing produces better accuracy for minority attacks such as U2R and R2L. The experimental results show that the classification accuracy of the proposed method exceeds that of other classification techniques that use SVM as the classifier together with other dimensionality reduction or feature selection techniques. Minimum resources are consumed because the classifier input requires a reduced feature set, which minimizes the training and testing overhead time. Our future work is to model an intrusion detection system for multi-class data by obtaining an optimal subset of features.
Table 1: Performance Metrics of the proposed model
Datasets   C        DR       FAR      Training time (secs)
D1         0.9955   0.9940   0.0015   276
D2         0.9940   0.9985   0.0030   237
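The correlation coefficient of equation (9), along with precision, recall, F-score and overall accuracy, can be computed from the four counts TP, TN, FP and FN. The Python sketch below uses the conventional textbook definitions of precision, recall, F-score and accuracy, which is an assumption because the paper does not spell them out; the function name classification_metrics and the counts are hypothetical and unrelated to the reported results.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts (all sums assumed > 0)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Correlation coefficient as in equation (9).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
        "cc": cc,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
perfect = classification_metrics(tp=50, tn=50, fp=0, fn=0)
```

A classifier with no errors reaches cc = 1, while any false positives or false negatives pull the coefficient below 1.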
Table 2: Performance Metrics of the proposed model
Datasets   Precision   F-score   Overall-acc   Recall
D1         0.9985      0.9977    0.9978        0.9970
D2         0.9950      0.9953    0.9970        0.9939
REFERENCES
[1] J.H. Lee, J.H. Lee, S.G. Sohn, et al. Effective value of decision tree with KDD 99 intrusion detection datasets for intrusion detection system. 10th International Conference on Advanced Communication Technology (ICACT'08), 2008, 1170–1175.
[2] K. Shafi, H.A. Abbass. An adaptive genetic based signature learning system intrusion detection. Expert Syst. Appl. (2009), 36(10), 12036–12043.
[3] G. Wang, J.X. Hao, J. Ma, L.H. Huang. A new approach to intrusion detection using artificial neural networks and fuzzy clustering. Expert Syst. Appl. (2010), 37: 6225–6232.
[4] W. Wang, R. Battiti. Identifying intrusions in computer networks with principal component analysis. In: Proceedings of the First International Conference on Availability Reliability and Security (ARES'06), 2006, 270-279.
[5] Chimphlee, A.H. Abdullah, M.N.M. Sap, S. Srinoy, S. Chimphlee. Anomaly based intrusion detection using fuzzy rough clustering. In: Paper presented at the International Conference on Hybrid Information Technology (ICHIT'06), 2006, pp. 329-334.
[6] C.F. Tsai, C.Y. Lin. A triangle area based nearest neighbors approach to intrusion detection. Pattern Recognition (2010), 43: 222–229.
[7] P. Yang, Q.S. Zhu. Finding key attribute subset in dataset for outlier detection. Knowl. Syst. 2011, 24(2): 269–274.
[8] L. Khan, M. Awad, B. Thuraisingham. A new intrusion detection system using support vector machines and hierarchical clustering. Journal on Very Large Databases, 2007, 16(4): 507-521.
[9] Sandhya Peddabachigiri, Ajith Abraham, Crina Grosan, Johnson Thomas. Modeling intrusion detection system using hybrid intelligent systems. Journal of Network and Computer Applications, 2007, 30: 114-132.
[10] Han Suang Kim, Sung-Deok Cha. Empirical evaluation of SVM based masquerade detection using UNIX commands. Computers and Security, 2005, 24(2), 160-168.
[11] Srilatha Chebrolu, Ajith Abraham, Johnson P Thomas. Feature deduction and ensemble design of intrusion detection systems. Computers and Security, 2005, 24(4): 295-307.
[12] Fangjun Kuang, Weihong Xu, Siyang Zhang. A novel hybrid KPCA and SVM with GA model for intrusion detection. Applied Soft Computing, 2014, 178-184.
[13] Eskin. Anomaly detection over noisy data using learned probability distributions. In: Proceedings of the International Conference on Machine Learning, 2000, 255-262.
[14] Yinhui Li, Jingbo Xia, Silan Zhang, Jiakai Yan, Xiaochuan Ai, Kuobin Da. An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Systems with Applications, 2012, 39: 424-430.
[15] Shih-Wei Lin, Kuo-Ching Ying, Chou-Yuan Lee, Zne-Jung Lee. An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection. Applied Soft Computing, 2012, 3285-3290.
[16] Fatima Ardjani, Kaddour Sadouni. Optimization of SVM multi class by particle swarm. International Journal Modern Education Computer Science, 2010, 32-38.
[17] http://kdd.ics.uci.edu/databases/kddcup99/task.html.
[18] Sumaiya Thaseen, Ch. Aswani Kumar. An analysis of supervised tree based classifiers for intrusion detection systems. 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering (PRIME), Salem, 2013, 294-299.
[19] D. Srivastava, L. Bhambu. Data classification using support vector machine. Journal of Theoretical and Applied Information Technology, Vol. 12(1), pp. 1-7, 2010.
[20] C.W. Hsu, C.C. Chang, C.J. Lin. A practical guide to support vector classification [EB/OL], 2010.
[21] T. Ambwani. Multi class support vector machine implementation to intrusion detection. In: Paper presented at the International Joint Conference on Neural Networks, July 2003.
[22] Iftikhar Ahmad, Muhammad Hussain, Abdullah Alghamadi, Abdulhameed Alelaiwi. Enhancing SVM performance in intrusion detection using optimal feature subset selection based on genetic principal components. Neural Computing Applications, February 2013, 1370-1376.
[23] Kim D, Nguyen H, Syng-Yup O, Jong SP. Fusions of GA and SVM for anomaly detection in intrusion detection system. Advances in Neural Networks, Lecture Notes in Computer Science, 2005, 3498: 415-420.
[24] Tong X, Wang Z, Haining Y. A research using hybrid RBF/Elman neural networks for intrusion detection system secure model. Comput Phys Communication, 2009, 180: 1795-1801.
[25] Jerome Friedman, Trevor Hastie, Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 2010, 33(1): 1-22.
[26] Cao LJ, Chua KS, Chong WK, Lee HP, Gu QM. A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing, 2003, 55: 321-326.
[27] Liu G, Yi Z, Yang S. A hierarchical intrusion detection model based on the PCA and neural networks. Neurocomputing, 2007, 70: 1561-1568.
[28] Ch. Aswani Kumar. Analysis of unsupervised dimensionality reduction techniques. Computer Science and Information Systems, 2009, 6(2): 217-227.
[29] Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar. A hybrid anomaly detection model using GLDA. 2014 IEEE International Advance Computing Conference, pp. 288-293.
[30] S. Peddabachigari, A. Abraham, C. Grosan, J. Thomas. Modeling intrusion detection system using hybrid intelligent systems. Journal of Network and Computer Applications, pp. 114-132, 2007.
[31] E.F. Codd. "Further Normalization of the Database Relational Model". In: Database System, R. Rustin ed., Prentice-Hall, pp. 33-64, 1972.
[32] J. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation 13 (2001) 1443–1471.
[33] Mahoney MV, Chan PK. PHAD: packet header anomaly detection for identifying hostile network traffic. Technical report CS-2001-4 2004, Florida Institute of Technology: 2001.
[34] Mahoney MV. Detecting novel attacks by identifying anomalous network packet headers. Technical report CS-2001-4 2004. Florida Institute of Technology: 2001.
[35] K. Wang, S. Stolfo. Anomalous payload-based network intrusion detection. In: Recent Advances in Intrusion Detection (RAID), 2004.
[35] K. Wang, S. Stolfo. Anomalous payload-based worm detection and signature generation. In: Recent Advances in Intrusion Detection (RAID), 2005.
[36] K. Wang, S. Stolfo. Anagram: a content anomaly detector resistant to mimicry attack. In: Recent Advances in Intrusion Detection (RAID), 2006.
[37] http://www.mathworks.in/help/stats/svmtrain.html
[38] Zeon Trevor Fernando, I. Sumaiya Thaseen, Ch. Aswani Kumar. Network Attacks Identification Using Self Organizing Maps. 2014 First International Conference on Networks and Soft Computing, 19-20 August 2014.