Efficient Probabilistic Classification Methods for NIDS

S.M.Aqil Burney. M.Sadiq Ali Khan. Mr.Jawed Naseem. Department of Computer Science. Department of Computer Science. Principal Scientific Officer-PARC.
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010

S.M.Aqil Burney

Department of Computer Science, University of Karachi, Karachi-Pakistan

M.Sadiq Ali Khan

Department of Computer Science, University of Karachi, Karachi-Pakistan

Mr.Jawed Naseem

Principal Scientific Officer-PARC

Abstract: As technology improves, attackers try to gain access to network system resources by many means; open loopholes in the network allow them to penetrate it more easily. Various approaches have been tried for the classification of attacks. In this paper we compare two methods, Naïve Bayes and the Junction Tree Algorithm, on a reduced set of features, improving performance compared to the full data set. PCA is used for feature reduction, which helped in proposing a new method for efficient classification. We propose a Bayesian network-based model with a reduced set of features for intrusion detection. The proposed method generates a lower false positive rate, which increases detection efficiency by reducing the workload and thereby increases the overall performance of an IDS. We also investigate whether conditional independence really affects the detection of attacks/threats.

Keywords-Network Intrusion Detection System (NIDS); Bayesian Networks; Junction Tree Algorithm

I. INTRODUCTION

Network security, whether in a commercial organization or in a critically important research network, is a major issue of concern; with the increasing use of the web, even personal information is under threat. An efficient network intrusion detection system is the only solution to such threats [4]. An IDS is a system that monitors networks to secure them from cyber terrorists; equivalently, it is the process of examining the events occurring in a network or computer system and detecting signs of incidents that threaten computer security policies. The IDS monitors the network system to detect any rule violation. When such a violation occurs, an efficient IDS raises an alarm that alerts the administrator to take measures against the vulnerability. Common intrusion attacks are classified on the basis of various features/parameters. The KDD-99 data set is usually used for investigating the nature of attacks. The data set lists 41 features. The information value of these features and the interdependence among them are of interest for investigation. How much can the feature set be reduced without reducing the efficiency of the classification algorithm, and does interdependency really contribute to detection efficiency? We try to answer such questions in this paper. PCA is an effective technique for data dimension reduction. Similarly, the Naïve Bayes classifier and the Bayesian network both use a probabilistic approach for determining attack probability. Naïve Bayes classifiers assume conditional independence, while Bayesian networks assume conditional dependence. The two methods can be used to compare whether conditional independence or interdependence really contributes to the probability of attack. In the next section we discuss related work; in Section 3 we discuss the two classification methods; in Section 4 the methodology is described; and finally in Section 5 results and discussion are presented.

II. BACKGROUND

Most network-based systems become targets for hackers, so building an efficient IDS is the main task nowadays [4]. Intrusion detection systems need a component that generates alerts on the basis of a rule set; to detect malicious activity correctly, it is necessary to manage the alerts correctly [1]. Researchers apply data mining approaches to attack detection in their intrusion detection systems [2]. Probabilistic approaches for reducing the false alarm rate have also been proposed; for example, see [3]. An enormous amount of network traffic data is accumulated each day. A number of data mining approaches are used to collect domain knowledge for intrusion detection, including clustering, association rules and classification [12]. Data mining techniques support data analysis, which has become one of the important components of intrusion detection systems. The main concern when using data mining techniques in an attack detection system is to differentiate between normal and abnormal packets. To apply data mining to intrusion detection we need a data set and a classification model. The classification model may be a Bayesian network, a neural network, a rule-based or decision-tree-based model, or another soft computing technique such as Support Vector Machines (SVM) [10,11]. An intrusion detection system has now become a necessity for an organization's security system, and its credibility may depend upon the data mining techniques used.

2.1 Clustering

The process of labeling data and arranging it in groups is called clustering. Grouping improves the performance of the different classifiers used. A genuine cluster contains data corresponding to a single category [5]. The data set belonging to a cluster is modeled with respect to its existing


http://sites.google.com/site/ijcsis/ ISSN 1947-5500


features. Clustering can be defined as an unsupervised machine learning mechanism for

Bayes classifier is compared with the Junction Tree algorithm. For modeling the Naive Bayes classifier, several distributions, including normal, gamma or Poisson density functions, can be employed.

2.2 Classification


In classification we break the data set into different classes; it is much less exploratory than clustering. By means of classification we classify data into the classes normal / not normal and sub-classify it into different attack types. Naïve Bayes is used as a classification algorithm in this research to achieve data classification for intrusion detection. Because a huge amount of traffic data must be collected, classification is less popular [6].

It is a graphical method of belief updating, or probabilistic reasoning. For probabilistic reasoning we use Bayesian Networks and Decision Graphs (BNDG), details of which can be found in [9]. The basic concept in the junction tree is clustering of the predicted attributes [8]. In belief updating, instead of approximating the joint probability distribution of all target variables, clusters of attributes (cliques) are formed and the potentials of the clusters are used to approximate probabilities. So basically a junction tree is the graphical representation of potential cluster nodes, or cliques, together with a suitable algorithm to update these potentials. The junction tree algorithm involves several steps: moralizing the graph, triangulation, junction tree formation, assigning probabilities to cliques, message passing, and reading the cliques' marginal potentials from the junction tree.

pattern matching in unlabeled data with numerous aspects.

III. CLASSIFICATION METHODS

3.1 Naïve Bayes Classifier

The Naïve Bayes classifier is an effective technique for the classification of data. The technique is particularly useful for data of large dimension. Naïve Bayes is a special case of Bayes' theorem that presupposes independence among the data attributes [7]. Even though Naïve Bayes assumes independence, its performance is efficient and on par with techniques that assume conditional dependence. The Naïve Bayes classifier can handle continuous or categorical data. Consider a set of variables X = {x1, x2, ..., xn} with possible outcomes O = {O1, O2, ..., On}. The posterior probability of the dependent variable is obtained by Bayes' rule.

3.2 Junction Tree Algorithm

Using the junction tree algorithm requires that the directed graph be changed into an undirected graph to ensure uniform application. This process is called moralization; it involves adding edges between parents and dropping the directions. Let a directed graph be changed into an undirected graph G(NG, EG). In effect, two sets of edges are involved in building EG: the original edges with their directions dropped, and the new edges joining parents that share a common child.

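$$P(O_j \mid x_1, x_2, \ldots, x_n) \propto P(x_1, x_2, \ldots, x_n \mid O_j)\, P(O_j)$$

A new case X is labeled with the class O_j that has the highest posterior probability. Under the independence assumption the likelihood factorizes over the attributes, so that

$$P(O_j \mid X) \propto P(O_j) \prod_{k=1}^{n} P(x_k \mid O_j)$$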
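The posterior rule above can be illustrated with a minimal sketch for categorical attributes. The Laplace smoothing constant `alpha`, the function names and the toy records are assumptions made for illustration; they are not taken from the paper or from the BN classifier software it used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate the counts behind P(O_j) and P(x_k | O_j) from categorical data.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[(k, label)][value] = times feature k took `value` in class `label`
    value_counts = defaultdict(Counter)
    feature_values = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for k, value in enumerate(row):
            value_counts[(k, label)][value] += 1
            feature_values[k].add(value)
    return class_counts, value_counts, feature_values, alpha


def posterior(model, row):
    """Return the normalized P(O_j | x_1..x_n) for every class O_j."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        # log P(O_j) + sum_k log P(x_k | O_j)
        score = math.log(count / total)
        for k, value in enumerate(row):
            num = value_counts[(k, label)][value] + alpha
            den = count + alpha * len(feature_values[k])
            score += math.log(num / den)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def classify(model, row):
    """Pick the class with the highest posterior probability."""
    post = posterior(model, row)
    return max(post, key=post.get)


# Hypothetical toy records: (protocol, flag) with a class label each.
rows = [("tcp", "SF"), ("tcp", "S0"), ("tcp", "S0"), ("udp", "SF"), ("tcp", "SF")]
labels = ["normal", "neptune", "neptune", "normal", "normal"]
model = train_naive_bayes(rows, labels)
```

With these toy records, `classify(model, ("tcp", "S0"))` selects `neptune`, because the product of the prior and the per-attribute likelihoods is larger for that class. Working in log space avoids underflow when many attributes are multiplied.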

Through moralization the undirected moralized graph is obtained. The junction tree is formed after moralization and is basically a hypergraph of cliques: if the cliques of the undirected graph G are given by C(G), then the junction tree is a tree over C(G) with the property that the intersection of any two nodes is contained in every node on the unique path joining them.
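The moralization step described here, dropping directions and joining parents of a common child, can be sketched in a few lines. The representation (a dictionary mapping each node to its parents) and the example arcs are assumptions for illustration; the node names echo KDD'99 features, but this is not the network of Figure 2.

```python
from itertools import combinations


def moralize(parents):
    """Return the edge set of the moral graph of a DAG.

    `parents` maps each node to the list of its parents.
    Each undirected edge is stored as a frozenset of two nodes.
    """
    edges = set()
    for child, pa in parents.items():
        # Drop directions: every parent -> child arc becomes an undirected edge.
        for p in pa:
            edges.add(frozenset((p, child)))
        # "Marry" the parents: connect every pair of parents of the same child.
        for a, b in combinations(pa, 2):
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical fragment: two parents of "attack", and one child of it.
dag = {
    "count": [],
    "src_bytes": [],
    "attack": ["count", "src_bytes"],
    "flag": ["attack"],
}
moral_edges = moralize(dag)
```

In this fragment the only edge added by moralization is the one between `count` and `src_bytes`; the other three edges are the original arcs with their direction removed.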

The efficiency of the Naive Bayes classifier lies in the fact that it reduces a multidimensional density estimation task to one-dimensional density estimation. The occupations of evidence do not affect the posterior probability, so the classification task is generally efficient. The same was found in this study when the Naive

Consider a cluster representation with two neighboring clusters U and V sharing a variable S in common.


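[Figure: neighboring clusters U and V connected through the separator S, with potentials Ψ(U), Ψ(S) and Ψ(V)]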

random sampling. For Naive Bayes classification, two data sets (stratified samples of equal size, 10000 each) were used for learning and testing with the BN classifier software. For the junction tree algorithm, structure learning was carried out by drawing a random sample of 5000 from the KDD data set using Netica. Then five data sets, each of size 1000, were selected by simple random sampling; data set 1 was used for learning and drawing the junction tree, and data sets 2 to 5 were used for testing the belief updates learned by the junction tree.
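Stratified sampling in proportion to each attack type, as described above, might be sketched as follows. The function name, the rule that every class keeps at least one record, and the toy data are assumptions of this sketch; the paper does not state how rare classes or rounding were handled.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, sample_size, seed=0):
    """Draw a sample whose class proportions follow those of `records`.

    Every class present in the data gets at least one record, so rare
    attack types are not lost (a design choice of this sketch). Because
    of that floor and of rounding, the result can differ slightly from
    `sample_size`.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[label_of(rec)].append(rec)
    total = len(records)
    sample = []
    for label, members in strata.items():
        quota = max(1, round(sample_size * len(members) / total))
        quota = min(quota, len(members))
        sample.extend(rng.sample(members, quota))
    return sample


# Hypothetical, heavily imbalanced data: (record id, attack type).
data = [(i, "normal") for i in range(500)]
data += [(i, "neptune") for i in range(500, 900)]
data += [(i, "rootkit") for i in range(900, 902)]
subset = stratified_sample(data, label_of=lambda rec: rec[1], sample_size=100)
```

A simple random sample of 100 from this data would often contain no `rootkit` record at all, which is the situation the stratified design is meant to avoid.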


The aim of the JTA is to modify the potentials in such a way that the distribution P(V) is obtained from the modified potential Ψ(V). In that case the probability of S is given by $P(S) = \sum_{V \setminus S} \Psi(V)$.
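A minimal sketch of this marginalization, together with the way a neighboring cluster U can be updated through the separator, is given below. The update rule used (multiply Ψ(U) by the ratio of new to old separator potential) is the standard absorption step from the junction tree literature and is assumed here; the table layout, function names and numbers are hypothetical.

```python
def marginalize(potential, variables, keep):
    """Sum a potential table over all variables not in `keep`.

    `potential` maps tuples of values (ordered as `variables`) to numbers.
    """
    idx = [variables.index(v) for v in keep]
    result = {}
    for assignment, value in potential.items():
        key = tuple(assignment[i] for i in idx)
        result[key] = result.get(key, 0.0) + value
    return result


def absorb(psi_u, vars_u, psi_s, psi_v, vars_v, sep):
    """Let cluster U absorb from its neighbour V through separator S.

    Returns (psi_s_new, psi_u_new), where psi_s_new is psi_v summed over
    the variables of V that are not in S, and psi_u_new is
    psi_u * psi_s_new / psi_s (0/0 is treated as 0).
    """
    psi_s_new = marginalize(psi_v, vars_v, sep)
    idx = [vars_u.index(v) for v in sep]
    psi_u_new = {}
    for assignment, value in psi_u.items():
        key = tuple(assignment[i] for i in idx)
        old = psi_s[key]
        ratio = psi_s_new[key] / old if old != 0 else 0.0
        psi_u_new[assignment] = value * ratio
    return psi_s_new, psi_u_new


# Hypothetical binary variables: V = {A, B}, U = {B, C}, separator S = {B}.
psi_v = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # P(A, B)
psi_u = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}  # P(C | B), keyed (B, C)
psi_s = {(0,): 1.0, (1,): 1.0}                                 # separator not yet updated

psi_s_new, psi_u_new = absorb(psi_u, ["B", "C"], psi_s, psi_v, ["A", "B"], ["B"])
```

After the call, summing `psi_u_new` over C gives the same table as `psi_s_new`, so both clusters agree on the distribution of the shared variable.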


V. RESULTS & DISCUSSION

The 41 features of the KDD’99 data set were reduced to 14 features. PCA identified 12 major components with eigenvalues greater than the cutoff; more than about 80% of the variability of the data is explained by the selected features, while 98% of the variability can be explained by 24 components.

Similarly, $P(S) = \sum_{U \setminus S} \Psi(U)$. Let Ψ(S) represent the modified potential, so that Ψ(S) = P(S). If a potential, say Ψ(V), is altered as a result of new evidence, the potentials Ψ(S) and Ψ(U) can both be updated by exploiting the equivalence

The difference in explained variability between selecting 24 and 14 features is only 18%, but the computational cost increases greatly if 24 parameters are selected, so 14 were selected to optimize processing speed. It is evident from the scree plot in Figure 1 that the first 24 components represent 98.866% of the data and 14 components explain 80% of the variability, which is quite sufficient; work was carried out on these components only, neglecting the other components, which seem less worthy. Besides this, structure learning also supports the selection of 14 features. The Bayesian network model shown in Figure 2 represents the interdependence among the various attributes. It is evident that mainly two factors, count and src_byte, are affected by various features, and these two in turn ultimately affect the attack types. The KDD’99 data set classification lists 18 attack types; however, normal and neptune are the most frequent.
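The choice between 14 and 24 components is a choice of how much cumulative variance to retain. A minimal sketch of that selection rule follows; the eigenvalues are hypothetical and chosen only to make the arithmetic easy to follow, they are not the KDD'99 eigenvalues.

```python
def components_for_variance(eigenvalues, target):
    """Return (k, share): the smallest number of leading components whose
    cumulative share of the total variance is at least `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k, running / total
    return len(ordered), 1.0


# Hypothetical eigenvalues for a 6-feature toy problem.
eigs = [2.5, 1.5, 1.0, 0.5, 0.3, 0.2]
k80, share80 = components_for_variance(eigs, 0.80)
k98, share98 = components_for_variance(eigs, 0.98)
```

In this toy case three components reach the 80% target while all six are needed for 98%, which mirrors the trade-off discussed above: the last few percent of variance cost disproportionately many components.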

$$\sum_{U \setminus S} \Psi(U) = P(S) = \sum_{V \setminus S} \Psi(V)$$

Belief updating in the junction tree is carried out through message passing. Let V and W be two adjacent nodes with separator S; the task is for W to absorb from V through S, updating the potentials Ψ(W) and Ψ(S) subject to the condition

$$\sum_{W \setminus S} \Psi^*(W) = \Psi^*(S) = \sum_{V \setminus S} \Psi^*(V)$$

In absorption, Ψ*(S) and Ψ*(W) are replaced as follows:

$$\Psi^*(S) = \sum_{V \setminus S} \Psi(V), \qquad \Psi^*(W) = \Psi(W)\,\frac{\Psi^*(S)}{\Psi(S)}$$

In this way the belief of the whole network is updated through message passing.

IV. METHODOLOGY

The KDD’99 intrusion detection data set was used. The PCA technique was applied and 14 features were selected on the basis of the analysis. The selection of data sets for training and testing plays a vital role in the accuracy of prediction. In intrusion detection, the frequency of some attacks is very large compared to others. To ensure that all attack types are included in learning, stratified random samples were drawn in proportion to each attack type. This produces better results compared to simple

Figure 1: Scree Plot of attributes.


Figure 4: Prediction Accuracy of Major Attacks (bar chart of actual and predicted record counts per attack category; vertical axis "Numbers", 0 to 3500)

Attack Category    Actual  Predicted
DoS                  2872       2868
Probe                 726        538
R2L                     7          2
U2R                     3          1
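The counts in Figure 4 can be turned into per-category percentages. The sketch below treats the ratio of predicted to actual counts as the detection rate, which is an assumption about how such percentages are derived; it does not check which individual records were classified correctly.

```python
# Actual and predicted counts per major attack category, read from Figure 4.
actual = {"DoS": 2872, "Probe": 726, "R2L": 7, "U2R": 3}
predicted = {"DoS": 2868, "Probe": 538, "R2L": 2, "U2R": 1}


def detection_rate(actual_count, predicted_count):
    """Predicted count as a percentage of the actual count."""
    return 100.0 * predicted_count / actual_count


rates = {cat: round(detection_rate(actual[cat], predicted[cat]), 2) for cat in actual}
```

Under this reading DoS comes out at 99.86% and Probe at 74.1%, while the two rare categories, R2L and U2R, fall far lower because only a handful of records are available for them.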

Figure 2: Bayesian Network Model of the Intrusion Detection System

The BN classifier learned the more frequent attacks more effectively. In identifying normal traffic it showed an error rate of only 0.8%, and for identification of the most frequent attack, neptune, the error rate is 6.8% (see Table 1).

BN classification also supports the importance of these two types, normal (0.527) and neptune (0.399), as shown in Table 2. The probabilities of buffer_overflow, imap and multihop are less than 0.001, and those of ftp_write, guess_password and load_module are close to 0. This suggests that these classes can be merged.

Figure 3: Prediction accuracy using BN Classifier (bar chart of actual and predicted record counts; normal: 4271 actual, 4287 predicted; attack: 3729 actual, 3713 predicted)

Figure 4 shows the predictions for the major attack categories. DoS attacks are 99.86% detected, while probe attacks are about 75% detected.

TABLE 1. ACCURACY OF CLASSIFICATION (BAYESIAN CLASSIFIER)

Class              Actual  Predicted   Diff  Error %
back                   62         62      0        0
buffer_overflow         2          0      2      100
guess_passwd            3          0      3      100
imap                    2          0      2      100
ipsweep               225        284    -59    -26.2
multihop                1          0      1      100
neptune              2630       2587     43      1.6
nmap                   96         35     61     63.5
normal               4271       4287    -16    -0.37
phf                     1          0      1      100
pod                    12          0     12      100
portsweep             186        219    -33    -17.7
rootkit                 1          0      1      100
satan                 219        273    -54    -24.6
smurf                 168        180    -12     -7.1
teardrop               60         39     21       35
warezclient            57         34     23    40.35
warezmaster             4          0      4      100
Total                8000       8000
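The Diff and Error % columns of Table 1 appear to follow the convention Diff = Actual - Predicted and Error % = 100 × Diff / Actual, so a negative value means the class was over-predicted. This reading is inferred from the numbers rather than stated in the text; the sketch below reproduces a few rows under that assumption.

```python
def error_percent(actual, predicted):
    """Signed error as a percentage of the actual count."""
    return 100.0 * (actual - predicted) / actual


# A few (actual, predicted) pairs copied from Table 1.
table1 = {
    "ipsweep": (225, 284),
    "neptune": (2630, 2587),
    "nmap": (96, 35),
    "normal": (4271, 4287),
    "warezclient": (57, 34),
}
errors = {name: round(error_percent(a, p), 2) for name, (a, p) in table1.items()}
```

The rounded values (for example -0.37 for normal and 40.35 for warezclient) agree with the table to the precision printed there.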

TABLE 2. PROBABILITY OF ATTACK (AVERAGE)

Class              Junction Tree  Naïve Bayes Classifier     Diff
back                      0.0102                  0.0086   0.0016
buffer_overflow           0.0008                  0.001   -0.0002
imap                      0.0006                  0.0005   0.0001
ipsweep                   0.0368                  0.0368   0
multihop                  0.0002                  0        0.0002
neptune                   0.3992                  0.3936   0.0056
nmap                      0.0176                  0.0147   0.0029
normal                    0.527                   0.5432  -0.0162
Total                     1                       1

Using the junction tree algorithm, the accuracy of identification is up to 98%. The junction tree also identified neptune as the most frequent attack. The probabilities identified for the various attacks are given in Table 2. It is evident that the probability estimates are almost equal. A statistical comparison showed no significant difference between the two methods. The frequencies of the remaining attacks are very small and their probabilities are close to zero.

[Figure: Probability of Attack. Bar chart of average probability (axis 0 to 0.6) by attack type, with series Avg JT and Naïve Bayes]

VI. CONCLUSION & FUTURE RECOMMENDATIONS

Although Naïve Bayes classifiers assume conditional independence and the junction tree algorithm assumes parameter interdependence, the Naïve Bayes and junction tree classifiers are almost equally effective. It is recommended that only the more frequent attacks be considered in order to achieve better performance. It is also found that appropriate sampling techniques should be used in selecting the learning and testing data sets for better prediction results.

REFERENCES

[1] Moon Sun Shin, Eun Hee Kim, and Keun Ho Ryu, “False Alarm classification model for network-based IDS”, Springer-Verlag Berlin Heidelberg, LNCS 3177, pp. 259–265, 2004.
[2] M.J. Lee, M.S. Shin, H.S. Moon, “Design and implementation of alert analyzer with data mining engine”, Proc. IDEAL ’03, Hongkong, 2003.
[3] A. Valdes and K. Skinner, “Probabilistic alert correlation”, 4th International Symposium on Recent Advances in ID, RAID, 54-68, 2003.
[4] S.M. Aqil Burney and M. Sadiq Ali Khan, “Network Usage Security Policies for Academic Institutions”, International Journal of Computer Applications, October Issue, Published by Foundation of Computer Science, 2010.
[5] Anoop Singhal and Sushil Jajodia, “Data warehousing and data mining techniques for intrusion detection systems”, Distributed and Parallel Databases, Volume 20, Number 2, 149-166, DOI: 10.1007/s10619-006-9496-5, 2006.
[6] Tasleem Mustafa, Ahmed Mateen, Ahsan Raza Sattar, Nauman ul Haq and M. Yahya Saeed, “Forensic Data Security for Intrusions”, European Journal of Scientific Research, ISSN 1450-216X, Vol. 39, No. 2, pp. 296-308, 2010.
[7] Karl Friston, Carlton Chu, Jnaina Mourao, Oliver Hulme, Geriant Rees, Will Penny and John Ashburner, “Bayesian decoding of brain images”, Elsevier NeuroImage, Volume 39, Issue 1, Pages 181-205, January 2008.
[8] Jaydip Sen, “An agent-based intrusion detection system for local area networks”, IJCNIS, Vol. 2, No. 2, August 2010.
[9] F.V. Jensen and T.S. Nielsen, “Bayesian Networks and Decision Graphs”, Springer, Berlin Heidelberg, New York, 2007.
[10] C. Cortes and V. Vapnik, “Support Vector Networks”, Machine Learning, 20, pp. 273-297, 1995.
[11] Jungtaek Seo, “An Attack Classification Mechanism Based on Multiple Support Vector Machines”, LNCS 4706, Part II, pp. 94–103, Springer-Verlag Berlin Heidelberg, ICCSA 2007.
[12] Hebah H. O. Nasereddin, “Stream Data Mining”, International Journal of Web Applications, Volume 1, Number 4, December 2009.

AUTHORS PROFILE

Dr.S.M.Aqil Burney is the Meritorious Professor and approved Supervisor in Computer Science and Statistics by the Higher Education Commission, Govt of Pakistan. He is also the Director & Chairman of the Computer Science Department, University of Karachi. Additionally, he is a Director of the Main Communication Network, University of Karachi. He is also a member of various higher academic boards of different universities of Pakistan. His research interests include AI, Soft Computing, Neural Networks, Fuzzy Logic, Data Mining, Statistics, Simulation and Stochastic Modeling of Mobile Communication Systems and Networks, Network Security and MIS in health services. Dr.Burney is also a referee for various journals and conference proceedings, nationally & internationally. He is member of IEEE(USA), ACM(USA) and

M.Sadiq Ali Khan received his BS & MS degrees in Computer Engineering from SSUET in 1998 and 2003 respectively. Since 2003 he has been serving the Computer Science Department, University of Karachi as an Assistant Professor. He has about 12 years of teaching experience, and his research areas include Data Communication & Networks, Network Security, Cryptography issues and Security in Wireless Networks. He is a member of CSI, PEC and NSP.

Jawed Naseem is Principal Scientific Officer in the Pakistan Agricultural Research Council. He has an M.Sc (Statistics) and an MCS from the University of Karachi, and is currently doing an MS (Computer Science) at the University of Karachi. His research interests are data modeling, Information Management & Security and Decision Support Systems, particularly in agricultural research. He has been a team member in the development of several regional (SAARC) level agricultural databases.