Semi-Naïve Bayesian Method for Network Intrusion Detection System

Mrutyunjaya Panda¹ and Manas Ranjan Patra²

¹ Department of ECE, Gandhi Institute of Engineering and Technology, Gunupur, Orissa-765022, India
² Department of Computer Science, Berhampur University-760007, Orissa, India

Abstract. Intrusion detection can be viewed as a classification task that attempts to classify a request to access network services as safe or malicious. Data mining techniques are being used to extract valuable information that can help in detecting intrusions. In this paper, we evaluate the performance of rule-based classifiers, namely JRip, RIDOR, NNge and Decision Table (DT), together with Naïve Bayes (NB), both individually and in an ensemble approach. We also propose the semi-Naïve Bayesian approach DTNB, which combines Naïve Bayes with the induction of decision tables, to enhance the performance of an intrusion detection system. Experimental results show that the proposed approach is fast, reliable and accurate, with low false positive rates, which are essential features of an efficient network intrusion detection system.

Keywords: Intrusion Detection, Rule Based Classifiers, Hybrid DTNB, Ensemble approach, Accuracy.

1 Introduction

With the growing use of the Internet, information security threats have become one of the most daunting problems. The demand for reliable connections, information integrity and privacy is more intense today than ever before. One possible precaution is the use of an effective Intrusion Detection System (IDS). Data mining is a relatively new approach to intrusion detection; it was first applied to mining audit data in order to build automated intrusion detection models [1]. The raw data is first converted into ASCII network packet information, which in turn is converted into connection-level information. These connection-level records contain connection features such as service, duration and protocol. Data mining algorithms are then applied to these records to create models that detect intrusions. In this paper, we investigate and evaluate the performance of several rule-based classifiers (JRip, RIDOR, NNge and Decision Table (DT)), Bayesian classification using Naïve Bayes (NB), the hybrid DTNB and an ensemble approach. The motivation for the hybrid approach is to improve the detection accuracy of an IDS compared with using the individual approaches. Finally, we apply the AdaBoost algorithm as an ensemble method to all of the above classifiers to further improve intrusion detection accuracy while maintaining a low false positive rate.

The rest of the paper is organized as follows. Related research is presented in Section 2, followed by a short theoretical background on the rule-based classification algorithms in Section 3. A brief introduction to Naïve Bayesian classifiers is given in Section 4. The hybrid classifier and the ensemble approach used in this research are discussed in Section 5. Experimental results and analysis are presented in Section 6, followed by the conclusion in Section 7.

C.S. Leung, M. Lee, and J.H. Chan (Eds.): ICONIP 2009, Part I, LNCS 5863, pp. 614–621, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Related Research

In [2], the authors present a hybrid statistical approach that uses data mining and decision tree classification to identify false alarms. They conclude that their strategy can be used to evaluate and enhance the capability of an IDS to detect, and at the same time respond to, threats and benign traffic in critical network applications. The authors of [3] present two hybrid approaches for modeling an IDS: decision trees and SVMs combined as a hierarchical hybrid intelligent system model (DT-SVM), and an ensemble approach combining the base classifiers. They conclude that the proposed approaches provide more accurate intrusion detection capabilities. Intrusion detection using an ensemble of intelligent paradigms is proposed in [4], where the authors show that an ensemble of ANNs, SVMs and MARS is superior to the individual approaches in terms of classification accuracy. In [5], the authors present an intrusion detection model based on a hybrid of a neural network and C4.5. The key idea is to exploit the different classification capabilities of the neural network and the C4.5 algorithm for different attacks; however, they consider only a few selected attacks from each category in their analysis. A review of various supervised classification techniques is presented in [6]. In [7], the authors propose a hybrid genetic algorithm (GA)/decision tree algorithm for building a network intrusion detection model, which outperforms the decision tree classifier alone. They attribute this improvement to the ability of the hybrid approach to focus on relevant features and eliminate unnecessary and distracting ones. However, the hybrid GA/decision tree algorithm needs to be tested in more depth to establish its true potential. In [8], the authors propose a double multiple-model approach capable of enhancing the overall performance of an IDS, adopting three reasoning methods for the IDS model: Naïve Bayesian, neural nets and decision trees. They conclude that even if a given model outperforms the others on a specific problem, it cannot be expected to produce better results in general. This is especially true for intrusion detection, because a single algorithm often cannot handle all attack classes at the desired accuracy level. A combination of multiple models therefore tries to take advantage of the characteristics of the individual base models to improve the overall performance of an IDS.

3 Rule Based Classifiers

In this section, we focus on some important yet novel rule-based classification algorithms, namely NNge, JRip, RIDOR and Decision Table (DT), which, to the best of our knowledge, have not yet been explored by intrusion detection researchers.

3.1 NNge (Non-Nested Generalized Exemplars)

NNge is a novel algorithm that generalizes exemplars without nesting or overlap. NNge is an extension of NGE [9], which performs generalization by merging exemplars, forming hyperrectangles in feature space that represent conjunctive rules with internal disjunction. NNge forms a generalization each time a new example is added to the database, by joining it to its nearest neighbor of the same class. Details of this algorithm can be found in [10].

3.2 JRip (Extended Repeated Incremental Pruning)

JRip implements a propositional rule learner, "Repeated Incremental Pruning to Produce Error Reduction" (RIPPER), as proposed in [11]; it is similar in principle to the commercial rule learner RIPPER. RIPPER is an extended version of the learning algorithm IREP (Incremental Reduced Error Pruning). Initially, the training examples are partitioned into two subsets, a growing set and a pruning set. The learner starts from an empty rule set and adds rules incrementally; each rule is grown by adding conditions until it covers no negative examples in the growing set, and is then pruned using the pruning set. This approach performs efficiently on large and noisy datasets.

3.3 RIDOR (Ripple-Down Rules)

RIDOR first generates a default rule and then the exceptions to the default rule with the least (weighted) error rate. It then generates the best exceptions for each exception and iterates until no exceptions are left. It thus performs a tree-like expansion of exceptions, in which the leaves have only default rules and no exceptions. The exceptions are a set of rules that predict the instances misclassified by the default rules [12].

3.4 Decision Tables

Decision tables are one of the simplest possible hypothesis spaces, and they are usually easy to understand. A decision table is an organizational or programming tool for representing discrete functions. It can be viewed as a matrix in which the upper rows specify sets of conditions and the lower rows indicate sets of actions to be taken when the corresponding conditions are satisfied; each column, called a rule, thus describes a procedure of the type "if conditions, then actions". Further details about the rule-based classifiers can be found in [13].
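Since the decision table is also one half of the DTNB model in Section 5, a small illustration may help. The Python sketch below treats each distinct combination of values of a chosen attribute subset as one table entry that stores the class counts seen in training, and predicts from the matching entry. It is an illustrative sketch, not the implementation used in this study: the class name `DecisionTable`, the fallback to the overall class distribution when no entry matches, and the Laplace-corrected estimates are our assumptions, and the records are hypothetical KDDCup'99-style nominal features (protocol, service, flag).

```python
from collections import Counter, defaultdict


class DecisionTable:
    """Minimal decision-table classifier over nominal attributes (illustrative).

    Each distinct combination of values of the selected attributes is one
    table entry ("rule"); the entry stores the class counts seen in training.
    """

    def __init__(self, attributes):
        self.attributes = list(attributes)  # indices of attributes used as the table key
        self.table = defaultdict(Counter)   # key tuple -> class counts
        self.class_counts = Counter()       # overall class counts (fallback)
        self.classes = []

    def _key(self, row):
        return tuple(row[a] for a in self.attributes)

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.table[self._key(row)][label] += 1
            self.class_counts[label] += 1
        self.classes = sorted(self.class_counts)
        return self

    def predict_proba(self, row):
        # Use the matching entry if there is one; otherwise fall back to the
        # overall class distribution (an assumed default behaviour).
        counts = self.table.get(self._key(row), self.class_counts)
        total = sum(counts.values())
        k = len(self.classes)
        # Laplace-corrected estimate: (count + 1) / (total + number of classes)
        return {c: (counts[c] + 1) / (total + k) for c in self.classes}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Hypothetical records: (protocol_type, service, flag)
rows = [
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("tcp", "private", "REJ"),
    ("tcp", "private", "REJ"),
    ("udp", "domain_u", "SF"),
]
labels = ["normal", "normal", "dos", "dos", "normal"]

dt = DecisionTable(attributes=[1, 2]).fit(rows, labels)  # key on (service, flag)
print(dt.predict(("tcp", "private", "REJ")))      # -> dos
print(dt.predict_proba(("icmp", "ecr_i", "SF")))  # no entry: overall distribution
```

Choosing which attributes form the key is the real learning problem: fewer attributes give larger, better-populated entries at the cost of coarser rules.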

4 Naïve Bayesian Approach

The Naïve Bayes model is a heavily simplified Bayesian probability model [14]. It considers the probability of an end result (the class) given several related evidence variables (the attributes). The model encodes the probability of the end result together with the probability of each evidence variable occurring given that the end result occurs. Each evidence variable is assumed to be conditionally independent of the other evidence variables given the end result. In [15], the authors examine the circumstances under which the Naïve Bayes classifier performs well, and why. They state that the error results from three factors: training data noise, bias and variance. Training data noise can only be minimized by choosing good training data. The machine learning algorithm must divide the training data into groups; bias is the error due to these groupings being too large, and variance is the error due to the groupings being too small.
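Under the conditional-independence assumption the class score is the prior multiplied by one factor per attribute, P(c | x_1, …, x_n) ∝ P(c) ∏_i P(x_i | c). The Python sketch below shows this for nominal attributes, assuming Laplace-corrected counts (the same correction DTNB uses, see Section 5); the function names and the toy records are hypothetical. Working in log space avoids numerical underflow when many attributes are multiplied.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Collect the counts a nominal-attribute Naive Bayes model needs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def naive_bayes_proba(model, row):
    """Class posterior under the conditional-independence assumption."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    k = len(class_counts)
    log_scores = {}
    for c, n_c in class_counts.items():
        # Laplace-corrected prior P(c)
        score = math.log((n_c + 1) / (n + k))
        for i, value in enumerate(row):
            # Laplace-corrected P(x_i = value | c); attributes contribute
            # independently once the class is known
            count = value_counts[(i, c)][value]
            score += math.log((count + 1) / (n_c + len(domains[i])))
        log_scores[c] = score
    # Normalise in log space to avoid underflow
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


# Hypothetical records: (protocol_type, service, flag)
rows = [
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("tcp", "private", "REJ"),
    ("tcp", "private", "REJ"),
    ("udp", "domain_u", "SF"),
]
labels = ["normal", "normal", "dos", "dos", "normal"]

model = train_naive_bayes(rows, labels)
posterior = naive_bayes_proba(model, ("tcp", "private", "REJ"))
print(max(posterior, key=posterior.get))  # -> dos
```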

5 Proposed Methodology

5.1 Hybrid DTNB: A Semi-Naïve Bayesian Approach

Here we explore the effectiveness of a simple semi-Naïve Bayesian ranking method that combines Naïve Bayes (NB) with the induction of Decision Tables (DT), called hybrid DTNB. This algorithm was recently proposed in [16] and, to the best of our knowledge, has not yet been used by intrusion detection researchers. In this model, both Naïve Bayes and decision tables can be trained efficiently, and the same holds for the combined semi-Naïve Bayes model. Figure 1 shows the architecture of the semi-Naïve Bayesian approach combining DT with NB.

Fig. 1. Semi-Naïve Bayesian Approach (diagram: Input: KDDCup'99 Intrusion Detection Data → Decision Tables (DT) and Naïve Bayes (NB) → Output)
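The combination step and the attribute-split search summarised in the algorithm description that follows can be sketched in Python as below. This is our reading of [16], not code from that work: the DT and NB components each produce class-probability estimates that already include the class prior Q(y), so the combined estimate is taken as Q(y | X) = α · Q_DT(y | X_DT) · Q_NB(y | X_NB) / Q(y), with α a normalising constant. The `score` function passed to `forward_select` is hypothetical; in DTNB it would be the leave-one-out estimate of the combined model's quality.

```python
def dtnb_combine(p_dt, p_nb, prior):
    """Combine DT and NB class-probability estimates (our reading of [16]).

    p_dt:  class -> Q_DT(y | attributes in the decision table)
    p_nb:  class -> Q_NB(y | attributes modelled by Naive Bayes)
    prior: class -> Q(y)
    Both estimates already contain the prior, so it is divided out once.
    """
    raw = {y: p_dt[y] * p_nb[y] / prior[y] for y in prior}
    alpha = 1.0 / sum(raw.values())  # normalising constant
    return {y: alpha * v for y, v in raw.items()}


def forward_select(attributes, score):
    """Greedy forward selection of the attributes handed from the DT to NB.

    score(nb_attrs) is a hypothetical caller-supplied merit function
    (higher is better); DTNB would use a leave-one-out estimate here.
    """
    nb_attrs = frozenset()  # initially every attribute is in the DT
    best = score(nb_attrs)
    while True:
        best_attr = None
        for a in attributes:
            if a in nb_attrs:
                continue
            s = score(nb_attrs | {a})
            if s > best:
                best, best_attr = s, a
        if best_attr is None:  # no single move improves the merit
            return nb_attrs, best
        nb_attrs = nb_attrs | {best_attr}


p_dt = {"normal": 0.25, "dos": 0.75}
p_nb = {"normal": 0.40, "dos": 0.60}
prior = {"normal": 0.60, "dos": 0.40}
combined = dtnb_combine(p_dt, p_nb, prior)
print(combined)

# Toy merit: attributes 0 and 2 help, every attribute moved costs 0.1
chosen, merit = forward_select([0, 1, 2], lambda s: len(s & {0, 2}) - 0.1 * len(s))
print(sorted(chosen))  # -> [0, 2]
```

A useful sanity check: when NB models no attributes, its estimate is just the prior, and the rule returns the decision table's estimate unchanged, which is exactly DTNB's starting point.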

Algorithm Description. The algorithm for learning the combined model (DTNB) proceeds in much the same way as for DTs alone. At each point in the search, it evaluates the merit of splitting the attributes into two disjoint subsets: one for the Naïve Bayes model and the other for the decision table. Forward selection is used: initially all attributes are modeled by the DT, and at each step the selected attributes are modeled by NB and the remainder by the DT. We use leave-one-out cross-validation to evaluate the quality of a split, based on the probability estimates generated by the combined model. In [16], the authors use the AUC (area under the ROC curve) as the performance measure for evaluating classifiers on two-class problems, whereas we use accuracy as our performance measure in a five-class classification task for building a network intrusion detection system. The class probability estimates of the Naïve Bayes and decision table components must be combined to generate overall class probability estimates. All probabilities are estimated using Laplace-corrected observed counts. In addition, a variant that includes attribute selection, which can discard attributes entirely from the combined model, is considered: at each step of the forward selection, an attribute can be discarded rather than added to the NB model.

5.2 Ensemble Approach

The idea here is to apply an ensemble approach that does not rely on a single best classifier to decide on an intrusion; instead, information from several individual classifiers is combined to make the final decision. However, the effectiveness of the ensemble approach depends on the accuracy and diversity of the base classifiers used. The architecture of the proposed ensemble approach is shown in Figure 2.

Fig. 2. Ensemble Approach (diagram components: Intrusion Detection Dataset; rule-based classifiers (JRip, RIDOR, NNge and DT); Naïve Bayes (NB); Hybrid Model; Ensemble Approach; Output)
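Section 1 states that AdaBoost is used as the ensemble method. Assuming the multi-class AdaBoost.M1 variant, the Python sketch below shows its weighting scheme: each round fits a weak learner on the weighted data, multiplies the weights of correctly classified records by β = ε/(1 − ε), where ε is the weighted error, and gives that learner a vote of log(1/β). For brevity it uses one-feature decision stumps as the weak learner in place of the JRip, RIDOR, NNge, DT, NB and DTNB base classifiers used in this study; `fit_stump`, `adaboost_m1` and `predict` are illustrative names, and the data are a toy sequence.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w):
    """Weighted one-feature threshold split; each side predicts its
    weighted-majority class."""
    classes = sorted(set(y))
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left, right = defaultdict(float), defaultdict(float)
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            cl = max(classes, key=lambda c: left[c])
            cr = max(classes, key=lambda c: right[c])
            err = (sum(left.values()) - left[cl]) + (sum(right.values()) - right[cr])
            if best is None or err < best[0]:
                best = (err, j, t, cl, cr)
    _, j, t, cl, cr = best
    return lambda row: cl if row[j] <= t else cr


def adaboost_m1(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # (vote weight, weak classifier)
    for _ in range(rounds):
        h = fit_stump(X, y, w)
        wrong = [h(row) != label for row, label in zip(X, y)]
        err = sum(wi for wi, bad in zip(w, wrong) if bad)
        if err == 0:
            return [(1.0, h)]  # a perfect weak learner needs no boosting
        if err >= 0.5:
            if not ensemble:
                ensemble.append((1.0, h))
            break  # AdaBoost.M1 requires weighted error below 1/2
        beta = err / (1.0 - err)
        # Shrink the weights of correctly classified records, then renormalise
        w = [wi if bad else wi * beta for wi, bad in zip(w, wrong)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((math.log(1.0 / beta), h))
    return ensemble


def predict(ensemble, row):
    votes = defaultdict(float)
    for alpha, h in ensemble:
        votes[h(row)] += alpha
    return max(votes, key=votes.get)


X = [[1], [2], [3], [4], [5], [6]]
y = ["a", "a", "b", "b", "a", "a"]
model = adaboost_m1(X, y, rounds=3)
print([predict(model, row) for row in X])
```

On this toy sequence no single stump is perfect, but the weighted vote of three boosted stumps reproduces all six labels, which is the effect the ensemble approach relies on.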

6 Experimental Results and Discussion

We use the KDDCup 1999 intrusion detection benchmark dataset for our experiments. The dataset contains 24 attack types and 41 attributes. We randomly selected 1000 connection records covering all intrusion types, taking care to include all the rare attacks that fall under the U2R and R2L categories. We perform five-class classification (Normal, Probe, DoS, U2R and R2L) in building the network intrusion detection system. The full dataset is used as training data to build the intrusion detection models, while 10-fold cross-validation is used to assess the efficacy of the models built in the training phase. All experiments are carried out on a Pentium 4 IBM PC with a 2.8 GHz CPU, a 40 GB HDD and 512 MB of RAM.

6.1 Comparison of Results

Table 1 shows that the DTNB approach improves the detection rate of the Naïve Bayes classifier for Normal, Probe and U2R traffic, whereas it fails to perform well for DoS and R2L attacks. DTNB also does not perform as well as the NNge rule-based classifier on our intrusion detection dataset. We therefore apply the ensemble approach to the semi-Naïve Bayesian method in order to build an efficient intrusion detection system; the results are shown in Table 2. Table 2 shows that the performance of hybrid DTNB improves with the ensemble approach. The ensembled DTNB also produces a better detection rate in all five classes than the individual DT and NB approaches. However, it still produces a lower detection rate for the rare attacks than the NNge rule-based classifier. The low root mean square error (RMSE) and high kappa value make the proposed approach attractive for designing a network intrusion detection system. Other performance measures are also presented in Table 2 to provide a comparative view of the performance of each classifier under consideration.

Table 1. Performance Comparison of Classifiers

Measure   Class   JRip       RIDOR      NNge       DT         NB         Hybrid DTNB
DR        Normal  0.9835     0.9859     0.9835     0.9859     0.96       0.979
DR        Probe   0.5625     0.75       0.6562     0.4375     0.279      0.406
DR        DoS     0.998      1.0        1.0        0.9684     0.984      0.972
DR        U2R     0.25       0.0        0.75       0.4444     0.0        0.5
DR        R2L     0.353      0.4706     0.647      0.353      0.353      0.294
RR        Normal  0.9698     0.979      0.9721     0.9188     0.927      0.941
RR        Probe   0.9        0.75       0.7241     0.8235     0.414      0.684
RR        DoS     0.9656     0.99       0.998      0.9723     0.958      0.974
RR        U2R     0.8333     0.8333     0.7273     1.0        0.0        1.0
RR        R2L     0.6666     0.6666     0.9166     0.75       1.0        0.263
FPR       Normal  0.0126     0.0107     0.0126     0.0124     0.031      0.017
FPR       Probe   0.0142     8.35×10⁻³  0.0114     0.0182     0.032      0.019
FPR       DoS     2.23×10⁻³  0.0        0.0        0.033      0.017      0.0288
FPR       U2R     4×10⁻³     4×10⁻³     1.02×10⁻³  5×10⁻³     1.0        0.005
FPR       R2L     0.0111     9.15×10⁻³  6.1×10⁻³   0.011      0.011      0.001
FNR       Normal  0.0302     0.021      0.0279     0.081      0.073      0.058
FNR       Probe   0.1        0.25       0.2759     0.1765     0.586      0.31
FNR       DoS     0.0343     9.76×10⁻³  1.97×10⁻³  0.0277     0.042      0.026
FNR       U2R     0.1666     0.1666     0.2727     0.0        0.0        0.0
FNR       R2L     0.3333     0.3333     0.0833     0.25       0.0        0.73
F-Value   Normal  0.9766     0.9824     0.9777     0.9512     0.943      0.96
F-Value   Probe   0.6923     0.75       0.6885     0.5714     0.333      0.51
F-Value   DoS     0.9815     0.9951     0.999      0.9703     0.971      0.973
F-Value   U2R     0.6666     0.6666     0.8        0.615      0.0        0.615
F-Value   R2L     0.4616     0.5517     0.7586     0.48       0.522      0.278
Kappa             0.9308     0.9477     0.9495     0.8915     0.87       0.887
Time (s)          0.49       0.72       0.23       0.34       0.08       0.92
RMSE              0.0601     0.0537     0.0546     0.0919     0.072      0.0855
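The measures in Table 1 are not defined in the text. In the reported values, FNR equals 1 − RR in nearly every cell (e.g. JRip, Normal: 1 − 0.9698 = 0.0302), and the F-value matches the harmonic mean of DR and RR (JRip, Normal: 2 · 0.9835 · 0.9698 / (0.9835 + 0.9698) ≈ 0.9766), which suggests that DR is computed as per-class precision and RR as per-class recall. Under that assumption, the per-class measures can be derived from a confusion matrix as in the Python sketch below; the function name and the three-class matrix are hypothetical.

```python
def per_class_metrics(confusion, classes):
    """Per-class measures from a confusion matrix.

    confusion[i][j] = number of records of true class i predicted as class j.
    DR is computed as precision and RR as recall (an assumption).
    """
    total = sum(sum(row) for row in confusion)
    result = {}
    for k, name in enumerate(classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # class k predicted as another class
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as class k
        tn = total - tp - fn - fp
        dr = tp / (tp + fp) if tp + fp else 0.0
        rr = tp / (tp + fn) if tp + fn else 0.0
        result[name] = {
            "DR": dr,
            "RR": rr,
            "FPR": fp / (fp + tn) if fp + tn else 0.0,
            "FNR": fn / (tp + fn) if tp + fn else 0.0,  # equals 1 - RR
            "F": 2 * dr * rr / (dr + rr) if dr + rr else 0.0,
        }
    return result


classes = ["normal", "probe", "dos"]
confusion = [
    [50, 2, 0],  # true normal
    [1, 8, 1],   # true probe
    [0, 0, 38],  # true dos
]
for name, m in per_class_metrics(confusion, classes).items():
    print(name, {key: round(v, 4) for key, v in m.items()})
```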

Table 2. Ensemble Approach

                  Ensemble Approach with Base Classifiers
Measure   Class   JRip       RIDOR      NNge       DT         NB         DTNB
DR        Normal  0.9929     0.986      0.9812     0.993      0.957      0.995
DR        Probe   0.6875     0.375      0.6875     0.75       0.791      0.854
DR        DoS     0.998      1.0        0.998      1.0        1.0        1.0
DR        U2R     0.75       0.5        0.75       0.5        0.5        0.6
DR        R2L     0.353      0.412      0.647      0.42       0.412      0.47
RR        Normal  0.9635     0.9813     0.972      0.979      0.958      0.972
RR        Probe   0.88       0.8        0.7333     0.857      0.654      0.921
RR        DoS     0.9883     0.9512     0.996      0.9826     0.998      1.0
RR        U2R     0.8571     0.8        0.8        1.0        0.667      0.67
RR        R2L     0.75       0.7        0.846      0.78       0.875      0.888
FPR       Normal  5.56×10⁻³  0.01       0.014      5.52×10⁻³  0.031      3.6×10⁻³
FPR       Probe   0.01       0.02       0.01       8.28×10⁻³  9.62×10⁻³  6.2×10⁻³
FPR       DoS     2.09×10⁻³  0.0        2.08×10⁻³  0.0        0.0        0.0
FPR       U2R     3.04×10⁻³  5×10⁻³     1.02×10⁻³  3.03×10⁻³  2×10⁻³     2.03×10⁻³
FPR       R2L     0.011      0.01       6.11×10⁻³  0.01       0.01       9.1×10⁻³
FNR       Normal  0.0365     0.0187     0.028      0.021      0.042      0.027
FNR       Probe   0.12       0.2        0.0266     0.143      0.346      0.079
FNR       DoS     0.0117     0.0487     3.94×10⁻³  0.017      1.97×10⁻³  0.0
FNR       U2R     0.1428     0.2        0.2        0.0        0.333      0.333
FNR       R2L     0.25       0.3        0.154      0.222      0.125      0.111
F-Value   Normal  0.978      0.984      0.9766     0.986      0.957      0.994
F-Value   Probe   0.772      0.51       0.71       0.8        0.716      0.886
F-Value   DoS     0.993      0.975      0.997      0.9912     0.999      1.0
F-Value   U2R     0.75       0.572      0.842      0.803      0.571      0.633
F-Value   R2L     0.48       0.52       0.733      0.546      0.56       0.625
Kappa             0.947      0.9477     0.9442     0.9489     0.921      0.958

Time taken (s) and RMSE: the per-classifier placement of these values is not recoverable from the source layout; in source order they are: time 2.66, 0.94, 1.53, 0.59, 10.6, 3.28; RMSE 0.0512, 0.0481, 0.058, 0.0451, 0.0502, 0.0482.
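Section 6.1 cites the low RMSE and high kappa value of the ensembled DTNB, but neither measure is defined in the paper. Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from the row and column totals, κ = (p_o − p_e)/(1 − p_e). For RMSE we assume the error is taken between the predicted class-probability vector and the one-hot true class, averaged over classes and instances; we believe this matches Weka's evaluation output (the classifier names used here appear to be Weka implementations), but that is an assumption. The Python sketch below implements both on hypothetical inputs.

```python
import math


def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows = true class)."""
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: product of marginal proportions, summed over classes
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def rmse_probabilities(prob_vectors, true_indices):
    """RMSE between predicted class-probability vectors and one-hot targets,
    averaged over classes and instances (assumed convention)."""
    total = 0.0
    for probs, t in zip(prob_vectors, true_indices):
        sq = sum((p - (1.0 if k == t else 0.0)) ** 2 for k, p in enumerate(probs))
        total += sq / len(probs)
    return math.sqrt(total / len(prob_vectors))


confusion = [[50, 2, 0], [1, 8, 1], [0, 0, 38]]
print(round(cohen_kappa(confusion), 4))  # -> 0.9306
print(rmse_probabilities([[1.0, 0.0, 0.0], [0.2, 0.7, 0.1]], [0, 1]))
```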

7 Conclusion

In this research, we have investigated some new techniques for network intrusion detection and evaluated their performance on the KDDCup 1999 benchmark intrusion detection dataset. We explored rule-based classifiers and Naïve Bayes as intrusion detection models. Next, we designed a semi-Naïve Bayesian approach, hybrid DTNB, by combining Decision Table (DT) and Naïve Bayes (NB), and an ensemble approach with all the rule-based classifiers and hybrid DTNB as base classifiers. The experimental results reveal that the proposed ensemble approach for semi-Naïve Bayesian classification performs well for Normal, Probe and DoS traffic; for Normal and DoS, the detection rate is almost 100%. This result suggests that, by choosing proper base classifiers, 100% accuracy might be possible for the other classes too.

References

[1] MIT Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval/
[2] Annur, N.B., Sallehudin, H., Gani, A., Zakari, O.: Identifying false alarm for network intrusion detection system using hybrid data mining and decision tree. Malaysian Journal of Computer Science 21(2), 101–115 (2008)
[3] Peddabachigari, S., Abraham, A., Grosan, C., Thomas, J.: Modelling IDS using hybrid intelligent systems. Journal of Network and Computer Applications 30(1), 114–132 (2007)
[4] Mukkamala, S., Sung, A.H., Abraham, A.: Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications 28, 167–182 (2005)
[5] Pan, Z.-S., Chen, S.-C., Hu, G.-B., Zhang, D.-Q.: Hybrid neural network and C4.5 for misuse detection. In: Proc. of International Conference on Machine Learning and Cybernetics, Xi'an, November 2–5, pp. 2463–2467. IEEE Press, USA (2003)
[6] Kotsiantis, S.B.: Supervised machine learning: A review of classification techniques. Informatica 31, 249–268 (2007)
[7] Stein, G., Chen, B., Wu, A.S., Hua, K.A.: Decision tree classifier for network intrusion detection with GA-based feature selection. In: Proc. of the 43rd Annual Southeast Regional Conference, Kennesaw, Georgia, vol. 12, pp. 136–141 (2005)
[8] Katar, C.: Combining multiple techniques for intrusion detection. Intl. Journal of Comp. Sc. and Net. Security (IJCSNS) 6(2B), 208–218 (2006)
[9] Salzberg, S.: A nearest hyperrectangle learning method. Machine Learning 6, 277–309 (1991)
[10] Roy, S.: Nearest Neighbour with generalization, Christchurch, NZ (2002)
[11] Cohen, W.W.: Fast effective rule induction. In: 12th Intl. Conf. on Machine Learning, pp. 115–123 (1995)
[12] Gaines, B.R., Compton, P.: Induction of Ripple-Down rules applied to modelling large databases. Journal of Intelligent Information Systems 5(3), 221–228 (1995)
[13] Panda, M., Patra, M.R.: Ensembling rule based classifiers for detecting network intrusions. In: International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom 2009), Kerala, India. IEEE Computer Society Press, USA (2009)
[14] Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. International Edition. Pearson US Imports and PHIPES, London (2002)
[15] Domingos, P., Pazzani, M.J.: On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29(2-3), 103–130 (1997)
[16] Hall, M., Frank, E.: Combining Naïve Bayes and Decision Tables. In: Wilson, D.L., Chad, H. (eds.) Proc. of the 21st Intl. Florida Artificial Intelligence Research Society Conference (FLAIRS), pp. 318–319. AAAI Press, Menlo Park (2008)
