SHORT PAPER International Journal of Recent Trends in Engineering, Vol. 1, No. 1, May 2009
EVALUATING MACHINE LEARNING ALGORITHMS FOR DETECTING NETWORK INTRUSIONS 1
Mrutyunjaya Panda1 and Manas Ranjan Patra2
1 Department of ECE, Gandhi Institute of Engineering and Technology, Gunupur, Orissa-765022, India. Email: [email protected]
2 Department of Computer Science, Berhampur University, Orissa, India. Email: [email protected]

Abstract- With recent advances in network-based technology and the increased dependence of our everyday life on this technology, assuring reliable operation of network-based systems is very important. Signature-based intrusion detection systems, the most widely used and developed ones, cannot detect new attacks. Current anomaly-based intrusion detection systems are also unable to detect all kinds of new attacks because they are designed for restricted applications in limited environments. Increasing the detection rate and reducing the false positive rate are important problems in network intrusion detection systems (NIDS). In this paper, we propose machine learning algorithms such as Random Forest and AdaBoost, along with Naïve Bayes, to build an efficient intrusion detection model. We also report our experimental results over the KDDCup’1999 data sets. The results show that the choice of any data mining algorithm is a compromise among the time taken to build the model, the detection rate and the false alarm rate.

Index Terms- Intrusion Detection, Machine Learning algorithms, Precision-Recall Characteristics, ROC, Cost Matrix.

I. INTRODUCTION
According to [1], intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions. Intrusions are defined as attempts to compromise the confidentiality or availability, or to bypass the security mechanisms, of a computer or network.

Data mining based intrusion detection techniques generally fall into one of two categories: misuse detection and anomaly detection. In misuse detection, each instance in a data set is labeled as ‘normal’ or ‘intrusion’ and a learning algorithm is trained over the labeled data. These techniques are able to automatically retrain intrusion detection models on different input data that include new types of attacks, as long as they have been labeled appropriately. Unlike signature-based intrusion detection systems, models of misuse are created automatically, and can be more sophisticated and precise than manually created signatures. A key advantage of misuse detection techniques is their high degree of accuracy in detecting known attacks and their variations. Their obvious drawback is the inability to detect attacks whose instances have not yet been observed. Anomaly detection, on the other hand, builds models of normal behavior, and automatically detects any deviation from it, flagging the latter as suspect. Anomaly detection techniques thus identify new types of intrusions as deviations from normal usage [2]. While these techniques are an extremely powerful and novel tool, a potential drawback is their rate of false positives (false alarm rate). This happens primarily because previously unseen (yet legitimate) system behaviors may also be recognized as anomalies, and hence flagged as potential intrusions.

After a brief review of our research in building predictive models for learning from rare classes, the paper gives a comparative study of several anomaly detection schemes for identifying novel network intrusions. In this paper, apart from Naïve Bayes, ensemble learning algorithms such as Random Forest and AdaBoost are used in order to build an efficient network intrusion detection model. Experimental results on the KDDCup’1999 data sets demonstrate how the choice of machine learning algorithm affects the efficiency of such an intrusion detection model.

The rest of the paper is organized as follows. Section 2 discusses related work; Section 3 introduces the data mining algorithms used; Section 4 describes the experimental design; Section 5 evaluates our intrusion detection model through experiments; and Section 6 concludes the paper and outlines future work.

II. RELATED WORKS
Warrender et al. [3] have proposed several intrusion detection methods based upon system call trace data. They tested a method that uses sliding windows to build a database of normal sequences against which test instances are checked. They then used a similar method to compare windows in the test instances against the database and classify instances according to those in the normal sequence database. The function requires sequential analysis of a window of system calls for each call made by a process. This requires the maintenance of a large database of normal system call trace sequences. Most intrusions occur over the network, using network protocols to attack their targets. Twycross proposed applying a new paradigm in immunology, Danger Theory, to the development of an intrusion detection system [4]. Alves et al. [5] present a classification-rule discovery algorithm integrating artificial immune system (AIS) and
472 © 2009 ACADEMY PUBLISHER
fuzzy systems. For example, during a certain intrusion, a hacker follows fixed steps to achieve his intention: he first sets up a connection from a source IP address to a target IP address, and then sends data to attack the target. Chang et al. have focused on the combination of data reduction and classification with a query-based learning methodology, which can reduce processing time, communications overhead and storage requirements in mining network intrusions [6].

The authors of [7] used four different learning algorithms to produce a set of classifiers for their evaluation experiment. The J48 algorithm produced one pruned and one unpruned decision tree classifier. Both subtree raising and reduced-error pruning were applied to the first classifier. One of WEKA’s nearest neighbor implementations, called IBk, was used to produce one classifier based on one neighbor (IB1) and another classifier based on ten neighbors (IB10). The back propagation (Neural Network) algorithm was used to produce five different neural network classifiers, each with a different combination of the number of hidden nodes (2 or 3) and the number of epochs for training. The Naïve Bayes algorithm was also used to produce one classifier. Mitchell [8] argues that this algorithm is known to perform comparably with decision tree and neural network learning in some domains. This makes it an interesting algorithm to use in experiments concerning the evaluation of classifiers using a measure function. Panda and Patra [9] have compared the performance of Naïve Bayes with the Neural Network approach and found it suitable for building an intrusion detection model.

III. MACHINE LEARNING ALGORITHMS
In this section, we describe the methods employed in the proposed framework, and illustrate how to apply these methods to build an efficient intrusion detection system model.

A. Overview of the framework
The proposed framework applies machine learning algorithms to build patterns for network intrusion detection. In the initial phase, the system captures packets from network traffic. The features for each connection are constructed by the pre-processors from the captured network traffic. Then, we feed the training dataset into the pattern builder module, which builds the patterns of intrusions. After mining the patterns for intrusions, the module outputs the patterns as the input to the detector module. Then, in the detector module, the connections are classified as different intrusions or as normal traffic using the patterns built in the initial phase. Finally, the system raises an alert when it detects any intrusion.

B. Ensemble Learning
Ensemble learning methods are general strategies for improving classifier and predictor accuracy by using a combination of models. Each combines a series of T learned models (classifiers or predictors), M1, M2, …, MT, with the aim of creating an improved composite model M*.

AdaBoost
Boosting is a general method for improving the accuracy of any given learning algorithm. Boosting refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb. Boosting has its roots in a theoretical framework for studying machine learning called the “PAC” learning model [10]-[11]. Work in this model first posed the question of whether a “weak” learning algorithm, which performs just slightly better than guessing in the PAC model, can be “boosted” into an arbitrarily accurate “strong” learning algorithm. Finally, the AdaBoost algorithm, introduced by [12], solved many of the practical difficulties of the earlier boosting algorithms, and is the focus of this paper.

The algorithm takes as input a training set (x1, y1), …, (xm, ym), where each xi belongs to some domain or instance space X, and each label yi is in some label set Y. For most of this paper, we assume Y = {-1, +1}; later, we discuss extensions to the multiclass case. AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds t = 1, …, T. One of the main ideas of the algorithm is to maintain a distribution or set of weights over the training set. The weight of this distribution on training example i on round t is denoted Dt(i). Initially, all weights are set equally, but on each round, the weights of incorrectly classified examples are increased so that the weak learner is forced to focus on the hard examples in the training set. The weak learner’s job is to find a weak hypothesis ht : X → {-1, +1} appropriate for the distribution Dt. The goodness of a weak hypothesis is measured by its error

$$ \epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i] = \sum_{i : h_t(x_i) \neq y_i} D_t(i). \qquad (1) $$

Notice that the error is measured with respect to the distribution Dt on which the weak learner was trained. In practice, the weak learner may be an algorithm that can use the weights Dt on the training examples. Alternatively, when this is not possible, a subset of the training examples can be used to train the weak learner.

Once the weak hypothesis ht has been received, AdaBoost chooses a parameter αt. Intuitively, αt measures the importance that is assigned to ht. In the standard binary formulation [12], αt = ½ ln((1 − εt)/εt). Note that αt ≥ 0 if εt ≤ 1/2 (which can be assumed without loss of generality), and that αt gets larger as εt gets smaller. The distribution Dt is next updated using a rule whose effect is to increase the weight of examples misclassified by ht, and to decrease the weight of correctly classified examples. Thus, the weight tends to concentrate on “hard” examples. The final hypothesis H is a weighted majority vote of the T weak hypotheses, where αt is the weight assigned to ht.

Random Forest
Random forest is an ensemble of unpruned classification or regression trees, induced from bootstrap samples of the training data, using random feature selection in the tree induction process [13]. Prediction is made by aggregating (majority vote for classification or averaging for regression) the predictions of the ensemble. Random forest generally exhibits a substantial performance improvement over single-tree classifiers such as CART and C4.5. It yields a generalization error rate that compares favorably to AdaBoost, yet is more robust to noise. However, like most classifiers, random forest can also suffer when learning from an extremely imbalanced training data set. As it is constructed to minimize the overall error rate, it will tend to focus more on the prediction accuracy of the majority class, which often results in poor accuracy for the minority class.
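As a rough illustration of the bootstrap sampling and majority voting described above, the following Python sketch draws one bootstrap sample and aggregates a few class votes. It is not the implementation used in the experiments: the function names, the seed, the sample size and the class labels are all hypothetical. For n cases, a bootstrap sample is expected to leave out a fraction of about (1 − 1/n)^n ≈ e^(−1) ≈ 0.37 of them.

```python
import random
from collections import Counter


def bootstrap_sample(n_cases, rng):
    """Draw n_cases indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]


def out_of_bag_fraction(n_cases, rng):
    """Fraction of cases that never appear in one bootstrap sample."""
    in_bag = set(bootstrap_sample(n_cases, rng))
    return (n_cases - len(in_bag)) / n_cases


def majority_vote(predictions):
    """Aggregate per-tree class predictions by majority vote."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
oob = out_of_bag_fraction(10_000, rng)
vote = majority_vote(["dos", "normal", "dos"])  # hypothetical tree outputs

print(f"out-of-bag fraction: {oob:.3f}")
print(f"ensemble prediction: {vote}")
```

The left-out cases cost nothing extra to identify, since they are simply the indices that the sampling step never drew.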
In random forests, there is no need for cross validation or a separate test set to get an unbiased estimate of the test error. Since each tree is constructed using a bootstrap sample, approximately one third of the cases are left out of the bootstrap sample and not used in training. These cases are called out-of-bag (oob) cases. The oob cases are used to get a running unbiased estimate of the classification error as trees are added to the forest.

The error rate of a forest depends on the correlation between any two trees and the strength of each tree in the forest. Increasing the correlation increases the error rate of the forest. The strength of a tree relies on the error rate of the tree. Increasing the strength decreases the error rate of the forest. As the forest is grown, features are selected at random out of all the features in the training data.

Naïve Bayes
The Naïve Bayes model is a heavily simplified Bayesian probability model [14]. In this model, consider the probability of an end result given several related evidence variables. The probability of the end result is encoded in the model along with the probability of the evidence variables occurring given that the end result occurs. The probability of an evidence variable given that the end result occurs is assumed to be independent of the probability of other evidence variables given that the end result occurs. This method is important for several reasons. It is very easy to construct, not needing any complicated iterative parameter estimation schemes. This means it may be readily applied to huge data sets. It is easy to interpret, so users unskilled in classifier technology can understand why it is making the classification it makes. And finally, it often does surprisingly well: it may not be the best possible classifier in any particular application, but it can usually be relied on to be robust and to do quite well. A general discussion of the Naïve Bayes method and its merits is given in [15]-[16].

According to Pearl [17], Bayesian classification is based on the inferences of probabilistic graphical models, which specify the probabilistic dependencies underlying a particular model using a graph structure. In its simplest form, a probabilistic graphical model is a graph in which nodes represent random variables, and the arcs represent conditional dependence assumptions. Hence it provides a compact representation of joint probability distributions. An undirected graphical model is called a Markov network, while a directed graphical model is called a Bayesian network or a Belief Network [18]. Once a probabilistic network is built, one can derive the probability of an event, conditioned on a set of observations, for classification. Naïve Bayesian classification assumes attribute independence. It thus makes computation possible and yields optimal classifiers when the assumptions are satisfied. As per Yang [19], however, the independence assumption is seldom satisfied in practice, as attributes (variables) are often correlated.

To explore the probabilistic dependencies which underlie a particular model, learning a Bayesian network from data can be subdivided into parameter learning and structural learning, the latter being the more difficult. Recently, there has been significant work on methods whereby both the structures and the parameters of graphical models can be learned directly from databases [18]. The problem of learning a probabilistic model is to find a network that best matches the given training data set. To exhaustively explore the dependencies among attributes, a complete graph, where every attribute is connected to every other attribute, is favorable. However, such networks do not provide any useful representation of the independence assertions in the learned distributions, and they overfit the training data [20]. Hence, the networks are learned according to certain scoring functions to approximate those dependency patterns which dominate the data.

For a given imbalanced data set, dependency patterns inherent in the small classes are usually not significant and are hard to encode adequately in the networks. When the learned networks are used for classification, the samples of the small classes are most likely to be misclassified. Experimental results have confirmed this observation [21].

Consider the following example, where a farmer has a bottle of milk that can be either infected or clean. She also has a test that can determine with a high probability whether the milk is infected or not (i.e. the outcome of the test is either positive or negative). This situation can be represented with two random Boolean variables, infected and positive. The variable infected is true when the milk is actually infected and false otherwise. The variable positive is true when the test claims that the milk is infected and false when the outcome of the test is negative. Note that it is possible that the milk is clean when the test has a positive outcome, and vice versa. A possible Bayesian network that models this situation consists of two nodes, one for each variable, joined by an arc from infected to positive. The node that represents the outcome of the test is often called an information variable. The state of such variables is usually given or can be measured in a straightforward manner. The node that represents the actual state of the milk is called a hypothesis variable. The states of such variables cannot be obtained immediately. The purpose of a Bayesian network is to allow one to calculate the probability of the hypothesis variables given the evidence gathered from the information variables. In our example, the farmer might want to calculate the probability that the milk is infected given a positive test result. By entering the evidence (e.g., positive is true) into the Bayesian network, the probability that infected is true can be derived. The numerical value of this probability is called the a-posteriori probability, because it is obtained given the support of the observed evidence. Intuitively, one would expect a high value, especially considering that the test is very accurate. However, the low initial probability of the milk being infected, called the a-priori probability because it holds before any observations are made, must also be taken into account, and it pulls the a-posteriori probability down.
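The calculation in the milk example can be made concrete with Bayes' rule. The sketch below is only illustrative: the a-priori probability (0.01), the test sensitivity (0.99) and the false positive rate (0.05) are assumed values chosen for the example, not figures taken from this paper, and the function name is hypothetical.

```python
def posterior_infected(prior, sensitivity, false_positive_rate):
    """P(infected | positive) by Bayes' rule.

    prior               -- P(infected), the a-priori probability
    sensitivity         -- P(positive | infected)
    false_positive_rate -- P(positive | not infected)
    """
    # Total probability of observing a positive test result.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


# Assumed, illustrative numbers (not from the paper).
p = posterior_infected(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(infected | positive) = {p:.3f}")
```

With these assumed numbers the a-posteriori probability comes out at roughly 0.17, even though the test is right 99% of the time on infected milk, because clean milk is so much more common than infected milk.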
IV. EXPERIMENTAL DESIGN
In our experiments, we used network connection data from MIT Lincoln Laboratory to test the anomaly intrusion detection model.

A. Data Sets
The network data used for testing is distributed by MIT Lincoln Laboratory for the 1998 DARPA evaluation [22]. The data contains traffic in a simulated military network that consists of hundreds of hosts. The data includes 7 weeks of training data and 2 weeks of test data that were not drawn from the same probability distribution as the training set. In our experiments, we use only the 10% KDDCup’1999 data set, which contains 65525 connections. As suggested in [23], a connection is a sequence of TCP packets starting and ending at some well-defined times, between which data flows from a source IP address to a target IP address under some well-defined protocol. In the 10% subset, each network connection is labeled either as normal or as exactly one specific kind of attack. More details about the data set can be found in [24].

B. Performance Measurement
In learning from extremely imbalanced data, the overall classification accuracy is often not an appropriate measure of performance. A trivial classifier that predicts every case as the majority class can still achieve very high accuracy. We use metrics such as true negative rate, true positive rate, G-Mean, Precision, Recall, and F-Measure to evaluate the performance of learning algorithms on imbalanced data. These metrics have been widely used for comparison.

We also use the ROC (Receiver Operating Characteristic) and P-ROC (Precision-Recall Operating Characteristic) curves to compare the performance of the ensemble approaches with Naïve Bayes classification, in order to build an efficient intrusion detection model. The ROC curve, summarized by the AUC (area under the ROC curve), is a graphical representation of the trade-off between the detection rate and the false alarm rate; it allows a performance evaluation independent of costs and priors by integrating performance over a range of decision thresholds. The P-ROC represents the characteristics between Precision and Recall. The precision is in fact dependent on the priors, i.e. a new operating characteristic is obtained if the priors vary, as opposed to the ROC, where thresholds and priors are synonymous [25].

Table 1. Cost Matrix

          Normal  Probe  DoS  U2R  R2L
Normal    0       1      2    2    2
Probe     1       0      2    2    2
DoS       2       1      0    2    2
U2R       3       2      2    0    2
R2L       4       2      2    2    0

Finally, we use the cost matrix [26] published in KDD’99, shown in Table 1, to measure the damage of misclassification, where Mij denotes the number of samples in class i misclassified as class j, and Cij indicates the corresponding cost in the cost matrix. Let N be the total number of samples used for the experimentation. The cost that indicates the average damage of misclassification for each connection is computed as:

$$ \mathrm{Cost} = \frac{1}{N} \sum_{i,j} (M_{ij} \times C_{ij}) \qquad (2) $$

V. PERFORMANCE EVALUATION AND COMPARISON
To evaluate the performance of our method and compare it with other methods, we summarize the performance measurement parameters for the four attack categories in Table 2, in comparison with other methods reported in [9] and [27]. We carried out the experiments on a computer with a 2.8 GHz Pentium 4 CPU and 512 MB of DDR memory.

A. Discussion
Different misclassifications have different levels of consequences. For example, misclassifying R2L as Normal is more dangerous than misclassifying DoS as Normal. We use the confusion matrix in Table 2 to measure the testing of the model in more detail. The advantage of using this matrix is that it tells us not only how many samples were misclassified but also which misclassifications occurred. In this paper, we also use the cost matrix in Table 1, published in KDD’99, to measure the damage of misclassification. We also calculate the error rate, which is only an estimate of the true error rate and is expected to be a good estimate if the number of test data is large and representative of the population. It is defined as:

$$ \text{Error Rate} = \frac{\text{TotalTestData} - \text{TotalCorrectlyClassifiedData}}{\text{TotalTestData}} \qquad (3) $$

Table 2 compares the performance of the different algorithms on the network audit data. The results show that Naïve Bayes provides better results than AdaBoost and Random Forest in terms of detection rate, recall rate and F-measure. However, in terms of G-mean, the ensemble approaches provide good results in detecting Probe and DoS attacks but perform poorly in detecting U2R and R2L attacks.

From the ROC curves shown in Figure 1, it can be observed that our earlier work on the Naïve Bayes algorithm [9] performs well in comparison to its Neural Network counterpart in terms of high detection rate and less time taken to build the model. In this paper, we have compared our proposed ensemble approaches with Naïve Bayesian classification in order to build an efficient network intrusion detection model. The results shown in Figure 2 and Figure 3 illustrate their relative effectiveness. In the same way, it can be observed from Table 3 that the Random Forest algorithm outperforms the AdaBoost and Naïve Bayes algorithms, as well as the best KDD result, in regard to overall error rate and cost, while taking more time to build the intrusion detection model.

VI. CONCLUSION AND FUTURE WORK
In this paper, we employ ensemble algorithms in modeling network intrusion detection systems in order to improve detection performance. We also present the Naïve Bayes algorithm, so that the proposed methods can be compared against our earlier results and their suitability for building an efficient network intrusion detection model can be assessed. Results on the network audit data show that AdaBoost is not suitable for building the network intrusion detection model, while there is a compromise between Naïve Bayes and Random Forest. Several intrusion detection schemes for detecting network intrusions are presented in this paper. When applied to the KDDCup’99 data set, the developed algorithms for learning classifiers were more successful in detecting network attacks than standard data mining techniques based on neural networks. Our future work is to extend this concept by developing more learning methods for more real-world applications.
[Figure 1. ROC for Different Classifiers. Plot title: Comparison of Classifiers. Axes: Detection Rate vs. False Positive Rate. Curves: Naïve Bayes, BPNSQAQ, BPN.]

[Figure 2. ROC – Comparison between Ensemble approaches with Naïve Bayes. Plot title: Receiver Operating Characteristics. Axes: Detection Rate vs. False Positive Rate. Curves: Naïve Bayes, Random Forest, AdaBoost.]

[Figure 3. P-ROC Comparisons for Different Machine Learning Algorithms. Plot title: Precision-Recall Characteristics. Axes: Precision Rate vs. Recall Rate. Curves: Naïve Bayes, AdaBoost, Random Forest.]

Table 2. Confusion Matrix for the Performance Measurement

Measure    Method           Overall (%)  Probe (%)  DoS (%)  U2R (%)  R2L (%)
FPR        BPNSQAQ [9]      -            2.42       0.9      0.908    0.235
FPR        BPN [6]          -            8.96       0.42     0.0      0
FPR        K-NN [7]         8            -          -        -        -
FPR        Naïve Bayes [6]  -            0.14       26       0.016    0.025
FPR        AdaBoost         -            4.1        4.97     0        0.0
FPR        Random Forest    -            0.001      1.72     0.07     0.01
FNR        BPNSQAQ          -            39.4       1.2      63.6     58.6
FNR        BPN              -            39.6       3.4      100      100
FNR        K-NN             -            -          -        -        -
FNR        Naïve Bayes      -            0.13       0.02     39.68    85.8
FNR        AdaBoost         -            100        1.43     0.0      0.0
FNR        Random Forest    -            23.5       1.2      100      0.0
DR         BPNSQAQ          -            75.8       99.1     9.2      76.5
DR         BPN              -            91.1       99.6     0.0      0.0
DR         K-NN             91           -          -        -        -
DR         Naïve Bayes      -            96.0       99       90.47    90
DR         AdaBoost         -            0.0        91.4     0.0      0.0
DR         Random Forest    -            76.24      91.7     0.0      86.9
Recall     BPNSQAQ          -            60.7       98.8     36.4     41.2
Recall     BPN              -            60.3       96.7     0.0      0.0
Recall     K-NN             -            -          -        -        -
Recall     Naïve Bayes      -            99.8       99.5     60.3     14.2
Recall     AdaBoost         -            0.0        98.5     0.0      0.0
Recall     Random Forest    -            76.5       98.8     0.0      100
F-Measure  BPNSQAQ          -            -          -        -        -
F-Measure  BPN              -            -          -        -        -
F-Measure  K-NN             -            -          -        -        -
F-Measure  Naïve Bayes      -            100        99.2     72       25
F-Measure  AdaBoost         -            0          94.8     0.0      0.0
F-Measure  Random Forest    -            76.37      95.1     0.0      93
G-Mean     BPNSQAQ          -            -          -        -        -
G-Mean     BPN              -            -          -        -        -
G-Mean     K-NN             -            -          -        -        -
G-Mean     Naïve Bayes      -            62.1       97.8     75.3     89.8
G-Mean     AdaBoost         -            0.0        96.8     0.0      0.0
G-Mean     Random Forest    -            87.4       98.6     0.0      0.0
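The measures reported in Table 2 can be derived from per-class counts of true and false positives and negatives. The paper does not spell out the formulas, so the Python sketch below assumes the commonly used definitions: F-measure as the harmonic mean of precision and recall, and G-mean as the square root of the product of the true positive rate and the true negative rate. The function name and the counts in the usage lines are hypothetical.

```python
import math


def rates(tp, fp, tn, fn):
    """Per-class measures from one-vs-rest counts (assumed standard definitions)."""

    def ratio(num, den):
        # Guard against empty denominators, e.g. a class that is never predicted.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)      # detection rate / true positive rate
    precision = ratio(tp, tp + fp)
    tnr = ratio(tn, tn + fp)         # true negative rate
    return {
        "detection_rate": recall,
        "recall": recall,
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "precision": precision,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * tnr),
    }


# Hypothetical counts for one attack class, for illustration only.
m = rates(tp=80, fp=10, tn=90, fn=20)
for name, value in m.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these definitions the false negative rate and the recall of a class sum to 100%, which offers a quick consistency check on tabulated values.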
Table 3. Performance Comparison on the KDD dataset.

Experiments      Overall Error Rate (%)  Cost    Time in Seconds
Best KDD Result  7.29                    0.2331  Not Provided
Naïve Bayes      5.1                     0.16    1.89
AdaBoost         22.3243                 0.0926  9.56
Random Forest    18.94                   2.475   0.0476
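The cost and error rate of equations (2) and (3) can be computed directly from a confusion matrix. In the Python sketch below, the cost matrix is the one given in Table 1, with rows taken as the actual class and columns as the predicted class (following the definition of Mij); the confusion counts are hypothetical and are not results from this paper, and the function names are illustrative.

```python
CLASSES = ["Normal", "Probe", "DoS", "U2R", "R2L"]

# Cost matrix from Table 1: COST[i][j] is the cost of labelling class i as class j.
COST = [
    [0, 1, 2, 2, 2],
    [1, 0, 2, 2, 2],
    [2, 1, 0, 2, 2],
    [3, 2, 2, 0, 2],
    [4, 2, 2, 2, 0],
]


def average_cost(confusion, cost):
    """Equation (2): sum of M_ij * C_ij over all i, j, divided by N."""
    n = sum(sum(row) for row in confusion)
    total = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return total / n


def error_rate(confusion):
    """Equation (3): (total test data - correctly classified) / total test data."""
    n = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (n - correct) / n


# Hypothetical confusion matrix (rows: actual class, columns: predicted class).
confusion = [
    [90, 5, 5, 0, 0],
    [2, 8, 0, 0, 0],
    [10, 0, 190, 0, 0],
    [3, 0, 0, 2, 0],
    [20, 0, 0, 0, 5],
]

cost_value = average_cost(confusion, COST)
err_value = error_rate(confusion)
print(f"cost per connection: {cost_value:.4f}")
print(f"error rate: {100 * err_value:.2f}%")
```

Because every misclassification costs at least 1 in Table 1, the cost per connection can never be smaller than the error rate expressed as a fraction.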
REFERENCES
[1] Bace, R. and Mell, P., “Intrusion Detection Systems”, NIST Special Publication SP 800-31, November 2001.
[2] Javitz, H.S. and Valdes, A., “The NIDES statistical component: Description and Justification”, Technical report A010, Computer Science Laboratory, SRI International, March 1994. http://www.cs.ucdavis.edu/~wce/ecs236/papers/hw2_NIDES-STA_description.pdf.
[3] Warrender, C., Forrest, S. and Pearlmutter, B.A., “Detecting intrusions using system calls: Alternative data models”, in IEEE Symposium on Security and Privacy, 1999, pp. 133-145.
[4] Twycross, J., “Immune systems, danger theory and intrusion detection”, presented at the AISB 2004 Symposium on Immune System and Cognition, Leeds, UK, March 2004, pp. 40-42.
[5] Alves, R.T., et al., “An artificial immune system for fuzzy-rule induction in data mining”, Lecture Notes in Computer Science, Berlin: Springer Verlag, v3242, 2004, pp. 1011-1020.
[6] Chang, Ray-I, et al., “Intrusion Detection by Back propagation Neural Network with sample query and attribute query (BPNSQAQ)”, International Journal of Computational Intelligence Research, vol. 3, no. 1, 2007, pp. 6-10.
[7] Lavesson, Niklas and Davidsson, Paul, “A multidimensional measure function for classifier performance”, in Proc. of 2nd IEEE International Conference on Intelligent Systems, June 2004, pp. 508-513.
[8] Mitchell, T. M., Machine Learning, International Edition, McGraw-Hill Book Co., Singapore, 1997, ISBN: 0-07-042807-7.
[9] Panda, Mrutyunjaya and Patra, Manas Ranjan, “Network Intrusion Detection using Naïve Bayes”, International Journal of Computer Science and Network Security, Dec. 30, 2007, pp. 258-263.
[10] Valiant, L.G., “A theory of the learnable”, Communications of the ACM, 27(11), Nov. 1984, pp. 1134-1142.
[11] Kearns, Michael J. and Vazirani, Umesh V., An Introduction to Computational Learning Theory, MIT Press, 1994.
[12] Freund, Yoav and Schapire, Robert E., “A decision-theoretic generalization of on-line learning and an application to boosting”, Journal of Computer and System Sciences, 55(1), Aug. 1997, pp. 119-139.
[13] Breiman, L., “Random forests”, Machine Learning, (45), 2001, pp. 5-32.
[14] Russell, S.J. and Norvig, P., Artificial Intelligence: A Modern Approach, International Edition, Pearson US imports and PHIPEs, Nov. 2002.
[15] Domingos, P. and Pazzani, M., “On the optimality of the simple Bayesian classifier under zero-one loss”, Machine Learning, (29), 1997, pp. 103-130.
[16] Hand, D.J. and Yu, K., “Idiot’s Bayes - not so stupid after all?”, International Statistical Review, (69), 2001, pp. 385-398.
[17] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
[18] Heckerman, D., “Bayesian Networks for knowledge discovery”, in Advances in KDDM, Chapter 11, AAAI Press/The MIT Press, 1996, pp. 273-305.
[19] Yang, W., “High-order pattern-discovery and analysis of discrete-valued data set”, PhD thesis, University of Waterloo, Waterloo, Ontario, Canada, 1997.
[20] Friedman, N., et al., “Bayesian Network Classifiers”, Machine Learning, 29(2/3), 1997, pp. 131-163.
[21] Kuck, H., “Bayesian formulations of multiple instances learning with applications to general object recognition”, Master’s thesis, University of British Columbia, Vancouver, BC, Canada, 2004.
[22] MIT Lincoln Laboratory, DARPA Intrusion Detection Evaluation Documentation, http://www.II.mit.edu/ideval/docs/docs_index.html, 1999.
[23] KDDCup 1999 Data, http://www.kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, 1999.
[24] Wang, Wei, Guan, Xiao Hong and Zhong, Xiangliang, “Processing of massive audit data streams for real time anomaly intrusion detection”, Computer Communications, Elsevier, 31, 2008, pp. 58-72.
[25] Landgrebe, Thomas C.W., Paclik, Pavel and Duin, Robert P.W., “Precision-Recall Operating Characteristic (P-ROC) curves in imprecise environments”, in Proc. of 18th Intl. Conference on Pattern Recognition, ICPR 2006, Vol. 4, IEEE, 2006, pp. 123-127.
[26] Elkan, Charles, “Results of the KDD’99 classifier learning”, SIGKDD Explorations, 2000, pp. 63-64.
[27] Lee, W. and Stolfo, S., “A framework for constructing features and models for intrusion detection systems”, ACM Transactions on Information and System Security, 3(4), 2000, pp. 227-261.