International Conference on Advances in Computing, Communication and Control (ICAC3’09)
ENSEMBLE OF CLASSIFIERS FOR DETECTING NETWORK INTRUSION

Mrutyunjaya Panda
Assistant Professor, Department of ECE, GIET, Gunupur, Orissa, India
+91-9437338270
[email protected]

Manas Ranjan Patra
Reader, Department of Computer Science, Berhampur University, Orissa, India
[email protected]
ABSTRACT
Intrusion detection technology is an effective approach to dealing with malicious attacks on computer networks. In this paper, we present an intrusion detection model based on an ensemble of classifiers, such as AdaBoost, MultiBoosting and Bagging, which gains more opportunities to train on misclassified samples and reduces the error rate through majority voting of the involved classifiers. Our main goal is to build an efficient intrusion detection model based on the error rate of the classifiers when an unfair distribution exists either within or between data sets. We employ data from the Third International Knowledge Discovery and Data Mining Tools Competition (KDDCup'99) to train and test the feasibility of our proposed model. Our experimental results with the KDDCup'99 benchmark dataset show that the proposed ensemble of classifiers with REP tree as the base classifier outperforms the others, building a network intrusion detection model with a high detection rate, a low overall error rate and a low false positive rate.
Categories and Subject Descriptors
C.2.0 [Computer-Communication Networks]: Security and Protection

General Terms
Security

Keywords
Intrusion Detection, AdaBoost, MultiBoosting, Bagging, Confusion Matrix, ROC, P-ROC.

1. INTRODUCTION
Nowadays, an increasing number of commercial and public services are being offered through the Internet. While the use of such services is gaining popularity, security concerns are becoming equally alarming. The so-called "attacks" on Internet service providers, which are usually carried out by exploiting unknown weaknesses or bugs, are most of the time contained within system and application software [13, 17]. Computer networks are usually protected against attacks by a number of access restriction policies that act as a coarse-grain filter. Intrusion detection systems (IDS) are the fine-grain filter placed inside the protected network, looking for known or potential threats in network traffic and/or audit data recorded by hosts.

Two approaches to intrusion detection are currently used. The first one, called misuse detection, is based on attack signatures, i.e., on a detailed description of the sequence of actions performed by the attacker. This approach allows the detection of intrusions that match the attack signatures perfectly. Its effectiveness is strictly related to the extent to which IDSs are updated with the signatures of the latest attacks, which is currently a challenge since new attacks and new attack variants are constantly being developed. The development of signatures with a limited scope is motivated by the following reasons:
• the difficulty in capturing the "root cause" of an attack, and
• the requirement of a very small false alarm rate.

If the signatures are too general, then high attack detection rates may be associated with unacceptable false alarm rates, as a relevant number of normal traffic events may match the signatures. It is easy to see that the availability of signatures coding the "root cause" of an attack should be capable of protecting against all attacks that exploit the same vulnerability; unfortunately, this is beyond the current state of the art of IDSs [17]. The second approach is based on statistical knowledge about the normal activity of the computer system, i.e., a statistical profile of what constitutes legitimate traffic in the network. In this case, intrusions correspond to anomalous network activity, i.e., traffic whose statistical profile deviates from the normal one [13, 17]. The difficulties of current IDSs have led to the application of statistical pattern recognition approaches based on the learning-by-examples paradigm. The main motivation for using pattern recognition approaches in the development of advanced IDSs is their generalization capability, which may support the recognition of intrusions that have not been seen previously and have no previously described patterns. At present, research on advanced IDSs based on learning-by-example paradigms is at an early stage; therefore, a number of issues need to be solved before they can be used in operational environments [1, 4].
In this paper, an approach to building an efficient network intrusion detection system is proposed. A review of the current state of the art on IDSs is given in Section 2. Section 3 presents a brief description of the problem formulation for our experimentation. In Section 4, the proposed ensemble-of-classifier algorithms are discussed. In Section 5, we evaluate the
proposed method on the KDDCup'99 dataset and several other UCI datasets by comparing it with some standard classification techniques. Finally, Section 6 concludes the paper and suggests further directions for the current research.
2. RELATED WORK
A recent technical report on current intrusion detection (ID) technology, in which both commercial and research products are briefly reviewed, provides a discussion of the challenges in developing effective IDSs [1]. In particular, it points out that advanced research on IDSs should involve the use of pattern recognition and learning-by-example approaches. Neural networks for intrusion detection provide a solution to the problem of modeling users' behavior in anomaly detection, because they do not require any explicit user model [3, 18, 11]. In addition, the extensive evaluation of pattern classification techniques carried out on a sample data set of network traffic during the KDD'99 conference pointed out the feasibility of a pattern recognition approach to ID [5]. Lee and Heinbuch [11] used a hierarchical back-propagation neural network to detect TCP SYN flooding and port scanning intrusions. Lee also used RIPPER rules and other data mining technologies to build an intrusion detection model [12]. Panda and Patra developed an efficient intrusion detection model using Naïve Bayes [15]. It has also been observed in [16] that, while decision trees are useful in detecting new attacks, higher detection rates are achieved with Naïve Bayes. Pan et al. [23] used a hybrid neural network and C4.5 for misuse detection; they observed that, after the event (intrusion) module, the average detection rate is 93.28 percent with a false alarm rate of 0.2 percent. In [14], the authors provide a summary of the leading ensemble methods, followed by a discussion of their application to four broad classes of real-world classification problems. Kotsiantis et al. [9] proposed a technique of boosting localized weak learners, rather than attaching constant weights to each learner; they compared it with other well-known combining methods on standard classification and regression benchmark datasets using a decision stump as the base learner, and claimed that their technique produces the most accurate results.
3. PROBLEM FORMULATION
From the pattern recognition point of view, the network intrusion detection problem can be formulated as shown in Figure 1. The term "connection" refers to a sequence of data packets related to a particular service, e.g., the transfer of a web page via the http protocol. As the aim of a network intrusion detector is to detect those connections that are related to malicious activities, each network connection can be defined as a "pattern" to be classified.

[Figure 1. Problem formulation of network intrusion detection: TCP/IP packets are assembled into connection records, from which intrinsic, content and traffic features are extracted and passed to a classifier that labels each connection as normal or attack.]

The KDDCup 1999 Intrusion Detection Evaluation data set consists of about 5 million connections of labeled training data and 2 million connections of test data [19]. The connections are in chronological order, and each connection is described by 41 features. The features can be categorized as follows:
• Basic TCP features - These include the duration, protocol type, and service of the connection, as well as the amount of data transferred.
• Content features - These features were derived from the payload of the TCP packets using domain knowledge. They include features such as the number of failed login attempts and whether or not root access was obtained.
• Time-based traffic features - Calculated over a two-second time interval, these features include the number of connections to the same host as the current connection and the number of connections to the same service as the current connection.
• Host-based traffic features - Analogous to the time-based traffic features, host-based traffic features are derived over the past 100 connections. They are meant to catch attacks that span intervals longer than two seconds.

A connection in the training data was either a normal connection or one of 24 different attack types. In order to make the task more realistic, the test data contained an additional 14 attack types not present in the training data. Each connection was either normal or fell into one of the following four categories of attacks: Remote-to-Local (R2L), User-to-Root (U2R), Denial-of-Service (DoS), and Probing.

4. ENSEMBLE OF CLASSIFIERS
Ensembles of classifiers, which come under decision committee learning, have demonstrated spectacular success in reducing the classification error of learned classifiers. These techniques develop a classifier in the form of a committee of subsidiary classifiers. The committee members are applied to a classification task and their individual outputs are combined to create a single classification from the committee as a whole. This combination of outputs is often performed by majority vote; a minimal sketch of such a combiner is given below. Three decision committee learning approaches, AdaBoost, MultiBoosting and Bagging, have received extensive attention; they are recent methods for improving the predictive power of classifier learning systems.

Databases can have nominal, numeric or mixed attributes and classes, and not all classification algorithms perform well for all types of attributes and classes or for databases of different sizes. In order to design a generic classification tool, one should consider the behavior of various existing classification algorithms on different datasets. In this work, classification algorithms based on AdaBoost, MultiBoosting and Bagging are analyzed; these are applied to the KDDCup'99 dataset and then evaluated for accuracy using a 10-fold cross validation strategy [22].
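To illustrate the combination step described above, the following is a minimal sketch of majority voting. Python and scikit-learn-style classifiers (objects exposing a predict method) are our assumptions here, not the paper's; the function name majority_vote is ours.

    # Minimal sketch of combining committee members' outputs by majority
    # vote, assuming each member exposes a scikit-learn-style predict().
    import numpy as np

    def majority_vote(committee, X):
        """Return the class receiving the most votes for each sample in X."""
        votes = np.stack([member.predict(X) for member in committee])  # (n_members, n_samples)
        predictions = []
        for j in range(votes.shape[1]):
            labels, counts = np.unique(votes[:, j], return_counts=True)
            predictions.append(labels[np.argmax(counts)])              # plurality winner
        return np.array(predictions)

Ties are broken here in favor of the label that sorts first, which is arbitrary; weighted voting, as used by AdaBoost below, is a common refinement.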
4.1 AdaBoost
Some classification methods are unstable, in the sense that small perturbations in their training sets may result in large changes in the constructed classifier. Breiman [2] showed that decision trees and neural networks are unstable during classification. Unstable classifiers can have their accuracy improved by perturbing and combining, i.e., generating a series of classifiers by perturbing the training set and then combining these classifiers to predict together. Boosting is one of the efficient perturbing and combining methods. Though a number of variants of boosting are available, we use the most popular form of boosting, known as AdaBoost (Adaptive Boosting), for our experimentation.

Boosting is a general and provably effective method for improving the accuracy of any given learning algorithm: it produces a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb. Boosting has its roots in a theoretical framework for studying machine learning called the "PAC" learning model, due to Valiant [20]; see Kearns and Vazirani [7] for a good introduction to this model. They were the first to pose the question of whether a "weak" learning algorithm, one which performs just slightly better than random guessing in the PAC model, can be "boosted" into an arbitrarily accurate "strong" learning algorithm. The AdaBoost algorithm, introduced by Freund and Schapire [6], solved many of the practical difficulties of the earlier boosting algorithms, and is the focus of this paper.

The algorithm takes as input a training set (x1, y1), …, (xm, ym), where each xi belongs to some domain or instance space X, and each label yi is in some label set Y. For most of this paper we assume Y = {-1, +1}; extensions to the multiclass case are discussed in [6]. AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds t = 1, …, T. One of the main ideas of the algorithm is to maintain a distribution, or set of weights, over the training set. The weight of this distribution on training example i on round t is denoted Dt(i). Initially, all weights are set equally, but on each round the weights of incorrectly classified examples are increased, so that the weak learner is forced to focus on the hard examples in the training set. The weak learner's job is to find a weak hypothesis ht : X → {-1, +1} appropriate for the distribution Dt. The goodness of a weak hypothesis is measured by its error

    εt = Pr_{i~Dt}[ht(xi) ≠ yi] = Σ_{i : ht(xi) ≠ yi} Dt(i)        (1)

Notice that the error is measured with respect to the distribution Dt on which the weak learner was trained. In practice, the weak learner may be an algorithm that can use the weights Dt on the training examples directly. Alternatively, when this is not possible, a subset of the training examples can be sampled according to Dt and used to train the weak learner.

Once the weak hypothesis ht has been received, AdaBoost chooses a parameter αt (in the standard formulation, αt = ½ ln((1 - εt)/εt)). Intuitively, αt measures the importance that is assigned to ht. Note that αt ≥ 0 if εt ≤ 1/2 (which can be assumed without loss of generality), and that αt gets larger as εt gets smaller. The distribution is next updated using the rule Dt+1(i) ∝ Dt(i)·exp(-αt yi ht(xi)). The effect of this rule is to increase the weight of examples misclassified by ht and to decrease the weight of correctly classified examples; thus, the weight tends to concentrate on "hard" examples. The final hypothesis H(x) = sign(Σt αt ht(x)) is a weighted majority vote of the T weak hypotheses, where αt is the weight assigned to ht.
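The loop just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: a scikit-learn decision stump stands in for the weak learner, and labels are assumed to be NumPy arrays over {-1, +1}.

    # Minimal sketch of the AdaBoost loop of Section 4.1, assuming
    # scikit-learn; a depth-1 tree (decision stump) is the weak learner.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost(X, y, T=50):
        """y: NumPy array of labels in {-1, +1}."""
        m = len(X)
        D = np.full(m, 1.0 / m)          # D_1(i) = 1/m: all weights equal
        hypotheses, alphas = [], []
        for t in range(T):
            h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
            pred = h.predict(X)
            eps = D[pred != y].sum()     # ε_t: weighted error under D_t, eq. (1)
            if eps >= 0.5:               # weak learner no better than chance
                break
            alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))   # α_t
            D *= np.exp(-alpha * y * pred)                      # re-weight examples
            D /= D.sum()                                        # normalize
            hypotheses.append(h)
            alphas.append(alpha)
        return hypotheses, alphas

    def predict(hypotheses, alphas, X):
        """H(x) = sign(Σ_t α_t h_t(x)): weighted majority vote."""
        agg = sum(a * h.predict(X) for h, a in zip(hypotheses, alphas))
        return np.sign(agg)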
4.2 MultiBoosting
MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision committees. MultiBoosting can be viewed as combining AdaBoost with Wagging [21]: it is able to harness both AdaBoost's high bias and variance reduction and Wagging's superior variance reduction. MultiBoosting can be considered as wagging committees formed by AdaBoost. A decision has to be made as to how many sub-committees should be formed for a single run, and the size of those sub-committees. In the absence of an a-priori reason for selecting any specific values for these factors, the current implementation of MultiBoosting takes as an argument a single committee size T, from which it by default sets both the number of sub-committees and the size of those sub-committees to √T. As both these values must be whole numbers, it is necessary to round off the results; the details can be found in [21]. For ease of implementation, this is achieved by setting a target final sub-committee member index, where each member of the final committee is given an index starting from one. This allows the premature termination of boosting one sub-committee, due to too great or too low error, to lead to an increase in the size of the next sub-committee. If the last sub-committee is prematurely terminated, an additional sub-committee is added with a target of completing the full complement of committee members. If this sub-committee also fails to reach the target, the process is repeated, adding further sub-committees until the target total committee size is achieved. In addition to the bias and variance reduction properties that this algorithm may inherit from each of its constituent committee learning algorithms, MultiBoost has a potential computational advantage over AdaBoost: the sub-committees may be learned in parallel, although this would require a change to the handling of early termination of learning a sub-committee. The AdaBoost process is inherently sequential, minimizing the potential for parallel computation, whereas each classifier learned with wagging is independent of the rest, allowing parallel computation, a property that MultiBoost inherits at the sub-committee level.
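A rough sketch of the sub-committee structure just described, under simplifying assumptions: bootstrap resampling approximates wagging's random instance weights, early-termination handling is omitted, and all names are ours.

    # Rough sketch of MultiBoost's structure, assuming scikit-learn:
    # ceil(sqrt(T)) AdaBoost sub-committees of about sqrt(T) members each,
    # combined by majority vote. Assumes non-negative integer class labels.
    import math
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    def multiboost_sketch(X, y, T=16, random_state=0):
        rng = np.random.RandomState(random_state)
        n_sub = int(math.ceil(math.sqrt(T)))      # number of sub-committees
        sub_size = int(math.ceil(T / n_sub))      # boosting rounds per sub-committee
        n = len(X)
        sub_committees = []
        for _ in range(n_sub):
            idx = rng.randint(0, n, n)            # bootstrap sample (wagging stand-in)
            ada = AdaBoostClassifier(             # 'base_estimator' in older scikit-learn
                estimator=DecisionTreeClassifier(max_depth=1),
                n_estimators=sub_size, random_state=random_state)
            sub_committees.append(ada.fit(X[idx], y[idx]))
        return sub_committees

    def predict(sub_committees, X):
        """Majority vote across the sub-committees."""
        votes = np.stack([c.predict(X) for c in sub_committees]).astype(int)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

Because each sub-committee here is fitted independently, the loop over sub-committees could be parallelized, reflecting the property noted above.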
4.3 Bagging
Bootstrapped Aggregating (Bagging) combines voting with a method for generating the classifiers that provide the votes. The simple idea is to allow each base classifier to be trained with a different random subset of the patterns, with the goal of bringing about diversity in the base classifiers. Devising different ways of generating base classifiers that perform well but are diverse (i.e., make different errors) has historically been one of the most active subtopics within ensemble methods, which come under meta-classifier research; Bagging is a simple example of one such method, so we discuss it in more detail here. Bagging generates multiple bootstrap training sets from the original training set and uses each of them to generate a classifier for inclusion in the ensemble. The algorithms for bagging and for the bootstrap sampling (sampling with replacement) are shown in Table 1.
Table 1. Bagging Algorithm

    Bagging(T, M):
        for each m = 1, 2, …, M:
            T_m = SampleWithReplacement(T, |T|)
            h_m = L_b(T_m)
        return h_fin(x) = argmax_{y∈Y} Σ_{m=1}^{M} I(h_m(x) = y)

    SampleWithReplacement(T, N):
        S = ∅
        for i = 1, 2, …, N:
            r = random integer in [1, N]
            add T[r] to S
        return S
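Translated into runnable form, Table 1 might look like the following sketch, assuming scikit-learn; a generic decision tree stands in for the base learner L_b.

    # Runnable sketch of Table 1's procedure, assuming scikit-learn.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging(X, y, M=10, random_state=0):
        """Train M base classifiers, each on a bootstrap sample of (X, y)."""
        rng = np.random.RandomState(random_state)
        n = len(X)
        ensemble = []
        for _ in range(M):
            idx = rng.randint(0, n, size=n)     # SampleWithReplacement(T, |T|)
            ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return ensemble

    def h_fin(ensemble, X):
        """argmax_y Σ_m I(h_m(x) = y): plurality vote over the ensemble."""
        votes = np.stack([h.predict(X) for h in ensemble])
        out = []
        for column in votes.T:                  # one column per test sample
            labels, counts = np.unique(column, return_counts=True)
            out.append(labels[np.argmax(counts)])
        return np.array(out)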
5. EXPERIMENTAL RESULTS AND DISCUSSIONS
In this section, the performance of each of the above classification-based data mining algorithms is discussed, along with its significance. We select a subset of the 10% KDDCup'99 intrusion detection dataset for our experiments. The selected subset contains 12,790 connection records with non-zero values, because some attacks are represented by only a few examples and the attack distribution in the large data set is unbalanced. We carried out our experiments with the full subset for training and 10-fold cross validation for testing, on a system with a 2.667 GHz Pentium 4 processor, 40 GB HDD and 512 MB DDR400 RAM, running Windows XP.

5.1 Evaluation and Discussion
This section presents experimental results using AdaBoost, MultiBoosting and Bagging with REP tree as the base classifier, along with the results obtained from various existing algorithms. Table 2 compares our proposed algorithms with the existing methods with respect to the error rate and the time taken to build the model. The confusion matrix for all these algorithms is shown in Table 3, and the classification rates of various existing classifiers are shown in Table 4. We have also compared the results of our proposed methods with various base classifiers, such as C4.5 and decision stumps, as shown in Table 5. The results show the effectiveness of our proposed methodology in terms of a high detection rate, a low false positive rate, and the ability to detect rare attacks, at the cost of somewhat more time to build the model.

The performance of each algorithm is evaluated with the help of ROC (Receiver Operating Characteristic) analysis [10]. Measurement of the area under the ROC curve (AUC) [8] allows for performance evaluation as a measure of the effectiveness of an IDS, as shown in Figure 3; it can be viewed as a performance measure integrated over a region of possible operating points. Precision-recall analysis, which can also be treated as a performance evaluation criterion, is shown in Figure 4 to demonstrate the efficacy of the proposed analysis. It can be observed from all these analyses that our proposed Bagging algorithm with REP tree as the base classifier performs better than the other existing algorithms. The kappa statistics shown in Figure 2 suggest that the ability of the various classification methods examined can be considered good, i.e., the classifier stability is very strong. The root mean square error likewise suggests that the error rate is very small, which can be considered a measure of the effectiveness of the model.
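A sketch of this evaluation protocol, assuming scikit-learn ('estimator' is 'base_estimator' in older versions). A pruned decision tree stands in for WEKA's REP tree, and load_kdd_subset is a hypothetical loader for the 12,790-record subset.

    # Sketch of the 10-fold cross-validation protocol, assuming scikit-learn.
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_kdd_subset()                       # hypothetical loader for the subset
    base = DecisionTreeClassifier(ccp_alpha=1e-3)  # pruned tree, stand-in for REP tree
    models = {
        "AdaBoost + tree": AdaBoostClassifier(estimator=base, n_estimators=10),
        "Bagging + tree": BaggingClassifier(estimator=base, n_estimators=10),
    }
    for name, model in models.items():
        accuracy = cross_val_score(model, X, y, cv=10).mean()   # 10-fold CV accuracy
        print(f"{name}: overall error rate = {1 - accuracy:.4f}")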
Table 2. Comparison of data mining algorithms

Experiment                  Overall Error Rate   Time to Build Model (s)
Best KDD                    7.29%                -
Naïve Bayes [15]            3.56%                0.11
J48 [16]                    3.47%                0.88
ID3 [16]                    3.47%                1.8
AdaBoost + REP tree         0.7351%              22.5
MultiBoosting + REP tree    0.7429%              23.5
Bagging + REP tree          0.6257%              31.05
Table 3. Confusion matrix and comparison

False Positive Rate (FPR)
Classifier               Probe     DoS       U2R       R2L
Naïve Bayes              0.0014    0.26      0.000163  0.00025
J48                      0.0008    0.0458    0.0004    0.00026
ID3                      0.0006    0.0458    0.00038   0.000229
AdaBoost                 0.00249   0.00208   0.0018    0.00094
MultiBoosting            0.00241   0.00397   0.0032    0.000863
Bagging with REP tree    0.00241   0.00187   0.00179   0.000942
False Negative Rate (FNR)
Classifier               Probe     DoS       U2R    R2L
Naïve Bayes              0.13%     0.02%     -      39.68%
J48                      13.83%    0.76%     0%     0%
ID3                      13.74%    0.69%     0%     85.8%
AdaBoost                 1.3%      0.566%    0%     1.92%
MultiBoosting            1.55%     0.803%    0%     1.88%
Bagging with REP tree    1.2%      0.45%     0%     5.2%

Precision Rate
Classifier               Probe     DoS       U2R      R2L
Naïve Bayes              96%       99%       90.47%   90%
J48                      93.16%    92.41%    40%      75%
ID3                      94.75%    92.41%    41.86%   77.94%
AdaBoost                 96.2%     99.87%    43.9%    81%
MultiBoosting            96.32%    99.76%    -        82.54%
Bagging with REP tree    96.3%     99.88%    -        -

Recall (Detection) Rate
Classifier               Probe     DoS       U2R     R2L
Naïve Bayes              99.8%     99.5%     44%     60.3%
J48                      86.17%    99.24%    100%    100%
ID3                      86.26%    99.3%     100%    14.2%
AdaBoost                 98.7%     99.4%     100%    98%
MultiBoosting            98.44%    99.19%    100%    98.11%
Bagging with REP tree    98.8%     99.5%     100%    94.7%

F-Value
Classifier               Probe     DoS       U2R      R2L
Naïve Bayes              -         0.992     -        0.72
J48                      0.8953    0.9573    0.5714   0.8571
ID3                      0.9031    0.9571    0.5901   0.25
AdaBoost                 0.974     0.996     0.61     0.887
MultiBoosting            0.9737    0.9947    -        0.8965
Bagging with REP tree    0.975     0.997     0.6      0.857

[Figure 2. Meta Classifiers Performance Comparison: accuracy, kappa statistic and root mean square error for AdaBoost + REP tree, MultiBoosting + REP tree and Bagging + REP tree.]

[Figure 3. Receiver-Operating Characteristics analysis of different ensemble classifiers: detection rate versus false alarm rate (0 to 0.006) for AdaBoost, MultiBoosting and Bagging, each with REP tree.]

[Figure 4. P-ROC of Different Ensemble Classifiers: recall rate versus precision rate for the same three ensembles.]
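For reference, the per-class rates reported in Table 3 follow from the confusion-matrix counts in the standard one-class-versus-rest way; a small sketch of the computation (the function name is ours):

    # Sketch of deriving the per-class rates of Table 3 from
    # confusion-matrix counts (one attack class vs. the rest).
    def rates(tp, fp, tn, fn):
        precision = tp / (tp + fp)          # precision rate
        recall    = tp / (tp + fn)          # recall / detection rate
        fpr       = fp / (fp + tn)          # false positive rate
        fnr       = fn / (fn + tp)          # false negative rate
        f_value   = 2 * precision * recall / (precision + recall)
        return precision, recall, fpr, fnr, f_value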
Table 4. Classification rate of existing methods [15]

Method                 Probe   DoS     U2R     R2L
MLP                    0.887   0.972   0.132   0.056
K-Means                0.876   0.973   0.298   0.064
Nearest Cluster        0.888   0.971   0.002   0.034
Incremental RBF        0.932   0.730   0.061   0.059
Leader Algorithm       0.838   0.972   0.066   0.001
Hypersphere            0.848   0.972   0.083   0.010
Fuzzy ARTMAP           0.772   0.970   0.061   0.037
Quadratic Classifier   0.902   0.824   0.228   0.096
Table 5. Comparison of ensemble of classifiers over different data sets (classification accuracy, %)

Meta Classifier                  Iris          Waveform      Students      KDDCup'99 IDS
Bagging + C4.5                   94.8          82.81         86.49         Not provided
AdaBoost + C4.5                  94.47         83.32         81.44         Not provided
MultiBoosting + C4.5             94.47         83.73         81.68         Not provided
Bagging + Decision Stump         70.33         57.41         87.22         67.94
AdaBoost + Decision Stump        95.07         67.68         87.16         77.68
MultiBoosting + Decision Stump   94.73         66.44         86.95         77.67
Bagging + REP tree               Not provided  Not provided  Not provided  99.37
AdaBoost + REP tree              Not provided  Not provided  Not provided  99.26
MultiBoosting + REP tree         Not provided  Not provided  Not provided  99.257

6. CONCLUSIONS
In this paper, we have compared the effectiveness of ensemble classification algorithms using REP tree and other base classifiers against several existing algorithms, which helps one to construct an effective network intrusion detection system. It is observed from our experimental results that the proposed ensemble methods, especially Bagging, are quite appealing because of their simplicity, elegance, robustness and effectiveness. Finally, there are open problems in applying ensemble classifiers to data mining applications where comprehensibility is crucial: voting methods normally result in an incomprehensible classifier that cannot be easily understood by end-users. These remain future directions for the proposed research.
7. REFERENCES
[1] Allen, J., Christie, A., Fithen, W., McHugh, J., Pickel, J., and Stoner, E. 2000. State of the Practice of Intrusion Detection Technologies. Technical Report CMU/SEI-99-TR-028.
[2] Breiman, L. 1996. Bagging predictors. Machine Learning, 24, 2(1996), 123-140.
[3] Debar, H., Becker, M., and Siboni, D. 1992. A neural network component for an intrusion detection system. In Proc. of the IEEE Symposium on Research in Security and Privacy (Oakland, CA, USA, 1992), 240-250.
[4] Duda, R., Hart, P., and Stork, D.G. 2001. Pattern Classification. John Wiley & Sons.
[5] Elkan, C. 2000. Results of the KDD'99 classifier learning. ACM SIGKDD Explorations, 1(2000), 63-64.
[6] Freund, Y. and Schapire, R.E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55, 1(1997), Academic Press, 119-139. DOI: 10.1006/jcss.1997.1504.
[7] Kearns, M.J. and Vazirani, U.V. 1994. An Introduction to Computational Learning Theory. MIT Press.
[8] Kerdprasop, N. and Kerdprasop, K. 2003. Data partitioning for incremental data mining. In Proc. of the 1st International Forum on Information and Computer Technology (Shizuoka Univ., Hamamatsu, Japan, Jan. 9-10, 2003), 114-118.
[9] Kotsiantis, S.B., Kanellopoulos, D., and Pintelas, P.E. 2006. Local boosting of decision stumps for regression and classification problems. Journal of Computers, 1, 4(July 2006), 30-37.
[10] Landgrebe, T.C.W., Bradley, A.P., et al. 2006. Precision-recall operating characteristic (P-ROC) curves in imprecise environments. In Proc. of the 18th International Conference on Pattern Recognition, 4(2006), IEEE, 123-127.
[11] Lee, S.C. and Heinbuch, D.V. 2001. Training a neural-network based intrusion detector to recognize novel attacks. IEEE Transactions on Systems, Man and Cybernetics, 31, 4(2001), 294-299.
[12] Lee, W. and Stolfo, S.J. 1998. Data mining approaches for intrusion detection. In Proc. of the 7th USENIX Security Symposium.
[13] McHugh, J., Christie, A., and Allen, J. 2000. Defending yourself: the role of intrusion detection systems. IEEE Software (Sept./Oct. 2000), 42-51.
[14] Oza, N.C. and Tumer, K. 2008. Key real-world applications of classifier ensembles. Information Fusion, Special Issue on Applications of Ensemble Methods, 9, 1(2008), 4-20.
[15] Panda, M. and Patra, M.R. 2007. Network intrusion detection using Naïve Bayes. IJCSNS, 7, 12(2007), 258-263.
[16] Panda, M. and Patra, M.R. 2008. A study of classification algorithms for intrusion detection. In Proc. of the 1st International Conference on Emerging Trends in Engineering and Technology (India, 2008), IEEE, 504-507.
[17] Proctor, P.E. 2001. The Practical Intrusion Detection Handbook. Prentice Hall.
[18] Ryan, J., Lin, M.J., and Miikkulainen, R. 1998. Intrusion detection with neural networks. In Advances in Neural Information Processing Systems 10, M. Jordan et al., Eds. (Cambridge, MA, 1998), MIT Press, 943-949.
[19] The Third International Knowledge Discovery and Data Mining Tools Competition, May 2002. http://kdd.ics.uci.edu/databases/kddCup99.html
[20] Valiant, L.G. 1984. A theory of the learnable. Commun. ACM, 27, 11(1984), 1134-1142. DOI: http://doi.acm.org/10.1145/1968.1972.
[21] Webb, G.I. 2000. MultiBoosting: a technique for combining boosting and wagging. Machine Learning, 40(2000), Kluwer Academic Publishers, 159-196.
[22] White, S. and Jagielska, I. 2004. Investigation into the application of data mining techniques to classification of call centre data. In Decision Support in an Uncertain and Complex World: IFIP TC8/WG8.3 International Conference (2004), 793-802.
[23] Pan, Z.-S., Chen, S.-C., Hu, G.-B., and Zhang, D.-Q. 2003. Hybrid neural network and C4.5 for misuse detection. In Proc. of the 2nd International Conference on Machine Learning and Cybernetics (Xi'an, 2-5 November 2003), IEEE, 2463-2467.