A DGC-Based Data Classification Method Used for Abnormal Network Intrusion Detection

Bo Yang 1,2, Lizhi Peng 2, Yuehui Chen 2, Hanxing Liu 1, and Runzhang Yuan 1

1 State Key Lab. of Advanced Technology for Materials Synthesis and Processing, Wuhan University of Science and Technology, China
2 School of Information Science and Technology, Jinan University, Jinan, 250022, China
[email protected]
Abstract. Mining patterns that represent abnormal network behavior for intrusion detection is an important research area in network security. This paper introduces the concepts of gravitation and gravitation field into data classification by analogical inference, and studies how to calculate data gravitation. Based on the theoretical model of data gravitation and the data gravitation field, the paper presents a new classification model called Data Gravitation based Classifier (DGC). The proposed approach was applied to an Intrusion Detection System (IDS) with 41 inputs (features). Experimental results show that the proposed method is efficient in data classification and suitable for abnormal detection on network processor-based platforms.

Keywords: Data Classification, Network Intrusion Detection, Data Gravitation based Classifier, Network Processor.
I. King et al. (Eds.): ICONIP 2006, Part III, LNCS 4234, pp. 209-216, 2006. © Springer-Verlag Berlin Heidelberg 2006

1 Introduction

An intrusion detection system (IDS) is an important component of today's network security framework. Its main task is to differentiate between normal activities of the network system and behavior that can be classified as suspicious or intrusive. IDS approaches can be divided into two main categories: misuse detection and anomaly detection [1]. Anomaly detection systems assume that an intrusion causes the system behavior to deviate from its normal pattern. This approach can be implemented using a variety of data classification techniques such as statistical methods, neural networks, predictive pattern generation, and association rules.

The classification model in an IDS is usually constructed from a given training set. Once the model has been built, it can map a test data element to a class in the given class set. Many classification techniques have been proposed, including decision trees [2,3], neural networks (NN) [4], and support vector machines (SVM) [5,6]. Among these, the decision tree is simple and easy for humans to comprehend. It achieves high classification efficiency, but its accuracy is usually lower than that of a neural network. Neural networks have been shown to reach high accuracy in many classification tasks; however, their training efficiency is often a problem. SVM is a newer machine learning method built on statistical learning theory, and it is gaining popularity due to many attractive features and promising empirical performance. However, SVM is based on the hypothesis that the training samples obey a certain distribution, which restricts its application scope. Rough sets [7] have also been applied to data classification in recent years, either for feature selection [8] or hybridized with other classification methods [9,10,11,12].

Y. Shi et al. presented a novel data preprocessing technique called shrinking [13]. Inspired by Newton's universal law of gravitation, this technique optimizes the inner structure of data. In [14], a dimension reduction approach for multi-dimensional data analysis was also derived from the shrinking technique. A spatial clustering algorithm called GRAVIclust was proposed in [15]; it uses a heuristic to pick the initial cluster centers and utilizes cluster centre-of-gravity calculations to arrive at the optimal clustering solution. Although both of these approaches focus on the clustering problem, both were inspired by the concept of physical gravitation. The natural principles of gravitation were further applied to another research area in [16].

This paper introduces the concepts of gravitation and gravitation field into data classification by analogical inference, and studies how to calculate data gravitation. Based on the theoretical model of data gravitation and the data gravitation field, the paper presents a new classification model called Data Gravitation based Classifier (DGC). The proposed approach was applied to the intrusion detection problem with 41 features. Experimental results show that the proposed method is efficient in data classification for network intrusion systems.
2 Data Gravitation and Data Gravitation Field [17]

Definition 1. (Data Particle): A data particle is a data unit that has "data mass". It is made up of a group of data elements in the data space that have a certain relationship between them. The "mass" of a data particle is the number of data elements it contains. A data particle composed of only one data element is called an "atomic data particle"; the data mass of an atomic data particle is 1.

Definition 2. (Data Centroid): Suppose x1, x2, . . . , xm (xi = < xi1, xi2, . . . , xin >, i = 1, 2, . . . , m) are a group of data elements in an n-dimensional data space S, and P is a data particle built up from x1, x2, . . . , xm. The data centroid of P, x0 = < x01, x02, . . . , x0n >, is the geometrical center of x1, x2, . . . , xm. It can be described with the following formula:
x0j = ( ∑i=1..m xij ) / m,   j = 1, 2, . . . , n   (1)
Since a data particle has a data mass and a data centroid, it can be described by a pair < m, x >; after class information (label y) has been added, it can be described as a triple < m, x, y >, where m is the data mass of the data particle and x is its data centroid.
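The mass-and-centroid construction of Definitions 1 and 2 can be sketched in a few lines. This is a minimal illustration; the function name `make_particle` and the tuple representation are assumptions, not notation from the paper:

```python
def make_particle(elements):
    """Build a (mass, centroid) pair from a list of n-dimensional points.

    mass = number of data elements (Definition 1);
    centroid = per-coordinate mean of the elements (Definition 2, Eq. 1).
    """
    m = len(elements)                  # data mass
    n = len(elements[0])               # dimensionality of the data space
    centroid = tuple(sum(x[j] for x in elements) / m for j in range(n))
    return m, centroid

mass, centroid = make_particle([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)])
# mass = 3, centroid = (1.0, 1.0)
```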
Definition 3. (The Law of Data Gravitation): Data gravitation is defined as the similarity between data. It is a scalar quantity without direction. The data gravitation can be described as:
F = m1m2 / r^2   (2)
where F is the gravitation between the two data particles, m1 and m2 are the data masses of data particles 1 and 2, and r is the Euclidean distance between the two data particles in the data space. The data gravitations from data particles in the same class also obey the superposition principle:
Lemma 1. (Superposition Principle): Suppose P1, P2, . . . , Pm are m data particles in the same data class, and the gravitations they exert on another data element are F1, F2, . . . , Fm. Then the composition of gravitations is F = ∑i=1..m Fi.
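Eq. (2) and the superposition principle of Lemma 1 translate directly into code. A minimal sketch, assuming the test element is an atomic particle of mass 1; the names `gravitation` and `field_strength` are invented for illustration, and the degenerate case r = 0 (a test point coinciding with a centroid) is not handled:

```python
def gravitation(m1, x1, x):
    """Eq. (2) with m2 = 1: gravitation a particle (m1, x1) exerts on an
    atomic test particle at position x."""
    r2 = sum((a - b) ** 2 for a, b in zip(x1, x))  # squared Euclidean distance
    return m1 / r2

def field_strength(particles, x):
    """Lemma 1: superpose the gravitations of one class's particles at x."""
    return sum(gravitation(m, c, x) for m, c in particles)

# two particles of one class acting on a test point
F = field_strength([(1, (0.0, 0.0)), (2, (3.0, 4.0))], (0.0, 3.0))
```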
Definition 4. (Data Gravitation Field): Data particles act on each other by data gravitation and form a field that pervades the whole data space. This field is named the data gravitation field. Because data gravitations can belong to different data classes, a data gravitation field refers to the field formed by gravitations of the same data class. Field strength is a key quantity of the data gravitation field: the field strength at a given point equals the composition of the data gravitations that all data elements of one data class exert on an atomic data particle placed at that point. Similar to the isopiestic surface in a physical field, all points in a data gravitation field with equal field strength form a surface in the data space, called an isopiestic surface of the data gravitation field.
3 Data Classification Based on Gravitation

Based on data gravitation and the data gravitation field, a new classification scheme can be given. Its main ideas are:

1) A training data particle set is formed from the training data set. The creation of data particles obeys certain principles.
2) All data in the test set are treated as atomic data particles, and every particle in the training data particle set exerts data gravitation on every test atomic data particle.
3) Gravitations between training data particles and test atomic data particles obey the Law of Data Gravitation.
4) Once the training data particle set has been built, the data gravitation field in the data space is determined, and the field strength at any position in the data space can be calculated.
5) The degree to which a test data element belongs to a data class is determined by that class's data gravitation field strength at the test data's position.
3.1 Principle of Classification
Lemma 2. Suppose c1 and c2 are two data classes in the training data set. For a given test data element P, let F1 be the gravitation c1 exerts on P and F2 the gravitation c2 exerts on P. If F1 > F2, then P belongs to c1 more strongly than to c2.

Fig. 1 describes the principle of classification. Suppose T = {< x1, y1 >, < x2, y2 >, . . . , < xl, yl >} is a training set in an n-dimensional data space, where yi ∈ {c1, c2, . . . , ck}, ci represents data class i, k is the number of data classes, and l is the number of training samples. A new set of training data particles is created from the original training set: T' = {< m1, x'1, y1 >, < m2, x'2, y2 >, . . . , < ml', x'l', yl' >}, where l' is the number of data particles (l' ≤ l), x'i is the centroid of data particle i, and mi is its data mass. After the training data particle set has been built, the strength of the data gravitation field at any position in the data space can be calculated, so when a test data element is given, the class it belongs to can be determined from the field strengths of the classes. Suppose c1, c2, . . . , ck are the data classes in the training set with l1, l2, . . . , lk samples (data elements), and the training data particle set created from the training set has l1' + l2' + . . . + lk' data particles, where li' is the number of data particles belonging to data class i. A given test data element is treated as an atomic data particle P whose centroid is its position x.
Fig. 1. Classification based on data gravitation. The strength of gravitation determines which class a test data element belongs to. The black dots denote data particles in class c1; the circles denote data particles in class c2.
The gravitation that data class i exerts on it is:

Fi = ∑j=1..li' ( mij / | xij - x |^2 )   (3)

where mij is the data mass of data particle j in data class i and xij is its centroid. If Fi' = max{F1, F2, . . . , Fk}, then according to Lemma 2, the test data element belongs to data class i'.
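The decision rule of Eq. (3) and Lemma 2 amounts to an argmax over per-class field strengths. A hedged sketch, in which the name `classify_dgc`, the (mass, centroid) pairs, and the dictionary layout are illustrative assumptions:

```python
def classify_dgc(particles_by_class, x):
    """Predict the class whose particles exert the strongest summed
    gravitation (Eq. 3) on the atomic test particle at position x.

    particles_by_class: {class_label: [(mass, centroid), ...]}
    """
    def strength(particles):
        return sum(m / sum((a - b) ** 2 for a, b in zip(c, x))
                   for m, c in particles)
    return max(particles_by_class,
               key=lambda cls: strength(particles_by_class[cls]))

pred = classify_dgc(
    {"c1": [(1, (0.0, 0.0)), (1, (1.0, 0.0))],
     "c2": [(1, (5.0, 5.0))]},
    (0.5, 0.2),
)
# a test point near c1's particles is attracted far more strongly to c1
```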
3.2 Principle to Create Data Particle
The simplest method for creating data particles is to treat each single data element as one data particle, so that every training sample yields one particle. This method is simple and easy to realize, but its shortcoming is also obvious: the amount of calculation grows tremendously as the training data set expands, and classification efficiency drops sharply. Another method for creating data particles is the Maximum Distance Principle (MDP); the algorithm can be found in [17].
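The single-element rule can be written down directly. Since the MDP algorithm itself is given in [17] and not reproduced here, the second function below is only a hypothetical threshold-based merge, illustrating how nearby same-class elements could be folded into heavier particles; both function names and the threshold scheme are assumptions:

```python
def atomic_particles(samples):
    """Simplest rule: each (x, y) training sample becomes an atomic
    particle (mass 1, centroid x, label y)."""
    return [(1, x, y) for x, y in samples]

def merged_particles(samples, threshold):
    """Hypothetical grouping (NOT the MDP of [17]): fold a sample into the
    first same-class particle whose centroid lies within `threshold`,
    otherwise start a new particle. Entries are [mass, centroid, label]."""
    particles = []
    for x, y in samples:
        for p in particles:
            dist = sum((a - b) ** 2 for a, b in zip(p[1], x)) ** 0.5
            if p[2] == y and dist <= threshold:
                m, c = p[0], p[1]
                p[1] = [(m * cj + xj) / (m + 1) for cj, xj in zip(c, x)]
                p[0] = m + 1      # centroid and mass updated incrementally
                break
        else:
            particles.append([1, list(x), y])
    return particles
```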
4 Data Classification in Intrusion Detection

The data for our experiments was prepared for the 1998 DARPA intrusion detection evaluation program by MIT Lincoln Lab [19]. This data set has 41 features and five different classes: Normal, Probe, DoS, U2R, and R2L. The training and test sets comprise 5092 and 6890 records, respectively. As the data set has five different classes, we performed a 5-class binary classification: normal data belongs to class 1, Probe to class 2, DoS to class 3, U2R to class 4, and R2L to class 5.

4.1 Feature Selection
Feature selection is very important in data mining because the quality of the data is a key factor affecting the success of data mining algorithms on a given task. Following the lemma and method of the CLIQUE clustering algorithm, we derived an effective feature selection principle from the Lemma of Monotonicity [20]. Table 1 gives the result of feature selection using this algorithm.

Table 1. The feature selection result

CLASS     IMPORTANT VARIABLES (FEATURES)
Class 1   3, 10, 23-26, 29, 30, 32-35, 38-41
Class 2   3, 23, 24, 25, 27, 29, 30, 32-36, 38, 40
Class 3   1, 3, 5, 6, 10, 11, 22-41
Class 4   3, 23, 24, 33
Class 5   2, 3, 23, 24, 33
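One way to use Table 1 is to project each record onto the feature subset of the class under test before computing gravitations. A minimal sketch with 1-based feature indices as in the table; the names and the projection step are assumptions, not the CLIQUE-based selection procedure of [20] itself:

```python
# Feature subsets from Table 1 for classes 4 and 5 (1-based indices).
FEATURES = {4: [3, 23, 24, 33], 5: [2, 3, 23, 24, 33]}

def project(record, feature_ids):
    """Keep only the selected 1-based feature columns of a 41-field record."""
    return tuple(record[i - 1] for i in feature_ids)

row = tuple(range(100, 141))          # a dummy 41-feature record
subset = project(row, FEATURES[4])    # columns 3, 23, 24, 33 of the record
```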
4.2 Classification Algorithm and Experiment Results
Suppose the number of data elements in the test data set is m. The whole detection algorithm can be described as follows [17]: 1). Select important features using the Lemma of Monotonicity; 2). Build up the training data particle set;
3). for i = 1 to m
        Calculate the gravitations that the normal training set exerts on test data ti;
        Calculate the composition of normal gravitations Fn;
        Calculate the gravitations that the anomaly training set exerts on test data ti;
        Calculate the composition of anomaly gravitations Fa;
        if Fa > Fn then
            ti is an anomaly datum
        else
            ti is a normal datum
        end if
    end for

For comparison purposes, two other classic classification methods, ID3 [2] and C4.5 [21], were applied in the experiment. A neural network classifier trained by MA with flexible bipolar sigmoid activation functions was constructed using the same training data sets, and the neural network classifier was then used on the test data set to detect the different types of attacks. All input variables were used for the experiments. Table 2 depicts the detection performance of the NN using the original 41-variable data set, together with the detection results of the C4.5 decision tree. The data in the table show that DGC achieves higher detection performance than NN and C4.5, except for the U2R and R2L attacks.

Table 2. Detection accuracy using the DGC, NN, and C4.5 classification models

ATTACK CLASS   DGC       NN        C4.5
Normal         99.93%    96.82%    82.32%
Probe          97.69%    95.00%    94.83%
DoS            97.92%    88.40%    77.10%
U2R            99.59%    99.79%    99.83%
R2L            98.59%    98.92%    94.33%

Fig. 2. Prototype platform based on the IXP-2400 network processor. (The figure shows the IXP-2400 with its XScale core, micro-engines, and SDRAM on the PCI bus; the Ethernet link; the PC server hosting the DGC-based packet classifier, the data sets, and disk; and a web server with a web-based user interface, connected via sockets, for administrators.)
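The normal-versus-anomaly loop of Sec. 4.2 can be sketched as follows, assuming (mass, centroid) pairs for the training particles; the name `detect` and the label strings are illustrative, not from the paper:

```python
def detect(normal, anomaly, tests):
    """For each test record, compare the superposed gravitation of the
    normal particle set (Fn) with that of the anomaly set (Fa) and label
    the record 'anomaly' iff Fa > Fn."""
    def strength(particles, x):
        return sum(m / sum((a - b) ** 2 for a, b in zip(c, x))
                   for m, c in particles)
    labels = []
    for t in tests:
        fn, fa = strength(normal, t), strength(anomaly, t)
        labels.append("anomaly" if fa > fn else "normal")
    return labels

labels = detect([(3, (0.0, 0.0))],        # normal particles
                [(1, (10.0, 10.0))],      # anomaly particles
                [(1.0, 1.0), (9.0, 9.0)])
```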
4.3 A Prototype Platform Based on Network Processor
To test the classification method in a real IDS environment, we are constructing a prototype platform using a PCI-based Radisys ENP-2611 network processor development board. In the test platform, an IXP-2400 network processor with eight micro-engines is adopted for fast processing of Ethernet packets. The gathered data sets are processed by the embedded XScale processor and by a more powerful PC server to which the ENP-2611 is attached. The features collected from the network processor are passed to the DGC module running on the PC server to determine whether the network activity is normal or deviant. According to the decision of the DGC module, the micro-engines in the network processor can also discard future deviant packets with the same features. A web-based user interface is used for administrative purposes in the system. Figure 2 shows the architecture of the platform.
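The paper does not specify the wire format between the ENP-2611 board and the PC server. Purely as an illustration, the server-side decision for an assumed line-oriented text protocol (comma-separated feature values in, a verdict string out) might look like:

```python
def handle_line(line, normal, anomaly):
    """Parse one 'f1,f2,...' feature line from the network processor and
    return 'normal' or 'anomaly' via the DGC comparison. The protocol and
    function name are assumptions; particles are (mass, centroid) pairs."""
    x = tuple(float(v) for v in line.strip().split(","))
    def strength(particles):
        return sum(m / sum((a - b) ** 2 for a, b in zip(c, x))
                   for m, c in particles)
    return "anomaly" if strength(anomaly) > strength(normal) else "normal"
```

In a real deployment this handler would sit inside a TCP server loop (e.g. one built on Python's `socketserver`) fed by the XScale-side collector, with the verdict relayed back so the micro-engines can drop matching packets.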
5 Conclusion and Future Works

The experimental results on the intrusion detection data set show that the DGC model is very effective. As evident from Table 2, the DGC model gave the best accuracy for most of the data classes (except U2R and R2L). As demonstrated in the paper, feature selection and the choice of training data set may influence classification accuracy. An improved method named WDGC (Weighted-feature DGC) is now being studied; it introduces the concept of weighted features, so that the importance of each feature of the target classification problem is captured by its weight. In addition, a test prototype of an Intelligent Intrusion Detection System adopting the IXP-2400 network processor (NP) is also under development.

Acknowledgments. This work was supported by the National Natural Science Foundation of China (No.69902005); the National High Technology Development Program of China (No.2002AA4Z3240); and the Science and Technology Development Project of Shandong Province (No.2004GG1101023).
References

1. J. Allen, A. Christie, W. Fithen, J. McHugh, J. Pickel, and E. Stoner, State of the Practice of Intrusion Detection Technologies, CMU/SEI-99-TR-028, Carnegie Mellon Software Engineering Institute, 2000.
2. J. R. Quinlan, Induction of decision trees, Machine Learning, 1986; 1, pp 81-106.
3. Y. Freund, Boosting a weak learning algorithm by majority, Information and Computation, 1995; 121, pp 256-285.
4. Hongjun Lu, Rudy Setiono, Huan Liu, Effective data mining using neural networks, IEEE Transactions on Knowledge and Data Engineering, 1996; 8(6), pp 957-961.
5. B. E. Boser, I. M. Guyon, V. N. Vapnik, "A training algorithm for optimal margin classifiers", Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, ACM Press, 1992, pp 144-152.
6. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, 1995.
7. Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Dordrecht: Kluwer Academic Publishers, 1991.
8. Xiaohua Hu, Nick Cercone, Data Mining via Discretization, Generalization and Rough Set Feature Selection, Knowl. Inf. Syst. 1(1), pp 33-60 (1999).
9. S. Minz, R. Jain, Rough Set based Decision Tree Model for Classification, in Proc. of 5th Intl. Conference, DaWaK 03, Prague, Czech Republic, Springer, LNCS 2737, September 2003, pp 172-181.
10. S. Minz, R. Jain, Hybridized Rough Set Framework for Classification: An Experimental View, HIS 2003, pp 631-640.
11. J. F. Peters, A. Skowron, A Rough Set Approach to Knowledge Discovery, 17th IJIS, 2002, pp 109-112.
12. Xiaohua Hu, Using Rough Sets Theory and Database Operations to Construct a Good Ensemble of Classifiers for Data Mining Applications, ICDM 2001, pp 233-240.
13. Yong Shi, Yuqing Song and Aidong Zhang, A shrinking-based approach for multi-dimensional data analysis, in the 29th VLDB Conference, Berlin, Germany, September 2003, pp 440-451.
14. Yong Shi and Aidong Zhang, A Shrinking-Based Dimension Reduction Approach for Multi-Dimensional Data Analysis, the 16th International Conference on Scientific and Statistical Database Management, Santorini Island, Greece, June 2004.
15. M. Indulska, M. E. Orlowska, Gravity Based Spatial Clustering, ACM Symposium on GIS, 2002.
16. Barry Webster, Philip J. Bernhard, A Local Search Optimization Algorithm Based on Natural Principles of Gravitation, IKE'03, Las Vegas, Nevada, USA, June 2003, Volume 1, CSREA Press 2003, pp 255-261.
17. Lizhi Peng, Yuehui Chen, Bo Yang, Zhenxiang Chen, "A Novel Classification Method Based on Data Gravitation", in Proc. of the International Conference on Neural Networks and Brain (ICNN&B), 2005, pp 667-672.
18. J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Math. Stat. and Prob., University of California Press, 1967, pp 281-297.
19. KDD Cup 99 Intrusion Detection Data Set, online at http://kdd.ics.uci.edu/database/ kddcup99/kddcup.data\_10\_percent.gz
20. R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications, in Proc. ACM SIGMOD'98 Int. Conf. on Management of Data, Seattle, WA, 1998, pp 94-105.
21. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.