JOURNAL OF NETWORKS, VOL. 5, NO. 4, APRIL 2010


A Cooperative Network Intrusion Detection Based on Fuzzy SVMs

Shaohua Teng, Hongle Du, Naiqi Wu, Wei Zhang, Jiangyi Su
Guangdong University of Technology, Guangzhou, Guangdong, China, 510006
[email protected], [email protected], [email protected]

Abstract—Network traffic contains a large amount of noise data. To reduce or eliminate the impact of this noise on constructing the SVM hyperplane, this paper first preprocesses the data and then introduces a fuzzy membership function into the SVM. The fuzzy membership function assigns each input datum a different value according to its effect on the classification result. Because different network protocols have different attributes, which affect the detection result, this paper proposes a cooperative network intrusion detection method based on fuzzy SVMs: three types of detecting agents are generated, corresponding to the TCP, UDP and ICMP protocols. Finally, simulations on the KDD CUP 1999 data set show that cooperative network intrusion detection based on multiple fuzzy SVMs achieves a better detection effect.

Index Terms—Fuzzy Support Vector Machine; Intrusion Detection; Membership Function; Incremental Learning; Cooperative; Network

I. INTRODUCTION

Intrusion detection is the second line of defense in network security. Intrusion detection systems can be divided into three categories according to the objects they protect: network intrusion detection systems, host intrusion detection systems and hybrid intrusion detection systems. A network-based intrusion detection system protects the local network or a whole network segment: it monitors network packets and identifies attacks in them, and these attacks are then dealt with correspondingly, for example by cutting the connection or sending out an alarm signal. A host-based intrusion detection system protects a critical computer. It recognizes penetration behavior by fetching and analyzing internal system auditable events, system logs, system status and application logs, and then makes the appropriate response. Depending on the type of analysis carried out, intrusion detection systems are classified as either signature-based or anomaly-based. Signature-based schemes (also called misuse-based) seek defined patterns, or signatures, within the analyzed data; for this purpose, a signature database corresponding to known attacks is specified a priori. Anomaly detection, on the other hand, needs to establish the normal behavior patterns of users in the protected system, and generates an

© 2010 ACADEMY PUBLISHER doi:10.4304/jnw.5.4.475-483

alarm whenever the deviation between a given observation at an instant and the normal behavior exceeds a predefined threshold. Another possibility for anomaly detection is to model the "abnormal" behavior of the system and to raise an alarm when the difference between the observed behavior and the expected one falls below a given limit. Anomaly detection can detect unknown attacks. Both methods need to establish profiles of user behaviors, and both can be used to classify a user's behavior. Machine learning methods for pattern recognition include neural networks, Bayesian theory, genetic algorithms and so on. The support vector machine (SVM) is also a machine learning method and is widely applied in the field of pattern recognition; it is increasingly applied in intrusion detection as well.

The Support Vector Machine is a popular method based on statistical learning theory [1]. In a nutshell, an SVM works as follows [2]. It uses a nonlinear mapping to transform the original training data into a higher dimension. Within this new dimension, it searches for the linear optimal separating hyperplane (that is, a "decision boundary" separating the tuples of one class from another). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM finds this hyperplane using support vectors (training tuples) and margins (defined by the support vectors). Especially in high-dimensional data spaces, it effectively overcomes the curse of dimensionality and the overfitting problem. SVM has been widely applied in pattern recognition [3, 4]. A network connection includes much information about user behavior, but traditional SVM-based intrusion detection methods rarely take the differences among network protocols into consideration: they build the SVM on a unified data format, which takes much time and leads to low efficiency.
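For concreteness, the behavior just described can be sketched with an off-the-shelf SVM implementation (scikit-learn here, our choice for illustration rather than anything used in the paper):

```python
# Illustrative sketch (not the authors' code): a soft-margin SVM with an
# RBF kernel, which implicitly maps inputs to a higher-dimensional space
# and searches for a separating hyperplane there.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two Gaussian clusters standing in for "normal" (+1) and "attack" (-1) traffic.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)

# C trades margin width against training errors.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Only the support vectors determine the decision boundary, which is why
# outliers and noise samples can distort it.
print(len(clf.support_vectors_), "support vectors")
print(clf.predict([[0, 0], [4, 4]]))
```

Because the boundary depends only on the support vectors, a few mislabeled or noisy points near the margin can shift it noticeably; this is the weakness the fuzzy weighting of later sections addresses.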
In addition, two further problems need to be solved: the processing capability for large-scale training sets, and eliminating the impact of noise data. On the one hand, as the number of training samples grows, training time and storage space increase dramatically (the time complexity of SVM is O(k^3)). On the other hand, the ultimate decision function depends on the small number of support vectors among the training samples, so the SVM is very sensitive to outliers and noise samples. Accordingly, this paper presents a cooperative network intrusion detection method based on fuzzy SVMs. According to the different network protocols, it builds a different network behavior classifier for each protocol. The experimental results show that this method evidently reduces the training time and storage space and improves the classification accuracy.

The rest of this paper is organized as follows. Section 2 presents related work on intrusion detection based on FSVM. Section 3 gives a cooperative network intrusion detection model of FSVMs. Section 4 describes v-FSVM, fuzzy membership functions and their calculation. Section 5 presents the detailed process of building a detection agent for TCP attacks and proposes a new incremental learning algorithm for support vector machines. Section 6 presents experiment results on the KDD CUP 99 data set, verifying that our method is efficient. Section 7 draws conclusions and outlines future work.

II. RELATED WORK

Intrusion detection can be seen as a classification problem: according to the network information, network behavior can be divided into normal behavior and abnormal behavior, so the intrusion detection problem transforms into a pattern recognition problem. Paper [4] presents an SVM-based intrusion detection model and discusses the working process of the model. Paper [5] builds a lightweight intrusion detection system by using a feature selection algorithm, which reduces training time and storage space; experiments show that the method improves the detection efficiency of the system. Paper [6] combines the supervised C-SVM algorithm with the unsupervised One-Class SVM algorithm and defines an RBF kernel function based on the HVDM distance; the method is used to deal with heterogeneous data sources. Paper [7] gives an SVM based on the fuzzy C-means algorithm: fuzzy membership functions computed iteratively form a membership matrix, which is used to weight the input samples, and this improves the intrusion detection effect. To reduce the impact of noise data, Lin et al. [8-11] apply fuzzy techniques to the support vector machine, yielding FSVM: the category of each sample and the degree to which it belongs to each class decide its effect on the objective function, which eliminates or reduces the impact of noise and outlier samples on the objective function. Paper [9] presents a new kernel function constructed through a fuzzy clustering method, which improves the classification effect by fuzzy measures of the samples. Paper [10] gives each sample point two membership functions, representing respectively its degree of membership in the positive and negative classes. Based on the multi-class SVM classifier proposed by Weston and Watkins [12], Lin and Li [8, 11] give an effective method that eliminates the impact of noise and outlier points on the training data set by applying fuzzy membership functions.
Papers [8-11] describe how to optimize FSVM by adopting different membership functions. In this paper, fuzzy membership functions are applied to v-SVM, yielding v-FSVM. By assigning samples different weights in the objective function, samples play different roles during training, which increases the efficiency of FSVM.

Incremental learning arises from two kinds of problems. One is training on large-scale data sets, with insufficient memory and excessive training time. The other is that a complete data set cannot be obtained in advance, so online learning must be used, and data samples accumulated during continuous operation improve the learning accuracy. The key issues of incremental learning are how to retain the information of the original samples and how to cope with the added samples. Syed [13] first proposed an incremental algorithm for SVM. The idea is to acquire the support vectors by training on the initial sample set; the new data and the previous support vectors then form a new training set, which is trained to produce new support vectors. To reduce training time, papers [13-14] eliminate all samples other than support vectors, which lowers SVM accuracy due to the loss of information. Paper [15] analyzes the relationship between the KKT conditions and the sample distribution and presents an incremental learning algorithm in which the initial sample set and the new sample set have the same impact on the final hyperplane. Paper [16] summarizes the current study and application of incremental support vector machines and gives generalized KKT conditions: because of the added samples, sample points near the hyperplane may become new support vectors. Paper [17] proposes redundant incremental learning, in which some samples near the hyperplane are added to the new training according to predefined rules; this reduces the loss of information and improves training efficiency. To address both incremental learning and large-scale learning, paper [18] proposes a rapid incremental SVM learning algorithm based on active-set iteration, where each iteration reuses the active set from previous work.

III. A COOPERATIVE INTRUSION DETECTION MODEL BASED ON FSVMS

A. The Architecture of Cooperative Intrusion Detection

Network intrusion detection is essentially a classification problem: network data is classified as normal or anomalous, so network intrusion detection can be transformed into a network behavior classifier. The recognition of network behavior is related to the network protocols. Different network protocols have different packet formats, which lead to different kinds of network connections, such as the connection-oriented TCP protocol and the connectionless UDP protocol. Therefore this paper uses different classifiers for different network protocol data. The detection model is shown in figure 1; it includes a data collector, a data preprocessor, detection agents, and a decision-response and saving unit.
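The per-protocol routing can be pictured with a small sketch (hypothetical names, not from the paper): the preprocessor hands each record to the agent registered for its protocol.

```python
# Hypothetical sketch of the dispatch in the model: after preprocessing,
# each record goes to the detecting agent for its protocol. The agent
# functions are stand-ins for the three FSVM classifiers.
def tcp_agent(features):
    return "normal"        # placeholder for the TCP FSVM decision

def udp_agent(features):
    return "normal"        # placeholder for the UDP FSVM decision

def icmp_agent(features):
    return "intrusion"     # placeholder for the ICMP FSVM decision

AGENTS = {"tcp": tcp_agent, "udp": udp_agent, "icmp": icmp_agent}

def dispatch(record):
    """Route a preprocessed record to the agent for its protocol."""
    protocol, features = record
    return AGENTS[protocol](features)

print(dispatch(("icmp", [0.1, 0.9])))
```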

Figure 1. The architecture of intrusion detection system based on multi-FSVM

B. Components

A cooperative network intrusion detection model based on fuzzy SVMs includes a data collector, a data preprocessor, detecting agents and a response unit, as shown in figure 1. The data collector collects network data. The data preprocessor filters, cleans, integrates and preprocesses the data, performs attribute selection and converts formats. After preprocessing, the data is sent to the corresponding detecting agent according to whether it belongs to the TCP, UDP or ICMP protocol. Because SVM accepts only numerical data, non-numerical data must be transformed into numerical form; and since SVM requires that all data have the same dimension, we extract the effective network connection information and transform the original network data into digital vectors. The detection agent analyzes the data submitted by the data preprocessor and decides whether it represents an intrusion. The response unit makes the corresponding decision according to the result of the detecting agent.

IV. V-FSVM

A. v-SVM

In the C-SVM there are two contradictory objectives: maximize the margin and minimize the training errors. The constant C reconciles these two objectives, but it is difficult to select. To solve this problem, the v-SVM algorithm was proposed in 2000 [19]; it replaces the constant C of the classical C-SVM algorithm [1] by introducing a parameter v. The parameter v has a practical significance: it is an upper bound on the fraction of misclassified samples and a lower bound on the fraction of support vectors in the training data set. Although v-SVM and C-SVM may produce the same classification hyperplane with appropriately chosen parameters, the parameter v of v-SVM has a specific and intuitive meaning and is easier to select than the C of C-SVM. This avoids the shortcoming that the choice of C in the C-SVM relies mainly on experience.
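The meaning of v can be made concrete with scikit-learn's NuSVC (our illustration; the paper derives its own v-FSVM rather than using this library): the fraction of support vectors in the fitted model should not fall below the chosen nu.

```python
# Sketch of the nu (v) parameter's semantics: nu lower-bounds the fraction
# of support vectors and upper-bounds the fraction of margin errors.
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(1)
# Two overlapping Gaussian classes.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([1] * 100 + [-1] * 100)

for nu in (0.05, 0.5):
    clf = NuSVC(nu=nu, kernel="rbf").fit(X, y)
    frac_sv = len(clf.support_) / len(X)
    # Larger nu forces more points to become support vectors.
    print(f"nu={nu}: support-vector fraction = {frac_sv:.2f}")
```

Unlike C, whose useful range depends on the data scale, nu is always chosen from (0, 1] with this direct interpretation, which is the advantage the section above describes.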


B. Introducing membership into v-SVM

Suppose a training data set labeled by class and associated with fuzzy memberships:

$$S = \{(x_1, y_1, u(x_1)), (x_2, y_2, u(x_2)), \ldots, (x_l, y_l, u(x_l))\}$$

Each training point $x_i \in \mathbb{R}^n$ is given a label $y_i \in \{1, -1\}$ and a fuzzy membership $0 < u(x_i) \le 1$. Since the fuzzy membership $u_i$ expresses the attitude of the corresponding point $x_i$ toward one class, and the slack variable $\varepsilon_i$ is a measure of error in the SVM, the term $u(x_i)\varepsilon_i$ is a measure of error with different weighting. We transform the input data $x_i$ into a high-dimensional, approximately linearly separable feature space using a nonlinear mapping $z = \Phi(x)$. The optimal hyperplane is then obtained by solving:

$$\min\ \frac{1}{2}\langle w, w\rangle + \frac{1}{l}\sum_{i=1}^{l} u_i(\varepsilon_i - v\rho)$$
$$\text{s.t. } y_i(\langle w, x_i\rangle + b) \ge \rho - \varepsilon_i,\quad \varepsilon_i \ge 0,\ \rho \ge 0,\ i = 1, 2, \ldots, l \qquad (1)$$

From (1) we can see that a smaller $u_i$ reduces the effect of the slack variable $\varepsilon_i$, so the corresponding point $x_i$ is treated as less important. To solve the constrained optimization problem (1), define the Lagrangian:

$$L(w, b, \alpha, \beta, \rho) = \frac{1}{2}w^T w + \frac{1}{l}\sum_{i=1}^{l}u_i(\varepsilon_i - v\rho) - \delta\rho - \sum_{i=1}^{l}\beta_i\varepsilon_i - \sum_{i=1}^{l}\alpha_i\left[y_i(\langle w, x_i\rangle + b) - \rho + \varepsilon_i\right]$$
$$\text{s.t. } \alpha_i \ge 0,\ \beta_i \ge 0,\ i = 1, 2, \ldots, l,\ \delta \ge 0 \qquad (2)$$

To minimize (2), take the partial derivatives with respect to $w$, $b$, $\varepsilon_i$ and $\rho$ and set them equal to 0:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{l}\alpha_i y_i x_i = 0 \qquad (3)$$


$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{l}\alpha_i y_i = 0 \qquad (4)$$

$$\frac{\partial L}{\partial \varepsilon_i} = \frac{1}{l}u_i - \beta_i - \alpha_i = 0 \qquad (5)$$

$$\frac{\partial L}{\partial \rho} = -\frac{v}{l}\sum_{i=1}^{l}u_i - \delta + \sum_{i=1}^{l}\alpha_i = 0 \qquad (6)$$

Substituting (3)-(6) into (2) transforms the optimal classification problem of the fuzzy-membership support vector machine into its dual form:

$$\max\ W(\alpha) = -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$$
$$\text{s.t. } \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le \frac{u_i}{l},\quad \sum_{i=1}^{l}\alpha_i \ge \frac{v}{l}\sum_{i=1}^{l}u_i \qquad (7)$$

where $K(x_i, x_j) = \langle\Phi(x_i), \Phi(x_j)\rangle$ is the kernel function. After solving this quadratic programming problem, we obtain the classifier:

$$f(x) = \operatorname{sgn}\Big(\sum_{x_i \in SV}\alpha_i^{*} y_i K(x, x_i) + b^{*}\Big) \qquad (8)$$

where

$$w^{*} = \sum_{i=1}^{l}\alpha_i^{*} y_i \Phi(x_i), \qquad b^{*} = -\frac{1}{2}\left(\langle w^{*}, \Phi(x_i)\rangle + \langle w^{*}, \Phi(x_j)\rangle\right)$$

for a pair of support vectors $x_i$, $x_j$ taken from the two classes.

C. Membership functions

In FSVM, many scholars have studied the fuzzy membership. The literature [20] defines the membership function by the distance from a sample to its class center; because distance-based membership takes no account of the number of samples in a class, the literature [21] gives density-based membership functions; the literature [22] uses rough set theory to define membership functions, which avoids the problem that the radii are difficult to compute. The principle for determining the membership value is the importance of the sample within its category, that is, the sample's contribution to the category. The distance from a sample to the class center is one measure of this contribution: in the distance-based approach, the membership of a sample is determined by its distance to the class center in the feature space. Given the centers $x_+$ and $x_-$ of the two classes, the class radii are:

$$r_+ = \max_{\{i:\, y_i = 1\}}\|x_i - x_+\|, \qquad r_- = \max_{\{i:\, y_i = -1\}}\|x_i - x_-\|$$

So we define the membership function:

$$u(x_i) = \begin{cases} 1 - \dfrac{\|x_i - x_+\|}{r_+ + \sigma}, & x_i \text{ belongs to the normal class} \\[2mm] 1 - \dfrac{\|x_i - x_-\|}{r_- + \sigma}, & x_i \text{ belongs to the abnormal class} \end{cases} \qquad i = 1, 2, \ldots, l$$

where $\sigma > 0$ is a small predefined number that avoids $u(x_i) = 0$. In this paper, we use the number of samples in one cluster as the fuzzy membership.
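A minimal numpy sketch of this distance-based membership (our illustration, not the authors' code; for simplicity it works in the input space rather than the feature space):

```python
import numpy as np

def fuzzy_membership(X, y, sigma=1e-3):
    """Distance-based membership: u = 1 - ||x - center|| / (radius + sigma)."""
    u = np.empty(len(X))
    for label in (1, -1):                 # normal (+1) / abnormal (-1) class
        mask = (y == label)
        center = X[mask].mean(axis=0)     # class center x_+ or x_-
        dist = np.linalg.norm(X[mask] - center, axis=1)
        r = dist.max()                    # class radius r_+ or r_-
        u[mask] = 1.0 - dist / (r + sigma)  # sigma > 0 keeps u > 0
    return u

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y = np.array([1, 1, 1, -1, -1, -1])
u = fuzzy_membership(X, y)
print(u)  # points far from their class center get small memberships
```

Points on the rim of a class get memberships close to zero, so they contribute little to the objective function (1), which is exactly how outliers are de-emphasized.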

V. DETECTION AGENT BASED ON FSVM

A. The architecture of a detecting agent

According to the network protocol type, network packets are divided into three types: TCP, UDP and ICMP packets; correspondingly, the network data flow is viewed as three flows: a TCP data flow, a UDP data flow and an ICMP data flow. Therefore, three types of detection agents are constructed for TCP, UDP and ICMP detection. Every detection agent has three processes: construction, adaptation and detection. We describe the TCP detection agent below; the UDP and ICMP detection agents are similar. The architecture of the TCP detection agent is shown in figure 2. As figure 2 shows, constructing each agent involves two stages: training and predicting. The preprocessed data is divided into a training data set and a testing data set. The training stage trains the support vector machine on samples of known types by applying (7), from which we obtain the support vectors and the corresponding parameters. The predicting stage classifies network data, preprocessed as above, with the trained support vector machine: according to the discriminant function (8), we compute the result for each network behavior and submit it to the decision-making system, where the appropriate decision is made. The TCP detection agent based on FSVM is generated after the FSVM is tested with the testing data set. The self-adaptive module learns new knowledge through incremental learning to improve the detection ability of the agent; the detecting module gets the network data, predicts results with the FSVM, and sends them to the deciding-and-responding module to make decisions.


Figure 2. The architecture of TCP detection agent

B. Data preprocessing of the TCP trainer

For every classifier, data preprocessing is similar in the training stage and the predicting stage. Taking the classifier of the TCP data flow as an example, we discuss the realization process of each detection agent. Data preprocessing is shown in figure 3; after preprocessing, we obtain data that meets the requirements of SVM training and move on to the next process.

Figure 3. Data preprocessing

Figure 3 is described as follows. Data cleaning deals with data inconsistency by filling in missing values, smoothing noisy data, and recognizing or deleting data that carries very little information. Attribute selection reduces the size of the analyzed data set by deleting unrelated or redundant attributes; for example, the formats of TCP, UDP and ICMP are different, so the UDP detection agent does not need the attributes specific to TCP. It also ensures that experimental results on the smaller data set are similar or identical to those on the original data set. Data integration combines data coming from different data sources; data is stored separately according to the TCP, UDP and ICMP protocols. Data transformation converts non-numerical values into numerical attributes to meet the needs of FSVM training. Data discretization generalizes the data: it discretizes the attributes to reduce the amount of data to analyze. After preprocessing, we select 32 attributes for the TCP detection agent, 21 attributes for the UDP detection agent and 18 attributes for the ICMP detection agent from the 41 attributes; experiments verify that the results are the same as before discretization. Clustering uses the simple unsupervised clustering (UC) algorithm.

C. The SVM based on clustering

Based on the above analysis, this paper presents a new incremental learning algorithm that combines SVM with a clustering algorithm. In this algorithm, we first process the training set with the clustering algorithm; the class label is treated as an attribute of the data set, so the samples belonging to the same cluster have the same class label when the clustering radius satisfies r <= 1. We then obtain clusters O(n_i, o_i, y_i), where n_i is the number of sample points, o_i is the center of the cluster and y_i is the class label. Next, we construct a new training set from the cluster centers, with n_i serving as the fuzzy membership, train it with FSVM and obtain the support vectors. There are two strategies for dealing with a newly added data set: one is to add all new samples to the support vector set obtained in the first step using the clustering algorithm; the other is to add, using the UC algorithm, only the samples that violate the KKT conditions and to throw away the samples that satisfy them. Finally, we obtain a new training set and train it with FSVM again. In this paper, we compare the two treatments experimentally and give the corresponding analysis.

1) UC algorithm

The literature [23] presents a simple unsupervised clustering (UC) algorithm. Compared with the traditional K-means algorithm, it does not need the number of clusters to be specified in advance and clusters at high speed, so we process the training data set with the UC algorithm in this paper. Given the training set $T = \{x_1, x_2, \ldots, x_l\}$, $x_i \in \mathbb{R}^{n+1}$ (the label of the SVM training set is treated as an extra dimension, so the number of dimensions is n + 1), and letting C_number denote the number of clusters, the algorithm is described as follows:

Step 1. Read one record $x_j$ from the training set. If C_number = 0, create a new cluster center $o_1$ and set $o_1 = x_j$; otherwise go to Step 2.

Step 2. Compute the distance $d_k$ between the sample $x_j$ and each cluster center $o_k$:

$$d_k = \sqrt{(x_{j1} - o_{k1})^2 + (x_{j2} - o_{k2})^2 + \cdots + (x_{jm} - o_{km})^2}$$

Set $d_{\min} = \min_{k=1,2,\ldots,n} d_k$, where n is the number of clusters, and let m be the index of the nearest cluster.

Step 3. If $d_{\min} < r$, add $x_j$ to cluster $o_m$: reset $o_m = (o_m \cdot n_m + x_j)/(n_m + 1)$ and $n_m = n_m + 1$. Otherwise create a new cluster: set C_number = C_number + 1, $o_{C\_number} = x_j$ and $n_{C\_number} = 1$.

Step 4. If all samples have been dealt with, stop; otherwise go to Step 1.

2) Combining SVM with the UC algorithm

In order to keep more classification information of the original samples and achieve a high classification accuracy, the clustering radius should be small (if r = 0, no clustering takes place at all). But in order to keep more boundary support vectors and improve the training speed, the radius should be made larger. In this paper, we set r = 1. Given a labeled training sample set $A = \{(x_i, y_i), i = 1, 2, \ldots, l\}$ and a newly added data set $B = \{(x_i, y_i), i = 1, 2, \ldots, p\}$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$, where l and p are the numbers of samples, the incremental SVM learning algorithm based on clustering is described as follows:

Step 1. Process the data set A with the UC algorithm of section 1); this yields the cluster center set $O = \{(n_1, o_1), (n_2, o_2), \ldots, (n_p, o_p)\}$.

Step 2. Reconstruct the training set from the cluster center set O. First, separate the last component of each vector $o_i$ to obtain $o_i'$ and $y_i$, where $y_i$ is the class label. This gives the new training set $T = \{(n_1, o_1', y_1), (n_2, o_2', y_2), \ldots, (n_p, o_p', y_p)\}$, where $n_i$ is the fuzzy membership. Then retrain on T with FSVM to obtain the classifier $\varphi$ and the support vector set SVs.

Step 3. There are two methods to deal with the newly added set B. The first is to treat the set SVs as the cluster center set and process B with the clustering algorithm of section 1), which yields a new cluster center set O'. The second is to discard, according to the KKT conditions, the samples that satisfy them, and to process the samples that violate them with the first method.

Step 4. Process O' with the method of Step 2; this yields the new classifier $\varphi'$ and the new support vector set SVs'.

The process of training is shown in figure 4.

Figure 4. The process of training
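The UC algorithm of section 1) can be sketched in Python as follows (our illustration, not the authors' implementation; the class label rides along as the last coordinate of each record):

```python
import numpy as np

def uc_cluster(T, r=1.0):
    """Simple unsupervised clustering: returns (counts, centers)."""
    centers, counts = [], []
    for x in T:
        if not centers:                 # Step 1: first record opens a cluster
            centers.append(x.astype(float)); counts.append(1)
            continue
        d = [np.linalg.norm(x - c) for c in centers]   # Step 2: distances
        m = int(np.argmin(d))
        if d[m] < r:                    # Step 3: absorb into nearest cluster
            centers[m] = (centers[m] * counts[m] + x) / (counts[m] + 1)
            counts[m] += 1
        else:                           # ... or open a new cluster
            centers.append(x.astype(float)); counts.append(1)
    return counts, centers              # Step 4: all samples processed

# Toy data: one feature plus the label (+1/-1) as the last dimension.
X = np.array([[0.0, 1.0], [0.2, 1.0], [0.1, 1.0], [5.0, -1.0], [5.3, -1.0]])
counts, centers = uc_cluster(X, r=1.0)
print(counts)    # -> [3, 2]: one cluster per group
```

The cluster sizes in `counts` are exactly the values used as fuzzy memberships when the cluster centers are retrained with FSVM in Step 2 above.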

D. An adaptive mechanism

The adaptive mechanism of the detecting agents is mainly realized through incremental learning on a new support vector set. During incremental learning of the SVM, support vectors may become non-support vectors and non-support vectors may become support vectors, so as to keep the maximum margin and the minimum error rate of the classifier. The adaptive module therefore applies the incremental learning algorithm of section V.C: the training set is clustered with the UC algorithm, a new training set is constructed from the cluster centers with the cluster sizes n_i as fuzzy memberships, and it is trained with FSVM to obtain the support vectors; a newly added data set is then handled with one of the two strategies described above (adding all new samples, or adding only those that violate the KKT conditions), and the result is trained with FSVM again.

VI. EXPERIMENTS

The KDD CUP 1999 data [24] are standard data sets for intrusion detection, comprising a training data set and a test data set. The training data set includes 494,022 records and the test data set includes 311,030 records. There are 24 types of attacks in the training data set, and 14 new kinds of attacks appear in the test data set. The attacks can be divided into four major categories: Probing, Denial of Service (DoS), User-to-Root (U2R) and Remote-to-Local (R2L). Each complete TCP connection is considered one record and carries four groups of attributes: time-based traffic features, host-based traffic features, content features and basic features. There are 41 different attributes in total, of which 32 are continuous and 9 are discrete. In all experiments, we use a personal computer (P4 3.0 GHz, 512 MB RAM) running Windows XP.

A. Selection of experiment data sets

Some attributes are numerical and others are character-valued, but SVM can only deal with numerical vectors. Therefore, before training we must make the input data numerical and normalized. This study uses a simple substitution of symbols with numerical values. The protocol-type, service and flag attributes are replaced by digital values: for example, the three kinds of protocol-type (TCP, UDP and ICMP) are replaced with 1, 2 and 3, and the 71 kinds of service are substituted with 1, 2, ..., 71. The attack label is replaced by 1 or -1, where a normal record is 1 and an abnormal record is -1. Finally, the input data set is normalized with Libsvm [25]. In order to reduce the training time while keeping the chosen data representative, the data sets are sampled at fixed intervals: the training set takes one record every 15 records, starting from the first, giving 32,935 records from the 10-percent training data set of KDD CUP 1999; the test set takes one record every 20 records, starting from the fifth, giving 15,552 records from the labeled "Correct" set of KDD CUP 1999.

B. Experiment results and analysis

We select different attributes for different network protocols: 32 attributes from the TCP data set, 21 from the UDP data set and 18 from the ICMP data set. The accuracy rates are exactly the same as the experimental results obtained with all 41 attributes. The results are shown in tables 1 and 2. In these tables, N-train is the number of records in the training data set, N-test is the number of records in the testing data set, TN is the number of correctly detected records, Accuracy (the true rate) is the number of correctly classified samples divided by the total number of samples, R-error is the percentage of normal records that are detected as attacks, and T-time is the time of training the FSVM.
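The substitution and normalization just described can be sketched as follows (an illustration, not the authors' code; in their setup Libsvm's scaling tool performs the normalization step):

```python
import numpy as np

PROTOCOL = {"tcp": 1, "udp": 2, "icmp": 3}   # protocol-type -> 1, 2, 3

def encode(record):
    """record = (protocol, duration, src_bytes, label); returns (features, y)."""
    proto, duration, src_bytes, label = record
    y = 1 if label == "normal" else -1       # normal -> 1, attack -> -1
    return [PROTOCOL[proto], duration, src_bytes], y

raw = [("tcp", 2, 300, "normal"),
       ("udp", 0, 50, "teardrop"),
       ("icmp", 0, 8, "smurf")]
pairs = [encode(r) for r in raw]
X = np.array([p[0] for p in pairs], dtype=float)
y = [p[1] for p in pairs]

# Min-max normalization of each column into [0, 1].
lo, hi = X.min(axis=0), X.max(axis=0)
span = np.where(hi > lo, hi - lo, 1.0)
X = (X - lo) / span

print(X)
print(y)   # [1, -1, -1]
```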

TABLE I. THE DETECTION RESULTS OF 41 ATTRIBUTES

detector     N-train   N-test   TN     Accuracy (%)   R-error (%)   T-time (s)
TCP FSVM     12674     6000     5149   85.8167        1.8513        217
UDP FSVM     1346      1297     787    60.6785        3.2856        2
ICMP FSVM    18915     8255     8249   99.9273        0             4

TABLE II. THE DETECTION RESULTS OF REDUCED ATTRIBUTES

detector     N-train   N-test   TN     Accuracy (%)   R-error (%)   T-time (s)
TCP FSVM     12674     6000     5149   85.8167        1.8513        189
UDP FSVM     1346      1297     787    60.6785        3.2856        2
ICMP FSVM    18915     8255     8249   99.9273        0             3

As shown in tables 1 and 2, the training set of ICMP is much larger than the others, yet its training time is shorter and its prediction accuracy is higher. Analyzing the ICMP training set, we see that it contains 18,819 attack records and only 96 normal records. Constructing the hyperplane for this unbalanced training set requires little computation, so the training time is short; after attribute reduction, the training time does not change much. The records in the UDP training set are significantly fewer than those of TCP and ICMP, and the tables show that the accuracy rate after UDP training is only 60.6785%. To increase the number of UDP training records, we selected all 21,865 UDP records from the 10% training set of the KDD CUP 1999 data set as the training set. The prediction accuracy was still only 61.6573%, below the accuracy of TCP and ICMP, so the low accuracy is not caused by too few records. Analyzing the UDP training set, we find only 77 attack records among its 1,346 records (a proportion of 5.7207%) and 1,498 attack records among the 21,865 records (a proportion of 6.8511%). By contrast, there are 18,819 attack records among the 18,915 records of the ICMP training set (99.5925%) and 7,554 attack records among the 12,674 records of TCP (59.6023%). So the low prediction accuracy for UDP is caused not by the size of the training set but by its imbalance, which affects the construction of the FSVM hyperplane and hence the detection accuracy.

Table 3 compares single FSVM with multiple FSVMs. The total number of correctly detected records with cooperative detection based on multiple FSVMs is 5149 + 787 + 8249 = 14,185, with an accuracy rate of 91.2101%. With a single FSVM, the number of correctly detected records is 12,841 and the accuracy rate is 82.5682%. The training time of detection based on multiple FSVMs is only 194 seconds, while that of the single FSVM is 816 seconds. Thus both the training time and the accuracy rate of cooperative detection based on multiple FSVMs are better than those of a single FSVM.

THE RESULTS COMPARISON BETWEEN SINGLE FSVM AND MULTI FSVMS.

algorithm   | TN    | Accuracy (%) | R-error (%) | T-time(s)
------------|-------|--------------|-------------|----------
Single FSVM | 12841 | 82.5682      | 6.3714      | 816
Multi FSVM  | 14185 | 91.2101      | 5.1369      | 194
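The multi-FSVM row of Table III aggregates the three agents of Table II; its accuracy follows from pooling the correctly detected records and test-set sizes of the three protocols (a quick check, assuming the test sets are simply pooled):

```python
# Combined accuracy of the cooperative multi-FSVM detector: pool the
# correctly detected records (TN) and the test-set sizes of the three agents.
tn_total = 5149 + 787 + 8249        # = 14185 correctly detected records
n_test_total = 6000 + 1297 + 8255   # = 15552 test records in total

accuracy = round(tn_total / n_test_total * 100, 4)
print(tn_total, n_test_total, accuracy)  # 14185 15552 91.2101
```

This reproduces the 91.2101% figure quoted for the multi-FSVM detector.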

VII. CONCLUSION

In this paper, we propose a cooperative network intrusion detection system based on multiple FSVMs. First, the v-FSVM reasoning process is given. Then, separate detection agents are constructed for the different network protocols, and the adaptive learning process is described. Because this method divides the network data flow by protocol, it improves both the speed of each detection agent and the prediction accuracy, as the experimental results confirm. However, the accuracy of the UDP detection agent remains low because of the scarcity of attack records in its training set. Improving the accuracy of the UDP detection agent on the existing data set will be the major work of the next stage.

VIII. ACKNOWLEDGEMENT

This work was supported by the Guangdong Provincial Natural Science Foundation (Grant No. 06021484), Guangdong Provincial science & technology projects (Grant No. 2005B16001095 and Grant No. 2005B10101077), and the Yuexiu Zone, Guangzhou city science & technology project (Grant No. 2007-GX-075).
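The protocol-based division of the data flow summarized in the conclusion can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three classifier functions are hypothetical stand-ins for the trained per-protocol FSVM agents.

```python
# Sketch of the cooperative, protocol-based dispatch described in the paper.
# Each per-protocol classifier below is a placeholder (hypothetical names);
# in the paper each agent is a fuzzy SVM trained on records of its protocol.

def classify_tcp(record):   # stand-in for the TCP FSVM agent
    return "normal"

def classify_udp(record):   # stand-in for the UDP FSVM agent
    return "normal"

def classify_icmp(record):  # stand-in for the ICMP FSVM agent
    return "attack"

AGENTS = {"tcp": classify_tcp, "udp": classify_udp, "icmp": classify_icmp}

def dispatch(record):
    """Route a captured record to the detection agent for its protocol."""
    agent = AGENTS.get(record["protocol"])
    if agent is None:
        raise ValueError("no agent for protocol: %s" % record["protocol"])
    return agent(record)
```

Because each agent sees only its own protocol's traffic, training sets stay smaller and the per-agent hyperplanes are cheaper to construct, which is the source of the speed advantage reported in Table III.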




Shaohua Teng is a Professor at Guangdong University of Technology in China. He was born in January 1962. He is responsible for teaching data mining in the Faculty of Computer. He is engaged in education and technology transfer on knowledge discovery, and in research on network security, machine learning, and statistical pattern recognition. Dr. Teng earned a Ph.D. in Industrial Engineering at Guangdong University of Technology. He has published 50 papers in computer journals and international conference proceedings, as well as 2 books. Three recent papers are listed as follows:
1. Shaohua Teng, Wenwei Tan, Video Temporal Segmentation Using Support Vector Machine, Lecture Notes in Computer Science, Vol. 4993, 2008: 442-447
2. Shaohua Teng, Wei Zhang, et al., Cooperative intrusion detection model based on state transition analysis, Lecture Notes in Computer Science, Vol. 5236, 2008: 419-431
3. Shaohua Teng, Wenwei Tan, Wei Zhang, Cooperative shot boundary detection for video, Lecture Notes in Computer Science, Vol. 5236, 2008: 99-110

Hongle Du is a graduate student at Guangdong University of Technology in China. He was born in 1979. His major research interests include network security, intrusion detection based on Support Vector Machines, and machine learning.

NaiQi Wu (M'04-SM'05) received the M.S. and Ph.D. degrees in Systems Engineering from Xi'an Jiaotong University, Xi'an, China, in 1985 and 1988, respectively. From 1988 to 1995, he was with the Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China, and from 1995 to 1998, with Shantou University, Shantou, China. From 1991 to 1992, he was a Visiting Scholar


in the School of Industrial Engineering, Purdue University, West Lafayette, USA. In 1999, 2004, and 2007-2009, he was a visiting professor with the Department of Industrial Engineering, Arizona State University, Tempe, USA; the Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, USA; and the Industrial Systems Optimization Laboratory, Industrial Systems Engineering Department, University of Technology of Troyes, Troyes, France, respectively. He is currently a Professor of Industrial and Systems Engineering in the Department of Industrial Engineering, School of Mechatronics Engineering, Guangdong University of Technology, Guangzhou, China. His research interests include production planning and scheduling, manufacturing system modeling and control, discrete event systems, Petri net theory and applications, and information assurance. He is the author or coauthor of many papers published in the International Journal of Production Research, IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Robotics and Automation, IEEE Transactions on Automation Science and Engineering, IEEE Transactions on Semiconductor Manufacturing, IEEE/ASME Transactions on Mechatronics, Journal of Intelligent Manufacturing, Production Planning and Control, and Robotics and Computer-Integrated Manufacturing. Dr. Wu is an associate editor of the IEEE Transactions on Systems, Man, and Cybernetics, Part C and the IEEE Transactions on Automation Science and Engineering, and Editor-in-Chief of the Industrial Engineering Journal.
He was a Program Committee member of the 2003 to 2009 IEEE International Conferences on Systems, Man, and Cybernetics, the 2005 to 2009 IEEE International Conferences on Networking, Sensing and Control, the 2006 IEEE International Conference on Service Systems and Service Management, and the 2007 International Conference on Engineering and Systems Management, and is a reviewer for many international journals.

Wei Zhang is an Associate Professor at Guangdong University of Technology in China. She is responsible for teaching data mining in the Faculty of Computer. She is engaged in research on network security, machine learning, and statistical pattern recognition. Mrs. Zhang earned an M.S. in Software Engineering from the South China University of Technology. She has published 20 papers in computer journals and international conference proceedings. Three recent papers are listed as follows:
1. Wei Zhang, Shaohua Teng, Xiufen Fu, Haibin Zhu, Roles in Learning Systems, SMC 2008, 2008 IEEE International Conference on Systems, Man, and Cybernetics, 2008: 2213-2218
2. Wei Zhang, Shaohua Teng, Xiufen Fu, Scan attack detection based on distributed cooperative model, CSCWD 2008: 743-748
3. Wei Zhang, Shaohua Teng, Zhaohui Zhu, Xiufen Fu, Haibin Zhu, An improved Least-Laxity-First scheduling algorithm of variable time slice for periodic tasks, IEEE Conf., ICCI 2007: 548-553
