4th Indian International Conference on Artificial Intelligence (IICAI-09)
Unsupervised Sequential Information Bottleneck Clustering for Building an Anomaly Based Network Intrusion Detection Model

Mrutyunjaya Panda (1) and Manas Ranjan Patra (2)
(1) Department of ECE, Gandhi Institute of Engineering and Technology, Gunupur, Orissa-765022, India. [email protected]
(2) Department of Computer Science, Berhampur University, Orissa, India. [email protected]
Abstract: In this paper we present a novel approach to unsupervised clustering for building an efficient anomaly based network intrusion detection model. The method is based on the recently introduced sequential information bottleneck (sIB) principle. The KDD Cup 1999 intrusion detection benchmark dataset is used to evaluate the proposed technique. The experimental results demonstrate that the proposed technique detects network intrusions more accurately than other existing clustering algorithms.

Keywords: Intrusion detection, unsupervised clustering, sequential information bottleneck, performance measures.
1 Introduction

With the increasing number of commercial and public services being offered over the Internet, security is becoming one of the key issues. The so-called "attacks" on Internet service providers are normally carried out by exploiting unknown weaknesses or bugs in system and application software [1, 2]. Computer networks are usually protected against such attacks by a number of access restriction policies that act as a coarse-grained filter. Intrusion detection systems (IDS) are the fine-grained filter placed inside the protected network, looking for known or potential threats in network traffic and/or audit data recorded by hosts. Based on the data source, IDSs are classified into host based and network based; based on the analysis approach, they are categorized into misuse detection and anomaly detection systems. Misuse detection is based on attack signatures, i.e. on a detailed description of the sequence of actions performed by an attacker. This approach detects intrusions by looking for a perfect match with the intrusion signatures. Its effectiveness depends strictly on the extent to which IDSs are kept up to date with signatures of the latest attacks. This is a standing challenge, since new attacks and new attack variants are constantly being developed. In particular, by the time an attack signature is made publicly available, a number of attack variants have been designed to produce the same effect as the original
attack, but with a slightly different signature that is not detected by signature based IDSs.

The second approach is based on statistical knowledge about the normal activity of the computer system, i.e. a statistical profile of what constitutes legitimate traffic in the network. The normal usage patterns are constructed from statistical measures of the system features, for example the CPU and I/O activity of a particular user or program. The behavior of each user is observed, and any deviation from the constructed normal behavior is flagged as an intrusion. Clustering algorithms have gained a lot of attention, since they can help current intrusion detection systems in several aspects [3], [4], [5]. Clustering aims to organize a collection of data items into clusters such that items within a cluster are more "similar" to each other than to items in other clusters. This notion of similarity can be expressed in very different ways, according to the purpose of the study, the domain-specific assumptions and the prior knowledge of the problem.

The remainder of this paper is organized as follows. Section 2 reviews related research in the field of anomaly based network intrusion detection. Section 3 gives a brief description of the existing data clustering methods used for a comparative analysis with our results. In Section 4, we describe the proposed sequential information bottleneck clustering algorithm for building an efficient intrusion detection model. A brief explanation of the KDD Cup 1999 intrusion detection benchmark dataset is provided in Section 5. Section 6 details the experimental setup. In Section 7, we analyze the results obtained and compare them with other existing clustering methods. Finally, concluding remarks are provided in Section 8.
2. Related Research

Anomaly detection systems compute statistical models of normal network traffic and generate alarms when there is a large deviation from the normal model. Several systems have been developed based on this approach, namely SPADE [6], PHAD [7] and ALAD [8]. Other techniques have been proposed as detection engines, for example clustering and classification [9], autonomous agents for distributed intrusion detection [10], and hidden Markov models [11]. Good surveys can be found in [12], [13] and [14]. A simple variant of single-linkage clustering was applied in [15] to learn network traffic patterns from unlabelled noisy data. The KDD Cup 1999 dataset was used, but it was not clear what features were selected. This approach achieved a 40% to 55% detection rate with a 1.3% to 2.3% false positive rate. NATE (Network Analysis of Anomalous Traffic Events) [16], [17] was proposed to select some of the traffic records to improve detection performance. The selected features include the frequency of TCP flags, the average and total number of bytes transferred, the percentage of session control flags, and network packet header information. CLAD (Clustering for Anomaly Detection) [15] used the k-NN algorithm and an unsupervised training process. CCAS [18] was proposed for supervised clustering and classification; the authors chose a clustering method because it relies very little on the distribution models of the data.
A new intrusion detection system using support vector machines and hierarchical clustering is proposed by Khan et al. in [19]. The authors compared the proposed approach with the Rocchio Bundling technique and random selection in terms of accuracy loss and training time gain on a single benchmark real dataset. In [20], the authors propose automated feature weighting for network anomaly detection; they claim that the proposed method not only increases the detection rate but also reduces the false alarm rate. A new cluster labelling strategy, which combines the computation of the Davies-Bouldin index of the clustering with the centroid diameters of the clusters, is proposed for anomaly based intrusion detection systems in [21]. In [22], unsupervised anomaly detection using an incremental clustering algorithm is proposed. A performance comparison of intrusion detection system classifiers using various feature reduction techniques is presented in [23]. In [24], the authors investigated fuzzy rule based classifiers, decision trees, support vector machines and linear genetic programming to model fast and efficient intrusion detection systems. Erman et al. present traffic classification using clustering algorithms [25]. They use two unsupervised clustering algorithms, namely K-means and DBSCAN, and compare them to the previously used AutoClass algorithm using empirical Internet traces. Their experimental results show that both K-means and DBSCAN work very well and much more quickly than AutoClass; although DBSCAN has lower overall accuracy than K-means and AutoClass, it produces better clusters.

In this paper, we use the sequential information bottleneck (sIB) clustering algorithm, which to the best of our knowledge has not been used so far to build a network intrusion detection model.
3. Existing Clustering Algorithms

In this section, we briefly discuss the existing clustering algorithms that have been applied to intrusion detection.

3.1. K-Means

This algorithm is selected because it is one of the quickest and simplest. The K-means algorithm partitions the objects in a data set into a fixed number K of disjoint subsets. For each cluster, the partitioning algorithm maximizes the homogeneity within the cluster by minimizing the square error, calculated as the squared distance between each object and the center (or mean) of its cluster. The pseudo code of the K-means clustering algorithm is given below.

Step 1 (Initialization): Randomly choose K instances from the data set and make them the initial cluster centers.
Step 2 (Assignment): Assign each instance to its closest center.
Step 3 (Updating): Replace each center with the mean of its members.
Step 4 (Iteration): Repeat Steps 2 and 3 until there are no more updates.
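The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments; the `init` parameter (ours, for reproducibility) lets the caller supply the initial centers explicitly instead of Step 1's random choice.

```python
import numpy as np

def k_means(X, k, init=None, max_iter=100, seed=0):
    """Minimal K-means following the four steps above."""
    rng = np.random.default_rng(seed)
    # Step 1 (initialization): choose k instances as initial cluster centers.
    if init is None:
        centers = X[rng.choice(len(X), size=k, replace=False)]
    else:
        centers = np.asarray(init, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Step 2 (assignment): assign each instance to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3 (updating): replace each center with the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 4 (iteration): stop when the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

With a deterministic initialization (one seed point per group), two well-separated groups are recovered exactly.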
3.2. X-Means

X-means is a newer algorithm that quickly estimates the number of clusters K. It goes into action after each run of K-means, making local decisions about which subset of the current centroids should split in order to better fit the data. The splitting decision is made by computing the Bayesian Information Criterion (BIC). X-means has been evaluated against the more traditional method of estimating the number of clusters by guessing K, and consistently produced better clusterings on both synthetic and real-life data with respect to BIC. It also runs much faster, which prompted us to select this algorithm, which to our knowledge has not previously been applied to intrusion detection. A detailed description of the X-means operation can be found in [26].

3.3. Self-Organizing Maps (SOM)

SOM is an unsupervised neural network introduced by Kohonen. It maps high dimensional input data onto a two-dimensional output topology. Each node in the output map has a reference vector w with the same dimension as the feature vector of the input data. Initially, the reference vectors are assigned random values. More details can be found in [27].

3.4. DBSCAN (Density Based Clustering)

DBSCAN (Density Based Spatial Clustering of Applications with Noise) was chosen as a representative of density based clustering algorithms. It regards clusters as dense areas of objects that are separated by less dense areas. Density based algorithms have an advantage over partition based algorithms because they are not limited to spherical clusters but can find clusters of arbitrary shape. More details can be found in [28].

3.5. Incremental Clustering

Incremental clustering is based on the assumption that it is possible to consider patterns one at a time and assign them to existing clusters. Here, a new data item is assigned to a cluster without significantly affecting the existing clusters. Although this is not the most effective clustering algorithm, it has the advantage of working in near linear time. Details of this algorithm can be obtained from [22].
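As an illustration of the incremental idea only (the threshold-based "leader" variant below is our assumption; [22] describes the actual algorithm), a single near-linear-time pass over the data can be sketched as:

```python
import numpy as np

def incremental_cluster(stream, threshold):
    """Leader-style incremental clustering sketch: each new point joins the
    nearest existing cluster if it lies within `threshold`, otherwise it
    starts a new cluster. Existing clusters are otherwise unaffected."""
    centers, counts, labels = [], [], []
    for point in stream:
        x = np.asarray(point, dtype=float)
        if centers:
            d = [np.linalg.norm(x - c) for c in centers]
            j = int(np.argmin(d))
            if d[j] <= threshold:
                # Update only the running mean of the chosen cluster.
                counts[j] += 1
                centers[j] += (x - centers[j]) / counts[j]
                labels.append(j)
                continue
        centers.append(x.copy())
        counts.append(1)
        labels.append(len(centers) - 1)
    return labels, centers
```

Each point is compared against the current cluster centers exactly once, which is what gives the near linear running time mentioned above.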
4. Proposed Methodology

In this section, we provide a simple framework for casting any given agglomerative procedure into a sequential clustering algorithm. The resulting sequential algorithm is guaranteed to find a local optimum of the target function (under very mild conditions). Moreover, its time and space complexity are significantly better than those of the agglomerative procedure. In particular, we use this framework to present the sequential information bottleneck (sIB) algorithm. The proposed methodology is shown in Figure 1.
4.1. Information Bottleneck Principle

Let X denote a set of elements that we want to cluster into C clusters, and let Y be a set of relevance variables associated with X such that, for every x ∈ X and y ∈ Y, the conditional distribution p(y|x) is available. The Information Bottleneck (IB) principle states that the clustering C should preserve as much information as possible from the original data set X with respect to the relevance variables Y. The clustering C can be interpreted as a compression (bottleneck) of the initial data set X, through which the information that X contains about Y is passed.
[Figure 1: block diagram. Data samples feed into feature selection/extraction, followed by design/selection of the clustering algorithm (sIB), which produces clusters; cluster validation and results interpretation then yield knowledge.]

Fig. 1. Proposed methodology for unsupervised anomaly detection
The IB method is inspired by rate-distortion theory and aims at finding the most compact representation C of the data X, minimizing the mutual information I(X;C) while preserving as much information as possible about Y (i.e. maximizing I(C;Y)). Thus, the IB objective can be formulated as the minimization of the Lagrangian

    F(C) = I(X;C) - β I(C;Y)                                    (1)

where β is the trade-off parameter between the amount of information I(C;Y) to be preserved and the compression I(X;C) of the initial representation. Function (1) must be optimized with respect to the stochastic mapping p(c|x) that maps each element of the data set X into a cluster c ∈ C. The expressions for I(X;C) and I(C;Y) can be written as:
    I(X;C) = Σ_{x ∈ X, c ∈ C} p(x) p(c|x) log( p(c|x) / p(c) )      (2)

    I(C;Y) = Σ_{y ∈ Y, c ∈ C} p(c) p(y|c) log( p(y|c) / p(y) )      (3)
The formal solution that optimizes function (1) is given by a system of equations relating p(c|x), p(y|c) and p(c). The sequential information bottleneck clustering algorithm has been proposed to optimize function (1), as explained below.

4.2. sIB (Sequential Information Bottleneck) Clustering

The steps of sequential information bottleneck (sIB) clustering are as follows. To find a good partition for a given number of clusters W, sIB is initialized with a (possibly random) partition into W clusters. It then repeatedly draws a sample x at random, treats it as a singleton cluster, and merges it into the cluster c_new such that

    c_new = argmin_{c ∈ C} JS(x, c)                               (4)

where JS(·,·) is the Jensen-Shannon distance. At each such step, the objective function (1) improves or stays unchanged. Further details about this algorithm can be found in [29].
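To make the draw-and-merge step concrete, here is a small Python sketch. It is an illustrative simplification of our own: it uses the unweighted Jensen-Shannon divergence and unweighted cluster centroids, whereas the full sIB of [29] weights the merge cost by the probability masses p(x) and p(c).

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions.
    q is always a mixture containing p here, so q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js(p, q):
    """(Unweighted) Jensen-Shannon divergence, cf. JS(.,.) in Eq. (4)."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def sib_sweep(pyx, labels, n_clusters):
    """One sequential pass: draw each x, treat it as a singleton cluster,
    and merge it into the cluster whose p(y|c) is closest in JS distance."""
    for i in range(len(pyx)):
        best_c, best_d = labels[i], np.inf
        for c in range(n_clusters):
            members = [j for j in range(len(pyx)) if labels[j] == c and j != i]
            # A cluster emptied by removing x is represented by x itself
            # (merge cost 0), so the number of clusters never shrinks.
            pyc = np.mean([pyx[j] for j in members], axis=0) if members else pyx[i]
            d = js(pyx[i], pyc)
            if d < best_d:
                best_c, best_d = c, d
        labels[i] = best_c
    return labels
```

On four conditional distributions p(y|x) forming two obvious groups, a single sweep starting from a deliberately mixed partition recovers the grouping.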
5. Intrusion Detection Dataset

The KDD Cup 1999 intrusion detection contest data is used in our experiments. This data was prepared for the DARPA intrusion detection evaluation program by MIT Lincoln Laboratory. Lincoln Labs acquired nine weeks of raw TCP dump data, which was processed into about 5 million connection records. The data set contains 24 attack types, which fall into four main categories:

Denial of Service (DoS): an intruder makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine. Examples are the Apache2, Back, Land, Smurf, Neptune and Teardrop attacks.

Probing: an attacker scans a network of computers to gather information or find known vulnerabilities. An attacker with a map of the machines and services available on a network can use this information to look for exploits. Examples are the Ipsweep, Satan and Nmap attacks.

User to Root (U2R): an attacker starts out with access to a normal user account on the system and exploits system vulnerabilities to gain root access. Examples are the Eject, Perl and Loadmodule attacks.
Remote to Local (R2L): an attacker who does not have an account on a remote machine sends packets to that machine over a network and exploits some vulnerability to gain local access as a user of that machine. Examples are the Imap, Phf, Guess_passwd and Ftp_write attacks.
Besides the four attack categories mentioned above, the normal class also has to be detected. The data set for our experiments contains 1000 connection records, a subset of the 10% KDD Cup 1999 intrusion detection benchmark dataset, generated randomly from the MIT data. The random sample includes a number of records from each class proportional to the class size, except that the smallest class is included completely. All the intrusion detection models are trained and tested with the same data set. As the data set contains five different classes, we perform 5-class classification to build an efficient intrusion detection model.
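The sampling procedure described above (proportional per-class draws, smallest class kept whole) can be sketched as follows; the function and parameter names are ours, for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(records, labels, n_total, seed=42):
    """Draw records from each class in proportion to its size,
    but always include the smallest class completely."""
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)
    smallest = min(by_class, key=lambda c: len(by_class[c]))
    sample = list(by_class[smallest])          # smallest class: keep everything
    rng = random.Random(seed)
    rest = {c: v for c, v in by_class.items() if c != smallest}
    rest_size = sum(len(v) for v in rest.values())
    remaining = n_total - len(sample)
    for c, items in rest.items():
        # Proportional share of the remaining budget for this class.
        k = min(len(items), round(remaining * len(items) / rest_size))
        sample.extend(rng.sample(items, k))
    return sample
```

For example, with 90 "normal", 8 "dos" and 2 "u2r" records and a budget of 20, both "u2r" records are kept while the other classes fill the rest proportionally.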
6. Experimental Setup

We have used a subset of the KDD Cup 1999 intrusion detection benchmark dataset to build a network intrusion detection model. It contains 1000 instances with 38 variables and 21 attacks that can be classified into the four main categories explained earlier. The variables selected are given in Table 1.

Table 1. Variables used in our intrusion detection dataset

Variable No. | Variable Name        | Variable Type
1  | src_bytes            | continuous
2  | dst_bytes            | continuous
3  | land                 | discrete
4  | wrong_fragment       | continuous
5  | urgent               | continuous
6  | hot                  | continuous
7  | num_failed_logins    | continuous
8  | logged_in            | discrete
9  | num_compromised      | continuous
10 | root_shell           | continuous
11 | su_attempted         | continuous
12 | num_root             | continuous
13 | num_file_creations   | continuous
14 | num_shells           | continuous
15 | num_access_files     | continuous
16 | num_outbound_cmds    | continuous
17 | is_host_login        | discrete
18 | is_guest_login       | discrete
19 | count                | continuous
20 | srv_count            | continuous
21 | serror_rate          | continuous
22 | srv_serror_rate               | continuous
23 | rerror_rate                   | continuous
24 | srv_rerror_rate               | continuous
25 | same_srv_rate                 | continuous
26 | diff_srv_rate                 | continuous
27 | srv_diff_host_rate            | continuous
28 | dst_host_count                | continuous
29 | dst_host_srv_count            | continuous
30 | dst_host_same_srv_rate        | continuous
31 | dst_host_diff_srv_rate        | continuous
32 | dst_host_same_src_port_rate   | continuous
33 | dst_host_srv_diff_host_rate   | continuous
34 | dst_host_serror_rate          | continuous
35 | dst_host_srv_serror_rate      | continuous
36 | dst_host_rerror_rate          | continuous
37 | dst_host_srv_rerror_rate      | continuous
38 | CLASS LABEL: normal or attack (any one of the 21 attacks) | discrete
All our experiments were performed on a Pentium 4, 2.8 GHz CPU with 512 MB RAM. The full data set is used for training the intrusion detection model; a classes-to-clusters evaluation on the training data is then used to test the efficiency of the model built in the training phase.
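A classes-to-clusters evaluation maps each discovered cluster to the majority true class among its members, then scores the induced labelling against the ground truth. A minimal sketch (names are ours):

```python
from collections import Counter

def classes_to_clusters_accuracy(cluster_ids, true_labels):
    """Map each cluster to its majority true class, then measure how
    many instances the induced labelling gets right."""
    majority = {}
    for c in set(cluster_ids):
        members = [lab for cid, lab in zip(cluster_ids, true_labels) if cid == c]
        majority[c] = Counter(members).most_common(1)[0][0]
    predicted = [majority[c] for c in cluster_ids]
    correct = sum(p == t for p, t in zip(predicted, true_labels))
    return correct / len(true_labels), majority
```

For instance, a cluster holding two "normal" records and one "dos" record is labelled "normal", and the lone misplaced "dos" record counts against the accuracy.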
7. Results and Discussion

Here, we present the results of the proposed unsupervised approach for building an efficient anomaly based network intrusion detection model. In Table 2, we present a comparative study of our proposed method with respect to some other well known approaches.

Table 2. Comparative study of clustering algorithms

Algorithm                    | Average Detection Rate (%) | False Positive Rate (%)
K-Means [31]                 | 46.986         | 0.875
KM-VQ [20]                   | 48.4           | 10
FES-KM-VQ [20]               | 60.1           | 10
X-Means (ours)               | 78             | 36
KDD Cup Winner [31]          | 48.57          | 0.225
GMIX [23]                    | 53.725         | 29.715
SOM [23]                     | 47             | 25.685
Nearest Cluster [30]         | 47.875         | 0.2
Incremental Clustering [22]  | 44.57 to 84.78 | 15.99 to 76.83
Cluster [22]                 | 66             | 2
K-NN [22]                    | 23             | 6
FR-1 [24]                    | 74.82          | 73.07
FR-2 [24]                    | 79.68          | 85.66
Pure SVM [19]                | 57.6           | 35.5
SVM + Rocchio Bundling [19]  | 51.6           | 44.2
SVM + DGSOT [19]             | 69.8           | 37.8
sIB (ours), 2 clusters       | 85.5           | 34
sIB (ours), 5 clusters       | 77             | 3.1
From the above comparison, it is clear that our proposed method is efficient in detecting anomalous activity, with a high detection rate and a relatively low false positive rate. However, the FPR is higher than the results reported in [30] and [31]; this is a compromise between detection rate and FPR. Other performance measures, namely the false negative rate (FNR), recall rate (RR) and F-score of our proposed method, are provided in Table 3; these are also considered indicators of performance when building a network intrusion detection model. We are not able to compare all of these with the other existing algorithms, as the corresponding figures are not available.

Table 3. Performance measures for sequential information bottleneck clustering

             |       No. of clusters = 2       |       No. of clusters = 5
Attack type  | FNR   | RR     | F-Score        | FNR   | RR    | F-Score
Normal       | 0.486 | 0.514  | 0.655          | 0.182 | 0.818 | 0.8
Anomalous    | 0.019 | 0.9812 | 0.914          | 6E-3  | 0.994 | 0.872
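For reference, the measures reported in Tables 2 and 3 can all be computed from the four confusion-matrix counts, taking attacks as the positive class; a sketch:

```python
def detection_measures(tp, fp, tn, fn):
    """Standard detection measures from confusion-matrix counts
    (attack = positive class)."""
    recall = tp / (tp + fn)        # recall rate (RR), i.e. the detection rate
    fpr = fp / (fp + tn)           # false positive rate: normal flagged as attack
    fnr = fn / (fn + tp)           # false negative rate: missed attacks = 1 - RR
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    return {"RR": recall, "FPR": fpr, "FNR": fnr, "F-Score": f_score}
```

For example, 90 detected attacks, 10 missed attacks, 5 false alarms and 95 correctly passed normal records give RR = 0.9, FPR = 0.05 and FNR = 0.1.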
8. Conclusions and Future Scope

In this paper, we have illustrated the applicability of a new clustering algorithm, sIB, in building an anomaly based intrusion detection model. The proposed approach considers as many as 38 different attributes in order to provide better detection accuracy with a comparatively low false positive rate relative to other existing unsupervised clustering algorithms. Further, the proposed approach has a high recall rate and a high F-score with a low false negative rate, which makes it suitable for building an efficient anomaly based network intrusion detection model. Our future research will investigate other data mining techniques that can push the detection accuracy closer to 100% while maintaining a low false positive rate, which we understand is a challenging task.
References

[1] McHugh, J., Christie, A. and Allen, J.: Defending yourself: the role of intrusion detection systems. IEEE Software: 42-51 (2000).
[2] Proctor, P.E.: The Practical Intrusion Detection Handbook. Prentice Hall (2001).
[3] Zhong, S., Khoshgoftaar, T.M. and Seliya, N.: Evaluating clustering techniques for network intrusion detection. In: Proc. of 10th ISSAT International Conf. on Reliability and Quality Design, pp. 173-177, Las Vegas, USA (2004).
[4] Ye, N. and Li, X.: A scalable clustering technique for intrusion signature recognition. In: Proc. of 2nd IEEE SMC Information Assurance Workshop, pp. 1-4 (2001).
[5] Eskin, E.: Anomaly detection over noisy data using learned probability distributions. In: Proc. of 17th International Conference on Machine Learning, pp. 255-262, San Francisco, CA (2000).
[6] Staniford, S., Hoagland, J.A. and McAlerney, J.M.: Practical automated detection of stealthy port scans. Journal of Computer Security. 10(1/2): 105-136 (2002).
[7] Mahoney, M.V. and Chan, P.K.: PHAD: packet header anomaly detection for identifying hostile network traffic. Technical report, Florida Tech., CS-2001-4 (2001).
[8] Mahoney, M.V.: Network traffic anomaly detection based on packet bytes. In: Proc. of ACM Symposium on Applied Computing, pp. 346-350 (2003).
[9] Yang, H., Xie, F. and Lu, Y.: Clustering and classification based anomaly detection. Lecture Notes in Computer Science: 4223: 1611-3349 (2006).
[10] Balasubramaniyan, J.S., Garcia-Fernandez, J.O. et al.: An architecture for intrusion detection using autonomous agents. In: Proc. of the 14th IEEE ACSAC, Scottsdale, AZ, USA, pp. 13-24 (1998).
[11] Curston, D., Matzner, S. et al.: Coordinated internet attacks: responding to attack complexity. Journal of Computer Security: 12: 165-190 (2004).
[12] Sherif, J.S., Ayers, R. and Dearmond, T.G.: Intrusion detection: the art and the practice. Part I. Information Management and Computer Security: 11(4): 175-186 (2003).
[13] Sherif, J.S., Ayers, R. and Dearmond, T.G.: Intrusion detection: the art and the practice. Part II. Information Management and Computer Security: 11(4): 222-229 (2003).
[14] Xu, R. and Wunsch, D.: Survey of clustering algorithms. IEEE Transactions on Neural Networks. 16(3): 645-678 (2005).
[15] Snort website. http://www.snort.org
[16] Taylor, C. and Alves-Foss, J.: An empirical analysis of NATE: network analysis of anomalous traffic events. In: Proc. of 10th New Security Paradigms Workshop, USA, pp. 18-26 (2002).
[17] Taylor, C. and Alves-Foss, J.: NATE: network analysis of anomalous traffic events, a low cost approach. In: Proc. of New Security Paradigms Workshop, USA, pp. 89-96 (2001).
[18] Li, X. and Ye, N.: Mining normal and intrusive activity patterns for computer network intrusion detection. In: Intelligence and Security Informatics: 2nd Symposium on Intelligence and Security Informatics, Tucson, USA. Springer Verlag. 3073: 1611-3349 (2004).
[19] Khan, L., Awad, M. and Thuraisingham, B.: A new intrusion detection system using SVM and hierarchical clustering. The VLDB Journal. 16: 507-521 (2007).
[20] Tran, D., Ma, W. and Sharma, D.: Automated feature weighting for network anomaly detection. International Journal of Computer Science and Network Security. 8(2): 173-178 (2008).
[21] Petrovic, S., Alvarez, G., Orfila, A. and Carbo, J.: Labelling clusters in an intrusion detection system using a combination of clustering evaluation techniques. In: Proc. of 39th Hawaii International Conf. on System Sciences, pp. 1-8. IEEE (2006).
[22] Hassan, T., Hashaem, M. and Fahmy, A.: Unsupervised anomaly detection using an incremental clustering algorithm. International Journal of Intelligent Computing and Information Sciences. 5(1): 253-268 (2005).
[23] Venkatachalam, V. and Selvan, S.: Performance comparison of intrusion detection system classifiers using various feature reduction techniques. International Journal of Simulation. 9(1): 30-39 (2007). ISSN 1473-804x (online).
[24] Abraham, A. and Jain, R.: Soft computing models for network intrusion detection systems. In: Classification and Clustering for Knowledge Discovery. 4: 191-207. Springer, Berlin (2005).
[25] Erman, J., Arlitt, M. and Mahanti, A.: Traffic classification using clustering algorithms. In: SIGCOMM MineNet Workshop, Pisa, Italy, pp. 281-286 (2006).
[26] Pelleg, D. and Moore, A.: X-means: extending K-means with efficient estimation of the number of clusters. pp. 1-8. Accessed online from http://www.cmu.edu/~dpellag/dand/xmeans.pdf.
[27] Panda, M. and Patra, M.R.: Building an efficient network intrusion detection model using self organizing maps. PWASET. 38: 1178-1184 (2009).
[28] Ester, M., Kriegel, H., Sander, J. and Xu, X.: A density based algorithm for discovering clusters in large spatial databases with noise. In: 2nd Intl. Conf. on Knowledge Discovery and Data Mining, Portland, USA, pp. 226-231. AAAI Press (1996).
[29] Slonim, N., Friedman, N. and Tishby, N.: Unsupervised document classification using sequential information maximization. In: SIGIR, Tampere, Finland, pp. 129-136. ACM Press (2002). ISBN 1-58113-561-0/02/0008.
[30] Sabhnani, M. and Serpen, G.: Application of machine learning algorithms to KDD intrusion detection dataset within misuse detection context. In: Proc. of Intl. Conf. on Machine Learning: Models, Technologies and Applications (MLMTA 2003), Las Vegas, NV, pp. 209-215 (2003).
[31] Levin, I.: KDD-99 classifier learning contest: LLSoft's results overview. SIGKDD Explorations, ACM SIGKDD. 1(2): 67-75 (2000).