ARS - Journal of Applied Research and Social Sciences ISSN 2350-1472,Vol.3,Issue.23,December 2016 © International Society for Green, Sustainable Engineering and Management

A comparative study of Classification algorithms used in Network Intrusion Detection Systems (NIDS)

L. Vanitha, Department of Computer Science, St. Xavier’s College, Palayamkottai, Tirunelveli-627002

Dr. M. Safish Mary, Department of Computer Science, St. Xavier’s College, Palayamkottai, Tirunelveli-627002

Abstract

A Network Intrusion Anomaly Detection System (NIADS) is a device or software application used to detect unwanted data in incoming network traffic. Whether incoming data is classified as an anomaly depends on the proper use of similarity and distance measures between the data items that flow through the network. Classification is needed in intrusion detection to filter out and block spam or other unwanted text or data in the incoming network traffic. This paper discusses the classification algorithms used in intrusion detection systems to decide whether the input contains a malicious attack. A comparative study of these algorithms is presented to help in the identification and design of the best off-line NIDS.

Keywords: Network Intrusion, Data Mining, Classification algorithms.

Introduction

With the tremendous growth in information technology, network security is one of the challenging issues faced by Intrusion Detection Systems (IDS). Intrusion detection technology is not yet mature and should not be considered a complete defense, but it can play a significant role in the overall network security architecture. Many early attackers simply wanted to prove that they could break into systems; increasingly, the trend is toward intrusions motivated by financial, political, and military objectives. A NIDS is a software application that monitors network traffic and system activities to detect malicious attacks present in them. The steps to monitor and detect an intrusion are:

(i) Monitor and analyze the traffic,

(ii) Identify the abnormal activities,

(iii) Detect incoming port scans and raise an alarm.

An IDS can recognize both intrusions and Denial-of-Service (DoS) activities and invoke countermeasures against them in real time. Fig. 1 shows the basic architecture of a NIDS used commercially.

Fig 1: Network Intrusion Detection System Architecture

The above architecture shows that all traffic from the internet passes through the NIDS, and any malicious content is rejected before it enters the system. The drawback of the system is that host-initiated intrusions detected by the firewall are not trapped by the NIDS. IDS are categorized into two groups: Network-based Intrusion Detection Systems (NIDS) and Host-based Intrusion Detection Systems (HIDS). A NIDS operates either on-line or off-line. An on-line NIDS monitors and detects malicious attacks occurring in the network in real time. An off-line NIDS logs the packets from the network, constructs features based on the network connections, and then creates a dataset. It detects malicious attacks by applying intrusion detection rules to the stored dataset.
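The off-line workflow described above (log packets, build per-connection features, then apply rules to the stored dataset) can be illustrated with a minimal Python sketch. The packet tuple layout, the feature names, and the port-scan threshold are all assumptions made for illustration; they are not taken from any particular NIDS or dataset.

```python
from collections import defaultdict

# Hypothetical packet log: (timestamp_seconds, source_ip, destination_ip, destination_port).
PACKET_LOG = [
    (0.0, "10.0.0.5", "192.168.1.10", 22),
    (0.1, "10.0.0.5", "192.168.1.10", 23),
    (0.2, "10.0.0.5", "192.168.1.10", 80),
    (0.3, "10.0.0.5", "192.168.1.10", 443),
    (0.4, "10.0.0.9", "192.168.1.10", 80),
    (0.9, "10.0.0.9", "192.168.1.10", 80),
]


def build_dataset(packets):
    """Aggregate logged packets into one feature record per (source, destination) pair."""
    grouped = defaultdict(list)
    for timestamp, src, dst, port in packets:
        grouped[(src, dst)].append((timestamp, port))

    dataset = []
    for (src, dst), events in sorted(grouped.items()):
        times = [t for t, _ in events]
        ports = {p for _, p in events}
        dataset.append({
            "src": src,
            "dst": dst,
            "packet_count": len(events),
            "distinct_ports": len(ports),
            "duration": max(times) - min(times),
        })
    return dataset


def flag_port_scans(dataset, max_distinct_ports=3):
    """Illustrative rule: a source touching many distinct ports is flagged as a possible scan."""
    return [row["src"] for row in dataset if row["distinct_ports"] > max_distinct_ports]


dataset = build_dataset(PACKET_LOG)
alerts = flag_port_scans(dataset)
print(alerts)
```

In this toy log the first source contacts four distinct ports and is flagged, while the second source only repeats one port. A real off-line NIDS would use a far richer feature set and rule base.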


Benefits of NIDS are:

(i) It can monitor a huge network.

(ii) It can secure the network or systems against network attacks.

Its limitations are:

(i) It is difficult to process all packets in a large or busy network.

(ii) Many of the benefits of NIDS do not apply to more modern switch-based networks.

(iii) It does not give complete information about an attack, but it does identify the time at which the attack started.

(iv) Packets may be lost when the network runs very slowly.

Section 2 describes the classification algorithms used in the literature for NIDS. The basic data mining classification algorithms used in NIDS for anomaly detection are studied and compared. Section 3 presents a comparative study of the existing classification algorithms.

2. Classification algorithms for NIDS:

2.1 Naïve Bayes Algorithm:

The Naïve Bayes classifier is a supervised learning method for classification. Training and testing the classifier are straightforward. Training consists of estimating the conditional probability distribution of each attribute given the class. The algorithm is used in many research areas, such as text classification, spam filtering, and online applications. It is among the most successful learning algorithms for classifying text documents.

Feature reduction of the input for anomaly detection with a Naïve Bayes classifier in network intrusion is studied in [1]. The proposed model, the Feature Vitality Based Reduction Method (FVBRM), performs intrusion detection at two levels. At the first level, the important and relevant features are extracted from the input features; at the second level, training and testing the classifier on the reduced dataset is repeated until it performs better than on the original dataset.

In [2], Chharia A and Gupta RK compare the performance of the Naïve Bayes Laplacian (NBL) method and the Naïve Bayes with Modified Absolute Discount (NB-MAD) technique. The modified classifier with NB-MAD is found to enhance the performance of the Naïve Bayes classifier in spam classification. The NB-MAD technique results in increased accuracy and fewer false positives, thereby reducing the error cost by an average of 33%.

Virendra Singh Thakur and Gireesh Kumar Dixit [3] discuss a semi-supervised machine learning model that uses the density-based spatial clustering algorithm (DBSCAN) for intrusion detection. This method combines supervised and unsupervised learning techniques to improve classifier performance and is suitable for datasets with noisy data. The paper shows that the semi-supervised approach successfully detects intrusions in network traffic.
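To make the training step of Section 2.1 concrete, the following is a minimal sketch of a Naïve Bayes classifier for categorical attributes with Laplace (add-one) smoothing. It is not the implementation used in [1] or [2]; the function names, the two attributes (protocol and connection flag), and the toy records are assumptions chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count classes and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    # value_counts[(label, attribute_index)][value] = occurrences in the training data
    value_counts = defaultdict(Counter)
    # distinct values seen per attribute, needed for Laplace smoothing
    attribute_values = defaultdict(set)
    for record, label in zip(records, labels):
        for index, value in enumerate(record):
            value_counts[(label, index)][value] += 1
            attribute_values[index].add(value)
    return class_counts, value_counts, attribute_values


def predict(model, record):
    """Return the class with the highest posterior log-probability."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for index, value in enumerate(record):
            seen = value_counts[(label, index)][value]
            # Laplace smoothing keeps unseen values from zeroing the product
            smoothed = (seen + 1) / (count + len(attribute_values[index]))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical connection records: (protocol, connection flag).
records = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
           ("tcp", "S0"), ("tcp", "S0"), ("icmp", "S0")]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

model = train_naive_bayes(records, labels)
print(predict(model, ("tcp", "S0")))
print(predict(model, ("udp", "SF")))
```

Working in log-probabilities avoids numerical underflow when many attributes are multiplied together, which matters for the larger feature sets typical of intrusion detection data.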

2.2 Support Vector Machine (SVM):

SVM is a supervised learning algorithm widely used for classification and regression analysis. It is used to detect anomalies in network traffic. SVM also supports text mining, pattern recognition, image categorization, and handwritten character recognition. SVM classifiers are widely used in complex machine learning applications containing uncertain data.

Advantages of SVM are:

(i) prediction accuracy is generally high

(ii) robust, works when training examples contain errors

(iii) fast evaluation of the learned target function

Disadvantages of SVM:

(i) long training time

(ii) difficult to understand the learned function

(iii) not easy to incorporate domain knowledge

In [4], Zonghua Zhang and Hong Shen propose modified SVM methods, based on an improved text categorization model, for on-line training of SVM for real-time intrusion detection. By combining pattern recognition and text categorization techniques, the traditional SVM is modified to produce Robust SVM (RSVM) and one-class SVM. These two SVMs are found to outperform the traditional SVMs: their detection rate is better than that of conventional SVM algorithms, and the training time is reduced. The methods converge faster because they use fewer support vectors, so significant training time can be saved. The reported detection rate reaches 100% while maintaining a low false positive rate.

In [5], Josef Kittler, Mohamad Hatef, Robert P. W. Duin, and Jiri Matas present a new scheme for combining classifiers on text classification. The paper uses Hidden Markov Models (HMM). The outputs of the HMM classifier, the Gaussian classifier, and the Neural Network classifiers are combined using the min, max, product, and sum rules. The HMM classifier produces the best classification rate compared with the other classifiers.

The performance of Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Multivariate Adaptive Regression Splines (MARS) is studied and compared in [6] to produce an ensemble of intelligent models. The back-propagation method in ANN produced a high accuracy of 97.04%, the best performance among the methods compared.

In [7], Eskin et al. present a geometric framework for intrusion detection in unlabeled data. The framework maps the data elements to a feature space and detects anomalies there. The proposed K-Nearest Neighbor (KNN) algorithm detects intrusions in two different feature maps and is reported to produce a detection rate of 91% and a false positive rate of 8%, according to points on the ROC (Receiver Operating Characteristic) curve.

The classification efficiency of training and test data generated by k-means clustering is analyzed in [8]. The Semi-Supervised Naïve Bayes (SSNB) learning algorithm is used to analyze the data. The research mainly focuses on comparing semi-supervised classification with normal supervised classification using different instances of a single dataset. The SSNB produced a classification accuracy of 94.4%.
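The min, max, product, and sum rules mentioned for combining classifiers in [5] can be sketched as follows. This is a simplified illustration that assumes equal class priors, so that prior terms drop out and each rule reduces to applying one operation to the per-class posterior estimates. The three classifier outputs are made-up numbers, not results from any cited paper.

```python
import math


def combine(posteriors, rule):
    """Fuse per-classifier posterior estimates with a fixed combination rule.

    posteriors: list of dicts, one per classifier, mapping class label -> probability.
    rule: one of "sum", "product", "min", "max".
    Returns the winning class label and the combined score of every class.
    """
    reducers = {"sum": sum, "product": math.prod, "min": min, "max": max}
    reduce_fn = reducers[rule]
    classes = posteriors[0].keys()
    scores = {c: reduce_fn([p[c] for p in posteriors]) for c in classes}
    return max(scores, key=scores.get), scores


# Hypothetical outputs of three classifiers for a single connection record.
outputs = [
    {"normal": 0.80, "attack": 0.20},
    {"normal": 0.80, "attack": 0.20},
    {"normal": 0.02, "attack": 0.98},
]

decisions = {rule: combine(outputs, rule)[0] for rule in ("sum", "product", "min", "max")}
print(decisions)
```

With these numbers the sum rule favours "normal" while the product, min, and max rules favour "attack", which shows that the choice of rule can change the decision when one classifier strongly disagrees with the others.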


3. Comparative study of existing Classification algorithms:

Classification accuracy measures the performance of a classifier. It is the number of correct predictions divided by the total number of samples, multiplied by 100 to express it as a percentage.
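As a worked illustration, accuracy and the detection rate (recall, computed as TP / (TP + FN)) can both be derived from the four confusion-matrix counts. The counts used below are hypothetical and are not drawn from any of the surveyed papers.

```python
def accuracy_percent(tp, tn, fp, fn):
    """Correct predictions (TP + TN) over all samples, as a percentage."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def detection_rate_percent(tp, fn):
    """True positives over all actual positives (TP + FN), as a percentage."""
    return 100.0 * tp / (tp + fn)


# Hypothetical evaluation on 1000 connections, 100 of which are attacks.
tp, tn, fp, fn = 90, 880, 20, 10

accuracy = accuracy_percent(tp, tn, fp, fn)
detection_rate = detection_rate_percent(tp, fn)
print(accuracy, detection_rate)  # 97.0 90.0
```

Here 970 of 1000 predictions are correct, giving 97% accuracy, and 90 of the 100 attacks are caught, giving a 90% detection rate.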

The Detection Rate (Recall) is the number of True Positives (TP) divided by the sum of the number of True Positives and the number of False Negatives (FN). It is otherwise called Sensitivity or True Positive Rate.

Table 1: Comparison of Classification accuracy and detection rate produced by various Network Intrusion Detection techniques

Classification Algorithms        Classification Accuracy (%)    Detection rate (%)
FVBRM                            97.78                          0.45
DBSCAN                           90.65                          0.32
NB Modified Absolute Discount    97.32                          0.19
SVM                              87.50                          0.75
Hidden Markov Models             94.77                          0.17
Neural Network                   97.04                          0.76
K-nearest neighbor               92.01                          0.5
Semi-Supervised Naïve Bayes      94.4                           0.005

Conclusion:

The aim of this paper is to compare the performance of widely used classification algorithms in NIDS. Table 1 shows the classification accuracy and detection rate of the various algorithms. In general, the classifiers with low detection rates and high accuracy are suitable for NIDS. Future work can focus on the Naïve Bayes classifier algorithm to improve these qualities.

References:

[1] Mukherjee S, Sharma N., "Intrusion detection using naive Bayes classifier with feature reduction", Procedia Technology, 4, 31 Dec 2012, pp.119-28.

[2] Chharia A, Gupta RK., "Enhancing Naïve Bayes Performance with Modified Absolute Discount Smoothing Method in Spam Classification", International Journal of Advanced Research in Computer Science and Software Engineering, 3(3), March 2013, pp.424-429.

[3] Virendra Singh Thakur and Gireesh Kumar Dixit, "Intrusion Detection System Using Semi-Supervised Machine Learning By DBSCAN", International Journal of Modern Engineering & Management Research, Vol 1(3), Oct 2013.


[4] Zonghua Zhang and Hong Shen, "Online training of SVM for realtime intrusion detection based on improved text categorization model", Computer Communications, Vol. 28, Dec 2005, pp.1428-1443.

[5] Josef Kittler, Mohamad Hatef, Robert PW Duin, and Jiri Matas, "On combining classifiers", IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), Mar 1998, pp.226-239.

[6] Srinivas Mukkamala, Andrew H. Sung, and Ajith Abraham, "Intrusion detection using an ensemble of intelligent paradigms", Journal of Network and Computer Applications, 28(2), 2005, pp.167-182.

[7] Eleazar Eskin et al., "A geometric framework for unsupervised anomaly detection", Applications of Data Mining in Computer Security, Springer US, 2002, pp.77-101.

[8] Uma Subramanian and Hang See Ong, "Analysis of the Effect of Clustering the Training Data in Naive Bayes Classifier for Anomaly Network Intrusion Detection", Journal of Advances in Computer Networks, 2(1), 2014.
