International Journal of Computer Trends and Technology (IJCTT) – volume 29 Number 3 – November 2015
Intrusion Detection Algorithm for Data Security
Neeraj Kumar1, Dr. Upendra Kumar2, Dr. G. Sahoo3
1 Assistant Professor & Head, Dept. of Computer Science, Nalanda College of Engg., Chandi, Bihar, India.
2 Assistant Professor, Dept. of Computer Science & Engineering, B.I.T. Mesra, Patna Campus, Bihar, India.
3 Professor & Former Head (Dept. of Computer Science), Dean Admissions & Academic Coordination, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India.
Abstract: With the rapid development of network technologies, a huge volume of data flows across networks at every moment, so a strong Network Intrusion Detection System (NIDS) is needed. An NIDS is an important detection system used as a countermeasure to protect data integrity and system availability from attack [14]; it is a robust mechanism for distinguishing relevant data from non-relevant data, particularly data that acts as an attack. To protect networks against intrusion, this paper proposes an Intrusion Detection Algorithm based on Density Based Outlier Detection (DBOD) in data mining. The algorithm uses the density of data points known as the Hamming density [8], defined as the number of k-nearest neighbors divided by the Hamming distance. It is applied to the KDD Cup'99 (Network Intrusion) data set from the UCI repository. We simulated the proposed DBOD in our own simulator and compared its results with existing algorithms such as LOF, and found that the proposed work is comparatively more accurate and detects a larger number of intrusions.

Keywords: Hamming Distance, Outliers, Outlier Detection.

I. INTRODUCTION

An intrusion detection system is a network security mechanism that protects a computer network from attacks. Rapid improvement in network technologies has given hackers, phishers, and intruders openings to find unauthorized paths into newly networked systems [3]. As technologies evolve, so does the risk of new threats associated with them. Thus, when a new type of intrusion emerges, an intrusion detection system (IDS) needs to be able to act effectively to avoid hazardous effects. Today, the biggest challenge may be dealing with "big data", i.e., the huge volume of network traffic data collected dynamically from network communications [3].
Therefore, intrusion detection is a major research area in securing data flow over computer networks. Its main purpose is to identify malicious behaviour in network traffic and to protect our data, data communication, web browsing, online banking, e-commerce, and so on from threats. The main reason for using data mining here is to provide a robust "Intrusion Detection System" (IDS) that protects our computer communication systems from threats.

An outlier is a data point which is significantly different from the remaining data. Hawkins formally defined [11] the concept of an outlier as follows: "An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism." Outliers are also referred to as abnormalities or anomalies in the data mining and statistics literature. An outlier often contains useful information about abnormal characteristics of the systems and entities that impact the data generation process. Recognizing such unusual characteristics provides useful application-specific insights. One example is intrusion detection systems: in many host-based or networked computer systems, different kinds of data are collected about operating system calls, network traffic, or other activity in the system. This data may show unusual behaviour because of malicious activity.

ISSN: 2231-2803
Outlier detection approaches can be classified into two categories: the classic outlier approach and the spatial outlier approach. The classic outlier approach [5] analyzes transaction datasets, whereas the spatial approach deals with spatial information. Classic approaches can be further grouped into statistical-based, distance-based [10], deviation-based, and density-based approaches. The density-based approach is well established, and LOF is one of the most efficient and widely used density-based algorithms. In this work we propose an efficient outlier detection algorithm based on a novel definition of density. This algorithm detects outliers more effectively while keeping its time complexity lower than that of LOF.

This paper is organized as follows. Section II surveys related density-based outlier detection methods, especially LOF. Section III introduces the proposed work (the Intrusion Detection Algorithm), Section IV presents experimental results on a real dataset, and Section V concludes and discusses future work.

http://www.ijcttjournal.org

II. BACKGROUND AND LITERATURE REVIEW
A. Intrusion Detection Systems

Intrusion detection methods [3] can be classified into two types based on their analysis method:

i. Anomaly-based intrusion detection method: To detect intrusions, anomaly detection analyzes deviations from the normal activities at user or system level. Machine learning frequently forms the basis for the anomaly method. It can detect novel attacks by comparing suspicious traffic with normal traffic, but it has a high false alarm rate because practical normal behavior profiles are difficult to model for the protected system [12].

ii. Misuse intrusion detection method: Misuse detection matches sample data to known intrusive patterns. With the misuse method, pattern matching leads to high accuracy in detecting threats, but it cannot detect novel attacks because their signatures are not available for pattern matching [3].

The misuse approach is similar to virus scanners: events are detected by matching them against specific pre-defined patterns known as signatures. The limitation of this approach is its poor performance in detecting new attacks; it also misses minor variations of known patterns. In addition, it carries significant administrative overhead for maintaining signature databases.

Anomaly-based detection started with statistics. NSM also used statistics along with rules to monitor LAN traffic. The statistical component set the standard for statistical intrusion detection: it computes a historical distribution of continuous and categorical attributes that is updated over time, and deviations are found using chi-square tests [3].

B. Data Mining for Intrusion Detection

Data mining is an approach used to identify useful, informative patterns and relationships within large amounts of data using a variety of data analysis tools and techniques, such as classification, association, clustering, and visualization [13]. It is closely associated with, and overlaps considerably with, machine learning. It plays an important role in assisting the analysis of network data. It can also help with big data problems by filtering big data into useful subsets of information. Essentially, applying data mining tools to network data provides the ability to identify the various underlying contexts associated with the network being monitored.
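The chi-square deviation check mentioned in Section II.A can be illustrated with a minimal Python sketch. It assumes a fixed historical profile of protocol frequencies and compares the counts observed in a traffic window against it using Pearson's chi-square statistic. The function name `chi_square_statistic`, the protocol categories, and all counts are hypothetical and chosen only for illustration; they are not taken from NSM or from the systems cited in [3].

```python
def chi_square_statistic(observed_counts, expected_probs):
    """Pearson chi-square statistic of observed counts against an expected profile."""
    total = sum(observed_counts.values())
    stat = 0.0
    for category, prob in expected_probs.items():
        expected = total * prob                      # count expected under the profile
        observed = observed_counts.get(category, 0)  # unseen categories count as zero
        stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical historical profile of protocol usage (probabilities sum to 1).
historical_profile = {"tcp": 0.7, "udp": 0.2, "icmp": 0.1}

# Two hypothetical traffic windows of 100 connections each.
normal_window = {"tcp": 70, "udp": 20, "icmp": 10}
suspicious_window = {"tcp": 20, "udp": 10, "icmp": 70}

normal_score = chi_square_statistic(normal_window, historical_profile)
suspicious_score = chi_square_statistic(suspicious_window, historical_profile)

print(f"normal window:     {normal_score:.2f}")
print(f"suspicious window: {suspicious_score:.2f}")
```

A large statistic signals that the window deviates from the historical profile. In practice the threshold would come from the chi-square distribution with the appropriate degrees of freedom, and the profile itself would be updated over time.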
C. Anomaly Detection for Intrusion Detection

Outlier detection is another application of data mining that is useful for intrusion detection. An outlier can be understood as an observation that deviates so much from the other, normal observations as to cause suspicion that it was generated by a different mechanism. Outliers can be of two types: those that deviate significantly from others within their own network peripherals, and those whose patterns belong to network services other than their own. It was also observed that researchers have tried two approaches to finding intruders: modeling only normal data, or modeling both normal and abnormal data.

Our proposed system is implemented on the KDD'99 dataset, which was maintained by DARPA and defined by MIT, USA. Since 1999, KDD'99 [1] has been the most widely used dataset for the evaluation of anomaly detection methods [14].

KDD'99 (Dataset): This dataset contains four types of intrusions: Denial of Service attack (DoS), User to Root attack (U2R), Remote to Local attack (R2L), and Probing attack.

LOF (Local Outlier Factor) Algorithm

The Local Outlier Factor (LOF) concept was introduced by Markus M. Breunig in 2000 and is frequently used. It is a very powerful density-based outlier detection approach in the field of data mining. In the definitions that explain LOF below, k is a user-supplied natural number and MinPts is the minimum number of points; k and MinPts are used interchangeably, and p and o are objects in the database.

LOF algorithm:

Step 1: The k-nearest neighbors are found according to the Hamming distance, which is the most computationally intensive step. In general, every data instance has to be compared with all the remaining ones, so the computational complexity of this step is $O(N^2 d)$, where d is the dimension of the data.

Definition 2.1: k-distance(p): This is the distance between p and its k-th nearest neighbor.
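Step 1 and Definition 2.1 can be sketched in Python as follows. This is a minimal illustration, assuming each record is a tuple of categorical features and that the Hamming distance counts the positions at which two records differ. The helper names and the toy records are hypothetical; they are not rows of KDD'99.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b))


def k_nearest_neighbors(data, p_index, k):
    """Return (distance, index) pairs of the k records closest to data[p_index]."""
    distances = [
        (hamming_distance(data[p_index], record), i)
        for i, record in enumerate(data)
        if i != p_index  # a record is not its own neighbor
    ]
    distances.sort()  # ascending distance, ties broken by index
    return distances[:k]


def k_distance(data, p_index, k):
    """Distance from data[p_index] to its k-th nearest neighbor (Definition 2.1)."""
    return k_nearest_neighbors(data, p_index, k)[-1][0]


# Toy records with three categorical features (hypothetical values).
records = [
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("tcp", "smtp", "SF"),
    ("udp", "dns", "SF"),
    ("icmp", "ecr_i", "REJ"),
]

print(k_distance(records, 0, 2))  # 1: the 2nd nearest neighbor differs in one field
print(k_distance(records, 4, 2))  # 3: the last record differs from all others everywhere
```

Running this search for every record compares each pair of records over all d features, which is where the $O(N^2 d)$ cost of this step comes from.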
Definition 2.2: $N_{k\text{-distance}(p)}(p)$: Given k-distance(p), the neighborhood $N_{k\text{-distance}(p)}(p)$ contains every object whose distance from p is not greater than k-distance(p).

Step 2: The Local Reachability Density (LRD) is computed for every data point p based on its set of k neighbors $N_{MinPts}(p)$. The LRD can be defined as the inverse of the average reachability distance of the nearest neighbors. In this context, the reachability distance is given by Definition 2.3.

Definition 2.3: $\text{reach-dist}_k(p, o)$: This is the maximum of the k-distance of o and the real distance $d(p, o)$ between p and o:

$$\text{reach-dist}_k(p, o) = \max\{\,k\text{-distance}(o),\ d(p, o)\,\}$$

In the LOF formulation, MinPts is an indication of mass, and the values $\text{reach-dist}_{MinPts}(p, o)$ for $o \in N_{MinPts}(p)$ are an indication of volume; together they determine the density in the neighborhood of p (lrd, as given in the next definition, is an indication of density).

Definition 2.4: $\mathrm{lrd}_{MinPts}(p)$: The local reachability density of p is calculated by dividing one by the average reachability distance over the MinPts-distance neighborhood of p:

$$\mathrm{lrd}_{MinPts}(p) = 1 \Big/ \left( \frac{\sum_{o \in N_{MinPts}(p)} \text{reach-dist}_{MinPts}(p, o)}{|N_{MinPts}(p)|} \right)$$

Step 3: The LOF score is computed using the LRDs of the k-nearest neighbors, so the LOF is a ratio of LRDs.

Definition 2.5 (local outlier factor of an object p): The LOF of p is defined as [15]
$$\mathrm{LOF}_{MinPts}(p) = \frac{\displaystyle\sum_{o \in N_{MinPts}(p)} \frac{\mathrm{lrd}_{MinPts}(o)}{\mathrm{lrd}_{MinPts}(p)}}{|N_{MinPts}(p)|}$$
On the other hand, if the densities of the neighbors are approximately as high as that of the instance itself, the resulting ratio will be close to one.
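Putting Definitions 2.1 to 2.5 together, the LOF computation can be sketched in Python. This is a brute-force illustration under stated assumptions, not the authors' simulator: records are equal-length tuples, distances are Hamming distances, and `min_pts` plays the role of MinPts (and k). The function name `local_outlier_factors`, the handling of duplicate records, and the toy bit vectors are assumptions made for this example.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))


def local_outlier_factors(data, min_pts):
    """Return the LOF score of every record in data (brute force)."""
    n = len(data)
    if not 1 <= min_pts < n:
        raise ValueError("min_pts must satisfy 1 <= min_pts < len(data)")

    # Step 1: pairwise distances, k-distance and neighborhood of each record.
    dist = [[hamming_distance(data[i], data[j]) for j in range(n)] for i in range(n)]
    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[min_pts - 1]  # Definition 2.1
        k_dist.append(kd)
        # Definition 2.2: ties can make the neighborhood larger than min_pts.
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    # Step 2: local reachability density (Definitions 2.3 and 2.4).
    lrd = []
    for p in range(n):
        total = sum(max(k_dist[o], dist[p][o]) for o in neighbors[p])
        # Assumption: many duplicate records give a zero sum; treat density as infinite.
        lrd.append(float("inf") if total == 0 else len(neighbors[p]) / total)

    # Step 3: LOF as the average ratio of neighbor density to own density.
    lof = []
    for p in range(n):
        ratios = [
            1.0 if lrd[o] == lrd[p] else lrd[o] / lrd[p]  # equal densities, incl. inf/inf
            for o in neighbors[p]
        ]
        lof.append(sum(ratios) / len(ratios))
    return lof


# Toy bit vectors: four records close together and one far away (hypothetical data).
points = [
    (0, 0, 0, 0, 0, 0),
    (1, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0),
    (0, 0, 1, 0, 0, 0),
    (1, 1, 1, 1, 1, 1),
]
scores = local_outlier_factors(points, min_pts=2)
for point, score in zip(points, scores):
    print(point, round(score, 3))
```

In this toy data the last record differs from the cluster in most positions, so its neighbors are much denser than it is and its LOF is well above one, while the clustered records score close to one.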
Output: Density for each data point, and the outlier objects.
IDA(H-Distance, T)
  For i = 1 to n
    For j = 1 to n
      H.D[i,j] = A[i,j] XOR B[i,j]
    End For
  End For
  For k = 1 to n
    N_H-distance(x) = number of data objects with H-distance less than or equal to Abs H-distance(x)
    H-Density(x) = N_H-distance(x) / H-distance(x)
    Density Variance (DV) = (1 / N_H-distance(x)) * ∑_x (H-Density(x) - mean H-Density)^2
      (where mean H-Density is the mean density)
    If DV[x]