Hindawi Publishing Corporation, The Scientific World Journal, Volume 2015, Article ID 574589, 15 pages. http://dx.doi.org/10.1155/2015/574589

Research Article

A Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using Significant Features

P. Amudha,¹ S. Karthik,² and S. Sivakumari¹

¹ Department of CSE, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore 641 108, India
² Department of CSE, SNS College of Technology, Coimbatore 641 035, India

Correspondence should be addressed to P. Amudha; [email protected]

Received 20 January 2015; Revised 19 May 2015; Accepted 31 May 2015

Academic Editor: Giuseppe A. Trunfio

Copyright © 2015 P. Amudha et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Intrusion detection has become a main part of network security due to the huge number of attacks which affect computers, a consequence of the extensive growth of internet connectivity and accessibility to information systems worldwide. To deal with this problem, this paper proposes a hybrid algorithm that integrates a Modified Artificial Bee Colony (MABC) with Enhanced Particle Swarm Optimization (EPSO) to address the intrusion detection problem. The algorithms are combined to obtain better optimization results, and the classification accuracies are obtained by the 10-fold cross-validation method. The purpose of this paper is to select the most relevant features that can represent the pattern of the network traffic and to test their effect on the success of the proposed hybrid classification algorithm. To investigate the performance of the proposed method, the KDDCup'99 intrusion detection benchmark dataset from the UCI Machine Learning Repository is used. The performance of the proposed method is compared with that of other machine learning algorithms, and the differences are found to be significant.

1. Introduction

Due to the tremendous growth in the field of information technology, network security is one of the most challenging issues. An intrusion detection system (IDS) is therefore an indispensable component of the network, but traditional IDSs are unable to handle newly arising attacks. The main goal of an IDS is to identify and distinguish normal and abnormal network connections accurately and quickly, which is difficult because of the large number of attributes or features in the traffic data. For this reason, data-mining-based network intrusion detection is widely used to identify how and where intrusions occur. To achieve real-time intrusion detection, researchers have investigated several methods of performing feature selection: reducing the number of features by selecting the important ones is critical to improving the accuracy and speed of classification algorithms. Hence, selecting the discriminating features and developing the best classifier model in terms of accuracy and detection rate are the main focus of this work.

Research on machine learning and data mining treats intrusion detection as a classification problem, applying algorithms such as Naïve Bayes, genetic algorithms, neural networks, Support Vector Machines, and decision trees. To improve on the accuracy of an individual classifier, a popular approach is to combine classifiers. Recently, the application of swarm intelligence techniques to intrusion detection has gained prominence in the research community [1]. Swarm intelligence draws on the communal behaviour of social insect colonies and other animal societies to design algorithms [2]. This potential makes swarm intelligence a natural candidate for IDS, which needs to distinguish normal and abnormal behaviours in large amounts of data. The main objectives of this work are (1) to select important features using two feature selection methods, namely, the single feature selection method and the random feature selection method, and (2) to propose a hybrid optimization algorithm based on the Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO) algorithms for classifying the intrusion detection dataset. Studies on ABC and PSO indicate that ABC has powerful global search ability but poor local search ability [3], while PSO has powerful local search ability but poor global search ability [4]. In order to provide both a powerful global search capability and a powerful local search capability, this paper proposes a hybridized model called MABC-EPSO which brings the two algorithms together so that the computation process may benefit from both advantages. In this hybrid algorithm, the local and global search abilities are balanced to obtain higher-quality results. The KDDCup'99 intrusion detection dataset, developed by the MIT Lincoln Laboratory, is used in the experiments to evaluate the accuracy of the proposed hybrid approach.

The rest of this paper is organized as follows. Section 2 provides an overview of related work. Section 3 presents the principles of PSO and ABC. Section 4 describes the methodology, the dataset description and preprocessing, the proposed feature selection methods, and the hybrid approach. Section 5 gives the performance metrics, experimental results, and discussion. Finally, the conclusion is given in Section 6.

2. Related Work

To achieve real-time intrusion detection, researchers have investigated several methods of performing feature selection. Kohavi and John [4] described the feature subset selection problem in supervised learning, which involves identifying the relevant or useful features in a dataset and giving only that subset to the learning algorithm. Real-life intrusion detection datasets contain redundant or insignificant features, and the redundant features make it harder to detect possible intrusion patterns [5]. With the increasing application of classification algorithms and feature selection methods to intrusion detection datasets, a list of such studies is given in [6–23]. Machine learning algorithms such as neural networks [9] and fuzzy clustering [14] have been applied to IDSs to construct good detection models. The support vector machine (SVM) [24] has become a popular research method in intrusion detection due to its good generalization performance and the sparse representation of its solution. Satpute et al. [25] enhanced the performance of network intrusion detection by combining PSO and its variants with machine learning techniques for the detection of anomalies. Chung and Wahid [26] proposed a novel simplified swarm optimization (SSO) algorithm as a rule-based classifier and for feature selection in classifying audio data; the algorithm is flexible and cost-effective for complex computing environments. Revathi and Malathi [10, 11] proposed a hybrid simplified swarm optimization to preprocess the data, compared it with a hybridized approach of PSO with Random Forest, and found that the proposed method provided a high detection rate and an optimal solution. Karaboga and Basturk [27] proposed the Artificial Bee Colony (ABC) algorithm based on the intelligent foraging behaviour of honeybee swarms; by modelling the basic behaviour characteristics of foragers, the ABC algorithm was developed and compared with differential evolution, Particle Swarm Optimization, and an evolutionary algorithm on multidimensional and multimodal numeric problems. Karaboga and Akay [28] proposed an ABC algorithm for anomaly-based network intrusion detection to optimize the solution; the proposed method comprises four stages: parameterization, training, testing, and detection. D. D. Kumar and B. Kumar [29] applied the ABC algorithm to anomaly-based IDS and used feature selection techniques to reduce the number of features used for detection and classification. Kiran and Gunduz [30] proposed a hybridization of PSO and ABC for different continuous optimization problems, in which the information exchange between the particle swarm and the bee colony increases the global and local search abilities of the hybrid approach.

3. Theoretical Background

The following subsections provide the necessary background to understand the problem.

3.1. Particle Swarm Optimization. Particle Swarm Optimization (PSO) is a popular heuristic technique which has been successfully applied in many different application areas; however, it suffers from premature convergence, especially in high-dimensional, multimodal problems. The standard PSO algorithm proceeds as follows:

(1) Initialize a population of particles with randomly chosen positions and velocities.
(2) Calculate the fitness value of each particle in the population.
(3) If the fitness value of particle $i$ is better than its pbest value, then set the fitness value as the new pbest of particle $i$.
(4) If pbest is updated and it is better than the current gbest, then set gbest to the current pbest value of particle $i$.
(5) Update each particle's velocity and position according to (1) and (2).
(6) If the best fitness value or the maximum generation is met, then stop the process; otherwise, repeat from step (2).

In PSO, a swarm consists of $N$ particles in a $D$-dimensional search space. The $i$th particle is represented as $X_i = (x_{i1}, x_{i2}, \ldots, x_{id})$. The best previous position (pbest) of any particle is $P_i = (p_{i1}, p_{i2}, \ldots, p_{id})$ and the velocity of particle $i$ is $V_i = (v_{i1}, v_{i2}, \ldots, v_{id})$. The global best particle in the whole swarm is denoted by $P_g$ and it represents the fittest particle [31]. During each iteration, each particle updates its velocity according to the following equation:

$$v_{id}^{t} = v_{id}^{t-1} + c_1 \cdot \text{rand}_1 \cdot \left(p_{id} - x_{id}^{t-1}\right) + c_2 \cdot \text{rand}_2 \cdot \left(p_{gd} - x_{id}^{t-1}\right), \qquad (1)$$


where $c_1$ and $c_2$ denote the acceleration coefficients, $d = 1, 2, \ldots, D$, and $\text{rand}_1$ and $\text{rand}_2$ are random numbers uniformly distributed within $[0, 1]$. Each particle then moves to a new potential position as in the following equation:

$$x_{id}^{t} = x_{id}^{t-1} + v_{id}^{t}, \quad d = 1, 2, \ldots, D. \qquad (2)$$
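As an illustration of update rules (1) and (2), the following is a minimal Python sketch of the standard PSO loop; the sphere objective and all parameter values are illustrative assumptions, not settings from this paper.

```python
import numpy as np

# A minimal sketch of the standard PSO loop implementing (1) and (2).
def pso(objective, dim=10, n_particles=30, iters=100, c1=2.0, c2=2.0):
    rng = np.random.default_rng(42)
    x = rng.uniform(-15.0, 15.0, (n_particles, dim))    # positions
    v = np.zeros((n_particles, dim))                    # velocities
    pbest = x.copy()                                    # best position per particle
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()          # global best position
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # eq. (1)
        x = x + v                                                # eq. (2)
        val = np.array([objective(p) for p in x])
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

best_x, best_f = pso(lambda z: float(np.sum(z * z)))    # sphere test function
print(best_f)
```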

3.2. Artificial Bee Colony. The Artificial Bee Colony (ABC) algorithm is an optimization algorithm based on the intelligent foraging behaviour of a honey bee swarm, proposed by Karaboga and Basturk [27]. The artificial bee colony comprises three groups: scout bees, onlooker bees, and employed bees. The bee which carries out a random search is known as a scout bee. The bee which visits a food source is an employed bee. The bee which waits in the dance region is an onlooker bee; onlooker and scout bees are also called unemployed bees. The employed and unemployed bees search for good food sources around the hive, and the employed bees share the stored food source information with the onlooker bees. The number of food sources is equal to the number of employed bees and also to the number of onlooker bees. The solutions of the employed bees which cannot be enhanced within a fixed number of trials become scouts and their solutions are abandoned [28]. In the context of optimization, the number of food sources in the ABC algorithm represents the number of solutions in the population, and the position of a good food source indicates the location of a promising solution to the optimization problem [27]. The four main phases of the ABC algorithm are as follows.

Initialization Phase. The scout bees randomly generate the population (of size SN) of food sources. The input vector $x_m$, which contains $D$ variables, represents a food source, where $D$ is the dimension of the search space of the objective function to be optimized. Initial food sources are produced randomly using

$$x_m = l_i + \text{rand}(0, 1) \cdot (u_i - l_i), \qquad (3)$$

where $u_i$ and $l_i$ are the upper and lower bounds of the solution space of the objective function and $\text{rand}(0, 1)$ is a random number within the range $[0, 1]$.

Employed Bee Phase. The employed bee finds a new food source within the neighbourhood of its food source. The employed bees memorize the higher-quality food source and share it with the onlooker bees. The neighbour food source $v_{mi}$ is determined by

$$v_{mi} = x_{mi} + \phi_{mi}\left(x_{mi} - x_{ki}\right), \qquad (4)$$

where $i$ is a randomly selected parameter index, $x_k$ is a randomly selected food source, and $\phi_{mi}$ is a random number within the range $[-1, 1]$. Suitable tuning on specific problems can be made using this parameter range. The fitness of food sources, which is needed to find the global optimal solution, is calculated by (5), and a greedy selection is made between $x_m$ and $v_m$:

$$\text{fit}_i = \begin{cases} \dfrac{1}{f_i + 1}, & f_i \geq 0, \\[4pt] 1 + \left|f_i\right|, & f_i < 0, \end{cases} \qquad (5)$$

where $f_i$ represents the objective value of the $i$th solution.

Onlooker Bee Phase. Onlooker bees examine the effectiveness of food sources by observing the waggle dance in the dance region and then randomly select a rich food source. The bees then perform a random search in the neighbourhood of the food source using (4). The quality of a food source is evaluated by its profitability $p_i$ using the following equation:

$$p_i = \frac{\text{fit}_i}{\sum_{n=1}^{SN} \text{fit}_n}, \qquad (6)$$

where $\text{fit}_i$ denotes the fitness of the solution represented by food source $i$ and SN denotes the total number of food sources, which is equal to the number of employed bees.

Scout Phase. If the effectiveness of a food source cannot be improved within the fixed number of trials, then the scout bees discard the solution and randomly search for a new one using (3) [29]. The pseudocode of the ABC algorithm is given in Algorithm 1.

4. Methodology

4.1. Research Framework. In this study, the framework of the proposed work is as follows:

(i) Data preprocessing: prepare the data for classification and remove unused features and duplicate instances.

(ii) Feature selection: determine the feature subset, using the SFSM and RFSM methods, that contributes to the classification.

(iii) Hybrid classification: perform classification using the MABC-EPSO algorithm to enhance the classification accuracy for the KDDCup'99 dataset.

The objective of this study is to help the network administrator in preprocessing the network data using feature selection methods and in performing classification using a hybrid algorithm which aims to fit a classifier model to the prescribed data.

4.2. Data Source and Dataset Description. In this section, we provide a brief description of the KDDCup'99 dataset [30], which is derived from the UCI Machine Learning Repository [31].


Input: initial solutions.
Output: optimal solution.
BEGIN
  Generate the initial population x_m, m = 1, 2, ..., SN
  Evaluate the fitness (fit_i) of the population
  Set cycle = 1
  repeat
    FOR (employed phase) {
      Produce a new solution v_m using (4)
      Calculate fit_i
      Apply the greedy selection process }
    Calculate the probability P_i using (6)
    FOR (onlooker phase) {
      Select a solution x_m depending on P_i
      Produce a new solution v_m
      Calculate fit_i
      Apply the greedy selection process }
    IF (scout phase) there is an abandoned solution for the scout depending on limit
    THEN replace it with a new solution randomly produced by (3)
    Memorize the best solution so far
    cycle = cycle + 1
  until cycle = MCN
END

Algorithm 1: Artificial Bee Colony.
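To ground Algorithm 1, the following is a minimal Python sketch of one ABC run implementing (3)–(6); the colony size, limit, bounds, and objective are illustrative assumptions, not the experimental settings reported later.

```python
import numpy as np

def fitness(f):
    # eq. (5): fit_i = 1/(f_i + 1) if f_i >= 0, else 1 + |f_i|
    return 1.0 / (f + 1.0) if f >= 0 else 1.0 + abs(f)

def abc(objective, dim=10, sn=20, mcn=200, limit=5, lb=-15.0, ub=15.0):
    rng = np.random.default_rng(0)
    x = lb + rng.random((sn, dim)) * (ub - lb)          # eq. (3): initial food sources
    f = np.array([objective(s) for s in x])
    fit = np.array([fitness(v) for v in f])
    trials = np.zeros(sn, dtype=int)

    def try_neighbor(m):
        # eq. (4): v_mi = x_mi + phi_mi * (x_mi - x_ki), phi_mi in [-1, 1]
        k = rng.choice([j for j in range(sn) if j != m])
        i = rng.integers(dim)
        v = x[m].copy()
        v[i] = x[m, i] + rng.uniform(-1.0, 1.0) * (x[m, i] - x[k, i])
        fv = objective(v)
        if fitness(fv) > fit[m]:                        # greedy selection
            x[m], f[m], fit[m], trials[m] = v, fv, fitness(fv), 0
        else:
            trials[m] += 1

    for _ in range(mcn):
        for m in range(sn):                             # employed bee phase
            try_neighbor(m)
        p = fit / fit.sum()                             # eq. (6): probabilities
        for m in rng.choice(sn, size=sn, p=p):          # onlooker bee phase
            try_neighbor(m)
        worst = np.argmax(trials)                       # scout phase
        if trials[worst] > limit:                       # abandon, re-seed via eq. (3)
            x[worst] = lb + rng.random(dim) * (ub - lb)
            f[worst] = objective(x[worst])
            fit[worst] = fitness(f[worst])
            trials[worst] = 0
    best = np.argmin(f)                                 # memorize the best solution
    return x[best], f[best]
```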

Table 1: Distribution of connection types in the 10% KDDCup'99 dataset (% of occurrence).

Label            DoS      Probe   U2R     R2L     Total attack   Total normal
Training data    79.24%   0.83%   0.01%   0.23%   80.31%         19.69%
Testing data     73.90%   1.34%   0.07%   5.20%   81.51%         19.49%

In 1998, as part of the DARPA intrusion detection evaluation program, a simulated environment was set up by the MIT Lincoln Lab to obtain raw TCP/IP dump data for a local-area network (LAN), in order to compare various intrusion detection methods. The environment functioned like a real one, including both background network traffic and a wide variety of attacks. A version of the 1998 DARPA dataset, KDDCup'99, is now widely accepted as a standard benchmark dataset and has received much attention in the intrusion detection research community. The main motivation for using the KDDCup'99 dataset is to show that the proposed method has the advantage of being an efficient classification algorithm when applied to intrusion detection. In this paper, the 10% KDDCup'99 dataset is used for experimentation. The distribution of connection types and the sample size in the 10% KDDCup dataset are shown in Tables 1 and 2. The feature information of the 10% KDDCup dataset is shown in Table 3. The dataset consists of one type of normal data and 22 different attack types categorized into 4 classes, namely, denial of service (DoS), Probe, user-to-root (U2R), and remote-to-login (R2L).

Table 2: Sample size in the 10% KDDCup dataset.

Category of attack   Attack name
Normal               Normal (97277)
DoS                  Neptune (107201), Smurf (280790), Pod (264), Teardrop (979), Land (21), Back (2203)
Probe                Portsweep (1040), IPsweep (1247), Nmap (231), Satan (1589)
U2R                  Bufferoverflow (30), LoadModule (9), Perl (3), Rootkit (10)
R2L                  Guesspassword (53), Ftpwrite (8), Imap (12), Phf (4), Multihop (7), Warezmaster (20), Warezclient (1020)

Table 3: Feature information of the 10% KDDCup dataset.

Dataset characteristics     Multivariate
Attribute characteristics   Categorical, integer
Associated task             Classification
Area                        Computer
Number of instances         494,020
Number of attributes        42
Number of classes           1 normal class, 4 attack classes

4.3. Data Preprocessing. Data preprocessing is a time-consuming task which prepares the data for subsequent analysis as required by the intrusion detection system model. The main aim of data preprocessing is to transform the raw network data into a form suitable for further analysis. Figure 1 illustrates the steps involved in data preprocessing and how raw input data are processed for further statistical measures. Various statistical analyses such as feature selection, dimensionality reduction, and normalization are essential to select significant features from the dataset. If the dataset contains duplicate instances, classification algorithms consume more time and also provide inefficient results. To achieve a more accurate and efficient model, duplicate elimination is needed. The main deficiency in this dataset is the large number of redundant instances. This large number of duplicate instances will bias learning algorithms towards the frequently occurring instances and will inhibit them from learning the infrequent instances, which are generally more harmful to networks. The existence of these duplicate instances will also cause the evaluation results to be biased towards methods which have better detection rates on the frequently occurring instances [32]. Eliminating duplicate instances helps in reducing the false-positive rate for intrusion detection; hence, duplicate instances are removed so that the classifiers will not be biased towards the more frequently occurring instances. The details of the instances in the dataset are shown in Table 4. After preprocessing, a random sample of 10% of the normal data and 10% of the Neptune attack in the DoS class is selected, and four new sets of data are generated with the normal class and the four categories of attack [33]. Moreover, irrelevant and redundant attributes of the intrusion detection dataset may lead to a complex intrusion detection model and reduce detection accuracy.

Figure 1: Data preprocessing (network audit data → data preprocessing: fill missing values, remove duplicate instances, feature selection or dimensionality reduction → data analysis: association mining, classification, clustering → alarm/alert).

Input: Dataset X with n features
Output: Vital features
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  for i = 1, 2, ..., n
    X(i) = x(i)   // one-dimensional feature vector
    Apply SVM classifier
  end for
  Sort features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    Select the features
End

Algorithm 2: Single feature selection method.

Table 4: Details of instances in the dataset.

Category   Before removing duplicates   After removing duplicates   Selected instances
Normal     97,277                       87,832                      8,783
DoS        391,458                      54,572                      7,935
Probe      4,107                        2,131                       2,131
U2R        52                           52                          52
R2L        1,126                        999                         999
Total      494,020                      145,586                     19,900
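As an illustration of the duplicate-elimination step summarized in Table 4, the following is a minimal pandas sketch; the file name and generated column names are assumptions about how the 10% KDDCup'99 data is stored.

```python
import pandas as pd

# Load the 41 features plus the class label; column names are hypothetical.
cols = [f"f{i}" for i in range(1, 42)] + ["label"]
df = pd.read_csv("kddcup_10_percent.csv", header=None, names=cols)

before = df["label"].value_counts()
df = df.drop_duplicates()                 # remove duplicate instances
after = df["label"].value_counts()
print(pd.DataFrame({"before": before, "after": after}))
```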

4.4. Feature Selection. Feature selection is an important data processing step. As the dataset is large, it is essential to remove the insignificant features in order to distinguish normal traffic from intrusions in a timely manner. In this paper, feature subsets are formed based on the single feature selection method (SFSM) and the random feature selection method (RFSM), and the two techniques are compared. The proposed methods reduce the features in the datasets, which aims to improve the accuracy rate, reduce the processing time, and improve the efficiency of intrusion detection.

4.4.1. Single Feature Selection Method. The single feature selection method (SFSM) uses a one-dimensional feature vector. In the first iteration, only the first attribute is considered and evaluated for accuracy using the Support Vector Machine classifier. In the second iteration, only the second attribute is considered for evaluation, and so on. The process is repeated until all 41 features are evaluated. After calculating every feature's efficiency, the features are sorted and the vital features, whose accuracy and detection rate exceed the acc_threshold and dr_threshold values, respectively, are selected. The pseudocode of the single feature selection algorithm is given in Algorithm 2.

4.4.2. Random Feature Selection Method. In this method, features are removed randomly and the remainder is evaluated using the classifier. In the first iteration, all the features are evaluated using the SVM classifier; then, by deleting one feature, the dataset is updated and the classifier efficiency is recomputed, so that the importance of the removed feature is measured. In the second iteration, another feature is removed randomly from the dataset, which is updated again. The process is repeated until only one feature is left. After calculating every feature's efficiency, the features are sorted in descending order of accuracy. If the accuracy and detection rate are greater than the threshold values (the accuracy and detection rate obtained using all features), then those features are selected as vital features. The pseudocode of the random feature selection algorithm is given in Algorithm 3. Tables 5 and 6 show the feature subsets identified using the two feature selection methods and the size of each subset as a fraction of the full feature set.

Input: Dataset X with n features
Output: Vital features
Begin
  Let X = {x_1, x_2, ..., x_n}, where n represents the number of features in the dataset
  Let S = {X}
  for each x_i in X do
    Delete x_i from X
    S = S − x_i   // update feature subset
    Apply SVM classifier
  end for
  Sort the features based on classifier accuracy (acc)
  If acc > acc_threshold and detection rate > dr_threshold then
    S = S − x_i   // selecting vital features
End

Algorithm 3: Random feature selection method.

Table 5: List of features selected using the SFSM method.

Dataset              Selected features                                                           Number of features
DoS + 10% normal     24, 32, 41, 28, 40, 27, 34, 35, 5, 17, 21, 4, 39, 11, 9, 7, 14, 1, 30, 6    20
Probe + 10% normal   11, 1, 15, 26, 10, 4, 21, 18, 19, 25, 39, 31, 7, 35, 28                     15
R2L + 10% normal     16, 26, 30, 3, 7, 21, 6, 14, 12, 35, 32, 18, 38, 17, 41, 10, 31             17
U2R + 10% normal     27, 40, 26, 1, 34, 41, 7, 18, 28, 3, 20, 37, 11                             13

Table 6: List of features selected using the RFSM method.

Dataset              Selected features                                    Number of features
DoS + 10% normal     4, 9, 21, 39, 14, 28, 3, 8, 29, 33, 17, 12, 38, 31   14
Probe + 10% normal   27, 2, 3, 30, 11, 33, 23, 9, 39, 20, 21, 37          12
R2L + 10% normal     24, 15, 23, 7, 25, 16, 8, 33, 29, 38, 21, 30, 32     13
U2R + 10% normal     6, 19, 22, 30, 21, 28, 36, 27, 11, 17, 20            11
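To make the selection procedures concrete, the following is a minimal sketch of the single-feature scoring loop of Algorithm 2 using a scikit-learn SVM; the 5-fold scoring and the omission of the detection-rate check are simplifying assumptions, and X, y, and acc_threshold are assumed to be supplied.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# A sketch of Algorithm 2: score each feature alone with an SVM and keep
# the ones whose accuracy exceeds the threshold.
def sfsm(X, y, acc_threshold):
    scored = []
    for i in range(X.shape[1]):
        # one-dimensional feature vector, as in Algorithm 2
        acc = cross_val_score(SVC(), X[:, [i]], y, cv=5).mean()
        scored.append((acc, i))
    scored.sort(reverse=True)          # sort features by classifier accuracy
    return [i for acc, i in scored if acc > acc_threshold]
```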

4.5. Hybrid Classification Approach. Artificial intelligence and machine learning techniques have been used to build different IDSs, but they have shown limitations in achieving high detection accuracy and fast processing times. Computational intelligence techniques, known for their ability to adapt and to exhibit fault tolerance, high computational speed, and resilience against noisy information, compensate for the limitations of these approaches [1]. Our aim is to increase the performance of the most widely used classification techniques for intrusion detection by using optimization methods such as PSO and ABC. This work develops an algorithm that combines the logic of both ABC and PSO to produce a high-performance IDS; their combination has the advantage of providing a more reliable solution for today's data-intensive computing processes. The Artificial Bee Colony algorithm is a relatively new optimization algorithm and a current topic of interest in computational intelligence. Because of its high probability of avoiding local optima, it can compensate for the main disadvantage of the Particle Swarm Optimization algorithm; conversely, the Particle Swarm Optimization algorithm can help find the optimal solution more easily. In such circumstances, we bring the two algorithms together so that the computation process may benefit from both advantages. The flowchart of the proposed hybrid MABC-EPSO is given in Figure 2. In this hybrid model, the colony is divided into two parts: one possesses the swarm intelligence of the Artificial Bee Colony and the other the particle swarm intelligence. Assuming cooperation between the two parts, in each iteration the part which finds the better solution shares its achievement with the other part; the inferior solution is replaced by the better solution and substituted in the next iteration. The process of MABC-EPSO is as follows.

Step 1 (initialization of parameters). Set the number of individuals of the swarm; set the maximum cycle index of the algorithm; set the search range of the solution; set the other constants needed in both ABC and PSO.

Figure 2: Flowchart of the proposed hybrid MABC-EPSO model (network audit data → data preprocessing → feature selection using SFSM and RFSM → initialize the parameters of EPSO and MABC → evaluate the fitness value → determine the gbest of EPSO and the best of MABC → EPSO branch: calculate the particle, update particle positions, update the best, determine the pbest and gbest of EPSO; MABC branch: employed bee phase, onlooker bee phase, scout bee phase, determine the best of MABC → if the termination condition is satisfied, select the best solution; otherwise iterate).

Step 2 (initialization of the colony). Generate a colony with a specific number of individuals. The bee colony is divided into two categories, employed foragers and unemployed foragers, according to each individual's fitness value; on the other hand, as a particle swarm, the fitness value of each particle is calculated and the best location is taken as the global best location.

Step 3. In the bee colony, an employed bee is assigned to evaluate the fitness value of each solution using (5). The employed bee selects a new candidate solution from the nearby food sources and then applies the greedy selection method, calculating the Rastrigin function as follows:

$$\min f(x) = 10n + \sum_{i=1}^{n} \left[ x_i^2 - 10 \cos\left(2\pi x_i\right) \right]. \qquad (7)$$
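A minimal sketch of the Rastrigin objective in (7); the probe points are illustrative.

```python
import numpy as np

# The Rastrigin objective of (7); its global minimum is f(0, ..., 0) = 0,
# and the cosine term produces many local minima.
def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + float(np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)))

print(rastrigin(np.zeros(5)))       # 0.0 at the global minimum
print(rastrigin(np.full(5, 0.5)))   # positive away from the minimum
```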

A multimodal function contains more than one local optimum. A function of variables is separable if it can be written as a sum of functions of just one variable [34]. The dimensionality of the search space is another significant factor in the complexity of the problem. The challenge in finding optimal solutions to this function is that, on the way towards the global optimum, an optimization algorithm can easily be trapped in a local optimum. Hence, the classical benchmark function Rastrigin [34] is implemented within the Artificial Bee Colony algorithm, and the result is named the Modified Artificial Bee Colony (MABC) algorithm. In (5), $f_i$ is the Rastrigin function, whose value is 0 at its global minimum $(0, 0, \ldots, 0)$. This function is chosen because it is considered one of the best test functions for finding the global minimum. The initialization range for the function is $[-15, 15]$. The function uses cosine modulation to produce many local minima; thus, it is multimodal.

Step 4. If the fitness value is larger than the earlier one, the bee remembers the new point and forgets the previous one; otherwise, it keeps the previous solution. Based on the information shared by the employed bees, an onlooker bee calculates the shared fitness value and selects a food source with a probability value computed as in (6).

Step 5. An onlooker bee constructs a new solution selected among the neighbours of a previous solution. It also checks the fitness value and, if this value is better than the previous one, substitutes the old position with the new one; otherwise, it keeps the old position. The objective of the scout bees is to determine new random food sources to substitute the solutions that cannot be enhanced after reaching the "limit" value. In order to obtain the best optimized solution, the algorithm goes through a predefined maximum number of cycles (MCN).


After all the choices have been made, the best solution generated in that iteration is called MABC_best.

Step 6. As the initial velocity has a large effect on the balance between exploration and exploitation in the swarm, in the proposed Enhanced Particle Swarm Optimization (EPSO) algorithm an inertia weight ($\omega$) [35] is used to control the velocity, and the velocity update equation (8) becomes

$$v_{id}^{t} = \omega \cdot v_{id}^{t-1} + c_1 \cdot \text{rand}_1 \cdot \left(p_{id} - x_{id}^{t-1}\right) + c_2 \cdot \text{rand}_2 \cdot \left(p_{gd} - x_{id}^{t-1}\right). \qquad (8)$$

A small inertia weight facilitates a local search, whereas a large inertia weight facilitates a global search. In the EPSO algorithm, a linearly decreasing inertia weight [36], as in (9), is used to enhance the efficiency and performance of PSO. It is found experimentally that an inertia weight decreasing from 0.9 to 0.4 provides the optimal results:

$$w_k = w_{\max} - \frac{w_{\max} - w_{\min}}{\text{iter}_{\max}} \times k. \qquad (9)$$

In the particle swarm, after the comparison among the solutions that each particle has experienced and the comparison among the solutions that all the particles have ever experienced, the best location found in that iteration is called EPSO_best.

Step 7. The minimum of MABC_best and EPSO_best is called Best and is defined as

$$\text{Best} = \begin{cases} \text{EPSO}_{\text{best}}, & \text{if } \text{EPSO}_{\text{best}} \leq \text{MABC}_{\text{best}}, \\ \text{MABC}_{\text{best}}, & \text{if } \text{MABC}_{\text{best}} \leq \text{EPSO}_{\text{best}}. \end{cases} \qquad (10)$$

Step 8. If the termination condition is satisfied, then end the process and report the best solution; otherwise, return to Step 2.

5. Experimental Work

This section provides the performance metrics that are used to assess the efficiency of the proposed approach. It also presents and analyzes the experimental results of the hybrid approach and compares it with the other classifiers.

Parameter Settings. The algorithms are evaluated using the two feature sets selected by SFSM and RFSM. In the ABC algorithm, the parameters are set as follows: bee colony size 40, MCN 500, and limit 5. In the EPSO algorithm, the inertia weight $\omega$ in (11) varies from 0.9 to 0.7 linearly with the iterations, and the acceleration coefficients $c_1$ and $c_2$ are set to 2. The upper and lower bounds of the velocity $(v_{\min}, v_{\max})$ are set to the maximum upper and lower bounds of $x$:

$$v_{id}^{t} = \omega v_{id}^{t-1} + c_1 \, \text{rand}(0, 1) \left(p_{id} - x_{id}^{t-1}\right) + c_2 \, \text{rand}(0, 1) \left(p_{gd} - x_{id}^{t-1}\right). \qquad (11)$$

5.1. Performance Metrics. The performance metrics accuracy, sensitivity, specificity, false alarm rate, and training time are recorded for the intrusion detection dataset when applying the proposed MABC-EPSO classification algorithm. Generally, sensitivity and specificity are the statistical measures used to assess the performance of classification algorithms; hence, they are chosen as the parametric indices for the classification task. In the intrusion detection problem, sensitivity can also be called the detection rate. The numbers of instances predicted correctly or incorrectly by a classification model are summarized in a confusion matrix, shown in Table 7.

Table 7: Confusion matrix.

                 Predicted Normal       Predicted Attack
Actual Normal    True Negative (TN)     False Positive (FP)
Actual Attack    False Negative (FN)    True Positive (TP)

True Positive (TP): the number of attacks correctly identified.
True Negative (TN): the number of normal records correctly classified.
False Positive (FP): the number of normal records incorrectly classified.
False Negative (FN): the number of attacks incorrectly classified.

The classification accuracy is the percentage of the overall number of connections correctly classified:

$$\text{Classification accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \qquad (12)$$

Sensitivity (True Positive Fraction) is the percentage of the number of attack connections correctly classified in the testing dataset:

$$\text{Sensitivity} = \frac{TP}{TP + FN}. \qquad (13)$$

Specificity (True Negative Fraction) is the percentage of the number of normal connections correctly classified in the testing dataset:

$$\text{Specificity} = \frac{TN}{TN + FP}. \qquad (14)$$

False alarm rate (FAR) is the percentage of the number of normal connections incorrectly classified in the testing and training dataset:

$$\text{FAR} = \frac{FP}{TN + FP}. \qquad (15)$$

Cross-validation is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. It is the standard way of measuring the accuracy of a learning scheme, and it is used to estimate how accurately a predictive model will perform in practice. In this work, the 10-fold cross-validation method is used to improve classifier reliability. In 10-fold cross-validation, the original data is divided randomly into 10 parts. During each run, one of the partitions is chosen for testing, while the remaining nine-tenths are used for training. This process is repeated 10 times so that each partition is used for testing exactly once. The average of the results from the 10 folds gives the test accuracy of the algorithm [37].
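The following sketch computes the metrics in (12)–(15) under the 10-fold protocol just described; the classifier `model` and arrays X, y (with 1 = attack, 0 = normal) are assumed to be supplied.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# A sketch of (12)-(15) evaluated with 10-fold cross-validation.
def evaluate(model, X, y):
    accs, sens, specs, fars = [], [], [], []
    for train, test in StratifiedKFold(n_splits=10, shuffle=True).split(X, y):
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        tp = np.sum((pred == 1) & (y[test] == 1))
        tn = np.sum((pred == 0) & (y[test] == 0))
        fp = np.sum((pred == 1) & (y[test] == 0))
        fn = np.sum((pred == 0) & (y[test] == 1))
        accs.append((tp + tn) / (tp + tn + fp + fn))    # eq. (12)
        sens.append(tp / (tp + fn))                     # eq. (13), detection rate
        specs.append(tn / (tn + fp))                    # eq. (14)
        fars.append(fp / (tn + fp))                     # eq. (15)
    # average over the 10 folds
    return tuple(float(np.mean(m)) for m in (accs, sens, specs, fars))
```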

5.2. Results and Discussions. The main motivation is to show that the proposed hybrid method has the advantage of being an efficient classification algorithm based on ABC and PSO. To further prove the robustness of the proposed method, other popular machine learning algorithms [38] are tested on the KDDCup'99 dataset: Naïve Bayes (NB), a statistical classifier; the decision tree (J4.8); the radial basis function (RBF) network; the Support Vector Machine (SVM), which is based on statistical learning theory; and basic ABC. For each classification algorithm, the default control parameters are used. Table 8 reports the accuracy rates obtained by the various classification algorithms using different feature selection methods, and the performance comparison of the classifiers on accuracy rate is given in Figures 3–6. The results show that, on classifying the dataset with all features, average accuracy rates of 85.5%, 84.5%, and 88.59% are obtained for the SVM, ABC, and proposed hybrid approaches. When SFSM is applied, the accuracy rates of ABC and the proposed MABC-EPSO increase significantly, to 94.36% and 99.32%. The highest accuracy (99.82%) is reported when the proposed MABC-EPSO with the random feature selection method is employed. It is also observed that, on applying the random feature selection method, the accuracy of SVM and ABC increases to 95.71% and 97.92%. The accuracy rates of the NB, J4.8, and RBF classifiers are also comparatively high with the RFSM method compared to SFSM and the full feature set.

Table 8: Performance comparison of classification algorithms on accuracy rate.

Classification algorithm   Average accuracy (%)   Feature selection method
C4.5 [6]                   99.11                  All features
C4.5 [6]                   98.69                  Genetic algorithm
C4.5 [6]                   98.84                  Best-first
C4.5 [6]                   99.41                  Correlation feature selection
BayesNet [6]               99.53                  All features
BayesNet [6]               99.52                  Genetic algorithm
BayesNet [6]               98.91                  Best-first
BayesNet [6]               98.92                  Correlation feature selection
ABC-SVM [7]                92.768                 Binary ABC
PSO-SVM [7]                83.88                  Binary ABC
GA-SVM [7]                 80.73                  Binary ABC
KNN [8]                    98.24                  All features
KNN [8]                    98.11                  Fast feature selection
Bayes Classifier [8]       76.09                  All features
Bayes Classifier [8]       71.94                  Fast feature selection
ANN [9]                    81.57                  Feature reduction
SSO-RF [10, 11]            92.7                   SSO
Hybrid SSO [12]            97.67                  SSO
RSDT [13]                  97.88                  Rough set
ID3 [13]                   97.665                 All features
C4.5 [13]                  97.582                 All features
FC-ANN [14]                96.71                  All features
Proposed MABC-EPSO         88.59                  All features
Proposed MABC-EPSO         99.32                  Single feature selection method
Proposed MABC-EPSO         99.82                  Random feature selection method

Figure 3: Accuracy comparison of classifiers for the DoS dataset (accuracy (%) of Naïve Bayes, J4.8, RBF, SVM, ABC, and MABC-EPSO with all features, SFSM, and RFSM).

Table 9: Accuracy rates of classifiers using the SFSM feature selection method and Friedman ranks (in parentheses).

Dataset              NB          J4.8        RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     82.57 (6)   87.11 (4)   87.96 (3)   84.70 (5)   90.82 (2)   99.50 (1)
Probe + 10% normal   82.68 (5)   82.60 (6)   83.72 (4)   85.67 (3)   96.58 (2)   99.27 (1)
R2L + 10% normal     86.15 (4)   82.55 (6)   85.16 (5)   90.61 (3)   92.72 (2)   99.24 (1)
U2R + 10% normal     84.06 (6)   87.16 (3)   85.54 (5)   85.97 (4)   97.31 (2)   99.80 (1)
Average rank         5.25        4.75        4.25        3.75        2           1

Figure 4: Accuracy comparison of classifiers for the Probe dataset.

Figure 5: Accuracy comparison of classifiers for the R2L dataset.

Figure 6: Accuracy comparison of classifiers for the U2R dataset.

In order to test the significance of the differences among the classifiers, the six classification algorithms mentioned above are considered over the four datasets, and experiments are performed using the Friedman test and ANOVA. Tables 9 and 10 depict the classification accuracy using the two feature selection methods and the ranks computed through the Friedman test (ranks are given in parentheses). The null hypothesis states that all the classifiers perform equally well and hence their ranks should be equal. The Friedman test ranks the algorithms for each dataset, with the best performing algorithm getting rank 1, the second best rank 2, and so on. As seen in Table 9, MABC-EPSO is the best performing algorithm, whereas Naïve Bayes is the worst performing algorithm; Table 10 shows that MABC-EPSO is the best performing algorithm, whereas Naïve Bayes and J4.8 are the worst performing algorithms. The Friedman statistics $\chi^2 = 15.716$ and $F_F = 11.005$ for SFSM and $\chi^2 = 15.712$ and $F_F = 10.992$ for RFSM are computed. With four datasets and six classification algorithms, the distribution of $F_F$ follows the $F$ distribution with $6 - 1 = 5$ and $(6 - 1)(4 - 1) = 15$ degrees of freedom. The critical value of $F(5, 15)$ for $\alpha = 0.05$ is 2.9013 and the $P$ value is less than 0.05. So, we reject the null hypothesis, and the differences among the classifiers are significant. The ANOVA test compares the means of several groups by estimating the variances among groups and within groups; here, the null hypothesis is that all group means are equal.
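As a reproducibility aid, the following sketch applies both significance tests to the per-dataset accuracies of Table 9 (the Table 10 columns can be substituted in the same way); SciPy's generic implementations are used here as an assumption, since the paper does not state its tooling.

```python
from scipy import stats

# Per-dataset accuracies (DoS, Probe, R2L, U2R) from Table 9 (SFSM).
nb   = [82.57, 82.68, 86.15, 84.06]
j48  = [87.11, 82.60, 82.55, 87.16]
rbf  = [87.96, 83.72, 85.16, 85.54]
svm  = [84.70, 85.67, 90.61, 85.97]
abc  = [90.82, 96.58, 92.72, 97.31]
mabc = [99.50, 99.27, 99.24, 99.80]

chi2, p_friedman = stats.friedmanchisquare(nb, j48, rbf, svm, abc, mabc)
f_stat, p_anova = stats.f_oneway(nb, j48, rbf, svm, abc, mabc)
print(chi2, p_friedman)   # chi2 should be close to the reported 15.716
print(f_stat, p_anova)
```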

Table 10: Accuracy rates of classifiers using the RFSM feature selection method and Friedman ranks (in parentheses).

Dataset              NB          J4.8        RBF         SVM         ABC         MABC-EPSO
DoS + 10% normal     83.04 (6)   90.05 (4)   88.83 (5)   94.02 (3)   96.43 (2)   99.81 (1)
Probe + 10% normal   84.01 (5)   82.72 (6)   85.94 (4)   95.87 (3)   97.31 (2)   99.86 (1)
R2L + 10% normal     86.32 (4)   83.10 (6)   86.11 (5)   97.04 (3)   98.96 (2)   99.80 (1)
U2R + 10% normal     85.15 (6)   88.42 (5)   88.98 (4)   95.91 (3)   98.96 (2)   99.80 (1)
Average rank         5.25        5.25        4.5         3           2           1

Table 11: ANOVA results for the accuracy rate of classifiers (MS = SS/df; F = MS between / MS within).

Source of variation   SS         df   MS         F
SFSM method:
  Between groups      781.5143   5    156.3029   31.895
  Within groups       88.20985   18   4.9005
  Total               869.7241   23
RFSM method:
  Between groups      879.4307   5    175.8861   48.54728
  Within groups       65.21375   18   3.622986
  Total               944.6444   23
