Application of Feature Selection and Fuzzy ... - Semantic Scholar

Application of Feature Selection and Fuzzy ARTMAP to Intrusion Detection Christina B. Vilakazi and Tshilidzi Marwala Abstract— This paper proposes a novel approach for intrusion detection and diagnosis. The proposed approach uses Sequential Backward Floating Search for feature selection and fuzzy ARTMAP for detection and diagnosis of attacks. The optimal vigilance parameter for the fuzzy ARTMAP is chosen using a genetic algorithm. The reduced set of features decreases the computation time by 0.789s. A classification rate of 100% and 99.89% is obtained for the detection stage and diagnosis stage, respectively.

I. INTRODUCTION An increasing dependence of companies and government agencies on computer networks has increased the importance of protecting these systems from attacks. A single intrusion of a computer network can result in the loss or unauthorized utilization or modification of large amounts of data. Hence, it is important to have a detecting and monitoring system to protect important data. Intrusion Detection System (IDS) uses a lot of features. However, some of the features may be redundant or contribute little if anything to the detection process. The purpose of this study is to identify important input features in building an IDS that is computationally efficient and effective. Experimental results indicate that significant input feature selection is important to design an IDS that is lightweight, efficient and effective for real world detection systems[1]. Punch et al.[2], Pei et al. [3], Huang et al.[4] addressed the problem of feature selection by using a genetic algorithm (GA) to find an optimal or nearly optimal weighting of features for k-Nearest Neighbor classifiers. Huang et al. [4] apply a GA to find an optimal subset of features for a Bayes classifier and a linear regression classifier. Gartner et al. [5] used support vector machines to find optimal feature weights for a Bayes classifier. Stein et al.[6] combined a genetic algorithm with decision tree classifiers to find an optimal subset of features for decision tree classifiers. Mukkamala et al.[7] used data mining paradigms for feature selection and classification based on Bayesian networks and Classification and Regression Tress (CART). To the best of our knowledge, using sequential backward floating search has not been tried for intrusion detection. In this paper, we use a sequential backward floating search to find an optimal subset of features for fuzzy ARTMAP. C. Vilakazi is with the school of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa. (e-mail: [email protected]). T. Marwala is a professor of Electrical Engineering at the school of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa. (phone: +27 11 717 7217; fax: +27 11 403 1929; e-mail: [email protected]).

II. OVERVIEW OF I NTRUSION D ETECTION S YSTEMS An intrusion is an unauthorized access or use of computer system resources. Intrusion detection systems are software that detect, identify and respond to unauthorized or abnormal activities on a target system Intrusion detection techniques can be categorized into misuse detection and anomaly detection. Misuse detection uses patterns of well-known attacks or vulnerable spots in the system to identify intrusions. However, only known attacks that leave characteristic traces can be detected this way. Anomaly detection, on the other hand, attempts to determine whether deviations from the established normal usage patterns can be flagged as intrusions [7]. Although misuse detection can achieve a low false positive rate, minor variations of a known attack occasionally cannot be detected [7]. Anomaly detection can detect novel attacks, yet it sufferers a higher false positive rate. Earlier studies on intrusion detection have utilized rule-based approaches to intrusion detection, but had a difficulty in detection new attacks or attacks that had not previously define patterns[8][9][10]. In the last decade, the emphasis has shifted to learning by example and data mining paradigm. Neural Networks have been extensively used to detect both misuse and anomies pattern [11][12][13][14][15][16][17]. Recently, kernel-based methods such as support vector machine(SVM) and their variants are being used to detect intrusion [14][18][19]. One way in which fuzzy ARTMAP differs from many previously fuzzy pattern recognition algorithm is that it learns each input as it is received online, rather than performing an off-line optimization of a criterion function. III. T HEORETICAL BACKGROUND A. Feature Selection In complex classification domains, some features may be redundant since the information they add is contained in other features. Extra features can increase computation time, and can impact the accuracy of the IDS. Feature selection improves classification by searching for the subset of features, which best classify the training data. In this paper, we propose floating search to select the optimal or near optimal subset of features. Wrapper method which utilizes the prediction ability of some learning machine is used in this work to evaluate the feature subset. The wrapper method is defined as [20] W = Racc (Sw ) (1) Where Sw is a feature subset candidate, and Racc is calculated by 10-fold cross validation method. For training sample S, Racc is defined as:

Racc =

l 1X

l

(yj = y˜j )

(2)

j=1

Where l is the number of training examples, yj is the predicted value by fuzzy ARTMAP. The feature subset corresponding to the largest W is the most significant one. Sequential Backward Floating Search (SBFS) is based on sequential backward search (SBS) and sequential forward search (SFS), where SBS eliminates the least important features one by one, while SFS adds the most important features one by one. The least important feature fx means that for a subset Sw, there exists fx such that Racc (Sw − fx ) > Racc (Sw −fi fi Sw ), while the most important feature fx means that for a subset Sw which satisfies Sw Su = S and Sw Su = 0,there exists fx such that Racc (Sw + fx ) > Racc (Sw + fi fi Sw ), where Sw = fx means fx is removed or added to Sw .Fuzzy ARTMAP is used as the criterion function to find the most important features from S is a function used to find the most important feature for Sw from S. The sequential backward floating search method proceeds as follows [21]; Step 1:Initialization, use SBS method to remove the least important feature and set the value for the current number and the target number of features. Since we want to eliminate the features from maximum to l, we set the target number to l. Step 2: Exclusion, use basic SBS method to remove the feature. Step 3: Conditional inclusion, find the most significant feature, among the excluded features, if it is not the feature just eliminated, the basic SFS is used to add the feature to the current subset. Step 4: Continuation of conditional exclusion, continue to find the most significant feature among the excluded features with respect to the subset obtained in Step 2, if Racc of the subset is not higher than that of any subset with the same number of feature, then go to Step 2; otherwise go to Step 3. Step 5:Display,if the number of features in the current subset is greater than the target number, go to Step 1; otherwise, display the optimal feature subset. B. Fuzzy ARTMAP The fuzzy ARTMAP architecture is an auto-organized learning system [22]. This kind of network has supervised training and pertains to the Adaptive Resonance Theory (ART) family; its structure is based on the adaptive resonance theory and is similar to the fuzzy ART network, which employs calculus based on fuzzy logic. This network is composed of two fuzzy ART modules;ARTa and ARTb , interconnected by an inter-ART using an associative memory module as illustrated on Fig. 1 [22]. A fuzzy ARTMAP has an internal controller that ensures autonomous system operation in real time. The inter-ART module has a self-regulator mechanism named match tracking, whose objective is to maximize the generalization and

minimize the network error. The F2a layer is connected to ab the inter-ART module by the weights wjk . The steps of the fuzzy ARTMAP algorithm are summarized below. 1) Input data: The input patterns of the ARTa is represented by the vector a = [a1 . . . aM a ] and the input patterns of the ARTb is represented by the vector b = [b1 . . . bM b ] 2) Parameters: There are three fundamental parameters for the performance and learning of the fuzzy ART network [23]. • The chosen parameter, (α > 0): acts on the category selection. • Training rate,(β[0, 1]): controls the velocity or the learning rate of the network adaptation.β = 1 permits the system to adapt faster while 0 < β < 1 allows the system to adapt slowly. • Vigilance parameter,(ρ[0, 1]): controls the network resonance. The vigilance parameter is responsible for the number of the categories formed. If the vigilance parameter is very large, it produces good classification, providing the generation of several categories and this means that the network has less errors but has less capacity of generalization. If it is very small, it will generate few categories and the network will have more capacity for generalization, but more possibility of making mistakes. . 3) Algorithm structure: The ARTMAP network performs the processing of two ART networks, ARTa and ARTb . After the resonance is confirmed in each network, J is the active category for the ARTa network and K is the active category for the ARTb network. The next step is to verify using the match tracking, if the active category on ARTa corresponds to the desired output vector presented to ARTb . The vigilance criterion is given by: ab | |y b ∧ wJK = ρab |yb |

(3)

It works by causing a little increment on the vigilance parameter only enough to exclude that category and select another category which will be active and reintroduced on the process until the active category correspond to the desired output. 4) Learning: After the input has completed the resonance state resonance by the vigilance criterion, the weight adaptation is implemented. The adaptation of the ARTa and ARTb module weights is given by: wJnew = β(I ∧ wJold ) + (1 − β)wJold

(4)

The adaptation for the inter-ART module is performed as follows [23]: ab ab wJK = 1 wJK = 0 for k 6= K C. Genetic Algorithm GA solves problems by applying the principles of evolutionary biology, such as crossover, mutation, reproduction and natural selection [24]. The GA search process consists

Fig. 1.

Structure of the fuzzy ARTMAP [22]

of the following steps: 1) Generation of a population (pool) of candidate 2)Evaluate the candidate in the population, candidate with the lowest fitness are discarded and make way for a new set of chromosomes. 3)Replacement sets of chromosomes are created by the genetic operations of crossover and mutation on the fit individuals. 4) Steps 1 to 3 are repeated for a given number of generations until a specified fitness level is attained or a maximum number of generations is exceeded [25]. The genetic algorithms represent input data from the problem by an encoding and use the genetic operations to iteratively evaluate solutions from the population of potential solutions to determine the global optimum [24]. The GA evaluates candidate solutions through a fitness function by maximizing this fitness function. The fitness function contains information from the problem space and is the mechanism by which properties of the problem space is transferred to the GA, which is dependent of the problem. The genetic operations are important since they add an element of randomness to the search process, allowing a wider range of the solution space to be explored. IV. P ROPOSED I NTRUSION D ETECTION S YSTEM The proposed intrusion detection system has four main phases which are data preprocessing and feature selection, selection of an optimal vigilance parameter using a GA, classification of attacks and testing of the IDS. Fig. 2 shows

the block diagram for the intrusion detection system. Data preprocessing is done to make preprocessing to make it easier for the network to learn. Optimal set of feature were selected using sequential backward floating search. Fuzzy ARTMAP requires the selection of the vigilance and the selection of this parameter is important in classification system. A method for automatic selection of an optimal vigilance parameter using a GA is presented. The process of selecting an optimal vigilance parameter is shown in Fig. 3.

Fig. 2.

Block Diagram of the proposed solution

The following fitness function was utilized: E=

CI CT

(5)

Where CI represent example that were correctly classified and CT is the total number of examples. In this paper, a floating point representation is used as it allows the search to be faster [24]. An arithmetic crossover was proven to provide better stability in generating solutions [24]. A non-uniform mutation is used as it aids in the search of an optimal solution by allowing for faster convergence and greater accuracy [24].Sequential Backward Floating Search is used to select the optimal or near optimal set of features. After a good set of feature are selected these features are used to train a fuzzy ARTMAP. The classification stage is divided into stages. The first stage detects if there is an intrusion or not; while the second stage determines the nature of the attack. Fig. 4 shows the block diagram of the classification phase.

Fig. 3.

Procedure for selecting the optimal vigilance

Fig. 4.

The two stage classification

V. E XPERIMENTATION A. Intrusion Detection Database The investigation in this paper is entirely based on the data obtained in the experimentation done by [26]. The data contains 24 attack types that could be used be classify into four main classes, which are probing, denial of service (DoS), user to root(U2R) and root to user (R2U). The data contains 5 symbolic features and 38 numerical features. The symbolic features were converted into binary using ASCII. The data is normalized using linear normalization. For the detection stage, all the attacks are put as one class, this is to determine if there is an intrusion or not. In the diagnosis stage, the attacks are classified into four classes. 1) Probing: Probing is a class of attacks where an attacker scans a network to gather information or find vulnerabilities [27]. An attacker with a map of machines and services that are available on a network can use the information to look for exploits. There are different types of probes some of them are abuse the computer’s legitimate features and social engineering. 2) Denial of service: DoS is a class of attacks where an attacker makes some computing or memory resource too busy or too full to handle legitimate request, thus denying legitimate users access to a machine [27]. There are different ways to launch DoS attacks through abusing the computers’ legitimate features by targeting the implementations bugs and exploiting the system’s misconfiguration. DoS attacks are classified based on the services that an attacker renders unavailable to legitimate users. 3) User to root: User to root exploits are a class of attacks where an attacker starts out with an access to a normal user account on the system and is able to exploit vulnerability to gain root access to the system [27]. Most of the common exploit in this class of attacks are regular buffer overflow, which are caused by regular programming mistakes and environmental assumptions. 4) Remote to User: A remote to user (R2U) attack is a class of attacks where an attacker sends packets to a machine over a network, then exploits machine’s vulnerability to illegally gain local access as a user. There are different types of R2U attacks; the most common attack is done using social engineering [27]. VI. R ESULTS AND D ISCUSSION We first demonstrate how different vigilance parameter affect the classification error. Fig. 5 shows the error rate versus the number of training cycles for various vigilance parameters and the optimal vigilance parameter. As it can be seen from the graph the optimal vigilance converges to zero error after five training cycles. Sequential Backward Floating Search is used to select the optimal set of features. The feature selection process takes 140s seconds on average with a pentium IV with processing speed of 3.2 GHz. Fig. 6 shows criterion function as the number of features are increased. It can be seen that the criterion function increase to one, which is the maximum value when the number of features is twelve. Hence, 12 feature were selected to be the optimal number.

TABLE I C LASSIFICATION RATE FOR 41

FEATURE AND

12 OPTIMAL SET OF

FEATURE

D ETECTION

RATE

TABLE II (DR) AND FALSE ALARM RATE (FR)

FOR ALL

CLASSES OF ATTACKS

Fig. 5. Error rate for various vigilance parameter and the optimal vigilance selected using a genetic algorithm

VII. CONCLUSIONS A procedure for detection and diagnosis of network attacks using Fuzzy ARTMAP has been presented. The optimal subset of features were selected using Sequential Backward Floating Search. The optimal vigilance parameter for the Fuzzy ARTMAP was chosen using GA. The Fuzzy ARTMAP is used to detect and classify the attacks. It is found that reducing the features from 41 to 12 reduces the computation time by 0.789s. The optimal set of features and Fuzzy ARTMAP give a classification rate of 100% for the detection phase and 99.89% for the diagnosis. Future work will look at the implementation of on-line training for realtime intrusion detection system. R EFERENCES

Fig. 6.

The criterion function for different number of features

Table I shows the classification rate for 12 selected features and 41 features. The duration of the training process of the fuzzy ARTMAP with 12 feature is approximately 0.998 seconds with a Pentium IV having a processor speed of 3.2GHz. Using the same machine, the training took 1.7899s for 41 features. It can be seen that the reduced feature give similar accuracy to the 41 features in a shorter time. This is important for real-time intrusion detection. The overall classification rate of the system is 9989% Table II shows the detection rate and the false alarm rate for the diagnosis stage for the four different classes. The DoS and probe attacks gave a detection rate of 100% and false alarm rate of 0% while the U2R and R2U gave detection rate of 99.87% and 99.78%, respectively.

[1] S. Mukkamala and A.H. Sung, Feature Selection for Intrusion Detection using NN and SVM, Journal of the Transport Research Board National Acada,Transport Research Record no 18822, 2003, pp.33-39. [2] Z. Huang, M. Pei, E. Goodman, Y. Huang and G. Li, Genetic algorithm optimized feature transformation: a comparison with different classifiers, In Proc. GECCO 2003, 2003, pp. 2121-2133. [3] M. Pei, E.D. Goodman and W.F. Punch, Feature extraction using genetic algorithms.In Proc. Int’l Symp. On Intelligent Data Engineering and Learning, 1998, pp. 371-384. [4] W.F. Punch, E.D. Goodman , M. Pei, L. Chia-Shun, P. Hovland and R. Enbody, Further research on feature selection and classification using genetic algorithms. In Proc. 5th Int’l Conf. Genetic Algorithms, July 1993, pp. 557. [5] T. Gartner and P.A. Flach, WBCsvm: Weighted Bayesian Classification based on support vector machine,In Proc. 18th International Conference on Machine Learning , 2001, pp. 156-161. [6] C. Sriletha, A. Ajith and P.T. Johnson, Feature deduction and ensemble design of Intrusion Detection System, Journal of Computer and security, vol. 24, 2005, pp.295-307. [7] K. Ilgun, USTAT: a real-time intrusion detection system for UNIX, Proceedings of the 1993 Computer Society Symposium on Research in Security and Privacy, 1993, pp. 16-29. [8] D. Anderson, T.F. Lunt, H. Javitz , A. Tamaru and A. Valdes, Detecting unusual program behavior using the stastistical component of the nextgeneration intrusion detection expert system (NIDES), SRI-CSL-95-06, Menlo Park, CA: SRI International; 1995. [9] A. Porras and P.G. Neumann, EMERALD: Event monitoring enabling responses to anomalous live disturbances, In Proceedings of the National Information Systems Security Conference, 1997, pp. 353-65.

[10] H. Debar, B. Beck and D. Siboni, A neural network component for an Intrusion Detection System, In Proceedings of IEEE Computer Society symposium on Research in Security and Privacy, 1992, pp. 240-250. [11] H. Debar and B. Dorizzi, An application of a recurrent network to an Intrusion Detection System, In Proceedings of IJCNN, 1992, pp.78-83. [12] J. Ryan, M.J. Lin and R. Mikkulalan, Intrusion Detection on NN, Advances in Neural Information Processing System, Cambridge MA, MIT Press, 1997. [13] S. Mitra and S.K. Pal, SOM neural network as a fuzzy classifier, IEEE Transaction of Systems, Man and Cybernetics, vol. 24, no.3 pp. 385399, 1994. [14] S. Mukkamala, A. Jonaski and A.T. Sung, Intrusion Detection using neural network and SVM, Proceedings of the IEEE IJCNN, 2002, pp. 202-207. [15] S. Mukkamala, A.H. Sung and A. Ajith, Intrusion Detection using intelligent paradigm,Journal of Network and Computer Application, vol. 28, 2005, pp. 167-182. [16] C. Zhong, J. Jiong and M. Kanel, Intrusion Detection using hierachical NN, Pattern Recognition Letters, vol. 26,2005, pp. 779-791. [17] E. Marais and T. Marwala, Predicting global Internet instability caused by worms using Neural Networks, In Proc of the Annual Symposium of the Pattern Recognition Association of South Africa, South Africa, 2004, pp. 81-85. [18] W. Hu, Y. Liao, V. Rao and R.R. Venurri, Robust SVM for anomaly detection in Computer Security, In Proceedings of the 2003 International Conf. on MC and Application, LA California, 2003. [19] Z. Zhang and H. Shen, Application of Online training SVM for

[20] [21] [22]

[23] [24] [25] [26]

[27]

real time intrusion detection,Computer Communications,vol.28, 2005, pp.14281442. G. Li, J. Yang, C. Yo and D. Geng, Degree prediction of malignancy in brain glioma using support vector machine, Computers in Biology and Machine, vol. 36, 2006, pp.313-325. B. Somol and P. Pudil, J. Novovicova and P. Paclik, Adaptive floating search methods in feature Selection, Pattern Recognition Letters, vol. 3, 1998, pp. 1157 - 1163. G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynold and D.B. Rosen, Fuzzy ARTMAP: A neural network for incremental supervised learning of analog multidimensional maps, IEEE Transactions on Neural Network, vol.3, no.5, 1992, pp. 689-713. G.A. Carpenter, Default ARTMAP, In Proceedings 2003 IJCNN, 2003, pp. 1396-1401. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Berlin: Springer, third, revised and extended ed., 1999. D. E. Goldberg. Genetic Algorithms in Search, Optimisation and Machine Learning, New York: AddisonWesley Publishing Company, Inc., 1998 D.J. Newman, S. Hettich, C.L. Blake and C.J. Merz, UCI Respository of machine learning databases, http://www.ics.uci.edu/ mlearn/MLRepository.html, University of carlifornia, Irvine, Department of Information and Computer Science, 1998. K. Kendal, A database of Computer Attacks for the Evaluation of Intrusion Detection System, BSc and MSc Thesis in Computer Science and Engineering, MIT, 1999.

Application of Feature Selection and Fuzzy ... - Semantic Scholar

Application of Feature Selection and Fuzzy ... - Semantic Scholar

Suggest Documents

Feature Selection: Evaluation, Application, and ... - Semantic Scholar

Fuzzy feature selection

Deep Feature Selection: Theory and Application to ... - Semantic Scholar

An efficient fuzzy classifier with feature selection ... - Semantic Scholar

Comparison of Feature Selection and ... - Semantic Scholar

Application of Feature Transformation and ... - Semantic Scholar

Multivariate feature selection and hierarchical ... - Semantic Scholar

Simultaneous Feature And Instance Selection ... - Semantic Scholar

Feature selection of stabilometric parameters ... - Semantic Scholar

Selection, Characterization and Application of ... - Semantic Scholar

Selection, Characterization and Application of ... - Semantic Scholar

Selection of Heterogeneous Fuzzy Model ... - Semantic Scholar

Dimension Selection for Feature Selection and ... - Semantic Scholar

Linguistic Hedges Fuzzy Feature Selection for ...

Feature Selection Using Fuzzy Objective Functions - CiteSeerX

SUFFUSE: Simultaneous Fuzzy-Rough Feature- Sample Selection

Feature Selection via Discretization - Semantic Scholar

Gradient-based Laplacian Feature Selection - Semantic Scholar

Unsupervised Feature Selection Using ... - Semantic Scholar

Max-Margin Feature Selection - Semantic Scholar

Discrete feature weighting & selection algorithm - Semantic Scholar

Unsupervised Maximum Margin Feature Selection ... - Semantic Scholar

Unsupervised Feature Selection for Biomarker ... - Semantic Scholar

Ensemble Feature Selection with Dynamic ... - Semantic Scholar