The current issue and full text archive of this journal is available at www.emeraldinsight.com/0968-5227.htm
Using response action with intelligent intrusion detection and prevention system against web application malware Ammar Alazab
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
College of Engineering and Technology, The American University of the Middle East, Egaila, Kuwait
Michael Hobbs Faculty of Science Engineering & Built Environment, Deakin University, Geelong, Australia
Intrusion detection and prevention system 431 Received 4 February 2013 Revised 15 June 2013 23 October 2013 20 November 2013 9 December 2013 Accepted 26 December 2013
Jemal Abawajy School of Information Technology, Deakin University, Geelong, Australia
Ansam Khraisat Faculty of Science Engineering & Built Environment, Deakin University, Geelong, Australia, and
Mamoun Alazab Department of Crime, Policing, Security and Justice, Australian National University, Canberra, Australia Abstract Purpose – The purpose of this paper is to mitigate vulnerabilities in web applications, security detection and prevention are the most important mechanisms for security. However, most existing research focuses on how to prevent an attack at the web application layer, with less work dedicated to setting up a response action if a possible attack happened. Design/methodology/approach – A combination of a Signature-based Intrusion Detection System (SIDS) and an Anomaly-based Intrusion Detection System (AIDS), namely, the Intelligent Intrusion Detection and Prevention System (IIDPS). Findings – After evaluating the new system, a better result was generated in line with detection efficiency and the false alarm rate. This demonstrates the value of direct response action in an intrusion detection system. Research limitations/implications – Data limitation. Originality/value – The contributions of this paper are to first address the problem of web application vulnerabilities. Second, to propose a combination of an SIDS and an AIDS, namely, the IIDPS. Third, this paper presents a novel approach by connecting the IIDPS with a response action using fuzzy logic. Fourth, use the risk assessment to determine an appropriate response action against
This research is supported by the Australian National University and the ANU Cybercrime Observatory (http://cybercrime.anu.edu.au).
Information Management & Computer Security Vol. 22 No. 5, 2014 pp. 431-449 © Emerald Group Publishing Limited 0968-5227 DOI 10.1108/IMCS-02-2013-0007
IMCS 22,5
each attack event. Combining the system provides a better performance for the Intrusion Detection System, and makes the detection and prevention more effective. Keywords Information security, Security, Risk management Paper type Research paper
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
432
1. Introduction Web applications are increasingly important in areas such as the financial sector, e-commerce, e-government, social media networks, medical data, e-business, academic activities, e-banking, e-shopping and e-mail. Furthermore, web applications enable users to interact with back-end data servers to insert, delete and modify data contents by making a website their own space. Unfortunately, these activities have attracted malicious software writers to take advantage of such activities to perform malicious objectives for financial gain (Alazab et al., 2013). Today, web applications use multi-tier web application architecture. Such applications break complex tasks into simple tasks to enable modular functional components, easy maintenance, component reusability, parallel execution and security enhancement. Simply, web applications are designed to serve any client to connect to a database through a user’s web browser that provides the required service to customers (Cole, 2011; Alazab et al., 2011a). The database tier is at the heart of web application architecture. It is a place where the data, such as customer information, financial records, health data, plus any other secure data is usually stored. Thus, protecting these data are increasingly important because failure to do so can result in a loss for organizations. An effective approach in network security is safeguarding a database by observing the different layers involved. Database security means stopping unofficial or unplanned disclosure, modification or destruction of data. Moreover, confidential information within a database must be taken into consideration, as well as its availability. From the web application architecture and the connection between the tiers, if an attacker successfully compromises one of the tiers, the attacker will probably be able to extend the attack and compromise the database tier. For example, an attacker who achieves unauthorized access to financial systems can cause a lot of problems, and currently, there is no single method that can stop him/her (Alazab et al., 2013). The Signature based Intrusion Detection System (SIDS) and the Anomaly-based Intrusion Detection System (AIDS) have disadvantages. SIDS can only detect known attacks. It also suffers from high false-positive alarms, as it is commonly based on regular expressions and string matching. AIDS also has a high rate of false positives. The new arsenal of crime tool kits, such as Zeus and SpyEye (Alazab et al., 2011b), require an IIDPS with capability of a response action. Although researchers have proposed many Intrusion Detection System (IDS), research efforts in IDS and response actions are still not connected to each other (Jaiswal and Jain, 2010). However, most current intrusion response systems (IRSs) use static corresponding matching to decide a suitable response action to deal with any possible attack (Shameli-Sendi et al., 2012). The problem with this approach is its inability to take into account the current state of the entire system. In static mapping systems, the generated response actions represent separated models to make the system face few attacks without taking the particular state of attack and the effect on the system into account.
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
Thus, a new mechanism is needed that links the state of an attack with a suitable response technique. This allows the IRS to select a suitable reply among a number of possible responses, which achieves a high level of security. We designed and implemented an IIDPS framework that combines the SIDS, AIDS and response action. We present a technique to select the best response against an intrusion. Our IIDPS framework is flexible to accommodate different web application architectures. The communication between the SIDS, AIDS and the intrusion response component is based on the web application architecture. The main motivations to develop an (IIDPS) at the web application tier with a response action are: • the malware that targets a database server may not be detected at the network level (Alazab et al., 2013); • the IDS that is designed at the network level is not satisfactory to protect database layers against insider threats (Jaiswal and Jain, 2010); • individually, SIDS and AIDS do not provide enough security (Alazab et al., 2011b); and • it is necessary to mitigate the attacks by providing an appropriate response to an IDS (Shameli-Sendi et al., 2012). This paper is organized as follows: In Section 2, we present the background and related work. In Section 3, we describe our model. Section 4 presents the performance and performance evaluation to this paper. Section 5 provides the conclusion to this paper. 2. Background and related work According to the Open Web Application Security Project, the top ten application security risks are presented in Table I, which provides a brief description of the most widely exploited vulnerabilities, along with the occurrence and the impact of each one. According to the report of Common Vulnerabilities and Exposures, the number of web-related vulnerabilities has increased steadily from 2005 to 2010 (Corporation, 2012). Table II shows the results from another study by the Web Application Security Consortium’s Statistics Project from 2010 (Gordeychik, 2010) that identifies the most popular vulnerabilities in web applications. From this table, we note that all vulnerabilities are exploited in the database tier. These vulnerabilities allow attackers to perform malevolent actions that range from gaining unauthorized account access to achieving an attacker’s objectives. IDSs are hardware or software systems that observe activities on computer systems or computer networks to find abnormal activities (García-Teodoro et al., 2009). There are two approaches to analysing events using IDSs: SIDSs and AIDSs. This section provides an overview of both approaches and shows the advantages and disadvantages of the two ISDs. 2.1 Signature-based intrusion detection system Often called a misuse detection system, SIDS is based on pattern matching techniques to find a known attack. In other words, when a known intrusion matches with a malicious string database, an alert is raised. SIDS commonly provides good detection results for specified, well-known attacks. However, SIDS cannot detect zero-day attacks because
Intrusion detection and prevention system 433
IMCS 22,5
Web application security risks
Occur
Impacts
Cross-site scripting (XSS)
Hacker sends malicious string as query statement to the database[10] Attacker executed malicious code in the user’s browser
Broken authentication and session management Insecure direct object references
When a session ID is visible due to leaks in the authentication process The attacker change the URL parameter
Data loss, data stolen, modified, deleted, corruption or denial of access Redirect users to malicious websites, the result sensitive information Attacker could do anything in the victim privileged accounts
Cross-site request forgery (CSRF)
Attacker sent fake HTTP requests to tricks a victim to submitting them via image tags Attackers try to find unused pages, unprotected files and unprotected directories, etc. to reach for sensitive data Attackers don’t break the cryptographic, but disclose something else to get original text Simply changes the URL to a privileged page The attacker simply monitors network traffic (like an open wireless or their neighborhood cable modem network), and observes an authenticated victim’s session cookie Attacker redirect link to invalidate and tricks victims into clicking it. Victims are more probably to click on it, the result bypass security mechanism
Command injection
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
434
Security misconfiguration
Insecure cryptographic storage
Failure to restrict URL access Insufficient transport layer protection
Invalidated redirects and forwards Table I. The attack methods in web applications
Compromise all the data that can be referenced by the parameter Loss, modify data
The web application could be completely compromised without knowledge. All data could be stolen or modified Lost data such as health records, credentials, personal data, credit cards, etc. The attacker can do anything the victim could do Phishing, the result account stolen
Such redirects may attempt to install malware or trick victims into disclosing passwords or other sensitive information
the signature does not exist in the database until the signature database is updated. The main advantage of a SIDS is that it is very efficient in detecting known attacks without raising false alarms and it can quickly detect an attack. However, SIDS has the disadvantage that a signature must be created for every attack, and zero-day attacks cannot be detected. A SIDS is also prone to false positives, as they are commonly based on regular expressions and string matching. While signatures work well against attacks with a fixed behavioural pattern, they do not work well against the multitude of attack
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
patterns created by a human or self-modifying behavioural characteristic. SIDS has been implemented in many popular tools, such as Snort (Roesch, 1999) and NetSTAT (Vigna and Kemmerer, 1999). 2.2 Anomaly-based intrusion detection system Anomaly detectors detect users’ activities that are not the usual behaviour on a computer network. According to this approach, the assumption is that attacker behaviour deviates from normal user behaviour. AIDS builds the profile of users by using data that are accepted as normal behaviour. It then monitors the activities of new users and compares the new data with the obtained profile and tries to detect deviations (Alazab et al., 2012). Those different from normal behaviors are considered as attacks. The main benefit of an anomaly-based scheme is the power to detect zero-day malware. The main disadvantage of AIDS is the resultant high rate of false positives (mistakes in determining a non-attack). Table III shows comparisons between signature-based detection and anomaly-based detection (Davis and Clark, 2011; García-Teodoro et al., 2009). However, anomaly detection has an advantage over signature-based engines, in that a new attack for which a signature does not exist can be detected if it falls out of the normal traffic patterns. Up until recently, few approaches have implemented IDS to find anomalies in web applications using SIDS and AIDS. In the literature, many solutions for web application problems have been suggested, but none of them guarantees a high level of security on web applications. Table IV provides a summary of the current research in developing an IDS for web applications. The focus of the research has been on the development of anomaly-based detection. Several systems with AIDS capability have been proposed, such as SPADE, ADAM (Sekar et al., 2001) and NIDES (Faysel and Haque, 2010). Kruegel and Vigna (2003)
Intrusion detection and prevention system 435
Web application weaknesses
%
Improper output handling Insufficient anti automation Improper input handling Insufficient authentications Unknown Application misconfiguration Insufficient process validation
22.29 15.29 14.65 13.38 8.92 6.37 6.36
Table II. Web vulnerabilities
Table III. Advantages and disadvantages of SIDS versus AIDS
Detection model
Advantages
Disadvantages
SIDS
Low false positive rate Can quickly detect attack Can detect new attacks Can be used to obtain signature information ùsed by SIDS
Cannot detect new attacks Requires continuous update Requires a large set of training data in order to construct normal behavior profile Fag many false alarms
AIDS
IMCS 22,5
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
436
Table IV. Current research in the area of anomaly-based detection on web application
Name
Comments
Reducing errors in the anomalybased detection of web-based attacks through the combined analysis of web requests and SQL queries (Vigna et al., 2009; Davis and Clark, 2011) Protecting a moving target: addressing web application concept drift (Maggi et al., 2009; Vigna et al., 2009) Anomaly detection of webbased attacks (Kruegel and Vigna, 2003; Robertson et al., 2010)
Reduce FP and FN but did not NO validate with Data Mining Algorithm
YES
NO
Anomaly-based detection of changes in web application
YES
NO
NO
Anomaly by the parameter profiles associated with web applications (length and structure of parameters) from the analyzed data Detection system that accurately detects attacks against web applications
YES
NO
NO
YES
NO
NO
Anomaly by learning the relationships between the application’s execution and the application’s internal
YES
NO
NO
Learning-based anomaly detection system for Web applications
YES
NO
NO
Detecting and preventing attacks against web applications (Robertson, 2010; Kruegel and Vigna, 2003) Swaddler: an approach for the anomaly-based detection of state violations in web applications (Cova et al., 2007; Robertson, 2010) WebIDS: a cooperative bayesian anomaly-based intrusion detection system for web applications (Dagorn, 2008; Cova et al., 2007)
Detection Prevention Response
proposed anomaly-based techniques to study HTTP traffic at the network layer. They applied different statistical measures to message characteristics such as request type, length and the distribution of characters in the web application. Kruegel, Vigna and Robertson proposed to analyse web servers to access logs and build multiple statistical models to characterize normal values of the parameters from web requests (Dagorn, 2008). 2.3 Responses to IDS Protecting web application from malicious attacks is an essential issue. Intrusion prevention and IDSs have therefore been the topic of a lot of research and have been suggested in a number of papers (Shameli-Sendi et al., 2012). Nevertheless, after the functionality of prevention and detection, response action is needed as the primary function against any potential attack. The old style of achieving a response was performed by the security user, who extracted an alarm or warning of danger from the IDS. Then, the security user analysed the logs and any detailed recorded event on the whole system to decide if the attack was presently active and what damage had been done. In older-style response actions, when
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
the IDS detected an attack, it was the responsibility of the security user to prepare an appropriate response. Unfortunately, system administrators are unable to keep up with the pace of the IDS or react to these alerts within a reasonable time limit. Moreover, these manual responses are not flexible and are not very efficient. These manual responses totally depend on the expertise of the system administrator. Automated response systems replace multiple distributed systems to respond to an alert more quickly and accurately. Even though the intrusion response system component is often integrated with intrusion detection, it receives considerably less attention than the intrusion detection research because of the complexity involved developing and deploying an automated response. To address the existing intrusion challenge, the response system should be an essential part of the instruction system. Some researchers have proposed detection and response mechanisms to complement existing prevention mechanisms. In 2011, Elshoush and Osman (2011), declared that the intrusion response had a similar function to an IDS and was part of it by maintaining detection, and alerting and responding to the security operator. A study of the literature has shown there is a weakness in providing an appropriate response to an IDS, as shown in Table IV. Intrusion response systems can be categorized according the triggered response to passive and response attack. Passive response systems do not make an attempt to reduce damage already caused by the attack or prevent further attacks. They monitor and analyze network traffic activity and alert an operator to possible vulnerabilities and attacks. Their main goal is to notify the organization or issue report about attack information. Active response provides the IDS with the ability to automatically respond to an attack upon detection. This minimizes the damage caused by the attacker by controlling user activity, such as restricting user accounts, terminating hosts, restarting services and delaying suspicious system calls (Stakhanova et al., 2007). Intrusion response systems can be also categorized according to the level of automation to notification systems, manual response systems and automatic response systems. Notification systems present details regarding the attack, which are then used to select the right response against attack. The common existing IDSs offer notification response mechanisms. Manual response systems provide a higher level of automation than notification systems. Also they permit the user security to select a suitable response in advance based on the recorded intrusion details. An automatic response system is a mechanism to select a response without human intervention which supplies an instant reply to the intrusion through an automated decision making process. The key improvement of the automation response is decreasing the latency for response action from the time of detection. Intrusion response systems can be categorized according to response selection mechanism, static mapping and dynamic mapping. Static mapping systems are basically automated manual response systems that match an alert to one already responded to. Static mapping response systems use decision tables to directly map a potential attack situation with an appropriate response. Such an approach is context independent, as it fails to take into consideration the unique circumstances under which the attack was triggered. Dynamic response mapping systems are more advanced than static mapping systems, as the response selection is based on the specific attack metrics (confidence, severity of attack, etc).
Intrusion detection and prevention system 437
IMCS 22,5
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
438
Taha et al. (2010), presented analyses of the alerts from one or more IDSs and presented a compact summarized report and high-level view of attempted intrusions which highly improves security effectiveness. Sadoddin and Ghorbani (2009) presented an approach for real-time alert correlation which integrates novel techniques for aggregating alerts into structured patterns and incremental mining of frequent structured patterns. 3. Intelligent intrusion detection and prevention system As shown in Section 2, traditional IDS have restrictions, including low flexibility, inability to distinguish novel attacks, high cost, slow updates and lack of extensibility. It also shows that both SIDS and AIDS have drawbacks. The aim of the new system is to design and develop an effective IIDPS that addresses the weakness of SIDS and AIDS. IIDPS combines SIDS, AIDS and Responses to IDS to become an IIDPS. Figure 1 shows an overview of the proposed IIDPS. In our system, AIDSs help to detect unknown attacks, while SIDS detects known attacks. The basic idea of the new system is to take benefits from both SIDS and AIDS to create effective IDS. The IIDPS has three stages: the SIDS stage, the AIDS stage and the response action stage. 3.1 The SIDS stage The SIDS stage simply uses pattern matching to handle the received request from clients. Whether or not this request is legitimate or illegitimate, SIDS will detect a known malicious string attack that has been stored in the database. If the received request has the same pattern as found on the malicious string database, it means the request will be identified on the system as a true attack. As a consequence, stage three is for taking the appropriate response action. Otherwise, if the received request is not found in the malicious string, the AIDS stage handles the request.
Figure 1. Overview of our IIDPS
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
Throughout the signature-matching stage, SIDS examines the contents of a user request, checking for the occurrence of well-known intrusion. In SIDS, each signature is specified by a rule that identified a known attack. If user’s request is exit of database signature, followed by the action identified by the rule related with that signature is taken. This action is rise an alarm and some of response action against an attack such as no action, Alarm, hold, Abort and disconnect. It is very important for the IDS to be effective, that is, it should detect a large amount of attack, while still maintaining the false positive. Thus, the good IDS should not generate too many false positives. Reduction from false positives can be achieved when all malware signatures are stored in the database as shown in Figure 2. The false positive rate is calculated as the following formula: False Postive Rate ⫽
Intrusion detection and prevention system 439
total number of false positive instance total number of benign instances
Also increase true positive, number of detection malware by increase the number of malware that stored in the database, as shown in Figure 3. However, specific signature is hardly to generate any false positives, as shown in Figure 4. As the specific signature is always look for a specific signature. But specific signature has a weakness to detect new and unknown malware. Generic signature is more likely for false positive and more opportunity to detect unknown malware. The next subsection describes in details how the AIDS stage can generate automated signatures of attack.
Figure 2. The relation between FP and Malware signature that stored in the DB
Figure 3. The relation between TP and detection of unknown Malware
IMCS 22,5
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
440
3.2 The AIDS stage The second stage is the AIDS stage. The main idea of this stage is to overcome the shortfalls of the SIDS stage. The main assumption is that any request received from users is an anomaly request, unless proven otherwise. In this second stage, the system gathers information from the user request and, if any suspicious event is detected, the system will store it in the database. Next, it decides if it is a normal or abnormal behaviour. The result from this stage means the system will take precautions against the new attack. The profile information collected from the users’ activities by using the learning mode enables identification of the appropriate response to any attack, as shown at the bottom of Figure 1.
3.3 Response action stage The final element of our approach is taking appropriate response actions against an anomaly if it is found. A response action is a set of instructions that is carried out, especially a certain attack. Response actions are triggered by the IRS, in response to an attack detected by IDS. Once SIDS and AIDS detect an attack, then this stage reacts with attacks. The Response Action has two stages: risk assessment and response reaction. 3.3.1 Risk assessment. The main purpose for risk assessment is to estimate the risk level of an attack. We have adopted the DREAD from Microsoft model to help calculate
Figure 4. The relation between FP and detection of unknown Malware according the type of signature
Figure 5. Matrix risk
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
risk and rate the threats (Corporation, 2003). By using the DREAD model, it is possible to arrive at the risk rating for a given threat by asking the following questions: • Damage potential: How great is the damage if the vulnerability is exploited? • Reproducibility: How easy is it to reproduce the attack? • Exploitability: How easy is it to launch an attack? • Affected users: As a rough percentage, how many users are affected? • Discoverability: How easy is it to find the vulnerability?
Intrusion detection and prevention system 441
We applied risk assessment to identify the risk level for an application attack. In Figure 2, the risk matrix is used in the risk assessment process. These matrices provide a qualitative risk ranking that classifies the degree from very high to very low, as shown in Table V. The probability of risk ranges from zero to one. The threat impact can be classified in five states as shown in the following sets: Probability of Risk ⫽ {Very Low, Low, Medium, High, Very High} and Impact of Risk ⫽ {Very Low, Low, Medium, High, Very High}; We calculate each query risk and evaluate the probability of the risk occurring against the security impact using the following equations: Risk ⫽ Probability ⫻ Impact and Expected Value ⫽ [Risk Probability Value] ⫻ [Risk Impact Value]. On the vertical axis, there is a probability of the risk occurring, thus a higher chance of that risk occurring and becoming an issue. The horizontal axis shows the level of impact in the assumption that the risk will occur. As shown in Figure 6, the value outputs near to zero point to normal features, while outputs near to one indicate anomalous ones. One of the advantages of this approach is being able to show risks and identify how risky they are on the database. If all the risks are clustered in the top right of the diagram, then evidently the database is very risky. In other words, it may be exploited by a malicious writer. Probability
Description
Very high High Medium Low Very low
Expected to occur with absolutely certainly Expected to occur probably to occur Very unlikely to occur Almost no possibility of occurring
Table V. Standard terms for severity quantification
Figure 6. Rule action
IMCS 22,5
442
3.3.2 Response reaction. In this stage, we will identify the best response against a database threat according to the level of severity. Once the risk is estimated from the previous stage, our approach can determine an appropriate action response. The reaction of our approach is responsible for providing a corresponding response action when an anomaly activity is detected. Oncetheusers’trytosendamaliciousstring,anactionisexecutedandtheresponseactionwill handle this request according to a response policy. There are seven principle methods to handle risk. Table VI shows each response action according to the severity request. Once the risk has been calculated, an appropriate action will be executed according to the severity level.
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
4. Experiment and performance evaluation 4.1 Using Support Vector Machine To study a predictive model, with the ability to identify legitimate versus illegitimate connections within a computer network, the task for the classifier learning contest was organised in conjunction with the KDD ’99 dataset. The KDD Cup (1999) database contained a wide variety of intrusions simulated in a military network environment (Tavallaee et al., 2009). Some intrusion experts suggest that most novel attacks are variants of known attacks, and the “signature” of known attacks can be sufficient to catch novel variants. The datasets contained 24 training attack types, with additional 14 types in the test data only. The KDD Cup (1999) contained 4,898,431 records in the training set (Tavallaee et al., 2009). For the training set, only 19.85 per cent (972,781 records) were normal traffic and the remaining were attack traffic. Each record in the KDD Cup (1999) dataset contained 41 various quantitative and qualitative features (Tavallaee et al., 2009). The data used in classification are NSL-KDD, which is a new dataset for the evaluation of researches in network IDS. NSL-KDD consists of selected records of the complete KDD’99 dataset. Each NSL-KDD connection record contains 41 features as shown in Table VII. We conducted our study using Support Vector Machine (SVM) as classification. SVM categorises the data into two classes. Given a training set of dataset, labelled pairs {(x, y)}, where y is the label of instance x, SVM works by maximizing the margin to obtain the best performance in classification. SVM is based on the idea of a hyper plane classifier, where it first maps the input vector into a higher-dimensional feature space and then obtains the optimal separating hyper-plane (Horng et al., 2011). The goal of SVM is to find a decision boundary. The boundary function of SVM is described by support vectors, which are the data points located closest to the class boundary. In the SVM, there are kernels, and any of those can be chosen to achieve the boundary function. Their detailed usages and descriptions, including parameters definitions are as follows: Severity level
Severity level
Description
Low severity
No action Alarm Audit Hold Disconnect Refuse
The system will process as normal Sent notification to DBA The request is audit The user requests is aborted The user session is disconnect The user request is refused
Medium severity Table VI. Response actions
High severity
Feature name
Description
Type
duration protocol_type
Length of the connection (Sec.) Type of the protocol, e.g. TCP, UDP, etc. Network service on the destination, e.g., http, telnet, etc. Number of data bytes from source to destination Number of data bytes from destination to source Normal or error status of the connection 1 if connection is from/to the same host/ port; 0 otherwise Number of “wrong” fragments Number of urgent packets Number of “hot” indicators Number of failed login attempts 1 if successfully logged in; 0 otherwise Number of “compromised” conditions 1 if root shell is obtained; 0 otherwise 1 if “su root” command attempted; 0 otherwise Number of “root” accesses Number of file creation operations Number of shell prompts Number of operations on access control files number of outbound commands in an ftp session 1 if the login belongs to the “hot” list; 0 otherwise 1 if the login is a “guest”login; 0 otherwise Number of connections to the same host as the current connection in the past two seconds % of connections that have “SYN” errors % of connections that have “REJ” errors % of connections to the same service % of connections to different services Number of connections to the same service as the current connection in the past two seconds % of connections that have “SYN” errors % of connections that have “REJ” errors % of connections to different hosts
Continuous Discrete
service src_bytes dst_bytes
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
flag land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_hot_login is_guest_login count
serror_rate rerror_rate same_srv_rate diff_srv_rate srv_count
srv_serror_rate srv_rerror_rate srv_diff_host_rate
Intrusion detection and prevention system
Discrete Continuous
443
Continuous Discrete Discrete Continuous Continuous Continuous Continuous Discrete Continuous Discrete Discrete Continuous Continuous Continuous Continuous Continuous Discrete Discrete Continuous
Continuous Continuous Continuous Continuous Continuous
Continuous Continuous Continuous
Table VII. KDD’99 dataset features
IMCS 22,5
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
444
• • • •
Linear kernel: K(xi,xj ) ⫽ xi,xj Polynomial kernel: K(xt,xj ) ⫽ ( ␥ xtTxj ⫹ r d )2 RBF kernel: K(xi,xj ) ⫽ exp ( ⫺ ␥储xi ⫺ xj 储2 ) Sigmoid kernel: K(xi,xj ) ⫽ tanh ( ␥ xiTxj ⫹ r)
Sequential Minimal Optimization (SMO) is a supervised learning algorithm used for classification and regression, and it is a fast implementation of SVM. SMO has been selected to classify normal request and anomaly request because it is competitive with other SVM training methods such as Projected Conjugate Gradient “chunking”, and in addition it is easier to implement in WEKA. In this study, all the training and test data were applied. All data in the training data or test data consisted of 41 features. Each feature value was normalized to the range of [0, 1], by dividing their maximum value, and removing the non-important features for each SVM classifier, as listed in Table VIII. We evaluate the effectiveness and efficiency of our approach to answer what percentage our approach can detect. Validation of the models was achieved by using cross-validation. Cross-validation is a technique used for evaluating the results of statistical analysis by generating an independent dataset for normal and anomaly data. The most common types of cross-validation were repeated using random sub-sampling validation and K-fold cross-validation. For this research study, K-fold cross-validation has been selected for validation, as it is commonly adopted for many classifiers. In the K-fold cross-validation, the data are first partitioned into K-sized segments or folds. Then, K iterations of training and validation are performed; within each iteration, a different fold of the data is held out for validation, while the remaining K-1 folds are used for learning. The advantage of K-fold cross-validation is that all the examples in the dataset are eventually used for both training and testing. In addition to this, all observations are used for both training and validation, and each observation is used for validation once only. We evaluated various algorithms based on the following standard performance measures: • True Positive (TP): Number of correctly identified malicious code. • False Positive (FP): Number of wrongly identified benign code, when a detector identifies benign file as a malware. • True Negative (TN): Number of correctly identified benign code. • False Negative (FN): Number of wrongly identified malicious code, when a detector fails to detect the malware because the virus is new and no signature is yet available. Overall accuracy: Percentage of correctly identified code, given by:
Kernel functions Table VIII. Result
SVM – Linear kernel: K (xi,xj ) ⫽ xi.xj SVM – Polynomial kernel: K(xi,xj ) ⫽ (yxiTxj ⫹ r d )2 SVM – RBF kernel: K(xi,xj ) ⫽ exp ( ⫺ y㛳xi ⫺ xj㛳2 )
TP rate
FP rate
0.972 0.973 0.974
0.029 0.029 0.029
Overall Accuracy ⫽
TP ⫹ TN TP ⫹ TN ⫹ FP ⫹ FN
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
After applying the four SVM kernels functions, if the received request from the SIDS stage has normal behaviour as the KDD data behaviors, the system will make the decision that no attack is present. If the received request has an abnormal behaviour, then the third stage will decided the appropriate response against the attack type.
Intrusion detection and prevention system 445
4.2 Risk level In fuzzy logic, notions and membership functions define the dataset. Membership functions define the truth-value of such linguistic expressions. The degree of membership of an object in a fuzzy set is defined as a function where the universe of discourse (set of values that the object can take) is the domain, and the interval [0, 1] is the range (Wu and Mendel, 2011). Based on set theory, fuzzy logic provides a powerful way to define the risk level. Using the FIS editor of a Matlab fuzzy toolbox, we can define input and output names through our implementation, as shown in Figure 7. Then, using the member function editor, we present membership function of each input parameter corresponding to its range as shown in Figure 8. Figure 6 shows the rule viewer. Using five categories form DREAD, the total number of rules is 55 ⫽ 3,125 rules; some of them are shown as follows (Figure 9):
Figure 7. Risk level
Figure 8. DREAD membership function
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
IMCS 22,5
446
Figure 9. Rule viewer
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
• If (Damage potential is Low) AND (Reproducibility is Low) AND (Exploitability is Low) AND (Affected users is Low) AND (Discoverability is Low) THEN Risk level is Low; • If (Damage potential is High) AND (Reproducibility is High) AND (Exploitability is High) AND (Affected users is High ) AND(Discoverability is High) THEN Risk level is High; and • If (Damage potential is very High) AND (Reproducibility is Very High) AND (Exploitability is Very High) AND (Affected users is High) AND (Discoverability is Very High) THEN Risk level is Very High.
Intrusion detection and prevention system 447
4.3 Response action Once we finish the risk level, the next step is to determine the response action, as shown in Table VI. The selection method for an alert can be achieved by a set of the following “if-then” statements: • IF expected value ⬍ 0.03 Then Execute Response action; • Low severity ⫽ {No Action, Alarm}; • IF expected Value ⬍ ⫽ 0.25 and Expected Value ⱖ 0.03 Then; • Execute response action medium severity ⫽ {Audit, Hold}; • IF expected value ⱖ ⫽ 0.72 Then Execute Response action; and • High severity ⫽ {Abort, Disconnect, Refuse}. For example, if the DREAD rating for SQL injection is 0.9 (Table IX), according to the aforementioned statements, it executes a response action of high severity. In addition, it allows adjusting attack metrics to make it more effective and more flexible against a specific type of attack. For instance, attack alerts with low severity level can be disregarded; attack alerts with medium severity level can be hold or audit; attack alerts with high severity level can be abort, disconnect or refuse. 5. Conclusion Recently, prevention and detection IDSs have been an active research field in both industry and academia. While many detection and prevention solutions have been implemented, none of them guarantee a high level of security for web applications. The effectiveness of an IDS requests new demands to achieve a high degree of security. A detection system with a response system would be highly reliable against suspicious behaviour that could possibly be a zero day attack. In this paper, we developed IIDPS with a response action that provides an early-stage detection system. This IIDPS is capable of preventing and distinguishing various types of abnormal activity. Signature-based detection techniques were used to recognize known attacks, while anomaly-based detection techniques were used to recognize unidentified attacks. A risk assessment was completed to respond to the attack, with several response techniques used to minimize the damage caused by malicious activities. Threat
D
R
E
A
D
Total
SQL commands injected into application
0
0
0.3
0.4
0.2
0.9
Rating High
Table IX. DREAD rating of SQL injection
IMCS 22,5
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
448
References Alazab, A., Abawajy, J. and Hobbs, M. (2013), “Web malware that target web application”, Social Network Engineering for Secure Web Data and Services, IGI Global, Hershey, PA, pp. 248-264. Alazab, A., Hobbs, M., Abawajy, J. and Alazab, M. (2012), “Using feature selection for intrusion detection system”, 2012 International Symposium on Communications and Information Technologies (ISCIT), Gold Coast, pp. 296-301. Alazab, M., Ventatraman, S., Watters, P., Alazab, M. and Alazab, A. (2011a), “Cybercrime: the case of obuscated malware”, in 7th International Conference on Global Security, Safety & Sustainability, Thessaloniki. Alazab, M., Venkatraman, S., Watters, P. and Alazab, M. (2011b), “Zero-day malware detection based on supervised learning algorithms of API call signatures”, Australasian Data Mining Conference (AusDM 11), Ballarat, pp. 171-182. Corporation, M. (2003), “Improving web application security: threats and countermeasures”, available at: http://msdn.microsoft.com/en-us/library/ff648644.aspx Corporation, M. (2012), “Common vulnerabilities and exposures”, available at: http://cve. mitre.org/ Cole, E. (2011), Network Security Bible, Wiley, Vol. 768. Cova, M., Balzarotti, D., Felmetsger, V. and Vigna, G. (2007), “Swaddler: an approach for the anomaly-based detection of state violations in web applications”, Proceedings of the 10th International Symposium on Recent Advances in Intrusion Detection (RAID), Queensland, 5-7 September, pp. 63-86. Dagorn, N. (2008), “WebIDS: a cooperative bayesian anomaly-based intrusion detection system for web applications (Extended Abstract)”, Recent Advances in Intrusion Detection, Springer Berlin, Heidelberg pp. 392-393. Davis, J.J. and Clark, A.J. (2011), “Data preprocessing for anomaly based network intrusion detection: a review”, Computers & Security, Vol. 30 Nos 6/7, pp. 353-375. Elshoush, H.T. and Osman, I.M. (2011), “Alert correlation in collaborative intelligent intrusion detection systems – a survey”, Applied Soft Computing, Vol. 11 No. 7, pp. 4349-4365. Faysel, M.A. and Haque, S.S. (2010), “Towards cyber defense: research in intrusion detection and intrusion prevention systems”, International Journal of Computer Science and Network Security, Vol. 10 No. 7, pp. 316-325. García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G. and Vázquez, E. (2009), “Anomaly-based network intrusion detection: techniques, systems and challenges”, Computers & Security, Vol. 28 Nos 1/2, pp. 18-28. Gordeychik, S. (2010), “Web application security statistics”, available at: www.Webappsec.org/ projects/statistics/ Horng, S.J., Su, M.Y., Chen, Y.H., Kao, T.W., Chen, R.J., Lai, J.L. and Perkasa, C.D. (2011), “A novel intrusion detection system based on hierarchical clustering and support vector machines”, Expert Systems with Applications, Vol. 38 No. 1, pp. 306-313. Jaiswal, A. and Jain, S. (2010), “Database intrusion prevention cum detection system with appropriate response”, International Journal of Information Technology, Vol. 2 No. 2, pp. 651-656. Kruegel, C. and Vigna, G. (2008), “Anomaly detection of web-based attacks”, CCS’03 Proceedings of the 10th ACM Conference on Computer and Communications Security, Washington, DC, 27-31 October, pp. 251-261.
Downloaded by Deakin University At 13:57 21 November 2014 (PT)
Maggi, F., Robertson, W., Kruegel, C. and Vigna, G. (2009), “Protecting a moving target: addressing web application concept drift”, Recent Advances in Intrusion Detection, Springer Berlin, Heidelberg, pp. 21-40. Robertson, W.K. (2010), “Detecting and preventing attacks against web applications”, PhD Dissertation, University of California at Santa Barbara, Santa Barbara, CA. Robertson, W., Maggi, F. and Vigna, C.K.G. (2010), “Effective anomaly detection with scarce training data”, Proceedings of the Network and Distributed System Security Symposium (NDSS). Roesch, M. (1999), “Snort-lightweight intrusion detection for networks”, Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, 7-12 November, pp. 229-238. Sadoddin, R. and Ghorbani, A.A. (2009), “An incremental frequent structure mining framework for real-time alert correlation”, Computers & Security, Vol. 28 Nos 3/4, pp. 153-173. Sekar, R., Bendre, M., Dhurjati, D. and Bollineni, P. (2001), “A fast automaton-based method for detecting anomalous program behaviors”, Proceedings of the 2001 IEEE Symposium on Security and Privacy, Oakland, CA, pp. 144-155. Shameli-Sendi, A., Ezzati-Jivan, N., Jabbarifar, M. and Dagenais, M. (2012), “Intrusion response systems: survey and taxonomy”, International Journal of Computer Science Network Security, Vol. 12 No. 1, pp. 1-14. SPADE, “Silicon defense”, available at: www.silicondefense.com/software/spice/ Stakhanova, N., Basu, S. and Wong, J. (2007), “A taxonomy of intrusion response systems”, International Journal of Information and Computer Security, Vol. 1 Nos 1/2, pp. 169-184. Taha, A.E., Ghaffar, I.A., Bahaa Eldin, A.M. and Mahdi, H.M. (2010), “Agent based correlation model for intrusion detection alerts”, 2010 IEEE International Conference on Intelligence and Security Informatics (ISI), Vancouver, pp. 89-94. Tavallaee, M., Bagheri, E., Lu, W. and Ghorbani, A.A. (2009), “A detailed analysis of the KDD CUP 99 data set”, Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications 2009. Vigna, G. and Kemmerer, R.A. (1999), “NetSTAT: a network-based intrusion detection system”, Journal of Computer Security, Vol. 7 No. 1, pp. 37-72. Vigna, G., Valeur, F., Balzarotti, D., Robertson, W., Kruegel, C. and Kirda, E. (2009), “Reducing errors in the anomaly-based detection of web-based attacks through the combined analysis of web requests and SQL queries”, Journal of Computer Security, Vol. 17 No. 3, pp. 305-329. Wu, D. and Mendel, J.M. (2011), “Linguistic summarization using IF–THEN rules and interval type-2 fuzzy sets”, IEEE Transactions on Fuzzy Systems, Vol. 19 No. 1, pp. 136-151. Further reading Alazab, A., Alazab, M., Abawajy, J. and Hobbs, M. (2011), “Web application protection against SQL injection attack”, Proceedings of the 7th International Conference on Information Technology and Applications, Hershey, PA, pp. 1-7. Corresponding author Ammar Alazab can be contacted at:
[email protected]
To purchase reprints of this article please e-mail:
[email protected] Or visit our web site for further details: www.emeraldinsight.com/reprints
Intrusion detection and prevention system 449