Anomaly-based Intrusion Detection from Traffic Datamining on Internet Connections*
Min Qin and Kai Hwang
Internet and Grid Computing Laboratory, EEB 231
University of Southern California, Los Angeles, CA 90089-2562
Abstract: In this paper, we present a new datamining approach to generating frequent episode rules for the construction of anomaly-based, intrusion detection systems (IDS). These rules are derived from normal network traffic profiles. An anomaly is detected when the rule deviates significantly from the normal patterns. Three rule pruning techniques are devised to reduce the rule search space by 50-80%. This reduction makes datamining viable in detecting unknown network attacks. The new approach accelerates the entire process of machine learning and profile matching for intrusion detection. Testing our new scheme over DARPA 1999 IDS evaluation data sets, we find a 13% reduction in false alarms over 50 network attack incidents. The network episode rules reveal inter-relationship among sequences of network connection events. We detect unknown attacks embedded in telnet, http, ftp, smtp, and other requests of TCP, UDP or ICMP connections. Our IDS leads to an intrusion detection rate up to 47% for DoS (denial of service), R2L (remote-to-local), and probe attacks. Our scheme detects many attacks that cannot be detected by Snort, including the smurf, Apache2, Guesstelnet, Dict, Neptune, and Udpstorm. We recommend the use of the proposed anomaly detection scheme jointly with signature-based IDS to yield even better results. These results prove the viability of using the new scheme to build automated intrusion detection and response systems in real time.
Index Terms: Network security, intrusion detection, traffic datamining, anomaly detection, association rules, frequent episode rules, false alarm rate, Snort evaluation, and distributed Grid computing.
Manuscript submitted to IEEE Transactions on Dependable and Secure Computing, Dec. 19, 2003. This paper was extended significantly from a preliminary version, "Effectively Generating Frequent Episode Rules for Anomaly-based Intrusion Detection", submitted to the 2004 IEEE Symposium on Security and Privacy for consideration for presentation. The research support of this work from NSF/ITR Grant ACI-0325409 is acknowledged. All rights are reserved by the coauthors. Min Qin can be reached via [email protected] and Kai Hwang by [email protected].
Qin and Hwang, USC, Dec.17, 2003
1. Introduction
Cyber crimes have become major threats to Internet computing and to web and Grid services. Network security cannot be assured if unwanted intrusions are not stopped or removed in a timely manner. In August 2003, the outbreak of the MS Blast worm left millions of machines defenseless and interrupted their Internet services [34]. An effective intrusion detection system (IDS) should be able to detect such attacks at an early stage. The purpose is to raise alarms in time to prevent major damage to network or client resources [37].

Extensive research has been reported on the design and evaluation of IDSs. The NSS Group [29] in the UK has evaluated various commercial IDSs from security companies. Gaffney et al. [12] proposed a decision-theoretic approach to evaluating IDSs. A method for reducing the false alarm rate of IDSs was introduced by Axelsson [3], who identified the base-rate fallacy and implementation barriers. Integrating access control with intrusion detection was introduced by Ryutov et al. [36]. Other recent studies on IDSs can be found in Burroughs et al. [8], Gopalakrishna [13], Ranum [33], and Sekar et al. [38].

According to the detection methods used, IDSs are generally classified into two major categories: signature-based and anomaly-based. A signature-based IDS applies a misuse-detection model, in which attacks are checked against saved signatures (characteristics) of known attacks that were previously detected. Snort [35] and STAT [18] are good examples of this kind of IDS. As in most anti-virus packages, the misuse model is based on pattern matching, which is only good at detecting known attacks whose signatures have been collected. Anomaly-based IDSs are based on a normal-use detection model. Good examples are IDES [22] and EMERALD [31]. The normal-use model compares incoming traffic with the characteristics of normal network behavior to reveal any significant deviations.
The advantage of anomaly detection lies in its ability to cope with unknown attack patterns. Its major drawback is a higher false alarm rate than that of signature matching [2]. Most existing anomaly-based IDSs concentrate on
detecting traffic anomalies. Other techniques, such as detecting anomalies in packet headers, were reported by Mahoney [24]. Datamining [11] is a commonly explored technique for building anomaly-based IDSs [28]. In order to distinguish between intrusive and normal behavior, algorithms are needed to generate association rules [1] or frequent episode rules (FER) [25] from audit traffic data. The concept of generating FERs using minimal occurrences started with the work of Mannila and Toivonen [26]. Associations are often used to capture intra-record patterns, while FERs are used to detect inter-record patterns. With a huge amount of audit records, datamining often generates many long FERs with a high degree of redundancy or repetition. In this paper, we aim to remove those redundant or ineffective episode rules.

Statistically generated sequential rules for detecting anomalies were introduced in [39]. Instead of using datamining, a time-based inductive learning machine was adapted to the changes in normal user behavior. An anomaly is detected when a sequence of events deviates significantly from the normal sequential rules. Hofmeyr et al. [15] use a similar approach, analyzing sequences of system calls to detect intrusions. In [19], Lane et al. transformed discrete temporal sequences into a metric space and used a clustering technique to reduce the size of the user model.

With datamining, there are several approaches to effective IDS construction [6, 20]. In an earlier work, Lee et al. [20] use axis and reference attributes to constrain the number of rules generated. Their method can reduce the number of rules to some extent. The JAM (Java Agent for Meta-learning) project [20] uses datamining to generate rules that provide temporal features. JAM uses RIPPER [9] to build classifiers that can detect signatures of attacks. This is essentially a misuse IDS.
Fan et al. [10] extended Lee’s work by introducing artificial anomalies to discover accurate boundaries between known classes and anomalies. Bridge et al. [7] apply fuzzy frequent episodes and fuzzy association rules to the problem of intrusion detection. The ADAM project [4, 5] offered a datamining framework for detecting network intrusions. Unlike JAM, ADAM is an anomaly-based detection system. ADAM uses a
sliding window to scan frequent associations in TCP connection data. These associations are compared with normal profiles that have been constructed. ADAM has the ability to detect novel attacks through a pseudo-Bayes estimator with a low false alarm rate. In this paper, we further reduce the applicable FER rule space. Our method differs from both Lee’s scheme and ADAM by using an FER-matching methodology.

The rest of the paper is organized as follows. In Section 2, we introduce basic techniques for mining audit data. Axis and reference attributes are revisited. Section 3 presents an anomaly-based IDS architecture using datamining capabilities. We introduce a base-support algorithm for generating useful FERs to detect intrusions. Our new FER generation algorithm compares favorably with the level-wise algorithm developed by Lee’s group [20]. Advantages of our mining algorithm are justified there. In Section 4, three pruning techniques are introduced to reduce the FER search space. These pruning laws are illustrated with concrete traffic connection events. In Section 5, we propose a new algorithm for pruning ineffective episode rules by applying the reduction laws systematically. We also outline the anomaly generation processes there. In Section 6, the experimental results are reported in terms of intrusion detection rate and false alarm rate. Finally, we summarize the contributions and make a few suggestions for further research.
2. Mining of Audit Profiles in Network Traffic
In order to build an effective network IDS, we use datamining to find the patterns of both normal and intrusive behavior in system audit data. We adopted the idea of axis and reference attributes introduced by Lee et al. [20], since it includes domain-specific knowledge and is able to describe relationships among traffic records. The datamining results are described by either association rules or frequent episode rules. An association rule captures interesting relationships among the attributes inside a single connection record. An FER describes the relationships among multiple connection records.
2.1 Association Rules vs. Frequent Episode Rules
Let $T$ be a set of traffic connection records and $A$ be a set of attributes defined over the connections. For example, the set $A$ can be chosen as {timestamp, duration, service, srchost, desthost} for TCP connections. Let $I$ be a set of values defined on $A$, such as $I$ = {timestamp = 10, duration = 1, service = http, srchost = 128.125.1.1, desthost = 128.125.1.10}. Any subset of $I$ is called an itemset, representing the characteristics of connection events. Let $X$ be a traffic itemset (event) under evaluation. The support value for $X$, denoted Support($X$), is defined as the percentage of connection records in $T$ that satisfy $X$. For example, $X$ = {timestamp = 10, duration = 1} is an itemset and $Y$ = {service = http} is another itemset. In this example, $X \cap Y = \emptyset$. The union of the two itemsets, $X \cup Y$ = {timestamp = 10, duration = 1, service = http}, represents the characteristics of the three traffic attributes as listed.

Association Rules: An association rule is defined between two traffic itemsets, $X$ and $Y$. These two itemsets are disjoint, with $X \cap Y = \emptyset$. The rule is denoted by:

$$ X \rightarrow Y, \quad (c, s) \tag{1.a} $$
The association rule is characterized by a support value $s$ and a confidence level $c$. These are probabilities of the corresponding traffic events, defined by:

$$ s = \mathrm{Support}(X \cup Y) \quad \text{and} \quad c = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \tag{1.b} $$
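The two quantities in (1.b) can be computed by direct counting. The following Python sketch is only an illustration under stated assumptions: the record layout (a dictionary per connection), the function names `support` and `association_rule`, and the toy data are all hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]   # one connection record: attribute -> value
Itemset = Dict[str, object]  # attribute -> required value


def support(records: List[Record], itemset: Itemset) -> float:
    """Fraction of records satisfying every attribute=value pair of the itemset."""
    if not records:
        return 0.0
    hits = sum(all(r.get(a) == v for a, v in itemset.items()) for r in records)
    return hits / len(records)


def association_rule(records: List[Record], x: Itemset, y: Itemset) -> Tuple[float, float]:
    """Return (c, s) for the rule X -> Y as defined in Eq. (1.b)."""
    # Assumption: disjointness is checked on attribute names.
    if set(x) & set(y):
        raise ValueError("X and Y must be disjoint")
    s = support(records, {**x, **y})   # Support(X U Y)
    sx = support(records, x)           # Support(X)
    c = s / sx if sx > 0 else 0.0
    return c, s


# Hypothetical toy data: 10 connections, 5 are http, 4 of those have duration 1.
records = (
    [{"service": "http", "duration": 1}] * 4
    + [{"service": "http", "duration": 7}]
    + [{"service": "smtp", "duration": 1}] * 5
)
c, s = association_rule(records, {"service": "http"}, {"duration": 1})
print(c, s)
```

On this toy data the rule (service = http) → (duration = 1) has support 4/10 and confidence (4/10)/(5/10).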
Both $s$ and $c$ are fractional numbers calculated directly from the Support function on the itemset $X$ and on the joint itemset $X \cup Y$, as exemplified above.

Frequent Episode Rules: In general, an FER is expressed as:

$$ L_1, L_2, \ldots, L_n \rightarrow R_1, \ldots, R_m, \quad (c, s, \mathit{window}) \tag{2.a} $$

where $L_i$ ($1 \le i \le n$) and $R_j$ ($1 \le j \le m$) are ordered itemsets in a traffic record set $T$. We call $L_1, L_2, \ldots, L_n$ the LHS (left-hand side) episode and $R_1, \ldots, R_m$ the RHS (right-hand side) episode of the
rule. Note that all itemsets are sequentially ordered; that is, $L_1, L_2, \ldots, L_n, R_1, \ldots, R_m$ must occur in the order listed. However, other itemsets could be embedded within our episode sequence. We define the support and confidence of rule (2.a) by the following two expressions:

$$ s = \mathrm{Support}(L_1 \cup L_2 \cup \cdots \cup L_n \cup R_1 \cup \cdots \cup R_m) \ge s_0 \tag{2.b} $$

$$ c = \frac{\mathrm{Support}(L_1 \cup L_2 \cup \cdots \cup L_n \cup R_1 \cup \cdots \cup R_m)}{\mathrm{Support}(L_1 \cup L_2 \cup \cdots \cup L_n)} \ge c_0 \tag{2.c} $$
We consider the minimal occurrences [26] of the episode sequence in the entire traffic stream. The support value $s$ is defined as the percentage of occurrences of the episode within the parentheses of (2.b) out of the total number of traffic records audited. The confidence level $c$ is the ratio of the minimal occurrences of the joint episode to the support of the LHS episode, as in (2.c). Both parameters are lower bounded, by $s_0$ and $c_0$, which are the minimum support value and the minimum confidence level, respectively. The window size is an upper bound on the time duration of the entire episode sequence.

Example 1: Association rules and episode rules
Consider the following association rule for an http connection event:

(service = http) → (duration = 1) (0.8, 0.1)

The rule indicates that 80% of all http connections have a duration of less than one second. Of all network connections, 10% are initiated from http requests and have a duration of less than one second. Now, consider the following frequent episode rule for a sequence of network events:

(service = authentication) → (service = smtp) (service = smtp) (0.6, 0.1, 2 sec)

This rule specifies an authentication event. If the authentication service is requested at time $t$, there is a confidence level of $c$ = 60% that two smtp services will follow before the time $t + w$,
where the event window $w$ = 2 sec. The three traffic events (service = authentication), (service = smtp), (service = smtp) together account for 10% of all network connections.
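The episode rule of Example 1 can be evaluated on a time-ordered record stream roughly as follows. This is a simplified sketch, not the minimal-occurrence algorithm of [26]: it counts, for every record that starts the LHS, whether the remaining itemsets follow in order within the window. The function names, the record layout, and the toy stream are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Itemset = Dict[str, object]


def matches(record: Record, itemset: Itemset) -> bool:
    """True if the record carries every attribute=value pair of the itemset."""
    return all(record.get(a) == v for a, v in itemset.items())


def occurs_from(records: List[Record], start: int,
                episode: List[Itemset], window: float) -> bool:
    """True if `episode` occurs in order starting at records[start] and ends
    within `window` seconds. Other records may be interleaved.
    Assumes records are sorted by their 'timestamp' field."""
    if not matches(records[start], episode[0]):
        return False
    deadline = records[start]["timestamp"] + window
    pos = start + 1
    for itemset in episode[1:]:
        # Skip interleaved records until a match or the deadline is passed.
        while (pos < len(records) and records[pos]["timestamp"] <= deadline
               and not matches(records[pos], itemset)):
            pos += 1
        if pos >= len(records) or records[pos]["timestamp"] > deadline:
            return False
        pos += 1
    return True


def episode_rule(records: List[Record], lhs: List[Itemset],
                 rhs: List[Itemset], window: float) -> Tuple[float, float]:
    """Return (c, s) for the rule lhs -> rhs under the simplified counting above."""
    idx = range(len(records))
    n_lhs = sum(occurs_from(records, i, lhs, window) for i in idx)
    n_all = sum(occurs_from(records, i, lhs + rhs, window) for i in idx)
    s = n_all / len(records) if records else 0.0
    c = n_all / n_lhs if n_lhs else 0.0
    return c, s


# Hypothetical stream: the second authentication is followed by only one smtp in time.
stream = [
    {"timestamp": 0.0, "service": "authentication"},
    {"timestamp": 0.5, "service": "smtp"},
    {"timestamp": 1.0, "service": "smtp"},
    {"timestamp": 5.0, "service": "authentication"},
    {"timestamp": 5.5, "service": "smtp"},
    {"timestamp": 9.0, "service": "smtp"},
    {"timestamp": 10.0, "service": "http"},
    {"timestamp": 11.0, "service": "http"},
]
auth, smtp = {"service": "authentication"}, {"service": "smtp"}
c, s = episode_rule(stream, [auth], [smtp, smtp], window=2.0)
print(c, s)
```

Here one of the two authentication events is followed by two smtp events within 2 seconds, so the sketch reports a confidence of 1/2 and a support of 1/8.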
The traffic connections on both sides of an FER need not be disjoint in an episode sequence of events. Episode rules can be used to characterize attacks. The SYN flood attack is specified by the following episode rule:

(service = http, flag = S0) (service = http, flag = S0) → (service = http, flag = S0)

where the event (service = http, flag = S0) is an association. Flag “S0” means that only the SYN packet was seen for a particular connection. The combination of associations and FERs reveals useful information on normal and intrusive behaviors. These rules can be applied to build an IDS that defends against both known and unknown attacks.
2.2 Axis Attributes vs. Reference Attributes
The basic rule generation algorithm does not take any domain-specific knowledge into consideration. Often, too many ineffective rules are generated to be useful. For example, the association rule

srcbytes = 200 → destbytes = 300

is of little interest to the intrusion detection process, since the number of bytes sent by the source (srcbytes) and destination (destbytes) is irrelevant to the traffic and threat conditions. In order to address this issue, Lee et al. [20] introduced the concepts of axis attributes and reference attributes to constrain the generation of mining rules. Each association rule must contain some values of axis attributes. Association rules that do not contain any axis attributes are considered irrelevant to the context. Axis attributes are selected from essential attributes [20] such as srchost (source host), desthost (destination host), srcport (source port), and service (destination port). Different combinations of the essential attributes form the axis attributes. We also incorporated connection flag as an essential attribute, since some flags are quite rare in daily
network traffic and are hard to mine. However, flag has to be combined with at least one other essential attribute to form axis attributes. All itemsets (traffic events) in an FER must contain some axis attributes. The reference attributes require the itemsets of a rule to have the same reference value.
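The axis and reference constraints described above can be sketched as a simple filter. This is a minimal illustration under one possible reading of the text: an itemset "contains" an axis if it gives a value for every attribute in the chosen axis combination. The names `ESSENTIAL`, `is_valid_axis`, and `keep_episode` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

Itemset = Dict[str, object]

# Essential attributes named in the text, plus the connection flag.
ESSENTIAL = {"srchost", "desthost", "srcport", "service", "flag"}


def is_valid_axis(axis: FrozenSet[str]) -> bool:
    """An axis is a non-empty combination of essential attributes;
    flag alone is not allowed and must be paired with another attribute."""
    return bool(axis) and axis <= ESSENTIAL and axis != {"flag"}


def keep_episode(itemsets: List[Itemset], axis: FrozenSet[str],
                 reference: Optional[str] = None) -> bool:
    """Keep a rule only if every itemset carries the axis attributes and,
    when a reference attribute is given, all itemsets share its value."""
    if not all(axis <= set(i) for i in itemsets):
        return False
    if reference is not None:
        values = {i.get(reference) for i in itemsets}
        return len(values) == 1 and None not in values
    return True


axis = frozenset({"service", "flag"})
syn_flood = [{"service": "http", "flag": "S0", "desthost": "128.125.1.10"}] * 3
byte_rule = [{"srcbytes": 200}, {"destbytes": 300}]

print(is_valid_axis(axis), is_valid_axis(frozenset({"flag"})))
print(keep_episode(syn_flood, axis, reference="desthost"))
print(keep_episode(byte_rule, frozenset({"service"})))
```

With service and flag as the axis, the SYN flood episode from Section 2.1 is kept, while the srcbytes/destbytes rule is discarded because it carries no axis attribute.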
3. Datamining for Anomaly Intrusion Detection
Our long-term goal is to build an intelligent intrusion detection system that can help secure any distributed computing infrastructure such as a Grid system. The system can detect not only known intrusion patterns, but also novel unknown intrusions. In order to achieve this objective, we use datamining to profile frequent network patterns for detecting anomalies.
3.1 The Network Datamining Architecture
The three major components of our IDS are the datamining engine, the intrusion detection engine, and the alarm generation engine, as shown in Fig. 1. In this paper, we apply the normal profile database and construct the anomaly detection engine. Alarm generation is beyond the scope of this report. In order to correctly detect intrusion patterns, we extract two levels of information from the raw audit data of the network traffic. Although connection-level information is very effective against flood and scan attacks, it can detect only a small portion of attacks. Most R2L and U2R attacks cannot be discovered at the connection level.

It should be noted that we combine anomaly intrusion detection with the signature-based detection mechanism in Figure 1. An attack can be detected by either mechanism, whichever confirms the intrusion first. Once an intrusion anomaly is discovered in the traffic profile, its signatures will be added to the signature database. Initially, we generate the episode rules from the 1999 MIT Lincoln IDS evaluation data sets [14, 17]. Eventually, we will update the database and extend the rules using more recent traffic connection records. The whole traffic database will be periodically updated for experimental purposes at USC.
[Figure 1: block diagram. Component labels: Audit data, Data preprocessor, Feature extraction, Data mining Engine, Rules from realtime traffic, Attack-free episode rules, Normal profile database, Intrusion Detection Engine, Anomaly Detection Engine, Signature Database, Alarm generation, Alarm generator, Security policy.]

Figure 1 Our datamining architecture for anomaly-based intrusion detection

We added some additional features extracted from the packet-level data to detect new attacks. In order to do so, we use the IDS tool Bro [30] to extract the features from both connection and packet information. The key features generated for all traffic connections are listed in Table 1.

Table 1 Key Features Extracted from Traffic Connection Records

Feature Name   Description
Timestamp      Time when the first packet of the connection is seen
Duration       Length of the connection in seconds, ignored for UDP packets
Srchost        IP address of the source host
Srcport        Port number of the source host
Srcbyte        Number of bytes sent by the source host
Destbyte       Number of bytes sent by the destination host
Desthost       IP address of the destination
Destport       Port number of the destination host
Flag           Connection status flag. Typical flag values are given below.
               SF: both SYN and FIN packets are known for a connection
               S0: only the SYN packet was seen for a TCP connection
               REJ: the connection was rejected by the destination
Urgent         Number of urgent flags in the connection
Frag_Error     Number of fragment errors in the connection
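The features of Table 1 map naturally onto a small record type. The sketch below is an assumption-laden illustration: the field types, the class name `Connection`, and the helper `as_itemset` are hypothetical and do not describe the output format of Bro or the authors' preprocessor.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Connection:
    """One traffic connection record with the key features of Table 1.
    Field types are assumptions chosen for illustration."""
    timestamp: float   # time when the first packet is seen
    duration: float    # seconds; ignored for UDP
    srchost: str
    srcport: int
    srcbyte: int
    destbyte: int
    desthost: str
    destport: int
    flag: str          # e.g. "SF", "S0", "REJ"
    urgent: int = 0
    frag_error: int = 0

    def as_itemset(self, attributes: Iterable[str]) -> Dict[str, object]:
        """Project the record onto the chosen attributes, giving an itemset."""
        return {a: getattr(self, a) for a in attributes}


conn = Connection(timestamp=10.0, duration=0.0, srchost="128.125.1.1",
                  srcport=1234, srcbyte=0, destbyte=0,
                  desthost="128.125.1.10", destport=80, flag="S0")
print(conn.as_itemset(("destport", "flag")))
```

Projecting a record onto a few attributes in this way yields the itemsets used by the association and episode rules of Section 2.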
For each connection, we check whether it violates any protocol specified in the RFCs. For example, the TCP three-way handshake can be easily verified by looking at the packets that establish the connection. Also during the preprocessing stage, packets with infrequent properties are identified for the purpose of anomaly detection. We keep a strong interest in those infrequent attribute values since attackers often utilize them. For example, packets with the same destination and source address normally indicate a potential attack. In order to detect more R2L attacks, we extracted functional features from the network traffic, as listed in Table 2.

Table 2 Functional Features Extracted for Specific Services

Feature Name      Description
Login_Failed      Whether a request for ftp, telnet, rlogin services, etc. has failed
Sensitive_files   Sensitive files that are visited/created by the user, e.g. .rhost, .password, etc.
http_request      Number of http requests in an http connection
Privileged port   Whether the srcport is a privileged port (port number