INTEGRATING FUZZY LOGIC WITH DATA MINING METHODS FOR INTRUSION DETECTION
By Jianxiong Luo
A Thesis Submitted to the Faculty of Mississippi State University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science in the Department of Computer Science
Mississippi State, Mississippi August 1999
Approved:
______________________________ Susan M. Bridges Associate Professor of Computer Science (Advisor) and Graduate Coordinator of the Department of Computer Science
______________________________ Julia E. Hodges Professor of Computer Science (Committee Member)
______________________________ Rayford B. Vaughn, Jr. Associate Professor of Computer Science (Committee Member)
______________________________ A. Wayne Bennett Dean of the College of Engineering
Name: Jianxiong Luo
Date of Degree: August 6, 1999
Institution: Mississippi State University
Major Field: Computer Science
Major Professor: Dr. Susan Bridges
Title of Study: INTEGRATING FUZZY LOGIC WITH DATA MINING METHODS FOR INTRUSION DETECTION
Pages in Study: 69
Candidate for Degree of Master of Science
This report explores integrating fuzzy logic with two data mining methods (association rules and frequency episodes) for intrusion detection. Data mining methods are capable of extracting patterns automatically from a large amount of data. The integration with fuzzy logic can produce more abstract and flexible patterns for intrusion detection, since many quantitative features are involved in intrusion detection and security itself is fuzzy. In this report, Chapter I introduces the concept of intrusion detection and the practicality of applying fuzzy logic to intrusion detection. In Chapter II, two types of intrusion detection systems, host-based systems and network-based systems, are briefly reviewed. Some important artificial intelligence techniques that have been applied to intrusion detection are also reviewed here, including data mining methods for anomaly detection. Chapter III summarizes a set of desired characteristics for the Intelligent Intrusion Detection Model (IIDM) being developed at Mississippi State University. A preliminary architecture which we have developed for integrating machine
learning methods with other intrusion detection methods is also described. Chapter IV discusses basic fuzzy logic theory, traditional algorithms for mining association rules, and an original algorithm for mining frequency episodes. In Chapter V, the algorithms we have extended for mining fuzzy association rules and fuzzy frequency episodes are described. We add a normalization step to the procedure for mining fuzzy association rules in order to prevent one data instance from contributing more than others. We also modify the procedure for mining frequency episodes to learn fuzzy frequency episodes. Chapter VI describes a set of experiments of applying fuzzy association rules and fuzzy episode rules for off-line anomaly detection and real-time intrusion detection. We use fuzzy association rules and fuzzy frequency episodes to extract patterns for temporal statistical measurements at a higher level than the data level. We define a modified similarity evaluation function which is continuous and monotonic for the application of fuzzy association rules and fuzzy frequency episodes in anomaly detection. We also present a new real-time intrusion detection method using fuzzy episode rules. The experimental results show the utility of fuzzy association rules and fuzzy frequency episodes in intrusion detection. The conclusions are included in Chapter VII.
DEDICATION
I would like to dedicate this research to my family and my wife.
ACKNOWLEDGMENTS
I am deeply grateful to Dr. Susan Bridges for devoting much time to directing this entire research project, as well as my graduate study and research work during the last two years. In this research, she often gave me distinctive insights and guided me back to the right path when I went astray. She always encourages me to learn new things and think independently. I also thank her very much for her concern when I was hospitalized. I owe heartfelt thanks to Dr. Julia Hodges, who introduced me to the DIAL research group and taught me much in my research and graduate study. Her classes are always interesting and full of humor. I also thank her very much for her encouragement when I faced difficulties, as well as her valuable trust. I am also very indebted to Dr. Ray Vaughn for his direction in this project. He spent much time guiding my project, offered many suggestions, and provided me with much useful information. I also thank our system administrator, Mr. Gerhard Lehnerer. Without his time and effort, my experiments would not have been conducted smoothly.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I. INTRODUCTION
II. LITERATURE REVIEW ON INTRUSION DETECTION
    2.1 Intrusion Detection Systems
        2.1.1 Host-Based Intrusion Detection
        2.1.2 Network-Based Intrusion Detection
    2.2 Artificial Intelligence and Intrusion Detection Methods
        2.2.1 Artificial Intelligence and Misuse Detection
            2.2.1.1 Rule-Based Expert System
            2.2.1.2 State Transition Analysis
            2.2.1.3 Genetic Algorithms
        2.2.2 Artificial Intelligence and Anomaly Detection
            2.2.2.1 Inductive Sequential Patterns
            2.2.2.2 Artificial Neural Networks
            2.2.2.3 Data Mining Methods
        2.2.3 Summary of AI and Intrusion Detection
III. AN INTELLIGENT INTRUSION DETECTION MODEL
    3.1 Expected Characteristics of IIDM
    3.2 Preliminary Architecture
IV. REVIEW OF FUZZY LOGIC AND DATA MINING
    4.1 Fuzzy Logic
    4.2 Data Mining Methods
        4.2.1 Association Rules
        4.2.2 Frequency Episodes
V. INTEGRATION OF FUZZY LOGIC WITH DATA MINING
    5.1 Fuzzy Association Rules
        5.1.1 Related Works
        5.1.2 Fuzzy Association Rules
    5.2 Fuzzy Frequency Episodes
VI. EXPERIMENTS AND RESULTS
    6.1 Anomaly Detection
        6.1.1 Experiment Set 1
        6.1.2 Experiment Set 2
    6.2 Real-time Intrusion Detection
        6.2.1 Experiment 3
        6.2.2 Experiment 4
        6.2.3 Experiment 5
VII. CONCLUSION
REFERENCES
LIST OF TABLES
TABLE
2.1 Summary of AI Techniques and Intrusion Detection
6.1 Specification of Training and Test Data Sets
6.2 Effects of the minconfidence Threshold on the False Positive Error Rate (FPER) and the False Negative Error Rate (FNER)
LIST OF FIGURES
FIGURE
3.1 Architecture of an Intelligent Intrusion Detection Model
4.1 Singleton Representation of a Fuzzy Set
4.2 Standard Function Representation of Fuzzy Sets
4.3 Calculation Method of Fuzzy Set Values
4.4 Agrawal and Srikant's Apriori Candidate Generation Algorithm (1994)
4.5 Flow Chart Depiction of Agrawal and Srikant’s Algorithm Apriori (1994)
4.6 Candidate Generation Algorithm Based on Work of Mannila and Toivonen (1996)
5.1 Example of Sharp Boundary Problem
5.2 Candidate Generation Algorithm for Fuzzy Association Rules
6.1 Specification of Temporal Statistical Measurements Used in the Experiments
6.2 Comparison of Similarities Between Different Training and Test Data Sets for Fuzzy Association Rules
6.3 Comparison of Similarities Between 3 Hour Training Data Set and Test Data Sets for Fuzzy Episode Rules
6.4 Comparison of Similarities Between Different Training and Test Data Sets for Fuzzy Association Rules
6.5 Comparison of Similarities Between 3 Hour Training Data Set and Test Data Sets for Fuzzy Episode Rules
6.6 Comparison of Similarities Between Training Data Set and Different Test Data Sets for Fuzzy Association Rules
6.7 Comparison of Similarities Between Training Data Set and Different Test Data Sets for Fuzzy Episode Rules
6.8 Anomaly Percentages of Different Test Data Sets in Real-time Intrusion Detection
6.9 Distribution of the Feature PN with Time from Test Data Sets T1’ (Representing Normal Behavior) and T4’ (Representing Simulated mscan Intrusions)
6.10 Comparison of False Positive Error Rates of Fuzzy Episode Rules and Non-Fuzzy Episode Rules
CHAPTER I
INTRODUCTION
In recent years, computer security has become increasingly important and an international priority. This is due to the wide use of computers, the emergence of electronic commerce, and the rapid growth of computer networks. For example, a “Trojan horse” on a computer host can perform illegal operations or even do damage because it masks itself as a valid program (Gasser 1988). In a computer network running TCP/IP, IP spoofing helps intruders gain access to a remote host by guessing its TCP sequence numbers and then masquerading as a legitimate user. Since intrusions take advantage of vulnerabilities in computer systems, intrusion detection methods are usually developed to enforce the security policy of computer hosts and computer networks.
In a modern computer system, intrusion detection has become an essential and critical component. One of the reasons is that it is not technically feasible to build a system without any vulnerabilities (Denning 1986). As a matter of fact, it is also very difficult to test the security capabilities of a system since it is almost impossible to incorporate all intrusion patterns. In addition, future attackers may use completely unknown patterns which are unexpected and difficult to detect. On the other hand, intrusions originating from authorized system users who choose to abuse their access
rights will not cause an alarm without the use of intrusion detection methods (Denning 1986). There are two types of intrusion detection: misuse detection and anomaly detection (Sundaram 1996). Misuse detection can be applied to the attacks that generally follow some fixed patterns. For example, three consecutive login failures are likely to be one of the important characteristics of password guessing. Misuse detection is usually constructed to examine these intrusion patterns that have been recognized and reported by experts. However, intruders do not always follow publicly known patterns to break into a computer system. They will often try to mask their illegal behavior to deceive the detection system. Anomaly detection methods are designed to counter this kind of challenge. Unlike misuse detection that is based on attack patterns, anomaly detection tries to find patterns of normal behavior, with the assumption that an intrusion will usually include some deviation from this normal behavior. Observation of this deviation will then result in an intrusion alarm. Artificial intelligence (AI) techniques have played an important role in both misuse detection and anomaly detection. AI techniques can be used for data reduction and classification tasks (Frank 1994). For example, many intrusion detection systems have been developed as rule-based expert systems. An example is SRI’s Intrusion Detection Expert System (IDES) (Lunt and Jagannathan 1988). The rules for detection can be constructed based on the knowledge of system vulnerabilities or known attack patterns. On the other hand, AI techniques also have the capability of learning inductive rules. For example, sequential patterns can be learned by a system such as the Time-
based Inductive Machine (TIM) for intrusion detection (Teng, Chen, and Lu 1990). Neural networks can be used to predict future intrusions after training (Debar, Becker, and Siboni 1992). Data mining methods, such as association rules and frequency episodes, have been also proposed to mine normal patterns from audit data (Lee, Stolfo, and Mok 1998). However, if a rule is directly dependent on audit data, “there is very little flexibility in this one-to-one (rule-to-audit record) representation” (Ilgun and Kemmerer 1995). For example, an intrusion with a very small deviation from the patterns represented in the rules may not be matched and recognized. To improve the flexibility of an intrusion detection system, this thesis describes a method for integrating fuzzy logic with data mining methods for intrusion detection. There are two main reasons to introduce fuzzy logic for intrusion detection. First, many quantitative features are involved in intrusion detection. SRI’s Next-generation Intrusion Detection Expert System (NIDES) categorizes security-related statistical measurements into four types: ordinal, categorical, binary categorical, and linear categorical (Lunt 1993). Ordinal measurements and linear categorical measurements are quantitative features which can potentially be viewed as fuzzy variables. For instance, the CPU usage time and the connection duration are two examples of ordinal measurements. An example of a linear categorical measurement is the number of different TCP/UDP services initiated by the same source host. The second reason to introduce fuzzy logic for intrusion detection is that security itself includes fuzziness. Given a quantitative measurement, a range value or an interval can be used to denote a normal value. Then,
any values falling outside the interval will be considered anomalies to the same degree regardless of their different distances from the interval. The same applies to values inside the interval, i.e., all will be viewed as normal to the same degree. Unfortunately, this causes an abrupt separation between normality and anomaly. For example, a value just inside the border is assumed normal while another value just outside the border is assumed abnormal even though there is only a very small difference between these two values. The introduction of fuzziness to these quantitative features will help to smooth the abrupt separation.
The hypothesis of this research is that fuzzy logic is capable of producing more general rules which will increase the flexibility of intrusion detection systems. This thesis will investigate the integration of fuzzy logic with association rules and frequency episodes with the purpose of improving the performance of an intrusion detection system.
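The abrupt separation described above can be illustrated with a small sketch. It is not taken from the thesis experiments: the interval bounds, the `margin` parameter, and the trapezoidal shape of the membership function are all assumptions chosen for illustration.

```python
def crisp_normal(value, low, high):
    """Crisp test: degree of normality is 1.0 inside [low, high], 0.0 outside."""
    return 1.0 if low <= value <= high else 0.0


def fuzzy_normal(value, low, high, margin):
    """Trapezoidal membership (an assumed shape): 1.0 inside [low, high],
    falling linearly to 0.0 over a band of width `margin` (> 0) on each side."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / margin)


if __name__ == "__main__":
    # Hypothetical measurement with a "normal" interval of [20, 100].
    for v in (100, 101, 110, 130):
        print(v, crisp_normal(v, 20, 100), fuzzy_normal(v, 20, 100, 20))
```

With the crisp test, the values 100 and 101 receive opposite labels; with the fuzzy membership, their degrees of normality differ only slightly, and the degree decreases as a value moves farther from the interval.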
CHAPTER II
LITERATURE REVIEW ON INTRUSION DETECTION
Intrusions were first categorized by J. P. Anderson (Lunt 1993). They can be broadly classified into three types: external intrusions, internal intrusions, and misfeasors. An external intrusion tries to break into a computer system without appropriate access rights. An internal intrusion originates from a valid user inside a computer system. A masquerader is an internal intruder who logs into the system by use of other users’ accounts. A clandestine user is also an internal intruder, one who deceives the system and performs illegal operations. A misfeasor usually abuses his or her authority in the use of a computer system. Accordingly, intrusion detection can be defined as detecting outside intruders “who are using a computer system without authorization” and inside intruders “who have legitimate access to the system but are abusing their privileges” (Mukherjee, Heberlein, and Levitt 1994). Intrusion detection systems are usually built to identify the unauthorized behavior of outside or inside intruders and to enforce the security of computer systems.
2.1 Intrusion Detection Systems
Generally speaking, there are two types of intrusion detection systems: host-based intrusion detection systems and network-based intrusion detection systems.
2.1.1 Host-Based Intrusion Detection
A generic intrusion detection model proposed by Denning (1986) works as a rule-based pattern matching system which includes the following six components:
1. Subjects: A subject is the “initiator” of an action being performed on the host, e.g., a user or the host itself.
2. Objects: An object is the “receptor” of an action, e.g., a system file or a system device.
3. Audit records: An audit record represents an action that was initiated by the subject and performed on the object. Some quantitative measurements of the action are also included in the audit record, e.g., CPU usage time or I/O activity.
4. Profiles: A profile is the “signature or description of normal activity” of a subject or a group of subjects concerning an object or a group of objects, e.g., a profile of the CPU usage of a user session or a profile of the CPU usage of a program. Several statistical models can be used to characterize the quantitative measurements in these profiles. Examples include the mean and standard deviation model, the Markov process model, and the time series model.
5. Anomaly records: An anomaly record is used to record an anomalous event that has been detected.
6. Activity rules: An activity rule describes what action will be taken under some conditions. For example, when a new audit record is created, the corresponding profile will be updated automatically.
So, intrusion detection tasks can be conducted by checking the similarity between the current audit record and the corresponding profiles. If the current audit record deviates from the normal patterns enough, it will be considered an anomaly. This process occurs in real time. Denning’s intrusion detection model is the basis of SRI’s IDES (Lunt and Jagannathan 1988). SRI’s IDES has two components: the statistical anomaly detector and the expert system (Mukherjee, Heberlein, and Levitt 1994). Based on Denning’s model, the first component is used to detect anomalies by applying statistical methods, i.e., the normal patterns are constructed by use of statistical analysis and the anomaly intrusions are detected by assuming that there will be always some differences between normal patterns and intrusions. The expert system component of SRI’s IDES is constructed as a rule-based system and is used to detect the intrusions whose patterns are already known.
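As an illustration of how a profile based on the mean and standard deviation model might be checked against a new audit record, consider the following minimal sketch. The `Profile` class, its method names, and the threshold `k` are hypothetical and are not part of Denning's model or of IDES.

```python
from statistics import mean, stdev


class Profile:
    """Hypothetical profile of one quantitative measurement (e.g., CPU usage)
    for one subject/object pair, using the mean and standard deviation model."""

    def __init__(self, observations):
        # At least two observations are needed to compute a standard deviation.
        self.observations = list(observations)

    def is_anomalous(self, value, k=3.0):
        """Flag `value` if it lies more than k standard deviations from the mean.
        The threshold k is an assumption chosen for illustration."""
        mu = mean(self.observations)
        sigma = stdev(self.observations)
        return abs(value - mu) > k * sigma

    def update(self, value):
        """Mimics an activity rule: fold a new audit record into the profile."""
        self.observations.append(value)


if __name__ == "__main__":
    cpu_profile = Profile([10, 12, 11, 9, 13])
    print(cpu_profile.is_anomalous(12))   # close to the observed mean
    print(cpu_profile.is_anomalous(40))   # far from the observed mean
```

Note that this check is crisp: a value just beyond `k` standard deviations is flagged while a value just inside is not, which is the kind of sharp boundary that motivates the fuzzy approach of this thesis.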
2.1.2 Network-Based Intrusion Detection
With the proliferation of computer networks, more and more individual hosts are connected into small-scale LANs or large-scale WANs. However, the hosts, as well as the networks, are exposed to intrusions due to the vulnerabilities of network devices and network protocols. For example, a “bastion host” is a host which is exposed to the Internet since its address is publicly known (Chapman and Zwicky 1995). The TCP/IP protocol can also be exploited by network intrusions such as IP spoofing, port scanning, and so on. So, network-based intrusion detection has become increasingly important and is designed to protect a computer network as well as all of its hosts. Packet filtering, for example, can decide what kind of data will be accepted or rejected for transfer through a
computer network based on routine information found in packet headers (Chapman and Zwicky 1995). The installation of a network-based intrusion detection system can also decrease the burden of the intrusion detection task on every individual host.
To detect network-based intrusions, a network security monitor (NSM) has been proposed by Heberlein et al. (1990), which has a hierarchical architecture composed of the following five layers (from lowest to highest):
1. Packet catcher: It will monitor network traffic, catch every packet, and send it to the next layer.
2. Parser: It will analyze every incoming packet, summarize the security-related information into a four-dimensional vector of <source host, destination host, service, connection ID>, and pass it to the next layer.
3. Matrix generator: A corresponding four-dimensional matrix is maintained. Since the connection ID is unique, every connection will be represented by one cell in the matrix. A cell usually stores two measurements: the number of packets and the total data bytes transferred in one connection.
4. Matrix analyzer: Since the matrix actually represents the network traffic, the matrix analyzer will compare it with the normal patterns by use of a “masking” method. Anomaly intrusions will be detected because they will not be masked by normal patterns.
5. Matrix archiver: It will store the matrix at intervals, e.g., every fifteen minutes. These matrices can then be used to construct normal patterns of network traffic.
NSM detects network anomalies by monitoring network traffic. For misuse detection, LANL’s (Los Alamos National Laboratory) NADIR (Network Anomaly Detection and Intrusion Reporter) is built as a rule-based expert system through “audit analysis and consultation with security experts” (Mukherjee, Heberlein, and Levitt 1994).
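The matrix generator and matrix analyzer layers of NSM can be sketched roughly as follows. This is only an illustration, not the NSM implementation: the matrix is stored as a sparse dictionary, the four key fields are assumed to be source host, destination host, service, and connection ID, and the “mask” is simplified to a set of (source, destination, service) triples regarded as normal. The function names are hypothetical.

```python
from collections import defaultdict


def build_matrix(parsed_packets):
    """Matrix generator (sketch): one cell per connection.

    Each parsed packet is a tuple
    (source, destination, service, connection_id, n_bytes).
    Each cell stores [packet count, total bytes].
    """
    matrix = defaultdict(lambda: [0, 0])
    for src, dst, service, conn_id, n_bytes in parsed_packets:
        cell = matrix[(src, dst, service, conn_id)]
        cell[0] += 1
        cell[1] += n_bytes
    return dict(matrix)


def unmasked(matrix, normal_patterns):
    """Matrix analyzer (sketch): return the cells NOT covered by the mask.

    `normal_patterns` is an assumed representation of the mask: a set of
    (source, destination, service) triples considered normal traffic.
    """
    return {key: cell for key, cell in matrix.items()
            if key[:3] not in normal_patterns}


if __name__ == "__main__":
    packets = [("hostA", "hostB", "telnet", 1, 100),
               ("hostA", "hostB", "telnet", 1, 50),
               ("hostC", "hostB", "ftp", 2, 10)]
    matrix = build_matrix(packets)
    print(unmasked(matrix, {("hostA", "hostB", "telnet")}))
```

Any cell that remains after masking corresponds to traffic that the normal patterns do not account for and would be reported as a potential anomaly.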
2.2 Artificial Intelligence and Intrusion Detection Methods
There are two types of intrusion detection methods: misuse detection and anomaly detection. Misuse detection is based on the knowledge of system vulnerabilities and the known attack patterns, while anomaly detection assumes that an intrusion will always reflect some deviations from normal patterns. Many artificial intelligence techniques have been applied to both misuse detection and anomaly detection.
2.2.1 Artificial Intelligence and Misuse Detection
Since misuse detection is used to identify the intrusions whose patterns are known, pattern matching is a direct and efficient way to implement it.
2.2.1.1 Rule-Based Expert System
A known intrusion pattern can be easily represented by rules such as production rules in the form of if-then-else. A rule-based expert system will also facilitate the process of pattern matching. This is the reason many intrusion detection systems are developed as rule-based expert systems or include rule-based inference components, such as SRI’s IDES, LANL’s NADIR, and so on. The efficiency of pattern matching is one of the most remarkable advantages of a rule-based expert system. When it is used in misuse detection, activation of more rules raises the level of suspicion.
Ilgun and Kemmerer (1995), however, have pointed out that one obvious disadvantage of a rule-based expert system is its “direct dependency” on audit data: a very small variation on the intrusion scenario produces different audit data, which can blind the rule-based expert system to the intrusion.
2.2.1.2 State Transition Analysis
The State Transition Analysis Tool (STAT) proposed by Ilgun and Kemmerer (1995) is another form of rule-based detection method. STAT first extracts a high-level representation of the audit trail, which is called a signature action, from raw audit data. An intrusion pattern is represented by a sequence of state transitions from the initial state to the final state. Each state represents the system’s current situation, and the transition between two states is activated by a signature action. A STAT rule has three parts: “a state description field, a signature action field, and a rule dependence field” (Ilgun and Kemmerer 1995). The main advantage of STAT is that it can represent an intrusion pattern at a higher level than the audit data level, as well as in a sequential way, i.e., as a series of state transitions. However, the construction of a state transition diagram is not as direct as the construction of a rule-based expert system.
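The idea of representing an intrusion as a series of state transitions driven by signature actions can be sketched in a few lines. This is a deliberately simplified illustration and not the STAT rule format: the scenario is reduced to a linear chain of transitions, state descriptions are omitted, and the action names are hypothetical.

```python
def matches_scenario(signature_actions, scenario):
    """Return True if the observed signature actions drive the scenario
    from its initial state to its final state.

    `scenario` is an ordered list of signature actions. State i means that
    the first i transitions have fired. Actions that do not trigger the next
    transition leave the current state unchanged.
    """
    state = 0
    for action in signature_actions:
        if state < len(scenario) and action == scenario[state]:
            state += 1
    return state == len(scenario)


if __name__ == "__main__":
    # Hypothetical three-step scenario, for illustration only.
    scenario = ["copy_shell", "chmod_setuid", "exec_shell"]
    observed = ["login", "copy_shell", "ls", "chmod_setuid", "exec_shell"]
    print(matches_scenario(observed, scenario))
```

Because unrelated actions between the steps do not disturb the match, the pattern is less tied to the exact audit record sequence than a one-to-one rule would be.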
2.2.1.3 Genetic Algorithms
Genetic Algorithm for Simplified Security Audit Trails Analysis (GASSATA), proposed by Me (1998), introduces genetic algorithms, a sub-symbolic AI technique, for misuse intrusion detection. GASSATA will construct a two-dimensional matrix. One axis
of the matrix specifies different attacks already known. The other axis represents different kinds of events derived from audit trails, i.e., the features of these attacks. So, this matrix actually represents the patterns of intrusions. A cell in the matrix, e.g., [Ei, Aj], reflects the number of events Ei that will occur in an attack Aj. Given an audit record being monitored which includes information about the number of occurrences of every event, GASSATA will apply genetic algorithms to find the potential attacks appearing in this audit record. Experiments on genetic algorithms have shown good results after evolving only 10 epochs (Me 1998). However, the assumption that the attacks are dependent only on events in this method will restrict its generality.
2.2.2 Artificial Intelligence and Anomaly Detection
Statistical analysis has been widely used in anomaly detection (Denning 1986; Lunt and Jagannathan 1988). On the other hand, many AI techniques can also be applied to anomaly detection.
2.2.2.1 Inductive Sequential Patterns
The Time-based Inductive Machine (TIM) has been proposed by Teng, Chen, and Lu (1990) to learn sequential patterns automatically from audit data for real-time anomaly detection. The format of the sequential rules inferred from audit trails by TIM can be illustrated with the following example: A – B → (C=90%; D=10%). This rule is interpreted to mean that if event A is directly followed by event B, then the next event will be C or D with the probabilities of 90% and 10%, respectively. Then any event sequence that does not match the normal sequential patterns inductively learned by TIM
will be marked as an anomaly. For example, given the normal pattern: A – B – C → (D=100%), the sequence A – B – C – E will be flagged as an anomaly because a different event E (instead of D) has occurred while the conditions of the normal pattern have been matched. The main advantage of introducing an inductive learning mechanism to anomaly detection is that sequential patterns can be learned automatically and updated adaptively since new audit data can be used to train the system to find new normal patterns.
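The flavor of such sequential rules can be illustrated with a minimal sketch that estimates next-event probabilities by simple frequency counting. This is not TIM's induction procedure; the function names and the fixed context length are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def learn_rules(events, context_len=2):
    """Map each context (a tuple of consecutive events) to the observed
    probabilities of the event that follows it."""
    counts = defaultdict(Counter)
    for i in range(len(events) - context_len):
        context = tuple(events[i:i + context_len])
        counts[context][events[i + context_len]] += 1
    return {ctx: {e: n / sum(c.values()) for e, n in c.items()}
            for ctx, c in counts.items()}


def find_anomalies(events, rules, context_len=2):
    """Return the positions of events that were never observed after a
    context that the rules do know about."""
    anomalies = []
    for i in range(len(events) - context_len):
        context = tuple(events[i:i + context_len])
        following = events[i + context_len]
        if context in rules and following not in rules[context]:
            anomalies.append(i + context_len)
    return anomalies


if __name__ == "__main__":
    training = list("ABC" * 9 + "ABD")
    rules = learn_rules(training)
    print(rules[("A", "B")])                  # probabilities for C and D
    print(find_anomalies(list("ABE"), rules))  # E after A, B was never seen
```

On this training sequence, the context (A, B) is followed by C nine times and by D once, which corresponds to the example rule A – B → (C=90%; D=10%).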
2.2.2.2 Artificial Neural Networks
Artificial neural networks have been suggested for use in conjunction with expert systems to detect anomalies (Debar, Becker, and Siboni 1992). The backpropagation algorithm is used to learn “time series”. For example, after appropriate training, the backpropagation network will be able to predict the next command given a sequence of user commands. Then if the command observed in the audit record is different from that predicted by the neural network, an alarm is raised for a potential anomaly. An outstanding advantage of artificial neural networks is that they are highly tolerant of noisy data. Even an incomplete or inaccurate audit record will not prevent a neural network from detecting intrusions (Debar, Becker, and Siboni 1992).
2.2.2.3 Data Mining Methods
Like TIM, data mining methods can also be used to extract normal patterns from training data automatically and adaptively. Two data mining methods, association rules and serial frequency episodes, have been proposed for audit data gathering, feature
selection, and off-line analysis for anomaly detection (Lee, Stolfo, and Mok 1998). An association rule specifies the correlation among different features. A serial frequency episode represents a sequential pattern repeatedly occurring in the event sequence. An advantage here is that both the patterns among different features and the patterns among sequential events can be exploited.
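To make the notion of an association rule among features concrete, the following sketch computes the two standard measures of a rule, support and confidence, over a handful of records. The records and the feature names are hypothetical, and the helper functions are illustrative rather than the mining algorithms used later in this thesis.

```python
def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(record) for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent and consequent) / support(antecedent).
    Assumes the antecedent occurs in at least one record."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


if __name__ == "__main__":
    # Hypothetical audit records, each a set of feature=value items.
    records = [
        {"service=telnet", "flag=ok"},
        {"service=telnet", "flag=ok"},
        {"service=telnet", "flag=fail"},
        {"service=ftp", "flag=ok"},
    ]
    print(support(records, {"service=telnet", "flag=ok"}))
    print(confidence(records, {"service=telnet"}, {"flag=ok"}))
```

Here the rule service=telnet → flag=ok holds in two of the four records and in two of the three records that contain service=telnet.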
2.2.3 Summary of AI and Intrusion Detection
Table 2.1 summarizes the different AI techniques that have been used for intrusion detection.
Table 2.1 Summary of AI Techniques and Intrusion Detection
Intrusion Detection Type | AI Technique | Pros | Cons
Misuse Intrusion Detection | Rule-Based Expert Systems | Efficiency of pattern matching; ease of construction | Dependency on audit data
Misuse Intrusion Detection | State Transition Analysis | General rules at higher level; sequential rules | Effort of construction
Misuse Intrusion Detection | Genetic Algorithms | Efficiency of pattern matching | Less general
Anomaly Intrusion Detection | Inductive Sequential Patterns | Automatic and adaptive learning | A large amount of training time
Anomaly Intrusion Detection | Artificial Neural Networks | Tolerance of noisy data | A large amount of training time
Anomaly Intrusion Detection | Data Mining Methods | Automatic and adaptive learning; more powerful rules | A large amount of training time
CHAPTER III
AN INTELLIGENT INTRUSION DETECTION MODEL
A research group at Mississippi State University is investigating the development of an intelligent intrusion detection model (IIDM) that applies artificial intelligence techniques and data mining methods.
3.1 Expected Characteristics of IIDM
The expected characteristics of IIDM are as follows.
(1) Efficient: One of the most important characteristics of an intrusion detection system is its efficiency. An efficient intrusion detection system is able to correctly predict an attack as well as correctly recognize a normal operation. Two quantitative measurements are generally used to evaluate the efficiency of an intrusion detection system: a false positive rate and a false negative rate (Crosbie and Spafford 1995). The false positive rate is the error rate when an intrusion detection system wrongly predicts normal behavior as an abnormal attack. Similarly, the false negative rate is the error rate when an intrusion detection system marks an intrusion as a legal operation. A high false positive rate will seriously affect the performance of the system being monitored. A high false negative rate will leave the system vulnerable to intrusions. So, both the false positive rate and the false negative rate should be minimized in IIDM.
(2) Intelligent: An intrusion detection system should be sufficiently intelligent to avoid being deceived by intrusions. For example, if an intrusion detection system is designed to recognize three consecutive login failures as a potential attack, an intruder can avoid this kind of routine check by always making only one or two consecutive login attempts instead of three. Another example is subversion (Crosbie and Spafford 1995). Intruders may take some actions over a period of time. Each of these actions looks legal and safe if taken separately, but the sequence of these actions will compose a malicious intrusion. It is clear that an intelligent intrusion detection system should have enough flexibility to generalize patterns, even over a period of time. The integration of fuzzy logic with data mining methods is used to increase the flexibility of IIDM.
(3) Adaptive: IIDM will incorporate a machine learning component that works as a background unit and learns normal patterns automatically from system audit data or network traffic data. The learning algorithms integrate fuzzy logic with association rules and serial frequency episodes and are implemented in a reusable way, i.e., they can be used to mine normal patterns from different sets of training data. Furthermore, the learning process is also an iterative and incremental procedure. New training data can be used to mine new normal patterns and the old patterns can be updated by combining them with these new patterns. This iterative and incremental learning process will make the intrusion detection system more adaptive.
(4) Modular: Due to the complexity of intrusion detection, it is usually not sufficient for an intrusion detection system to use only misuse detection methods or anomaly detection methods. Accordingly, IIDM will include both of them in its core
16
component. In detail, the detection methods will be implemented as a set of intrusion detection modules. An intrusion detection module may address only one or even a dozen types of intrusions. Several intrusion detection modules may also cooperate to detect an intrusion in a loosely coupled way since these detection modules are relatively independent. Different modules may use different methods. For instance, one module can be implemented as a rule-based expert system and another module can be constructed as a neural network classifier. On the whole, this modular structure will ease future system expansion and upgrades since a module can be easily added, modified, or removed. (5) Distributed: With the rapid growth of computer networks and distributed systems, network-based intrusion detection is necessary. Accordingly, IIDM will be network-oriented. The intrusion detection sentries will collect and preprocess real-time system audit data or network traffic data, as well as communicate with the communication module in the core component. So, through the communication module, the collected data can be passed to the decision-making module and intrusion detection modules for further analysis, and the evaluation results can be fed back to the sentries. (6) Real-time: In IIDM, the intrusion detection modules, the decision-making module, the communication module, and the intrusion detection sentries will work together to conduct real-time detection, while the machine learning component will work off-line.
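The evasion example given under characteristic (2) can be made concrete with a small sketch. It contrasts a fixed "three consecutive failures" rule with a more general feature, the overall failure ratio. All names and the sample event sequence are hypothetical, and the failure ratio is only one possible generalization chosen for illustration, not the method developed in this thesis.

```python
def max_consecutive_failures(events):
    """events: sequence of booleans, True = successful login,
    False = failed login. Returns the longest run of failures."""
    longest = run = 0
    for ok in events:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return longest


def crisp_rule_alarm(events, threshold=3):
    """Fixed rule: alarm only on `threshold` consecutive failures."""
    return max_consecutive_failures(events) >= threshold


def failure_ratio(events):
    """Fraction of login attempts that failed (0.0 for no events)."""
    if not events:
        return 0.0
    return sum(1 for ok in events if not ok) / len(events)


# Hypothetical evasive pattern: never more than two failures in a row.
evasive = [False, False, True] * 5
alarm = crisp_rule_alarm(evasive)
ratio = failure_ratio(evasive)
```

Here the fixed rule stays silent although two thirds of the attempts failed, which is the kind of rigidity that more flexible patterns are meant to reduce.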
3.2 Preliminary Architecture
A preliminary architecture for IIDM is shown in Figure 3.1.
[Figure 3.1 diagram. Labeled elements: Network Traffic or Audit Data (1) through (m); Machine Learning Component (mining fuzzy association rules and fuzzy frequency episodes), Background Unit; Core Component with Intrusion Detection Modules (Misuse Detection, Anomaly Detection), Decision-Making Module, and Communication Module; Experts; Administrator; Server; Clients; Intrusion Detection Sentries 1 through s, each on a Host or Network Device.]
Figure 3.1: Architecture of an Intelligent Intrusion Detection Model
The functionality of each unit is briefly explained below:
(1) Machine Learning Component: To learn rules that are more abstract and less directly dependent on audit data, fuzzy logic will be integrated with association rules and frequency episodes. The machine learning component will automatically learn fuzzy association rules and fuzzy frequency episodes from system audit data or network traffic data for anomaly detection.
(2) Anomaly Intrusion Detection Module: This component will evaluate how far an observed audit trail deviates from normal patterns.
(3) Misuse Intrusion Detection Module: Based on knowledge of system vulnerabilities and expert advice, a misuse intrusion detection module can be built to detect known attacks.
(4) Decision-Making Module: This module has two roles. First, given an observed audit trail, it will decide which intrusion detection modules (misuse or anomaly) will be activated. Second, it will integrate the evaluation results from the different detection modules and generate an overall evaluation of the suspiciousness of the observed audit trail.
(5) Communication Module: This module is the bridge between the decision-making module and the intrusion detection sentries. The observed audit trail, after being preprocessed by the detection sentries, can be sent to the decision-making module for intrusion evaluation, and the feedback can also be returned to the detection sentries.
(6) Intrusion Detection Sentry: This component will collect real-time system audit data or network traffic data and perform some preprocessing tasks. A sentry resides on each host or network device.
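The flow of data through the units described above (sentry, decision-making module, detection modules) might be sketched as follows. This is only an illustration under stated assumptions: every class, function, field name, and scoring rule is hypothetical, the communication module is reduced to a direct function call, and combining module scores by taking the maximum is an assumed policy, since the text does not specify how results are integrated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


@dataclass
class DetectionModule:
    name: str
    kind: str  # "misuse" or "anomaly"
    score: Callable[[Record], float]  # suspiciousness in [0, 1]


class DecisionMakingModule:
    def __init__(self, modules: List[DetectionModule]):
        self.modules = modules

    def select(self, kinds: Tuple[str, ...]) -> List[DetectionModule]:
        # Role 1: decide which detection modules to activate.
        return [m for m in self.modules if m.kind in kinds]

    def evaluate(self, record: Record, kinds=("misuse", "anomaly")):
        # Role 2: integrate module results into an overall evaluation.
        # Assumption: the overall score is the maximum module score.
        scores = {m.name: m.score(record) for m in self.select(kinds)}
        overall = max(scores.values(), default=0.0)
        return overall, scores


def sentry_preprocess(raw: Dict[str, int]) -> Record:
    # Hypothetical preprocessing: reduce raw counts to one feature.
    attempts = raw["logins"]
    ratio = raw["failed_logins"] / attempts if attempts else 0.0
    return {"failure_ratio": ratio}


# Two toy modules with made-up scoring rules.
misuse = DetectionModule(
    "known-bruteforce", "misuse",
    lambda r: 1.0 if r["failure_ratio"] > 0.9 else 0.0)
anomaly = DetectionModule(
    "login-anomaly", "anomaly",
    lambda r: min(1.0, r["failure_ratio"] / 0.5))

decision = DecisionMakingModule([misuse, anomaly])
record = sentry_preprocess({"logins": 20, "failed_logins": 5})
overall, scores = decision.evaluate(record)
```

Because each module is just a named scoring function, one can be added, replaced, or removed without touching the others, which reflects the modular characteristic described in Section 3.1.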
CHAPTER IV
REVIEW OF FUZZY LOGIC AND DATA MINING METHODS
Based on fuzzy set theory, fuzzy logic provides a powerful way to categorize a concept in an abstract way by introducing vagueness. Data mining methods, in turn, are capable of extracting patterns automatically from a large amount of data. The integration of fuzzy logic with data mining methods will help to create more abstract patterns at a higher level than the data level. Decreasing the dependency on data will be helpful for patterns used in intrusion detection. This chapter reviews the literature on fuzzy logic and on two data mining methods, association rules and frequency episodes.
4.1 Fuzzy Logic
Traditionally, a standard set like S = {a, b, c, d, e} represents the fact that every member totally belongs to the set S. However, there are many concepts that have to be expressed with some vagueness. For instance, “tall” is fuzzy in the statement “John’s height is tall” since there is no clear boundary between “tall” and not “tall” (Stefik 1995; Hodges, Bridges, and Yie 1996). Fuzzy set theory, established by Lotfi Zadeh, is the basis of fuzzy logic (Stefik 1995). A fuzzy set is a set to which its members belong with a degree between 0 and 1. For example, S’ = {(a 0), (b 0.3), (c 1), (d 0.5), (e 0)} is a fuzzy set in which a, b, c, d, and e have membership degrees in S’ of 0, 0.3, 1, 0.5, and 0 respectively. So, it is absolutely true that a and e do not belong to S’ and that c does belong to S’, but b and d are only partial members of the fuzzy set S’.
A fuzzy variable (also called a linguistic variable) can be used to represent concepts associated with some vagueness. A fuzzy variable takes a fuzzy set as a value, which is usually denoted by a fuzzy adjective. For example, “height” is a fuzzy variable and “tall” is one of its fuzzy adjectives, which can be represented by a fuzzy set (Stefik 1995; Hodges, Bridges, and Yie 1996).
FuzzyCLIPS, a standard fuzzy logic system, provides several methods to represent a fuzzy set. These include singleton representation, standard function representation, and linguistic expression representation (Orchard 1995). In the singleton representation, a fuzzy set consists of a sequence of points, each of which is associated with a membership degree. Given a fuzzy set {(x_1 µ_1), (x_2 µ_2), …, (x_n µ_n)}, where x_i ≤ x_{i+1} for all i with 1 ≤ i < n, each pair of consecutive points is linked by a straight line (Orchard 1995). Accordingly, the above example of the fuzzy set S’ = {(a 0), (b 0.3), (c 1), (d 0.5), (e 0)} will look like Figure 4.1.
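The singleton representation can be sketched as a piecewise linear membership function. The sketch below is illustrative only: the function name is hypothetical, the symbolic points a through e are assumed to sit at positions 0 through 4, and holding the first and last membership values constant outside the listed points is an assumption rather than something stated in the text.

```python
def singleton_membership(points, x):
    """Membership degree of x in a fuzzy set given as a list of
    (x_i, mu_i) pairs sorted by x_i. Consecutive points are joined
    by straight lines. Assumption: outside the listed range the
    first or last membership value is held constant."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, m0), (x1, m1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                # Vertical step: take the larger degree.
                return max(m0, m1)
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)


# S' from the text, with a..e assumed to lie at positions 0..4.
S_prime = [(0, 0.0), (1, 0.3), (2, 1.0), (3, 0.5), (4, 0.0)]
at_c = singleton_membership(S_prime, 2)          # the point c itself
between_b_c = singleton_membership(S_prime, 1.5)  # halfway from b to c
between_d_e = singleton_membership(S_prime, 3.5)  # halfway from d to e
```

Values between two listed points take intermediate membership degrees, so the halfway point between b and c lies halfway between 0.3 and 1.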
[Figure 4.1 plot: membership degree from 0 to 1 on the vertical axis against the points a, b, c, d, and e on the horizontal axis.]
Figure 4.1 Singleton Representation of a Fuzzy Set
FuzzyCLIPS also provides three standard functions, S, PI, and Z, to represent fuzzy sets. Their graphical shapes and formal definitions are shown in Figure 4.2 (Orchard 1995).
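\[
S(\mu, a, c) =
\begin{cases}
0, & \mu \le a \\
2\left(\dfrac{\mu - a}{c - a}\right)^{2}, & a < \mu \le \dfrac{a + c}{2} \\
1 - 2\left(\dfrac{\mu - c}{c - a}\right)^{2}, & \dfrac{a + c}{2} < \mu \le c \\
1, & \mu > c
\end{cases}
\]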