
To appear in Lecture Notes in Computer Science (publisher: Springer-Verlag) as the proceedings of International Workshop on Mathematical Methods, Models and Architectures for Computer Networks Security (MMM-ACNS), May 21-23, 2001, St. Petersburg, Russia.

An Intelligent Decision Support System for Intrusion Detection and Response

Dipankar Dasgupta and Fabio A. Gonzalez
Intelligent Security Systems Research Lab
Division of Computer Science
The University of Memphis
Memphis, TN 38152
Email: ddasgupt,[email protected]

Abstract. The paper describes the design of a genetic classifier-based intrusion detection system that can provide active detection and automated responses during intrusions. It is designed as a sense-and-response system that monitors various activities on the network (i.e., it looks for changes such as malfunctions, faults, abnormalities, misuse, deviations, and intrusions). In particular, it simultaneously monitors a networked computer's activities at different levels (user level, system level, process level, and packet level) and uses a genetic classifier system to determine a specific action in case of any security violation. The objective is to find correlations among the deviated values (from normal) of the monitored parameters in order to determine the type of intrusion and to generate an action accordingly. We performed experiments to evolve a set of decision rules based on the significance of the monitored parameters in a Unix environment, and tested the rules for validation.

1. Introduction

The problem of detecting anomalies, intrusions, and other forms of computer abuse can be viewed as finding non-permitted deviations (or security violations) of the characteristic properties in the monitored (network) systems [12]. This view is based on the assumption that intruders' activities must differ (in some way) from normal users' activities. However, in most situations it is very difficult to recognize or detect such differences before any damage occurs during a break-in. When a hacker attacks a system, the ideal response would be to stop the activity before it can cause any damage or gain access to any sensitive information. This requires recognizing the attack as it takes place.

Different models of intrusion detection have been developed [6], [9], [10], [11], and many IDS software packages are available. Commercial IDS products such as NetRanger, RealSecure, and Omniguard Intruder Alert work on attack signatures. These signatures need to be updated by the vendors on a regular basis in order to protect against new types of attacks. However, no detection system can catch all types of intrusions, and each model has its strengths and weaknesses in detecting different violations in networked computer systems. Recently, researchers have started investigating artificial intelligence [3], genetic approaches [1], [6], and agent architectures [4], [5] for detecting coordinated and sophisticated attacks.

This paper describes the design and implementation of a classifier-based decision support component for an intrusion detection system (IDS). This classifier-based IDS monitors the activities of Unix machines at multiple levels (from packet level to user level) and determines the correlation among the observed parameters during intrusive activities.
For example, at the user level it searches for unusual user behavior patterns; at the system level it looks at resource usage such as CPU, memory, and I/O; at the process level it checks for invalid or unauthenticated processes and priority violations; at the packet level it monitors the number, volume, and size of packets along with the source and type of connections. We developed a Java-based interface to visualize the features of the monitored Unix environment. We used built-in tools (such as vmstat, iostat, mpstat, netstat, and snoop), syslog files, and shell commands to monitor relevant parameters simultaneously at multiple levels. As the data collector sensors observe deviations, the information is sent to the classifier system [7], [8] in order to determine appropriate actions.

2. Monitoring Data and Filtering

Behavior-based techniques for detecting intrusions or anomalies usually involve monitoring network and system parameters continuously over a period of time and collecting information about the network's normal behavior [3], [6], [9]. Accordingly, some parameters of the system are identified as important indicators of abnormalities. Detection is based on the hypothesis that security violations can be detected by monitoring a system's audit records for abnormal patterns of system usage. Our prototype system collects historical data and determines the normal usage of system resources based on various monitored parameters.

2.1 Multi-level Parameter Monitoring

Our prototype system currently monitors the parameters listed below. Some of these parameters are categorical in nature (e.g., type of user, type of connection) and are represented numerically for interpretation. However, the selection of these parameters is not final and may vary (based on their usefulness) in future implementations. Various Unix commands are used, and their output is filtered to obtain the values of the selected parameters [10].

To monitor user-level activities, the following parameters are recorded as an audit trail and analyzed by statistical methods to develop profiles of users' normal behavior patterns:

U1. Type of user and user privileges
U2. Login/logout period and location
U3. Access of resources and directories
U4. Type of software/programs used
U5. Type of commands used

User parameters are collected based on the user id that started the session. This allows an account's activity to be separated by session, which is important for detecting a session in which a hacker is masquerading as a well-known user. Once the user id of the session is established, an audit trail of that session is recorded.

The system-level parameters that provide an indication of resource usage include:

S1. Cumulative per-user CPU usage
S2. Usage of real and virtual memory
S3. Amount of swap space currently available
S4. Amount of free memory
S5. I/O and disk usage

All these parameters provide information about system resource usage. They are initially monitored over a period of time to obtain a relatively accurate measure of the normal usage of the system, so that later deviations can provide an indication of intrusion on the network.

Commands used to obtain system-level information:

− vmstat: Reports virtual memory statistics. vmstat delves into the system and reports statistics about process, virtual memory, disk, trap, and CPU activity.
− iostat: Reports I/O statistics. iostat iteratively reports terminal and disk I/O activity, as well as CPU utilization.

Various process-level parameters monitored to detect intrusion are:


P1. The number of processes and their types
P2. Relationship among processes
P3. Time elapsed since the beginning of the process
P4. Current state of the process (running, blocked, waiting) and runaway processes
P5. Percentage of various process times (such as user process time, system process time, and idle time)

Commands used to obtain process-level information:

− ps -ef: The ps command prints information about active processes. Without options, ps prints information about processes associated with the controlling terminal, and the output contains only the process ID, terminal identifier, cumulative execution time, and command name.

Parameters that are monitored to gather packet-level (network) information:

N1. Number of connections and connection status (e.g., established, close_wait, time_wait)
N2. Average number of packets sent and received (for a user-defined moving time window)
N3. Duration of the connection
N4. Type of connection (remote/local)
N5. Protocol and port used

Commands used to obtain packet-level information:

− netstat: Shows network status, i.e., displays the contents of various network-related data structures in various formats, depending on the option selected.

Intrusive activities in most cases involve an external connection into the target network by an outsider, though internal misuse is also a crucial concern. It is therefore very important to monitor packets sent across the network, both inbound and outbound, along with packets that remain inside the subnet. Moreover, the number of external connections established and the validity of each connection can be verified using these monitored parameters.

The main process for collecting data is a Korn shell script that performs system checks and formats the data using awk, sed, cat, and grep; the appropriate parameters are filtered from the output of these commands and stored in a file.
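As an illustration of the filtering step, the following Python sketch extracts parameter N1 (number of connections per status) from netstat-style text. It is not the Korn shell script used in the prototype. The sample text imitates the column layout of Linux `netstat -an`, which is an assumption: real layouts differ between Unix variants, and the name `count_connection_states` is hypothetical.

```python
from collections import Counter

# Hypothetical sample in the style of Linux `netstat -an`; real layouts differ by OS.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:22             10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:80             192.168.1.7:40112       TIME_WAIT
tcp        0      0 10.0.0.5:80             192.168.1.8:40113       ESTABLISHED
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""

def count_connection_states(netstat_text):
    """Count TCP connections per state (parameter N1)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Keep only TCP rows that carry a state column (6 fields in this layout).
        if len(fields) == 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts

if __name__ == "__main__":
    print(dict(count_connection_states(SAMPLE)))
```

In a live setting the text would come from running the command itself; the sample string keeps the sketch deterministic.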
2.2 Setting Thresholds

Historical data for the relevant parameters are initially collected over a period of time during normal usage (with no intrusive activities) to obtain a relatively accurate statistical measure of normal behavior patterns. Accordingly, different threshold values are set for different parameters.
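The text does not state how the threshold values are derived from the historical data. One common choice, assumed here purely for illustration, is to place the thresholds at the mean plus increasing multiples of the standard deviation of the normal-usage samples. The function name `threshold_levels` and the multipliers are hypothetical.

```python
from statistics import mean, stdev

def threshold_levels(history, multipliers=(1.0, 2.0, 3.0)):
    """Return ascending thresholds: mean + k * stdev for each k (assumed rule)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two points
    return tuple(mu + k * sigma for k in multipliers)

if __name__ == "__main__":
    cpu_history = [10.0, 12.0, 14.0, 12.0, 12.0]  # made-up CPU-usage percentages
    print(threshold_levels(cpu_history))
```

Wider multipliers give more clearance for legitimate variation at the cost of later alerts, so in practice they would be tuned per parameter.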

Fig. 1. Showing different threshold levels as a measure of degree of deviations

During monitoring, each parameter is checked for deviation by comparing its current value (at different nodes) with the profile of normal usage (and behavior pattern). However, the threshold settings of the monitored parameters need to be updated to accommodate legitimate changes in the network environment. Figure 1 shows an example time series in which the variations in the data pattern indicate the degree of deviation from normal. Note that we used different threshold values for different parameters and allowed enough clearance to accommodate legitimate variations in system usage. Setting the thresholds that determine deviations, as a function of alert level, is tricky: the detection system should not raise alerts for trivial circumstances, but it should also not overlook real possibilities of serious attacks. Each parameter at each level is quantified and encoded in two bits to represent a value between 0 and 3 as the degree of deviation, as shown in Table 1.

Table 1. Binary encoding of normalized parameter values

Value   Binary code   Degree of deviation
0       00            Normal
1       01            Minimal
2       10            Significant
3       11            Dangerous
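The quantization and two-bit encoding of Table 1 can be sketched as follows. The sketch assumes that each parameter has three ascending thresholds and that a value equal to a threshold counts as having reached it; both conventions, the example threshold values, and the names `deviation_level` and `encode_levels` are assumptions, not details given in the text.

```python
from bisect import bisect_right

LEVEL_NAMES = ("Normal", "Minimal", "Significant", "Dangerous")

def deviation_level(value, thresholds):
    """Map a value to 0-3: the number of ascending thresholds it has reached."""
    return bisect_right(thresholds, value)

def encode_levels(levels):
    """Concatenate the two-bit codes of Table 1 for a sequence of levels."""
    return "".join(format(level, "02b") for level in levels)

if __name__ == "__main__":
    cpu_thresholds = (50.0, 70.0, 90.0)  # hypothetical thresholds for one parameter
    level = deviation_level(75.0, cpu_thresholds)
    print(level, LEVEL_NAMES[level])     # 2 Significant
    print(encode_levels([0, 3, 2]))      # 001110
```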

3. Designing Classifier-based Decision Support Module

It is very difficult to develop an intelligent decision support component for intrusion detection systems because of the uncertainties and ambiguities involved. The best approach may be to design an evolvable system that can adapt to its environment. A classifier system is an adaptive learning system that evolves a set of action-selection rules to cope with the environment. The condition-action rules are coded as fixed-length strings (classifiers) and are evolved using a genetic search. These classifiers are evolved based on the security policy; the resulting rule set forms a security model against which the current system environment is compared. In our approach, the security policies are embedded in the definition of normal behavior. In a classifier rule, the condition part represents the amount of deviation (or degree of violation) of the monitored parameters from their normal values (0-3), and the action part represents a specific response (0-7) according to the type of attack. Figure 2 shows the different components of the prototype system. The data fusion module (discussed in Section 3.2.2) combines the parameter values and passes them as an input template to the classifier system.

Fig. 2. Different modules of the classifier-based intrusion detection system, including the Decision Support Subsystem
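To make the condition-action format concrete, the following Python sketch matches a binary input message against fixed-length rules and decodes a three-bit action (0-7). It assumes a Holland-style "don't care" symbol `#` in the condition part and a first-match policy. Neither is stated in the text, and a full classifier system would normally resolve competing rules by strength rather than by order. All names and rules are hypothetical.

```python
WILDCARD = "#"  # assumption: Holland-style "don't care" symbol

def matches(condition, message):
    """True if every non-wildcard symbol of the condition equals the message bit."""
    return len(condition) == len(message) and all(
        c == WILDCARD or c == m for c, m in zip(condition, message)
    )

def select_action(rules, message):
    """Return the action (0-7) of the first matching rule, or None if none match."""
    for condition, action_bits in rules:
        if matches(condition, message):
            return int(action_bits, 2)
    return None

if __name__ == "__main__":
    # Hypothetical rules over a 4-bit message (two parameters, two bits each).
    rules = [("11##", "110"), ("0000", "000")]
    print(select_action(rules, "1101"))  # 6
```

Here the first rule fires whenever the first parameter is at level 3 ("11"), regardless of the second parameter.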


3.1 Creating a High-Level Knowledge Base

The degree of importance of each level (of monitored parameters) is hypothesized based on domain knowledge. The purpose is to generate rules from a general knowledge base designed by experts. Although a more accurate knowledge base will result in more realistic actions, the heuristic rule set that we used can provide similar detection ability. Table 2 gives an example of such a knowledge base; a higher action value indicates a stronger response.

Table 2. Estimates of the effects of intrusive activities at different levels and proposed actions

Hypothesis number   1     2     3
User level          0.2   0.4   0.0
System level        0.0   0.0   0.1
Process level       0.0   0.0   0.2
Packet level        0.1   0.4   0.8
Action              1     3     6

Symbolically, each hypothesis is a 5-tuple (k1, k2, k3, k4, a), where ki ∈ [0.0, 1.0] and a ∈ {0, …, 7}. Suppose that the input template to the classifier contains the message (m1, m2, m3, m4); this message matches the hypothesis rule (k1, k2, k3, k4, a) if and only if: k1