Fixed vs. variable-length patterns for detecting suspicious process behavior

Andreas Wespi (corresponding author), Hervé Debar, Marc Dacier and Mehdi Nassehi
IBM Research Division, Zurich Research Laboratory, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
{anw,dac,mmn}@zurich.ibm.com, [email protected]

May 11, 2004

Abstract

This paper addresses the problem of creating patterns to model the normal behavior of UNIX processes. The pattern model can be used for intrusion-detection purposes. First, we present methods to generate input data sets that enable us to observe the normal behavior of a process in a secure environment. Second, we propose various techniques to derive either fixed-length or variable-length patterns from the input data sets. We show the advantages and drawbacks of each technique, based on the results of the experiments we have run on our testbed.

1 Introduction

In [6], Forrest et al. introduced a change-detection algorithm based on the way natural immune systems distinguish “self” from “nonself”. In [5], they reported preliminary results aimed at establishing such a definition of self for Unix processes. This technique models the way an application or service running on a machine normally behaves by registering the sequences of system calls invoked. An intrusion is assumed to exercise abnormal paths in the executable code, and is detected when new sequences are observed (see also [3, 4, 7, 8]). Like every behavior-based technique, an intrusion-detection system needs to be trained to learn what the “normal” behavior of a process is. This is usually done by recording the activity of the process running in a real environment during a given period. This procedure has several drawbacks:


• It has the potential to produce false alarms if not all possible normal behaviors have been exercised during the recording period, for example when users execute “new”, i.e. previously unseen, commands;

• It has the potential to produce no alarm when an attack is performed if the application was hacked during the recording period and the attack pattern was learned by the system;

• The observed behavior is a function of the environment in which the process is running, which prevents the distribution of an “initialized” intrusion-detection system that could be immediately plugged in and activated.

Another approach consists of artificially creating input data sets that exercise all normal modes of operation of the process. For example, Forrest et al. [5] use a set of 112 messages to study the behavior of the sendmail daemon. This method eliminates the risk of obtaining false positives but not that of obtaining false negatives. Furthermore, it is an extremely complex and time-consuming task to come up with a good input data set.

We make use of the functionality verification test (FVT) suites provided by application developers to observe all the specified behaviors of an application [1]. Not only are we able to eliminate the risk of obtaining false positives with this approach, we also dramatically reduce the risk of obtaining false negatives because, by design, our input data set exercises all the normal (in the sense of having been tested by the designer of the application) modes of operation of the application under study. We present the results of experiments we have been running with this technique.

Besides the use of the FVT suites, our work differs from that of Forrest et al. in three main ways:

• The algorithm used to create fixed-length patterns is slightly different and results in shorter tables of patterns.

• In addition, we consider the use of variable-length patterns [14] and various ways of generating them.

• The decision to raise an alarm is based on a new paradigm that is simpler than the Hamming distance described in [5], thus allowing possible real-time countermeasures.

The structure of the paper is as follows. Section 2 presents the principles of the intrusion-detection method we are working with. It explains how patterns are generated and used to cover sequences. Section 3 shows how we create the reference information by either recording and replaying user sessions or by using the FVT suites. Section 4 presents the experimental results obtained when applying a fixed-length pattern-generation algorithm. Section 5 does the same for a variable-length pattern-generation algorithm. These two sections also highlight advantages and disadvantages of each technique. Section 6 concludes the paper by offering a comparison of the results as well as ideas for future work.

2 Principles of the approach

In our approach, UNIX processes are described by the sequences of audit events [15] that they generate, from start (fork) to finish (exit). (Note that Forrest et al. [5, 6] use sequences of system calls instead of audit events; obtaining system calls constitutes a more intrusive technique than the one we present here.) Their normal behavior is modeled by a table of patterns, which are sub-sequences extracted from these sequences. The detection process relies on the assumption that when an attack exploits vulnerabilities in the code, new (i.e. not included in the model) sub-sequences of audit events will appear. Figure 1 describes the complete chain used to configure the detection tool.

Figure 1: Intrusion-detection system. (Upper, off-line part: the training system records audit events from the ftp server, translates them into letters, passes them through filtering, reduction and aggregation, and extracts the patterns that form the model, a table of known patterns such as ABCD, ABGE, AEFG. Lower, on-line part: the intrusion-detection system applies the same translation, filtering and aggregation to the live audit stream and runs the pattern matcher, e.g. ABGE matched, ABCD matched, AECFG alarm.)

2.1 Off-line treatment

The upper part of Fig. 1 represents how the model of normal behavior is created off-line. Audit events are recorded from the ftp daemon, which has been triggered by an experiment process, and translated into letters for easier handling. Such a letter in our system actually represents the combination of the audit event and the name of the process generating it, both pieces of information being provided by the Unix C2 audit trail. This recording runs through a filtering and reduction process whose purpose is explained below. When the experiment has been completed, the entire audit information is used to generate the table of patterns that constitutes our model of the normal behavior of the system.

The purpose of the filtering/reduction/aggregation box is threefold (explained in more detail in the remainder of this section): filtering to eliminate irrelevant events (processes that are not related to the services we monitor) and to sort the remaining ones by process number; reduction to remove duplicate sequences due to processes generating exactly the same stream of audit events from start (fork) to finish (exit) (the reduction is not used in the on-line version because of the memory cost and algorithmic complexity of keeping all known process images); and aggregation to remove consecutive occurrences of the same audit events.

Sorting prevents the introduction of arbitrary context switches into the audit stream by the operating system. Processes are sorted so that the intrusion-detection process is applied to each process individually. The events inside a process remain in the order in which they have been written to the audit trail.

The reduction keeps only unique process images for model extraction. We use test suites to record the normal behavior of the ftp daemon. These test suites carry out many repetitive actions and therefore result in many instances of the same process image (see [1] for a complete explanation of this issue). We remove these duplicates because they are not used by the current algorithm extracting the patterns from the reference audit data.

The aggregation comes from the observation that strings of N consecutive identical audit events (N > 20) are quite frequent for certain events, with N exhibiting small variations. A good example is the “ftp login” session, where the ftp daemon closes several file handles inherited from inetd. The number of these closes changes for no apparent reason. Therefore, we have simplified the audit trail to obtain a model that contains fewer and shorter patterns, and have observed no differences in experiments that do or do not use the aggregation. There is no claim that the reduced (simplified) audit trail and the original one are equivalent; the new one possibly has less semantic content. This is an experimental choice, and we would remove the aggregation part if the true/false alarm performance of the intrusion-detection system were not satisfactory. This aggregation phase would also be omitted if the intrusion-detection technique required redundancy, e.g. neural networks.

The most obvious aggregation consists of replacing identical consecutive audit events with an extra “virtual” audit event. However, doing so enriches the vocabulary (the number of registered audit events) and possibly the number of patterns, which we want to keep small. Our solution simply aggregates these identical consecutive audit events. Therefore, any audit event represents “one or more” such events (i.e., A = A+ in regular-expression formalism) at the output of the aggregation box. The pattern-extraction procedure is discussed in detail in Sections 4 and 5.
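As an illustration of the aggregation step, the short Python sketch below (the event letters, the function name and the `level` parameter are ours) collapses runs of identical consecutive events. With level = 1 every output event stands for “one or more” occurrences (A = A+); the parameter corresponds to the aggregation levels reported later in Tables 1 and 2.

```python
from itertools import groupby

def aggregate(events, level=1):
    """Collapse runs of identical consecutive audit events.

    A run of `level` or more identical events is reduced to exactly `level`
    events, so a run of A's becomes A for level=1 and AAA for level=3 (AAA = AAA+).
    """
    out = []
    for _, run in groupby(events):
        run = list(run)
        out.extend(run[:level] if len(run) >= level else run)
    return out

# Example: repeated events (e.g. the FILE_close events seen at ftp login)
# are collapsed before pattern extraction.
print("".join(aggregate("DAABGEABCDDAEFG", level=1)))  # DABGEABCDAEFG
```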

2.2 Real-time detection

The lower part of Fig. 1 shows the real-time intrusion-detection process. Audit events are again generated by the ftp daemon, and go through the same sorting and reduction mechanism in real time. Then, we apply a pattern-matching algorithm (Section 2.3) to cover the sequences on the fly. When no known pattern appropriately matches the current stream of audit events, a decision has to be taken whether to raise an alarm (see Section 2.4).

2.3 Pattern-matching algorithm

The pattern-matching algorithm is quite critical for the performance of the intrusion-detection system. We wish to maximize both its speed and its detection capabilities. Therefore, we require that patterns match exactly, i.e., they cannot be matched as regular expressions using wildcards. We use a two-step algorithm: the first step looks for an exact match, and the second step looks for the best partial match possible. These two steps are illustrated with the examples presented in Figs. 2 and 3.

Figure 2: Exact pattern-matching sequence. (Pattern table: ABCD, XYZD, ABC, GHI, XYZ, HF. String: ABCABCDXYZGHI. ABC is selectable; the remainder ABCDXYZGHI is then covered by ABCD, XYZ and GHI, so ABC is validated: 3 consecutive patterns yield an exact match.)

The first step of the algorithm is illustrated in Fig. 2. A pattern that matches the beginning of the string is selected from the pattern table. If no pattern matches the beginning of the string, the first event is counted as uncovered and removed, and the algorithm is subsequently applied to the remainder of the string. Once a selectable pattern has been found, the algorithm recursively looks for other patterns that exactly cover the continuation of the string, up to a given depth D or to the end of the string, whichever comes first (D = 3 in Fig. 2). This means that, for a selectable pattern to be chosen, we must find D other patterns that also completely match the events following that pattern in the sequence. If such a sequence of D patterns does not exist, step 2 of the algorithm is used. The rationale behind this is that we want to know whether the selected pattern has a positive influence on the coverage of the audit events in its vicinity. When such a D-pattern sequence has been found, the algorithm removes the head pattern from the sequence and continues with the remainder of the string.

Figure 3: Approximate pattern-matching sequence. (Pattern table: ABCD, XYZD, ABC, GHI, XYZ, HF. String: ABCABCDKMWHF. After ABC has been matched, the remainder ABCDKMWHF admits no exact cover; the partial cover starting with ABCD is selected, the events K, M and W are shifted, and KMW remains as an uncovered sequence.)

If we cannot find three other patterns that match the sequence after the selected pattern, we deselect it and try the next selectable pattern in the table. The fact that there were selectable patterns – but none that fulfill the depth requirement – triggers the second phase of the algorithm. The second step of the algorithm deals with failure cases and is illustrated in Fig. 3. The algorithm looks for the sequence of N patterns that covers the largest number of audit events. This sequence is removed from the string and the algorithm goes back to its first part, looking again for a selectable pattern, and shortening the string as long as one is not found.
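For illustration only, the following Python sketch implements a simplified version of the two-step matcher described above: step 1 validates a selectable pattern by requiring an exact cover of the continuation up to depth D, and a deliberately simplified longest-match fallback stands in for the best-partial-cover search of step 2. The function names and the fallback heuristic are ours, not the implementation used on the testbed; the demo reproduces the examples of Figs. 2 and 3.

```python
def _exactly_coverable(s, table, depth):
    # True if the front of s can be exactly covered by `depth` consecutive
    # patterns, or if the end of the string is reached first.
    if not s or depth == 0:
        return True
    return any(_exactly_coverable(s[len(p):], table, depth - 1)
               for p in table if s.startswith(p))

def _validated_head(s, table, depth):
    # Step 1: a selectable pattern is validated only if `depth` further
    # patterns (or the end of the string) exactly cover what follows it.
    for p in table:
        if s.startswith(p) and _exactly_coverable(s[len(p):], table, depth):
            return p
    return None

def cover(string, table, depth=3):
    """Cover `string` with patterns from `table`; return the lengths of the
    groups of consecutive uncovered events (an empty list means full coverage)."""
    groups, current, s = [], 0, string
    while s:
        p = _validated_head(s, table, depth)
        if p is None:
            # Simplified stand-in for step 2: take the longest pattern that
            # matches the front; if none matches, the first event stays uncovered.
            matches = [q for q in table if s.startswith(q)]
            p = max(matches, key=len) if matches else None
        if p is None:
            current += 1
            s = s[1:]
        else:
            if current:
                groups.append(current)
                current = 0
            s = s[len(p):]
    if current:
        groups.append(current)
    return groups

table = ["ABCD", "XYZD", "ABC", "GHI", "XYZ", "HF"]
print(cover("ABCABCDXYZGHI", table))  # []  -> exact cover, as in Fig. 2
print(cover("ABCABCDKMWHF", table))   # [3] -> uncovered sequence KMW, as in Fig. 3
```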

2.4 Intrusion detection

For each string, the algorithm extracts both the number of groups of audit events that were not covered and the length of each of these groups. The decision whether to raise an alarm is based on the length of the uncovered sequences. It is worth noting that this is different from the measures used for the same purpose by Forrest et al., namely the amount and the percentage of uncovered events in a string [5]. There are three reasons for this:

• Processes can generate a large number of events. An attack can be hidden in the midst of a set of normal actions. Using the percentage of uncovered characters would fail to detect the attack in this case.

• Small processes could be falsely flagged as anomalous because of a few uncovered events if we use the percentage of uncovered characters.

• Long processes could be falsely flagged as anomalous because of many isolated uncovered events if we use the amount of uncovered characters.

During our experiments, we observed that the trace of the attacks was represented by a number of consecutive uncovered characters. This is consistent with the findings of Kosoresow and Hofmeyr [8], who note that mismatches due to intrusions occur in fairly distinct bursts in the sequences of system calls that they monitored. We therefore assume that an attack cannot be carried out in fewer than T consecutive characters. Our experiments led us to choose the value 7 for T. In other words, each group of more than six consecutive uncovered events is considered anomalous and is flagged as an attack. This amounts to defining an intrusion as any event that generates a sequence of at least T consecutive events not covered by the algorithm defined above.

To validate the concepts presented here, we have developed an intrusion-detection testbed [2]. It is used to compare various strategies to build the pattern table used in the detector. Before reporting the results of the experiments performed on the testbed, we address the problem of generating the input data sequences for the intrusion-detection system. We focus on the ftp service, which is widely used, is known to contain many vulnerabilities (due either to software flaws or to configuration errors), and provides rich possibilities for user interaction.
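The alarm decision itself then reduces to a one-line check on the uncovered groups; the helper name below is ours, and its input is the list of group lengths produced by a matcher such as the `cover` sketch in Section 2.3.

```python
def is_intrusion(uncovered_group_lengths, T=7):
    """Flag a process if any group of consecutive uncovered events reaches T."""
    return any(length >= T for length in uncovered_group_lengths)

# The groups reported in Fig. 4 are all of length 6 or less, so no alarm is raised.
print(is_intrusion([1, 1, 1, 1]))  # False
print(is_intrusion([6]))           # False
print(is_intrusion([2, 8]))        # True: at least seven consecutive uncovered events
```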

3 Generating the reference data

The most commonly practiced method for training behavior-based intrusion-detection systems is to record the activity of the users on a normal workday. This recording is used as the reference information from which the process model is extracted during the training phase. In Section 3.1 we present an improvement of this method. On our testbed we can reproduce the reference information and record it under varying external conditions. Our recording procedure ensures that the reference information is attack-free. Section 3.2 describes various methods to systematically generate all possible user activities. We concentrate on a method that uses the functionality verification test (FVT) suite to exercise all subcommands of a given process.

3.1 Experimental reference data generation

The first approach we present to create the reference data is to simulate user sessions. We wish to do the recording in a closed environment to ensure that no attacks against the ftp server take place. We achieve this goal by asking several participants in our laboratory to record ftp sessions against an internal ftp server supporting both normal user and anonymous connections. The ftp server is then reconfigured inside a closed network to prevent interactions with the outside world. Once this recording process is complete, we automatically transform the recordings into simple expect [9] scripts, which recreate the interactive sessions. User-related information such as the username, password, and target ftp server is kept variable to allow the parallelization of the scripts. Additional scripts are added to the database to perform specific actions, such as using tar and compress to transfer files, that were not in the original configuration of the recordings.

This need comes from the attack simulator. As some attacks involve the use of tar and compression programs to create setuid shells, our normal user activity also has to invoke these commands. We performed a number of experiments to record the reference behavior from which the model is extracted. These experiments involve (1) varying the load on the server (number of simultaneous connections), (2) changing the network interface (Ethernet or Token-Ring), and (3) selecting different subsets of scripts. We separate our user scripts into four groups: all the scripts that we have in the database; the scripts involving normal user behavior; the scripts involving anonymous user behavior; and the scripts involving normal user behavior but without those that have been added to include the usage of tar and compress (via the site exec command).

The experimentation results are presented in Table 1. The first column describes the experiment, the second gives the total number of scripts run, the third the total number of audit events generated, the fourth the total number of processes created (output of the filtering), the fifth the number of unique processes generated (output of the reduction), and columns six to eight the number of unique processes generated with aggregation levels of 3, 2, and 1, respectively. An aggregation level of N, N = 1, 2, 3, means that N or more identical audit events are aggregated to exactly N such events (e.g., AAA = AAA+ for N = 3).

Table 1: Collection of user data.

| Experiment description                | No. of scripts | No. of audit events | After filtering | After reduction | Agg. 3 | Agg. 2 | Agg. 1 |
| All scripts on Ethernet interface     | 55             | 111521              | 430             | 70              | 70     | 70     | 69     |
| All scripts on Token-Ring interface   | 55             | 105064              | 430             | 70              | 70     | 70     | 69     |
| All scripts simultaneously on both    | 110            | 201357              | 860             | 70              | 70     | 70     | 69     |
| Simulation of light load              | 200            | 494126              | 1604            | 70              | 70     | 70     | 69     |
| Simulation of heavy load              | 600            | 1082548             | 4583            | 70              | 70     | 70     | 69     |
| All scripts                           | 500            | 849095              | 3931            | 72              | 72     | 72     | 71     |
| Normal user scripts                   | 500            | 299048              | 4008            | 25              | 25     | 25     | 24     |
| Anonymous user scripts                | 500            | 1038129             | 3764            | 49              | 49     | 49     | 49     |
| Normal user scripts w/o “site exec”   | 500            | 1118403             | 4178            | 40              | 40     | 40     | 40     |

Table 1 shows that the number of processes generated is in direct proportion to the number of scripts run. When we look for unique processes, we always find the same number of different processes (70). Aggregation does not reduce this number significantly. This is radically different from observations (in the Linux environment) made using the strace command, which monitors at the system-call level. There, we observed variances in the number of different processes generated from experiment to experiment, mainly because of operating-system context switches and signals. This shows that even a low-level information source such as the C2 audit trail gives some abstraction to the data, such as eliminating context switches, signals, and interrupts.

3.2 Systematic reference data generation

Employing user-“controlled” manipulations to generate the reference behavior of the service is not optimal. One requirement is that in addition to being attack-free and reproducible, the reference behavior must contain all the possible legitimate commands supported by the ftp service, and that these commands must be received by the ftp server from the network interface. A second requirement is that these commands are either generated by an ftp client or directly “manufactured”, for example, by a packet-spoofing tool.

3.2.1 Software engineering and source-code analysis

The first possibility to generate the reference data is to extract the complete call graph [10] of the code considered. This process is very costly, and is only possible by acquiring source code for the service investigated. Given the increasing complexity of programs and the fact that many system calls would actually happen inside libraries for which source code may not be available, we have eliminated this course of action because it is too cumbersome. Moreover, this technique does not fit our client-side requirement.

3.2.2 Macroscopic simulation as a finite-state automaton

This second approach is based on a finite-state automaton that implements all possible actions that can be performed by a user. Generating normal behavior simply means running the automaton. However, only a very limited subset of these commands is actually known to users and used by them. Therefore, it would make sense to enrich the automaton with transition probabilities. This approach is very difficult to implement for three main reasons:

• The values of the transition probabilities are difficult to determine, and specific to the experimental environment.

• This approach is only suitable when the set of commands is limited. It can simulate an ftp client but not a normal user shell session.

• Even in simple cases such as ftp clients, implementation problems can arise. Indeed, ftp clients on different platforms sometimes have a different syntax, and the commands they generate make the ftp server react in different ways.

These difficulties encountered in modeling ftp activity also apply to other services. Therefore, this approach was dropped from the current implementation of the workbench.

3.2.3 Approximating user behavior with test suites

A simpler approach consists of using the test suites provided by the operating-system development laboratories. In our case, we obtained access to the test suite used by the AIX development team. This test suite, called functionality verification tests (FVT), tests the compliance of the implementation of the TCP/IP services with their specification from the functional point of view, i.e. it tests the server functionality using the client in the case of ftp. The FVTs exercise all possible commands a user can send to an ftp server. These FVTs do not offer a realistic view of what users usually do, but at least they show how the system should behave when responding to normal requests. Actually, these test cases implement very simple scripts to test one command at a time. In that sense, they can be considered as traversing a subgraph of the user sessions in which the length of the path does not exceed one or two commands between login and logout.

This approach is also interesting for evaluating the false-alarm rate of knowledge-based intrusion-detection systems. Because the FVT will make the network daemons generate all sorts of events that do not (or very infrequently) occur in real-life systems, these events can be confused with attack scenarios. In this case, every alarm generated by a knowledge-based intrusion-detection system is a false alarm. If the intrusion-detection system has rules governing the normal behavior of the network service, running such a test suite allows us to check the completeness of the normal-behavior model and also to interpret alarms generated by the intrusion-detection system as false alarms. However, the capability to fully train a behavior-based intrusion-detection system on this kind of data is still an open issue, as the profiles may contain behavior rarely exercised by the users.

3.3 Experiments with the FVT

This section shows how we use the FVT data to record different reference information sets, from which process models are then generated. Recording the FVT sessions was done by simply running all available tests. Table 2 presents the same results as Table 1, but for the FVT data collection. Three separate runs of the FVT were made to compare the resulting reference information. Data collection took place on an isolated network, with as little interference as possible from other applications.

Table 2: Collection of FVT data.

| Experiment: sequential run of all available FVT tests | No. of scripts | No. of audit events | After filtering | After reduction | Agg. 3 | Agg. 2 | Agg. 1 |
| Run 1                                                  | 487            | 87793               | 401             | 77              | 73     | 65     | 57     |
| Run 2                                                  | 487            | 87813               | 401             | 73              | 68     | 62     | 58     |
| Run 3                                                  | 487            | 89128               | 401             | 71              | 67     | 61     | 56     |

The first observation drawn from Table 2 is that 77 unique process images for 487 tests is a very small number. In fact, many of these tests are redundant, i.e. they transfer exactly the same file in different modes (e.g. ascii, ebcdic, struct, record). The basic sequence (login, set parameters, transfer, logout) is repeated several times, but the parameter changes affect only the network level, and not the system. Therefore, we do not observe changes in the audit trail. The second observation is that, contrary to the user-simulation case, aggregation reduces the number of different processes in a significant way. Even with these very similar processes, the recording of audit events is sometimes still not completely identical. In particular, the login sequence depends on the number of open file handles handed to the ftpd daemon by the inetd daemon. We extract the model of behavior from these process images, which we regard as our reference information.

4 Generating fixed-length patterns

4.1 Pattern generation

Fixed-length patterns are the most immediate approach to generate the table of patterns. We have evolved four ways of obtaining these patterns. For each of them, we use a parameter L, the length of the patterns.

Technique 1 is an exhaustive generation of all possible patterns by sliding a window of size L across the sequence, thus recording each unique L-event sequence. This technique is easy to implement, but it has obvious drawbacks. First, a large number of patterns in the reference table will not be used by the matching algorithm on the reference data set. Moreover, the covering algorithm generates boundary mismatches when the length of the process image is not a multiple of L, because it uses a juxtaposing window instead of a sliding one (which would not make sense in this context). It is worth noting that this is the technique used to run the experiments described in [4, 5, 8].

To solve the first problem, technique 2 is introduced, which shifts the string by L events, thus creating juxtaposed patterns instead of overlapping ones. In other words, the events in positions 1 to L constitute the first pattern, the events in positions L+1 to 2L constitute the second one, and so on. The table we obtain is thus much shorter than the table generated by technique 1, which makes the pattern-matching algorithm faster. Technique 2 drops the remainder of the sequence of events when its length is not a multiple of L, so the second problem mentioned above remains. To solve it, we have introduced techniques 3 and 4. They are similar to technique 2 but keep the events remaining at the end of the string, either as a stand-alone pattern or concatenated to the last generated pattern, respectively.

Techniques 3 and 4 both result in 100% coverage of the test sets. As they are the ones that were most promising during the experiments, we focus on them for the remainder of this paper. Note that some of the patterns are smaller or larger than the specified size; these can only match at the end of the process execution, as they necessarily contain the end-of-process event.
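As a minimal Python sketch of the four variants (the function name and the example process image are ours; only the handling of the trailing remainder differs between techniques 2, 3 and 4, as described above):

```python
def fixed_length_patterns(sequence, L, technique):
    """Build the set of fixed-length patterns for one process image.

    technique 1: all unique L-grams seen through a sliding window
    technique 2: juxtaposed L-grams, trailing remainder dropped
    technique 3: juxtaposed L-grams, remainder kept as a stand-alone pattern
    technique 4: juxtaposed L-grams, remainder concatenated to the last pattern
    """
    if technique == 1:
        return {sequence[i:i + L] for i in range(len(sequence) - L + 1)}
    chunks = [sequence[i:i + L] for i in range(0, len(sequence), L)]
    if chunks and len(chunks[-1]) < L:
        if technique == 2:
            chunks.pop()                   # drop the remainder
        elif technique == 4 and len(chunks) > 1:
            tail = chunks.pop()
            chunks[-1] += tail             # concatenate to the last full pattern
        # technique 3 keeps the short remainder as its own pattern
    return set(chunks)

image = "DABGEABCDAEFG"   # a made-up 13-event process image, not a multiple of L = 4
print(sorted(fixed_length_patterns(image, 4, 3)))  # ['DABG', 'DAEF', 'EABC', 'G']
print(sorted(fixed_length_patterns(image, 4, 4)))  # ['DABG', 'DAEFG', 'EABC']
```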

4.2 Experimental results

We have used three test sets in our experiments. The first test set is solely based on the FVT. The second test set includes additional information on tools that are used by ftp users, but are not part of the testing of the ftp daemon per se. The third data set is based on the recording of several hundred real user sessions. The experimental results are presented for the second test set. The reference table generated from the first set does not contain patterns for commands like tar, which are used by normal users. Using the third set breaks the rule stated in the introduction of the paper (no user recording for training), so this set has only been used to evaluate the false negative rates obtained with our approach. The output of the intrusion-detection system is shown in Fig. 4. Each line reports the analysis for one process related to the ftp service.

Figure 4: Output of the intrusion-detection system.

| Audit events | Uncovered events | % covered | Patterns needed | Avg. pattern size | Uncovered groups | Group lengths   |
| 569          | 4                | 99.30     | 188             | 3.01              | 4                | 1 1 1 1         |
| 97           | 0                | 100.00    | 32              | 3.03              | 0                |                 |
| 928          | 3                | 99.68     | 308             | 3.00              | 3                | 1 1 1           |
| 735          | 3                | 99.59     | 244             | 3.00              | 3                | 1 1 1           |
| 6            | 0                | 100.00    | 2               | 3.00              | 0                |                 |
| 878          | 8                | 99.09     | 290             | 3.00              | 8                | 1 1 1 1 1 1 1 1 |
| 839          | 4                | 99.52     | 278             | 3.00              | 4                | 1 1 1 1         |
| 8            | 6                | 25.00     | 1               | 2.00              | 1                | 6               |
| 963          | 2                | 99.79     | 320             | 3.00              | 1                | 2               |
| 168          | 2                | 98.81     | 55              | 3.02              | 2                | 1 1             |
| 476          | 1                | 99.79     | 158             | 3.01              | 1                | 1               |
| 366          | 3                | 99.18     | 121             | 3.00              | 3                | 1 1 1           |

Table 3 shows the results of the experiments to monitor our ftp server under various conditions using the intrusion-detection system loaded with several reference tables. The first column identifies the experiment. The second column lists the number of audit events in the experiment. The third column indicates the two parameters corresponding to the reference table used in the intrusion-detection system: t denotes the generation technique (3 or 4) and s the size of the patterns (2, 3 and 4). The fourth column lists the overall number of uncovered characters. The fifth column lists the number of patterns used for the coverage. The sixth column lists the average size of the patterns (truncated to two places after the decimal point), and the seventh column indicates the number of false alarms triggered by the intrusion-detection system. As described in Section 2.4, alarms are raised if there are more than 6 uncovered events in a row.

Table 3: Experimental results for normal user sessions.

| Simulation description                          | Audit events | Technique and size | Uncovered events | Patterns required | Avg. pattern size | False alarms |
| Simulation of user activity on loaded ftp server| 26148        | t=3, s=2           | 11               | 13053             | 2.00              | 0            |
|                                                 |              | t=3, s=3           | 185              | 8652              | 3.00              | 0            |
|                                                 |              | t=3, s=4           | 437              | 6440              | 3.99              | 4            |
|                                                 |              | t=4, s=2           | 12               | 13015             | 2.01              | 0            |
|                                                 |              | t=4, s=3           | 196              | 8611              | 3.01              | 0            |
|                                                 |              | t=4, s=4           | 450              | 6380              | 4.03              | 4            |
| Sequence of all available user simulations      | 26094        | t=3, s=2           | 11               | 13029             | 2.00              | 0            |
|                                                 |              | t=3, s=3           | 199              | 8621              | 3.00              | 0            |
|                                                 |              | t=3, s=4           | 444              | 6431              | 3.99              | 4            |
|                                                 |              | t=4, s=2           | 12               | 12985             | 2.01              | 0            |
|                                                 |              | t=4, s=3           | 213              | 8586              | 3.01              | 0            |
|                                                 |              | t=4, s=4           | 511              | 6362              | 4.02              | 4            |
| Normal user sessions                            | 20135        | t=3, s=2           | 9                | 10052             | 2.00              | 0            |
|                                                 |              | t=3, s=3           | 108              | 6675              | 3.00              | 0            |
|                                                 |              | t=3, s=4           | 257              | 4978              | 3.99              | 1            |
|                                                 |              | t=4, s=2           | 10               | 10025             | 2.01              | 0            |
|                                                 |              | t=4, s=3           | 116              | 6643              | 3.01              | 0            |
|                                                 |              | t=4, s=4           | 264              | 4936              | 4.03              | 1            |
| Anonymous user sessions                         | 6067         | t=3, s=2           | 2                | 3028              | 2.00              | 0            |
|                                                 |              | t=3, s=3           | 79               | 1994              | 3.00              | 0            |
|                                                 |              | t=3, s=4           | 180              | 1477              | 3.99              | 3            |
|                                                 |              | t=4, s=2           | 2                | 3013              | 2.01              | 0            |
|                                                 |              | t=4, s=3           | 82               | 1983              | 3.02              | 0            |
|                                                 |              | t=4, s=4           | 186              | 1455              | 4.04              | 3            |
| Normal user sessions not using “site exec”      | 19439        | t=3, s=2           | 9                | 9705              | 2.00              | 0            |
|                                                 |              | t=3, s=3           | 108              | 6443              | 3.00              | 0            |
|                                                 |              | t=3, s=4           | 257              | 4802              | 3.99              | 1            |
|                                                 |              | t=4, s=2           | 10               | 9685              | 2.01              | 0            |
|                                                 |              | t=4, s=3           | 116              | 6416              | 3.01              | 0            |
|                                                 |              | t=4, s=4           | 264              | 4769              | 4.02              | 1            |

Table 3 shows that our intrusion-detection system has the potential to perform well without generating false alarms. Indeed, for patterns of length 2, independent of the technique used (3 or 4), the intrusion-detection system does not generate false alarms. For patterns of length 4, independent of the technique used, the intrusion-detection system generates false alarms. For patterns of length 3, the generation of false alarms depends on the technique used. With technique 3, which introduces more patterns but also more freedom (some patterns can be smaller than the required length of 3), the intrusion-detection system does not generate false alarms, whereas it will generate false alarms if technique 4 is used.

It is also worth mentioning that if we use the third test set, which is closely based on user-behavior observations, instead of the second one, we observe no false alarms for any of the (pattern size, technique) combinations. However, using user-generated data to train the intrusion-detection system breaches the requirements expressed in the introduction.

To evaluate the risk of not detecting an attack (by this we denote the possibility of running an attack that creates a new sequence of audit events without being detected, which is possible if the new sequence consists of sub-sequences that are valid, i.e. present in the reference table), Table 4 shows the number of valid sequences of six audit events that can be created out of the table (six, because in this case T = 7). This is the most obvious measure, but it does not account for the full risk, as an attack could be the combination of legitimate sequences with short illegal ones. The table clearly shows that a small increase in the length of the patterns dramatically reduces the degree of freedom in the table, and we consider this a very important result. Also, the technique used to generate the patterns has a dramatic impact on the number of combinations, as small patterns (typically of length 1, potentially generated by technique 3) significantly increase the probability of covering attacks, i.e. of not detecting them.

Table 4: Statistics on fixed-length tables.

| Technique and size | Size of table | Sizes of patterns | Legitimate combinations of length < T (six events) |
| t=3, s=2           | 87            | {1, 2}            | 1142100 |
| t=3, s=3           | 114           | {1, 2, 3}         | 11559   |
| t=3, s=4           | 152           | {1, 2, 3, 4}      | 590     |
| t=4, s=2           | 86            | {2, 3}            | 456456  |
| t=4, s=3           | 113           | {3, 4, 5}         | 9905    |
| t=4, s=4           | 147           | {4, 5, 6, 7}      | 143     |
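As we read this measure, it can be computed by enumerating every string of exactly six events that is a concatenation of patterns from the table. The sketch below is our own illustration of that computation on a toy table; it is not the code used to produce Table 4, and it becomes impractical for large tables, where a dynamic-programming count would be used instead.

```python
def legitimate_sequences(table, length):
    """All distinct strings of exactly `length` events that can be written as
    a concatenation of patterns from `table` (patterns may be reused)."""
    reachable = {""}
    for _ in range(length):            # the shortest pattern has length >= 1
        reachable |= {s + p for s in reachable for p in table
                      if len(s) + len(p) <= length}
    return {s for s in reachable if len(s) == length}

# Toy table: three patterns of length 2 and one of length 1.
toy_table = ["AB", "CD", "EF", "G"]
print(len(legitimate_sequences(toy_table, 6)))  # 97 sequences of six events
```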

We conclude that a tradeoff has to be made between short and long patterns. On the one hand, long patterns reduce the risk of not detecting attacks. On the other hand, long patterns result in larger pattern tables, thereby reducing the performance of the pattern-matching process, and are prone to generate false positives.


We have also carried out a number of attacks against our ftp server to validate our approach. Table 5 shows our experimentation results for nine attacks. An X in a column indicates that the attack has been detected.

Table 5: Experimental results for attack detection. (Rows: put forward; put rhost; site copy; site exec copy ftpd; site exec copy ls; site exec ftpbug ftpd; tar exec copy ftpd; tar exec copy ls; tar exec ftpbug ftpd; tar exec ftpbug ls. Columns: the six reference tables t=3/s=2, t=3/s=3, t=3/s=4, t=4/s=2, t=4/s=3, t=4/s=4, with an X marking each detected attack.)

Table 5 reveals that three attacks pose a problem to the intrusion-detection system, but for very different reasons.

The “put rhosts” attack consists of putting a .rhosts file in the home directory of the ftp user. This vulnerability results from a misconfiguration of the ftp service, because this directory should obviously not be world-writable. Actually, the “put forward” and the “put rhosts” attacks use the same vulnerability, except that one of them puts a .forward file and the other one a .rhosts file. The first one is detected because our attack script activates the attack by sending a mail to the ftp user, thus triggering the execution of the .forward file by the sendmail daemon. The second attack is not triggered because our attack script does not log on to the attacked machine. Any login attempt would be detected immediately because a shell would be spawned for the ftp user, and this behavior does not belong to the reference data set.

The two “tar exec copy” attacks use the option of GNU tar to specify a compression program that will be used by tar to uncompress the data before extracting it. In our case, we have developed a Trojan horse that does no decompression, but copies /bin/sh to /tmp/.sh as a setuid executable. This attack exploits the fact that some ftp daemons do not release their root privileges quickly enough before forking other processes. Therefore, tar and the Trojan horse both run as root, and /tmp/.sh is created setuid root. Our Trojan horse is compiled and saved under the name of either ls or ftpd (hence the code names of the attacks). These attacks are only detected by the more restrictive tables of patterns of length 4. We observe that the patterns modeling the normal behavior of an ls process in our reference table are more restrictive than those modeling ftpd, as the masquerading of our attack program under the process name “ls” is detected by (t=4, s=3) but not the masquerading under the process name “ftpd” with the same pair.

This second case also highlights one of the difficulties of this intrusion-detection technique: the reference data set must be truly restrictive and exhaustive, so that the patterns can faithfully represent the information. In our case, tar is not a command that is tested by the test suites we use to generate the reference data set, because it is not considered to be strictly related to ftpd. We have added some examples of usage of the tar command to our reference data set, but forking commands is a legitimate behavior of the tar command, and both the ftpd and ls patterns cover the execution of our Trojan horse. Lowering the alarm threshold T would improve detection with the smaller reference tables, but it would also increase the false-alarm rate.

5 Generating variable-length patterns

5.1 Rationale for variable-length patterns

The fixed-length pattern approach yields interesting results, but a careful look at the sequences of audit events shows that there are very long sub-sequences that repeat themselves frequently. For example, more than 50% of the process images we have obtained start with the same string. This string contains 40 audit events. Therefore, patterns capturing these very long sequences would represent our data better. This intuition is corroborated by the fact that the ftp daemon responds to commands given by the user, and that each command can probably be represented by a (set of) long sequence(s) of audit events. Therefore, having the capability to extract variable-length patterns from our data makes sense [14]. However, we are also looking for an automated approach. Manual pattern extraction might be feasible for small-scale experiments, but is very time-consuming. Therefore, we have looked for a method that allows us to generate variable-length patterns automatically.

5.2 Description of the pattern-generation technique

Figure 5: Variable-length pattern generation. (Block diagram: the sample is fed to a suffix-tree constructor, which produces the suffix tree; a pattern pruner, controlled by the pruning factor, turns the suffix tree into the pattern tree; a pattern selector, which processes the sample again, reduces the pattern tree to the reduced tree.)

A block diagram of the variable-length pattern generator is shown in Fig. 5 [11]. It consists of a suffix-tree constructor, a pattern pruner, and a pattern selector.

The suffix-tree constructor processes the sample of user behavior as a sequence of symbols, i.e., a string, to generate its suffix tree. The well-known suffix-tree structure is a dictionary of all substrings in the string [13]. An example of a string and its corresponding suffix tree is given in Fig. 6a. The suffix tree initially consists of only a root. Then all suffixes of the sample string are added one by one until the complete suffix tree is obtained. Note that a “$”, representing an end delimiter, is added to the end of the string to ensure that the suffix tree contains one leaf corresponding to each suffix. Every node corresponds to the substring represented by the symbols on the path from the root to that node. The number next to a node represents the number of occurrences of the corresponding substring in the sample string.

Figure 6: Pattern extraction. (Sample of normal behavior: ACABACBDBABD$. Panel (a) shows the suffix tree of the sample, panel (b) the pattern tree obtained after pruning, and panel (c) the reduced tree; each node is annotated with the number of occurrences of the corresponding substring in the sample.)

The pattern pruner prunes the suffix tree in order to generate a tree referred to as a pattern tree. The pruning is based on the pruning factor, a real number between 0 and 1. This factor is multiplied by the length of the sample of user behavior to obtain a lower limit on the number of occurrences of patterns. Every pattern with a number of occurrences less than this minimum is clipped, i.e., all its outgoing branches are removed. Figure 6b shows the pattern tree obtained when the suffix tree is pruned with a pruning factor of 3/12. The number of occurrences of each pattern in the sample string is indicated next to the node corresponding to that pattern. As the pruning factor is increased, the suffix tree is pruned more and, hence, the pattern tree contains fewer and shorter patterns. We can use this flexibility to try different sets of patterns and choose the one that results in the most satisfactory detector performance.

From the set of pruned patterns, the selector removes those patterns that are not required to cover the sample of user behavior. In our example, the patterns AB, AC, BA, BD of the pattern tree are sufficient to cover the sample of user behavior; the patterns C and D are not needed. The resulting reduced tree is shown in Fig. 6c. The selector identifies the needed patterns by processing the sample with a pattern matcher identical to that of the detector. First, it loads the matcher with the pruned-pattern tree and sets all the nodes to not marked. It then performs matching on the sample. For every matched-pattern event generated by the matcher, it marks the corresponding pattern in the pruned-pattern tree. After the entire sample has been processed, the marked patterns are selected as the set of patterns to be used by the detector. Note that the matcher used here is different from and much simpler than the matcher of the intrusion-detection system.
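The three stages can be reproduced on the example of Fig. 6 with the short Python sketch below; it uses a plain dictionary of substring counts as a stand-in for the suffix tree and a greedy longest-match cover in place of the detector's matcher, so the names and data structures are ours rather than those of the real, tree-based implementation.

```python
from collections import Counter

def substring_counts(sample):
    """Every substring of the sample mapped to its number of occurrences
    (a flat stand-in for the suffix tree)."""
    n = len(sample)
    return Counter(sample[i:j] for i in range(n) for j in range(i + 1, n + 1))

def pattern_tree(sample, pruning_factor):
    """Keep a substring if none of its proper prefixes falls below the
    occurrence limit, i.e. if its parent node was not clipped during pruning."""
    counts = substring_counts(sample)
    limit = pruning_factor * len(sample)
    return {p for p in counts
            if all(counts[p[:k]] >= limit for k in range(1, len(p)))}

def select_patterns(sample, patterns):
    """Greedy longest-match cover of the sample; returns the patterns actually used."""
    used, i = set(), 0
    while i < len(sample):
        match = max((p for p in patterns if sample.startswith(p, i)),
                    key=len, default=None)
        if match is None:
            i += 1                      # uncovered event, skip it
        else:
            used.add(match)
            i += len(match)
    return used

sample = "ACABACBDBABD"                 # the sample of Fig. 6, without the $ delimiter
tree = pattern_tree(sample, 3 / 12)
print(sorted(tree))                           # ['A', 'AB', 'AC', 'B', 'BA', 'BD', 'C', 'D']
print(sorted(select_patterns(sample, tree)))  # ['AB', 'AC', 'BA', 'BD']
```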

5.3 Experimental results

Using the second test set, we have built three reference tables of patterns. The first reference table contains all the possible sequences extracted from the pattern tree (A, AB, AC, B, BA, BD, C, D in the example of Fig. 6), denoted all patterns in Table 6. The second reference table contains only the sequences going from the root to a leaf in the pattern tree (AB, AC, BA, BD, C, D in the example of Fig. 6), denoted leaf patterns in Table 6. The last one contains only the sequences going from the root to a leaf in the reduced tree (AB, AC, BA, BD in the example of Fig. 6), denoted selected patterns. The statistics for the reference tables are given in Table 6, according to both the kind of reference table and the pruning factor used in the tree generation. The information still has to be compared with the complete number of combinations that we can derive from all the registered pairs (application, audit event) in our intrusion-detection system, which yields 56^2 * 55^4 combinations (in our experiments, we have observed 55 different pairs (audit event, process name); two fictitious events have been added to represent the beginning and the end of a process).

Table 6: Statistics on variable-length tables.

| Reference table   | Size of table | Pruning factor | Legitimate combinations of length < T (six events) |
| All patterns      | 87            | 0.1            | > 24 * 22^5 |
| Leaf patterns     | 82            | 0.1            | > 21 * 19^5 |
| Selected patterns | 51            | 0.1            | > 13 * 12^5 |
| Leaf patterns     | 146           | 0.05           | > 17 * 16^5 |
| Selected patterns | 74            | 0.05           | > 12 * 11^5 |
| Leaf patterns     | 215           | 0.02           | > 16 * 14^5 |
| Selected patterns | 80            | 0.02           | > 12 * 11^5 |

These reference tables actually contain patterns of length 1 to 50. This approach identifies the long sequences that we had found by browsing the data sets. However, the tables are less restrictive than the ones obtained with fixed-length patterns, owing to the presence of very small patterns. This unexpected result is a consequence of the presence of numerous patterns of length one in the table, along with much longer patterns. This is the reason we are now working to improve the variable-length pattern tables by introducing a lower limit on the size of the patterns. Lowering the pruning factor results in longer patterns, but the very small ones are still required to obtain a good match, thus the number of legitimate combinations remains very large.

Another promising approach to build pattern tables that contain a good mixture of long and short patterns is based on the Teiresias pattern-discovery algorithm [12], an algorithm initially developed for the discovery of rigid patterns in unaligned biological sequences. The principle applied there is first to generate all maximal-length patterns, and then to select out of this set of maximal-length patterns

those that are needed to cover the training sequences (we refer the interested reader to [16, 17] for more information). We have experimented with the three tables generated with the suffix-tree approach on the normal user behavior, and found that they also do not generate false alarms. Concerning the attacks, we obtain the same results as in the previous section, i.e. the “.rhosts” and “tar exec copy” attacks are not detected.

6 Conclusion and future work

We have analyzed various techniques to build the reference information needed during the learning phase of behavior-based intrusion-detection systems. Functionality verification tests (FVT) provided by operating-system developers can be used to exercise all possible execution paths of a process and are therefore well suited to serve as the basis for developing the process model. Because the process model can be developed in a closed laboratory environment, it is independent of any external environment conditions and free of attacks.

We have proposed two different families of methods, fixed-length vs. variable-length patterns, to generate tables of patterns that model the normal behavior of a process. We have also proposed a new paradigm to detect intrusions, based on the length of uncovered sequences and relying on a specific pattern-matching algorithm. Based on the results of the experiments run on our testbed, we have shown that the appeal of the new paradigm for both families of patterns (fixed-length and variable-length) is that it minimizes the false-negative alarm rate. We have explained the limitation of the fixed-length pattern approach, namely its inability to represent long meaningful substrings. We have defined a new family of methods based on a suffix-tree representation to obtain variable-length patterns. Preliminary results exhibit their ability to detect real intrusions, but detailed analysis indicates that they are also more prone to issue false alarms.


Further work will concentrate on enhancing the algorithm for variable-length pattern generation in order to reduce the risk of obtaining false positive alarms while keeping the good results regarding the true positive rate. Another proposed enhancement is to associate frequencies with the patterns, so that the intrusion-detection patterns can be customized locally by flagging the infrequent use of registered patterns. Moreover, patterns can be associated with user commands, thus allowing the operator to obtain warnings upon the execution of certain commands by users.

References

[1] Hervé Debar, Marc Dacier, and Andreas Wespi. Reference audit information generation for intrusion detection systems. In Reinhard Posch and György Papp, editors, Information Systems Security, Proceedings of the 14th International Information Security Conference IFIP SEC'98, pages 405–417, Vienna, Austria and Budapest, Hungary, August 31–September 4, 1998. Chapman & Hall.

[2] Hervé Debar, Marc Dacier, Andreas Wespi, and Stefan Lampart. A workbench for intrusion detection systems. Technical Report RZ 2998, IBM Zurich Research Laboratory, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland, March 1998.

[3] Patrick D'haeseleer, Stephanie Forrest, and Paul Helman. An immunological approach to change detection: algorithms, analysis, and implications. In Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, pages 110–119, Los Alamitos, CA, May 1996. IEEE Computer Society Press.

[4] Stephanie Forrest, Steven A. Hofmeyr, and Anil Somayaji. Computer immunology. Communications of the ACM, 40(10):88–96, October 1997.

[5] Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji, and Thomas A. Longstaff. A sense of self for Unix processes. In Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, pages 120–128, Los Alamitos, CA, May 1996. IEEE Computer Society Press.

[6] Stephanie Forrest, Alan S. Perelson, Lawrence Allen, and Rajesh Cherukuri. Self-nonself discrimination. In Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy, pages 202–212, Los Alamitos, CA, May 1994. IEEE Computer Society Press.

[7] Steven A. Hofmeyr, Stephanie Forrest, and Anil Somayaji. Intrusion detection using sequences of system calls. Journal of Computer Security, 6(3):151–180, 1998.

[8] Andrew P. Kosoresow and Steven A. Hofmeyr. Intrusion detection via system call traces. IEEE Software, 14(5):35–42, September/October 1997.

[9] Don Libes. Exploring Expect – A Tcl-based Toolkit for Automating Interactive Programs. O'Reilly and Associates, Sebastopol, CA, 1995.

[10] Gail C. Murphy, David Notkin, and Erica S.-C. Lan. An empirical study of static call graph extractors. In Proceedings of the 18th International Conference on Software Engineering (ICSE-18). IEEE, March 1996.

[11] Mehdi Nassehi. Characterizing masqueraders for intrusion detection. Technical Report RZ 3003, IBM Zurich Research Laboratory, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland, March 1998.

[12] Isidore Rigoutsos and Aris Floratos. Combinatorial pattern discovery in biological sequences. Bioinformatics, 14(1):55–67, 1998.

[13] G. A. Stephen. String Searching Algorithms. World Scientific, Singapore, 1994.

[14] Henry S. Teng, Kaihu Chen, and Stephen C.-Y. Lu. Adaptive real-time anomaly detection using inductively generated sequential patterns. In Proceedings of the IEEE Symposium on Research in Security and Privacy, pages 278–284, Los Alamitos, CA, May 1990. IEEE Computer Society Press.

[15] U.S. Department of Defense. Trusted Computer Systems Evaluation Criteria, August 1983.

[16] Andreas Wespi, Hervé Debar, and Marc Dacier. An intrusion-detection system based on the Teiresias pattern-discovery algorithm. In Proceedings of EICAR '99. European Institute of Computer and Anti-Virus Research, March 1999.

[17] Andreas Wespi, Hervé Debar, Marc Dacier, and M. Nassehi. Audit trail pattern analysis for detecting suspicious process behavior. http://www.zurich.ibm.com/~dac/Prog RAID98/Talks.html, September 1998.
