Proceedings of the 14th Int. Information Security Conference (IFIP/Sec'98, part of the 15th IFIP World Computer Congress) - ISBN: 3-85403-116-5 31 Aug - 4 Sep, 1998, Vienna/Budapest, Austria/Hungary, 1998. IFIP, Austrian Computer Society
An Adaptive Intrusion Detection System Using Neural Networks

J. M. Bonifácio Jr (1), E. S. Moreira (1), A. M. Cansian (2) and A. C. P. L. F. Carvalho (1)

(1) {boni, edson, andre}@icmsc.sc.usp.br
Instituto de Ciências Matemáticas de São Carlos, ICMSC/USP
P.O. Box 668, CEP 13560-970, São Carlos, SP, Brazil

(2) [email protected]
UNESP - Universidade Estadual Paulista, IBILCE/S.J. Rio Preto
P.O. Box 136, CEP 15970-001, São José do Rio Preto, SP, Brazil

Abstract
As the Internet grows, both in the number of connected hosts and in the number of services offered, security has become a key issue for technology developers. This work presents and analyses a prototype intrusion detection system. The system, positioned at key points of the network, continuously inspects passing packets in search of suspicious connections. It provides the administrator with a list of such connections, enabling him or her to take appropriate action at an early stage of the intrusion. Neural networks are used to look for intrusion profiles within the analysed data streams, and the assessment is made by comparison with well-known intrusion profiles. The system is highly adaptive, since new profiles can be added to the database and the neural network retrained to take them into account.

Key Words
Intrusion Detection System, Network Security, Neural Networks
1 INTRODUCTION
The security, possession and handling of information have become matters of crucial importance for society as a whole. At the same time, piracy, intrusion attempts, successful invasions and break-ins are becoming more frequent and involve an increasingly large number of computers (Bace, 1994)(Neumann, 1989). This scenario calls for special security techniques in modern computer systems, techniques that go beyond the traditional practice of “locking the doors”. Various Intrusion Detection Systems (IDS) (Lunt, 1993) have been developed, and some of them have been deployed experimentally. These systems are divided into host-based (Javitz, 1991)(Winkler, 1990)(Lunt, 1990) and network-based (Heberlein, 1990, 1991)(Spirakis, 1994) systems. Host-based systems rely mainly on audit trails, which are usually produced by daemons, to detect intrusions, while network-based systems build their own body of information from network traffic “sniffed” from the communication medium.

This paper describes the design of a network-based security system that uses Artificial Intelligence and Neural Network techniques to detect intruders, to record and learn their modes of operation, and to provide information that helps the system administrator take action against them. The detection system consists of one or more security agents, placed at strategic points in the network, which provide information to management personnel or systems.
2 DETECTING SUSPECT CONDUCT
Attack attempts follow particular access techniques. In most cases and environments, the intruder is physically remote from the system under attack (Neumann, 1989). The early IDS models, designed for isolated computers, used basic algorithms, including the evaluation of multinomial functions and the approximation of covariance matrices, to detect deviations from normal user behaviour (Winkler, 1990), as well as expert systems to detect security policy violations (Lunt, 1990). More recent models monitor a large number of networked computers and, using distributed systems techniques, transfer the monitored information to a central machine for processing (Snapp, 1991)(Ko, 1993). These models also include network traffic in their detection algorithms. Most IDS run an auditor process (daemon) on each machine, responsible for capturing security violations within that machine. Network-based systems, rather than using audit trails, analyse the packet traffic within the network to detect intrusive behaviour.

One of the innovations proposed for our management system is a security agent able to detect intrusive behaviour in established connections. The agent captures and decodes the packets transmitted through the monitored network. To make inferences about the security state of the connections, the agent employs an Expert System and a Neural Network. Together they provide a value that indicates the severity of an attack, or the degree of suspicion of the activity in that connection, based on previously recorded intrusion information. The system rests on the premise that an intrusion can be detected by analysing predetermined models of activity that are anomalous compared with normal actions. The following examples illustrate suspicious behaviour:

• Someone trying to break into a system produces an irregular rate of password mismatch errors.
• Someone who gains illegitimate access to the system through an unauthorised account and password develops a session different from that of a legitimate user.
• Intruders may differ considerably from legitimate users in the directories they access or in the programs they run to obtain information about the system status.
• A user trying to subvert the security mechanisms of an operating system is likely to execute a different set of programs from a regular user's, and to trigger more protection violations while attempting to access unauthorised files and programs.
• Someone trying to access security-sensitive documents may log in at unusual hours and send data to discs and printers that he or she does not normally use.
• An intruder able to monopolise a resource (for example, the network) shows an abnormally high rate of activity on that resource.
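The first example above, an irregular rate of password mismatch errors, can be made concrete with a sliding-window counter. The sketch below is a hypothetical illustration only, not part of the authors' system, which relies on an expert system and a neural network rather than fixed limits; the class name `FailureRateMonitor` and the window and limit values are assumptions.

```python
from collections import deque


class FailureRateMonitor:
    """Hypothetical sliding-window check for irregular password-mismatch rates.

    window_seconds and max_failures are illustrative assumptions, not values
    taken from the paper.
    """

    def __init__(self, window_seconds: float = 60.0, max_failures: int = 5) -> None:
        self.window_seconds = window_seconds
        self.max_failures = max_failures
        self._failures: dict = {}  # source address -> deque of failure timestamps

    def record_failure(self, source: str, timestamp: float) -> bool:
        """Record one mismatch and return True if the source's recent rate is irregular."""
        events = self._failures.setdefault(source, deque())
        events.append(timestamp)
        # Drop failures that fall outside the sliding window.
        while timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_failures


monitor = FailureRateMonitor(window_seconds=60.0, max_failures=5)
flags = [monitor.record_failure("10.0.0.7", float(t)) for t in range(6)]
print(flags)  # six failures within six seconds: only the last one is flagged
```

Failures spread out over a longer period never accumulate inside the window and are therefore not flagged.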
Our proposal is that Neural Networks can provide mechanisms for recognising successful attacks and can adapt in response to changes in intruders' techniques.
3 THE SECURITY AGENT
The agent runs on a secure machine that is logically invisible to the other hosts and is placed at sensitive points of the network. The agent passively monitors the network and captures the circulating packets by putting its network interface into promiscuous mode. The agent is organised as a four-layer model (Figure 1). The layers manage the flow of packets and provide the stimulus vector to the neural network. The lowest layer only captures the flow of data on the network and passes the ordered packets to the second layer.

[Figure 1 shows, from bottom to top: Network → Data → Capture Module → Pre-Selection and Inference Module → Connection Module → Semantic Analyser → Post Processor → Data → Neural Network → suspicion level for a particular connection. It also shows a Monitoring List holding Monitoring Vectors 1, 2, 3, 4, ...]
Figure 1 Security System Organization
The second layer consists of two modules: the Packet Pre-selection module and the Expert System module. The Pre-selection module performs an initial filtering of the packets that may represent interesting events, for example which protocols will be monitored or which sources or destinations should be considered. The filtered packets are then analysed by the expert system. The Expert System bases its decisions on the following information: the expected connection paths, the sensitivity of the machines and the reliability of the domains. These data serve as the basis for monitoring risky events, which are stored in a standing queue (the “row”) where they remain for a fixed time. The entries in this row are vectors containing the source and destination of the connection, the ports involved, a “security level” and a time stamp; each connection is represented by one vector in the row. The “security level” is a numeric value that grows as monitored events arrive from a given source-destination pair or from the same domain. Another case to consider is an event that comes from a domain in which another event has already been registered in the row, though not necessarily from the same machine. In this case, a new vector is inserted in the row with the same security level, increased by a weight indicating that a previous attempt from that domain has already been detected. The row is checked periodically, and when the security level of a vector reaches a pre-established threshold, the expert system “locks in” that connection or, alternatively, all connections coming from that domain. Other kinds of searches and checks can also be performed on the row, considering not only the threshold for a source-destination pair but also related information about the domains or a subset of related addresses. In this way, when the expert system identifies patterns of behaviour that differ from the acceptable ones, all packets travelling in that connection are sent to the next level (the Connection Module).
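A minimal sketch of the monitoring row described above, under stated assumptions: the class and helper names, the numeric weights, the time-to-live and the threshold are all hypothetical; the domain of a host is taken naively as everything after its first label; and “the same security level, increased by a weight” is read as copying the highest level already recorded for that domain and adding a domain weight.

```python
from dataclasses import dataclass


@dataclass
class MonitorVector:
    """One entry of the row: a watched source/destination pair."""
    src: str
    dst: str
    dport: int
    level: float      # "security level", grows with each monitored event
    timestamp: float  # time of the latest event for this pair


def domain_of(host: str) -> str:
    # Hypothetical helper: everything after the first label is the domain.
    return host.split(".", 1)[-1]


class MonitoringRow:
    def __init__(self, threshold=5.0, ttl=300.0, event_weight=1.0, domain_weight=2.0):
        self.threshold = threshold          # level at which a pair is "locked in"
        self.ttl = ttl                      # seconds an idle entry stays in the row
        self.event_weight = event_weight    # increment per monitored event
        self.domain_weight = domain_weight  # extra weight for an already-seen domain
        self.row = {}                       # (src, dst) -> MonitorVector

    def expire(self, now):
        # Entries only stay in the row for a limited time.
        self.row = {k: v for k, v in self.row.items() if now - v.timestamp <= self.ttl}

    def record_event(self, src, dst, dport, now):
        self.expire(now)
        key = (src, dst)
        if key in self.row:
            entry = self.row[key]
            entry.level += self.event_weight
            entry.timestamp = now
            return entry
        # Levels already recorded for the same source domain, from any machine.
        same_domain = [v.level for v in self.row.values()
                       if domain_of(v.src) == domain_of(src)]
        level = max(same_domain) + self.domain_weight if same_domain else self.event_weight
        entry = MonitorVector(src, dst, dport, level, now)
        self.row[key] = entry
        return entry

    def locked_in(self):
        # Pairs whose security level has reached the threshold.
        return [k for k, v in self.row.items() if v.level >= self.threshold]


row = MonitoringRow()
row.record_event("a.evil.org", "victim.local", 23, now=0.0)
row.record_event("a.evil.org", "victim.local", 23, now=1.0)
row.record_event("b.evil.org", "victim.local", 21, now=2.0)  # same domain, new machine
print(row.locked_in())
```

A periodic checker would call `locked_in()` and hand the returned pairs to the Connection Module; the domain-wide variant mentioned in the text could filter `row.row` by `domain_of` instead.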
The third module is based on the hierarchical model of the Network Security Monitor (NSM) (Heberlein, 1990, 1991). It receives the packets and organises them in a cause-effect relationship, identifying a unidirectional data flow. Once identified and ordered, the stream consists of data transferred from one machine to another through a fixed pair of ports using a given protocol. These packets are mapped into a “Connection Vector”, which contains the entire stream of data travelling between the two hosts through the monitored connection. The connection vectors are then processed by a “Semantic Analyser”, which searches them for attack profiles that may appear in the data. The profiles are stored in a database and updated as needed. These profiles are attack signatures and describe how a suspicious session behaves. The analyser examines the data in the connection and determines how many, and which, of its elements match a profile. This information is sent to a final Post-processing module, which combines the output of the semantic analyser (matched profiles) with the expert system information (sensitivity of services and connections) and its database, in order to form a stimulus vector for the Neural Network. The stimulus vector (Figure 2) contains the following values and information about the connection:
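As a rough sketch of the Connection Module and of the profile search in the Semantic Analyser, the code below groups captured packets into unidirectional flows keyed by protocol, addresses and ports, restores their order, and reports which strings of an attack profile occur in a flow. It is not the NSM implementation; the `Packet` type, ordering by a sequence number (ignoring retransmissions and wrap-around) and representing a profile as a list of byte strings are simplifying assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    proto: str
    src: str
    sport: int
    dst: str
    dport: int
    seq: int        # sequence number, used here to restore the order of the data
    payload: bytes


def build_connection_vectors(packets):
    """Group packets into unidirectional flows and reassemble each flow's data."""
    flows = defaultdict(list)
    for p in packets:
        flows[(p.proto, p.src, p.sport, p.dst, p.dport)].append(p)
    return {key: b"".join(p.payload for p in sorted(ps, key=lambda p: p.seq))
            for key, ps in flows.items()}


def match_profile(data: bytes, profile: list) -> list:
    """Return the signature strings of an attack profile that occur in the flow."""
    return [s for s in profile if s in data]


packets = [
    Packet("TCP", "1.2.3.4", 1899, "4.3.2.1", 25, 2, b"rcpt to: nobody\r\n"),
    Packet("TCP", "1.2.3.4", 1899, "4.3.2.1", 25, 1, b"mail from: root\r\n"),
    Packet("TCP", "4.3.2.1", 25, "1.2.3.4", 1899, 1, b"250 ok\r\n"),  # reverse direction
]
flows = build_connection_vectors(packets)
client_flow = flows[("TCP", "1.2.3.4", 1899, "4.3.2.1", 25)]
print(match_profile(client_flow, [b"root", b"passwd", b"nobody"]))
```

Keeping the two directions as separate flows mirrors the unidirectional data flow described in the text; a fuller analyser could pair them afterwards.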
• Service capacity level. Telnet offers more capabilities than FTP.
• Service authentication level. Telnet requires more authentication than SMTP.
• Security level of the source and destination machines, which may be a numeric value provided by a vulnerability analysis system such as COPS or SATAN.
• Quantity of transferred data.
• Connection time.
• Finally, the sequence of suspicious strings and attack signatures located and matched by the semantic analyser in the data, such as: su, root, passwd, telnet, debug, rlogin, rsh, chown, service disable, among others.

[Figure 2 shows the stimulus vector as a sequence of fields: service capacity level; authentication service level (estimated from the port on which the connection occurred); host security level; timestamp; how much data; and attack signature occurrences, each coded in binary, e.g. “su” 1010010, “root” 1010011, “rsh” 1011110.]
Figure 2 Stimulus vector representation
The neural system analyses the stimulus vector, using weights that represent the importance of each event, and assigns a suspicion degree representing the suspicion state of a particular connection. Before the neural system can identify potential attacks, it must be trained with a meaningful and sufficiently large set of stimulus vectors representing the behaviour of both suspicious and legitimate connections. Once trained, the network must use its generalisation ability to correctly identify users whose behaviour resembles the intrusive actions used to train it. The (remote) management system then receives the numeric value representing the security state of each suspicious connection and displays the connections classified by security level, using a color code. From this point many actions are possible, such as issuing alerts or warnings of different levels to the administrators (depending on the risk level of the detected events), triggering logging processes, or activating counter-measures to isolate the host or domain that caused the attack (actively adjusting wrappers, firewalls or filters), among others.

The Neural Network may also be adjusted to reflect changes in intruders' behaviour patterns. Administrators may detect highly intrusive behaviour that the neural network is not identifying correctly, or whose severity it does not assess as expected. In this case, new patterns can be added to the training set and the network retrained so that it learns to recognise them appropriately. This adaptive capacity is the most important innovation of this project.
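The mapping from the network's suspicion degree to the color-coded display and alert levels can be sketched as follows. This assumes the output neuron produces a value between 0 and 1; the three bands, their cut-off values and the action descriptions are hypothetical, since the paper does not specify them.

```python
# Hypothetical bands: (lower bound of suspicion degree, colour, action).
BANDS = [
    (0.8, "red", "alert administrator and consider isolating the host or domain"),
    (0.5, "yellow", "warn administrator and start logging"),
    (0.0, "green", "record only"),
]


def classify(suspicion: float) -> tuple:
    """Map a suspicion degree in [0, 1] to a (colour, action) pair."""
    if not 0.0 <= suspicion <= 1.0:
        raise ValueError("suspicion degree must lie in [0, 1]")
    for lower, colour, action in BANDS:
        if suspicion >= lower:
            return colour, action
    raise AssertionError("unreachable: the last band starts at 0.0")


def rank_connections(degrees: dict) -> list:
    """Sort connections by decreasing suspicion and attach their colour."""
    ordered = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    return [(conn, degree, classify(degree)[0]) for conn, degree in ordered]


print(rank_connections({"TCP:1.2.3.4-1899_4.3.2.1-25": 0.93,
                        "TCP:5.6.7.8-2001_4.3.2.1-21": 0.35}))
```

Keeping the bands in a single table makes it easy for an administrator to tune the thresholds without touching the classification logic.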
4 CURRENT STATUS AND RESULTS
To test the functionality of this intrusion detection model, a prototype has been developed. The prototype makes a few simplifications with respect to the proposed model, mainly in the Connection and Pre-selection modules. In the Pre-selection module, the Expert System was initially replaced by a simple rule set. In addition, some information, such as the security level of the machines involved mentioned above, was not included in the Neural Network stimulus vector. All the prototype modules were implemented on the same host, which removed the need to develop agents and managers for data communication. The environment considered is an Ethernet subnetwork carrying TCP/IP packets. Initially we chose to monitor only foreign packets, i.e., those not originating from the monitored local network. When the pre-selection module selects an entire session, the capture module sends the monitored packets to the connection module. Note that, in this work, the term connection denotes only a small set of packets whose source and destination addresses are the same; it is unrelated to the terms connection-oriented and connectionless found in the networking literature.

The example below shows a TCP connection from port 1899 on host 1.2.3.4 (source) to port 25 on host 4.3.2.1 (target), i.e. a Sendmail connection. Here the connection record shows a kind of Sendmail attack:

    TCP:1.2.3.4-1899_4.3.2.1-25
    220 victim.someplace.com ESMTP Sendmail 8.7.5 ready.
    mail from "| /bin/mail [email protected] < /etc/passwd"
    250 "| /bin/mail [email protected] < /etc/passwd"... sender ok.
    rcpt to: nobody
    250 Recipient ok.
    354 Enter mail, end with "." on a line by itself
    data
    .
    250 QAA23003 Message accept for delivery
    quit
    221 victim.someplace.com closing connection.

This record is an attack signature and captures the behaviour of a suspicious connection. The Semantic Analyser filters the data, searching for the main components of an attack signature, and finally produces a binary vector representing the suspicious behaviour, to be interpreted by the neural network. The Semantic Analyser scans the text of the data flowing through the network and searches for suspicious strings or control strings (type and ports of the connection). It stores a binary code for each string found. These codes are grouped by connection and used to form the stimulus vector for the Neural Network.
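The two steps just described, reading the connection record header and scanning the data for suspicious strings, could look roughly like the sketch below. The header format follows the record shown above; the string-to-code table `SUSPICIOUS_CODES` and its 12-bit values are hypothetical, chosen only to show that strings with the same meaning (here two different failure messages) can share one code.

```python
import re
from typing import NamedTuple


class ConnectionKey(NamedTuple):
    proto: str
    src: str
    sport: int
    dst: str
    dport: int


# Header format of a connection record, e.g. "TCP:1.2.3.4-1899_4.3.2.1-25".
KEY_PATTERN = re.compile(r"^([A-Z]+):([\d.]+)-(\d+)_([\d.]+)-(\d+)$")

# Hypothetical code table; strings with the same meaning share a code.
SUSPICIOUS_CODES = {
    "/etc/passwd": 0b000000000001,
    "/bin/mail": 0b000000000010,
    "/bin/sh": 0b000000000011,
    "login incorrect": 0b000000000100,
    "permission denied": 0b000000000100,
}


def parse_key(header: str) -> ConnectionKey:
    match = KEY_PATTERN.match(header)
    if match is None:
        raise ValueError(f"not a connection record header: {header!r}")
    proto, src, sport, dst, dport = match.groups()
    return ConnectionKey(proto, src, int(sport), dst, int(dport))


def scan(data: str, table: dict = SUSPICIOUS_CODES, max_strings: int = 10) -> list:
    """Return the codes of suspicious strings in order of appearance (at most max_strings)."""
    # Longest strings first, so a longer match wins over a shorter alternative.
    alternatives = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in alternatives), re.IGNORECASE)
    return [table[m.group(0).lower()] for m in pattern.finditer(data)][:max_strings]


key = parse_key("TCP:1.2.3.4-1899_4.3.2.1-25")
codes = scan('mail from "| /bin/mail root < /etc/passwd"')
print(key.dport, codes)
```

The destination port from the header and the list of codes are exactly the two ingredients the stimulus vector needs in the current prototype.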
4.1 The Stimulus Vector
At the current stage of development, the stimulus vector (Figure 3) is composed of two parts. The first part identifies the destination port of the connection, i.e., the service used: telnet, ftp, sendmail, etc. This part is 6 bits long. Each bit identifies one port, and the code 000000 identifies other, less important ports. This encoding is intended to facilitate the training of the neural network.

[Figure 3 shows the vector as consecutive fields: Port (6 bits) | Suspicious String #1 (12 bits) | Suspicious String #2 (12 bits) | ... | Suspicious String #10 (12 bits).]
Figure 3 Current Stimulus Vector
Different services have different authentication and capacity levels. To embed the knowledge that, for example, telnet is more dangerous than ftp, we included more examples of telnet attacks in the training set. As a result, a telnet connection is associated with a higher level of suspicion than an ftp connection. The second part of the stimulus vector corresponds to the suspicious strings. Each string is coded as a 12-bit binary code. Strings with similar meanings that appear in similar contexts (different operating systems use different messages for the same event) receive the same binary code. When fewer than 10 strings are found in a connection, the unused slots are filled with zeros.
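The layout of Figure 3 can be made concrete with the encoder below, a sketch under assumptions: the text does not say which six ports receive a dedicated bit, so the `PORT_BITS` table (ftp, telnet, smtp, finger, http, rlogin) is hypothetical, as is the most-significant-bit-first order within each 12-bit code. The total length, 6 + 10 × 12 = 126 bits, matches the input layer described in Section 5.

```python
# Hypothetical choice of the six ports that get their own bit.
PORT_BITS = {21: 0, 23: 1, 25: 2, 79: 3, 80: 4, 513: 5}
PORT_FIELD = 6     # bits for the destination port
STRING_FIELD = 12  # bits per suspicious-string code
MAX_STRINGS = 10   # string slots in the vector


def encode_stimulus(dport: int, string_codes: list) -> list:
    """Build the 126-bit stimulus vector: port bits followed by ten 12-bit string codes."""
    port_bits = [0] * PORT_FIELD  # 000000 stands for "other, less important port"
    if dport in PORT_BITS:
        port_bits[PORT_BITS[dport]] = 1
    string_bits = []
    for code in string_codes[:MAX_STRINGS]:
        if not 0 <= code < 2 ** STRING_FIELD:
            raise ValueError(f"code {code} does not fit in {STRING_FIELD} bits")
        # Most significant bit first.
        string_bits.extend((code >> (STRING_FIELD - 1 - i)) & 1 for i in range(STRING_FIELD))
    # Unused string slots are filled with zeros.
    string_bits.extend([0] * (STRING_FIELD * MAX_STRINGS - len(string_bits)))
    return port_bits + string_bits


vector = encode_stimulus(25, [0b000000000010, 0b000000000001])
print(len(vector), vector[:6])
```

A one-hot port field, as the text notes, keeps each service on its own input neuron instead of asking the network to decode a packed port number.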
5 EXPERIMENTS
The experiments presented in this paper use MLP neural networks (Rumelhart, 1986). These networks are usually trained with the backpropagation algorithm, which uses a generalisation of the delta rule to train a multilayer network backwards through the layers, starting at the output neurons. Because it usually requires several passes through the training set, backpropagation may take a long time to train the network, and several variations have therefore been proposed to speed up training. Besides standard backpropagation, two such variations were used here: Quickprop (Fahlman, 1988) and Rprop (Schiffmann, 1993). To validate the experiments, they were carried out according to the method proposed in (Prechelt, 1994). The neural networks used in these experiments have a 126-neuron input layer corresponding to the 126-bit stimulus vector: 6 bits identifying the port plus 120 bits for up to 10 strings found in the connection. Several neural network configurations were tested to identify which one performs best on this problem. The tests were carried out with the SNNS simulator (Stuttgart Neural Network Simulator). We trained and tested 8 different topologies, all with 126 input neurons and 1 output neuron. The topologies used were: 126-1 (126-neuron input layer, 1-neuron output layer and no intermediate layer); 126-5-1, 126-10-1, 126-20-1, 126-40-1 and 126-60-1 (one intermediate layer, where the second number is the number of neurons in that layer); and 126-20-1-1 and 126-20-5-1 (two intermediate layers). In preliminary tests, the network with the best performance had 20 neurons in the intermediate layer, so we added a second intermediate layer with 1 or 5 neurons to this topology. All topologies were fully connected, i.e., every neuron in one layer is connected to every neuron in the next layer. Three training algorithms were used, each with different parameters, as shown in Table 1.

Table 1 Training algorithms and parameters
Training algorithm | Parameter sets
Backpropagation | n:1 delta:0.5; n:1 delta:0.1; n:0.1 delta:0.1; n:0.2 delta:0.5
RPROP | delta0:0.1 deltamax:50 a:2; delta0:0.1 deltamax:50 a:0.1; delta0:0.1 deltamax:50 a:0.001
QuickProp | n:0.2 u:2.25 v:0.0001 delta:0.1; n:0.2 u:1.75 v:0.0001 delta:0.1
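The networks were trained in SNNS. For readers without it, the NumPy sketch below shows full-batch backpropagation for a fully connected sigmoid MLP with the 126-20-5-1 topology. It is not the SNNS implementation: the weight initialisation, full-batch updates, bias handling, and the reading of the table's `n` as the learning rate and of the backpropagation `delta` as SNNS's d_max (output differences at or below it are treated as zero) are assumptions. The training data here are random toy vectors, not the paper's data.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MLP:
    """Fully connected sigmoid network, e.g. sizes=(126, 20, 5, 1)."""

    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        # Random initial weights scaled by fan-in; the paper reports that results
        # vary noticeably with the initial weights, hence the explicit seed.
        self.weights = [rng.uniform(-1.0, 1.0, (n_in, n_out)) / np.sqrt(n_in)
                        for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n_out) for n_out in sizes[1:]]

    def forward(self, x):
        """Return the activations of every layer, input first."""
        activations = [x]
        for w, b in zip(self.weights, self.biases):
            activations.append(sigmoid(activations[-1] @ w + b))
        return activations

    def train_step(self, x, t, eta=0.2, dmax=0.0):
        """One full-batch backpropagation step; returns the mean squared error before it."""
        acts = self.forward(x)
        out = acts[-1]
        diff = t - out
        # Output differences within dmax are treated as zero (assumed d_max semantics).
        diff = np.where(np.abs(diff) <= dmax, 0.0, diff)
        delta = diff * out * (1.0 - out)  # error signal of the output layer
        for layer in reversed(range(len(self.weights))):
            # Negative gradient of the squared error, averaged over the batch.
            dw = acts[layer].T @ delta / len(x)
            db = delta.mean(axis=0)
            if layer > 0:
                # Propagate the error signal using the weights before the update.
                h = acts[layer]
                delta_prev = (delta @ self.weights[layer].T) * h * (1.0 - h)
            self.weights[layer] += eta * dw
            self.biases[layer] += eta * db
            if layer > 0:
                delta = delta_prev
        return float(np.mean((t - out) ** 2))


rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(40, 126)).astype(float)  # toy binary stimulus vectors
T = X[:, :1].copy()                                   # toy target: the first input bit
net = MLP((126, 20, 5, 1))
errors = [net.train_step(X, T, eta=0.5) for _ in range(2000)]
print(errors[0], errors[-1])
```

Rprop and Quickprop differ from this loop only in how each weight's step size is derived from successive gradients, so they could replace the `+= eta * dw` updates without changing the forward and backward passes.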
Each of the 8 topologies was tested with every training algorithm and parameter set in Table 1. For every configuration (topology, training algorithm and parameters), different initial weights caused large differences in the network output. To obtain reliable results, 20 training runs were performed for each configuration and their mean error was used for comparison. Thus 1440 training runs were carried out with the three training algorithms: Backpropagation, Rprop and QuickProp. Figure 4 shows the results of the training algorithm that achieved the best results, Backpropagation. The X-axis represents the topologies used, the Y-axis the mean error, and the bars for each topology represent the parameter sets used. We created three data sets for the tests (training, validation and test) with 120, 56 and 56 patterns respectively; every set contains 50% intrusive and 50% non-intrusive behaviour patterns, so as not to bias the neural network. After training, the network was evaluated on the third (test) set. The attack and non-attack patterns were captured in a controlled network environment by simulating attacks and normal behaviour. The attack simulations were performed by hand, using known intrusion techniques, and with tools such as SATAN and ISS. With these methods we obtained a good set of attack signatures to train the neural networks.
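The experimental protocol above amounts to a small grid search. The sketch below enumerates the 8 topologies and the nine parameter sets of Table 1, confirms the count of 8 × 9 × 20 = 1440 training runs, counts trainable parameters per topology (assuming one bias per non-input neuron), and averages the error over 20 runs. `train_and_test` is a hypothetical stand-in for an actual SNNS training and test run.

```python
from statistics import mean

TOPOLOGIES = [(126, 1), (126, 5, 1), (126, 10, 1), (126, 20, 1), (126, 40, 1),
              (126, 60, 1), (126, 20, 1, 1), (126, 20, 5, 1)]

# Parameter sets of Table 1, keyed by algorithm (labels as printed in the table).
PARAMETER_SETS = {
    "Backpropagation": [{"n": 1, "delta": 0.5}, {"n": 1, "delta": 0.1},
                        {"n": 0.1, "delta": 0.1}, {"n": 0.2, "delta": 0.5}],
    "Rprop": [{"delta0": 0.1, "deltamax": 50, "a": 2},
              {"delta0": 0.1, "deltamax": 50, "a": 0.1},
              {"delta0": 0.1, "deltamax": 50, "a": 0.001}],
    "QuickProp": [{"n": 0.2, "u": 2.25, "v": 0.0001, "delta": 0.1},
                  {"n": 0.2, "u": 1.75, "v": 0.0001, "delta": 0.1}],
}
RUNS_PER_CONFIGURATION = 20

CONFIGURATIONS = [(topology, algorithm, params)
                  for topology in TOPOLOGIES
                  for algorithm, parameter_sets in PARAMETER_SETS.items()
                  for params in parameter_sets]


def trainable_parameters(topology):
    """Weights plus one bias per non-input neuron, for a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(topology[:-1], topology[1:]))


def mean_error(train_and_test, configuration, runs=RUNS_PER_CONFIGURATION):
    """Average the test error over several runs, each seeded with different initial weights."""
    return mean(train_and_test(configuration, seed) for seed in range(runs))


print(len(CONFIGURATIONS) * RUNS_PER_CONFIGURATION)  # 1440
print({"-".join(map(str, t)): trainable_parameters(t) for t in TOPOLOGIES})
```

Averaging over seeds, rather than keeping the best single run, is what makes the comparison robust to the sensitivity to initial weights reported above.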
[Figure 4: bar chart of the mean error (Y-axis, 0 to 0.16) for each neural network topology (X-axis: 126-1, 126-5-1, 126-10-1, 126-20-1, 126-40-1, 126-60-1, 126-20-1-1, 126-20-5-1), with one bar per Backpropagation parameter set: n:1 delta:0.5, n:1 delta:0.1, n:0.1 delta:0.1, n:0.2 delta:0.5.]
Figure 4 Backpropagation Algorithm Results
The analysis of the results shows that the best-performing neural network was the one with two intermediate layers, 126-20-5-1, trained with backpropagation with parameters n = 0.2 and delta = 0.5, which achieved a mean error of 5.067%. It is also worth noting that this training algorithm with these parameters gave the best results. Among the topologies with only one intermediate layer, the best results were achieved by 126-5-1 and 126-20-1, and performance became worse with fewer or more neurons in the intermediate layer.
6 CONCLUSIONS
This work makes an effective contribution to improving the mechanisms for intruder detection in computer networks. We designed and tested a new model that tackles an important, current and hard problem by combining relatively simple techniques. The method has several advantages: it can detect some new attacks that were not presented to the neural network; it does not reduce the performance of the monitored systems; it has a low cost, because it does not require large computational resources; and it allows new kinds of attack to be encoded quickly. Our Intrusion Detection System is intended to provide a fast and versatile instrument for tackling the problem of intrusion into networked computers. It relies on the fact that most intruders act according to recognisable patterns. These patterns are stored in a database and used dynamically while an intrusion is being followed, with the help of expert systems and neural networks. Although there are not yet enough data to establish the accuracy of the method, we believe the system has a good chance of success. It does not try to state definitively whether an intrusion is present; instead, it provides an intrusion level that indicates the likelihood of intruder activity. The neural networks used require deeper study, but our preliminary results suggest that the method is functional and can detect intrusive behaviour with a low error rate.
Another important characteristic of this method is its adaptability. The administrators' knowledge is easily introduced into the system, so that important new information can be embedded dynamically, keeping the system up to date.
7 ACKNOWLEDGEMENTS We would like to thank FAPESP and CAPES for the support given to this project through scholarships and equipment.
8 REFERENCES
Bace, R. (1994) A New Look at Perpetrators of Computer Crime. In Proc. 16th Department of Energy Computer Security Conference.
Fahlman, S. (1988) An Empirical Study of Learning Speed. Technical Report, Carnegie Mellon University.
Heberlein, L. et al. (1990) A Network Security Monitor. In Proc. 1990 IEEE Symposium on Research in Security and Privacy, Oakland, CA, pp. 296-304.
Heberlein, L., Levitt, K. and Mukherjee, B. (1991) A Method to Detect Intrusive Activity in a Networked Environment. In Proc. 14th National Computer Security Conference, Washington, DC, pp. 362-371.
Javitz, H. and Valdes, A. (1991) The SRI IDES Statistical Anomaly Detector. In Proc. 1991 IEEE Symposium on Research in Security and Privacy, Oakland, CA.
Ko, C. et al. (1993) Analysis of an Algorithm for Distributed Recognition and Accountability. In Proc. First ACM Conference on Computer and Communications Security, Fairfax, VA, pp. 154-164.
Lunt, T. et al. (1990) A Real-Time Intrusion Detection Expert System (IDES). Interim Progress Report, Project 6784, SRI International.
Lunt, T. (1993) A Survey of Intrusion Detection Techniques. Computers & Security, Vol. 12, pp. 405-418.
Neumann, P. and Parker, D. (1989) A Summary of Computer Misuse Techniques. In Proc. 12th National Computer Security Conference, pp. 396-407.
Prechelt, L. (1994) PROBEN1 - A Set of Neural Network Benchmark Problems and Benchmarking Rules. Technical Report 21, University of Karlsruhe.
Rumelhart, D. and McClelland, J. (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press.
Schiffmann, W., Joost, M. and Werner, R. (1993) Optimization of Backpropagation Algorithm for Training Multilayer Perceptrons. University of Koblenz.
Snapp, S. et al. (1991) A System for Distributed Intrusion Detection. In Proc. IEEE COMPCON 91, San Francisco, CA, pp. 170-176.
Spirakis, P. et al. (1994) SECURENET: A Network-oriented Intelligent Intrusion Detection and Prevention System. In Proc. IFIP SEC '94.
Winkler, J. and Page, W. (1990) Intrusion and Anomaly Detection in Trusted Systems. In Proc. Fifth Annual Computer Security Applications Conference, Tucson, AZ, pp. 115-124.
9 BIOGRAPHY
Jose Mauricio Bonifacio Junior received his B.Sc. degree in computer science from USP - Universidade de Sao Paulo, Brazil, and is a graduate student working towards an M.Sc. degree in computer science. His research interests are in the fields of computer security, network intrusion detection and network management.
Edson dos Santos Moreira received his engineering degree in electronics and his master's degree from the University of Sao Paulo, Brazil, and his PhD in Computer Science from Manchester University, UK. He currently teaches and carries out research in networked multimedia and network security at the Computer Science Dept, University of Sao Paulo at Sao Carlos, Brazil.
Adriano Mauro Cansian received the B.A.Sc. degree in physics from USP - Universidade de Sao Paulo (Sao Paulo University, Brazil) in 1990, and the M.Sc. degree in 1992. He completed the Ph.D. degree in Computer Science in 1997. In 1992 he joined UNESP - Universidade Estadual Paulista (Sao Paulo State University, Brazil), where he is currently both an Associate Professor of Computer Science and Chief Information Officer (CIO). His research interests are in the fields of network security and network intrusion detection.
Andre Carlos Ponce de Leon Ferreira de Carvalho received the B.Sc. degree in computer science and the M.Sc. degree from the Federal University of Pernambuco, Brazil. He completed the Ph.D. degree in Electronic Engineering in 1994 at the University of Kent at Canterbury, England. He currently teaches and carries out research in neural networks and artificial intelligence at the Computer Science Dept, University of Sao Paulo at Sao Carlos, Brazil.