Monitoring Controller's "DNA Sequence" For System Security

Benjamin Yu, Eric Byres, Clay Howey
Computer Systems Technology, Group for Advanced IT, Technology Center
British Columbia Institute of Technology, Burnaby, B.C., Canada V5G 3H2
Email: [email protected], [email protected], [email protected]

KEYWORDS

Security, intrusion detection, virus, network, TCP traffic, DNA, PLC, controller, neural network

ABSTRACT

This paper presents research results on the detection of network security attacks in computer and control systems through the identification and monitoring of a synthetic "DNA sequence". Just as DNA characterizes the makeup of the human body, and abnormal functioning of tissues can be traced to an altered DNA sequence, a "DNA sequence" of a computer system serves a similar function. Changes in the behavioral patterns of a computer system, such as those caused by virus attacks, are reflected in changes in the DNA sequence, and appropriate actions can then be taken. The security problem thus becomes one of defining what a DNA sequence should look like and how to monitor its evolution. The research aims at defining a DNA sequence for specific activities (e.g. TCP/IP traffic) and monitoring its evolution. The paper describes schemes for handling changes in the DNA sequence which may result from legitimate operations or malicious attacks. We also report on how the technology can be applied to a process control environment where industrial controllers are now equipped with HTTP servers for data access. Such an environment is vulnerable to both internal and external attacks, but it also provides a practical and usable test bed for the ideas in this research.

INTRODUCTION

Attacks on computer and network systems have increased significantly in recent years. Attacks in the form of viruses, DOS, DDOS, worms, mail bombers, etc. have been reported on a variety of platforms, most notably on Microsoft products, but also on PDAs, Linux, mobile phones, etc. [Bace 2000]. Research on intrusion detection has focused mainly on static monitoring of system integrity using predefined rules, signatures, or behavior patterns. This is useful for recognizing and preventing known attacks, but it is largely ineffective against new forms of attack. Other approaches that monitor the dynamic nature of a computer system using heuristic tools are usually found to be insufficient [Arnold et al 2000]. Yet dynamic monitoring is precisely what is needed, since systems change over time (addition, removal and upgrading of hardware and software), as do usage patterns (work behavior, type or level of activities during the day, etc.). A way of capturing the evolving characteristics of systems, both hardware and software, and of users' modes of operation is needed. Thus, the basic question we would like to address in this research is simply: is there a way to characterize a dynamic system (hardware, software, and users) such that any disruption of the system can be easily detected?

Our research builds on the idea of footprinting. Footprinting is the art of gathering target information by hackers before an attack [Scambray et al., 2001]. They gather all forms of data before executing a focused and surgical attack. A number of tools for network enumeration are available, including registrar, organizational, domain, network, and POC (point of contact) queries, DNS (Domain Name Server) interrogation, port scanning tools, etc. A comprehensive collection of the data gathered in footprinting activities defines the operation of a computer and network system.
Usually, hackers concentrate on attacks through a specific penetration point, whether an available port or a security flaw in an application or operating system, and data is gathered specifically related to that entry point. A simplistic countermeasure is to take the same set of data and analyze it to determine what possible forms of attack could be made. This is not entirely productive, since it is next to impossible to second-guess what a hacker may decide to do. (Research on software development methodology, especially that focused on security measures, is more appropriate here.) A better use of the data collected from footprinting is to filter it automatically and use it to define a dynamic signature of the system to be monitored. This signature evolves over time based on the usage patterns of the system and on changes to the system. An analogy for this signature is the DNA sequence found in the human body. DNA characterizes the makeup and characteristics of the human body, and abnormal functioning of tissues can be traced to an altered DNA sequence. The same concept can be applied to computing systems, where the normal functioning of the system can be captured in a concise and precise DNA sequence. The term "DNA sequence" of a computer system will be used to describe one of the key research areas in security.

Such a computer system DNA sequence is not a new concept. The logs generated by a system are the most primitive form of a computer system's DNA. Logs are usually very specific to the activities that need to be traced, whether at the system or application level, and there is usually minimal dynamic monitoring of the logs. This research seeks to define what a computer system DNA should look like, and the monitoring activities that should be initiated when a change is detected in any part of the DNA sequence. It is believed that characterization of the hardware, the software, and user behavior is necessary in order to monitor system integrity.

SYSTEM DNA

In this section, we propose the composition of a system DNA sequence and relate it to its effectiveness in detecting some known attacks. A form of static DNA sequence to enforce security has been used in many existing systems. For example, a system administrator may elect to limit the number of recipients of a broadcast email sent by any individual employee of an organization to, say, 25. Similarly, a network security administrator may hold files with attachments so that they are not delivered to the recipient until they are at least 1 day old. These are static DNA sequences which characterize the particular system or network. A more dynamic approach takes into account changing user needs, hardware and software changes, and other supporting technologies. If the changing environment in an organization has employees normally sending broadcast emails to 40 recipients rather than 25, the DNA sequence should evolve from 25 to 40. Similarly, as demands for faster file transfer grow, the delay in delivering files to a recipient may evolve from 1 day to half a day, or less.

The genome project is an appropriate analogy to describe the complexity of this research. The genome of a human being contains all the genes needed for the functioning of every part of that human being. The genes are made up of DNA, and the building blocks of the sequence are called bases. Describing the functioning of an entire computer system is analogous to the genome project, which attempts to map all the genes to the functioning of a human being. At a smaller scale, particular DNA sequences have been tied to specific behaviors or characteristics. Similarly, we will propose corresponding "DNA sequences" for a computer system. The culmination of the identification of different "DNA sequences" will define the "genome" of a computer system.
Whereas in the genome project for living organisms the DNA sequence is known, and the problem was to map particular DNA sequences to genes (or characteristics of the living being), in this research the process is reversed. We begin with the characteristics of different aspects of a computer system, propose DNA sequences for them, and finally derive the genome of the entire computer system.

The difference between a DNA sequence and a signature, as the latter term is often used in the computer literature, is that a signature usually denotes a characteristic of a virus or anomaly, whereas a DNA sequence is used in this research to denote the normal behavior of a system. When a system is infected or behaves abnormally, we say that its DNA sequence has evolved and/or mutated. The first requirement in the definition of DNA sequences is to capture the essential characteristics of a system particular to a user or set of identified users. What is or is not included in this characterization is critical to the usefulness of the DNA sequences. The DNA sequences should also be easy to parse, analyze, and modify. It may also be the case that each system's DNA sequences are defined by a different set of criteria. That is, the makeup of the DNA sequences of one computer system or network may differ from that of another. As an example, a secretary's workstation may show significant activity in document or spreadsheet file creation and updates, but minimal activity in executable file creation or updates; the opposite may be true for a developer. Similarly, network traffic on a server will probably be significantly higher than on a receptionist's workstation, and file replacement will be significantly higher during a product release or update than at other times. A preliminary list of categories of a computer system which need to be monitored includes network traffic, system files, and data files. For each category, the following essential characteristics should be captured:

Network traffic: volume, protocol used, packet size, IP hosts
System files: file creation, deletion, modification (rename, size change, replacement)
Data files: file creation, deletion, modification (rename, size change, replacement)

Each characteristic becomes one DNA sequence.
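The categories above can be represented concretely. The following is a minimal sketch, not the paper's implementation; the field names, the example sequence names, and the threshold values are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class DNASequence:
    """One monitored characteristic of a system (one "DNA sequence")."""
    name: str          # illustrative, e.g. "icmp_packet_volume"
    category: str      # "network", "system_files", or "data_files"
    interval_s: int    # measurement interval in seconds
    threshold: float   # current numeric threshold for this characteristic
    samples: list = field(default_factory=list)

    def record(self, value: float) -> bool:
        """Record one interval's measurement; True means within threshold."""
        self.samples.append(value)
        return value <= self.threshold

# A hypothetical partial "genome": one DNASequence per characteristic.
genome = [
    DNASequence("icmp_packet_volume", "network", 10, 100.0),
    DNASequence("system_file_changes", "system_files", 60, 0.0),
]
```

The collection of all such sequences for a given machine would constitute its "genome" in the paper's terminology.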
Each DNA sequence can be measured at different time intervals or frequencies, depending on how quickly a change in the characteristic should be signaled. A critical component of a system may use a very short time interval. One should note the difference between using specific signatures to detect intrusion and using DNA sequences to detect anomalies in a system. In the former case, specific filters or signatures must be identified, whereas in the latter, anything that is "unusual" is flagged. The challenge is how to distinguish between normal and abnormal behaviors. We propose one scheme using neural nets in the section titled Detecting DNA Mutation. However, in all cases, a change relative to a threshold can be used to signal possible abnormal behavior. In all the DNA sequences proposed earlier, a numerical threshold can be used to define the characteristic.

For network traffic:
- Volume of network traffic can be defined by the number of packets per second.
- Protocols used can be defined by the number of packets for each protocol per time period.
- Packet size can be defined by the number of bytes.
- IP hosts can be captured using an associative array of frequencies and IP addresses per time period.

For system and data files:
- File creation / modification / renaming / size change / replacement can be defined by the number of occurrences per time period.

As an example of how these can be applied, if the average number of ICMP packets through a system is normally around 20 in a 10-second interval, a sudden surge of 10,000 packets within an interval may indicate a possible DOS attack. Please see Appendix 1 for a sample of IP packet volumes under normal operation and under a DOS attack. IP host monitoring can be subdivided into further categories, such as "usual" IP hosts vs. "unusual" IP hosts. Users may have a set of IP hosts they visit frequently, e.g. a home portal or sites to track investments, while visiting other sites for information as required. If there is a sudden increase in traffic with a new host, some form of attack can be suspected. Also, abnormal patterns where the source and destination IP addresses are identical may signal a Land attack, and irregular TCP header flags may indicate Xmas Tree or WinNuke attacks [Northcutt et al 2001]. These last two types of abnormal traffic patterns are more difficult to detect than the previous ones, since there is no single piece of data to be measured. One way is to identify all possible combinations of patterns and keep track of the frequency of each pattern, but this is time consuming. Another possibility is to monitor the delta between data items, so that changes in TCP header flags, or a source IP address matching the destination address, can be detected. Similarly, if on average file renaming only occurs around once per day, a sudden surge of renaming of all .mp3 files to .vbs files will signal a possible virus attack. As for system files, the threshold for changes should be set to a minimum: any change to these files should alert the user to a possible attack.
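The ICMP surge example can be expressed as a simple threshold check. This is a sketch, with the baseline of 20 packets per 10-second interval and the surge factor chosen to match the paper's example rather than taken from any implementation:

```python
def flag_icmp_surge(counts, baseline=20, factor=10):
    """Return indices of intervals whose ICMP count exceeds factor x baseline.

    counts: ICMP packets observed per 10-second interval.
    A normal interval hovers near the baseline; a DOS attack produces
    counts orders of magnitude larger, so any generous factor catches it.
    """
    limit = baseline * factor
    return [i for i, c in enumerate(counts) if c > limit]

# Five quiet intervals followed by a DOS-style surge (cf. Appendix 1).
observed = [18, 22, 19, 25, 21, 10000, 15553]
suspect_intervals = flag_icmp_surge(observed)
```

Here `suspect_intervals` picks out only the two surge intervals, regardless of normal jitter around the baseline.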

DNA EVOLUTION

As system usage changes, the DNA sequences should evolve to reflect the changing use of the system by its users. Initial thresholds for each DNA sequence can be predetermined by an administrator, setting the limits expected for the user. As usage patterns change, a threshold may be increased or decreased. Decreasing a threshold can be automatic. For example, if after a month network traffic is almost nil, the threshold can be halved; if the same occurrence is noted in the next month, the threshold can be halved again, and so on. Increasing a threshold is more complicated and usually requires user input: if the threshold is set too low initially, the user will receive a significant number of warning signals. At times, a user may temporarily suspend monitoring of a particular DNA sequence due to a foreseen change in usage pattern. For example, during the installation of new software, or a software upgrade, a significant number of files will be created or modified, while a number of files will be deleted. Instead of having the user interact with the changes during the process, the monitoring can be temporarily suspended.
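The automatic decrease rule described above might be sketched as follows. The "almost nil" cutoff (5% of the threshold) and the floor value are assumptions; the paper leaves the exact policy, and the handling of increases, to the administrator:

```python
def evolve_threshold(threshold, monthly_peak, min_threshold=1.0):
    """Monthly automatic-decrease step for one DNA sequence's threshold.

    If the month's peak activity is almost nil relative to the current
    threshold, halve the threshold (never below min_threshold).
    Increasing a threshold is deliberately NOT automatic: per the paper,
    it requires user input, so this function only ever lowers it.
    """
    if monthly_peak <= 0.05 * threshold:  # traffic "almost nil" this month
        return max(threshold / 2, min_threshold)
    return threshold

t = 40.0
t = evolve_threshold(t, monthly_peak=1.0)  # quiet month: halved
t = evolve_threshold(t, monthly_peak=0.5)  # still quiet: halved again
```

Run monthly, two consecutive quiet months take a threshold of 40 down to 10, mirroring the repeated-halving behavior described in the text.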

DETECTING DNA MUTATION

Changes in DNA can be either normal or abnormal. As an example, activities such as WinVNC or video downloads generate a lot of traffic, and so does a SYN flood; however, the former are normal and the latter is abnormal. This section provides details on how a neural network is used to detect DNA mutation. The scope of the DNA mutation detection in this research was limited to network traffic. A network "sniffer" polled the number of TCP, UDP, and ICMP packets on a network segment and recorded these, along with a timestamp, to a file as shown in Appendix 1. The sniffer recorded traffic during times of low network activity as well as high network activity, such as was produced when large files were being transferred to and from various nodes on the network. The sniffer also recorded traffic during a DOS attack and during a UDP flood attack. The data files produced by the sniffer were first normalized and then fed into a neural network. A standard back-propagation neural network with a 3-2-1 architecture was used (3 input neurodes, 2 hidden neurodes, and 1 output neurode). The first input neurode was fed the normalized TCP traffic, the second the normalized UDP traffic, and the third the normalized ICMP traffic. The neural network was shown low and high normal network traffic, as well as the traffic that occurred during the DOS attack; it was not shown the traffic from the UDP flood attack during training. The neural net was trained to differentiate between "normal" traffic and "abnormal" traffic, where abnormal traffic was defined as that which takes place during the DOS attack. After training, the neural network was tested on novel traffic data from the UDP flood attack. Learning and momentum rates for the neural network were set to zero during the test run, and the (synaptic) weights produced during the training session were used.
The traffic from the UDP flood attack differed significantly from that of the DOS attack: the DOS attack primarily elevated the level of ICMP packets, whereas the UDP flood primarily elevated UDP packets, although it also elevated ICMP packets to a lesser degree. The neural network successfully identified the traffic during the UDP flood attack as abnormal, and the traffic prior to and after the attack as normal. Refer to the graphed test results in Appendix 2.
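A 3-2-1 back-propagation network of the kind described can be sketched in a few dozen lines. This is not the paper's implementation: the training data here are made-up normalized (TCP, UDP, ICMP) triples in which ICMP-heavy samples stand in for DOS traffic, and the learning rate, epoch count, and weight initialization are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 3-2-1 architecture: 3 inputs (TCP, UDP, ICMP), 2 hidden, 1 output.
W1 = rng.normal(scale=0.5, size=(3, 2))
b1 = np.zeros(2)
W2 = rng.normal(scale=0.5, size=(2, 1))
b2 = np.zeros(1)

# Illustrative normalized traffic: normal samples -> 0, ICMP-heavy -> 1.
X = np.array([[0.1, 0.05, 0.00],
              [0.8, 0.60, 0.00],
              [0.2, 0.10, 0.01],
              [0.0, 0.30, 0.90],
              [0.1, 0.20, 0.95]])
y = np.array([[0.0], [0.0], [0.0], [1.0], [1.0]])

lr = 1.0
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)       # hidden layer activations
    out = sigmoid(h @ W2 + b2)     # output neurode
    err = out - y
    # Back-propagate the squared-error gradient through both layers.
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

def classify(tcp, udp, icmp):
    """Output above 0.5 indicates 'abnormal' traffic."""
    h = sigmoid(np.array([tcp, udp, icmp]) @ W1 + b1)
    return sigmoid((h @ W2 + b2).item())
```

As in the paper's test run, the weights are frozen after training and `classify` is applied to unseen samples; no momentum term is used in this sketch at all.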

DNA APPLICATION IN INDUSTRIAL CONTROL SYSTEMS

Programmable Logic Controllers (PLC) and Distributed Control Systems (DCS) have long been considered relatively immune to hacking and virus attacks because they have been based on little-known proprietary networks and operating systems. In effect, industry has based its confidence on the premise of "security through obscurity". However, as the use of Windows, Ethernet and TCP/IP has increased dramatically in the past few years, industrial control systems have become much more susceptible to attack from the outside world. This section describes a real-life environment in which data from PLC controllers is communicated over an Ethernet network, and shows how a simple DNA sequence can be used to detect intrusion into the organization's network.

There are many advantages to using Ethernet and TCP/IP as the basis for plant floor networks. For example, by adding Ethernet communication to controllers, integration of process control information with management information systems is greatly simplified. This is particularly attractive for tasks such as data monitoring and program maintenance. Many believe that technological advancements will soon result in Ethernet being used for mission-critical control responsibilities currently managed by proprietary automation networks.

Data flow in industrial environments is typically arranged in a four-tiered model. At the base level, raw plant floor information, such as process temperature or device status, is transmitted to PLCs or DCSs through I/O networks and field/device buses (Level 1). These may be Ethernet based, but are typically based on more obscure or even proprietary protocols. From the controllers, selected process data is passed up to a server that acts as a data concentrator for process information (Level 2). This server will often use OPC (Object linking & embedding for Process Control) to provide a common interface between data streams from different vendors' controllers.
Once in the concentrator, the data is passed to the historian for both long-term storage and distribution (Level 3). Typically the links from the PLC/DCS controllers to the data concentrator, and between the concentrator and the data historian, use Ethernet and TCP/IP. From the data historian, data is distributed on demand to various clients throughout the organization (Level 4). The connection here almost always uses Ethernet and TCP/IP and is increasingly based on HTTP client/server applications. Theoretically, the data historian could act as a primitive firewall, preventing attacks from penetrating down to the critical PLC/DCS level. However, there have been reports of intrusions that have made it through (or around) the data historian, causing interruption of the actual production systems [Byres, 2000]. Since the integrity of the process controls is essential for both economic and human safety reasons, it is important that network management have a means of detecting any intrusion, or even any footprinting (indicating a pending intrusion), quickly and efficiently. A detection system based on DNA sequences would be ideally suited for intrusion detection on Level 1 or Level 2 process networks. The software would run on a dedicated PC attached to the network in a listen-only mode. Reporting to IS management would be "out-of-band", on a separate network connection, to ensure that the detection system could not impact the process in any way. This would also likely satisfy control system vendors who insist that only their "certified" equipment can be attached to the control network.

Figure 1: Typical tiered network structure used in industrial control applications. The top layers are exposed to the greatest risk of intrusion, but all layers need some detection and protection mechanisms.

Technically, monitoring of DNA sequences is likely to be simpler to implement on a Level 1 or 2 network than on a standard corporate network. Since most of the traffic is machine to machine, traffic patterns are fairly consistent on the plant floor. Human-based traffic on the corporate network tends to experience significant time-based swings as people get their email at the start of the day, browse the Internet during lunch, and so on. As well, the introduction of new software or new usage habits tends to be much rarer and more tightly controlled on the plant floor than in the typical office environment. It is also worth noting that monitoring of DNA sequences does not depend on the network using Ethernet and TCP/IP. Since the neural net software is looking for anomalies in the network statistics, the system could be adapted to other field and device buses by substituting bus-specific network interface cards (NICs) in the monitoring PC. The only requirements are that the NIC can be made to listen to all traffic (i.e. run in promiscuous mode) and that software drivers allow access to the network statistics.
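The statistics-gathering side of such a listen-only monitor might look like the following sketch. The capture step itself (reading a promiscuous-mode NIC or a proprietary field bus driver) is bus-specific and is assumed here; this function only shows how sniffed packets, represented as hypothetical (timestamp, protocol) pairs, would be bucketed into the per-interval protocol counts that feed the DNA sequences:

```python
from collections import Counter

def traffic_statistics(packets, interval_s=10):
    """Bucket sniffed packets into per-interval protocol counts.

    packets: iterable of (timestamp_seconds, protocol_name) pairs, as a
    capture layer would supply them. Returns a dict mapping interval
    index (timestamp // interval_s) to a Counter of protocol counts,
    i.e. one row of the Appendix 1 tables per interval.
    """
    buckets = {}
    for ts, proto in packets:
        key = int(ts) // interval_s
        buckets.setdefault(key, Counter())[proto] += 1
    return buckets

# Illustrative sample using timestamps in the style of Appendix 1.
sample = [(31585, "tcp"), (31586, "udp"), (31591, "icmp"), (31592, "icmp")]
stats = traffic_statistics(sample)
```

Because the monitor only reads packets and never transmits, it satisfies the listen-only constraint described above regardless of the underlying bus.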

FUTURE DIRECTIONS

Mapping a complete DNA sequence in process control networks is not as complicated as the monumental work of mapping the DNA sequence of the human body, but it will still take significant effort. Research into better pattern recognition capabilities is needed, as well as analysis of the resources required for effective monitoring of a system. A better interface for user interaction will also enhance the training of the DNA sequences.

CONCLUSION

In this paper, we propose that each system can be characterized by DNA sequences. We have shown how DNA sequences can change and how a neural network can be trained to monitor the normal and abnormal evolution of a DNA sequence. We have also shown how DNA sequences could be applied in a process control environment.

References

[Arnold et al 2000] W. Arnold and G. Tesauro, "Automatically Generated Win32 Heuristic Virus Detection", Virus Bulletin Conference, 2000.
[Bace 2000] R. Bace, "Intrusion Detection", MacMillan Technical Publishing, USA, 2000.
[Byres, 2000] E. J. Byres, "Designing Secure Networks for Process Control", IEEE Industrial Applications Journal, IEEE, September 2000.
[Northcutt et al 2001] S. Northcutt and J. Novak, "Network Intrusion Detection: An Analyst's Handbook", Second Edition, New Riders, 2001.
[Scambray et al., 2001] J. Scambray, S. McClure, G. Kurtz, "Hacking Exposed", Second Edition, Osborne / McGraw-Hill, USA, 2001.
[VB 2001] Virus Bulletin, The Pentagon, Abingdon, Oxfordshire, 2001.

APPENDIX 1

Normal Network Traffic Volume

Timestamp   TCP    UDP    ICMP
31585         2      1       0
31602         1      2       0
31630         2      1       0
31641         3     10       0
31668         1      7       0
32291         3      0       0
32303         2      1       0
32324         1      3       0
32357         3      0       0
32370         2      4       0

Network Traffic During DOS Attack

Timestamp   TCP    UDP    ICMP
34924         3   4698   15553
34935         0   3494   11213
34946         0    280     154
34957         3    255     127
34980         0    102      51
35599         0    586   14322
35610         4      1   22491
35621         0      0   29278
35632         0      0   32813
35643         3      0   32024

APPENDIX 2

[Figure: UDP Flood Test. Neural net output (attack indication, scale 0 to 1.2) plotted against timestamp.]
