HONEYCYBER: AUTOMATED SIGNATURE GENERATION FOR ZERO-DAY POLYMORPHIC WORMS

Mohssen M. Z. E. Mohammed, H. Anthony Chan and Neco Ventura
Department of Electrical Engineering, University of Cape Town
Rondebosch 7701, South Africa
Emails: [email protected]; [email protected]; [email protected]

ABSTRACT

Signature-based Intrusion Detection Systems (IDSs) can be evaded by polymorphic worms, which vary their payloads in every infection attempt. In this paper, we propose Honeycyber, a system for automated signature generation for zero-day polymorphic worms. We have designed a novel double-honeynet system, which is able to automatically detect new worms and isolate the attack traffic from innocuous traffic. We introduce unlimited honeynet outbound connections, which allow us to capture different payloads in every infection of the same worm. The system is able to generate signatures to match most polymorphic worm instances with low false positives and low false negatives.

1. INTRODUCTION

The yearly growth of Internet worms increasingly threatens the availability and integrity of Internet-based services. Worms take the attack process one step further by self-replicating. Once a worm has compromised and taken over a system, it begins scanning again, looking for new victims. Therefore a single infected system can compromise one hundred systems, each of which can compromise another one hundred systems, and so on. The worm continues to attack systems this way and grows exponentially. This propagation method can spread extremely fast, giving administrators little time to react and ravaging entire organizations. Although only a small percentage of individuals can identify and develop code for worms, once the code of a worm is accessible on the Internet, anyone can apply it. The very randomness of this process is what makes worms so dangerous [1].

The research community has proposed and built intrusion detection systems (IDSs) to defend against Internet worms (and other attacks) [14, 15]. An IDS has a database containing signatures of known attacks. It searches inbound traffic for known patterns that correspond to malicious traffic. The IDS may raise an alarm when malicious traffic is found and block future traffic from the offending source address.
978-1-4244-2677-5/08/$25.00 ©2008 IEEE

Security experts manually generate the IDS signatures by studying the network traces after a new worm has been released. Unfortunately, this job takes a lot of time. Researchers have recently given attention to automating the generation of signatures for IDSs to match worm traffic. It has been shown that multiple invariant substrings must often be present in all variants of a worm payload. These substrings typically correspond to protocol framing, return addresses, and in some cases, poorly obfuscated code [6]. On the other hand, generating a single substring signature for all worm instances can result in high false alarm rates.

Current systems such as Honeycomb [3], Autograph [4], and EarlyBird [5] monitor network traffic to identify novel Internet worms, and produce signatures for them using pattern-based analysis, i.e., by extracting common byte patterns across different suspicious flows. All these systems generate a single signature to match all worm instances, based on the assumption that there exists a single payload substring that remains invariant across worm connections. A single signature is not sufficient to match all worm instances with low false positives and low false negatives. In addition, current signature generation systems suffer from relatively high false-positive and false-negative rates, which can be attributed to the fact that they capture innocuous traffic alongside suspicious traffic.

Security experts need a great deal of information to perform signature generation. Such information can be captured by tools such as a honeynet. A honeynet is a network of standard production systems that are built together and put behind some type of access control device (such as a firewall) to watch what happens to the traffic [1]. We assume the traffic captured by a honeynet is suspicious. Honeycyber reduces the rate of false alarms by using a honeynet to capture traffic destined to a certain network. It goes one step further in reducing the false alarm rate by generating multiple, more accurate signatures with the use of a double-honeynet.
This paper is organized as follows: Section 2 reviews techniques and anatomy of polymorphic worms. Section 3 discusses the related work regarding automated signature generation systems. Section 4 introduces the proposed Honeycyber architecture to address the problems faced by current automated signature systems. Section 5 discusses signature generation algorithms for polymorphic worms. Section 6 concludes the paper and highlights the future work.

2. TECHNIQUES AND ANATOMY OF POLYMORPHIC WORMS

A. Polymorphic Worm Techniques

As stated in [8], there are many ways to make polymorphic worms. One technique relies on self-encryption with a variable key. It encrypts the body of a worm, which erases both signatures and statistical characteristics of the worm byte string. A copy of the worm, the decryption routine, and the key are sent to a victim machine, where the encrypted text is turned into a regular worm program by the decryption routine. The program is then executed to infect other victims and possibly damage the local system. If the same decryption routine is always used, the byte sequence in the decryption routine can serve as the worm signature.

A more sophisticated method of polymorphism is to change the decryption routine each time a copy of the worm is sent to another victim host. This can be achieved by keeping several decryption routines in a worm. When the worm tries to make a copy, one routine is randomly selected and the other routines are encrypted together with the worm body. The number of different decryption routines is limited by the total length of the worm. Given a limited number of decryption routines, it is possible to identify all of them as attack signatures after enough samples of the worm have been obtained.

Another polymorphism technique is called garbage-code insertion. It inserts garbage instructions into the copies of a worm. For example, a number of nop (i.e., no operation) instructions can be inserted into different places of the worm body, thus making it more difficult to compare the byte sequences of two instances of the same worm. However, from a statistical point of view, the frequencies of the garbage instructions in a worm can differ greatly from those in normal traffic. If that is the case, anomaly-detection systems can be used to detect the worm. Furthermore, some garbage instructions such as nop can be easily identified and removed. For better obfuscated garbage, techniques of executable analysis can be used to identify and remove those instructions that will never be executed.

The instruction-substitution technique replaces one instruction sequence with a different but equivalent sequence. Unless the substitution is done over the entire code without compromising the code integrity (which is a great challenge by itself), it is likely that shorter signatures can be identified from the stationary portion of the worm.

The code-transposition technique changes the order of the instructions with the help of jumps. The excess jump instructions provide a statistical clue, and executable-analysis techniques can help to remove the unnecessary jump instructions.

Finally, the register-reassignment technique swaps the usage of the registers, which causes extensive "minor" changes in the code sequence.

B. Invariant Content in Polymorphic Exploits

The invariant content of vulnerabilities could form a basis for signatures that match all variants of a polymorphic worm. As stated in [6], most software vulnerabilities require invariant bytes to be exploited successfully. Any change in these bytes causes an exploit to become non-functional. The two chief sources of invariant content are described below, namely exploit framing and overwrite values.

Invariant Exploit Framing

A software vulnerability exists at some particular code site along a code path which is executed upon receiving a request from the network. In many cases, the code path to the vulnerability contains branches whose outcome depends on the content of the received request; these branches typically correspond to parsing of the request in accordance with a specific protocol. Thus, an exploit typically includes invariant framing (e.g., reserved keywords or well-known binary constants that are part of a wire protocol) essential to exploiting a vulnerability successfully.

Invariant Overwrite Values

Exploits typically alter the control flow of the victim program by overwriting a jump target in memory with a value provided in the exploit, either to force a jump to injected code in the payload, or to force a jump to some specific point in library code. Such exploits typically must include an address from some small set of narrow ranges in the request. In attacks that redirect execution to injected code, the overwritten address must point at or near the beginning of the injected code, meaning that the high-order bytes of the overwritten address are typically invariant. For example, CodeRed causes the server to jump to an address in a common Windows DLL that contains the instruction call ebx. For this technique to be stable, the address used for this purpose must work for a range of Windows versions. There are only six addresses that would work across Windows 2000 service packs zero and one.
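To make the two sources of invariant content concrete, the following minimal Python sketch checks payloads against a set of invariant tokens. Everything in it is an assumption made for illustration: the payload bytes, the framing tokens, the high-order address bytes, and the helper name matches_all_tokens are hypothetical and are not taken from any real worm or from the paper.

```python
from typing import Sequence

# Hypothetical invariant framing tokens (assumed for illustration only).
FRAMING = [b"GET /", b" HTTP/1.1\r\n"]
# Hypothetical invariant high-order bytes of a little-endian overwrite address.
ADDR_HIGH = b"\xff\xbf"
# A token-set signature: framing plus the invariant address bytes.
SIGNATURE = FRAMING + [ADDR_HIGH]


def matches_all_tokens(payload: bytes, tokens: Sequence[bytes]) -> bool:
    """Return True if every token occurs somewhere in the payload."""
    return all(token in payload for token in tokens)


# Three invented variants: the filler and the low-order address bytes change,
# while the framing and the high-order address bytes stay fixed.
VARIANTS = [
    b"GET /" + b"\x90\x90\x41" + b" HTTP/1.1\r\n" + b"AAAA" + b"\x10\x20" + ADDR_HIGH,
    b"GET /" + b"\x42\x90\x43\x90" + b" HTTP/1.1\r\n" + b"zz" + b"\x7c\x21" + ADDR_HIGH,
    b"GET /" + b"\x44" + b" HTTP/1.1\r\n" + b"qqqqqq" + b"\x05\x22" + ADDR_HIGH,
]
BENIGN = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"

if __name__ == "__main__":
    for variant in VARIANTS:
        print(matches_all_tokens(variant, SIGNATURE))
    print(matches_all_tokens(BENIGN, FRAMING), matches_all_tokens(BENIGN, SIGNATURE))
```

In this toy data the framing tokens alone also match the benign request, whereas the full token set does not, which is the intuition behind preferring several invariant tokens over one substring.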
Signatures for polymorphic worms can be classified into two broad categories: content-based signatures, which use similarity across different instances of byte sequences to characterize a given worm, and behavior-based signatures, which characterize worms through understanding the semantics of their byte sequences. Our approach focuses on content-based signatures, which allow us to treat worms as strings of bytes.

C. Invariant Bytes

In a polymorphic worm sample we can classify three kinds of bytes: invariant, code, and wildcard [6]. Invariant bytes are those with a fixed value in every possible instance. If their value is changed, the exploit no longer works. They can be part of the protocol framing and exploit bytes, but in some cases also of the worm body or the polymorphic decryptor. Such bytes are very useful in signature generation because they are absolutely necessary for the exploit to work and their content is replicated across worm instances. Code bytes come from components like the worm body or decryption routine, which contain instructions to be executed. Although the code section of worm samples can be subjected to polymorphism and encryption techniques, and thus can assume different shapes in each instance, polymorphic engines are not perfect and some of these bytes can present invariant values. Lastly, wildcard bytes are bytes that may take any value without affecting the worm's spreading capabilities. Our system is based on identifying the invariant bytes in each worm and generating signatures accordingly.

3. RELATED WORK

Honeypots are an excellent source of data for intrusion and attack analysis. Levine et al. described how honeypots extract details of worm exploits that can be analyzed to generate detection signatures [2]. The signatures are generated manually.

One of the first systems proposed was Honeycomb, developed by Kreibich and Crowcroft. Honeycomb generates signatures from traffic observed at a honeypot via its implementation as a Honeyd [3] plugin. The longest common substring (LCS) algorithm, which looks for the longest shared byte sequences across pairs of connections, is at the heart of Honeycomb. Honeycomb generates signatures consisting of a single, contiguous substring of a worm's payload to match all worm instances.
These signatures, however, fail to match all polymorphic worm instances with low false positives and low false negatives.

Kim and Karp [4] described the Autograph system for automated generation of signatures to detect worms. Unlike Honeycomb, Autograph's inputs are packet traces from a DMZ that includes benign traffic. Content blocks that match "enough" suspicious flows are used as input to COPP, an algorithm based on Rabin fingerprints that searches for repeated byte sequences by partitioning the payload into content blocks. Similar to Honeycomb, Autograph generates signatures consisting of a single, contiguous substring of a worm's payload to match all worm instances. These signatures, unfortunately, fail to match all polymorphic worm instances with low false positives and low false negatives.

S. Singh, C. Estan, G. Varghese, and S. Savage [5] described the EarlyBird system for generating signatures to detect worms. This system measures packet-content prevalence at a single monitoring point such as a network DMZ. By counting the number of distinct sources and destinations associated with strings that repeat often in the payload, EarlyBird distinguishes benign repetitions from epidemic content. Like Honeycomb and Autograph, EarlyBird generates signatures consisting of a single, contiguous substring of a worm's payload to match all worm instances. These signatures, however, fail to match all polymorphic worm instances with low false positives and low false negatives.

Newer content-based systems such as Polygraph, Hamsa, and LISABETH [6, 9, 10] have been deployed. All these systems, similar to ours, generate automated signatures for polymorphic worms based on the following fact: there are multiple invariant substrings that must often be present in all variants of polymorphic worm payloads, even if the payload changes in every infection. All these systems capture packet payloads from a router, so in the worst case they may observe multiple polymorphic worms that each exploit a different vulnerability. In that case, it may be difficult for these systems to find invariant content shared between the worms, because they exploit different vulnerabilities.

An attacker sends one instance of a polymorphic worm to a network, and in every infection this worm automatically attempts to change its payload to generate other instances. So, to capture all polymorphic worm instances, we need to give a polymorphic worm the chance to interact with hosts without affecting their performance.
We therefore propose a new detection method, the "Double-honeynet", to interact with polymorphic worms and collect all their instances. The proposed method makes it possible to capture all worm instances and then forward them to the Signature Generator, which generates signatures using a particular algorithm.

In An Architecture for Generating Semantics-Aware Signatures, V. Yegneswaran, J. Giffin, P. Barford, and S. Jha [7] described Nemean, which incorporates protocol semantics into the signature generation algorithm. By doing so, it is able to handle a broader class of attacks. Because Nemean's coverage is so wide, we believe that our system, which targets polymorphic worms specifically, handles them better.

In An Automated Signature-Based Approach against Polymorphic Internet Worms, Yong Tang and Shigang Chen [8] described a system to detect new worms and generate signatures automatically. This system implemented double honeypots (an inbound honeypot and an outbound honeypot) to capture worm payloads. The inbound honeypot is implemented as a high-interaction honeypot, whereas the outbound honeypot is implemented as a low-interaction honeypot. This system has a limitation: the outbound honeypot cannot make outbound connections because it is implemented as a low-interaction honeypot, so it cannot capture all polymorphic worm instances. Our system overcomes this disadvantage by using a double-honeynet built from high-interaction honeypots, which allows unlimited outbound connections between the two honeynets, so we can capture all polymorphic worm instances.

In Automated Web Patrol with Strider HoneyMonkeys, Yi-Min Wang et al. [11] developed an automated web patrol system to automatically identify and monitor malicious web sites that install malware programs by exploiting browser vulnerabilities. A "honeymonkey" is a computer or a virtual PC that actively mimics the actions of a user surfing the Web. A series of "monkey programs," which drive a browser in a manner similar to that of a human user, run on virtual machines in order to detect exploit sites. The browsers can be configured to run with fully updated software or without specific updates in order to look for exploit sites that target specific vulnerabilities. In this manner, the attacks more likely to impact customers can be analyzed and detected. When a HoneyMonkey detects a zero-day exploit, it reports the URL to the Microsoft Security Response Center, which works closely with the enforcement team and the groups owning the software with the vulnerability to thoroughly investigate the case and determine the most appropriate course of action.

From the above description, we can conclude that HoneyMonkeys follow an intrusion prevention-oriented policy, finding web sites that exploit browser vulnerabilities, whereas Honeycyber follows an intrusion detection-oriented policy, waiting for attackers to attack the network. In addition, HoneyMonkeys are limited to web-based technologies and protocols, whereas Honeycyber provides support for a wider range of technologies and protocols.

4. DOUBLE-HONEYNET SYSTEM

A. System Architecture

Before presenting the architecture of our double-honeynet system, we give a brief introduction to honeypots. Developed in recent years, a honeypot is a monitored system on the Internet that serves the purpose of attracting and trapping attackers who attempt to penetrate the protected servers on a network. Honeypots fall into two categories. A high-interaction honeypot operates a real operating system and one or multiple applications. A low-interaction honeypot simulates one or multiple real systems. In general, any network activities observed at honeypots are considered suspicious, and it is possible to capture the latest intrusions based on the analysis of these activities. However, the information provided by honeypots is often mixed with normal activities, as legitimate users may access the honeypots by mistake. Experts need hours or even days to manually scrutinize the data logged by honeypots, which is too slow against worm attacks that may infect the whole Internet in a shorter period of time [1, 8].

We propose a double-honeynet system to detect new worms automatically. A key contribution of this system is the ability to distinguish worm activities from normal activities without the involvement of experts. Figure 1 illustrates the double-honeynet architecture. It is composed of two independent honeynets, namely Honeynet 1 and Honeynet 2. Our goal is to attract a worm to compromise Honeynet 1 before it compromises a local server.

[Figure 1 block diagram. Labeled components: Internet, Gate Translator, IDS, Local Network, Honeynet 1, Honeynet 2, Internal Translator 1, Internal Translator 2, Signature Generator, Worm.]

Figure 1. Honeycyber architecture.

We implement the Gate Translator at the edge router between the local network and the Internet. It detects unwanted inbound connections and forwards them to Honeynet 1. To allow the Gate Translator to determine which connections are unwanted, we configure it with a list of used and publicly accessible addresses. Organizations usually expose only the addresses of their publicly accessible servers, which makes any access to other, unused addresses unwanted.

Once Honeynet 1 is compromised, the worm will attempt to make outbound connections. Each honeynet is associated with an Internal Translator, implemented in a router, that separates the honeynet from the rest of the network. Internal Translator 1 intercepts all outbound connections from Honeynet 1 and redirects them to Honeynet 2, which does the same in the other direction, forming a loop. Polymorphic worms vary their payloads on every infection attempt, so this process allows us to capture different payloads of the same worm, which means we can generate high-quality signatures for that worm. Only packets that make outbound connections are considered malicious, and hence the double-honeynet forwards only packets that make outbound connections. This policy rests on the fact that benign users do not try to make outbound connections when they are faced with non-existing addresses.

Once the double-honeynet has captured enough instances of worm payloads, the payloads are automatically forwarded to the Signature Generator, which generates signatures using the algorithms discussed in the next section. These signatures are used to update the IDS (e.g., Snort or Bro) automatically through a module that converts the signatures into Bro or pseudo-Snort format.

For example, as shown in Figure 1, suppose the Gate Translator suspects Packet 1 (P1), Packet 2 (P2), and Packet 3 (P3) to be malicious and redirects them to Honeynet 1. Among these three packets, P1 and P2 make outbound connections, and Internal Translator 1 redirects these outbound connections to Honeynet 2. In Honeynet 2, P1 and P2 change their payloads and become P1′ and P2′ respectively (P1′ and P2′ are instances of P1 and P2). P1′ and P2′ then make outbound connections, and Internal Translator 2 redirects these connections to Honeynet 1. In Honeynet 1, P1′ and P2′ change their payloads and become P1′′ and P2′′ respectively (P1′′ and P2′′ are further instances of P1 and P2). P1 and P2 are malicious because of the outbound connections. Therefore Honeynet 1 forwards P1, P1′′, P2, and P2′′ to the Signature Generator for the signature generation process. Similarly, Honeynet 2 forwards P1′ and P2′ to the Signature Generator. P3 does not make any outbound connections when it gets to Honeynet 1, so P3 is not considered to be malicious.

Our design is not demanding in terms of resources. This advantage can be attributed to the fact that we use virtual honeynets. A virtual honeynet is a solution that allows us to run everything we need on a single computer [1]. Virtualization technology allows multiple operating systems to run at the same time on the same hardware, which reduces cost dramatically.

Signature-based and anomaly-based IDSs are currently used to detect worms, but each has a drawback. The latter suffers from high false positives, whereas the former is able to find only known worms. We designed the double-honeynet detection method to solve these problems. Our double-honeynet has zero false positives because a packet inside Honeynet 1 is not considered malicious until it makes an outbound connection to Honeynet 2, as described above. The double-honeynet is also able to detect unknown worms, because it redirects all connections directed to non-existing addresses regardless of whether the connection carries a known or an unknown worm. The Signature Generator then filters out the known worms by using a signature database for known worms.

B. Signature Generator Architecture

Figure 2 illustrates the Signature Generator architecture. It receives the packet payloads captured by the double-honeynet. These packets are checked by the Protocol Classifier, which classifies packets by protocol (TCP/UDP) and port number. The Known-Worms Filter component then filters out known-worm samples and passes the remaining samples (unknown worms) to the Signature Generation Algorithms component, which extracts all the distinct tokens in the samples and clusters them according to their similarity. The set of tokens in each cluster is used as a signature for that cluster. The total number of signatures is equal to the total number of clusters. All the algorithms used above are discussed in the next section.
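The first two stages of the Signature Generator described above (classification by protocol and port, then removal of known-worm samples) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Sample record, the function names classify and filter_known, and the representation of a known-worm signature as a list of byte tokens are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one captured payload."""
    protocol: str  # "TCP" or "UDP"
    port: int
    payload: bytes


def classify(samples: Sequence[Sample]) -> Dict[Tuple[str, int], List[Sample]]:
    """Protocol Classifier stage: group samples by (protocol, port)."""
    groups: Dict[Tuple[str, int], List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[(sample.protocol, sample.port)].append(sample)
    return dict(groups)


def filter_known(samples: Sequence[Sample],
                 known_signatures: Sequence[Sequence[bytes]]) -> List[Sample]:
    """Known-Worms Filter stage: drop samples matched by a known signature.

    A signature is assumed to be a set of tokens that must all occur
    in the payload for the sample to count as a known worm.
    """
    def is_known(payload: bytes) -> bool:
        return any(all(token in payload for token in signature)
                   for signature in known_signatures)

    return [sample for sample in samples if not is_known(sample.payload)]


if __name__ == "__main__":
    captured = [
        Sample("TCP", 80, b"GET /OLDWORM"),   # invented known-worm payload
        Sample("TCP", 80, b"GET /new1"),      # invented unknown payload
        Sample("UDP", 1434, b"\x04new2"),     # invented unknown payload
    ]
    unknown = filter_known(captured, [[b"OLDWORM"]])
    for key, group in classify(unknown).items():
        print(key, len(group))
```

The grouped unknown samples would then be handed to the token extraction and clustering stage, which is left out here.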

[Figure 2 block diagram. Labeled stages, in order: Packets, Protocol Classifier, Known Worms Filter, Signature Generation Algorithms, Signatures.]

Figure 2. Signature Generator architecture.

5. SIGNATURE GENERATION ALGORITHMS FOR POLYMORPHIC WORMS

As mentioned earlier, single-substring signatures are insufficient to match polymorphic worms robustly. In this section we describe the algorithms for automatically generating signatures of polymorphic worms. Many algorithms proposed in this section are based on algorithms found in [13].

Our proposed scheme and analysis consider two scenarios for the signature generation problem. In the first scenario, different instances of a single worm arrive at the Signature Generation Algorithms component, and the task is to compare those instances so that the invariant contents of the worm are identified. In the second scenario, different instances of multiple worms arrive at the Signature Generation Algorithms component, and the task is to sort those instances according to the worm they belong to by identifying the invariant contents of each worm.

In the first scenario, the Signature Generation Algorithms component extracts all the distinct tokens of a minimum length X that appear in every worm instance. The set of tokens is then taken as the signature of the worm. In the second scenario, the Signature Generation Algorithms component extracts all the distinct substrings of a minimum length X that occur in at least K of the worm instances. The Signature Generator then sorts these tokens into different clusters. The set of tokens in each cluster serves as a worm signature.

To find the longest substring that occurs in at least k of n samples, we use a well-known algorithm from [12]. This algorithm can be trivially modified to return, in the same time bound, a set of substrings that includes all the distinct substrings occurring in at least k out of n samples, along with some non-distinct substrings. We can then prune out the non-distinct substrings and finally output the set of substrings for use in signature generation.
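The extraction step can be illustrated with a small brute-force Python sketch. It is deliberately not the algorithm of [12]: it enumerates every substring instead of using a suffix-tree-based method, so it is only practical for tiny inputs. The function name extract_tokens, the sample bytes, and the pruning rule (drop a substring when a longer frequent substring contains it and occurs in exactly the same samples) are assumptions chosen for illustration; the pruning rule is a simplification of "distinct".

```python
from typing import Dict, List, Sequence, Set


def extract_tokens(samples: Sequence[bytes], min_len: int, k: int) -> List[bytes]:
    """Return substrings of length >= min_len found in at least k samples.

    Brute force: every substring of every sample is enumerated, so the
    cost grows quadratically with sample length.
    """
    occurrences: Dict[bytes, Set[int]] = {}
    for index, sample in enumerate(samples):
        seen = set()
        for start in range(len(sample)):
            for end in range(start + min_len, len(sample) + 1):
                seen.add(sample[start:end])
        for substring in seen:
            occurrences.setdefault(substring, set()).add(index)

    frequent = {s: ids for s, ids in occurrences.items() if len(ids) >= k}

    # Simplified pruning: keep a substring only if no longer frequent
    # substring contains it while covering exactly the same samples.
    tokens = [
        s for s, ids in frequent.items()
        if not any(s != t and s in t and ids == t_ids
                   for t, t_ids in frequent.items())
    ]
    return sorted(tokens)


if __name__ == "__main__":
    # Invented worm instances sharing framing "GET /" and bytes 0xff 0xbf.
    instances = [
        b"xxGET /ayy\xff\xbf",
        b"GET /bqq\xff\xbfzz",
        b"wGET /c\xff\xbf",
    ]
    print(extract_tokens(instances, min_len=2, k=3))
```

With k equal to the number of instances this corresponds to the first scenario (tokens present in every instance); a smaller k corresponds to the second scenario, after which the tokens would still need to be clustered.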

6. CONCLUSION AND FUTURE WORK

We have proposed Honeycyber, a system for automated signature generation for zero-day polymorphic worms using a double-honeynet. Multiple invariant substrings must often be present in all variants of a polymorphic worm payload. Single-substring signatures cannot match all polymorphic worm instances with low false positives and low false negatives. We proposed a new detection method, the "Double-honeynet", to detect new worms that have not been seen before. The main objectives of this proposal are to reduce false alarm rates and generate high-quality signatures for polymorphic worms. Currently we are building a test bed based on the honeynet architecture to be connected to a real production network (CRG lab network). In addition, we are investigating the development environment we are going to use to implement our proposed algorithms.

REFERENCES

[1] L. Spitzner, "Honeypots: Tracking Hackers," Addison Wesley Pearson Education: Boston, 2002.
[2] J. Levine, R. La Bella, H. Owen, D. Contis, and B. Culver, "The use of honeynets to detect exploited systems across large enterprise networks," Proc. of 2003 IEEE Workshops on Information Assurance, New York, Jun. 2003, pp. 92-99.
[3] C. Kreibich and J. Crowcroft, "Honeycomb - creating intrusion detection signatures using honeypots," Workshop on Hot Topics in Networks (Hotnets-II), Cambridge, Massachusetts, Nov. 2003.
[4] H.-A. Kim and B. Karp, "Autograph: Toward automated, distributed worm signature detection," Proc. of 13th USENIX Security Symposium, San Diego, CA, Aug. 2004.
[5] S. Singh, C. Estan, G. Varghese, and S. Savage, "Automated worm fingerprinting," Proc. of the 6th conference on Symposium on Operating Systems Design and Implementation (OSDI), Dec. 2004.
[6] James Newsome, Brad Karp, and Dawn Song, "Polygraph: Automatically generating signatures for polymorphic worms," Proc. of the 2005 IEEE Symposium on Security and Privacy, pp. 226-241, May 2005.
[7] V. Yegneswaran, J. Giffin, P. Barford, and S. Jha, "An architecture for generating semantics-aware signatures," Proc. of the 14th conference on USENIX Security Symposium, 2005.
[8] Yong Tang and Shigang Chen, "An Automated Signature-Based Approach against Polymorphic Internet Worms," IEEE Transaction on Parallel and Distributed Systems, pp. 879-892, July 2007.
[9] Zhichun Li, Manan Sanghi, Yan Chen, Ming-Yang Kao, and Brian Chavez, "Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience," Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, May 2006.
[10] Lorenzo Cavallaro, Andrea Lanzi, Luca Mayer, and Mattia Monga, "LISABETH: Automated Content-Based Signature Generator for Zero-day Polymorphic Worms," Proc. of the fourth international workshop on Software engineering for secure systems, Leipzig, Germany, May 2008.
[11] Yi-Min Wang et al., "Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities," Proc. of the 4th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pp. 171-180, Seattle, WA, USA, 2008.
[12] L. Hui, "Color set size problem with applications to string matching," Proc. of the 3rd Symposium on Combinatorial Pattern Matching, Vol. 644, pp. 230-243, 1992.
[13] D. Gusfield, "Algorithms on Strings, Trees and Sequences," Cambridge University Press: Cambridge, 1997.
[14] Snort - The de facto Standard for Intrusion Detection/Prevention. Available: http://www.snort.org, 5 June 2008.
[15] Bro Intrusion Detection System. Available: http://www.snort.org, 5 June 2008.
