A Model for Automatic Generation of Behaviour-Based Worm Signatures

Sébastien Chainay, Karima Boudaoud

University of Nice–Sophia-Antipolis, I3S-CNRS Lab., Sophia-Antipolis, France

I. INTRODUCTION

Worms¹ are probably the fastest malware, with a propagation capacity that grows with the speed of networks. Their propagation is made possible by the poor organisation of process virtual memory in operating systems and by poor memory management in some programming languages (such as the C language). Usually, a worm works as follows:

1. First, it injects code (named shellcode) into a remote vulnerable process and binds a specific port. Then, the exploited process starts to listen on this port.
2. After that, the worm attempts to connect to the port (push propagation strategy) or the hijacked process tries to connect to the worm (pull propagation strategy).
3. If the connection is successful, the exploited process executes a command prompt.
4. Finally, the worm can start to send commands to the shell.

The spreading of worms, particularly in high-speed networks, requires systems able to generate signatures characterizing new worms automatically and as soon as possible. In this context, several systems have been designed [3][4][5][6][7][8][9][10][11]. The signatures generated by these systems are content-based, i.e. they focus on finding one or several sets of bytes repeated in the code. Usually, hackers use mechanisms that change the content of a worm to avoid its detection by systems using content-based signatures. However, when a hacker changes the code content of a worm, they do not change its behaviour. Thus, what is needed is to define signatures based on the behaviour of the worm rather than on its content. Consequently, in this paper, we propose a model which generates a new kind of signature, which we call behaviour-based signatures. The aim of these signatures is to characterize the propagation of a worm in a network and its execution on an operating system.

This paper is organized as follows. First, we give an overview of worm morphisms. Then, we define the notion of behaviour-based signatures. After that, we present our model. Finally, we conclude with some remarks and future work.

¹ A worm is a piece of software that uses computer networks and security flaws to create copies of itself. A copy of the worm will scan the network for any other machine that has a specific security flaw. It replicates itself to the new machine using the security flaw, and then starts replicating. [1]

II. MORPHISM OF WORMS

A worm having one representation is a monomorphic worm. However, a worm may have several representations: it can be oligomorphic, polymorphic or metamorphic. Moreover, a worm can be encrypted or not. Generally, an encrypted worm contains one decryption function, placed at the beginning or at the end of the worm, together with the encrypted body. In oligomorphic worms, the decryption function can differ between some worm replications (i.e. copies). In polymorphic worms, however, the decryption function changes for each worm replication. Metamorphic worms are not encrypted: they recompile themselves with a different coding at each replication.

Existing signature generation systems detect and generate signatures for monomorphic, oligomorphic and polymorphic worms. Currently, Autograph [3] and SweetBait [4] (which uses Honeycomb [5]) detect monomorphic worms. Earlybird [6] and Nemean [7] detect oligomorphic worms. PADS [8], PAYL [9], Polygraph [10] and Hamsa [11] detect polymorphic worms (see Tab.I). All these systems focus on the worm content, except Earlybird, which also takes into account the address dispersion [6] by counting the number of connections to different hosts (i.e. different IP addresses).

TABLE I. A COMPARISON OF SIGNATURE GENERATION SYSTEMS

Metamorphic worms cannot be detected easily by these systems because the worm code (i.e. its content) may change considerably without any change in behaviour, in contrast to monomorphic, oligomorphic and polymorphic worms, where at most the decryption function changes [2]. Thus, to detect these kinds of worms, it is more judicious to look at the behaviour rather than at the content.
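The weakness of content-based signatures under a change of representation can be illustrated with a small sketch. It is not one of the cited systems: the "signature" here is simply the longest byte run shared by two samples, the payload is a harmless placeholder string, and a one-byte XOR re-encoding (an assumption chosen for brevity) stands in for any transformation that alters content but not behaviour.

```python
def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Return the longest byte run present in both inputs (simple DP)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def xor_encode(body: bytes, key: int) -> bytes:
    """Re-encode a body with a one-byte XOR key (applying it twice decodes)."""
    return bytes(x ^ key for x in body)


# Placeholder "behaviour": the decoded body is identical for both copies.
body = b"connect;download;execute"
copy_a = xor_encode(body, 0x21)
copy_b = xor_encode(body, 0x5A)

# Two identical copies share their whole content ...
plain_signature = longest_common_substring(body, body)
# ... while two differently encoded copies share no byte run at all here.
encoded_signature = longest_common_substring(copy_a, copy_b)

print(len(plain_signature), len(encoded_signature))
```

Both encoded copies decode to the same body, so what they do is unchanged, yet the byte-level invariant that a content-based generator would look for has disappeared.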

III. BEHAVIOUR-BASED SIGNATURES

In the context of this work, we define the notion of behaviour-based signatures to represent the worm behaviour at the network and system levels. Thus, we decompose the behaviour-based signature into two parts: the network-based signature and the system-based signature.

A. Network behaviour-based signature

The network behaviour-based signature defines the way a worm propagates from a source to a destination by analyzing the following network metrics:

• IP address of the source.
• Source port, destination port, protocol type (TCP, UDP, etc.).
• Number of packets.
• Length of the biggest packet.
• Propagation strategy (i.e. push or pull).
• Average inter-arrival time between packets.
• Duration between the first and the last packet.

To collect these metrics, several tools are necessary: TCPdump, TCPstat, DNS reverse lookup, etc.

B. System behaviour-based signature

The system behaviour-based signature defines the worm activities on a computer (see Fig.1). We have identified three kinds of activities:

• Execution activities, which concern system calls and library links made by the worm.
• Communication activities, which concern communications with other processes using a communication port.
• Access activities, which concern accesses to devices (mainly disk accesses, in the case of files).

All these activities use CPU and RAM.

Figure 1. Activities of a process in a computer

Thus, to represent the system behaviour-based signature, we consider the following elements:

• Internal ports sequence.
• System calls sequence.
• Library links sequence.
• Device accesses sequence.
• CPU profile.
• Memory consumption profile.
• Evolution of the worm length (in the case of zipped worms).
• History of the worm location in the file system.
• Operating system of the source host.

In addition to these elements, we take into account the address dispersion characteristic used by Earlybird. However, in our case, in addition to counting the number of connections (as Earlybird does), we look at the IP address generation strategy used by the worm. All these elements can be measured by tools provided by Solaris such as VMstat, MPstat, IOstat and Kstat [12].

Having defined the behaviour-based signatures, we now present our signature generator model.

IV. A MODEL TO GENERATE BEHAVIOUR-BASED SIGNATURES

Our signature generator model is composed of (see Fig.2):

• A collector, which gets worms connecting to vulnerable applications.
• An emulator, which extracts worms by using taint analysis on a virtual operating system.
• An analyzer, which monitors the system activities of worms executed on a virtual operating system.
• A generator, which generates the behaviour-based signatures.

All these entities run independently of each other, so that information on several worms is collected in parallel.

Figure 2. A model to generate behaviour-based signatures

To obtain the network metrics from the worms collected with both the emulator and the collector, we propose to use the TCPdump and TCPstat tools. To obtain the system metrics, we launch each worm on a virtual machine that emulates the operating system of the source host from which it was extracted. We determine this operating system by using a passive fingerprinting tool such as Disco, p0f or ettercap.
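As an illustration of how the network metrics of Section III.A could be derived once packets have been captured, the following is a minimal sketch. It assumes the capture has already been parsed into simple records; the `Packet` record, its field names and the `network_metrics` function are hypothetical and are not part of TCPdump or TCPstat. The propagation strategy and the IP generation strategy are left out, since they require more context than a flat packet list.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Packet:
    """One captured packet (hypothetical record layout)."""
    timestamp: float  # seconds since the start of the capture
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str     # "T" for TCP, "U" for UDP
    length: int       # bytes


def network_metrics(packets, source_ip):
    """Summarise the packets sent by one source host."""
    sent = sorted((p for p in packets if p.src_ip == source_ip),
                  key=lambda p: p.timestamp)
    if not sent:
        raise ValueError("no packet from this source")
    times = [p.timestamp for p in sent]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "source_ip": source_ip,
        "ports": sorted({(p.protocol, p.src_port, p.dst_port) for p in sent}),
        "packet_count": len(sent),
        "max_packet_length": max(p.length for p in sent),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
        # Address dispersion, counted as the number of distinct destinations.
        "address_dispersion": len({p.dst_ip for p in sent}),
    }


# Documentation-range addresses, made-up values.
capture = [
    Packet(0.0, "192.0.2.5", 4000, "198.51.100.1", 1434, "U", 60),
    Packet(0.5, "192.0.2.5", 4000, "198.51.100.2", 1434, "U", 404),
    Packet(2.0, "192.0.2.5", 4000, "198.51.100.3", 1434, "U", 404),
    Packet(1.0, "192.0.2.9", 5000, "198.51.100.1", 80, "T", 1500),
]
metrics = network_metrics(capture, "192.0.2.5")
print(metrics)
```

Keeping the result as a plain dictionary is only a convenience for the sketch; any structure that maps onto the network part of the signature would do.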

A. Collector

In the context of this work, we do not define a new kind of collector but use an existing one, named Nepenthes [13]. The aim of this tool is to open vulnerable ports and to analyze the received shellcodes in order to extract the URLs from which the worms can be downloaded.

B. Emulator

As for the collector, we do not design a new emulator but use an existing one, named Argos [14], which is a Linux tool that extracts worms from the network traffic.

C. Analyzer

The aim of the analyzer is to extract the system metrics characterizing the collected worms. To do that, the analyzer starts a virtual system corresponding to the operating system required for the execution of a specific worm. Then, it executes the worm and monitors its system activities.
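One of the monitored quantities is the CPU (or memory) profile, which the signature format of Section IV.D records as a sequence of pairs (usage around ten percent, duration in seconds). A minimal sketch of how raw samples could be condensed into such a profile is given below. It assumes that "around ten" means rounding to the nearest multiple of ten and that samples are taken at a fixed interval; the function name and its parameters are hypothetical.

```python
from itertools import groupby


def usage_profile(samples, interval=1.0):
    """Condense usage percentages into (level, duration in seconds) pairs.

    `samples` holds one usage percentage per `interval` seconds. Each value
    is rounded to the nearest multiple of ten (assumed meaning of "around
    ten"), then consecutive equal levels are merged into a single pair.
    """
    levels = [min(100, int((s + 5) // 10) * 10) for s in samples]
    return [(level, len(list(run)) * interval)
            for level, run in groupby(levels)]


# Made-up CPU samples, one per second.
cpu_samples = [3, 7, 12, 48, 52, 51, 9]
profile = usage_profile(cpu_samples)
print(profile)  # [(0, 1.0), (10, 2.0), (50, 3.0), (10, 1.0)]
```

Merging consecutive samples keeps the profile short while preserving the order of the phases, which is what makes two executions of the same worm comparable.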

TABLE II. A FEW CHARACTERISTICS OF HIGH-SPEED WORMS [15][16]

                  CodeRed II   Theoretical fastest worm   Slammer Slapper   Theoretical fastest worm
Spread duration   14 h         3.3 s                      10 min            1.2 s
Length            4 KB         0.5 KB                     0.4 KB            0.4 KB
Protocol          TCP (latency-limited)                   UDP (bandwidth-limited)

D. Generator

The generator gathers data from the analyzer, the emulator (i.e. Argos) and the collector (i.e. Nepenthes) and generates a behaviour-based signature according to the following format, composed of two parts:

Net(Source(IP, country),
    Ports({({T|U}Source; {T|U}Destination)}i),
    Packets(Number, length of biggest packet),
    Time(Total duration, IAT),
    Space(propagation strategy, address dispersion))

Sys(Calls({system}i, {library}i),
    Comm({Internal ports}i),
    Devices({location access}i),
    CPU({%around ten use, duration in seconds}i),
    Mem({%around ten use, duration in seconds}i),
    Length({length}i),
    Location({absolute path location}),
    address dispersion,
    background OS name)

with T=TCP, U=UDP, IAT=Inter-Arrival Time.

The generated signatures are then registered in a database in order to be used by a misuse detection system.

V. DISCUSSIONS

A. Time consideration

By using high-speed networks, worms can spread to all vulnerable computers within a very short period of time (see Tab.II). In parallel, servers have more and more resources, which means that a worm can perform more actions in the same amount of time. So the entities that extract network and system metrics (emulator, collector, virtual machine, generator) have to be efficient enough (in particular, they need adequate hardware) to respect the time constraint imposed by these very fast spreads. However, as the duration of a worm's execution in a virtual machine is unknown, this model does not guarantee that a behaviour-based worm signature is produced before the end of the worm's spread.

B. Multimode and dual mode worms

Classic multimode worms search for several security holes at once, whereas dual mode worms start out searching for a first hole until they decide that it has been completely exploited, and then switch to exploiting a second hole [17]. Such worms have several system behaviour-based signatures because they exploit several security holes. To identify them, we have to form groups (or clusters) that gather worms having the same network behaviour-based signature (which is more discriminatory). Within a cluster, worms that have completely different system behaviour-based signatures are either dual mode worms (if they have two system behaviour-based signatures) or multimode worms (if they have more than two system behaviour-based signatures). If some system behaviour similarities are found between worms inside the same group, then they belong to the same family of worms because they exploit common vulnerabilities (see Fig.3).

Figure 3. The Slapper worm family [18]
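The grouping described in Section V.B can be sketched as follows. This is only an illustration under stated assumptions: the `Signature` record keeps a hashable summary of the Net part and the system call sequence of the Sys part, the similarity measure (Jaccard index over the sets of system calls) and the 0.5 threshold are arbitrary choices made for the example, and all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Reduced behaviour-based signature (hypothetical layout)."""
    worm_id: str
    net: tuple       # hashable summary of the Net(...) part
    syscalls: tuple  # system call sequence from the Sys(...) part


def similarity(a, b):
    """Jaccard index of two system call sequences, seen as sets (assumed)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def classify(signatures, threshold=0.5):
    """Cluster by network signature, then count distinct system behaviours."""
    clusters = defaultdict(list)
    for sig in signatures:
        clusters[sig.net].append(sig)

    report = {}
    for net, members in clusters.items():
        families = []  # each family groups similar system behaviours
        for sig in members:
            for family in families:
                if similarity(sig.syscalls, family[0].syscalls) >= threshold:
                    family.append(sig)
                    break
            else:
                families.append([sig])
        if len(families) == 1:
            label = "same family"
        elif len(families) == 2:
            label = "dual mode"
        else:
            label = "multimode"
        report[net] = (label, [[s.worm_id for s in f] for f in families])
    return report


# Made-up signatures for illustration only.
observed = [
    Signature("w1", ("U", 1434, "push"), ("socket", "sendto")),
    Signature("w2", ("U", 1434, "push"), ("socket", "sendto", "close")),
    Signature("w3", ("T", 80, "pull"), ("open", "read", "execve")),
    Signature("w4", ("T", 80, "pull"), ("socket", "bind", "listen")),
]
report = classify(observed)
print(report)
```

The greedy comparison against the first member of each family is the simplest possible choice; a real classifier would need a similarity measure that also accounts for the order of the calls and for the other elements of the system signature.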

VI. CONCLUSION

In this paper, we have proposed a model to generate behaviour-based signatures in order to detect worms that cannot be detected easily by content-based signatures. As future work, we plan to define classification clusters using the generated behaviour-based signatures in order to recognize new worms, multimode worms, dual mode worms and worm families.

REFERENCES

[1] Wikipedia, http://en.wikipedia.org/wiki/Computer_virus
[2] P. Szor. The Art of Computer Virus Research and Defense, 2005.
[3] H.-A. Kim and B. Karp. Autograph: Toward Automated, Distributed Worm Signature Detection. In Proc. of the USENIX Security Symposium, 2004.
[4] G. Portokalidis and H. Bos. SweetBait: Zero-Hour Worm Detection and Containment Using Honeypots. Elsevier Journal on Computer Networks, Special Issue on Security through Self-Protecting and Self-Healing Systems, 2005.
[5] C. Kreibich and J. Crowcroft. Honeycomb - Creating Intrusion Detection Signatures Using Honeypots. ACM SIGCOMM Computer Communication Review 34 (2004) 51-56.
[6] S. Singh, C. Estan, G. Varghese and S. Savage. Automated Worm Fingerprinting. In Proc. of OSDI, 2004.
[7] V. Yegneswaran, J. Giffin, P. Barford and S. Jha. An Architecture for Generating Semantic-Aware Signatures. In USENIX Security Symposium, 2005.
[8] Y. Tang and S. Chen. Defending Against Internet Worms: A Signature-Based Approach. In Proc. of Infocom, 2003.
[9] K. Wang, G. Cretu and S. J. Stolfo. Anomalous Payload-based Worm Detection and Signature Generation. In Symposium on Recent Advances in Intrusion Detection, 2005.
[10] J. Newsome, B. Karp and D. Song. Polygraph: Automatically Generating Signatures for Polymorphic Worms. In IEEE Security and Privacy Symposium, 2005.
[11] Z. Li, M. Sanghi, Y. Chen, M.-Y. Kao and B. Chavez. Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience. In IEEE Symposium on Security and Privacy, 2006.
[12] B. Netherton. DTrace. Solaris 10 Workshop, 2005.
[13] http://nepenthes.mwcollect.org
[14] http://www.few.vu.nl/argos
[15] S. Staniford, V. Paxson and N. Weaver. How to Own the Internet in Your Spare Time. In Proc. of the 11th USENIX Security Symposium, pp. 149-167, USENIX Association, 2002.
[16] S. Staniford, D. Moore, V. Paxson and N. Weaver. The Top Speed of Flash Worms. In Proc. of RAID, 2004.
[17] N. Weaver. Potential Strategies for High Speed Active Worms: A Worst Case Analysis, 2002. http://www.icsi.berkeley.edu/~nweaver/worms.pdf
[18] J. Nazario. Defense and Detection Strategies against Internet Worms. 2005.
