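Master's Degree Thesis
Detecting Botnets Based on Virtual Low-Interaction Honeypots
Major: Computer Architecture
Graduate student: Ahmad Jakalan (阿汉)
Supervisor: GONG Jian (龚俭)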

Virtually Distributed Low-Interaction Honeypot To Detect the Existence of Botnet

A Thesis Submitted to

Southeast University For the Academic Degree of Master of Engineering

BY Ahmad Jakalan

Supervised by Professor GONG Jian

School of Computer Science and Engineering

Southeast University May 2011

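Southeast University Thesis Declaration of Originality

I declare that the thesis submitted is the research work I carried out, and the research results I obtained, under the guidance of my supervisor. To the best of my knowledge, except where specifically marked and acknowledged in the text, the thesis contains no research results already published or written by others, nor any material used to obtain a degree or certificate from Southeast University or any other educational institution. Any contribution made to this research by colleagues who worked with me has been clearly stated in the thesis, with thanks expressed.

Graduate student's signature:

Date:

Southeast University Thesis Use Authorization Statement

Southeast University, the Institute of Scientific and Technical Information of China, and the National Library have the right to retain copies and electronic versions of the thesis I have submitted, and may preserve the thesis by photocopying, reduced-size printing, or other means of reproduction. The content of my electronic document is consistent with that of the paper thesis. Except for confidential theses within their confidentiality period, the thesis may be consulted and borrowed, and its full content, or partial content such as the Chinese and English abstracts, may be published (including in electronic form). The Southeast University Graduate School is authorized to handle the publication of the thesis (including in electronic form).

Graduate student's signature:

Supervisor's signature:

Date: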

Southeast University Master Degree Thesis

Abstract

Malware detection has become more difficult with the use of compression, polymorphic methods, and techniques that detect and disable security software. These and other obfuscation techniques pose a problem for malware detection and classification schemes, especially for botnets, which are the most complicated form of Internet crime. The objective of this research is to detect the existence of botnets in the monitored network by designing and deploying a distributed low-interaction honeypot, and to provide clues from the detection for threat evaluation through botnet propagation estimation. A distributed framework of Nepenthes honeypots is built to collect as many as possible of the malware samples spreading on the network. The configuration of Nepenthes is optimized to improve capture efficiency. The collected malware samples are analyzed first by features, via antivirus scans, and then by behavior, via two online sandboxes. All data in the analysis reports is extracted and stored in a database in order to evaluate the threats on the monitored network and to study the propagation pattern of the analyzed malware.

Keywords: Network security, Botnet detection, Honeypot, Nepenthes, Sandbox.


Table of Contents

Abstract
Table of Contents
Chapter 1 Introduction
  1.1 Motivation
  1.2 Research Objective
  1.3 Research Content
    1.3.1 Effective Collection of Malware Samples
    1.3.2 Accurate Analysis of the Malware Samples
  1.4 Structure of the Thesis
Chapter 2 Botnet and Botnet Detection
  2.1 Introduction to Botnets
  2.2 Code Obfuscation Used in Botnet Creation
  2.3 Polymorphism and Metamorphism
  2.4 Botnet Detection and Destruction Efforts
  2.5 Honeypots
  2.6 Nepenthes
    2.6.1 Nepenthes Definition
    2.6.2 Architecture of the Nepenthes Platform
    2.6.3 Uses of Nepenthes
    2.6.4 Malware Collection in Nepenthes
    2.6.5 Nepenthes Log Files
    2.6.6 Nepenthes Shortcomings
    2.6.7 Dionaea
  2.7 Malware Analysis
    2.7.1 Static and Dynamic Analysis
    2.7.2 Virtual Environment
    2.7.3 Sandbox Analysis
  2.8 IRC-Based Measurement and Detection
  2.9 Summary
Chapter 3 Effective Collection of Malware Samples
  3.1 Overall System Architecture
  3.2 Nepenthes Installation
  3.3 Honeypot Configuration
  3.4 Building the Malware Database Server
    3.4.1 Nepenthes Database
    3.4.2 Malware Server Administration Interface
  3.5 Distributed Malware Collection
  3.6 Statistics
  3.7 Effectiveness of IP Addresses
  3.8 Effectiveness of Adding More Honeypot Nodes
  3.9 The Relationship Between Sources and Collected Malware Samples
  3.10 Nepenthes Honeypot Detection
  3.11 Summary
Chapter 4 Accurate Analysis of the Malware Samples
  4.1 Antivirus Scan
    4.1.1 Submit the Malware to Online Antivirus Scan
    4.1.2 Installed Antivirus Software on Linux
    4.1.3 Installing Antivirus on a Separated Windows OS
  4.2 Sandbox Analysis
    4.2.1 ANUBIS
    4.2.2 JOEBOX
  4.3 Summary
Chapter 5 Results Achieved
  5.1 DNS
  5.2 IRC
  5.3 HTTP Activity
  5.4 IRC & HTTP Together
  5.5 Effectiveness of Multiple Behavior Analysis in Different Time
  5.6 Summary
Chapter 6 Conclusion
Appendix I: The Most Active Malware Detailed Description
  1. MD5: 6D1F2E8419280D026828DB77DB958E9A
  2. MD5: 7D99B0E9108065AD5700A899A1FE3441
References
Acknowledgements
About the Author


Table of Figures

Figure 1 Botnet Mechanism
Figure 2 Architecture of the Nepenthes Platform
Figure 3 Nepenthes honeypot malware collection steps
Figure 4 Distributed low-interaction honeypot system to detect botnet
Figure 5 The total number of the (unique) collected malware binaries
Figure 6 The total number of the successful download of malware binaries
Figure 7 Distribution of submissions and malware collections on the honeypot sensors

Table of Tables

Table 1 Nepenthes running status
Table 2 Top-ten malware submissions
Table 3 Malware samples collected by the majority of sensors
Table 4 Malware samples collected only by the 2nd honeypot
Table 5 Online antivirus scan by email submission
Table 6 Kaspersky scan report sample
Table 7 Classification of the collected malware samples depending on Kaspersky antivirus scan
Table 8 Joebox html analysis report
Table 9 Overall DNS traffic of all analyzed samples
Table 10 Example of IRC botnet
Table 11 IRC botnet with IRC server IP
Table 12 IRC server with all botnets connect with it
Table 13 HTTP servers IP with botnets that connect with


Chapter 1 – Introduction

Chapter 1 Introduction

1.1 Motivation

Broadband Internet connections are becoming very common even for home users, who normally have little or no awareness of Internet security and are therefore the most desirable target for most Internet attacks by different types of malware. Malware[1] is software, or a piece of software, that serves malicious purposes. It is often used to infect the computers of unsuspecting victims by exploiting software vulnerabilities or by tricking users into running malicious code. Malware can compromise a system in many ways: by attacking operating system vulnerabilities or remote services over the Internet, or by deceiving the user into executing it, for example by clicking on a fake link or opening an e-mail attachment.

Malware is classified in different ways depending on its propagation method, activity, and goals. Botnets, which most security researchers regard as the most "evil-minded" class, are now the greatest challenge for network security researchers. The term botnet denotes a network of infected end-hosts, called bots, that are under the control of a human operator commonly known as a botmaster or bot-herder. Such malware is a constant threat to the integrity of individual computers on the Internet; for example, botnets can bring down almost any server through distributed denial of service, and the combined power of many compromised machines is a constant danger even to uninfected sites. Botmasters use bots for a variety of attacks, for example carrying out Distributed Denial-of-Service (DDoS) attacks or sending out millions of spam or phishing e-mails. Attacks against infected hosts often hurt their performance and may include capturing private information or credentials for identity theft. The use of thousands of zombie machines to launch distributed denial-of-service attacks against enterprise and government Internet resources is becoming a dangerously common trend[2].

The time when hackers mainly tried to demonstrate their technical prowess to others has passed; instead, botnets are now predominantly used for illegal activities. Every compromised machine, or bot, can establish a connection to a remote control network through which the attacker can issue arbitrary commands. Typical examples of these remote control networks are IRC networks, HTTP servers, and P2P networks.


Malware detection has become difficult with the use of compression, polymorphic methods and techniques to detect and disable security software. Those and other obfuscation techniques pose a problem for detection and classification schemes that analyze malware behavior.

1.2 Research Objective

The objective of the research in this thesis is to use a distributed low-interaction honeypot to detect the existence of botnets in the monitored network, and to provide clues for threat evaluation through botnet propagation estimation.

1.3 Research Content

The research content of this thesis includes the following points:

1.3.1 Effective Collection of Malware Samples

The goal of the malware collection phase is to collect as many malware binaries as possible. However, developing a scalable and robust infrastructure to achieve this goal is a challenging problem in its own right, and has been the subject of numerous research initiatives. Because the main purpose of a low-interaction honeypot is to collect malware samples, this thesis studies how different modifications to the system help in collecting samples effectively. These modifications consist of adding more virtual and physical sensors to the honeypot: virtual sensors are multiple IP addresses on the same machine, and physical sensors are the distributed honeypots that together form a honeypot farm. Adding sensors increases the probability that one of the monitored IP addresses is scanned by an infected machine trying to propagate the malware it hosts.
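The effect of adding sensors can be estimated with a simple model. The sketch below assumes, purely for illustration and not as a measurement from this thesis, that an infected host picks every scan target uniformly at random from an address space of `space_size` addresses. The function name and all parameter values are hypothetical.

```python
def hit_probability(monitored: int, scans: int, space_size: int = 2**32) -> float:
    """Probability that at least one of `scans` uniformly random probes
    lands on one of `monitored` honeypot addresses (assumed model)."""
    if not 0 <= monitored <= space_size:
        raise ValueError("monitored must be between 0 and space_size")
    miss_one = 1.0 - monitored / space_size   # a single probe misses every sensor
    return 1.0 - miss_one ** scans            # at least one probe hits a sensor


if __name__ == "__main__":
    # Compare one address with 16 and 256 monitored addresses.
    for sensors in (1, 16, 256):
        p = hit_probability(sensors, scans=1_000_000)
        print(f"{sensors:>4} addresses -> P(hit) = {p:.6f}")
```

Under this assumed model the probability grows with the number of monitored addresses, which is the intuition behind adding both virtual and physical sensors. Real scanning strategies are often not uniform, so the numbers should be read as rough orientation only.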

1.3.2 Accurate Analysis of the Malware Samples

Very little is known about the behavior of malicious botnets and their distributed computing platforms. Botnet prevalence on the Internet is mostly a mystery, and the botnet life cycle has yet to be modeled. Our objective is to better understand the behavior of botnets, in order to prevent them from spreading in our networks and to prevent existing botnets from reaching their targets. This requires knowing the characteristics of botnets and how they spread among network hosts.


Available antivirus software (Kaspersky, AVG …) helps in classifying the collected malware samples into categories. The antivirus scan report for a sample provides the given name or alias of the scanned malware; since this alone is not enough, we can refer to the description of that malware type on the antivirus vendor's website. To understand the behavior of the selected samples we use available sandboxes such as Anubis and JoeBox, which provide free online behavior analysis of submitted malware samples. The behavior analysis reports let us understand the activity of the malware. Based on the reports from these sandbox systems, we classify the analyzed botnets into categories according to their communication protocol, for example IRC, HTTP, DNS, and P2P. This classification will be helpful in botnet propagation modeling. Accurate analysis comes from analyzing the collected malware samples multiple times in different periods, and from extracting the network behavior of each sample from analysis reports from different sources.
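The protocol-based classification described above could be sketched as follows. The report layout (a list of connection records with `protocol` and `port` keys), the default-port heuristic, and all names are assumptions made for illustration; they are not the actual Anubis or JoeBox report format.

```python
from typing import Iterable, Mapping, Set

# Assumed heuristic: well-known default ports used to guess the protocol
# when a connection record does not name it explicitly.
DEFAULT_PORTS = {6667: "IRC", 80: "HTTP", 53: "DNS"}


def classify_sample(connections: Iterable[Mapping[str, object]]) -> Set[str]:
    """Return the set of communication protocols observed for one sample."""
    seen: Set[str] = set()
    for conn in connections:
        proto = conn.get("protocol")
        if isinstance(proto, str) and proto:
            seen.add(proto.upper())               # protocol named in the record
        else:
            port = conn.get("port")
            if port in DEFAULT_PORTS:
                seen.add(DEFAULT_PORTS[port])     # fall back to the port guess
    return seen or {"UNKNOWN"}


if __name__ == "__main__":
    report = [
        {"protocol": "dns", "port": 53},
        {"port": 6667},   # protocol missing, inferred from the port
    ]
    print(sorted(classify_sample(report)))
```

A sample can fall into several categories at once, which is why the function returns a set rather than a single label. Port-based guessing is only a fallback, since C&C servers frequently listen on non-default ports.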

1.4 Structure of the Thesis

The remaining chapters of this thesis are organized as follows:

Chapter 2 gives a brief definition of botnets and introduces the general architecture of botnets, the network protocols used in botnet communications, botnet mechanisms, the techniques used by botnet creators to avoid detection, and the efforts made by security researchers to detect botnets. It also describes the tools used for botnet detection, especially the low-interaction honeypot Nepenthes, which is used in this research to detect the existence of botnets in the monitored network.

Chapter 3 introduces the proposed infrastructure of the distributed low-interaction honeypot system and the honeypot configuration used to collect as many malware samples as possible by adding more honeypot sensors. It then describes the malware database server, the database tables, the administration web interface, and statistics related to malware collection, and uses these statistics to assess the effectiveness of the proposed system.

Chapter 4 introduces the analysis of the collected malware samples and the ways in which the malware was scanned by antivirus software. It presents the classification of the collected samples based on Kaspersky antivirus scan results, then discusses the behavior analysis of the samples using two online sandboxes (Anubis and JoeBox): how samples are submitted to a sandbox, how the analysis report is received, and how the relevant data in the report is extracted.

Chapter 5 presents the most important results indicating the existence of botnets in the collected malware samples. It gives a comprehensive overview of the overall analysis results, together with detailed information from some selected analysis reports.



Chapter 6 is the final conclusion of this thesis.


Chapter 2 – Botnet and Botnet Detection

Chapter 2 Botnet and Botnet Detection

2.1 Introduction to Botnets

A bot, short for software robot, can be used for both useful and malicious purposes. The malicious kind can be described as a type of malware that allows an attacker to remotely control the infected computer without the owner's knowledge. The use of thousands of zombie machines to launch distributed denial-of-service attacks against enterprise and government Internet resources is becoming a dangerously common trend. A bot can also use methods that characterize other types of malware; like a worm, it can infect other hosts without manual intervention. While other classes of malware were mostly used to demonstrate technical prowess among hackers, botnets are predominantly used for illegal activities. These activities range from extortion of Internet businesses to e-mail spamming, identity theft, and software piracy.

Unfortunately, even with the substantial increase in botnet activity witnessed over the past few years, still not enough is known about the specifics of this malicious behavior: for instance, the prevalence of botnet activity, the number of different botnet subspecies (and how they can be behaviorally categorized), and the evolution of a botnet over its lifetime.

The main characteristic of a bot, however, is its use of Command and Control (C&C) channels. These give an attacker the ability to issue commands to the bot, which in turn carries them out on the infected computer. A computer is usually infected by a bot through malicious code, unpatched vulnerabilities in the operating system, backdoors left by other Trojan worms or Remote Access Trojans, or password guessing and brute-force attacks. Once infected, the victim typically executes a script (known as shellcode[*]) that fetches the image of the actual bot binary from a specified location. Upon completion of the download, the bot binary installs itself on the target machine so that it starts automatically each time the victim is rebooted.

* Shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine. Shellcode is commonly written in machine code, but any piece of code that performs a similar task can be called shellcode. Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient. However, attempts at replacing the term have not gained wide acceptance.


After the computer has been infected, the bot can perform a number of tasks: disable the antivirus software, use rootkits to hide from users, and download additional malicious applications. Most importantly, it connects to a command and control center to notify the attacker that a new computer is infected and ready to serve. From then on, the bot can connect to the bot master's network whenever the computer is running and connected to the Internet. In this way, the bot master has complete control over the infected computer and can use it for different kinds of services: recruiting new computers to the botnet, performing a Distributed Denial of Service (DDoS) attack against a server, installing adware and Click4Hire, distributing spam, performing phishing attacks, storing illegal content, data mining, etc. The owner of the infected computer can use it the same way as before the infection, although there may be signs of infection, such as the computer slowing down or suddenly shutting down for no apparent reason. Nonetheless, bots usually hide very well and can masquerade as system processes, which makes them difficult for the owner to discover.

A botnet is a network of compromised machines that can be remotely controlled by an attacker. It comprises many interconnected bots, sometimes up to hundreds of thousands. The bot master uses C&C channels to communicate with these bots; the IRC protocol is most often used for this purpose. After a bot has announced its presence to the master, it lies dormant awaiting further instructions. The master can issue commands that all the bots in the botnet will receive. These commands can be of many kinds, including a change of C&C server to avoid detection by botnet hunters.

The following figure, taken from reference [11], is the clearest descriptive diagram of the botnet mechanism. It shows how botnet members first scan the network to find a vulnerable host and exploit it; the bot code is then downloaded and executed, after which the new host becomes a member of the botnet and starts communicating with the Command and Control server:


Figure 1 Botnet Mechanism

The defining characteristic of botnets is that individual bots are controlled via commands sent by the network's bot-master. The communication channel used to issue commands can be implemented using a variety of protocols (e.g., HTTP, P2P, etc.). However, the majority of botnets today use the Internet Relay Chat (IRC) protocol. The IRC protocol was specifically designed to allow for several forms of communication (point-to-point, point-to-multipoint, etc.) and data dissemination among a large number of end-hosts. The inherent flexibility of this protocol, as well as the availability of several open-source implementations, enables third parties to extend it in ways that suit their needs. These features make IRC the protocol of choice for botmasters, as it simplifies the botnet implementation and provides a high degree of control over the bots.

The key benefit of botnet tracking for researchers is the direct observation of malicious activity. These observations give a researcher insight into the people behind a botnet's creation and their motivations. All of these features can be either inferred from captured malware or measured by honeypot technologies. Active tracking of a botnet by participating in it can yield a partial view of the botnet's activities.
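To illustrate what a researcher observing an IRC-based C&C channel has to process, the following minimal sketch splits one IRC line into its prefix, command, and parameters, following the general message layout of the IRC protocol (optional `:prefix`, a command, space-separated parameters, and an optional trailing parameter introduced by ` :`). The class name, the function name, and the sample line are hypothetical and are not part of the tooling used in this thesis.

```python
from typing import List, NamedTuple, Optional


class IrcMessage(NamedTuple):
    prefix: Optional[str]   # sender, e.g. "nick!user@host", or None
    command: str            # e.g. "PRIVMSG", "JOIN", "PING"
    params: List[str]       # middle parameters plus the trailing one, if any


def parse_irc_line(line: str) -> IrcMessage:
    """Split one IRC protocol line into prefix, command and parameters."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # The prefix runs up to the first space.
        prefix, _, line = line[1:].partition(" ")
    # Everything after the first " :" is a single trailing parameter.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if not parts:
        raise ValueError("empty IRC message")
    params = parts[1:]
    if sep:
        params.append(trailing)
    return IrcMessage(prefix, parts[0].upper(), params)


if __name__ == "__main__":
    sample = ":nick!user@host PRIVMSG #channel :example command\r\n"
    print(parse_irc_line(sample))
```

In a tracking setup, a message whose command is `PRIVMSG` or `TOPIC` and whose target is a channel would be a natural place to look for instructions sent to all bots at once; that interpretation is an assumption and depends on the botnet being observed.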

2.2 Code Obfuscation Used in Botnet Creation

To make their bots difficult to detect, botnet creators rely on different techniques of code obfuscation. Obfuscated code[54] is source or machine code that has been made difficult for humans to understand. Programmers may deliberately obfuscate code to conceal its purpose (security through obscurity) or its logic, to prevent tampering, to deter reverse engineering, or as a puzzle or recreational challenge for someone reading the source code. Programs known as obfuscators transform readable code into obfuscated code using various techniques. Code obfuscation is different in essence from hardware obfuscation, where the description and/or structure of a circuit is modified to hide its functionality.

Malware authors tend to protect malware code using obfuscation[55]. This technique encrypts the program binaries, which produces an encrypted assembly language representation and protects the program from being reversed by a malware analyst. Obfuscation can also help malware bypass anti-virus and intrusion detection programs built upon signature-based detection. Different obfuscation tools produce different encrypted code, so it is difficult to derive a unique signature (code pattern) that an anti-virus program could use to identify the malware.

Moreover, obfuscation is used to protect malware from other hackers. It is easier for a hacker to hijack another hacker's malware that has already been installed and has passed security checks than to create new malware. This can be accomplished using a Trojan that modifies the binaries of the installed malware to redirect the transmission of stolen information to the second hacker's remote machine, or to add the hijacked malware to the second hacker's botnet army.

It is common to apply several obfuscation mechanisms to the same malware to gain extra protection. For example, packers and protectors (or cryptors) can be applied to the same malware, creating layers of protection. Packers obfuscate compiled binary programs by compressing their code and data sections. The compressed bytes are not valid machine code and disassemble into meaningless assembly code. However, the program can still execute on the target machine because it is uncompressed at runtime. A packer embeds an unpacking stub inside the packed program; the stub unpacks the program in memory before the original code runs. This is accomplished by modifying the program entry point (EP) to point to the unpacking stub. When the packed program executes, the operating system reads the new entry point and starts execution at the unpacking stub, which restores the original program binary in memory.

Cryptors, or protectors, serve the same purpose by applying an encryption algorithm to an executable file, scrambling the target program's internals so that the assembly language representing the machine code is meaningless. Technically, packers and cryptors work in the same way. To decrypt the program at runtime, a cryptor embeds a decryption stub inside the program; the stub runs first when the program executes (the program EP is modified to point to it) and restores the original program contents in memory.
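The effect of packing on signature-based detection can be shown with a harmless toy example. In the sketch below, `zlib` compression stands in for a packer and decompression stands in for the unpacking stub; the byte pattern, the function names, and the "program" bytes are all hypothetical placeholders, and no real executable format or entry-point handling is modeled.

```python
import zlib

SIGNATURE = b"EXAMPLE-SIGNATURE"   # hypothetical byte pattern a scanner looks for


def scan(data: bytes) -> bool:
    """Naive signature scan: report whether the pattern occurs in the data."""
    return SIGNATURE in data


def pack(program: bytes) -> bytes:
    """Stand-in for a packer: compress the program body."""
    return zlib.compress(program)


def unpack(packed: bytes) -> bytes:
    """Stand-in for the unpacking stub: restore the original bytes."""
    return zlib.decompress(packed)


if __name__ == "__main__":
    original = b"header " + SIGNATURE + b" body " * 50
    packed = pack(original)
    print("original matches:", scan(original))
    print("packed matches:  ", scan(packed))
    print("unpacked matches:", scan(unpack(packed)))
```

The pattern is present in the original bytes and again after unpacking, but not in the packed form, which is why a scanner that only inspects the file on disk can miss packed malware while one that inspects the restored contents in memory does not.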

2.3 Polymorphism and Metamorphism

Polymorphism[16] is a technique that thwarts signature-based identification programs by randomly encoding or encrypting the program code in a way that maintains its original functionality. Most anti-virus software and intrusion detection systems (IDS) attempt to locate malicious code by searching through computer files and data packets sent over a computer network. If the security software finds patterns that correspond to known computer viruses or worms, it takes appropriate steps to neutralize the threat. Polymorphic algorithms make it difficult for such software to recognize the offending code because the code constantly mutates.

Malicious programmers have sought to protect their encrypted code from this virus-scanning strategy by rewriting the unencrypted decryption engine (and the resulting encrypted payload) each time the virus or worm is propagated. Anti-virus software uses sophisticated pattern analysis to find underlying patterns within the different mutations of the decryption engine, in the hope of reliably detecting such malware.

Emulation may be used to defeat polymorphic obfuscation by letting the malware demangle itself in a virtual environment before applying other methods, such as traditional signature scanning. Such a virtual environment is sometimes called a sandbox. Polymorphism does not protect the virus against such emulation if the decrypted payload remains the same regardless of variation in the decryption algorithm. Metamorphic code techniques may be used to complicate detection further, as the virus may execute without ever having identifiable code blocks in memory that remain constant from infection to infection.

Polymorphism differs from cryptors in that it tends to generate random but valid machine code at runtime. A signature-based identification program works by defining a unique pattern for each malicious program based on its machine code. Randomizing the registers used, for example, generates different machine code for the same functionality. Using polymorphism, malware authors can therefore generate different machine code for different copies of the same malware, which can be effective in bypassing some detection programs. However, advanced detection programs use a variety of methods to identify polymorphic code by analyzing the code and extracting certain high-level information from it.
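The effect of polymorphic encoding on a naive signature scan, and the way emulation restores detectability, can be illustrated with a toy example. The sketch below is only an illustration under simplifying assumptions: a single-byte XOR stands in for the encryption engine, the "emulation" is simply the decoding step, and the names SIGNATURE, make_variant and emulate_then_scan are hypothetical.

```python
SIGNATURE = b"EVIL_PAYLOAD_MARKER"  # hypothetical byte pattern a scanner looks for


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)


def make_variant(payload: bytes, key: int) -> bytes:
    """Return an encoded copy: one key byte followed by the XOR-encoded payload."""
    return bytes([key]) + xor_bytes(payload, key)


def signature_scan(blob: bytes) -> bool:
    """Naive static scan: look for the signature in the raw bytes."""
    return SIGNATURE in blob


def emulate_then_scan(blob: bytes) -> bool:
    """Stand-in for emulation: undo the encoding first, then scan the result."""
    key, body = blob[0], blob[1:]
    return signature_scan(xor_bytes(body, key))


payload = b"header " + SIGNATURE + b" trailer"
# Three copies of the same payload, each encoded with a different key.
variants = [make_variant(payload, key) for key in (0x21, 0x5A, 0xC3)]
```

Each variant has different bytes on disk, so the raw scan misses all of them, while scanning after the decoding step finds the unchanged payload in every copy. This is the reason a constant decrypted payload remains detectable.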


Metamorphism is a powerful technique to bypass most (if not all) anti-virus programs by altering the entire program each time it is replicated. Metamorphism can be applied by embedding a sophisticated code-generation engine that analyzes the malware and generates different code for every copy. This engine can perform a variety of alterations on the malicious program. For example, besides randomizing the register selection, the engine can insert garbage code at random positions, change the order of instructions that are independent of each other, and replace an instruction with another of equivalent functionality.

Both polymorphism and metamorphism can be categorized as malware-specific and can be considered anti-anti-virus techniques. They allow malware writers to create flexible malware that is more difficult to locate and identify. Fortunately for malware researchers and anti-virus developers, developing polymorphic and metamorphic engines is difficult and requires considerable effort and time.
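Two of the alterations mentioned above, register randomization and garbage insertion, can be sketched on a made-up register machine. The instruction set, the interpreter and the function names below are all hypothetical and serve only to show that the code changes while the computed result does not; a real engine operates on machine code and is far more involved.

```python
import random

# A toy program for a made-up register machine: (opcode, destination, operand).
ORIGINAL = [
    ("mov", "r0", 6),
    ("mov", "r1", 7),
    ("mul", "r0", "r1"),   # r0 = r0 * r1
    ("out", "r0", None),   # report the value held in r0
]


def run(program):
    """Interpret the toy program and return the value reported by 'out'."""
    regs = {}
    for op, dst, src in program:
        value = regs.get(src, 0) if isinstance(src, str) else src
        if op == "mov":
            regs[dst] = value
        elif op == "mul":
            regs[dst] = regs.get(dst, 0) * value
        elif op == "out":
            return regs.get(dst, 0)
        # "nop" (garbage) instructions fall through and change nothing
    return None


def mutate(program, rng):
    """Rename registers consistently and insert 'nop' garbage at random positions."""
    used = sorted({x for _, d, s in program for x in (d, s) if isinstance(x, str)})
    fresh = rng.sample([f"r{i}" for i in range(8)], len(used))
    rename = dict(zip(used, fresh))
    result = []
    for op, dst, src in program:
        if rng.random() < 0.5:
            result.append(("nop", None, None))
        new_src = rename[src] if isinstance(src, str) else src
        result.append((op, rename[dst], new_src))
    return result


rng = random.Random(7)
variants = [mutate(ORIGINAL, rng) for _ in range(20)]
```

Every variant computes the same value as the original program, although the instruction sequences differ, so a pattern defined over the original instruction sequence would no longer match.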

2.4 Botnet Detection and Destruction Efforts

According to the report from the European Network and Information Security Agency[3], deficiencies in international cooperation, national laws and information sharing have allowed botnet controllers to build robust networks, although many researchers, security companies and governments are actively investigating botnets. Existing approaches to measuring the size of botnets commonly lack accuracy, in that the numbers produced are reliable only to a very limited degree. Researchers found that different types of botnets offer different functionality. Furthermore, the possibility of remotely updating malware may extend a botnet's capabilities rapidly, multiplying the threat. For the global botnet threat to be countered effectively, there should be close international cooperation between governments and technically oriented and legislative institutions.

The same report names three high-level objectives for engaging the botnet threat:
a) Mitigation of existing botnets by reducing existing infections, which includes supporting the owners of compromised computers in this task and having Internet Service Providers (ISPs) use their special position as the access channel to the Internet for their customers to support detection efforts.
b) Prevention of new infections by raising public awareness.
c) Minimizing the profitability of botnets and cybercrime by improving anti-fraud mechanisms.


The defense against botnets starts with collecting information about the botnets that are spreading and understanding their behavior and features so that they can be classified. The most important tool used to collect malware is the honeypot.

2.5 Honeypots

The first step in studying malware and its malicious activities is to collect malware samples. Security researchers have invented and deployed several methods and technologies to collect malware, either by setting up a vulnerable system that waits for attacks or by crawling web pages for malicious code stored on servers. The most effective collection method depends first of all on the spreading model of the malware itself. Automated malware means malware that spreads automatically over the network from machine to machine by exploiting known or unknown vulnerabilities. The main tools recommended today for collecting this type of malware in an automated fashion are so-called honeypots.

A honeypot is an information system resource whose value lies in unauthorized or illicit use of that resource; honeypots are the vulnerable systems waiting for attacks. The idea behind this methodology is to lure in attackers, such as automated malware, and then study them in detail. Honeypots have proven to be a very effective tool for learning more about Internet crime such as botnets. The main reason for researching and developing honeypots is to discover new information about the practices and strategies used by malware creators and hackers. Two kinds of information can be gathered by honeypots:

• Types of attack vectors in operating systems and software used for attacks, as well as the actual exploit code that corresponds to them.
• Actions performed on an exploited machine. These can be recorded, while malware loaded onto the system can be preserved for further investigation.

Honeypots may be characterized by the degree of interaction they offer when responding to actions. There are two general types of honeypots[6]:

1) Low-interaction honeypots offer limited services to the attacker. They emulate services or operating systems with a low level of interaction, which varies with the implementation. The risk of deploying this type of honeypot tends to be very low: its main purpose is to capture samples of harmful code, which usually does not require much interaction. Deploying and maintaining low-interaction honeypots tends to be easy. A popular example of this kind of honeypot is nepenthes. With the help of low-interaction honeypots, it is possible to learn more about attack patterns and attacker behavior. One of the benefits of low-interaction honeypots is that exploitation of the apparently vulnerable service does not lead to a compromise of the underlying system. This keeps the maintenance effort low and also offers good scalability, as multiple vulnerabilities can be emulated in parallel.

2) High-interaction honeypots[7] offer the attacker a real system to interact with. The risk of deploying this type of honeypot tends to be higher than that of low-interaction honeypots, so precautions and special provisions are required to prevent attacks against the system. They are normally more complex to set up and maintain. The main purpose of high-interaction honeypots is to understand the attack scenario and the attack process[8], which requires a strong ability to interact with the attacker. The most common setup for this kind of honeypot is a GenII honeynet[9].

2.6 Nepenthes

2.6.1 Nepenthes definition

Nepenthes[10] is a low-interaction honeypot like honeyd or mwcollect. That means it is not a full-blown operating system with live running services. Instead, Nepenthes is designed to run on Linux and emulates known vulnerabilities in the Windows OS that worms use to propagate. The emulated vulnerabilities cannot be used to attack the underlying Linux OS, so Nepenthes requires less maintenance. The worm payloads used to infect Windows machines are downloaded and stored as binary files for later analysis. A downloaded payload can also be sent by e-mail to Norman Sandbox, Anubis sandbox, and CW Sandbox for evaluation. Nepenthes is also scalable, because it can be configured to listen on a large number of IP addresses. Nepenthes is modular: it has modules for DNS resolution, vulnerability emulation, download handling, submission handling, event triggering, shellcode handling, etc.

2.6.2 Architecture of the Nepenthes Platform

Reference[10] shows the flexibility of nepenthes and its modularized design, in which the actual work is carried out by several modules that register themselves in the nepenthes core. The available types of modules are:

• Vulnerability modules emulate the vulnerable parts of network services.
• Shellcode parsing modules analyze the payload received by one of the vulnerability modules. These modules analyze the received shellcode, an assembly language program, and extract information about the propagating malware from it.
• Fetch modules use the information extracted by the shellcode parsing modules to download the malware from a remote location.
• Submission modules take care of the downloaded malware, e.g., by saving the binary to a hard disc, storing it in a database, or sending it to anti-virus vendors.
• Logging modules log information about the emulation process and help in getting an overview of patterns in the collected data.
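The way these module types hand work to one another can be sketched as a small pipeline. The following is a minimal sketch and not the nepenthes implementation (which is written in C++ with its own module interfaces): the function names, the URL pattern and the injected downloader are assumptions made for illustration.

```python
import re

# Hypothetical pattern: a download URL embedded in an otherwise opaque payload.
URL_PATTERN = re.compile(rb"(?:https?|ftp|tftp)://[\x21-\x7e]+")


def vulnerability_module(raw_request: bytes) -> bytes:
    """Pretend the emulated service was exploited and hand the payload onwards."""
    return raw_request


def shellcode_module(payload: bytes):
    """Extract the location of the malware binary from the payload, if any."""
    match = URL_PATTERN.search(payload)
    return match.group(0).decode("ascii") if match else None


def fetch_module(url: str, downloader) -> bytes:
    """Download the binary; 'downloader' is injected so no network is needed here."""
    return downloader(url)


def submission_module(binary: bytes, store: list) -> None:
    """Keep the binary; a list stands in for disk, database or vendor upload."""
    store.append(binary)


def handle_attack(raw_request: bytes, downloader, store: list, log: list) -> bool:
    """Chain the modules and let the log record what happened at each stage."""
    url = shellcode_module(vulnerability_module(raw_request))
    if url is None:
        log.append("unknown shellcode")
        return False
    log.append(f"download attempt {url}")
    binary = fetch_module(url, downloader)
    submission_module(binary, store)
    log.append(f"stored {len(binary)} bytes")
    return True
```

A call such as handle_attack(request, my_downloader, store, log) with a payload that embeds a URL leaves one binary in store and two entries in log. A payload without a recognizable URL only produces an "unknown shellcode" log entry.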

In addition, several further components are important for the functionality and efficiency of the nepenthes platform: shell emulation, a virtual filesystem for each emulated shell, geolocation modules, sniffing modules to learn more about new activity on specified ports, and asynchronous DNS resolution. The following figure, taken from the same reference, shows the architecture of the Nepenthes platform.

Figure 2 Architecture of the Nepenthes Platform

2.6.3 Uses of Nepenthes

Nepenthes is useful for capturing new malware samples that spread by exploiting old vulnerabilities, but it cannot capture samples of malware that exploit new vulnerabilities[11], simply because these vulnerabilities are not emulated yet. At the same time, further vulnerability modules can be added to it. The main focus is to collect the malware binary: download it and store it for further in-depth analysis, so nepenthes is not designed for any human interaction. Nepenthes does not emulate full services for the attacker to interact with, because the main idea is to offer only as much interaction as is needed to exploit a vulnerability. This can be considered one of the limitations of low-interaction honeypots, because it makes it easy for advanced botnets to detect the existence of the honeypot.

2.6.4 Malware Collection in Nepenthes

A Nepenthes honeypot is set up to listen on a number of ports through which the vulnerability modules expect to receive a worm attack. Nepenthes is a passive honeypot, meaning that it does not invite worms to attack the machine; instead it waits until one of its apparently vulnerable open ports is scanned by a worm. The source of infection then sends the shellcode, which triggers the honeypot to download the malware code; at this point the honeypot logs a download attempt for new malware. When the download completes, the honeypot stores the sample as a binary file named with its MD5 hash and logs a successful download. The file is then submitted to the malware database server together with some useful information about the capture.
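Naming each stored binary by its MD5 hash has a useful side effect: the same sample captured many times occupies a single file. A minimal sketch of this storage step follows; the function name store_sample and the use of a temporary directory are assumptions for illustration, not the nepenthes code.

```python
import hashlib
import tempfile
from pathlib import Path


def store_sample(binary: bytes, directory: Path) -> Path:
    """Write a captured binary to 'directory', named by its MD5 digest."""
    path = directory / hashlib.md5(binary).hexdigest()
    if not path.exists():          # identical samples map to the same file
        path.write_bytes(binary)
    return path


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    first = store_sample(b"sample-A", folder)
    again = store_sample(b"sample-A", folder)   # same content captured twice
    other = store_sample(b"sample-B", folder)
    stored_names = sorted(p.name for p in folder.iterdir())
```

After the three captures above, the directory holds two files, because the second capture of the same content resolves to the existing file name.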

Figure 3: Nepenthes honeypot malware collection steps


2.6.5 Nepenthes Log Files

Nepenthes log files are located in the directory /var/log/nepenthes/ and are categorized into:

• successful downloads log
• download attempts log
• nepenthes.log (all nepenthes activities)

Nepenthes logs successful downloads to a file called logged_submissions and download attempts to a file called logged_downloads (both names are set in the configuration file). The difference between the two files is that the former records only successful downloads, while the latter records every download attempt triggered by malware. An entry that appears only in logged_downloads means that Nepenthes detected the shellcode of an infection attempt but, for some reason, the injection of the malware binary did not happen or did not complete (infection without injection). A successful download is logged in logged_submissions and, of course, is first logged in logged_downloads. The structure of these two files is as follows:

1) logged_downloads

    [root@nep-01 nepenthes]# tail -f logged_downloads
    [2011-05-11T22:23:08] *.21.238.38 -> *.112.25.34 http://www.sciencedirect.com/
    [2011-05-12T01:43:58] *.21.71.125 -> *.65.200.106 http://unfictional.com/
    [2011-05-12T09:54:07] *.115.163.137 -> *.112.25.34 http://www.sciencedirect.com/
    [2011-05-12T13:08:10] *.80.10.1 -> *.65.200.216 http:// *.80.10.1/pp/anp.php
    [2011-05-12T17:18:44] *.80.194.29 -> *.65.200.222 http://bsalsa.com/
    [2011-05-12T18:08:17] *.80.10.1 -> *.65.200.205 http:// *.80.10.1/pp/anp.php
    [2011-05-12T23:08:25] *.80.10.1 -> *.65.200.194 http:// *.80.10.1/pp/anp.php
    [2011-05-12T23:50:07] *.109.83.163 -> *.112.25.34 http://www.baidu.com/

Column description:

• Timestamp
• Source of infection
• Nepenthes sensor
• Download link
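Entries with these four columns are straightforward to parse for later storage in a database. The following sketch assumes one entry per line in the layout shown in the capture; the regular expression and the function name parse_download are illustrative assumptions, and the sample lines are taken from the capture.

```python
import re
from collections import Counter
from datetime import datetime

# [timestamp] source -> sensor link   (the link may contain a space after anonymization)
ENTRY = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<source>\S+)\s+->\s+(?P<sensor>\S+)\s+(?P<link>.+)$"
)


def parse_download(line: str):
    """Return the four columns of one logged_downloads entry, or None if malformed."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%Y-%m-%dT%H:%M:%S")
    return record


SAMPLE = """\
[2011-05-11T22:23:08] *.21.238.38 -> *.112.25.34 http://www.sciencedirect.com/
[2011-05-12T13:08:10] *.80.10.1 -> *.65.200.216 http:// *.80.10.1/pp/anp.php
[2011-05-12T18:08:17] *.80.10.1 -> *.65.200.205 http:// *.80.10.1/pp/anp.php
"""

records = [r for r in map(parse_download, SAMPLE.splitlines()) if r]
attempts_per_source = Counter(r["source"] for r in records)
```

Counting attempts per source, as in attempts_per_source, gives a first indication of which hosts repeatedly probe different sensors.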

2) logged_submissions

    [root@nep-01 nepenthes]# tail -f logged_submissions
    [2011-05-01T19:43:50] *.*.190.58 -> *.*.200.96 creceive://*.*.190.58:4497 3875b6257d4d21d51ec13247ee4c1cdb
    [2011-05-01T19:43:50] *.*.190.58 -> *.*.200.30 creceive://*.*.190.58:4730 3875b6257d4d21d51ec13247ee4c1cdb
    [2011-05-01T19:44:09] *.*.190.58 -> *.*.200.64 creceive://*.*.190.58:1875 3875b6257d4d21d51ec13247ee4c1cdb
    [2011-05-01T19:44:11] *.*.190.58 -> *.*.200.73 creceive://*.*.190.58:1992 3875b6257d4d21d51ec13247ee4c1cdb
    [2011-05-01T19:44:40] *.*.190.58 -> *.*.200.16 creceive://*.*.190.58:3661 3875b6257d4d21d51ec13247ee4c1cdb
    [2011-05-01T19:44:53] *.*.190.58 -> *.*.200.191 creceive://*.*.190.58:4461 3875b6257d4d21d51ec13247ee4c1cdb
    [2011-05-01T19:45:18] *.*.190.58 -> *.*.200.26 creceive://*.*.190.58:2042 3875b6257d4d21d51ec13247ee4c1cdb
    [2011-05-01T19:45:19] *.*.190.58 -> *.*.200.50 creceive://*.*.190.58:2108 3875b6257d4d21d51ec13247ee4c1cdb


Column description:

• Timestamp
• Source of infection
• Nepenthes sensor
• Download link, including the protocol used for the download and the port number
• Malware MD5 (the name of the file when it is stored on the honeypot hard disk)
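Because every successful download carries the MD5 of the sample, the entries can be grouped by sample to see how many sensors one binary has reached. The sketch below assumes one entry per line with the five columns listed above. The regular expressions and function names are assumptions for illustration.

```python
import re
from collections import defaultdict

# [timestamp] source -> sensor link md5
ENTRY = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<source>\S+)\s+->\s+(?P<sensor>\S+)"
    r"\s+(?P<link>\S+)\s+(?P<md5>[0-9a-f]{32})$"
)
# protocol://host:port at the start of the download link
LINK = re.compile(r"^(?P<protocol>[a-z]+)://(?P<host>[^/:]+):(?P<port>\d+)")


def sensors_per_sample(lines):
    """Map each malware MD5 to the set of sensors that downloaded it."""
    hits = defaultdict(set)
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            hits[match["md5"]].add(match["sensor"])
    return dict(hits)


def split_link(link: str):
    """Split a download link into (protocol, host, port), or None if it does not fit."""
    match = LINK.match(link)
    if match is None:
        return None
    return match["protocol"], match["host"], int(match["port"])


SAMPLE = [
    "[2011-05-01T19:43:50] *.*.190.58 -> *.*.200.96 creceive://*.*.190.58:4497 3875b6257d4d21d51ec13247ee4c1cdb",
    "[2011-05-01T19:43:50] *.*.190.58 -> *.*.200.30 creceive://*.*.190.58:4730 3875b6257d4d21d51ec13247ee4c1cdb",
    "[2011-05-01T19:44:09] *.*.190.58 -> *.*.200.64 creceive://*.*.190.58:1875 3875b6257d4d21d51ec13247ee4c1cdb",
]
spread = sensors_per_sample(SAMPLE)
```

In this excerpt a single sample reaches several sensors from one source within seconds, which is the kind of pattern that hints at automated scanning of the whole address range.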

3) nepenthes.log

In addition to these two files, there is another log file, nepenthes.log, which contains all operation logs of the honeypot. This file is useful for checking that the honeypot itself is working, but the information it contains is less important for the analysis.

    [root@nep-01 nepenthes]# tail -f nepenthes.log
    [13052011 00:46:42 warn module] Unknown LSASS Shellcode (Buffer 51 bytes) (State 0)
    [13052011 00:46:42 warn handler dia] Unknown DCOM Shellcode (Buffer 51 bytes) (State 0)
    [13052011 00:48:51 warn dia] Unknown IIS 189 bytes State 2
    [13052011 00:50:16 warn dia] Unknown IIS 233 bytes State 2
    [13052011 00:52:03 info handler dia] Unknown DCOM request, dropping
    [13052011 00:52:10 warn dia] Unknown ASN1_SMB Shellcode (Buffer 51 bytes) (State 0)
    [13052011 00:52:10 warn module] Unknown PNP Shellcode (Buffer 51 bytes) (State 0)
    [13052011 00:52:10 warn module] Unknown LSASS Shellcode (Buffer 51 bytes) (State 0)
    [13052011 00:52:10 warn handler dia] Unknown DCOM Shellcode (Buffer 51 bytes) (State 0)
    [13052011 00:53:28 warn module] Unknown IIS SSL exploit 118 bytes State 0

The capture above shows the log output of the honeypot while it is running.
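Even this operational log can be summarized, for example to see which emulated services most often receive shellcode that nepenthes does not recognize. The sketch below assumes the bracketed prefix seen in the capture (date as DDMMYYYY, time, level, optional tags). The regular expressions and function names are illustrative assumptions.

```python
import re
from collections import Counter
from datetime import datetime

# [DDMMYYYY HH:MM:SS level tags...] message
ENTRY = re.compile(
    r"^\[(?P<stamp>\d{8} \d{2}:\d{2}:\d{2}) (?P<level>\w+)(?: (?P<tags>[^\]]+))?\] (?P<message>.*)$"
)
SHELLCODE = re.compile(r"^Unknown (?P<service>\w+) Shellcode")


def parse_entry(line: str):
    """Split one nepenthes.log line into time, level, tags and message."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry.pop("stamp"), "%d%m%Y %H:%M:%S")
    return entry


def unknown_shellcode_by_service(lines):
    """Count 'Unknown <service> Shellcode' warnings per emulated service."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        match = SHELLCODE.match(entry["message"])
        if match:
            counts[match["service"]] += 1
    return counts


SAMPLE = [
    "[13052011 00:46:42 warn module] Unknown LSASS Shellcode (Buffer 51 bytes) (State 0)",
    "[13052011 00:46:42 warn handler dia] Unknown DCOM Shellcode (Buffer 51 bytes) (State 0)",
    "[13052011 00:48:51 warn dia] Unknown IIS 189 bytes State 2",
    "[13052011 00:52:10 warn module] Unknown LSASS Shellcode (Buffer 51 bytes) (State 0)",
]
summary = unknown_shellcode_by_service(SAMPLE)
```

A high count for one service suggests that exploits against it are circulating for which no shellcode pattern exists yet.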

2.6.6 Nepenthes shortcomings

1) Port 445: A major problem in nepenthes was the vulnerability modules for port 445: there were simply too many exploitable vulnerabilities on the same port. More and more exploits started using the Windows API to establish a valid CIFS/SMB session to the attacked host before sending the exploit payload. As nepenthes does not speak SMB, it could not obtain the payload, and without a payload there is no sample.

2) Shellcode detection: Nepenthes uses pattern matching with Perl regular expressions to detect shellcode. Pattern matching is popular, but it only works if a pattern for the shellcode has already been created. Creating patterns for unknown shellcode is tricky: a copy of the shellcode is needed to create a pattern, and finding unknown shellcode in nepenthes output is rather difficult.

3) libemu integration: libemu[56] is a small library written in C offering basic x86 emulation and shellcode detection using GetPC heuristics. It is designed to be used within network intrusion detection/prevention systems and honeypots. libemu can detect shellcode using emulation, but integrating it into nepenthes is problematic.

4) Threads: Detecting shellcode using libemu's emulation is fairly fast, but in nepenthes the shellcode emulation would prevent the software from doing anything else, as the initial design did not use threads. Making nepenthes thread-aware was attempted, but it did not work.

5) Extendability: Even though it was always claimed to be easy to write nepenthes modules and add-ons, there was very little contribution.

6) TLS: Some services, for example HTTPS, require TLS encryption, which cannot be provided within nepenthes.

7) IPv6: Integration within nepenthes might have been possible, but it would have been a lot of work, even though there is no malware using IPv6 yet.
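The idea behind a GetPC heuristic, as opposed to a per-exploit pattern, is that self-decoding shellcode usually has to learn its own address first, and only a few instruction sequences do that. The sketch below is a byte-level stand-in and not how libemu works internally (libemu emulates the code from candidate positions). It only looks for two well-known sequences: "call $+5" followed by a "pop" of a 32-bit register, and "fnstenv [esp-0xc]". The function name is hypothetical.

```python
def find_getpc_candidates(data: bytes):
    """Return (offset, kind) pairs for byte sequences typical of GetPC code."""
    hits = []
    for i in range(len(data)):
        # call $+5 (E8 00 00 00 00) followed by pop r32 (58..5F)
        if (
            data[i:i + 5] == b"\xe8\x00\x00\x00\x00"
            and i + 5 < len(data)
            and 0x58 <= data[i + 5] <= 0x5F
        ):
            hits.append((i, "call/pop"))
        # fnstenv [esp-0xc] (D9 74 24 F4)
        elif data[i:i + 4] == b"\xd9\x74\x24\xf4":
            hits.append((i, "fnstenv"))
    return hits
```

Unlike a regular expression written for one specific exploit, such a check does not depend on the payload that follows. It can also flag harmless data that happens to contain these bytes, which is one reason emulation is used to confirm candidates.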

2.6.7 Dionaea†

Dionaea[12] is meant to be the successor to nepenthes, embedding Python as its scripting language, using libemu to detect shellcode, and supporting IPv6 and TLS. The initial development of Dionaea was funded by the Honeynet Project as part of the Honeynets Summer of Code during 2009. Andrew Waite wrote on the InfoSanity blog[13] that Nepenthes is coming to an end, based on a post on the Nepenthes site (dated October 27th 2009) which indicates that development of Nepenthes is coming to a close and states seven reasons that prevent newer features from being implemented in Nepenthes. The official Nepenthes website carries the following statement on its start page: “Do not use Nepenthes, use Dionaea instead. Read why and install Dionaea instead.”[14] Dionaea is the result of all the shortcomings experienced with nepenthes and is therefore meant to supersede it. For future research it would thus be good to add Dionaea to the implemented distributed structure.

† It was released in October 2010.


2.7 Malware Analysis

The next step in detecting botnets among the collected malware is to analyze the collected samples. Here we introduce some methods of malware analysis.

2.7.1 Static and Dynamic Analysis

Static analysis is the process of analyzing executable binary code without actually executing the file. Static analysis has the advantage that it can reveal how a program would behave under unusual conditions, because we can examine parts of a program that normally do not execute. Malware might start executing only after a period of time or when a special event occurs; a key-logging feature, for example, might start only when the user browses an online shop or visits a bank web site. It is important to discover how malware can escape detection by anti-virus programs and how it can bypass firewalls and other security protections. Static analysis helps malware researchers to reveal what a piece of malware is able to do and how to stop it.

To perform static analysis, malware researchers must possess a good knowledge of assembly language and of the target operating system. They must also be able to transform assembly language back into the higher-level language that was used to create the program, a technique called reverse engineering. Reverse engineering is the only way to understand or obtain the source code of closed-source programs, since compilation works in one direction only: compilers are unable to transform object code back into source code. Static analysis tools include program analyzers, disassemblers and debuggers. These tools are able to detect whether the malware uses any software protection techniques.

Dynamic analysis, or behavioral analysis, involves executing the malware and monitoring its behavior, its interaction with the system, and its effects on the host system. Several monitoring tools are used to capture the malware's activities and responses. These activities include attempts to communicate with other machines, adding registry keys to start the program automatically when the operating system starts, adding files to system directories, downloading files from the Internet, and opening or infecting (embedding itself in) other files.

The initial step in dynamic analysis is to take a snapshot of the system that can be used to compare the system status before and after running the malware. This helps to identify the files that have been added or modified on the system. Understanding what changed in the system after the execution helps in analyzing and removing the malware. In the Windows environment, host integrity monitors and installation monitors provide the required assistance. Host integrity (or file integrity) monitoring tools create a system snapshot; subsequent changes to objects residing on the system are captured by comparison with the snapshot. For Windows, these tools typically monitor changes made to the file system, the registry, and the system's configuration files. Unlike host integrity systems, installation monitoring tools track only the changes that result from the execution or installation of the target program. This means that they do not monitor all changes occurring in the system but only the changes made during the installation or initialization. For Windows, they typically also monitor changes to the file system, the registry, and the system's configuration files.

The most convenient and practical method to perform behavioral analysis is to run the analyzed malware in a sandbox. In software development, including Web development and revision control, a sandbox[15] is a testing environment that isolates untested code changes and outright experimentation from the production environment or repository. Sandboxing protects "live" servers and their data, vetted source code distributions, and other collections of code, data and/or content, proprietary or public, from changes that could be damaging (regardless of the intent of the author of those changes) to a mission-critical system or which could simply be difficult to revert. Sandboxes replicate at least the minimal functionality needed to accurately test the programs or other code under development. By further analogy, the term "sandbox" can also be applied in computing and networking to other temporary or indefinite isolation areas, such as security sandboxes and search engine sandboxes (both of which have highly specific meanings), that prevent incoming data from affecting a "live" system.
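The snapshot comparison at the heart of host integrity monitoring can be sketched for the file system alone. The following is a minimal sketch, assuming that content hashes of files are a sufficient snapshot. Real tools also cover the registry and configuration files, and the function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file below 'root' (by relative path) to the SHA-256 of its content."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff(before: dict, after: dict) -> dict:
    """Compare two snapshots and list added, removed and modified files."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


def demo() -> dict:
    """Simulate a 'before/after' run in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "keep.txt").write_text("same")
        (root / "edit.txt").write_text("old")
        (root / "gone.txt").write_text("bye")
        before = snapshot(root)
        # Changes that a sample might make while it runs:
        (root / "edit.txt").write_text("new")
        (root / "gone.txt").unlink()
        (root / "dropped.exe").write_bytes(b"MZ")
        return diff(before, snapshot(root))
```

In the simulated run, demo() reports dropped.exe as added, gone.txt as removed and edit.txt as modified, while the untouched file does not appear.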

2.7.2 Virtual Environment

It is important to establish a secure environment before starting the malware analysis. The environment should not contain any important information, should be disconnected from the network (or have its traffic redirected to the local host), and should preferably contain a fresh installation of the operating system. Virtualization products offer help to malware researchers. These products allow an unmodified operating system with all of its installed software to run in a special environment alongside the existing operating system. This environment, called a virtual machine, is created by the virtualization software by intercepting access to certain hardware components and certain features.

The virtual machine is software, and therefore all of its components are software-based although they represent hardware components. A virtual machine contains its own virtual CPU, RAM, hard drive and network interface card (NIC) and behaves almost exactly like a physical computer, although some malware can detect that it is running inside a virtual machine. The physical computer is usually called the host, while the virtual machine is often called the guest. Virtualization software allows multiple virtual machines to run on a single physical machine, sharing the resources of that machine across multiple environments. One feature that virtual machines offer to malware researchers is testing and disaster recovery with the use of snapshots. When something goes wrong (for example, the operating system is corrupted and cannot run), one can easily switch back to a previous snapshot and avoid the need for frequent backups and restores. Several virtualization products are available, under commercial licenses or for free; VMware Workstation, Sun xVM VirtualBox and Microsoft Virtual PC are common examples.

2.7.3 Sandbox Analysis

The need for sandbox analysis

Relying on antivirus software gives only a little information about the collected malware samples, and it may even give false results. An antivirus scan depends on virus signatures. A signature is an algorithm or hash (a number derived from a string of text) that uniquely identifies a specific virus. Most antivirus software is not able to detect zero-day malware: a sample must be added to the signature database before it can be detected. It is therefore not enough to depend only on antivirus software, and it is important for security researchers to analyze newly collected malware.

As already mentioned, the analysis of unknown executable files is divided into two broad categories: static analysis and dynamic analysis. In static analysis the program's binary code is first disassembled; then both control-flow and data-flow analysis techniques can be employed to understand the functionality of the program. Dynamic analysis is the process of observing the code at run time to determine the purpose and functionality of the malware sample. This approach has the advantage that the code is actually executed. Thus, sandbox analysis is immune to obfuscation attempts and has no problems with self-modifying programs. However, there remains the problem of building a suitable environment in which the binary executable can be executed safely without affecting other parts of the network. Running malware directly on a real machine that is part of a network or connected to the Internet could be disastrous, as the malicious code could easily escape and infect other machines. In addition, reinstalling the operating system on the machine after each dynamic test run is not an efficient solution because of the overhead involved. To solve these problems, sandbox techniques are used: a sandbox is a secured environment that emulates a real-world environment to enable researchers to execute and observe malicious code securely.

Having a private sandbox is very useful, but building one tends to be very complicated. It is, however, possible to rely on the many third-party online sandboxes that are available. We used two of them, in order to obtain reports from different sources for an accurate analysis of the malware samples: the first is Anubis and the other is Joebox. Both accept a submitted malware sample and return an analysis report. Analysis reports are normally divided into categories such as general information about the malware sample (for example, file size and the time taken to perform the analysis), file activities, registry activities, service activities and process activities, in addition to the network activity part, which is the most important for network security.
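Why a hash-based signature cannot recognize a sample that is not yet in the database can be shown in a few lines. This is a deliberately simplified sketch: the set KNOWN_BAD stands for a hypothetical signature database, and real antivirus engines combine hashes with many other kinds of signatures.

```python
import hashlib

# Hypothetical signature database holding the digest of one known sample.
KNOWN_BAD = {hashlib.md5(b"known malicious sample").hexdigest()}


def is_known(sample: bytes) -> bool:
    """Return True only if the digest of the sample is already in the database."""
    return hashlib.md5(sample).hexdigest() in KNOWN_BAD


known = is_known(b"known malicious sample")
changed = is_known(b"known malicious sample!")   # one byte appended
```

The unchanged sample is recognized, while the copy with a single extra byte is not, even though its behavior would be identical. Only observing the behavior, as a sandbox does, reveals that the two belong together.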

ANUBIS

The first sandbox used in this research is Anubis. Anubis[26] is a tool for analyzing the behavior of Windows PE executables, with a special focus on the analysis of malware. Running a binary in Anubis results in a report file that contains enough information to give a human user a very good impression of the purpose and actions of the analyzed binary. The generated report includes detailed data about modifications made to the Windows registry or the file system and about interactions with the Windows Service Manager or other processes, and it logs all generated network traffic. The analysis is based on running the binary in an emulated environment and watching, i.e. analyzing, its execution. The analysis focuses on the security-relevant aspects of a program's actions, which makes the analysis process easier and, because the domain is more fine-grained, allows for more precise results. It is an ideal tool for anyone interested in malware and viruses to get a quick understanding of the purpose of an unknown binary. It is designed to be an open framework for malware analysis that allows the easy integration of other tools and research artifacts. Anubis is sponsored by Secure Business Austria and developed by the International Secure Systems Lab. The service is provided free of charge.


JOEBOX

Joe Sandbox[27] is a fully automated analysis system for Trojans, viruses and rootkits (malware). It accepts malicious files such as PE executables, PDF (Acrobat Reader) or DOC (Microsoft Word) files as input and returns highly detailed reports describing the behavior of the files when they are executed. The well-structured reports show how the malware installs itself, how it communicates with the Internet and how it hides its presence. With the help of advanced behavior signatures, Joe Sandbox summarizes interesting actions, which makes the behavior easy to understand. Joe Sandbox is suitable for manual as well as large-scale malware analysis. Version 3.1.0 of the sandbox was released at the beginning of 2011.

2.8 IRC-based measurement and detection

According to the 2010 Symantec Annual Report[4], 31% of all identified C&C servers in 2009 used IRC as their communication protocol. IRC is a lightweight and robust chat protocol that offers a lot of functionality suitable for controlling even large botnets. An IRC-based C&C design implies one or more IRC channels, either on public IRC networks or on custom (sometimes compromised) servers, where bots report their presence and receive their commands. Typically, newly launched bots will, as their first action, connect to such a channel and wait for further instructions. In order to measure bots that use an IRC-based communication model, it is first necessary to obtain the information needed to join and participate in the botnet channel. Basic connection information, which comprises the IP address and port number of the IRC server as well as the channel used for controlling the botnet, can be extracted from malware samples of the botnet concerned. Depending on the complexity of the botnet communication mechanism, the login or authentication credentials, and also the encryption functionality, may have to be extracted and understood.
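When a sample's network traffic has been recorded, for example in a sandbox report, part of this connection information can be read directly from the first IRC commands the bot sends. The sketch below assumes a plain-text, unencrypted session and uses only the standard IRC client commands PASS, NICK and JOIN. The function name and the returned structure are assumptions, and the server address and port would come from the connection itself rather than from this text.

```python
def extract_irc_info(traffic: str) -> dict:
    """Collect server password, nickname and joined channels from client-side IRC lines."""
    info = {"password": None, "nick": None, "channels": []}
    for line in traffic.splitlines():
        parts = line.strip().split()
        if len(parts) < 2:
            continue
        command = parts[0].upper()
        if command == "PASS":
            info["password"] = parts[1]
        elif command == "NICK":
            info["nick"] = parts[1]
        elif command == "JOIN":
            # JOIN <channel>{,<channel>} [<key>{,<key>}]
            keys = parts[2].split(",") if len(parts) > 2 else []
            for index, channel in enumerate(parts[1].split(",")):
                key = keys[index] if index < len(keys) else None
                info["channels"].append((channel, key))
    return info


session = "PASS s3cret\r\nNICK bot-1234\r\nUSER a b c :d\r\nJOIN #cmd chanpass\r\n"
info = extract_irc_info(session)
```

For the made-up session above, the result contains the server password, the bot's nickname and the channel with its key, which together with the server address is what is needed to join the channel as an observer.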

2.9 Summary

In this chapter we have introduced botnets and botnet mechanisms, the techniques used by botnet creators to avoid detection, and the efforts made by security researchers to detect botnets. We have also described the tools used for botnet detection, especially the low-interaction honeypot Nepenthes, which is used in this research to detect the existence of botnets in the monitored network.


Methodology and Implementation


Chapter 3 Effective Collection of Malware samples

3.1 Overall System Architecture

Figure 4 distributed low-interaction honeypot system to detect botnet

The deployed system is designed as shown in Figure 4; it consists of at least one Linux server machine with nepenthes installed on it. We set up two machines: the first runs Red Hat Enterprise Linux AS release 4 with nepenthes-0.2.2, and the second runs Ubuntu Server, also with nepenthes-0.2.2. The two machines form a distributed farm of nepenthes honeypots. Each one collects malware samples and submits them, together with information such as the source of infection, the nepenthes sensor IP and the honeypot IP, to a central machine (which we call the malware samples central database server). The configuration of the nepenthes servers includes adding a range of IP addresses (a complete class C range for each honeypot) to increase the probability of catching spreading malware. Each nepenthes honeypot node has two network interface cards (NICs): one is allocated to local login and dedicated to management, and the other is connected directly to the Internet without any filtering on the gateway, so that the honeypot receives as many network attacks as possible. The second NIC is configured with a complete class C range of IP addresses, each of which is referred to as a nepenthes honeypot sensor.


3.2 Nepenthes Installation

On Red Hat Enterprise Linux AS release 4 (Nahant Update 4):

Nepenthes first requires the installation of two packages, libadns and libssh. The packages can be downloaded from the website http://rpm.pbone.net/ by searching for nepenthes-0.2.2-1.el4.pp.i386.rpm, which requires the following two packages:

    libssh-0.2.1-0.2.svn193.el4.pp.i386.rpm
    libadns-1.4-3.el4.pp.i386.rpm

Installation:

    rpm -i libssh-0.2.1-0.2.svn193.el4.pp.i386.rpm
    rpm -i libadns-1.4-3.el4.pp.i386.rpm
    rpm -i nepenthes-0.2.2-1.el4.pp.i386.rpm

Running nepenthes as a service:

    service nepenthes start

On Ubuntu Server:

    apt-get install nepenthes

Nepenthes may report errors when binding ports, which means that a real service is already listening on the port. For example, an error binding port 25 means that sendmail is running, and it must be stopped before nepenthes can listen on that port.
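The port-binding problem described above can be checked before nepenthes is started. The following is a minimal sketch, not part of the deployed system: the function names `port_is_free` and `busy_ports` are hypothetical, and the check simply tries to bind a TCP socket on the given address.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind (host, port), False otherwise."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Either another service owns the port, or the caller lacks the
            # privilege to bind it (ports below 1024 normally require root).
            return False
    return True


def busy_ports(ports, host: str = "0.0.0.0"):
    """Return the ports from `ports` that cannot be bound right now."""
    return [p for p in ports if not port_is_free(p, host)]
```

Calling `busy_ports([21, 25, 80])` as root on a honeypot node would list the ports still held by real services such as sendmail.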

3.3 Honeypot Configuration

Network Configuration

Management Interface: /etc/sysconfig/network-scripts/ifcfg-eth0

    IP:   *.*.25.34‡
    Mask: 255.255.255.0
    GW:   *.*.25.1

Exterior Interface: /etc/sysconfig/network-scripts/ifcfg-eth1

    IP:   *.*.200.124
    Mask: 255.255.255.0
    GW:   *.*.200.1

‡ IP addresses are anonymized


Add more IP addresses

To collect as many malware samples as possible, the Nepenthes server was set up to listen on 254 different IP addresses, simulating a production network. This increases the probability that the honeypot sensors are scanned by infected machines. We created a script that sets up 254 virtual IP interfaces from the range *.65.200.2 - *.65.200.254, with *.65.200.124 as the real IP interface: for((i=2;i*.240.68.152:55814 IPv4

(ESTABLISHED)

ds->*.47.29.161:1309
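The virtual-interface setup described in this section can be sketched as follows. This is an illustration under stated assumptions, not the script used in this work: the interface name `eth1`, the use of legacy `ifconfig` alias syntax, and the helper name `alias_commands` are all assumptions, and the documentation prefix 192.0.2 stands in for the anonymized network. The sketch only builds the command strings; it does not execute them.

```python
def alias_commands(prefix: str, iface: str = "eth1", first: int = 2,
                   last: int = 254, real_host: int = 124,
                   netmask: str = "255.255.255.0"):
    """Build one `ifconfig` alias command per additional sensor address."""
    commands = []
    for host in range(first, last + 1):
        if host == real_host:
            continue  # already configured on the physical interface
        commands.append(
            f"ifconfig {iface}:{host} {prefix}.{host} netmask {netmask} up"
        )
    return commands


if __name__ == "__main__":
    # 192.0.2 is a documentation prefix, used here only as a placeholder.
    for line in alias_commands("192.0.2"):
        print(line)
```

Generating the commands rather than running them makes it possible to review the list before applying it on the honeypot node.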

The Nepenthes configuration files were also edited. The most important configuration file is /etc/nepenthes/nepenthes.conf; it was edited by uncommenting the following line, which enables HTTP submission from the distributed nodes to the central malware server:

    "submithttp.so", "submit-http.conf", "" // submit files to a web server


Another configuration file that was modified is submit-http.conf, as follows:

    submit-http
    {
        url "http://malwaredbserver/nepenthes/submit_http.php";
        email "email@address ";   // optional
        user "?????";             // optional
        pass "?????";             // optional
    };

This makes the honeypot request the URL whenever there is a new submission. The “submit_http.php” page was written in PHP to receive the submissions from the distributed honeypots, including the malware file, and to save the submission information on the malware server. A similar configuration is applied on the other honeypot machines.
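The server-side handling of a submission can be sketched as follows. The actual receiver is a PHP page whose code is not reproduced here, so this Python version is only an illustration: the function name `record_submission`, the in-memory `log` list, and the choice to store each binary in a file named after its md5 are assumptions.

```python
import hashlib
from pathlib import Path


def record_submission(fields: dict, payload: bytes,
                      store_dir: Path, log: list) -> bool:
    """Log one honeypot submission; return True if the binary was newly stored."""
    digest = hashlib.md5(payload).hexdigest()
    if fields.get("md5", digest).lower() != digest:
        raise ValueError("md5 field does not match the uploaded binary")

    # Every submission is logged, even for a sample that is already known.
    log.append({
        "md5": digest,
        "source_host": fields.get("source_host"),
        "target_host": fields.get("target_host"),
        "url": fields.get("url"),
    })

    store_dir.mkdir(parents=True, exist_ok=True)
    target = store_dir / digest
    if target.exists():
        return False  # a copy of this binary is already on the server
    target.write_bytes(payload)
    return True
```

Checking the md5 on arrival guards against truncated uploads, and naming files by hash keeps one copy per sample however many sensors report it.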

3.4 Building the Malware Database Server

This is a Linux machine running the Apache web server with a MySQL database. The management interface is a website hosted on this machine.

3.4.1 Nepenthes Database

This database includes the following tables:

• logged_downloads and logged_submissions store the information from the two main log files of nepenthes. These two tables are no longer used because of the distribution; the http_submissions table was created to replace them.

• http_submissions is the most important table in malware collection. It stores all the information submitted with a new malware sample and contains the following fields:

    remote_host          Remote honeypot
    URL                  Download URL
    md5                  Collected malware md5
    source_host          Source of infection
    target_host          Sensor IP address
    date                 Date
    time                 Time

• Analysis table: this table stores the antivirus scan result together with the md5 of the scanned malware sample and the date of the scan result.

    md5                  Malware md5
    date                 Scan or analysis date
    filesize             Malware sample file size
    analysis_avg         AVG antivirus
    analysis_avast       Avast antivirus
    analysis_mcafee      McAfee antivirus
    analysis_kaspersky   Kaspersky antivirus
    analysis_clamav      ClamAV antivirus
    analysis_panda       Panda antivirus

• Joebox analysis group of tables, with the following fields:

    id                   The analysis id
    md5                  The md5 of the analyzed malware sample
    date                 Date of analysis
    time                 Time of the analysis

  In addition, there are six tables:
    o Joebox_TCP: stores the TCP activity
    o Joebox_UDP: stores the UDP activity
    o Joebox_DNS: stores the DNS activity
    o Joebox_ICMP: stores the ICMP activity
    o Joebox_HTTP: stores the HTTP activity
    o Joebox_IRC: stores the IRC activity

• Anubis analysis tables are similar to the previous list of tables. All information related to the Anubis analysis is stored in these tables.

    md5                  Malware md5
    link                 Analysis report URL of the analyzed file
    task_id              Reference number to the analysis task
    analysis_date        Date of analysis
    report_txt           Analysis report as txt
    report_html          Analysis report as html
    traffic_pcap         Traffic pcap file (if it exists, the analyzed malware has network activity)
    Report_xml

The information stored in these tables is extracted from the analysis reports of the Joebox and Anubis sandboxes.

3.4.2 Malware server Administration Interface

The administration interface is a website written in PHP. It consists of several pages that give access to the information stored in the system and display statistics about malware collection and analysis. It also allows the collected malware samples to be submitted to online analysis services: first to well-known antivirus websites, then to online sandboxes. The interface retrieves the analysis reports and extracts the information they contain, so that it can be used later to understand the malicious activity of the collected malware samples.

3.5 Distributed Malware Collection

All nepenthes nodes are connected to the central machine via http_submission. This machine collects all the information from the distributed nepenthes network. The central machine runs the administration interface; it submits the collected malware to online sandbox systems for analysis, receives the analysis reports from them, and scans the malware with several well-known antivirus systems. The http_submission is implemented as a PHP page that the honeypot nodes request: each nepenthes honeypot is configured to request this page as soon as it has a new malware sample. The new malware binary is submitted to the malware database server together with the IP address of the source of infection, the IP address of the honeypot sensor, and the md5 hash. The same malware can be collected many times from the source of infection; the information is logged each time, but the file is submitted to the malware server only once. The information transferred in a submission looks like the following example:

    email       : email@address
    URL         : creceive://*.32.152.221:26429
    trigger     : creceive://*.32.152.221:26429
    md5         : 8e903e9a70ee432c270d5f132f311eaf
    sha512      : 39685a0fb8a1b5c275795cb0972110b4e0ae402e8302b55545ac63770de7c45463d4909d67bf8848659164498d7e51e6d5794c7a2ae15d8038fd3121830bc086
    filetype    : MS-DOS executable (EXE), OS/2 or MS Windows
    source_host : 2032179421   // IP address as integer
    target_host : 3544303845
    filename    : index.html
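Because source_host and target_host arrive as integers, they have to be converted back to dotted-quad notation before they can be compared with sensor addresses. A minimal sketch using the Python standard library is given below; it assumes that the integer encodes the IPv4 address in network byte order, and the function names are illustrative.

```python
import ipaddress


def int_to_ipv4(value) -> str:
    """Convert a 32-bit integer (network byte order) to dotted-quad text."""
    return str(ipaddress.IPv4Address(int(value)))


def ipv4_to_int(text: str) -> int:
    """Inverse conversion, useful when querying by a known address."""
    return int(ipaddress.IPv4Address(text))
```

For example, `int_to_ipv4(3221225985)` gives the documentation address `192.0.2.1`.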

3.6 Statistics

Over a period of about one year we collected about 2500 different malware samples. One of the honeypots (the main honeypot), which ran continuously, recorded more than 215,000 download attempts and more than 21,000 successful downloads. The other honeypot was run for two periods of about ten days each (this was due only to IP availability, not to a research intention). In the first run, from 2010/11/04 to 2010/11/14, it collected two different malware samples, both of which had already been collected by the first nepenthes honeypot. In the second run, from 2010/12/05 to 2010/12/15, it again collected two different malware samples, but this time neither had been collected by the first nepenthes honeypot.


[Figure: series "Malware Sample"; vertical axis 0 to 3000; horizontal axis monthly from 2009/12/1 to 2011/2/1]

Figure 5 The total number of the (Unique) collected malware binaries

[Figure: series "Submissions"; vertical axis 0 to 25000; horizontal axis monthly from 2009-December to 2011-February]

Figure 6 The total number of the successful download of malware binaries

Figure 5 and Figure 6 compare the total number of unique collected malware samples with the number of successfully downloaded malware binaries. The ratio is about 1/10, because many malware files were submitted many times on different sensors of the honeypot. The following table shows the top-ten downloaded malware samples according to the number of submissions.

Table 2 Top-ten malware submissions

    Malware md5                        Number of submissions   First date   Last date
    7d99b0e9108065ad5700a899a1fe3441   4992                    2010/2/27    2010/11/15
    928f4e82e333629e5221542dc0c81ff6   4251                    2011/02/18   2011/02/22
    98eb0fdadf8a403c013a8b1882ec986d   1844                    2010/2/11    2010/9/22
    1b7012e6f8abd316360694544074033b   1611                    2011/1/3     2011/1/4
    fb486908b086c67488dab1deb871f706   1235                    2011/2/5     2011/2/5
    2aa635dda735bbbb560b12a10f6a764c   746                     2010/1/31    2010/2/1
    fd28c5e1c38caa35bf5e1987e6167f4c   680                     2010/6/12    2010/11/3
    7d3fcccb077e7fe87e3f5d3483bc6f0f   555                     2010/6/13    2010/7/9
    1085f60dabfe6df63ec98ae3ad2860d0   491                     2010/1/16    2010/2/2
    4f86d95b1b57bd0f5f2e288de68547a1   458                     2010/1/29    2010/2/4
    e269d0462eb2b0b70d5e64dcd7c676cd   433                     2010/2/11    2010/8/4

3.7 Effectiveness of IP addresses

The number of different malware samples collected by each sensor of the first honeypot is obtained with the following query:

    select target_host AS sensor, count(distinct md5) AS cnt
    from http_submissions
    where remote_host='*.*.25.34'
    Group By sensor
    order by cnt DESC;

The average, maximum, and minimum number of different malware submissions per sensor (Malware/IP) on the first nepenthes honeypot over a period of about one year are given in the next table:

    Average   12.93
    Max       22
    Min       6
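The per-sensor statistics above can be reproduced programmatically. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for the MySQL server, the rows in `demo_connection` are made up, and the function names are hypothetical. Only the table and column names come from this chapter.

```python
import sqlite3

QUERY = """
SELECT target_host AS sensor, COUNT(DISTINCT md5) AS cnt
FROM http_submissions
WHERE remote_host = ?
GROUP BY target_host
ORDER BY cnt DESC
"""


def sensor_stats(conn: sqlite3.Connection, remote_host: str):
    """Return (average, max, min) of distinct samples per sensor, or None."""
    counts = [cnt for _sensor, cnt in conn.execute(QUERY, (remote_host,))]
    if not counts:
        return None
    return sum(counts) / len(counts), max(counts), min(counts)


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with made-up rows to exercise the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE http_submissions "
        "(remote_host TEXT, target_host TEXT, md5 TEXT)"
    )
    rows = [
        ("hp1", "s1", "aaa"), ("hp1", "s1", "aaa"), ("hp1", "s1", "bbb"),
        ("hp1", "s2", "aaa"), ("hp2", "s9", "ccc"),
    ]
    conn.executemany("INSERT INTO http_submissions VALUES (?, ?, ?)", rows)
    return conn
```

The honeypot address is passed as a bound parameter rather than pasted into the SQL text, which keeps the query reusable for every honeypot node.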

The following figure shows the distribution of the different malware samples collected on the first nepenthes honeypot over a period of about one year, together with the distribution of successful submissions across the nepenthes sensors. The figure indicates that the different honeypot sensors have roughly equal probability of detecting and collecting malware samples. Over this period, each honeypot sensor collected about 50 unique malware samples (which may also have been collected by other sensors), and the total number of successful submissions per sensor ranges from about 100 to more than 200 (the same malware being collected many times on the same sensor).

[Figure: "Number of Submissions, Collected Malware samples / Honeypot Sensor"; vertical axis 0 to 300; horizontal axis honeypot sensors 1 to 254; series "Number of Submissions" and "Number of Malware samples"]

Figure 7 Distribution of Submissions and malware collections on the honeypot sensors

The distribution of malware samples over the honeypot sensors gives, for each malware sample, the number of sensors on which it was submitted:

    select md5 AS Malware, count(distinct target_host) AS ditribution
    from http_submissions
    Group By Malware
    order by ditribution DESC;

The output of this query showed that only 142 of the 2528 malware samples were submitted on more than one sensor. The remaining samples were each collected by a single sensor, which means they would probably not have been collected if that sensor had not existed. This shows the importance of adding as many honeypot sensors as possible to detect and collect more malware samples. The following table shows the top 20 malware samples by the number of nepenthes sensors that collected them. Distribution is the number of nepenthes sensors that collected the malware sample.

Table 3 Malware samples collected by the majority of sensors

    Malware                            Distribution
    7d99b0e9108065ad5700a899a1fe3441   254 (all sensors detected this malware)
    1b7012e6f8abd316360694544074033b   253
    928f4e82e333629e5221542dc0c81ff6   253
    98eb0fdadf8a403c013a8b1882ec986d   253
    986b59708d2ca33f4c1ad682a5d7a673   249
    2aa635dda735bbbb560b12a10f6a764c   235
    1085f60dabfe6df63ec98ae3ad2860d0   229
    7efc6c86c744035c1d142f4c8b3ee76a   227
    b6a73bc3a81c008369b88f76b76aa1c3   227
    456b9caaeca9c96fc4fce52f71412e47   226
    85d6bdf6c29ae4065aa3570e2c5ca298   223
    5423b2c0a73c74872956d07ec682f8f8   222
    e74fbd3de884d8e8a5c3a617d9fd1470   219
    e269d0462eb2b0b70d5e64dcd7c676cd   216
    1af49e0cf3bf715d9055930a63d53566   210
    fb486908b086c67488dab1deb871f706   206
    4f86d95b1b57bd0f5f2e288de68547a1   204
    3875b6257d4d21d51ec13247ee4c1cdb   200
    5e8ccc4190f86e783d9c5b9f1388c934   197
    6d1f2e8419280d026828db77db958e9a   188

The previous table shows that the first malware sample was detected by the whole range of sensors, while the others were collected by fewer nepenthes sensors.
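The sensor-coverage analysis of this section can also be done outside the database. The following sketch assumes that the submissions are available as (md5, target_host) pairs; the function names are illustrative and the data in the usage example is made up.

```python
from collections import defaultdict


def sensor_coverage(submissions):
    """Map each md5 to the number of distinct sensors that collected it."""
    sensors = defaultdict(set)
    for md5, target_host in submissions:
        sensors[md5].add(target_host)
    return {md5: len(hosts) for md5, hosts in sensors.items()}


def split_by_coverage(coverage):
    """Return (samples seen on one sensor, samples seen on several sensors)."""
    single = sorted(m for m, n in coverage.items() if n == 1)
    multi = sorted(m for m, n in coverage.items() if n > 1)
    return single, multi
```

With `[("a", "s1"), ("a", "s2"), ("b", "s1")]` as input, sample `a` has a coverage of 2 and sample `b` falls in the single-sensor group, that is, the group of samples that depend on one particular sensor being present.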

3.8 Effectiveness of adding more honeypot nodes

It is not necessary to add more machines in the same place. Nepenthes is scalable, which means it can be configured to listen on a large number of IP addresses (nepenthes sensors). Building a distributed honeypot makes it possible to add machines in different places that collect malware samples and submit them to the central malware database server. As mentioned earlier, we ran the second honeypot for two periods of about 10 days each. An example of a malware sample collected on both machines is (95d1a78f0dd88967080df1bb840a86cf). It was first collected on 2010-11-07 on the first machine, on 100 different sensors; the first sensor that collected it was (*.*.200.148), from the source (*.*.76.65). It was collected again about one week later, on 2010-11-13, from the source (*.53.39.121). On the second honeypot machine, the first sensor that collected it was (*.65.201.147), and it was then collected 92 times on different sensors of the second honeypot machine, on the sequence of IP addresses (*.65.201.2, *.65.201.3, *.65.201.4,…., *.65.201.90, *.65.201.91). In the second period of running the second honeypot, on the single day 2010-12-10, it collected two different malware samples:

Table 4 malware samples collected only by the 2nd honeypot

    md5                                source         sensor        date         time
    622e21a25a68632adfdce8574bcb026b   *.91.160.71    *.65.201.61   12/10/2010   20:12:13
    5597eeceb98994e521478510e15df464   *.246.93.237   *.65.201.75   12/10/2010   18:12:51

These malware samples were never collected on the first honeypot machine.

3.9 The relationship between sources and collected malware samples

This relationship gives an indication of how a malware sample propagates. The source of infection is assumed to be an infected machine, so if a malware sample comes from different sources, it has infected all of those machines. On 2011/4/14 the number of different collected malware samples was 2863. The following SQL query returns the number of sources each malware sample came from:

    select md5 AS Malware, count(distinct source_host) AS ditribution
    from http_submissions
    Group By Malware
    order by ditribution DESC;

The outcome of this query showed that 51 malware samples came from more than one source, while the other malware samples each came from only one source IP. For example, the malware (6d1f2e8419280d026828db77db958e9a) was downloaded from 242 different sources between 2011-02-03 and 2011-03-25. A complete description of this malware and its activity is given in the appendix.

3.10 Nepenthes Honeypot Detection

Several studies discuss the detection of honeypots and honeypot-aware botnets, for example references [17][18]. Nepenthes can be detected simply by attempting to access any of its allegedly available services. For example, if we try to access the web server to find out whether it is a real web server or an emulation, the request waits without any response from the emulated web server, as shown in the following:


Another example is scanning the nepenthes honeypot with nmap[19], which gives results like the following:

    PORT       STATE   SERVICE       VERSION
    21/tcp     open    ftp           Nepenthes HoneyTrap fake vulnerable ftpd
    2103/tcp   open    netbios-ssn   Nepenthes fake honeypot netbios-ssn
    2105/tcp   open    netbios-ssn   Nepenthes fake honeypot netbios-ssn
    2107/tcp   open    netbios-ssn   Nepenthes fake honeypot netbios-ssn

3.11 Summary

This chapter introduced the proposed infrastructure of the distributed low-interaction honeypot system and the honeypot configuration used to collect as many malware samples as possible by adding more honeypot sensors. It then described the malware database server, its database tables, and the administration web interface, and presented statistics on malware collection that show the effectiveness of the proposed system.


Chapter 4 Accurate analysis of the malware samples

The simple idea behind low-interaction honeypots is like opening all the doors of a house, so that anyone who intends to enter without permission can do so; the next step is to find out which of them is the enemy. In the same way, we have allowed access to the emulated services on the nepenthes honeypot in order to collect as many malware samples as possible, to understand the nature of this malware, and to determine the type of each sample. This serves the objective of our research: to detect the existence of botnets in the monitored network, and then to obtain the controller IP and any other information that can help to discover and understand these botnets. The first step relies on antivirus scans, which provide reasonably trustworthy results but give only the name or alias that the antivirus uses to identify the malware. The next step therefore relies on behavior analysis to understand the malware in detail.

4.1 Antivirus Scan

Antivirus scanning can be done in several ways.

4.1.1 Submit the malware to online antivirus scan

The first way is to submit the malware to online antivirus scan websites. Many antivirus companies provide online scanning, and a sample can be submitted as a zip file attached to an email sent to a specific address. Some scan providers require the zip file to be encrypted with a password. The scan result, containing the malware file name and the scan verdict, is returned to the sender's email address. We have implemented this method with the following scan providers:

Table 5 Online Antivirus scan by email submission

Antivirus Company [20]

Email address

Password required

virustotal.com

[email protected]

infected

Mcafee[21]

[email protected]

infected

[22]

Kaspersky

[email protected]

www.ca.com[23] [24]

emsisoft.com [25]

avg

[email protected]

spyware

[email protected] [email protected]


Infected


4.1.2 Installed antivirus software on Linux

The second way is to install the antivirus on the malware server and scan the collected malware samples there. Some antivirus vendors provide a Linux version of their product, for example avast, avg, clamav, and panda. With this method we must take care to disable the resident antivirus service, so that the collected malware samples are not deleted from the server; the scan is run on demand with a parameter that makes it produce a report only.

4.1.3 Installing antivirus on a separated windows OS

The final way is to install the antivirus on a separate Windows machine, submit the collected malware samples to this machine, and scan them there. The scan report is then returned to the malware server, where the scan result is extracted and stored in the database. Although all the previous ways have been implemented, we decided to depend on the last one for many reasons. Kaspersky is installed on the Windows machine, and the “avp.com” command line scan is executed as follows:

    C:\Program Files\Kaspersky Lab\Kaspersky Internet Security 2011>avp.com scan /i0 /fa /R:c:\kis2011.txt c:\binaries20110421

Scan result report lines look like the following:

Table 6 Kaspersky scan report sample

    2011-04-21 16:28:14   c:\binaries20110421\b23b63950c92a2057aa74bb8b022a110   detected   Backdoor.Win32.Rbot.adqd
    2011-04-21 16:28:14   c:\binaries20110421\b24aaed40b3220d5f2bf64ee9208cbc7   detected   Net-Worm.Win32.Allaple.b
    2011-04-21 16:28:14   c:\binaries20110421\b23b63950c92a2057aa74bb8b022a110   skipped
    2011-04-21 16:28:14   c:\binaries20110421\b24aaed40b3220d5f2bf64ee9208cbc7   skipped
    2011-04-21 16:28:14   c:\binaries20110421\b25cb691a6d99d21ad36680a5a00da4b   detected   Net-Worm.Win32.Allaple.b
    2011-04-21 16:28:14   c:\binaries20110421\b25cb691a6d99d21ad36680a5a00da4b   skipped
    2011-04-21 16:28:14   c:\binaries20110421\b297006befbcea3a92af6d4b9c1d5517   detected   Net-Worm.Win32.Allaple.a
    2011-04-21 16:28:14   c:\binaries20110421\b297006befbcea3a92af6d4b9c1d5517   skipped
    2011-04-21 16:28:14   c:\binaries20110421\b2aaf21fed508877fea92f52df50148d   detected   Trojan.Win32.Buzus.cvzu
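Extracting the scan result from such a report can be sketched as follows. This is an illustration rather than the extraction code used on the malware server. It assumes that each report line consists of whitespace-separated date, time, file path, status, and an optional detection name, and that the file path contains no spaces; the names `parse_report` and `count_by_type` are hypothetical.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <path> <status> [<detection name>]".
LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<path>\S+)\s+(?P<status>\w+)(?:\s+(?P<name>\S+))?\s*$"
)


def parse_report(lines):
    """Return {md5: detection name} for every line whose status is 'detected'."""
    detections = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match or match["status"] != "detected" or not match["name"]:
            continue
        # The sample file is named after its md5, so take the last path part.
        md5 = match["path"].replace("\\", "/").rsplit("/", 1)[-1].lower()
        detections[md5] = match["name"]
    return detections


def count_by_type(detections):
    """Count samples per detection name, as in a classification table."""
    return Counter(detections.values())
```

Lines with the status "skipped" carry no detection name and are ignored, so each sample contributes at most one entry to the classification.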

The malware type reported by the antivirus scan helps to understand the malware, since the type name can be looked up on the antivirus vendor's website. The following table contains the classification of the collected malware samples according to the Kaspersky scan; 32 samples were unknown to the scan performed on 2011-04-21:

Table 7 Classification of the collected malware samples depending on Kaspersky antivirus scan

    Malware_Type                      Number
    Net-Worm.Win32.Allaple.b          1750
    Net-Worm.Win32.Allaple.e          764
    Net-Worm.Win32.Allaple.a          87
    Backdoor.Win32.Rbot.bni           56
    Virus.Win32.Virut.av              51
    Backdoor.Win32.Rbot.adqd          37
    Virus.Win32.Virut.at              16
    Backdoor.Win32.Rbot.aftu          15
    Trojan.Win32.VB.ajzw              8
    Trojan.Win32.VB.ampt              8
    Virus.Win32.Virut.as              7
    Trojan.Win32.VB.aizl              6
    Net-Worm.Win32.Kolab.tmg          5
    Trojan.Win32.Swisyn.ubr           4
    Backdoor.Win32.Rbot.sr            4
    Trojan.Win32.Buzus.cvzu           3
    Virus.Win32.Virut.n               3
    Backdoor.Win32.Rbot.gen           3
    Net-Worm.Win32.Padobot.m          2
    Net-Worm.Win32.Padobot.n          1
    Trojan.Win32.Genome.rioo          1
    Virus.Win32.Virut.a               1
    P2P-Worm.Win32.Palevo.nxs         1
    Net-Worm.Win32.Padobot.h          1
    Net-Worm.Win32.Allaple.d          1
    Backdoor.Win32.Nepoe.th           1
    Net-Worm.Win32.Kolabc.gkd         1
    Backdoor.Win32.VanBot.gz          1
    Net-Worm.Win32.Padobot.af         1
    Packed.Win32.PePatch.ja           1
    Trojan.Win32.VB.afua              1
    Backdoor.Perl.Shellbot.a          1
    Backdoor.Perl.Small.l             1
    Backdoor.Linux.Small.al           1
    Net-Worm.Win32.Padobot.p          1
    Net-Worm.Win32.Kolab.jmq          1
    Net-Worm.Win32.Kolab.jrm          1
    Trojan.Win32.Genome.mpuu          1
    Net-Worm.Win32.Kolab.jkd          1
    Trojan-Dropper.Win32.Agent.dhct   1
    Net-Worm.Win32.Kolab.lon          1
    Net-Worm.Win32.Kolab.swt          1
    Net-Worm.Win32.Kolab.tcl          1
    Net-Worm.Win32.Kolab.tec          1
    Net-Worm.Win32.Kolab.jrk          1
    Net-Worm.Win32.Kolab.jwx          1
    HEUR:Worm.Win32.Generic           1
    Net-Worm.Win32.Kolab.jlb          1
    Virus.Win32.Sality.aa             1
    Virus.Win32.Virut.ao              1
    Virus.Win32.Virut.w               1
    Virus.Win32.Sality.ag             1
    Virus.Win32.Virut.ar              1

4.2 SandBox Analysis

Each malware sample is executed alone in the sandbox so that changes to the operating system and the network activity can be observed, and it cannot be executed for a long time. In addition, bots are not always active; they may simply be waiting for a command from the botnet controller. It is therefore useful to execute a malware sample several times and to collect the network activity from the analysis reports of all the executions. To get more accurate results from behavior analysis, each collected sample is submitted to the sandbox several times at different times, so that if the malware is not active in the first run, it may be active in a later one. Submitting the samples to different sandbox providers also helps to get more accurate results. The main characteristic of bots is their network behavior, so we pay attention to the network behavior section of the reports: all the network behavior in the analysis reports is collected over multiple submissions of each malware sample, at different times, to multiple sandboxes. Most of the existing botnet controllers use IRC to communicate with their zombies. The table in the appendix gives a brief description of some randomly selected malware samples that show botnet behavior.
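Combining the network activity of several runs of the same sample can be sketched as a set union per protocol. The representation below, one dictionary per analysis report that maps a protocol name to a set of observed endpoints, is an assumption made for illustration, as are the function names.

```python
def merge_network_activity(reports):
    """Union the network observations from several runs of one sample."""
    merged = {}
    for report in reports:
        for protocol, endpoints in report.items():
            merged.setdefault(protocol, set()).update(endpoints)
    return merged


def shows_irc(merged) -> bool:
    """Return True if any run recorded at least one IRC conversation."""
    return bool(merged.get("irc"))
```

A sample that stayed silent in its first run but joined an IRC server in a later one is then still flagged, which is the reason for submitting each sample several times.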

4.2.1 ANUBIS

Submission to ANUBIS

Submission to the Anubis online sandbox is done by a Python script. An email address can be provided, in addition to the malware sample, to receive the analysis report. The script returns the URL of the analysis report, which will be available online and can be accessed at any time to extract the required information. For example, to submit the malware (fb6d8a35816738a06242afee75ba597d) located in the binaries folder, enter the following command:

    python ./submit_to_anubis.py --email email@address ../binaries/fb6d8a35816738a06242afee75ba597d

This command returns the URL of the report. Because the sandbox needs time to analyze the submitted file, this link should be kept; otherwise, we have to wait for the email from Anubis that contains the URL of the analysis report, which looks like this:

    Subject: Analysis result for task 1157aae768d49e4840158b054349bec84
    From: [email protected]
    Date: Wed, March 30, 2011 12:25
    To: email@address

    The analysis of your file 'fb6d8a35816738a06242afee75ba597d' (Md5: fb6d8a35816738a06242afee75ba597d) is finished.
    You can find your report at
    http://anubis.iseclab.org/?action=result&task_id=1157aae768d49e4840158b054349bec84
    We wish you a nice day,
    The Anubis Team (http://anubis.iseclab.org)
    XFilter-NENC-Signature: 4d92b2f20024eb84


We store the md5 with the task_id and the URL in the database. The page at the provided URL looks like the following:

    Task Overview

    Save Report:
    Task ID:                    1157aae768d49e4840158b054349bec84
    File Name:                  fb6d8a35816738a06242afee75ba597d
    MD5:                        fb6d8a35816738a06242afee75ba597d
    Analysis Submitted:         2011-03-19 13:28:44
    Analysis Started:           2011-03-30 04:19:49
    Analysis Ended:             2011-03-30 04:20:02
    Created Report:             New Analysis No - The Analysis report was created on 2011-02-26 00:49:28.
    Available Report Formats:   HTML  XML  PDF  Text
    Download Files:             • traffic.pcap

The analysis report of Anubis

The analysis report can be retrieved in html, xml, pdf, and text formats, in addition to the network traffic captured during the analysis of the malware sample as a pcap file. It is important to download these files so that they remain available in the future. When a file is downloaded, its path on the local machine is stored in the database. In some cases the file cannot be executed (mostly because it is corrupted or is not an executable file), so no analysis report is obtained. The analysis report starts with a summary that looks like the following:

    Description   Risk

    Write to foreign memory areas: This executable tampers with the execution of another process.
    Joins IRC Network: The executable connects to an IRC network, most probably functioning as a zombie in a botnet.
    Performs File Modification and Destruction: The executable modifies and destructs files which are not temporary.
    AV Hit: This executable is detected by an antivirus software.
    Autostart capabilities: This executable registers processes to be executed at system start. This could result in unwanted actions to be performed automatically.
    Changes security settings of Internet Explorer: This system alteration could seriously affect safety surfing the World Wide Web.
    Creates files in the Windows system directory: Malware often keeps copies of itself in the Windows directory to stay undetected by users.
    Execution did not terminate correctly: The executable crashed.
    Modify system files: This executable modifies files in the windows system directories.
    Spawns Processes: The executable produces processes during the execution.
    Performs Registry Activities: The executable creates and/or modifies registry entries.


The most important part of this report is the network activity, which looks like the following example:

    Network Activity

    - DNS Queries:
        Name             Query Type   Query Result   Successful   Protocol
        dns.aswend.com   DNS_TYPE_A   *.166.6.86     YES          udp

    - Opened Listening Ports:
        Port   Type
        80     tcp
        21     tcp

    - IRC Conversations:
        From ANUBIS:1038 to *.166.6.86:7000
        Nick: SL665466429715
        Username: vmsqxjaxyyah
        Joined Channel: #GL with Password .x.
        Channel Topic for Channel #GL: ".advscan asn1http 100 5 0 217.x.x.x -s"
        Private Message to Channel #GL: ".ksa ack 67.220.72.4 119 500"
        Private Message to Channel #GL: ".4 h4cker -s"

    - TCP Connection Attempts:
        from ANUBIS:1039 to *.125.117.71:80
        from ANUBIS:1040 to *.227.211.156:80
        from ANUBIS:1041 to *.124.43.18:80
        from ANUBIS:1042 to *.171.199.210:80
        ….

All network activity is extracted from the analysis reports and stored in the database, which makes it easier to understand the malicious activity in the monitored network and to locate the controllers' IP addresses.
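Extracting the IRC controller information from the text of a report can be sketched as follows. The sketch assumes that the lines of the "IRC Conversations" part of the report have already been isolated and that they follow the layout of the example in this section; the regular expressions and the name `parse_irc_section` are illustrative and do not reproduce the extraction code used in this work.

```python
import re

CONVERSATION = re.compile(r"^From\s+\S+:(\d+)\s+to\s+(\S+):(\d+)$")
CHANNEL = re.compile(r"^Joined Channel:\s+(\S+)(?:\s+with Password\s+(\S+))?$")


def parse_irc_section(lines):
    """Return one dict per IRC conversation: server, port, nick, channels."""
    conversations = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = CONVERSATION.match(line)
        if match:
            current = {
                "server": match.group(2),
                "port": int(match.group(3)),
                "nick": None,
                "channels": [],  # list of (channel, password or None)
            }
            conversations.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first conversation
        if line.startswith("Nick:"):
            current["nick"] = line.split(":", 1)[1].strip()
        else:
            match = CHANNEL.match(line)
            if match:
                current["channels"].append((match.group(1), match.group(2)))
    return conversations
```

The server address and port of each conversation are the candidate controller endpoint, and the channel name and password are what would be needed to observe the channel.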

4.2.2 JOEBOX

Submission to JoeBox

As with Anubis, submission to the Joebox online sandbox is done by a Python script. Here an email address must be provided, in addition to the malware sample, to receive the analysis report. The script returns only the number of malware samples waiting in the queue to be analyzed. For example, to submit the malware (fb6d8a35816738a06242afee75ba597d) located in the binaries folder, enter the following command:

    python ./submit_to_analysis.py email@address ../binaries/fb6d8a35816738a06242afee75ba597d

The analysis report of JoeBox

When the sample has been analyzed, the analysis report is sent as an attachment to the email address specified in the submission command. The received email looks like this:

    Subject: Joebox has analysed your submission (xp.jbs,ed25fcaf3e3a2fcf35e51f40a9e13030) id: 9006 system: xp
    From: [email protected]
    Date: Wed, March 16, 2011 11:15
    To: email@address
    Priority: Normal

    Thank you for using Joe Sandbox. The attached files contain all kind of behaviour which Joe Sandbox captured.
    Best regards
    Joe Security
    Ps: This message was generated automatically. Do not replay to this message. Use [email protected] instead.
    Your comments:

    Attachments: analysis.zip   275 k   [ application/zip ]   Download

We wrote a script that extracts the attached zip file and the html analysis report it contains; the network traffic data is then extracted from this html file and stored in the database. A Joebox html analysis report looks like this example:

Table 8 Joebox html analysis report

4.3 Summary

This chapter introduced the analysis of the collected malware samples. It described the ways implemented to scan malware with antivirus software and presented the classification of the collected samples based on the Kaspersky scan results. It then discussed the behavior analysis of the malware samples using two online sandboxes (Anubis and JoeBox): how the samples are submitted to a sandbox, how the analysis report is received, and how the relevant data in the report is extracted.


Chapter 5 Results Achieved

5.1 DNS

The following table lists all the different DNS queries requested during the behavior analyses of the collected malware samples.

Table 9 Overall DNS Traffic of all analyzed samples

ANUBIS dns botz.noretards.com botz.noretards.com botz.noretards.com botz.noretards.com botz.noretards.com botz.noretards.com

JoeBox ip *.111.73.201 *.104.35.224 *.202.109.119 *.104.35.224 *.82.57.7 *.202.109.119 *.104.35.224 *.82.57.7 *.104.35.224

botz.noretards.com. tuwien.ac.at

brussels.be.eu. undernet.org

caen.fr.eu.undernet.org caen.fr.eu.undernet. org.tuwien.ac.at citi-bank.ru coins.dal.net comdns.aswend.com comdns.aswend.com diemen.nl.eu. undernet.org dns.aswend.com dns.aswend.com dns.aswend.com dns.aswend.com image.perfectexe.com kdddaber.com ku.perfectexe.com lulea.se.eu.

*.125.182.255 *.141.29.10 *.237.188.216 *.109.20.90 *.18.164.194 *.47.220.2 *.197.175.21

*.155.0.224 *.14.236.50 *.39.164.4 *.107.249.167 *.109.20.90 *.10.10.10 *.166.6.86 *.16.165.227 *.107.249.167 *.170.127.203 *.217.162.178 *.170.127.203

dns botz.noretards.com dns.aswend.com dns.aswend.com dns.aswend.com go.microsoft.com go.microsoft.com go.microsoft.com ieonlinews.microsoft.com ieonlinews.microsoft.com m.drd3h.com proxim.ircgalaxy.pl proxim.ircgalaxy.pl proxim.ntkrnlpa.info proxim.ntkrnlpa.info ss.ka3ek.com ss.ka3ek.com ss.MEMEHEHZ.INFO ss.memehehz.info ss.nadnadzzz.info ss.nadnadzzz.info tx.mostafaaljaafari.net www.ieaddons.com www.ieaddons.com www.ieaddons.com www.ieaddons.com www.microsoft.com www.microsoft.com www.microsoft.com www.microsoft.com www.microsoft.com www.microsoft.com www.public-trust.com www.public-trust.com


ip

*.166.6.86 *.107.249.167 *.4.11.160 *.55.57.251 *.55.113.208

*.133.119.206 *.68.16.30 *.196.130.50 *.196.130.50 *.196.130.50

*.136.35.139 *.192.170.187 *.236.15.26 *.46.170.10 *.46.170.123 *.4.31.252 *.55.12.249 *.55.21.250 *.18.20.10

undernet.org lulea.se.eu.undernet.org.tuwien.ac.at m.DRD3H.COM moscow-advokat.ru moscow-advokat.ru moscow-advokat.ru.tuwien.ac.at msus.tuwien.ac.at proxim.ircgalaxy.pl proxim.ntkrnlpa.info proxim.ntkrnlpa.info proxim.ntkrnlpa.info proxim.ntkrnlpa.info.tuwien.ac.at proxima.ircgalaxy.pl proxima.ircgalaxy.pl ss.ka3ek.com ss.ka3ek.com ss.ka3ek.com ss.ka3ek.com ss.nadnadzzz.info ss.nadnadzzz.info worker-24.seclab.tuwien.ac.at wpad wpad. wpad.tuwien.ac.at xx.enterhere.biz xx.enterhere.biz.tuwien.ac.at xx.ka3ek.com xx.ka3ek.com.tuwien.ac.at xx.nadnadzz.info xx.sqlteam.info

xx.ka3ek.com xx.nadnadzz.info xx.sqlteam.info *.107.249.167 *.98.86.164

*.130.34.224 *.133.119.206 *.111.73.201 *.68.16.30

*.190.222.139 *.133.119.206 *.196.130.50 *.196.130.50 *.196.130.50 *.45.13.154 *.43.232.36 *.196.130.50 *.43.232.36 *.130.56.24

*.43.236.67 *.237.86.66 *.237.86.76

As the Joebox results show, some DNS records belong to well-known websites; these are false positives. DNS activity alone is therefore not enough to justify adding a domain name or IP address to the blacklist; the nature of the traffic exchanged with these addresses must also be understood.
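One way to act on this observation is to separate well-known domains from blacklist candidates before any further inspection. The sketch below is only an illustration: the allowlist KNOWN_BENIGN_SUFFIXES is an assumption that an analyst would have to maintain, and a name that passes the filter is still only a candidate whose traffic needs to be examined, not a confirmed malicious domain.

```python
# Assumption: an analyst-maintained allowlist of well-known domain suffixes.
KNOWN_BENIGN_SUFFIXES = ("microsoft.com",)


def is_benign(domain, suffixes=KNOWN_BENIGN_SUFFIXES):
    """True if the domain equals an allowlisted suffix or is a subdomain of one."""
    name = domain.rstrip(".").lower()
    return any(name == s or name.endswith("." + s) for s in suffixes)


def blacklist_candidates(dns_names, suffixes=KNOWN_BENIGN_SUFFIXES):
    """Return unique, normalized names that are not on the allowlist, in first-seen order."""
    seen = set()
    candidates = []
    for raw in dns_names:
        name = raw.rstrip(".").lower()
        if name and name not in seen and not is_benign(name, suffixes):
            seen.add(name)
            candidates.append(name)
    return candidates


if __name__ == "__main__":
    observed = [
        "go.microsoft.com",
        "www.microsoft.com",
        "dns.aswend.com",
        "ss.MEMEHEHZ.INFO",
        "ss.memehehz.info",
    ]
    print(blacklist_candidates(observed))
```

Normalizing case and trailing dots also merges entries such as ss.MEMEHEHZ.INFO and ss.memehehz.info, which appear separately in the sandbox output.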


5.2 IRC

To measure bots that use an IRC-based communication model, it is first necessary to obtain the information needed to join and participate in the botnet channel. The basic connection information, namely the IP address and port number of the IRC server and the channel used to control the botnet, can be extracted from malware samples of the botnet concerned. Depending on the complexity of the botnet's communication mechanism, the login or authentication credentials, and any encryption functionality, may also have to be extracted and understood. An example of such an IRC botnet:

Table 10 Example of IRC botnet

Malware md5: 9748a48ea0a327c63600f6d739bf147a
Malware Date: 2011.03.16
Malware Size: 268288 byte
Malware File Type: /var/www/nepenthes/binaries/9748a48ea0a327c63600f6d739bf147a: PE32 executable for MS Windows (GUI) Intel 80386 32-bit byte

Source
source_host       date
*.53.179.226      2011-03-15
*.53.179.232      2011-03-15
*.53.179.227      2011-03-15
*.53.179.233      2011-03-15
*.102.234.167     2011-03-16

Anti virus
Kaspersky: Backdoor.Win32.Rbot.sr

Behavior Analysis Result

DNS
dns               ip
dns.aswend.com    *.166.6.86

IRC
dst_ip       dst_port  nick            server_pass  src_port  user          joined_channel  password
*.166.6.86   7000      FL486913110742  (none)       1038      ihlsxbinoura  #GL             .x.
*.166.6.86   7000      FL486913110742  (none)       1038      ihlsxbinoura

We observed that 74 different malware samples showed IRC traffic when they were analyzed.
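The connection fields shown in Table 10 (nick, user, joined channel, channel password) correspond to the standard IRC client commands NICK, USER, JOIN and PASS. The following minimal sketch shows how such fields could be recovered from captured client-to-server lines. It assumes plain, unencrypted lines without a message prefix; the sample lines are reconstructed for illustration from the Table 10 values, and the trailing USER parameters are hypothetical.

```python
def parse_irc_login(lines):
    """Extract login fields from raw client-to-server IRC lines (no prefix handling)."""
    info = {
        "server_pass": None,
        "nick": None,
        "user": None,
        "joined_channel": None,
        "password": None,
    }
    for line in lines:
        parts = line.strip().split()
        if not parts:
            continue
        command, args = parts[0].upper(), parts[1:]
        if command == "PASS" and args:
            info["server_pass"] = args[0]
        elif command == "NICK" and args:
            info["nick"] = args[0]
        elif command == "USER" and args:
            info["user"] = args[0]  # first USER parameter is the user name
        elif command == "JOIN" and args:
            info["joined_channel"] = args[0]
            if len(args) > 1:
                info["password"] = args[1]  # optional channel key
    return info


if __name__ == "__main__":
    captured = [
        "NICK FL486913110742",
        "USER ihlsxbinoura 0 0 :x",  # trailing parameters are made up for the demo
        "JOIN #GL .x.",
    ]
    print(parse_irc_login(captured))
```

A bot that encrypts or obfuscates its channel traffic would not be handled by this sketch; that case requires understanding the sample's own encoding first.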


Table 11 IRC botnet with IRC server IP

Kaspersky Type Malware MD5 Virus.Win32.Virut.av 90cdeb80d56213f0ae19d63a1c042311 fab8b123cd98d3ce845bf65bf562b62f cfd565624d470dc5c3711d9856bfed15 456b9caaeca9c96fc4fce52f71412e47 3879e4e1a8db13d96e7a1589157cc2f2 0683daab3cf9575e4297295fc6afcb0c 29f8dea05c9fe1144f856ca98bf82f6b 71f0e8fab1b7fc6c10babe22da0984a9 8d0e06cb121eaa3257fa8e2d6cb8b259 b17ae6f4dd5f4e9a95389bf54e33d954 54b30a5438eb02a422ea871697010ef9 5b7603f921774870140aceb6c7a1d625 f9393826f88905c278df91eefdf347a6 7c6cff4ab19c527c6060876b574ac33c c5a3c608b232069acf4417a27e0e72e4 0f6c81e7c4549b99e3c63289da9f9ab0 7439af17412e9ba71f4769a1719c4288 36a80c382c457eaac85aff8ab1f5a9cd Net-Worm.Win32.Padobot.n 7f60162c2c0bd2cc7531e51328e98290 Backdoor.Win32.Rbot.aftu

Trojan.Win32.Buzus.cvzu

Backdoor.Win32.Rbot.adqd

Backdoor.Win32.Rbot.sr

Virus.Win32.Virut.n Backdoor.Win32.Rbot.gen

Net-Worm.Win32.Kolab.jmq Net-Worm.Win32.Kolab.jrm

2fa0e36b36382b74e6e6a437ad664a80 1d419d615dbe5a238bbaa569b3829a23 52c022dc2c91252c10983dba123b02d7 3228f8bc721572422c268f244476dbb8 14a09a48ad23fe0ea5a180bee8cb750a 93094c5ea5a47e5c5f3e020f2c434c35 e269d0462eb2b0b70d5e64dcd7c676cd df51e3310ef609e908a6b487a28ac068 bb39f29fad85db12d9cf7195da0e1bfe 1085f60dabfe6df63ec98ae3ad2860d0 0368ff583a3f118a16a72fd6c53e6508 b2aaf21fed508877fea92f52df50148d 5423b2c0a73c74872956d07ec682f8f8 83138759dd8c5d5b99d41d9f6c86ea8b 1af49e0cf3bf715d9055930a63d53566 b3138b807d340e5edaf68e732ceb6c13 9748a48ea0a327c63600f6d739bf147a 4f8dc63adfe71098316be0dac64dd8de 3226769c82cbbc24c230b1fe4fb1e0db 9b2c7d1c22c849265fc6cee80331fef8 f04bb801ec4558e37dee64bc46870977 ee7c21b5c7e9243abb5d971694e1cda5 b55df243e2dbacb561994166698214e4 ae77cdc159c3fa5d1afb7993d78da564


IRC Server *.*.182.255 *.*.16.30 *.*.16.30 *.*.16.30 *.*.16.30 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.249.167 *.*.249.167 *.*.164.4 *.*.164.4 *.*.236.50 *.*.20.90 *.*.57.7 *.*.109.119 *.*.57.7 *.*.35.224 *.*.57.7 *.*.57.7 *.*.57.7 *.*.35.224 *.*.109.119 *.*.232.36 *.*.232.36 *.*.13.154 *.*.16.30 *.*.16.30 *.*.16.30 *.*.249.167 *.*.6.86 *.*.249.167 *.*.249.167 *.*.249.167 *.*.249.167 *.*.164.4 *.*.249.167 *.*.249.167

Net-Worm.Win32.Kolab.jkd Trojan-Dropper.Win32.Agent.dhct Net-Worm.Win32.Kolab.lon Net-Worm.Win32.Kolab.swt Net-Worm.Win32.Kolab.tcl Virus.Win32.Virut.at

Net-Worm.Win32.Kolab.tec Net-Worm.Win32.Kolab.tmg

Net-Worm.Win32.Kolab.jrk Net-Worm.Win32.Kolab.jwx Net-Worm.Win32.Kolab.jlb Virus.Win32.Virut.as

Virus.Win32.Virut.w Virus.Win32.Sality.ag Virus.Win32.Virut.ar

83332be7905efc51a9beef2610c66b56 f176a160ef615d6a7b2fd43bd4394107

*.*.249.167

2ed65d8c8e0bc98e54da5d7f10d5ebed 6d1f2e8419280d026828db77db958e9a fb6d8a35816738a06242afee75ba597d fb486908b086c67488dab1deb871f706 df565c384834c033ab829702ac174adf 3bda8140060a9a1b4938158ded7c2e99 7debc1f6deade586a4ff4a3624e6fcef 893f58f812f30c2e219ac5116104bc7b 3597d81d22898024b9b99214dda7abe9 9e32df8465aa69888fb190d933170daf 0b04862cff663c36cb7e70e4366bca6f 319c2141b0e183edfbd9e2de4a159c9f 451946183311550a1d75ecd573a2eefb e61162721314b69ece550bfb0a85cd7b 386d138b865da4e510f6b7b06bf13221 017568f08ac420b48d7c04b5036a6d30 2727fcf78f8ad1cff3082e8cef8e2e25 46e42f17eb9750951684db810d8b2e10 6c4029a126eadb39c295583ac326743d 4d5e61573f03d47b91487e9a0c57b3e5 af9e9c452c0cc04a057b40bd8dbfcb91 dc141d7c97ad5f58dc8a13ff10e2f5d3 5e7355177a7b7d4188dcaafbcc7d4aff c321d880a6aeb4879d70dde7e9fa56f8

*.*.249.167 *.*.249.167 *.*.6.86 *.*.6.86 *.*.16.30 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.6.86 *.*.6.86 *.*.6.86 *.*.6.86 *.*.249.167 *.*.249.167 *.*.249.167 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.164.4 *.*.249.167 *.*.164.4 *.*.249.167 *.*.249.167

We also noticed that some IRC servers are involved in IRC traffic with many different botnets. The following table shows the relationship between each IRC server IP and the botnet names.

Table 12 IRC servers and the botnets that connect to them

IRC Server IP      IRC Botnet Name
*.68.16.30         Virus.Win32.Virut.av, Backdoor.Win32.Rbot.adqd, Virus.Win32.Virut.at
*.14.236.50        Net-Worm.Win32.Padobot.n
*.82.57.7          Backdoor.Win32.Rbot.aftu
*.43.232.36        Trojan.Win32.Buzus.cvzu
*.104.35.224       Backdoor.Win32.Rbot.aftu
*.45.13.154        Trojan.Win32.Buzus.cvzu
*.109.20.90        Net-Worm.Win32.Padobot.n
*.125.182.255      Virus.Win32.Virut.av
*.202.109.119      Backdoor.Win32.Rbot.aftu
*.107.249.167      Backdoor.Win32.Rbot.sr, Virus.Win32.Virut.n


*.166.6.86

*.39.164.4

Backdoor.Win32.Rbot.gen Net-Worm.Win32.Kolab.jmq Net-Worm.Win32.Kolab.jrm Net-Worm.Win32.Kolab.jkd Trojan-Dropper.Win32.Agent.dhct Net-Worm.Win32.Kolab.lon Net-Worm.Win32.Kolab.jrk Net-Worm.Win32.Kolab.jwx Net-Worm.Win32.Kolab.jlb Virus.Win32.Sality.ag Virus.Win32.Virut.ar Virus.Win32.Virut.av Virus.Win32.Virut.as Net-Worm.Win32.Kolab.swt Net-Worm.Win32.Kolab.tcl Net-Worm.Win32.Kolab.tec Net-Worm.Win32.Kolab.tmg Backdoor.Win32.Rbot.sr Virus.Win32.Virut.as Backdoor.Win32.Rbot.gen Virus.Win32.Virut.at Virus.Win32.Virut.av Virus.Win32.Virut.w
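The relationship summarized in Table 12 is a grouping of (IRC server, botnet name) pairs by server. The sketch below shows one way such a grouping could be computed from per-sample records. The record layout and function names are assumptions for illustration, and the sample pairs are a small subset taken from the tables above rather than the full data set.

```python
from collections import defaultdict


def botnets_by_server(records):
    """Group (server_ip, botnet_name) pairs into {server_ip: sorted unique names}."""
    grouped = defaultdict(set)
    for server_ip, botnet_name in records:
        grouped[server_ip].add(botnet_name)
    return {ip: sorted(names) for ip, names in grouped.items()}


def shared_servers(grouped):
    """Return the servers contacted by more than one distinct botnet name."""
    return sorted(ip for ip, names in grouped.items() if len(names) > 1)


if __name__ == "__main__":
    # Illustrative subset of (IRC server IP, Kaspersky name) pairs.
    sample_records = [
        ("*.82.57.7", "Backdoor.Win32.Rbot.aftu"),
        ("*.68.16.30", "Virus.Win32.Virut.av"),
        ("*.68.16.30", "Backdoor.Win32.Rbot.adqd"),
        ("*.68.16.30", "Virus.Win32.Virut.av"),  # duplicate sample, counted once
    ]
    grouped = botnets_by_server(sample_records)
    print(grouped)
    print(shared_servers(grouped))
```

Servers returned by shared_servers are the ones of most interest here, since a single controller address serving several malware families is what the table is meant to expose.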

5.3 HTTP Activity

The following table lists the HTTP server IPs together with the collected malware samples that showed HTTP network activity with the corresponding IP address.

Table 13 HTTP server IPs and the botnets that connect to them

HTTP Server IP Botnet *.68.16.30

Virus.Win32.Virut.av Backdoor.Win32.Rbot.adqd Virus.Win32.Virut.at Virus.Win32.Virut.as Virus.Win32.Virut.ao Virus.Win32.Virut.w Virus.Win32.Virut.ar

*.155.0.224

Net-Worm.Win32.Padobot.h Virus.Win32.Virut.av Net-Worm.Win32.Padobot.m Net-Worm.Win32.Padobot.af Net-Worm.Win32.Padobot.p Virus.Win32.Virut.at Trojan.Win32.Genome.mpuu

*.170.127.203

Virus.Win32.Virut.a


Virus.Win32.Virut.n *.217.162.178

Virus.Win32.Virut.a Virus.Win32.Virut.n

*.111.73.201

Virus.Win32.Virut.av

*.103.113.46

Net-Worm.Win32.Kolab.swt

*.162.40.26 *.54.35.49 *.3.42.17 *.156.129.6 *.207.48.113 *.152.59.223 *.50.147.212 *.255.177.236 *.252.107.197 *.48.51.104 *.198.141.107 *.94.206.40 *.95.156.43 *.196.44.254 *.239.80.155 *.87.16.221 *.34.182.208 *.82.105.201 *.80.82.145 *.29.89.241 *.181.153.175 *.232.72.26 *.129.160.143 *.27.248.132 *.229.26.43 *.23.179.93 *.123.44.249 *.125.117.71

Net-Worm.Win32.Kolab.tcl

*.227.211.156 *.124.43.18 *.171.199.210 *.18.61.224 *.67.7.145 *.220.95.134 *.68.213.147 *.13.224.1 *.116.86.14


5.4 IRC & HTTP together

Example 1: IRC traffic with HTTP traffic

Malware md5: 1af49e0cf3bf715d9055930a63d53566
Malware Date: 2010.08.16
Malware Size: 57344 byte
Malware File Type: /var/www/nepenthes/binaries/1af49e0cf3bf715d9055930a63d53566: PE32 executable for MS Windows (GUI) Intel 80386 32-bit byte

Source
source_host     date
*.198.192.9     2010-08-16

Anti virus
Kaspersky: Backdoor.Win32.Rbot.adqd

Behavior Analysis Result

DNS

ANUBIS DNS Activity
dns: proxim.ntkrnlpa.info, wpad
ip: *.68.16.30

JOEBOX DNS Activity
dns: proxim.ntkrnlpa.info, proxim.ntkrnlpa.info
ip: *.68.16.30

IRC

ANUBIS IRC Activity
dst_ip       dst_port  nick      server_pass  src_port  user     joined_channel  password
*.68.16.30   80        tmniztkq  (none)       1038      e020501

JOEBOX IRC Activity
dst_ip: *.68.16.30, *.68.16.30, *.168.111.6
data: Nickname:scsrfeyp, User:j, Joinschannel::&virtu3

ICMP

ANUBIS ICMP Activity
destinations: 218.10.0.0/16

JOEBOX ICMP Activity
destinations: 199.243.0.0/16

HTTP

ANUBIS HTTP Activity
dst_ip: *.68.16.30

JOEBOX HTTP Activity
dst_ip         host    data
*.168.111.6            HTTP/1.1 200 OK

TCP Scan

ANUBIS TCP Scan Activity
subnet           remote_port
218.10.0.0/16    445
218.10.0.0/16    139
218.10.0.0/16    445
218.10.0.0/16    139

Example 2: (First time collected on 2011.04.18)

Malware md5: f04bb801ec4558e37dee64bc46870977
Malware Date: 2011.04.18
Malware Size: 107520 byte
Malware File Type: /var/www/nepenthes/binaries/f04bb801ec4558e37dee64bc46870977: PE32 executable for MS Windows (GUI) Intel 80386 32-bit byte

Source
source_host: 134 different sources
date: 2011-04-13, 2011-04-16, 2011-04-18

Anti virus
Kaspersky: Backdoor.Win32.Rbot.gen

Behavior Analysis Result

DNS

ANUBIS DNS Activity
dns                  ip
comdns.aswend.com    *.107.249.167

IRC

ANUBIS IRC Activity
dst_ip          dst_port  nick            server_pass  src_port  user            joined_channel  password
*.107.249.167   7000      SL773995095309  (none)       1028      fnfjiuvsldhqym  #GL             .x.
*.107.249.167   7000      SL773995095309  (none)       1028      fnfjiuvsldhqym

HTTP

ANUBIS HTTP Activity
dst_ip
*.160.218.153
*.107.202.193
*.54.237.103
*.154.52.133
*.203.180.0
*.147.168.55
*.250.30.69
*.44.183.247
*.42.159.191
*.196.246.181


5.5 Effectiveness of multiple behavior analyses at different times

Accurate analysis results come from analyzing the collected malware samples several times at different times. Botnets are not always active, and a bot may at times show only activity that is no different from that of any other virus or worm. To identify a sample as a bot by behavior analysis, we need to observe its botnet activity. Since it is not possible to let the collected malware run forever in a sandbox, it is necessary to run each sample several times in different periods. The following example shows the effectiveness of this approach.

The malware sample with md5 6d1f2e8419280d026828db77db958e9a was first caught on 2011-02-03 and 2011-02-04 from three different sources, and was then submitted in the period from 2011-03-15 to 2011-03-15 from 268 different sources. The network activity in the behavior analysis report for this malware on 2011-02-21 included the following:

- DNS Queries:
Name              Query Type    Query Result   Successful   Protocol
dns.aswend.com    DNS_TYPE_A    *.166.6.86     YES          udp

- Opened Listening Ports:
Port   Type
80     tcp
21     tcp

- IRC Conversations:
From ANUBIS:1038 to *.166.6.86:7000
Nick: SL982113567975
Username: gmfduccvablxn
Joined Channel: #GL with Password .x.
Channel Topic for Channel #GL: ".advscan asn1http 100 5 0 65.x.x.x -s"
Private Message to Channel #GL: "-^C4scan^C- Already 404 scanning threads. Too many specified."

- TCP Connection Attempts:
from ANUBIS:1044 to *.103.113.46:80
from ANUBIS:1039 to *.162.40.26:80
from ANUBIS:1040 to *.54.35.49:80
from ANUBIS:1041 to *.3.42.17:80
from ANUBIS:1042 to *.156.129.6:80
from ANUBIS:1043 to *.207.48.113:80

This malware was submitted twice on 2011-04-14. The network activity in those analysis reports included only a DNS query for the same domain name as in the previous analysis, with a different reply, and no other network activity.
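Because a single run may miss the botnet activity, the reports from repeated analyses of one sample are most useful when combined. The sketch below merges the network indicators from several runs and records on which dates IRC activity was seen. The report layout (a dictionary with date, dns, irc and tcp keys) is an assumption made for illustration and is not the database schema used in this work.

```python
def summarize_runs(runs):
    """Merge per-run network indicators and note which runs showed IRC activity.

    Each run is a dict with a "date" (ISO format string) and optional
    "dns", "irc" and "tcp" lists of observed indicators.
    """
    summary = {"dates": [], "dns": set(), "irc": set(), "tcp": set(), "irc_seen_on": []}
    # ISO dates sort correctly as plain strings.
    for run in sorted(runs, key=lambda r: r["date"]):
        summary["dates"].append(run["date"])
        for key in ("dns", "irc", "tcp"):
            summary[key].update(run.get(key, []))
        if run.get("irc"):
            summary["irc_seen_on"].append(run["date"])
    return summary


if __name__ == "__main__":
    runs = [
        {"date": "2011-04-14", "dns": ["dns.aswend.com"]},
        {
            "date": "2011-02-21",
            "dns": ["dns.aswend.com"],
            "irc": ["*.166.6.86:7000"],
            "tcp": ["*.103.113.46:80"],
        },
    ]
    result = summarize_runs(runs)
    print(result["dates"], result["irc_seen_on"])
```

In this example the sample would be missed as a bot if only the later run were considered, which is the situation described above.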

5.6 Summary

This chapter presented the most important results that indicate the existence of botnets among the collected malware samples. It gave a comprehensive overview of the overall analysis results, together with detailed information from selected analysis reports.


Chapter 6 Conclusion

In this thesis we have presented our work on detecting the existence of botnets by designing and implementing a distributed low-interaction honeypot. The work showed the importance of collecting and analyzing malware as part of the detection procedure, and how behavior analysis reveals information that cannot be obtained by scanning the malware files with antivirus software alone.

One lesson learned from this work is that network researchers around the world need to cooperate and integrate the results achieved in the fight against botnets in order to stop them. In practice, botnet detection depends on suspecting malware that shows activity similar to botnet activity. The information obtained from the proposed system provides clues for understanding the botnets and for taking them away from their controllers. The data stored in the database will also help in studying the propagation methods and targets of the analyzed malware.

The advantages of this work include a distributed infrastructure that allows different types of low-interaction honeypots to be added anywhere; automatic antivirus scanning and behavior analysis of the collected malware; storage of the analysis report data in a database, which supports future studies of botnets; and the ability to identify the controller IP of the detected botnets. For malware collection, other low-interaction honeypots such as Dionaea (which is expected to be the successor of nepenthes) can be used with the implemented distributed structure.

The disadvantages include the dependence on free online sandboxes, which may sometimes be unavailable or may discontinue their free service, as Joe Sandbox did. In addition, a free sandbox runs the malware for only a short period, which may not be enough to give accurate results. For these reasons, it is important for future malware research to build its own sandbox, configured according to the demands of the desired analysis.


Appendix I: The most active malware detailed description

1. Md5: 6d1f2e8419280d026828db77db958e9a

Malware Size: 329728 byte
Ikarus Virus Scanner: Backdoor.Win32.Rbot
Kaspersky: Net-Worm.Win32.Kolab.swt

Joebox analysis 2011-02-06
The analysis report contains only a DNS query for the domain name (dns.aswend.com), without any answers.

Anubis analysis 2011-03-19

- DNS Queries:
Name              Query Type    Query Result   Successful   Protocol
dns.aswend.com    DNS_TYPE_A    *.166.6.86     YES          udp

- Opened Listening Ports:
Port   Type
80     tcp
21     tcp

- IRC Conversations:
From ANUBIS:1038 to *.166.6.86:7000
Nick: SL982113567975
Username: gmfduccvablxn
Joined Channel: #GL with Password .x.
Channel Topic for Channel #GL: ".advscan asn1http 100 5 0 65.x.x.x -s"
Private Message to Channel #GL: "-^C4scan^C- Already 404 scanning threads. Too many specified."

- TCP Connection Attempts:
from ANUBIS:1044 to *.103.113.46:80
from ANUBIS:1039 to *.162.40.26:80
from ANUBIS:1040 to *.54.35.49:80
…..

2011-04-14 10:55:27 Network Activity

- DNS Queries:
Name              Query Type    Query Result    Successful   Protocol
dns.aswend.com    DNS_TYPE_A    *.16.165.227    YES          udp

- TCP Connection Attempts:
from ANUBIS:1028 to *.16.165.227:7000
from ANUBIS:1029 to *.16.165.227:7000
from ANUBIS:1030 to *.16.165.227:7000
from ANUBIS:1031 to *.16.165.227:7000

2011-04-14 10:56:28 Network Activity

- DNS Queries:
Name              Query Type    Query Result    Successful   Protocol
dns.aswend.com    DNS_TYPE_A    *.16.165.227    YES          udp

- TCP Connection Attempts:
from ANUBIS:1028 to *.16.165.227:7000


2. MD5: 7d99b0e9108065ad5700a899a1fe3441

Malware md5: 7d99b0e9108065ad5700a899a1fe3441
Malware Date: 2010/2/27, 2010/11/15
Number of submissions: 4992
Malware Size: 9353 byte
Malware File Type: /var/www/nepenthes/binaries/7d99b0e9108065ad5700a899a1fe3441: PE32 executable for MS Windows (GUI) Intel 80386 32-bit byte

Source IP
source_host        date
*.70.100.248       2010-02-27
*.132.72.105       2010-03-09
*.233.226.24       2010-05-22
*.82.79.254        2010-06-22
*.19.92.133        2010-07-19
*.197.192.62       2010-07-30
*.177.153.25       2010-07-30
*.177.153.25       2010-07-31
*.254.160.165      2010-10-12
*.30.222.74        2010-10-21
*.30.222.74        2010-10-22
*.22.2.70          2010-11-14
*.227.55.113       2010-11-15

Anti virus
Kaspersky: Net-Worm.Win32.Padobot.m
clamav: Worm.Padobot.M
avast: Win32:Padobot-Y
avg: Worm/Korgo.A

JoeBox: 1:2010-12-02
DNS: go.microsoft.com

Anubis 2010-06-04

- DNS Queries:
Name            Query Type    Query Result   Successful   Protocol
citi-bank.ru    DNS_TYPE_A    *.155.0.224    YES          udp
wpad            DNS_TYPE_A                   NO

- Opened Listening Ports:
Port   Type
5767   tcp

- HTTP Conversations:
From ANUBIS:1044 to *.155.0.224:80 - [*.155.0.224]
Request: GET /index.php?id=akajywskgfovduwptu&scn=0&inf=0&ver=19&cnt=AUT
Response:

- TCP Scans: 16 IPs on Port 445

- Unknown TCP Traffic:
from ANUBIS:1058 to *.197.238.11:445 State: Connection established, not terminated - Transferred outbound Bytes: 0 - Transferred inbound Bytes: 0
from ANUBIS:1043 to *.168.0.1:445 State: Connection established, not terminated - Transferred outbound Bytes: 137 - Transferred inbound Bytes: 0
from ANUBIS:1046 to *.164.135.114:445 State: Connection established, not terminated - Transferred outbound Bytes: 137 - Transferred inbound Bytes: 0
from ANUBIS:1039 to *.168.41.120:445 State: Connection established, not terminated - Transferred outbound Bytes: 137 - Transferred inbound Bytes: 0
from ANUBIS:1041 to *.168.12.155:445 State: Connection established, not terminated - Transferred outbound Bytes: 137 - Transferred inbound Bytes: 0
from ANUBIS:1059 to *.195.80.96:445 State: Connection established, not terminated - Transferred outbound Bytes: 0 - Transferred inbound Bytes: 0
from ANUBIS:1040 to *.168.54.141:445 State: Connection established, not terminated - Transferred outbound Bytes: 137 - Transferred inbound Bytes: 0
from ANUBIS:1049 to *.85.254.186:445 State: Connection established, not terminated - Transferred outbound Bytes: 137 - Transferred inbound Bytes: 0

- TCP Connection Attempts:
from ANUBIS:1041 to *.168.12.155:445
from ANUBIS:1042 to *.168.74.52:445
from ANUBIS:1043 to *.168.0.1:445
from ANUBIS:1045 to *.97.193.240:445
from ANUBIS:1047 to *.203.68.238:445
from ANUBIS:1051 to *.197.241.213:445
from ANUBIS:1050 to *.130.184.76:445
from ANUBIS:1053 to *.116.166.5:445
from ANUBIS:1052 to *.243.61.183:445
from ANUBIS:1056 to *.78.74.16:445

ANUBIS 2010-11-23

- DNS Queries:
Name            Query Type    Query Result   Successful   Protocol
citi-bank.ru    DNS_TYPE_A    *.155.0.224    YES          udp
wpad            DNS_TYPE_A                   NO

- Opened Listening Ports:
Port   Type
7373   tcp

- HTTP Conversations:
From ANUBIS:1038 to *.155.0.224:80 - [*.155.0.224]
Request: GET /index.php?id=eclqbuclejuifhu&scn=0&inf=0&ver=19&cnt=AUT
Response:

- TCP Scans: 133 IPs on Port 445



Acknowledgements

First, I wish to express my gratitude to the Chinese Scholarship Council and Southeast University, especially the College of Computer Science and Engineering and the College of International Students, for offering me the opportunity to study at Southeast University.

I also want to thank everyone in the Computer Science Department for their advice, which was very helpful. Foremost, I offer my sincere gratitude to my supervisor, Prof. Gong Jian, for his patience and support, and for his advice, auspices, and encouragement. I would also like to express my gratitude to all my classmates for their constant support, especially YangWang, WuHua, LiuShangDong, ZangJiaNing, LaiXiaoTing, ZhuHaiting, zhangWeiWei, and others.

I would like to thank my parents, brothers and sisters, asking ALLAH to bless them for their support, their Duaa, and their prayers. Many thanks to my wife and my son, who gave me all their affection.

Thanks to all Chinese people for their modesty, courtesy, and helpfulness.

Finally, I would like to thank all the friends (brothers) I met and lived with in China as one family, supporting each other.

Ahmad Jakalan Wednesday, May 25, 2011


About the author

Ahmad Jakalan Nationality: Syria

Languages Arabic (maternal), English, Chinese.

Papers Published

Ahmad Jakalan, Gong Jian, Liu Shangdong: Distributed low-interaction honeypot system to detect botnets, 2011 International Conference on Internet and Web Engineering (ICIWE 2011), Kuala Lumpur, Malaysia

Ahmad Jakalan: Learning Management System, International symposium of Informatics & E-Learning, October 2009, Syria

Education
2008 – 2011 Master in Computer Networks (China, Southeast University)
2000 - 2005 Bachelor in Informatics Engineering (Syria, University of Aleppo)

Membership
SCS: Syrian Computer Society
IACSIT: International Association of Computer Science and Information Technology

Communication
China – Nanjing (210096), Southeast University, Cheng yuan, dorm 404, Mobile: 0086-15366165651, Fax: 0086-25-83694035, Email: [email protected], [email protected]