Automatic Discovery of Botnet Communities on Large-Scale Communication Networks Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani University of New Brunswick Fredericton, NB E3B 5A3, Canada
{wlu,m.tavallaee,ghorbani}@unb.ca
ABSTRACT
Botnets are networks of compromised computers infected with malicious code that can be controlled remotely through a common command and control (C&C) channel. Recognized as one of the most serious security threats to the current Internet infrastructure, advanced botnets hide not only in well-known network applications (e.g. IRC, HTTP, or Peer-to-Peer) but also in unknown or novel (creative) applications, which makes botnet detection a challenging problem. Most current attempts to detect botnets examine traffic content for bot signatures on selected network links or set up honeypots. In this paper, we propose a new hierarchical framework to automatically discover botnets on a large-scale WiFi ISP network. We first classify the network traffic into different application communities by using payload signatures and a novel cross-association clustering algorithm; then, on each obtained application community, we analyze the temporal-frequent characteristics of flows, which differentiate malicious channels created by bots from normal traffic generated by human beings. We evaluate our approach with about 100 million flows collected over three consecutive days on a large-scale WiFi ISP network. The results show that the proposed approach successfully detects two types of botnet application flows (i.e. BlackEnergy HTTP bot and Kaiten IRC bot) with a high detection rate and an acceptably low false alarm rate.
Categories and Subject Descriptors C.2.0 [Computer-Communication Network]: Security and Protection;
1. INTRODUCTION
The Internet has witnessed the growth of botnets in recent years. A recent Symantec report shows that botnets have become the biggest security threat to the current cyberworld, conducting a large volume of malicious activities such as distributed denial-of-service (DDoS) attacks, spamming, phishing, keylogging, click fraud, identity theft and information exfiltration [1]. Botnets can be centralized, distributed (P2P) or randomized, according to their command and control (C&C) models and communication protocols (e.g. HTTP, IRC, P2P or other creative communication protocols).
In Figure 1, we illustrate a typical lifecycle of an IRC botnet and its attacking behavior. The botmaster first finds a new bot by exploiting its vulnerabilities remotely. Once infected, the bot downloads and installs the binary code by itself. After that, each bot on the botnet attempts to find the IRC server address by a DNS query, which is illustrated in Step 3 of Figure 1. Next is the communication step between the bots and the IRC server. In the IRC based communication mechanism, a bot first sends a PASS message to the IRC server to start a session, and the server then authenticates the bot by checking its password. In many cases, the botmaster also needs to authenticate itself to the IRC server. Upon completion of these authentications, the command and control channels among the botmaster, the bots and the IRC server are established. To start a DDoS attack, the botmaster only needs to send a simple command like ".ddos.start victim_ip". On receiving this command, all bots start to attack the victim server, as shown in Step 8 of Figure 1. More information about the botmaster command library can be found in [2].
General Terms Security
Keywords Botnet detection, traffic classification, machine learning
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASIACCS'09, March 10-12, 2009, Sydney, NSW, Australia. Copyright 2009 ACM 978-1-60558-394-5/09/03…$5.00.
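Figure 1. Typical life-cycle of an IRC based botnet and its attacking behaviors (entities: botmaster, vulnerable host, botnet, DNS server, IRC server, victim server; steps: 1. exploit, 2. bot download, 3. DNS query, 4. join, 5. pass authentication, 6. pass, 7. command, 8. DDoS)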
In reality, however, detecting and blocking such an IRC botnet is not a difficult task, since the whole botnet can be taken down by blacklisting the IRC server. Network administrators who want to prevent computers on their internal network from becoming part of a botnet can simply block outbound IRC connections. Thus, more and more botnets are evolving from centralized communication to an advanced distributed strategy, such as the early Sinit [3], Phatbot with the WASTE command [4], Nugache [5] and the recent Peacomm (Storm worm) [6]. Compared to the traditional centralized C&C model, a distributed (Peer-to-Peer) C&C based botnet is much harder to detect and destroy because the bots' communication does not depend heavily on a few selected servers, and thus shutting down a single bot, or even a couple of bots, does not necessarily destroy the whole botnet.
Previous attempts to detect botnets are mainly based on honeypots [8,9,10,11,12,13], passive anomaly analysis [18,19,20,21] and traffic application classification [14,15,16,17]. Setting up honeypots on the Internet is very helpful for capturing malware and understanding the basic behavior of botnets. Passive anomaly analysis of network traffic is usually independent of the traffic content and has the potential to find different types of botnets (e.g. HTTP based, IRC based or P2P based botnets). Botnet detection based on traffic classification focuses on classifying traffic into IRC traffic and non-IRC traffic, and thus can only detect IRC based botnets, which is its biggest limitation compared to anomaly based botnet detection.
Although existing botnet detection mechanisms have generated a number of good ideas, they are far from complete due to the evolution of botnet strategies. Specifically, the status quo of botnet detection raises two major challenges: (1) How to detect new (or recently appeared) botnets?
A conventional botnet usually has a centralized C&C structure that exploits network protocols like IRC or HTTP. Almost all current approaches are proposed for detecting well-known IRC and/or HTTP based botnets by modeling bot binaries or botnet signatures. These approaches, however, might be completely useless against new (or recently appeared) botnets whose structures have moved from centralized to decentralized (peer-to-peer) and whose C&C channels have evolved from IRC or HTTP to custom protocols built on the TCP/IP stack (e.g. turning a social network into a botnet [7]); (2) How to identify applications for network traffic? Classifying network traffic by application is very challenging and is still an open issue. Traffic classification in the existing IRC or HTTP based botnet detection approaches relies to a large extent on transport layer port numbers. Although traffic identification using port numbers was effective in the early days of the Internet, it provides very limited information nowadays. An alternative is to examine the payload of network flows and create signatures for each application. This, however, has two major limitations: one is the legal issues related to privacy, and the other is that encrypted traffic cannot be identified. By observing traffic on a large-scale WiFi ISP network over a half-year period, we found that even with payload signature examination, about 40% of network flows still cannot be classified into specific applications. Investigating such a large volume of unknown traffic is unavoidable in botnet detection, since it might stand for missed known botnet traffic, malicious
activities or new botnet traffic based on novel (creative) applications.
Addressing the above two challenges, we propose a hierarchical framework, illustrated in Figure 2, for next generation botnet detection. It consists of two levels: (1) at the higher level, all unknown network traffic is labeled and classified into different network application communities, such as the P2P community, HTTP Web community, Chat community, DataTransfer community, Online Games community, Mail Communication community, Multimedia (streaming and VoIP) community and Remote Access community (i.e. Steps 1 and 2 of Figure 2); (2) at the lower level, focusing on each application community, we investigate and apply the temporal-frequent characteristics of network flows to differentiate malicious botnet behavior from normal application traffic (Step 3 of Figure 2). The major contributions of this paper are: (1) we propose a novel application discovery approach for automatically classifying network applications on a large-scale WiFi ISP network, and (2) we develop a generic algorithm to discriminate general botnet behavior from normal network traffic on a specific application community, based on the n-gram (frequent characteristics) of flow payload over a time period (temporal characteristics).
Figure 2. The proposed hierarchical framework for automatic botnets discovery (input network flows; Step 1: payload signature based application classifier; unknown flows; Step 2: cross-association based application classifier; network application communities such as P2P, IRC and Web; Step 3: separation of human flows from bot flows within a community, e.g. human IRC flows and bot IRC flows)
The rest of the paper is organized as follows. Section 2 introduces related work, in which we summarize existing botnet detection approaches in three categories. Section 3 presents our application classification approach for network flows. Section 4 describes the botnet detection algorithm based on the temporal-frequent characteristics of botnets. Section 5 presents the experimental evaluation of our detection model with a mixture of around 100 million flows collected on a large-scale WiFi ISP network and two types of botnet traffic traces (i.e. Kaiten IRC bot [41] and BlackEnergy HTTP bot [42]) collected on our testbed network. Finally, in Section 6 we make some concluding remarks and discuss future work.
2. RELATED WORK
Early research on botnet analysis is based on existing public botnet codebases. A typical example is the work by Barford and Yegneswaran [2], who analyzed botnet behavior in terms of exploits, botnet control mechanisms, host control mechanisms, propagation mechanisms, delivery mechanisms, and obfuscation and deception mechanisms, based on four public IRC botnet codebases. To gain a full understanding of botnet behavior, honeypots are widely set up on the Internet to capture malware, and bots are then collected, tracked and analyzed. Typical works on honeypot based botnet detection are [8,9,10,11,12,13].
Besides honeypot based botnet detection, two other categories of botnet detection approaches have been proposed recently, namely traffic classification based and passive anomaly analysis based detection. Typical works on traffic application classification based botnet detection include [14,15,16,17]. In [14,15], Strayer et al. propose an approach for detecting botnets by examining flow characteristics such as bandwidth, duration, and packet timing in order to look for evidence of botnet command and control activity. Their architecture first eliminates traffic that is unlikely to be part of a botnet, then classifies the remaining traffic into a group that is likely to be part of a botnet, and finally correlates the likely traffic to find common communication patterns that would suggest the activity of a botnet. In [16], Livadas et al. apply machine learning techniques to identify the command and control (C&C) traffic of IRC-based botnets. They suggest a two-step detection process: (1) distinguishing between IRC and non-IRC traffic, and (2) distinguishing between botnet IRC traffic and real IRC traffic.
In [17], Goebel and Holz develop a signature based IRC botnet detection system, Rishi, which monitors only IRC application traffic and matches predefined bot nickname patterns.
Typical approaches to anomaly based botnet detection are discussed in [18,19,20,21]. In [18], Karasaridis et al. study network flows and detect IRC botnet controllers in four steps, of which the most important is to identify hosts with suspicious behavior and isolate flow records to/from those hosts. In [19], Binkley and Singh first determine an IRC channel and then apply a SYN-scanner detection system to decide which individual host in the IP channel is a scanner. IRC channels are sorted by scanning count, with the top suspect channels labelled as potential botnets. In [20], Gu et al. investigate the spatial-temporal correlation and similarity in network traffic and implement a prototype system, BotSniffer, to detect IRC and HTTP botnets.
All the above botnet detection techniques are limited either to specific C&C protocols (e.g. they can detect IRC botnets only) or to specific botnet structures (e.g. centralized only). In their latest paper, Gu et al. propose a general botnet detection framework, BotMiner, which is entirely independent of the botnet structure and C&C protocol and requires no prior knowledge of botnets [21]. Sharing the same motivation as BotMiner, our hierarchical botnet detection system first addresses automatic network application discovery and then analyzes bot behavior on each obtained application community. This is very different from the detection approach of BotMiner, in which similar communication traffic and malicious traffic are first clustered and then correlated in order to identify hosts sharing both similar communication patterns and similar malicious activity patterns; as a result, these hosts are naturally considered bots based on the essential property of botnets. To the best of our knowledge, the similar communication patterns defined in BotMiner might roughly stand for the same network application (like Web, FTP, Chat, etc.); automatically discovering the exact network applications, however, is necessary in BotMiner.
3. TRAFFIC CLASSIFICATIONS
Classifying network traffic by application is very challenging and is still an open issue. In practice, traffic application classification relies to a large extent on transport layer port numbers, which was effective in the early days of the Internet. Port numbers, however, provide very limited information nowadays due to the increase of applications tunneled through HTTP, the constant emergence of new protocols and the domination of P2P networking [22]. Examining the payload signatures of applications improves the classification accuracy, but a large amount of traffic still cannot be identified because of privacy related issues and encrypted network traffic.
Recent studies on network traffic application classification include "applying machine learning algorithm for clustering and classifying traffic flows" [23,24,25,26,27], "statistical signatures or fingerprint based classification" [28,29,30,31] and "identifying traffic in blind or on the fly" [32,33]. The biggest limitation of current application classification approaches is that they cannot identify all existing network applications, and the application scopes they can identify are very coarse. For example, BLINC attempts to identify general P2P traffic instead of the specific underlying P2P applications like eDonkey, BitTorrent, etc. Moreover, comparing the above methods is difficult due to the lack of sharable datasets and appropriate metrics [43].
Unlike the previous approaches, our method is hybrid, combining payload signatures with a novel cross association clustering algorithm [25]. The payload signatures classify traffic into predefined known application communities. The unknown traffic is then assigned to different application communities with a set of probabilities by using a clustering algorithm. Unknown traffic that cannot be classified into any known application community is considered to belong to new or unknown applications. In the following sections, we first discuss the payload signature based classification approach, and then present the cross association clustering algorithm for classifying the unknown traffic into different known application communities.
3.1 Payload Signature based Classification
The payload signature based classifier investigates the characteristics of bit strings in the packet payload. For most applications, the initial protocol handshake steps are usually different and can thus be used for classification. Moreover, the protocol signatures can be modeled either from public documents such as RFCs or from empirical analysis that derives the distinct bit strings in both TCP and UDP traffic. Each application signature is composed of 10 fields, namely application name, application description, protocol, srcip, srcport, dstip, dstport, commondstport, srccontent and dstcontent. The total number of application signatures is 470. As an example, we illustrate the signatures of 8 typical applications in Table 1. From Table 1, we see that a flow is IRC traffic if the protocol of the flow is TCP and the source content of the flow includes a bit string like "PRIVMSG". The IRC signature also has a commondstport field that defines the most common destination port for IRC traffic.

Table 1. Payload signatures for network applications
application name | description | protocol | common dstport | src content | dst content
BitTorrent | BitTorrent Peer Sync | TCP | 6881 | 0x0000000d060 0 | null
IRC | IRC traffic | TCP | 6667 | PRIVMSG | null
HTTP | HTTP traffic | TCP | 80 | GET | null
IMAP | IMAP traffic | TCP | 143 | LOGIN | * OK
VNC | VNC traffic | TCP | 5900 | RFB | .0
NFS | NFS TCP RPC traffic | TCP | 111 | 0x000186A0 | null
Streaming Audio | Real Time Streaming Protocol | TCP | 554 | null | RTSP
PostgreSQL | postgreSQL remote connection | TCP | 5432 | null | null
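The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Signature` class, the `classify` function, substring matching and first-match-wins ordering are hypothetical choices, not the deployed classifier, and only a handful of the Table 1 rows are encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    """Hypothetical subset of the 10 signature fields described in the text."""
    name: str
    protocol: str
    common_dstport: int           # informational only in this sketch
    src_content: Optional[bytes]  # byte string expected in the source payload
    dst_content: Optional[bytes]  # byte string expected in the destination payload


# A few rows taken from Table 1 (null entries become None).
SIGNATURES = [
    Signature("IRC", "TCP", 6667, b"PRIVMSG", None),
    Signature("HTTP", "TCP", 80, b"GET", None),
    Signature("IMAP", "TCP", 143, b"LOGIN", b"* OK"),
    Signature("Streaming Audio", "TCP", 554, None, b"RTSP"),
]


def classify(protocol: str, src_payload: bytes, dst_payload: bytes) -> str:
    """Return the first matching application name, or "unknown"."""
    for sig in SIGNATURES:
        if sig.protocol != protocol:
            continue
        if sig.src_content is None and sig.dst_content is None:
            continue  # no content to match on; skipped in this sketch
        # Assumption: a signature matches when every non-null content
        # string occurs somewhere in the corresponding payload.
        if sig.src_content is not None and sig.src_content not in src_payload:
            continue
        if sig.dst_content is not None and sig.dst_content not in dst_payload:
            continue
        return sig.name
    return "unknown"
```

Flows for which `classify` returns "unknown" correspond to the traffic that the cross association stage is meant to handle.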
The classifier is deployed on Fred-eZone, a free wireless fidelity (WiFi) network service provider operated by the City of Fredericton [34]. Table 2 lists the general workload dimensions of the Fred-eZone network. From Table 2, we see, for example, that the number of unique source IP addresses (SrcIP) appearing over one day is about 1,055 thousand and the total number of packets is about 944 million. All the flows are bidirectional; we remove all uni-directional flows before applying the classifier. Table 3 lists the classification results over one hour of traffic collected on Fred-eZone. From Table 3, we see that about 249,000 flows can be identified by the application payload signatures and about 215,000 flows cannot be identified. In general, about 40% of flows cannot be classified by the current application payload signature based classification method. Next, we present a fuzzy cross association clustering algorithm to address this issue.

Table 2. Workload of Fred-eZone WiFi network over one day
SrcIPs | DstIPs | Flows | Packets | Bytes
1055K | 1228K | 30783K | 994M | 500G

Table 3. Classification results - one hour traffic on Fred-eZone
Set | Flows | SrcIPs | DstIPs | App.
Obtained Known Applications | 249K | 102K | 202K | 82
Unknown Applications | 215K | 1001K | 1055K | -
3.2 Identifying Unknown Traffic Applications
We propose an automatic application discovery approach based on the cross association of source IPs and destination IPs in the first step, and of destination IPs and destination ports in the second step. The basic idea of applying the cross association algorithm is to study the association relationship between known traffic and unknown traffic. In numerous data mining applications, a large and sparse binary matrix is used to represent the association between two types of objects (corresponding to rows and columns). Cross associations are then defined as a set of rectangular regions with different densities. The clustering goal is to summarize the underlying structure of object associations by decomposing the binary matrix into disjoint row and column groups such that the rectangular intersections of groups are homogeneous, with either high or low density. Previous association clustering algorithms need the number of clusters (i.e. rectangles) to be predefined. This, however, is not realistic for unknown traffic classification because the actual number of applications is unknown. The basis of our unknown traffic classification methodology is a novel cross association clustering algorithm that estimates the number of row and column groups fully automatically [35].
During classification, the traffic, consisting of unknown and known flows, is clustered in terms of the source IP and the destination IP. A set of rectangles is generated after this stage. We define these rectangles as communities, each of which either includes a set of flows or is empty. Then the flows in each community are clustered in terms of destination IP and destination port. Similarly, one community is decomposed into several sub-communities, each of which represents a specific application community. The main purpose of applying a two-stage cross association clustering is to obtain the exact applications underlying a general application category through the association of different features. Figures 3 to 6 illustrate an example of applying our approach to unknown traffic classification. Figure 3 illustrates the original sparse binary matrix for the cross-association of the source IP addresses and the destination IP addresses.
Each point (element) in Figure 3 stands for a flow connection between a specified source IP and a specified destination IP. Figure 4 shows the clustering results after applying the cross-association algorithm: the final partition includes 10 rectangular intersections, of which 6 are non-empty and 4 are empty. Figure 5 shows the original sparse binary matrix for one application community (i.e. a non-empty rectangle in Figure 4), in which the association is described by the destination IP address and the destination port. Figure 6 illustrates the clustering result for that specific community, where 10 rectangular intersections are obtained, of which 6 are non-empty and the rest are empty.
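To make the data structure concrete, the following Python sketch builds the binary association matrix from flow records and measures the density of one rectangular block. It deliberately does not implement the cross association search of [35]; the function names `build_binary_matrix` and `block_density`, and the choice of a dense list-of-lists representation, are assumptions made for illustration only.

```python
def build_binary_matrix(flows):
    """Build a binary association matrix from (row_key, col_key) pairs.

    For the first stage the pairs would be (src_ip, dst_ip); for the
    second stage, (dst_ip, dst_port). Returns (rows, cols, matrix).
    """
    flows = list(flows)
    rows = sorted({r for r, _ in flows})
    cols = sorted({c for _, c in flows})
    row_idx = {r: i for i, r in enumerate(rows)}
    col_idx = {c: j for j, c in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in rows]
    for r, c in flows:
        matrix[row_idx[r]][col_idx[c]] = 1  # at least one flow between r and c
    return rows, cols, matrix


def block_density(matrix, row_group, col_group):
    """Fraction of ones in the rectangle row_group x col_group."""
    cells = len(row_group) * len(col_group)
    if cells == 0:
        return 0.0
    ones = sum(matrix[i][j] for i in row_group for j in col_group)
    return ones / cells
```

A clustering algorithm would search for row and column groups whose blocks have densities close to 0 or 1; `block_density` is the quantity such a search would evaluate for a candidate rectangle.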
Figure 3. Original binary matrix of {src IP, dst IP}
After all flows are classified into different application communities, we have to label each application community. A simple and effective way is to label each community based on its content. In particular, we count the number of flows of each known application in the community and normalize the counts into a set of probabilities ranging from 0 to 1. The unknown flows in each community are then assigned to a specific application according to this set of probabilities. This idea is similar to the membership function in fuzzy clustering algorithms, and the experimental evaluation confirms its accuracy and efficiency. An exception to this labeling method is that if the dominant flow type in the community is the unknown flow, the whole community is labeled as "unknown", which provides the potential to discover new or unknown applications.
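The labeling rule can be sketched as follows. The function name `label_community` is hypothetical, and the sketch assumes that the community label is the most probable known application; the text does not state whether unknown flows are assigned by taking the most probable application or by sampling from the probabilities, so the full probability set is returned as well.

```python
from collections import Counter


def label_community(flow_labels):
    """Label one community from the per-flow application labels.

    flow_labels holds an application name for each known flow and the
    string "unknown" for each unidentified flow. Returns (label, probs),
    where probs maps each known application to its normalized share.
    """
    counts = Counter(flow_labels)
    if not counts:
        return "unknown", {}
    dominant, _ = counts.most_common(1)[0]
    known = {app: n for app, n in counts.items() if app != "unknown"}
    # Exception from the text: a community dominated by unknown flows
    # is labeled "unknown" as a whole.
    if dominant == "unknown" or not known:
        return "unknown", {}
    total = sum(known.values())
    probs = {app: n / total for app, n in known.items()}
    return max(probs, key=probs.get), probs
```

For a community with six IRC flows, two HTTP flows and four unknown flows, this yields the label "IRC" with probabilities 0.75 for IRC and 0.25 for HTTP.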
4. BOTNET DETECTION
Figure 4. Clustered results
Figure 5. Original binary matrix of {dst IP, dst Port}
Figure 6. Clustering results
A general aim of intrusion detection is to find various attack types by modeling signatures of known intrusions (misuse detection) or profiles of normal behavior (anomaly detection). Botnet detection, however, is more specific because it targets a given application domain. The n-gram byte distribution has proven effective for detecting network anomalies. Wang et al. examined the 1-gram byte distribution of the packet payload, represented each packet as a 256-dimensional vector describing the occurrence frequency of each of the 256 ASCII characters in the payload, and then constructed the normal packet profile by calculating the statistical average and deviation of normal packets for a specific application service (e.g. HTTP) [36,37]. An anomaly is flagged once the Mahalanobis distance between the testing data and the normal profile exceeds a predefined threshold. Gu et al. improved this approach and applied it to detecting malware infection in their recent work [38].
Unlike previous n-gram based detection approaches, our method extends the n-gram frequency into the temporal domain and generates a set of 256-dimensional vectors representing the temporal-frequent characteristics of the 256 ASCII bytes in the payload over a predefined time interval. The temporal feature is important in botnet detection because of two empirical observations of botnet behavior: (1) the response of bots is usually immediate and accurate once they receive commands from the botmaster, while a human might perform an action in various ways after a reasonable thinking time, and (2) bots basically have preprogrammed activities based on the botmaster's commands, and thus all bots might be synchronized with each other. These two observations have been confirmed by a preliminary experiment conducted in [39].
After obtaining the n-gram (n = 1 in this case) features for flows over a time window, we apply an agglomerative hierarchical clustering algorithm to cluster the data objects with 256 features. We do not construct normal profiles because normal traffic is sensitive to the practical networking environment, and a high false positive rate might be generated when deploying the trained model in a new environment. In contrast, agglomerative hierarchical clustering is unsupervised and does not define a threshold that needs to be tuned for different cases. In our approach, the final number of clusters is set to 2.
We denote the 256-dimensional n-gram byte distribution as a vector $\langle f_1^{t_i}, f_2^{t_i}, \ldots, f_{256}^{t_i} \rangle$, where $f_j^{t_i}$ stands for the frequency of the $j$-th ASCII character in the payload over a time window $t_i$ ($j = 1, 2, \ldots, 256$ and $i = 0, 1, \ldots$). Given a set of $N$ data objects $F = \{F_i \mid i = 1, 2, \ldots, N\}$, where $F_i = \langle f_1^{t_i}, f_2^{t_i}, \ldots, f_{256}^{t_i} \rangle$, the detection approach is described in Algorithm 1.

Algorithm 1. Implementation of Botnet detection approach
Function BotDel(F) returns botnet cluster
Inputs: collection of data objects $F_i = \langle f_1^{t_i}, f_2^{t_i}, \ldots, f_{256}^{t_i} \rangle$, $i = 1, 2, \ldots, N$
Initialization: initialize the number of clusters $k$ (i.e. $k = N$) by assigning each data instance to a cluster so that each cluster contains only one data instance
Repeat:
    $k \leftarrow k - 1$
    find the closest pair of clusters and merge them into a single cluster
    compute the distances between the new cluster and each of the old clusters
Until: $k = 2$
Calculate the cluster centers $c_m$ and the standard deviations $\sigma_1, \ldots, \sigma_m$, $1 \le m \le k$
If $\sigma_b = \min(\sigma_1, \sigma_2, \ldots, \sigma_m)$ then cluster $b$ is labeled as the botnet cluster
Return the botnet cluster $b$ with $\sigma_b$

In practice, labeling clusters is always a challenging problem when applying unsupervised algorithms to intrusion detection. Previous intrusive cluster labeling methods are based on two assumptions: (1) there are only two clusters, one normal and the other intrusive, and (2) the number of instances in the normal cluster is much bigger than the number of instances in the intrusive cluster [40], and thus the cluster with the smaller number of instances is usually labeled as the intrusive cluster. This does not hold in botnet detection, because the detection is based on specific applications and the botnet traffic is sometimes more overwhelming than the normal traffic in small communities. By observing the normal IRC and HTTP Web traffic over a long period on a large scale WiFi ISP network, the IRC botnet traffic collected on a honeypot, and the IRC/Web botnet traffic collected on our testbed network, we derive a new metric, the standard deviation $\sigma_m$ of each cluster $m$, to differentiate botnet clusters from normal traffic clusters. The higher the average value of $\sigma_m$ over the 256 ASCII characters for the flows in a cluster $m$, the more normal the cluster $m$ is. This is reasonable because in normal traffic, human behavior is more diverse, with various possibilities, than the malicious traffic generated by bots. Given the frequency vectors of $n$ flows:
$\{ \langle f_1^1, f_2^1, \ldots, f_{256}^1 \rangle, \langle f_1^2, f_2^2, \ldots, f_{256}^2 \rangle, \ldots, \langle f_1^n, f_2^n, \ldots, f_{256}^n \rangle \}$
Suppose $\sigma_j$ is the standard deviation of the $j$-th ASCII character over the $n$ flows. The average standard deviation $\sigma$ over the 256 ASCII characters for the flows is then calculated by the following formula:
$\sigma = \frac{1}{256} \sum_{j=1}^{256} \sigma_j$
As an example, Figures 7 and 8 illustrate the average byte frequency over the normal IRC flows and the IRC botnet flows, respectively.
Figure 7. Average byte frequency over 256 ASCIIs for normal IRC flows
Figure 8. Average byte frequency over 256 ASCIIs for botnet IRC flows
The average standard deviation of byte frequency over the 256 ASCII characters for normal IRC traffic is 0.002 and the maximal standard deviation is 0.05, while the average standard deviation for IRC botnet traffic is 0.0009 and its maximum is 0.01, which is much smaller than that of normal IRC traffic. This observation confirms that normal human IRC traffic is more diverse than the malicious IRC traffic generated by bots.
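The detection steps of this section can be sketched in pure Python as follows. This is a minimal illustration under several assumptions that the text does not fix: one payload byte string per flow and time window, Euclidean distance between cluster centroids as the "closest pair" criterion, and the population standard deviation for $\sigma_j$. All function names are hypothetical.

```python
import math
from statistics import pstdev


def byte_frequency(payload: bytes):
    """256-dimensional relative frequency vector of the payload bytes."""
    vec = [0.0] * 256
    if not payload:
        return vec
    for b in payload:
        vec[b] += 1.0
    return [v / len(payload) for v in vec]


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def agglomerate(vectors, k=2):
    """Merge the closest pair of clusters until k clusters remain.

    Returns a list of clusters, each a list of indices into vectors.
    Assumption: closeness is the Euclidean distance between centroids.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def average_std(vectors):
    """Mean over the 256 byte values of the per-byte standard deviation."""
    return sum(pstdev(col) for col in zip(*vectors)) / 256


def detect_botnet_cluster(vectors):
    """Return (sorted flow indices, sigma) of the cluster with minimal sigma."""
    clusters = agglomerate(vectors, k=2)
    sigmas = [average_std([vectors[i] for i in c]) for c in clusters]
    b = min(range(len(clusters)), key=sigmas.__getitem__)
    return sorted(clusters[b]), sigmas[b]
```

The quadratic pair search is recomputed at every merge, which is adequate for a small community but would need a more efficient linkage update for large ones.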
5. EXPERIMENTAL EVALUATION
We implement a prototype system for the approach and evaluate it on a large-scale WiFi ISP network over three consecutive business days (i.e. 24 hours of seamless flow monitoring each day). The botnet traffic consists of two traces: one is collected on a honeypot deployed on a real network and aggregated into 243 flows, and the other is collected on our testbed network and aggregated into 10 Web C&C flows and 44 IRC C&C flows, respectively.
As illustrated in Figure 9, the testbed network is composed of a 48-port Gigabit switch and 60 computers belonging to 6 VLANs (some of the computers are deployed outside the testbed network, such as the code server for malware downloading and the IRC server). We use 3 VLANs during the botnet traffic collection: VLAN2 is the attack network, VLAN3 is the victim network and VLAN5 is the zombie network. Traffic traces are collected on the interface deployed at the gateway of VLAN5, since it accepts the commands from the botmaster in VLAN2 and starts the attack on the victim server in VLAN3. The IRC bot we use is Kaiten [41] and the web based bot we use in the experiment is BlackEnergy, developed by the Russian hacker community [42]. The time interval for flow aggregation is 1 second. When evaluating the prototype system, we randomly insert and replay botnet traffic flows into the normal daily traffic: the 243 IRC C&C bot flows collected on a honeypot are included in the first day, the 45 IRC C&C bot flows collected on the testbed network appear in the second day, and the Web C&C bot flows appear in the third day.
Figure 9. Testbed network topology (components: Internet, code server, IRC server, testbed firewall, SSH gateway with IPTables filtering, management VLAN, VLAN2 attack network with the botmaster, VLAN5 zombie network, victim server, and a gateway for each network)
Since our approach is a two-stage process (i.e. unknown traffic classification first and botnet detection on application communities next), the evaluation is accordingly divided into two parts: (1) performance testing of the unknown traffic classification, in which we focus not only on the capability of our approach to classify unknown IRC and Web traffic but also on the classification accuracy for other unknown applications (e.g. new P2P), since we expect that the algorithm could be extended to detect any newly appeared decentralized botnet; (2) performance evaluation of the system's ability to discriminate malicious IRC (Web) botnet traffic from normal human IRC (Web) traffic.
5.1 Evaluation on Traffic Classification
Evaluating the unknown traffic classification capability is not an easy task in reality, since we have no knowledge of novel or recently appeared applications and the evaluation always needs the intervention of network experts. During our experiment, we randomly choose part of the known traffic and forcibly label it as unknown. The amount of this label-free traffic is decided according to the 40% rule (i.e. the volume of unknown traffic is about 40% over a long observation period on a large-scale network). The final unknown traffic set is composed of the forcibly relabeled known traffic and the botnet flows collected on both the honeypot and the testbed network. Over a five-day evaluation, we found that all the IRC bot C&C flows are accurately classified into the IRC application community (i.e. a 100% classification rate for IRC traffic) and all the Web bot C&C flows are successfully classified into the HTTPWeb application community. However, the general classification accuracy over all applications is about 85%, which is lower than that for the specific IRC and HTTPWeb applications. The general classification accuracy is an average over all application classifications, since the approach has different classification rates for different application communities. As an example, Table 5 lists the classification results over one hour of flows on a real large-scale network to show the performance of our approach when classifying unknown traffic, and Table 4 describes the known application set and the unknown application set over one hour: how many known flows are included in the known dataset, how many known applications the flows belong to, how many unknown flows are included in the unknown set that we want to classify, etc.
Table 4. Description of known and unknown set over one hour

                  Known Flowset    Unknown Flowset
Num. of Flows     176484           39408
Num. of App.      38               11
Table 5. Classification results for unknown flows over one hour of flows

Application        Number of    Correct           False
Communities        Flows        Classification    Classification
BitTorrent         12897        10861             2036
Gnutella           2198         2187              11
HTTPWeb            21435        18320             3115
SecureWeb          2138         1909              229
WebFileTransfer    8            5                 3
Web-Ports          463          388               75
Unknown            269          N/A               N/A
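The per-community classification rates and the overall accuracy quoted in the text can be recomputed from the counts in Table 5. The Python sketch below does so; the dictionary layout and the name `classification_rate` are illustrative choices rather than part of the prototype, and leaving the 269 flows of the Unknown community out of the totals is an assumption.

```python
# Table 5 counts: community -> (correctly classified, falsely classified).
TABLE_5 = {
    "BitTorrent": (10861, 2036),
    "Gnutella": (2187, 11),
    "HTTPWeb": (18320, 3115),
    "SecureWeb": (1909, 229),
    "WebFileTransfer": (5, 3),
    "Web-Ports": (388, 75),
}


def classification_rate(correct: int, wrong: int) -> float:
    """Fraction of a community's flows that were classified correctly."""
    return correct / (correct + wrong)


per_community = {
    name: classification_rate(correct, wrong)
    for name, (correct, wrong) in TABLE_5.items()
}

# Unweighted mean over communities versus accuracy weighted by flow count
# (assumption: the Unknown community is excluded from both).
macro_average = sum(per_community.values()) / len(per_community)
total_correct = sum(correct for correct, _ in TABLE_5.values())
total_flows = sum(correct + wrong for correct, wrong in TABLE_5.values())
flow_weighted = total_correct / total_flows

for name, rate in per_community.items():
    print(f"{name:16s} {rate:.1%}")
print(f"macro average    {macro_average:.1%}")
print(f"flow weighted    {flow_weighted:.1%}")
```

Both averages land within about two percentage points of the roughly 85% figure quoted in the text. Gnutella is classified most reliably, and WebFileTransfer, with only 8 flows, least reliably.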
5.2 Evaluation on Botnet Detection
The proposed approach is evaluated on three full consecutive days of traffic. Table 6 shows, for each day, the total number of flows after the traffic classification step and the flow distribution of the application community that contains bot flows. As shown in Table 6, the total number of flows on the first evaluation day is 32,693K, of which 20,596K are labeled by the payload signature based classifier. The remaining 12,097K flows are unknown, and 243 of them are classified into the known IRC community (i.e. they are the IRC C&C bot flows). Similarly, on the second day 45 unknown flows are classified into the IRC community, and on the third day 8 unknown flows are classified into the Web community. Since we know that all these unknown flows actually belong to IRC and Web, our approach obtains 100% accuracy in classifying these malicious bot C&C flows into their own application communities.
Next, we evaluate the capability of our approach to discriminate bot-generated traffic from normal traffic in the same application community. Table 7 reports the number of correctly detected bot C&C flows and the number of falsely detected bot flows against the actual number of bot flows and normal flows in each community. Table 8 lists, for each cluster, the average standard deviation of byte frequency over the 256 characters of the payload collected on the network.
From Table 6, we see that the total number of flows collected per day is roughly 30 to 35 million and that the number of known flows that can be labeled by the payload signatures is roughly 18 to 24 million per day. The number of IRC and Web C&C flows over the three consecutive days is a very small part of the total flows. Our traffic classification approach classifies the unknown (malicious) IRC/Web flows into the IRC/Web application communities with a 100% classification rate over the three-day evaluation.
All the IRC C&C flows are differentiated from the normal traffic with a low false alarm rate (i.e. only 4 and 19 false alarms on the first and second evaluation days, respectively). The detection results for Web C&C flows in the Web application community are not as good: only 3 of the 8 Web C&C flows are detected. The reason might be the clustering algorithm we apply: the agglomerative hierarchical clustering algorithm may match the n-gram features extracted from IRC flows well without necessarily suiting Web flows. The issue of "Feature Selections vs. Unsupervised Learning" is left to our future work. Moreover, the results in Table 8 indicate that the average standard deviation of byte frequency over the 256 ASCII characters of the flow payload is an important metric for separating normal human IRC clusters from malicious IRC traffic generated by bots. For the Web clusters, the difference between the two standard deviations is small (0.0064 for the normal cluster and 0.0026 for the malicious one), possibly because normal and malicious flows are mixed in the normal cluster.

Table 6. Description of application community over three days

Days    Total Flows    Known Flows    Flows in Botnet Communities
1       32693K         20596K         264 IRC {21 normal}
2       35409K         23724K         408 IRC {363 normal}
3       29538K         18313K         1010 Web {1002 normal}
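The quantities derived from Table 6 (unknown flows per day, bot C&C flows per community, and their share of all traffic) follow from simple subtraction. A minimal Python sketch, with hypothetical names and with the K suffix read as thousands of flows:

```python
# Table 6 rows: day -> (total flows, known flows,
#                       flows in the botnet community, normal flows among them).
# Totals and known flows are listed in thousands (K) in the table.
TABLE_6 = {
    1: (32_693_000, 20_596_000, 264, 21),
    2: (35_409_000, 23_724_000, 408, 363),
    3: (29_538_000, 18_313_000, 1010, 1002),
}


def summarize(total: int, known: int, community: int, normal: int):
    """Return (unknown flows, bot C&C flows, bot share of all flows)."""
    unknown = total - known
    bot = community - normal
    return unknown, bot, bot / total


summary = {day: summarize(*row) for day, row in TABLE_6.items()}
for day, (unknown, bot, share) in summary.items():
    print(f"day {day}: unknown={unknown:,} bot={bot} share={share:.2e}")
```

On each day the bot C&C flows account for fewer than one in 100,000 of all flows, which quantifies the "very small part" noted in the text.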
Table 7. Detection performance over three days

Days    Normal IRC    Bot C&C    Correctly Detected    Number of Falsely Identified
        Flows         Flows      Bot C&C Flows         Bot C&C Flows
1       21            243        243                   4
2       363           45         45                    19
3       1002          8          3                     0
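Table 7 gives raw counts; a detection rate and a false alarm rate can be derived from them. The sketch below assumes the usual definitions (detected bot flows over actual bot flows, and falsely flagged flows over normal flows in the same community), which the text does not spell out. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DayResult:
    normal_flows: int
    bot_flows: int
    detected: int
    false_alarms: int

    @property
    def detection_rate(self) -> float:
        # Assumed definition: correctly detected bot flows / actual bot flows.
        return self.detected / self.bot_flows

    @property
    def false_alarm_rate(self) -> float:
        # Assumed definition: falsely flagged flows / normal flows in the community.
        return self.false_alarms / self.normal_flows


# Counts copied from Table 7 (days 1 and 2 are IRC, day 3 is Web).
TABLE_7 = {
    1: DayResult(normal_flows=21, bot_flows=243, detected=243, false_alarms=4),
    2: DayResult(normal_flows=363, bot_flows=45, detected=45, false_alarms=19),
    3: DayResult(normal_flows=1002, bot_flows=8, detected=3, false_alarms=0),
}

for day, result in TABLE_7.items():
    print(
        f"day {day}: detection {result.detection_rate:.1%}, "
        f"false alarms {result.false_alarm_rate:.1%}"
    )
```

Under these assumed definitions, the two IRC days reach a 100% detection rate, while the Web day detects 3 of 8 flows (37.5%) with no false alarms. Because the day 1 IRC community contains only 21 normal flows, its 4 false alarms correspond to a higher rate than the 19 false alarms among 363 normal flows on day 2.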
Table 8. Standard deviation of byte frequency over 256 ASCII characters for normal and botnet clusters

        Average Standard Deviation
Days    Normal Clusters    Botnet Clusters
1       0.0213             0.0005
2       0.0136             0.0032
3       0.0064             0.0026
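One plausible reading of the metric in Table 8 is sketched below in Python: each flow payload is turned into a 256-bin byte frequency profile, the standard deviation of each bin is taken across the flows of a cluster, and the 256 values are averaged. This interpretation, the function names and the sample payloads are assumptions made for illustration; the exact computation used in the prototype may differ.

```python
from statistics import pstdev


def byte_frequencies(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in one payload."""
    counts = [0] * 256
    for value in payload:
        counts[value] += 1
    total = len(payload)
    return [c / total for c in counts] if total else [0.0] * 256


def average_std(cluster: list[bytes]) -> float:
    """Mean, over the 256 byte values, of the std. dev. across the flows."""
    profiles = [byte_frequencies(p) for p in cluster]
    if not profiles:
        return 0.0
    per_byte_std = [pstdev(column) for column in zip(*profiles)]
    return sum(per_byte_std) / 256


# Made-up payloads for illustration only, not data from the paper.
bot_like = [b"PING :server\r\n"] * 4
human_like = [b"hello there", b"what time is the meeting?", b"lol ok", b"brb"]
print(average_std(bot_like), average_std(human_like))
```

Identical, machine-generated payloads give a value of zero, while varied, human-typed messages give a larger value, which matches the direction of the difference reported in Table 8.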
6. CONCLUSIONS
Before the work reported by Rajab et al. in [8], very little had been done to study botnet behavior theoretically. The first workshop on botnets was held in 2007, and since then many detection approaches have been proposed and some real bot detection systems have been implemented (e.g. BotHunter™ by Gu et al. [38]). In this paper we attempt a taxonomy of existing botnet detection approaches and classify them into three categories: honeypot based, passive anomaly analysis based and traffic classification based. As claimed by Gu et al., anomaly based botnet detection approaches have the potential to find different types of botnets, while existing traffic classification approaches focus only on differentiating malicious IRC traffic from normal IRC traffic, which is considered their biggest limitation. We address this limitation by presenting a novel generic application classification approach. With it, unknown applications on the current network are classified into different application communities, such as the Chat (or more specifically IRC) community, the P2P community and the Web community, to name a few. Since botnets usually exploit existing application protocols, detection can be conducted within each specific community. As a result, our approach can be extended to find different types of botnets and has the potential to find new botnets by specifically examining the traffic in the "unknown" community. In particular, we evaluate our framework on the IRC and Web communities, and the results show that our approach obtains a very high detection rate (approaching 100% for IRC bots) with a low false alarm rate when detecting IRC botnet traffic. Moreover, we formalize botnet behavior by using the average standard deviation of byte frequency over the 256 ASCII characters of the traffic payload, and derive an important bot identification strategy: the higher the value of the average standard deviation, the more likely the traffic is generated by human beings. This strategy is important when using unsupervised clustering algorithms for botnet detection in later research. In the immediate future, we will evaluate our approach on the P2P community and measure its performance on P2P based botnets.
By the paper deadline we had not received any P2P botnet traffic from the honeypot. We also searched public malware sharing websites for the source code of some well-known P2P bots (e.g. Rustock, Nugache and Peacomm) so that we could run them and collect P2P botnet traffic traces on our testbed network; fortunately, we will receive Storm P2P .pcap data from the German Honeynet Chapter [46]. Some novel P2P botnet construction methods have also been proposed and investigated in [44, 45]. In summary, we will focus on the detection of existing and newly appeared P2P botnets in the near future.
7. REFERENCES
[1] Symantec Internet Security Threat Report, Volume XIII, April 2008. http://www.symantec.com/business/theme.jsp?themeid=threatreport
[2] P. Barford and V. Yegneswaran, "An inside look at Botnets," Special Workshop on Malware Detection, Advances in Information Security, Springer Verlag, ISBN: 0-387-32720-7, 2006.
[3] Sinit, accessed in December 2008. http://www.secureworks.com/research/threats/sinit/
[4] Phatbot, accessed in December 2008. http://www.secureworks.com/research/threats/phatbot/
[5] Nugache, accessed in December 2008. http://www.securityfocus.com/news/11390/
[6] http://www.secureworks.com/research/blog/index.php/2007/09/12/analysis-of-storm-worm-ddos-traffic/
[7] E. Athanasopoulos, A. Makridakis, S. Antonatos, D. Antoniades, S. Ioannidis, K. Anagnostakis, and E. Markatos, "Antisocial networks: turning a social network into a Botnet," In Proceedings of the 11th Information Security Conference, Taipei, Taiwan, 2008.
[8] M.A. Rajab, J. Zarfoss, F. Monrose, and A. Terzis, "A multifaceted approach to understanding the botnet phenomenon," In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, pp. 41-52, 2006.
[9] P. Baecher, M. Koetter, T. Holz, M. Dornseif, and F. Freiling, "The nepenthes platform: an efficient approach to collect malware," In Proceedings of Recent Advances in Intrusion Detection, LNCS 4219, Springer-Verlag, pp. 165-184, Hamburg, 2006.
[10] V. Yegneswaran, P. Barford, and V. Paxson, "Using honeynets for internet situational awareness," In Proceedings of the 4th Workshop on Hot Topics in Networks, College Park, MD, 2005.
[11] Z.H. Li, A. Goyal, and Y. Chen, "Honeynet-based botnet scan traffic analysis," Botnet Detection: Countering the Largest Security Threat, in Series: Advances in Information Security, Vol. 36, W.K. Lee, C. Wang, D. Dagon, (Eds.), Springer, ISBN: 978-0-387-68766-7, 2008.
[12] F. Freiling, T. Holz, and G. Wicherski, "Botnet tracking: exploring a root-cause methodology to prevent Denial of Service attacks," In Proceedings of 10th European Symposium on Research in Computer Security (ESORICS'05), 2005.
[13] T. Holz, M. Steiner, F. Dahl, E. Biersack and F. Freiling, "Measurements and mitigation of peer-to-peer-based botnets: a case study on storm worm," In Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, San Francisco, California, 2008.
[14] T. Strayer, R. Walsh, C. Livadas, D. Lapsley, "Detecting botnets with tight command and control," Proceedings 2006 31st IEEE Conference on Local Computer Networks, pp. 195-202, 2006.
[15] T. Strayer, D. Lapsley, R. Walsh, and C. Livadas, "Botnet detection based on network behavior," Botnet Detection: Countering the Largest Security Threat, in Series: Advances in Information Security, Vol. 36, W.K. Lee, C. Wang, D. Dagon, (Eds.), Springer, 2008.
[16] C. Livadas, R. Walsh, D. Lapsley, T. Strayer, "Using machine learning techniques to identify botnet traffic," In Proceedings 2006 31st IEEE Conference on Local Computer Networks, pp. 967-974, Nov. 2006.
[17] J. Goebel and T. Holz, "Rishi: Identify bot contaminated hosts by irc nickname evaluation," In Proceedings of USENIX HotBots'07, 2007.
[18] A. Karasaridis, B. Rexroad, and D. Hoeflin, "Wide-scale botnet detection and characterization," In Proceedings of the 1st Conference on 1st Workshop on Hot Topics in Understanding Botnets, Cambridge, MA, 2007.
[19] J. R. Binkley and S. Singh, "An algorithm for anomaly-based botnet detection," USENIX SRUTI: 2nd Workshop on Steps to Reducing Unwanted Traffic on the Internet, 2006.
[20] G.F. Gu, J.J. Zhang, and W.K. Lee, "BotSniffer: detecting botnet command and control channels in network traffic," In Proceedings of the 15th Annual Network and Distributed System Security Symposium, San Diego, CA, February 2008.
[21] G.F. Gu, R. Perdisci, J.J. Zhang, and W.K. Lee, "BotMiner: clustering analysis of network traffic for protocol- and structure-independent Botnet detection," In Proceedings of the 17th USENIX Security Symposium (Security'08), San Jose, CA, 2008.
[22] A. W. Moore and K. Papagiannaki, "Toward the accurate identification of network applications," In Proceedings of 6th International Workshop on Passive and Active Network Measurement, pp. 41-54, Boston, MA, 2005.
[23] N. Williams, S. Zander and G. Armitage, "A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification," ACM SIGCOMM Computer Communication Review, Vol. 36, Issue 5, pp. 5-16, 2006.
[24] A. McGregor, M. Hall, P. Lorier, and J. Brunskill, "Flow clustering using machine learning techniques," Proceedings of 5th International Workshop on Passive and Active Network Measurement, pp. 205-214, Antibes Juan-les-Pins, France, 2004.
[25] S. Zander, T. Nguyen, G. Armitage, "Automated traffic classification and application identification using machine learning," In Proceedings of the IEEE Conference on Local Computer Networks, 30th Anniversary, pp. 250-257, 2005.
[26] L. Bernaille, R. Teixeira, K. Salamatian, "Early application identification," In Proceedings of ACM International Conference On Emerging Networking Experiments And Technologies (CONEXT 06), Lisboa, Portugal, 2006.
[27] A. Moore, D. Zuev, "Internet traffic classification using Bayesian analysis techniques," ACM SIGMETRICS Performance Evaluation Review, Vol. 30, Issue 1, pp. 50-60, 2005.
[28] M. Crotti, M. Dusi, F. Gringoli, L. Salgarelli, "Traffic classification through simple statistical fingerprinting," ACM SIGCOMM Computer Communication Review, Vol. 37, Issue 1, pp. 5-16, 2007.
[29] M. Roughan, S. Sen, O. Spatscheck, and N.G. Duffield, "Class of service mapping for QoS: a statistical signature based approach to IP traffic classification," In Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, Taormina, Sicily, Italy, October 25-27, 2004.
[30] H. Dahmouni, S. Vaton, D. Rosse, "A Markovian signature-based approach to IP traffic classification," In Proceedings of the 3rd Annual ACM Workshop on Mining Network Data, San Diego, California, USA, pp. 29-34, 2007.
[31] C. Park, Y. Won, M. Kim and J. Hong, "Towards automated application signature generation for traffic identification," In Proceedings of the IEEE/IFIP Network Operations and Management Symposium (NOMS 2008), Salvador, Brazil, pp. 160-167, 2008.
[32] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, "BLINC: multilevel traffic classification in the dark," In Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 229-240, Philadelphia, Pennsylvania, 2005.
[33] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and K. Salamatian, "Traffic classification on the fly," ACM SIGCOMM Computer Communication Review, Vol. 36, Issue 2, pp. 23-26, 2006.
[34] Fred-eZone WiFi ISP, accessed in December 2008. http://www.fred-ezone.ca/
[35] D. Chakrabarti, S. Papadimitriou, D. Modha, and C. Faloutsos, "Fully automatic cross-associations," In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 79-88, Seattle, Washington, 2004.
[36] K. Wang and S. Stolfo, "Anomalous payload-based network intrusion detection," In Proceedings of the 7th International Symposium on Recent Advances in Intrusion Detection (RAID), Sophia Antipolis, France, 2004.
[37] K. Wang and S. Stolfo, "Anomalous payload-based worm detection and signature generation," In Proceedings of the 8th International Symposium on Recent Advances in Intrusion Detection (RAID), Seattle, WA, 2005.
[38] G.F. Gu, P. Porras, V. Yegneswaran, M. Fong, and W.K. Lee, "BotHunter: detecting malware infection through IDS-driven dialog correlation," Proceedings of the 16th USENIX Security Symposium, Boston, MA, 2007.
[39] M. Akiyama, T. Kawamoto, M. Shimamura, T. Yokoyama, Y. Kadobayashi, and S. Yamaguchi, "A proposal of metrics for botnet detection based on its cooperative behavior," In Proceedings of the 2007 International Symposium on Applications and the Internet Workshops, pp. 82-85, 2007.
[40] E. Eskin, "Anomaly detection over noisy data using learned probability distributions," In Proceedings of 17th International Conference on Machine Learning, pp. 255-262, Palo Alto, 2000.
[41] Kaiten, accessed in December 2008. http://packetstormsecurity.org/distributed/indexsize.html
[42] BlackEnergy, accessed in December 2008. http://atlas-public.ec2.arbor.net/docs/BlackEnergy+DDoS+Bot+Analysis.pdf
[43] L. Salgarelli, F. Gringoli, and T. Karagiannis, "Comparing traffic classifiers," ACM SIGCOMM Computer Communication Review, Volume 37, Issue 3, pp. 65-68, 2008.
[44] P. Wang, S. Sparks, and C. Zou, "An advanced hybrid peer-to-peer botnet," In Proceedings of the 1st Conference on 1st Workshop on Hot Topics in Understanding Botnets, Cambridge, MA, 2007.
[45] C. Zou and R. Cunningham, "Honeypot-aware advanced botnet construction and maintenance," In Proceedings of International Conference on Dependable Systems and Networks, 2006.
[46] German Honeynet Project, accessed in December 2008. http://pi1.informatik.uni-mannheim.de/index.php?pagecontent=site/Research.menu/Honeynet.page