Effective Flow Filtering for Botnet Search Space Reduction

Robert Walsh, David Lapsley, and W. Timothy Strayer
BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA
{rwalsh|strayer}@bbn.com, [email protected]

Abstract

The use of sophisticated techniques is essential to detect and identify the presence of botnet flows, but these techniques can be expensive in computational and memory resources. A critical first pass is to filter out all traffic that is highly unlikely to be part of a botnet, allowing the more complex algorithms to run over a much smaller set of flows. This paper presents our studies and experience in filtering flows to reduce the botnet search space, and shows that a series of simple filters can provide as much as a 37-fold reduction in the flow set.¹

1 Introduction

One of the most vexing Internet-based security threats today is the use of very large, coordinated groups of hosts for brute-force attacks, intrusions, and unsolicited emails. These large groups of hosts are assembled by turning normally operating hosts into so-called zombies, after which they can be controlled from afar. A collection of zombies, also called bots, when controlled by a single command and control infrastructure, forms a botnet. Some botnets grow to include many thousands of bots.

Botnets become very dangerous when they are turned into instruments of crime. Such criminal uses include crippling e-commerce websites, extortion, phishing, and data collection. Botnets have caught the attention of organized crime syndicates, and a thriving market exists for buying and selling botnets (or access to botnets). Botnets have become the tool of choice for Internet-based crime, and the perpetrators are insulated from the activities by a layer of unwitting accomplices (the bots) and by the separation in time between the assembly of the botnet and its use for attack.

¹ This material is based upon work supported by the US Army Research Office under contract W911NF-05-C-0066. The content of the information does not necessarily reflect the position or the policy of the US Government, and no official endorsement should be inferred.

Searching for botnets requires building evidence from clues within hosts or network traffic. Host-based detection techniques can be untrustworthy given the sophistication of rootkits at hiding intrusions and other malicious activity. Network-based detection is also subject to obfuscation methods (detailed in section 5 of [1]), but once traffic is on the wire, it is hard to modify its characteristics. Consequently, we focus on extracting clues from network-based evidence in order to detect the presence of botnets. Because of the widely distributed nature of botnets, however, one must examine a great deal of traffic from many locations in order to increase the likelihood of finding coordinated botnet communications, and current analysis techniques such as machine learning and flow correlation [2-5] are too expensive to run on large-scale traffic flow sets.

This paper explores the effectiveness of employing flow filtering as a means to reduce the amount of traffic to examine while maintaining a high probability that botnet flows will not be filtered. Our approach runs a series of increasingly complex analyzers, filtering out unlikely flows at each step, so that the most computationally intensive detection of botnet coordination is done on a dramatically reduced traffic set.

The work reported here is associated with a larger botnet detection project [6] in which we designed a pipelined approach to analyzing network traffic for evidence of botnet flows. The latter stages of the pipeline included machine learning classification and n-by-n flow correlation, both of which are demanding on compute resources. The first stage of the pipeline, flow reduction, proved so effective that it bears deeper discussion.

2 Approach

While new C2 infrastructures are appearing (Storm uses P2P, for example), IRC-based botnets still dominate as the preferred deployment technique. This reflects the freely available source code for Internet Relay Chat (IRC), allowing attackers to focus on botnet applications rather than on architecting and coding "mere plumbing." Consequently, many botnet detection systems begin by simply looking for chat sessions (TCP port 6667) [7] and then examining the content for botnet commands [8]. Like many client-server protocols, however, the use of a standard port number is just a suggestion. Also, relying on having access to the packet contents and, even with that access, on being able to identify botnet commands is an overly simplistic assumption: botnet commands could be hidden in precoordinated "normal" chat, masked via steganography, protected within encrypted packet payloads, or obfuscated via similar approaches.

Our flow filtering analysis assumes that the botnet C2 infrastructure is based loosely on IRC, but we assert that the techniques for deriving the filters can be applied to other C2 structures as well. We use IRC traffic as a proxy for botnet C2 traffic. Example botnet commands [9] provide the important insight that C2 messages are brief and interactive. In the absence of access to extensive botnet traces, we characterize IRC flows (which are also brief and interactive) to identify how we can separate the C2 channel from other Internet traffic. Specifically, there are several notable points.

First, identification of chat is a statistical problem. For each attribute of a flow, chat flows are spread across the spectrum of values. Instead of a deterministic decision, one is left with a probabilistic conclusion, complete with the risk of false positives and false negatives. Flows can be winnowed into likely chat and likely non-chat classifications, but the likely chat classification will certainly include a number of non-chat flows.

Second, consideration of attributes in isolation is a good start, but it is not sufficient to extract evidence of botnet flows; it is equivalent to using independent probabilities to evaluate the traffic. Stronger techniques based upon interdependent conditional probabilities and attribute correlation are eventually needed as well. These are expensive techniques, however, and so should be used judiciously.

Finally, in spite of the prior two points, the resulting characterization is good for guiding the construction of efficient filters whose focus is data reduction. By reducing the data set, even if it still contains some false positives, later steps can take advantage of more computationally intensive approaches.

2.1 Source of Background Traffic

It would be too contrived to try to create a large dataset of both background and botnet traffic using a tightly controlled testbed. Instead, for the background traffic, we incorporated a data set that typifies the range and variety of real-world traffic. We chose packet traces collected on the Dartmouth campus under their CRAWDAD project [10]. The traces are a complete set of TCP/IP headers from the campus wireless network, taken over a period of four months (November 1, 2003 to February 28, 2004) from a variety of campus locations. No payloads were included in the traces. In all, the traces were 164 GBytes compressed, and approximately 3.8 times that amount when uncompressed. This large trace set means that we truly are looking for the needle (botnet C2 flows) in a haystack.

As the first step of data reduction, we convert the sequence of packets into flow summaries. Later, after suspect infrastructure is identified, a forensic archive of packet-level data can be collected and analyzed.
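As an illustration of this first reduction step, the sketch below aggregates individual packet records into unidirectional flow summaries keyed by the usual 5-tuple. It is a minimal sketch, not our production code; the record field names (ts, src, length, and so on) are hypothetical.

    from collections import defaultdict

    def summarize_flows(packets):
        """Collapse packet records into per-5-tuple flow summaries."""
        flows = defaultdict(lambda: {"packets": 0, "bytes": 0,
                                     "first": None, "last": None,
                                     "flags": set()})
        for pkt in packets:  # pkt: dict with ts, src, dst, sport, dport, proto, length, tcp_flags
            key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
            flow = flows[key]
            flow["packets"] += 1
            flow["bytes"] += pkt["length"]
            if flow["first"] is None:
                flow["first"] = pkt["ts"]
            flow["last"] = pkt["ts"]                  # packets assumed time-ordered
            flow["flags"] |= pkt.get("tcp_flags", set())
        return flows

Keeping only counts, byte totals, endpoints, and TCP flags per flow is what makes the later filtering cheap relative to packet-level analysis.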

2.2 Source of Botnet Traces

In order to generate traffic that was representative of real botnet traffic, we implemented a benign bot based on the "Kaiten" bot, a widespread bot with readily downloadable source code. The Kaiten bot was re-implemented in C using approximately 1000 lines of code. The original Kaiten bot had a repertoire of TCP- and UDP-based attacks; our implementation's attack code employs harmless variations of the Kaiten attacks. Like the Kaiten bot, our bot provides a number of remotely controlled features, including a mechanism to execute arbitrary commands on the bot client, HTTP download capability, a flexible multi-process architecture, a highly configurable architecture, and a rich command set.

In order to obtain traces of botnet traffic, we constructed a botnet test facility. This facility involved a simple setup modeled along the lines of a chat-based botnet architecture: an IRC server, a code server, 13 zombies, and an attacker. We used this test facility to obtain actual traces of the communications between the various botnet entities while the botnet was in operation. Our experiments entailed using the IRC server to instruct the zombies to download attack code from the code server and to subsequently launch a coordinated TCP "attack" on the victim host. The traces included the SSH transmissions used for setting up and monitoring the experiments, IRC traffic between the bots and the IRC server, HTTP traffic between the zombies and the code server (for downloading the attack code), and the TCP traffic involved in the coordinated TCP attack on the victim host. The setup and the launch of the attack were repeated successively in order to increase the amount of trace data collected.

We collected 74 flows associated with our botnet using tcpdump at the IRC server. Thirty of these flows were C2 flows. We merged this botnet trace with the Dartmouth traffic data set in order to create a test data set containing ground truth that could be verified after all of the data reduction filters and other analyzers had been applied. Our botnet was active on the order of hours, while the Dartmouth traces span four months, exacerbating the vast size difference between the needle and the haystack.

3 Measurements

In order to understand which flow attributes we can effectively use as filters, we conducted an extensive set of measurements on the traffic traces looking for thresholds that clearly differentiated chat/C2 traffic from other traffic.

3.1 Average Packet Size

We obtain average packet size by normalizing the flow's total bytes transferred by the number of packets, in order to differentiate a bulk transfer, such as a song, movie, or web page, from a sequence of request-response interactions. Applications and TCP stacks performing bulk transfers pursue efficiencies through the use of large packets in order to amortize media bandwidth overheads, interrupt overheads, context-switching overheads, and layered-software overheads. Interactive applications accept these overheads so that they can avoid buffering-instigated latency and thereby meet human factors goals. So, we can expect packet sizes to have a causal relationship to the application, not merely an accidental correlation.

Figure 1 shows the spread of values for average packet size as a cumulative distribution. Notice that the statistical spread of IRC packet size runs the gamut. IRC typically uses small packets, but some flows do have an average packet size that is essentially MTU-sized.

Figure 1 also clearly depicts the false positive and false negative risk. For example, if one were to accept flows with an average packet size less than 200 bytes as potential botnet traffic, and to reject flows with a larger average packet size, then about 48% of non-IRC traffic would be accepted. Since there are several orders of magnitude more non-IRC flows than IRC flows, a filter based exclusively on a 200-byte average packet size would cut the amount of data to process in half, but most of the data would still be nuisance chaff. Consider instead a 300-byte cutoff that reduces false negative risk. The line for IRC shows that virtually all of the potential botnet C2 traffic would be accepted, but only 35% of the non-IRC traffic would be rejected (the acceptance rate increases from 48% to 65%). Even more of the data is nuisance chaff.

We recognize that there is a trade-off between identifying C2 flows and stepwise reduction of the data set to the meaningful subset of flows. The selection of the cutoff for quick filtering for data reduction requires both quantitative statistical information and human judgment. Even if the selection of the cutoff were phrased in terms of meeting a false positive or a false negative goal, that goal is based upon judgment.

Figure 1—Bytes per Packet Cumulative Distribution

3.2 Data Rate

Both the cumulative bytes and cumulative packets can be normalized by flow duration, generating a bits-per-second or packets-per-second measure of application data rate. These measures are correlated. We display the packets-per-second measure in Figure 2.

Figure 2—Average Packets Per Second

Many flows, whether IRC or non-IRC, use a low communication rate. At best, high-speed bulk transfers can be identified and labeled non-IRC. Chat flows are expected to be low data rate, since humans read at a limited rate. C2 flows are expected to be low data rate as well, attempting to avoid detection, having dispatched the real work to the zombies. A one-packet-per-second decision threshold accepts 94.9% of the non-IRC flows as false positives, while all of the IRC flows meet a threshold of three packets per second. So, the benefit of this filter in isolation is greater for packet-level forensic archives (which may operate in parallel to flow detection) than for flow analysis. In conjunction with other activities, such as machine learning, it may eliminate outliers and increase focus on resolving ambiguity in areas of parameter value overlap.
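To make the two measures concrete, the sketch below computes average packet size (Section 3.1) and packets per second (Section 3.2) from a flow summary like the ones built earlier, and applies the candidate cutoffs discussed above (300 bytes, three packets per second). It is a sketch only; the field names are the hypothetical ones from the earlier summarization sketch.

    def looks_interactive(flow):
        """Apply the Section 3 cutoffs: small packets and a low packet rate."""
        avg_packet_size = flow["bytes"] / flow["packets"]
        duration = max(flow["last"] - flow["first"], 1e-6)  # guard one-packet flows
        packets_per_second = flow["packets"] / duration
        return avg_packet_size < 300 and packets_per_second <= 3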

4 Additional Measurements

The collection of flow characteristics was designed and implemented in order to support exploratory investigations into filtering pipelines. We had initial ideas about good candidate characteristics, but collected a broader set in order to double-check our preconceived notions. The collected set of flow characteristics also targeted potential discriminators for a subsequent classification subsystem that was based on machine learning and that could consider joint probability distributions. A broad set of metrics was provided for machine learning to evaluate and choose from. As a result, the characterization software collected additional metrics that were not included in the final pre-processing pipeline. We introduce a subset of those metrics here to more fully describe flow discrimination. We also reference some of them in the upcoming description of the final pipeline, to clarify various details.

4.1 Packet Length Histograms

We recorded histograms of packet length, in case they were able to convey more information than the simple average packet length. The buckets for the histogram were biased to distinguish among smaller transfers, with no need to distinguish among bulk transfers. The bucket boundaries that we used were: 80, 120, 160, 200, 300, 512, and larger.

The first bucket includes packets of size 41 to 80 bytes, and is shown in Figure 3. The cumulative distribution shows the statistical spread: there are IRC flows with no small packets, just as there are IRC flows with larger packets. It also shows the risk of false positives, since there are non-IRC flows along the entire spectrum as well. The non-IRC line lies above the IRC line; this is the only bucket where that occurs. About a quarter of the IRC flows have no packets in this bucket, and about two-thirds of non-IRC flows have no packets in this bucket. This is a reminder that non-IRC flows tend to use larger packets than IRC flows.

The cumulative distribution function makes it easy to verify the frequency statements presented here. Machine calculations may rely on the probability density function instead. Steep slopes in the CDF correspond to spikes in the PDF that may have discriminating value, especially if the response differs for IRC and non-IRC. For example, a flow with ~75% of its packets in this bucket is likely an IRC flow.

Figure 3—Packet Length Histogram 41-80 bytes

The next bucket (81-120 bytes) is depicted in Figure 4. There are many IRC (83%) and non-IRC (76%) flows with no packets in this range of sizes. There are also IRC and non-IRC flows with a moderate number of packets in this range (vertically above 12%). These lines are close together, which suggests that this bucket may not be a good discriminator.

Figure 4—Packet Length Histogram 81-120 bytes

Figure 5 shows the bucket with packet sizes between 121 and 160 bytes. There are IRC (87%) and non-IRC (66%) flows with no packets in this range of sizes. There are also IRC and non-IRC flows with a moderate number of packets in this range (vertically above 12%, where the IRC probability density function may also provide extra discrimination value). These lines have some separation, with differing slopes, and therefore this bucket may be a good candidate discriminator.

Figure 5—Packet Length Histogram 121-160 bytes

The next bucket (161-200 bytes) is depicted in Figure 6. Again, many IRC (96%) and non-IRC (81%) flows have no packets in this range of sizes. While there are both IRC and non-IRC flows with some number of packets in this range, these lines have some separation, with differing slopes, and therefore this is a possible candidate discriminator (though weaker in isolation than the 41-80 or 121-160 byte buckets). Chat's communication of short phrases and sentences generated by human typing is consistent with a bias toward small packets, leaving other protocols with a greater likelihood of filling this and later buckets.

Figure 6—Packet Length Histogram 161-200 bytes

Figures 7 and 8 show the buckets holding the ranges 201-300 and 301-512 bytes, respectively. In the first of these two buckets, 98% of the IRC and 71% of the non-IRC flows have no packets in this range of sizes; in the second, the values are 81% (IRC) and 61% (non-IRC). There is good evidence that a value in these two ranges may serve well as a discriminator because there are few IRC and more non-IRC flows with packets in these ranges, the lines have good separation, and the lines have significantly different slopes.

Figure 7—Packet Length Histogram 201-300 bytes

Figure 8—Packet Length Histogram 301-512 bytes

Figure 9—Packet Length Histogram > 512 bytes

The last bucket (more than 512 bytes) is depicted in Figure 9. While only 2% of the IRC flows have packets in this range, more than half (51%) of non-IRC flows do. It makes sense that few IRC flows have packets falling into this bucket: per RFC 2812, IRC messages shall not exceed 512 bytes, including the line termination characters (carriage return/line feed). Possible explanations for observing some IRC flows with packets in this bucket include: TCP's aggregation of application-layer messages within lower-layer retransmissions due to prior packet loss; TCP's aggregation of data due to slow-start or, in general, congestion-window based throttling of transmissions (particularly when server-to-client communication is providing multicast service to multiple IRC channel speakers); and flow misclassification (use of another protocol on port 6667).
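The histogram computation itself is straightforward. The sketch below reproduces the bucketing with the boundaries listed above (80, 120, 160, 200, 300, 512, and larger) and returns per-bucket fractions, which is the form summarized by the cumulative distributions in Figures 3 through 9. It assumes per-flow packet lengths are available, as in our characterization software.

    import bisect

    BUCKET_UPPER_EDGES = [80, 120, 160, 200, 300, 512]  # seventh bucket is > 512

    def length_histogram(packet_lengths):
        """Fraction of a flow's packets falling into each length bucket."""
        counts = [0] * (len(BUCKET_UPPER_EDGES) + 1)
        for length in packet_lengths:
            counts[bisect.bisect_left(BUCKET_UPPER_EDGES, length)] += 1
        total = len(packet_lengths)
        return [c / total for c in counts] if total else counts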

4.2 Associations

A botnet standing "at the ready" would correspond to having zombies maintain long-term open communications to the rendezvous point through which the botnet operator interacts. So, one approach to identifying candidate nefarious infrastructure is to search for long-lived TCP flows. We searched for such flows in the Dartmouth data, and found communications lasting over a week. There were five IRC flows on port 6667 that lasted 7.9 to 18.2 days, even though the traces were collected from wireless networks rather than from wired networks. The campus endpoint for these flows could have been a mobile device taken elsewhere, or a PC that was rebooted, reducing flow lifetime. Instead, the endpoints maintained long-term connectivity.

We also searched for long-term open communications that could be formed by a collection of successive TCP flows. These associations constructed from multiple flows are even stronger evidence than a single long-lived TCP flow, because the repeated and persistent maintenance of the channel highlights that the interaction is not accidental. Stitching flows together to form an association addresses unintentional failures (e.g., PC reboots) that are out of the control of the botnet operator, as well as intentional gaps introduced by the botnet operator. Multiple bots in [9] include commands to drop a connection (e.g., for 30 minutes), following which connectivity would be reestablished. To stitch flows, we matched flows from IP clients that connected to the same IP server and TCP port, as shown in the sketch below. The client's TCP port was intentionally allowed to vary, since the networking stack typically chooses the client's port number from the set of free and unreserved port numbers. In general, networking stack algorithms include a bias toward using ports that have not been recently used, leading to distinct numbers for each successive client connection.

As a result, two more open communication channels on the IRC port (6667) lasting more than a week were identified, and twenty-four lasting more than three days were found. These associations typically used well-formed TCP connections, with flows cleanly closed, as evidenced by use of TCP FIN. Of the 24 associations, 13 included flows of which 100% ended with FIN. All but one of the others included flows that terminated in FIN 92-99% of the time. Among 6,719 multi-flow associations lasting more than 5 minutes, the average association's TCP flows end with FIN 93% of the time.

These associations also exhibited limited outages. Among the same 6,719 multi-flow associations lasting more than 5 minutes, the average inter-flow outage was 66 seconds. (It is possible that this is an intentional bias to avoid timer-based flow definitions.) On average, 51 flows were used to stitch together an association lasting 2.5 hours. The 66 associations lasting at least a day were very persistent: they averaged 173 TCP flows per association, maintained 98% availability, and incurred outages averaging 27 seconds.

Upon sorting the associations by the number of TCP flows used to maintain communication, we ran across three associations using more than ten thousand TCP flows, even though 100% of the flows terminated with FIN. We do not have a sensible explanation for this behavior, but suggest that such persistent associations deserve attention.
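The sketch below illustrates the stitching just described: flows are grouped by (client IP, server IP, server port) with the client port deliberately ignored, then chained in time order to compute per-association statistics such as inter-flow outage and availability. The field names are hypothetical, and this is a simplification of the analysis reported above.

    from collections import defaultdict

    def build_associations(flows):
        """Chain successive flows between the same client and server:port."""
        groups = defaultdict(list)
        for f in flows:
            groups[(f["client_ip"], f["server_ip"], f["server_port"])].append(f)
        associations = []
        for key, members in groups.items():
            members.sort(key=lambda f: f["first"])
            gaps = [max(0.0, nxt["first"] - cur["last"])
                    for cur, nxt in zip(members, members[1:])]
            span = members[-1]["last"] - members[0]["first"]
            associations.append({
                "key": key,
                "flow_count": len(members),
                "span": span,
                "avg_outage": sum(gaps) / len(gaps) if gaps else 0.0,
                "availability": 1.0 - sum(gaps) / span if span > 0 else 1.0,
            })
        return associations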

5 Flow-Level Filtering

We recognize that there is a trade-off between identifying botnet C2 flows and stepwise reduction of the data set to the meaningful subset of flows. The selection of the cutoff for quick filtering for data reduction requires both quantitative statistical information and human judgment. Even if the selection of the cutoff is phrased in terms of meeting a false positive or a false negative goal, that goal is based upon judgment. The filters and filter parameters we chose reflect this. Our analysis and subsequent experiments suggest that five distinct filters form an effective subsystem for a larger system; these are shown in Figure 10.

Figure 10—Pipeline of Flow-Level Filters

The first filter is based upon the IP protocol used by the flow. We save TCP flows and discard other flows (UDP, ICMP, etc.), because IRC-based botnets are constructed on top of TCP; 8,933,303 flows remain. This filter has permanence for TCP-based botnets; the only intervention of human judgment is the assumption that TCP is the underlying transport layer, rather than UDP. Filtering for other C2 structures could change or relax this assumption.

The second filter removed the nuisance port-scanning chaff, reducing the data set to 4,750,262 flows. Unfortunately for today's Internet, probes of system vulnerabilities are commonplace. While they indicate suspicious activity that may be worth investigation, they are not chat or botnet C2 flows. A TCP SYN that is sent to search for a service to attack is filtered if it never leads to establishment of a TCP connection. Similarly, a TCP RST that is sent in response, to indicate that the probed service is unavailable or non-existent, is filtered. Flows containing only TCP SYN or TCP RST indicate that communication was never established, and so provide no information about C2 flows. About 47% of the flows are eliminated by this step; all of the ground-truth botnet C2 flows survived the filter. This filter has medium- to long-term value because no application-level data is transferred. For a botnet to take advantage of this port-scanning activity, the bot would have to introduce low-level networking code and interpret the resulting signal (much like Morse code or a signal fire). This could be done, but it is an extra effort that becomes necessary for the botnet developer only after detection efforts become widely deployed and more effective. Follow-on efforts could relax this assumption, but in the near term it is more important to investigate other avenues. This filter is consistent with current IRC-based botnet infrastructure.

The third ("Cheetah") and fourth (average packet size) filters eliminate bulk transfers. Downloads of software updates and rich web page transfers are examples of loads that are distinct from text-based chat and botnet C2. So, high bit-rate flows are dropped; similarly, flows with a large average packet size are dropped. These filters have permanence if the botnet is intended for long-term use, to amortize the benefits of its construction, rather than for one-time use. Greater traffic loads mean that the C2 infrastructure is more easily identified and traced back to its origin. It especially means that the owner of the botnet is on-line and connected to the infrastructure, increasing his risk of identification and capture. Currently, these filters are consistent with ASCII-based brief botnet commands, especially with an "at the ready" C2 infrastructure.

The Cheetah filter drops flows that are at least 50 packets long and that also average at least 8000 bits per second. From the data rate study we know that the IRC traffic rate is rarely above three packets per second, and from the packet size histogram study we know that 300 bytes per packet is a reasonable threshold: 3 packets of 300 bytes is roughly 8000 bits per second. In the entire Dartmouth data set, there were only four flows on the IRC port (6667) that were dropped as a result of these criteria. Since ports above 1024 are not typically reserved for access by the operating system, it is possible that those few flows accidentally used the IRC port, or that they were another application hiding on that port. Our judgment call was that these criteria represented a safe false negative rate (< 1E-5) even in the unlikely case that these truly were chat flows spewing text faster than users can read.

From a botnet perspective, such flows would similarly be outliers.

The average packet size filter drops flows that include at least 8 packets and that also average at least 300 bytes, per the packet size histogram study. In the entire set of 472,037 IRC flows, this leads to a false negative rate of 0.25%. A spot-check of five of the monitoring points indicated that the filtered IRC flows were server-to-client. The client-to-server flows survived the filter, meaning that the server is still identified. From a botnet perspective, commands and responses could be larger than chat's and still be identified. About 7% of the flows are dropped, leaving 4,385,435. All of the ground-truth botnet C2 flows survived these filters.

There is no silver bullet related to packet size that quickly eliminates an order of magnitude of the raw data. To drop half the non-IRC data would require a cutoff of 145 bytes (based upon examining what is left after the protocol-matching and scan-elimination filters); that would preserve 98.7% of the chat. To drop three-quarters of the non-IRC data would require a cutoff of 56 bytes; that would preserve 91.7% of the chat. If instead one examines what is left after protocol matching, scan elimination, and Cheetah filtering, the corresponding cutoffs shift to 142 and 55 bytes. In any case, tuning is possible, but the risk of a mistake is clearly high; one should engage in a thorough examination of chat and bot payloads before relying on small cutoffs.

The fifth filter drops flows that are too brief to represent zombies standing by "at the ready" (fewer than 2 packets, or shorter than 60 seconds). Real chats and botnets are not likely to be well represented by excessively short duration flows. This filter has a significant effect, reducing the data by a factor of about 18.4, dominating even the elimination of the port-scanning activities. All of the ground-truth botnet C2 flows survived the filter. Currently, this filter is consistent with IRC-based botnet infrastructure, which maintains open channels. Some bots do allow connections to be dropped for a brief time, but then the channels are restored and held open once again.

This fifth filter is also biased toward false positives: if in doubt, we keep rather than drop data. We keep any flow that has at least two packets and that lasts at least sixty seconds. If we see the entire flow, the three-way handshake of TCP alone will cause a long-lived flow to be remembered. If we monitor only part of the flow's lifetime, whether for startup or outage reasons, we still remember it if we see a few packets over a long time.

Overall, the data set is reduced by a factor of about 37.5, from 8,933,303 TCP flows down to 238,252, while still preserving the ground-truth botnet C2 flows. This filtering stage avoided the use of TCP port numbers, and therefore is relevant to situations where applications may be masquerading on unexpected ports. Furthermore, this significant data reduction resulted without whitelisting known services as trusted IP address and port number combinations.
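For concreteness, the sketch below restates the five filters of Figure 10 as a single predicate over the flow summaries built in Section 2.1, using the thresholds given above. It is a sketch of the described logic, not our implementation; the flow fields are the hypothetical ones used in the earlier sketches.

    TCP = 6  # IP protocol number

    def survives_pipeline(flow):
        """True if a flow survives all five filters of Figure 10."""
        duration = flow["last"] - flow["first"]
        if flow["proto"] != TCP:
            return False                      # 1: keep TCP only
        if flow["flags"] in ({"SYN"}, {"RST"}):
            return False                      # 2: scan chaff, never established
        if flow["packets"] >= 50 and duration > 0 and \
                flow["bytes"] * 8 / duration >= 8000:
            return False                      # 3: "Cheetah" high-rate flows
        if flow["packets"] >= 8 and flow["bytes"] / flow["packets"] >= 300:
            return False                      # 4: large average packet size
        if flow["packets"] < 2 or duration < 60:
            return False                      # 5: too brief to be "at the ready"
        return True

Because each test needs only the per-flow counters, the whole pipeline is a single cheap pass over the flow summaries.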

6 Discussion

While it is clear that botnet controllers will migrate from IRC as their preferred C2 infrastructure [1], the abstract model of tight central control represented by IRC is very efficient and will likely survive for quite some time. It is important, therefore, to provide a solution for a system that detects evidence of tight botnet C2 activity within very large data sets. Our system performs gross, simple filtering to reduce the amount of data that will be subjected to more computationally intensive algorithms. Once the data has been filtered, the flows can be subjected to more expensive and sophisticated botnet detection algorithms, such as machine learning classifiers and flow correlators, in order to find clusters of flows likely to be part of a botnet.

The filtering described in this paper is part of a larger solution, and is not the solution by itself. We have used it as a preparatory stage for machine learning [2, 6], which automates the selection and use of classification criteria. Machine learning could have great value in automating the use of joint distribution functions, rather than the simpler independent functions used here. We have had greater success using filtering as a preparatory stage for flow correlation [5, 6] and topological analysis [6], which can be viewed as social network identification and analysis. They address the identification of the IRC server's multicast to the bots by similarity in time and other characteristics.

Flow correlation algorithms vary in complexity. Clearly, tracking N² correlations over time becomes very demanding if N is large and the number of events is high. With work like that in this paper, the split in responsibilities is that the correlation algorithm focuses on the state variable and event complexity, while filtering reduces the data set size (biased to reduce false negatives and allow false positives). The quick filtering of this paper reduces N significantly, and is a significant enabler of correlation's success.
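As a back-of-the-envelope illustration (our arithmetic, not a figure from the paper), the 37.5-fold flow reduction shrinks the number of candidate flow pairs a correlator must consider by roughly three orders of magnitude:

    def pairs(n):
        return n * (n - 1) // 2  # unordered flow pairs

    before, after = 8_933_303, 238_252
    print(f"{pairs(before):.3e} pairs before, {pairs(after):.3e} after, "
          f"about {pairs(before) / pairs(after):,.0f}x fewer")
    # about 3.990e+13 pairs before, 2.838e+10 after, about 1,406x fewer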

Our experiment with Dartmouth campus data, starting with nearly 9 million flows augmented with traffic traces from a benign botnet, shows that the ground truth botnet C2 flows can indeed survive the data reduction. The filtering stage requires very simple logic to cull the data set down by a factor of 37. While we may not be able to expect that degree of reduction in all cases, there was nothing particularly special about the Dartmouth data that contributed to the reduction factor.

This paper described the selection of potential key attributes, the summarization of packet-level data into flow-level data, and the characterization of the resulting flows. The flow attributes have been analyzed, showing that when considered in isolation they provide limited data reduction, and that there will be both false positives and false negatives. Taken as a set of filters, however, the data reduction is quite effective.

7 Future Work

The filtering focus is quick data reduction, preparing data sets for the more resource-intensive portions of the botnet identification solution. It does not attempt to separate botnet traffic from human-generated IRC traffic, leaving that task to those later stages.

A possible successor stage could analyze response times. If the unidirectional flows evaluated here are paired into bidirectional flows for analysis, then response time can be measured. Perhaps C2 flows would respond more quickly than human actors. Complications would include accounting for the variation in Internet network congestion (loss and delay), link rate (serialization delay at the access link), and fundamental path latency, since chat and C2 could be interacting locally or globally. Such analysis would be simple for a botnet author to counter, by incorporating a random delay that models human response time into the bot source code, but it may still have some pragmatic short-term value.
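A speculative sketch of that response-time measurement follows, assuming time-ordered packet timestamps and directions are available for one paired (bidirectional) conversation. A bot answering in milliseconds with little variance would stand out from a human typist, at least until bot authors add the random delays noted above.

    def response_times(events):
        """Delay from each server message to the next client message.

        events: time-ordered (timestamp, direction) tuples,
        direction being "s2c" (server-to-client) or "c2s".
        """
        delays, pending = [], None
        for ts, direction in events:
            if direction == "s2c":
                pending = ts                # latest unanswered server message
            elif pending is not None:
                delays.append(ts - pending)
                pending = None
        return delays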

Acknowledgments

We wish to thank Doug Maughan and Cliff Wang for their support. We also thank David Kotz and gratefully acknowledge the use of wireless data from the CRAWDAD archive at Dartmouth College.

References

[1] E. Cooke, F. Jahanian, and D. McPherson, "The Zombie Roundup: Understanding, Detecting, and Disrupting Botnets," USENIX SRUTI Workshop, July 7, 2005.
[2] C. Livadas, R. Walsh, D. Lapsley, and W. T. Strayer, "Using Machine Learning Techniques to Identify Botnet Traffic," 2nd IEEE LCN Workshop on Network Security, Tampa, FL, November 14, 2006.
[3] A. W. Moore and D. Zuev, "Internet Traffic Classification using Bayesian Analysis Techniques," Proc. 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, Alberta, Canada, 2005.
[4] G. Gu, R. Perdisci, J. Zhang, and W. Lee, "BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection," Proceedings of the 17th USENIX Security Symposium (Security '08), 2008.
[5] W. T. Strayer, C. E. Jones, B. Schwartz, S. Edwards, W. Milliken, and A. W. Jackson, "Efficient Multi-Dimensional Flow Correlation," Proceedings of the 32nd IEEE Conference on Local Computer Networks (LCN), Dublin, Ireland, October 15-18, 2007.
[6] W. T. Strayer, R. Walsh, C. Livadas, and D. Lapsley, "Detecting Botnets with Tight Command and Control," Proceedings of the 31st IEEE Conference on Local Computer Networks (LCN), Tampa, FL, November 15-16, 2006.
[7] A. Householder, A. Manion, L. Pesante, G. M. Weaver, and R. Thomas, "Managing the Threat of Denial-of-Service Attacks," CERT Coordination Center, October 2001.
[8] P. Barford and V. Yegneswaran, "An Inside Look at Botnets," Special Workshop on Malware Detection, Advances in Information Security, Springer Verlag, 2006.
[9] The Honeynet Project, Know Your Enemy: Learning about Security Threats, 2nd Edition, Addison-Wesley, 2004.
[10] D. Kotz and T. Henderson, "CRAWDAD: A Community Resource for Archiving Wireless Data at Dartmouth," IEEE Pervasive Computing, Volume 4, Issue 4, October-December 2005.
