DNS Query Failure and Algorithmically Generated Domain-Flux Detection

Ibrahim Ghafir and Vaclav Prenosil
Faculty of Informatics, Masaryk University, Brno, Czech Republic
[email protected], [email protected]

Keywords: Cyber attacks, botnet, domain flux, malware, intrusion detection system.

Abstract

Botnets are now recognized as one of the most serious security threats. Recent botnets such as Conficker, Murofet and BankPatch have used the domain flux technique to connect to their command and control (C&C) servers: each bot queries for the existence of a series of algorithmically generated domain names, used as rendezvous points with its controllers, while the botmaster has to register only one such domain name. The large number of potential rendezvous points makes it difficult for law enforcement to shut down such botnets effectively. In this paper we present our methodology for detecting algorithmically generated domain flux. Our detection method is based on the DNS query failures that the domain flux technique produces. We process network traffic, in particular DNS traffic, analyze all DNS query failures and propose a threshold for the number of DNS query failures from the same IP address. We applied our methodology to a packet capture (pcap) file containing real, long-lived malware traffic and showed that it successfully detects the domain flux technique and identifies the infected host. We also applied our methodology to live campus traffic and showed that it can automatically detect the domain flux technique and identify the infected host in real time.

1 Introduction

One of the most insidious cyber threats facing the security community is the diffusion of botnets. Botnets are networks formed by "enslaving" host computers, called bots (from robot), that are controlled by one or more attackers, called botmasters, with the intention of performing malicious activities [1]. In other words, bots are malicious programs running on host computers that allow botmasters to control those computers remotely and make them perform various actions [2]. The primary purpose of a botnet is to let the controlling criminal, group of criminals or organized crime syndicate use the hijacked computers for fraudulent online activity. Experts estimate that approximately 16-25% of the computers connected to the Internet are members of botnets [3], [4]. Some reports indicate that approximately 80% of all email traffic is spam and that most of these messages are sent via botnets such as Grum, Cutwail and Rustock [6]. Although these unwanted messages may be filtered at their destinations, they are generally allowed to travel along the Internet backbones, burdening traffic and wasting network resources. A botnet can be used to conduct cyber attacks, such as a DDoS attack against a target, or to conduct a cyber-espionage campaign to steal sensitive information. Botnets can be classified in various ways: by the architecture implemented, the network protocol used or the technology on which they are based [5]. Infected machines receive commands from C&C servers that instruct the overall architecture to achieve the purpose for which it was composed, such as creating SMTP mail relays for targeted spam campaigns, implementing a fraud scheme (e.g. gathering banking information) or launching a denial-of-service attack. To find the victim's binary repository or the C&C server, the malware installed during the initial infection (or the bot itself) must contain the addresses of those machines. These addresses may be encoded directly as a list of static (hardcoded) IP addresses or as a list of domain names, which can be static or dynamic, making it harder to disable the command and control channel. While this makes it more difficult to take down or block a specific C&C server, the use of a single static domain name constitutes a single point of failure [6]. Recent botnets such as Conficker, Murofet, BankPatch and Torpig have used a more advanced technique to connect to their C&C servers: domain flux, in which each bot independently uses a domain generation algorithm (DGA) to compute a list of domain names.
It is important, however, that the algorithm used for generating names be very efficient; otherwise name patterns may be inferred and future names could be registered to take over the botnet [7]. Using domain flux, each bot queries for the existence of a series of domain names used as rendezvous points with its controllers, while the owner has to register only one such domain name. The large number of potential rendezvous points makes it difficult for law enforcement to effectively shut down botnets. This technique was popularized by the Conficker family of worms: variants .A and .B initially generated 250 domain names per day. Starting with Conficker.C, the malware generated 50,000 domain names every day, of which it would attempt to contact 500, giving an infected machine a 1% chance of being updated each day if the malware controllers registered only one domain per day. To prevent infected computers from updating their malware, law enforcement would have needed to preregister 50,000 new domain names every day [13]. From the point of view of the botnet owners, the economics work out quite well: they only have to register one or a few of the many domains that each bot queries every day, whereas security vendors would have to preregister all the domains that a bot might query, even before the botnet owner registers them. In all the cases above, security vendors had to reverse engineer the bot executable to derive the exact algorithm used for generating domain names. In some cases their algorithm would predict domains successfully until the botnet owner patched the bots with a new executable using a different domain generation algorithm [8]. In this paper we present our methodology for detecting algorithmically generated domain flux. Our detection method is based on the DNS query failures that the domain flux technique produces. We process network traffic, in particular DNS traffic, analyze all DNS query failures and propose a threshold for DNS query failures from the same IP address, aiming to detect the domain flux technique and identify the infected host. We applied our methodology to a packet capture (pcap) file containing real, long-lived malware traffic and showed that it successfully detects the domain flux technique and identifies the infected host.
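The 1% update probability cited above follows from simple arithmetic; as a quick illustrative check in Python:

```python
generated_per_day = 50_000   # domains produced by the Conficker.C DGA daily
contacted_per_day = 500      # domains the bot actually attempts to reach
registered = 1               # domains the controllers register per day

# If the bot's 500 contacted domains are drawn from the 50,000 generated,
# the chance that the single registered domain is among them is:
p_update = contacted_per_day * registered / generated_per_day
print(p_update)  # 0.01, i.e. the 1% daily update probability
```

The same arithmetic shows the defenders' asymmetry: blocking updates requires covering all 50,000 names, while the controllers need only one.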
We also applied our methodology to live campus traffic and showed that it can automatically detect the domain flux technique and identify the infected host in real time. Our methodology can also be used to send an alert email to the network security team so that they can perform additional forensics. Furthermore, this alert can be correlated with the alerts of other detection methods to detect targeted attacks. The remainder of this paper is organized as follows. Section 2 presents previous work related to domain flux detection. Our methodology and the implementation of our detection method are explained in Section 3. Section 4 shows our results and Section 5 concludes the paper.

2 Related work

The authors of [9] developed a methodology to detect domain fluxes in DNS traffic by looking for patterns inherent to domain names that are generated algorithmically, in contrast to those generated by humans. In particular, they looked at the distribution of alphanumeric characters, as well as bigrams, in all domains mapped to the same set of IP addresses. They presented and compared the performance of several distance metrics, including the K-L distance, the edit distance and the Jaccard measure. They trained on a good dataset of domains obtained via a crawl of domains mapped to the whole IPv4 address space, and modeled bad datasets based on behaviors seen so far and expected. The authors of [10] tried to prevent end users from visiting the malicious sites that are the target of the domain flux technique. They described an approach based on automated URL classification, using statistical methods to discover the telltale lexical and host-based properties of malicious web site URLs. These methods learn highly predictive models by extracting and automatically analyzing tens of thousands of features potentially indicative of suspicious URLs. In particular, they explored the use of statistical machine learning methods for classifying site reputation based on the relationship between URLs and the lexical and host-based features that characterize them. They showed that these methods can sift through tens of thousands of features derived from publicly available data sources and can identify which URL components and metadata are important without requiring heavy domain expertise. The authors of [11] studied the problem of dynamically assigning reputation scores to new and unknown domains. Their main goal was to automatically assign a low reputation score to a domain involved in malicious activities, such as malware spreading, phishing and spam campaigns, and conversely to assign a high reputation score to domains used for legitimate purposes. Such reputation scores enable dynamic domain name blacklists that counter cyber attacks much more effectively: with dynamic blacklisting, their goal was to decide, even for a new domain, whether it is likely to be used for malicious purposes. To this end, they proposed Notos, a system that dynamically assigns reputation scores to domain names.
Their work was based on the observation that agile malicious uses of DNS have unique characteristics and can be distinguished from legitimate, professionally provisioned DNS services. With regard to botnet detection, the authors of [12] proposed a novel, passive approach for detecting and tracking malicious flux service networks. Their detection system is based on passive analysis of recursive DNS (RDNS) traffic traces collected from multiple large networks. Their approach is not limited to the analysis of suspicious domain names extracted from spam emails or precompiled domain blacklists; instead, it can detect malicious flux service networks in the wild, i.e. as they are accessed by users who fall victim to malicious content advertised through blog spam, instant messaging spam, social website spam, etc., besides email spam.

3 Methodology

In this section we propose our methodology for detecting algorithmically generated domain flux. In the domain flux technique, the infected host uses a domain generation algorithm (DGA) to query for the existence of a series of domain names that are expected to be C&C servers, while the owner has to register only one such domain name. This technique leads to many DNS query failures, because not all of these domain names are registered. Our detection method is based on the DNS query failures resulting from the domain flux technique. As shown in Figure 1, we process network traffic, in particular DNS traffic. We analyze all DNS query failures and propose a threshold for DNS query failures from the same IP address, aiming to detect the domain flux technique and identify the infected host.

If the current IP address already exists in the t_dns_failure table, we increase its counter by one (++t_dns_failure[c$id$orig_h]). After that, we check whether the number of DNS query failures is greater than dns_failure_threshold; we set this threshold to 50 DNS query failures per 5 minutes, based on the fact that recent malware can generate 50,000 domain names every day. When this condition is true, we delete the current IP address from the t_dns_failure table so that counting starts over. Then we check whether that IP address (the infected host) is absent from the t_dns_failure2 table; this table suppresses the alerts to one alert per day about the same IP address (the same infected host).
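The bookkeeping just described — a per-IP failure counter, a 50-failures-per-5-minutes threshold, and one alert per host per day — can be sketched as follows. This is an illustrative Python analogue of the logic, not our actual Bro script; the table and threshold names (t_dns_failure, t_dns_failure2, dns_failure_threshold) mirror those used above, and the exact window semantics are an assumption of the sketch.

```python
import time

dns_failure_threshold = 50      # failures per window before alerting
FAILURE_WINDOW = 5 * 60         # seconds: the count restarts after 5 minutes
SUPPRESS_WINDOW = 24 * 60 * 60  # seconds: one alert per host per day

t_dns_failure = {}   # src IP -> (timestamp of first failure, failure count)
t_dns_failure2 = {}  # src IP -> time of last alert (alert suppression)

def on_dns_failure(src_ip, now=None):
    """Register one NXDOMAIN failure; return True if an alert should fire."""
    now = now if now is not None else time.time()
    first_seen, count = t_dns_failure.get(src_ip, (now, 0))
    if now - first_seen > FAILURE_WINDOW:   # window expired: restart the count
        first_seen, count = now, 0
    count += 1
    t_dns_failure[src_ip] = (first_seen, count)
    if count < dns_failure_threshold:
        return False
    del t_dns_failure[src_ip]               # threshold hit: reset counting
    last_alert = t_dns_failure2.get(src_ip)
    if last_alert is not None and now - last_alert < SUPPRESS_WINDOW:
        return False                        # already alerted within one day
    t_dns_failure2[src_ip] = now
    return True
```

Feeding 50 failures from one host within 5 minutes fires exactly one alert; a second burst from the same host on the same day is suppressed by t_dns_failure2.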

Figure 1: Methodology of domain flux detection

We have implemented our detection method on top of Bro [14]. Bro is a passive, open-source network traffic analyzer. It is primarily a security monitor that inspects all traffic on a link in depth for signs of suspicious activity. The most immediate benefit of deploying Bro is an extensive set of log files that record a network's activity in high-level terms. These logs include not only a comprehensive record of every connection seen on the wire, but also application-layer transcripts such as all HTTP sessions with their requested URIs, key headers, MIME types and server responses; DNS requests with replies; and much more. Figure 2 explains the implementation of our detection method in Bro. First we start processing the network traffic. Bro reduces the incoming packet stream into a series of higher-level events, so we can filter the network traffic down to DNS traffic and handle the dns_message event. This event is generated for all DNS messages and provides information about the connection to the DNS server. On this event we check two conditions. The first is whether the connection was established by a host from our network, which we test with the is_local_addr function; this function returns true if an address belongs to one of the defined local networks and false otherwise. The second is whether the dns_message reports the DNS error NXDOMAIN; when a DNS server returns the NXDOMAIN code, the domain name does not exist (it is either not registered or invalid), and we can extract this information from the dns_message event (c$dns$rcode_name == "NXDOMAIN").
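The two per-event checks (local source, NXDOMAIN reply) can be approximated outside Bro as well. The following minimal Python sketch uses the stdlib ipaddress module; the local network prefixes and the numeric RCODE constant are illustrative assumptions, not values from our deployment.

```python
import ipaddress

# Illustrative "local networks" definition; a real deployment would list the
# monitored site's own prefixes (Bro's Site::local_nets serves this role).
LOCAL_NETS = [ipaddress.ip_network("10.0.0.0/8"),
              ipaddress.ip_network("192.168.0.0/16")]

NXDOMAIN = 3  # DNS RCODE 3 ("Name Error"): the queried name does not exist

def is_local_addr(addr):
    """Analogue of Bro's is_local_addr(): True if addr is in a local net."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in LOCAL_NETS)

def is_candidate_failure(src_ip, rcode):
    """True when a DNS reply should count as a local host's query failure."""
    return is_local_addr(src_ip) and rcode == NXDOMAIN

print(is_candidate_failure("10.1.2.3", 3))  # True: local host, NXDOMAIN
print(is_candidate_failure("8.8.8.8", 3))   # False: not a local source
```

Only replies passing both checks would feed the per-IP failure counter described above.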

Figure 2: Implementation of domain flux detection method

Then we save the source IP address (which queried for the unregistered domain name) in the t_dns_failure table; this table counts the DNS query failures of the same IP address.

Now we can write the following information into domain_flux.log:

timestamp = c$start_time
alert_type = "domain_flux_alert"
connection = c$id
infected_host = c$id$orig_h
domain_name = c$dns$query

We send an alert email about the domain flux detection to RT (Request Tracker), where the network security team can perform additional forensics and respond to it [15]. We also generate an event about the domain flux detection; this event can be used for alert correlation. Finally, we add the IP address of the infected host to the t_dns_failure2 table, where it stays for one day, ensuring that we do not raise another alert about the same infected host within one day.

4 Evaluation and results

We applied our methodology to a packet capture (pcap) file containing real, long-lived malware traffic, and it successfully detected the domain flux technique and identified the infected host. We obtained this pcap file from the Malware Capture Facility Project (MCFP), an effort by the ATG group at the Czech Technical University to capture, analyze and publish real, long-lived malware traffic [16]. The network traffic was captured between November 2013 and January 2014 in their capture facility. After analysis, they found that a malware sample used the domain flux technique to connect to its C&C server: a large group of packets was going to the IP address 192.35.51.30 on destination port 53/TCP, containing DNS requests for domains generated with a DGA. We applied our methodology to that pcap file, and our detection method detected the domain flux technique used by the malware and identified the infected host. All the information about the domain flux detection was written into domain_flux.log, as shown in Figure 3.

Figure 3: Part of domain_flux.log

Figure 4: Locations of DNS servers used by the malware

Moreover, those failing DNS queries were being sent to DNS servers chosen by the attacker, not to the servers defined by the network. We identified 25 DNS servers located in three different cities, as shown in Figure 4. Table 1 shows the top 5 DNS servers used by the malware.

DNS server      Location       Number of connections
192.41.162.30   US, Reston     10175
192.33.14.30    US, Reston     10105
192.42.93.30    US, Sterling   10051
192.12.94.30    US, Sterling   9987
192.48.79.30    US, Sterling   9980

Table 1: Top 5 DNS servers used by the malware

Although a large number of domains were generated with the DGA (e.g. ndyotrc.com, adtoytb.ru, affwifbcyvv.cc, axtyaptci.net), we noticed that the malware generated the same number of domains for each TLD, as shown in Figure 5.
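Rankings such as Table 1 can be derived from the logged failures by counting connections per DNS server. A minimal sketch with collections.Counter, where the records are hypothetical stand-ins for entries parsed from the connection log:

```python
from collections import Counter

# Hypothetical (source IP, DNS server IP) pairs, one per failed query,
# standing in for records parsed from the monitor's logs.
records = [("10.0.0.5", "192.41.162.30")] * 3 + \
          [("10.0.0.5", "192.33.14.30")] * 2 + \
          [("10.0.0.5", "192.42.93.30")] * 1

server_counts = Counter(server for _, server in records)
for server, n in server_counts.most_common(5):
    print(server, n)  # servers ordered by number of connections
```

Applied to the full capture, the same counting yields the per-server totals in Table 1.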

Figure 5: Number of generated domains for each TLD

We also applied our methodology to live campus traffic, and our detection method automatically detected the domain flux technique and identified the infected host in real time. We ran our detection method on a server monitoring the campus traffic and then used one computer in the campus network to simulate the domain flux technique; we also ran our detection method on that computer. We tried to connect to domains that we knew in advance did not exist (e.g. aomdkmr.com). We reduced dns_failure_threshold to 3 DNS query failures per 5 minutes, instead of 50 per 5 minutes, because we did not want to repeat these connections 50 times. The technique was detected immediately and reported on the computer's screen in real time, as shown in Figure 6.

Figure 6: Detection of domain flux in the real time
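For this kind of simulation, a toy generator of pseudo-random domain names suffices. The sketch below is purely illustrative and unrelated to any real malware's DGA; the label length, seed and TLD list are arbitrary choices.

```python
import random
import string

def toy_dga(seed, count=10, length=8, tlds=(".com", ".ru", ".cc", ".net")):
    """Generate `count` pseudo-random domain names from a fixed seed."""
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    domains = []
    for _ in range(count):
        label = "".join(rng.choice(string.ascii_lowercase)
                        for _ in range(length))
        domains.append(label + rng.choice(tlds))
    return domains

for d in toy_dga(seed=2014, count=3):
    print(d)  # names qualitatively similar to ndyotrc.com, adtoytb.ru above
```

Querying a handful of such names from one host is enough to cross the reduced threshold of 3 failures per 5 minutes used in our live test.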

Information about this detection was also written into domain_flux.log on the monitoring server; the log is similar to the one in Figure 3. Our detection method also sent an alert email about this detection to RT, where the network security team can perform additional forensics and respond to it. Figure 7 shows the BroDomainFlux ticket that was sent by email to RT.

Figure 7: BroDomainFlux ticket

5 Conclusion

In this paper we presented our methodology for detecting algorithmically generated domain flux. Our detection method is based on the DNS query failures resulting from the domain flux technique. We process network traffic, in particular DNS traffic, analyze all DNS query failures and propose a threshold for DNS query failures from the same IP address. We applied our methodology to a packet capture (pcap) file containing real, long-lived malware traffic, and also to live campus traffic, showing that it can automatically detect the domain flux technique and identify the infected host in real time. Our methodology can also be used to send an alert email to the network security team for additional forensics. Furthermore, this alert can be correlated with the alerts of other detection methods to detect targeted attacks.

Acknowledgements This work has been supported by the project “CYBER-2” funded by the Ministry of Defence of the Czech Republic under contract No. 1201 4 7110.

References

[1] P. Bacher, T. Holz, M. Kotter, and G. Wicherski, "Know your enemy: Tracking botnets," 2005, URL http://www.honeynet.org/papers/bots, vol. 4, pp. 24-33.

[2] H. Choi, H. Lee, and H. Kim, "BotGAD: detecting botnets by capturing group activities in network traffic," in Proceedings of the Fourth International ICST Conference on COMmunication System softWAre and middlewaRE. ACM, 2009, p. 2.

[3] W. Sturgeon, "Net pioneer predicts overwhelming botnet surge," ZDNet News, January, vol. 29, 2007.

[4] B. AsSadhan, J. M. Moura, D. Lapsley, C. Jones, and W. T. Strayer, "Detecting botnets using command and control traffic," in Network Computing and Applications, 2009. NCA 2009. Eighth IEEE International Symposium on. IEEE, 2009, pp. 156-162.

[5] L. Jing, X. Yang, G. Kaveh, D. Hongmei, and Z. Jingyuan, "Botnet: classification, attacks, detection, tracing, and preventive measures," EURASIP Journal on Wireless Communications and Networking, vol. 2009, 2009.

[6] S. S. Silva, R. M. Silva, R. C. Pinto, and R. M. Salles, "Botnets: A survey," Computer Networks, vol. 57, no. 2, pp. 378-403, 2013.

[7] B. Stone-Gross, M. Cova, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, "Analysis of a botnet takeover," IEEE Security & Privacy, vol. 9, no. 1, pp. 64-72, 2011.

[8] B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna, "Your botnet is my botnet: analysis of a botnet takeover," in Proceedings of the 16th ACM Conference on Computer and Communications Security. ACM, 2009, pp. 635-647.

[9] S. Yadav, A. K. K. Reddy, A. N. Reddy, and S. Ranjan, "Detecting algorithmically generated domain-flux attacks with DNS traffic analysis," IEEE/ACM Transactions on Networking, vol. 20, no. 5, pp. 1663-1677, 2012.

[10] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, "Beyond blacklists: learning to detect malicious web sites from suspicious URLs," in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009, pp. 1245-1254.

[11] M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster, "Building a dynamic reputation system for DNS," in USENIX Security Symposium, 2010, pp. 273-290.

[12] R. Perdisci, I. Corona, D. Dagon, and W. Lee, "Detecting malicious flux service networks through passive analysis of recursive DNS traces," in Computer Security Applications Conference, 2009. ACSAC '09. Annual. IEEE, 2009, pp. 311-320.

[13] Wikipedia, "Domain generation algorithm," http://en.wikipedia.org/wiki/Domain_generation_algorithm, accessed: 1-9-2014.

[14] Bro project, "The Bro network security monitor," http://www.bro.org/, accessed: 1-9-2014.

[15] Wikipedia, "Request tracker," http://en.wikipedia.org/wiki/Request_Tracker, accessed: 1-9-2014.

[16] Weebly, "Malware capture facility project," http://mcfp.weebly.com/, accessed: 1-9-2014.
