Malicious File Hash Detection and Drive-by Download Attacks

Ibrahim Ghafir, Vaclav Prenosil
Faculty of Informatics, Masaryk University, 60200 Brno, Czech Republic
[email protected], [email protected]

Abstract. Malicious web content has become an essential tool used by cybercriminals to carry out attacks on the Internet. In addition, attacks that target web clients, rather than infrastructure components, have become prevalent. Malware drive-by downloads are a recent challenge, as their use in malware distribution attacks appears to be increasing substantially. In this paper we present our methodology for detecting any malicious file downloaded by one of the network hosts. Our detection method is based on a blacklist of malicious file hashes. We process the network traffic, analyze all connections and calculate the MD5, SHA1 and SHA256 hashes of each new file seen being transferred over a connection. We then match the calculated hashes against the blacklist. The blacklist of malicious file hashes is updated automatically each day, and detection takes place in real time.

Keywords: Cyber attacks, botnet, malware, malicious file hash, intrusion detection system.

1 Introduction

A drive-by download is any installation of software that occurs without the knowledge and permission of the user. Nowadays, drive-by downloads form a serious threat to the Internet and its users [1]. In a typical attack, the user's computer can be infected with malware merely by visiting a web site that contains the malicious content. The malicious code installed on the victim's machine can then control the infected machine and perform malicious activities. Typically, sensitive data is exfiltrated, passwords are stolen and keystrokes are recorded. The infected computer may also become a member of a botnet [2], and such computers can then be exploited for denial-of-service attacks [3] or spam campaigns [4].

In a typical drive-by download attack, a web browser first requests a web page from a remote web server. The server returns a web page in response to the request; this web page contains malicious code that exploits a vulnerability in the web browser. The malware can be delivered as part of the malicious code, or a downloader, which is a special payload, can be used to pull and then execute malware on the local workstation [5]. The whole attack occurs without the user's knowledge or permission.

In this paper we present our methodology for detecting any malicious file downloaded by one of the network hosts. Our detection method is based on a blacklist of malicious file hashes. We process the network traffic, analyze all connections and calculate the MD5, SHA1 and SHA256 hashes of each new file seen being transferred over a connection. We then match the calculated hashes against the blacklist. A file hash blacklist is generally not effective at detecting new or previously unknown malicious files, because a file's hash has to be known before it can be added to the blacklist. However, as part of a larger solution, blacklisting file hashes is generally worth the effort. Relying on a file hash blacklist is relatively low cost, in that using the hashes in blocking rules or log searches does not severely impact system performance.

The remainder of this paper is organized as follows. Section 2 presents previous work related to the detection of drive-by download attacks. Our methodology and the implementation of our detection method are explained in Section 3. Section 4 shows the results and Section 5 concludes the paper.

2 Related Work

In [6], Hsu et al. proposed BrowserGuard, a runtime, behavior-based system. BrowserGuard protects the browser against drive-by download attacks by recording the download scenario of every file that is loaded into the host through the browser. Based on this scenario, it can alert the user to suspicious downloaded files. Another drive-by download detection method was proposed by Zhang et al. [7]. Their approach does not rely on the malicious content; instead, their algorithm depends on the URLs of the central servers of the malware distribution network (MDN). Based on those URLs, they generate a set of regular expression-based signatures. Google conducted a study investigating malware pushed onto client machines as a result of drive-by download attacks [8]. The Google researchers found that the malware pushed by a URL does not change, while some URLs change over time. Cujo, a system for automatic detection and prevention of drive-by download attacks, was presented by Rieck et al. [9]. It uses efficient machine learning mechanisms to extract and analyze static and dynamic code features.

Many drive-by download attacks rely on heap spraying [10]. Nozzle [11] detects heap spray attacks based on the observation that the shellcode used in a heap spray attack is often prepended with a long NOP sled. Gadaleta et al. [12] insert interrupt instructions into each JavaScript string variable before storing it in the heap, and restore the original string before it is used. In [1], Provos et al. conducted a large-scale, long-duration study in which billions of URLs were analyzed. Of these URLs, more than 3 million launched drive-by download attacks. They found that drive-by downloads are triggered by visiting web sites that are not necessarily malicious in intent, but whose content draws users into the malware distribution network. Song et al. [13] proposed a mechanism based on inter-module communication monitoring to detect malicious exploitation of vulnerable components.

3 Methodology

In this section we propose our methodology for detecting any malicious file downloaded by one of the network hosts. Our detection method is based on a blacklist of malicious file hashes. As shown in Figure 1, we process the network traffic, analyze all connections and calculate the MD5, SHA1 and SHA256 hashes of each new file seen being transferred over a connection. We then match the calculated hashes against the blacklist. The blacklist of malicious file hashes is updated automatically each day, and detection takes place in real time. We have implemented our detection method on top of Bro [14], a passive, open-source network traffic analyzer.
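The hash-and-match step can be illustrated outside Bro with a short Python sketch. It is not the implementation used in this paper: the function names `file_hashes` and `match_blacklist` are hypothetical, and the blacklist is assumed to be an in-memory set of lowercase hexadecimal digests.

```python
import hashlib
from typing import Dict, Iterable, Optional, Set


def file_hashes(chunks: Iterable[bytes]) -> Dict[str, str]:
    """Compute MD5, SHA1 and SHA256 digests over a stream of file chunks.

    Feeding the data chunk by chunk mirrors how a file is seen on the wire:
    the whole file never has to be held in memory.
    """
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    for chunk in chunks:
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def match_blacklist(hashes: Dict[str, str], blacklist: Set[str]) -> Optional[str]:
    """Return the first computed digest present in the blacklist, or None."""
    for value in hashes.values():
        if value.lower() in blacklist:
            return value
    return None
```

Computing all three digests means a file is caught whichever hash type the blacklist happens to list for it, at the cost of three passes of hashing work per chunk.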

Fig. 1. Methodology of malicious file hash detection.

We have made use of the Bro Intelligence Framework [15], which makes it possible to consume data from different data sources and make it available for matching. In our detection method we configured the Intelligence Framework to monitor all hashes of files seen being transferred over the network. We connected the framework to the blacklist.intel text file, which contains the file hash blacklist.

Figure 2 explains the implementation of our detection method in Bro. First, we start processing the network traffic. Bro is able to reduce the incoming packet stream into a series of higher-level events, so we can receive the file_new event. This event indicates that the analysis of a new file, seen being transferred over a connection, has begun. At this point, we calculate the MD5, SHA1 and SHA256 hashes of the current file; these calculations are performed by three functions in Bro, namely add_analyzer_md5, add_analyzer_sha1 and add_analyzer_sha256. All calculated hashes are sent to the Intelligence Framework, which checks whether they are present in the intelligence data set (the blacklist.intel text file). When a piece of intelligence data is detected, the Intelligence Framework generates an Intel::match event. This event is generated for every indicator_type, because the intelligence data set may contain many indicator types (such as ADDR, DOMAIN and CERT_HASH) and not only FILE_HASH. We therefore treat the connection as relevant only if the indicator_type of the match is FILE_HASH.
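To make the role of the indicator type concrete, the following Python sketch extracts only the file hash indicators from an intelligence file. It is an illustration, not part of our Bro scripts. It assumes a tab-separated layout with a `#fields` header line naming at least the `indicator` and `indicator_type` columns, and a type value written as `Intel::FILE_HASH`; check these details against the framework documentation before relying on them. The helper name `load_file_hashes` is hypothetical.

```python
from typing import Iterable, List, Set


def load_file_hashes(lines: Iterable[str]) -> Set[str]:
    """Collect indicators of type Intel::FILE_HASH from intel-file lines.

    Assumed layout: a '#fields' header followed by tab-separated rows.
    Rows of other indicator types (addresses, domains, ...) are skipped.
    """
    fields: List[str] = []
    hashes: Set[str] = set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#fields"):
            # Column names follow the '#fields' token on the same line.
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split("\t")))
        if row.get("indicator_type") == "Intel::FILE_HASH" and "indicator" in row:
            hashes.add(row["indicator"].lower())
    return hashes
```

The returned set can be used directly as a blacklist for membership tests, which keeps each lookup constant time regardless of how many indicators the file contains.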

Fig. 2. Implementation of malicious file hash detection method.

This detection method is able to detect a malicious file whether it is uploaded or downloaded, but we are interested only in downloaded malicious files. We therefore check whether the connection is directed to a host in our network by passing the destination IP address of the connection to the is_local_addr function; this function returns true if an address belongs to one of the defined local networks and false otherwise. For this check, the subnet of our network has to be defined.

Before raising an alert, we check whether we have already raised an alert for the same host and the same file hash during the last day, because we do not want to send many alerts about the same set (host and hash) within one day. We therefore check whether the current set exists in the t_suppress_hash_alert table, which contains all sets detected during the last day. If it does not, we send an alert email about the malicious file hash detection to RT (Request Tracker), where the network security team can perform additional forensics and respond to it. We also generate an event, hash_alert, about the malicious file hash detection; this event can be used for alert correlation [16]. Finally, we add the current detected set (host and hash) to the t_suppress_hash_alert table, where it stays for one day, to ensure that we do not get another alert about the same set during that day.
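The local-destination check and the one-day alert suppression can be sketched in Python as follows. This is a simplified model of the logic, not our Bro code: `LOCAL_NETS` uses a documentation-only subnet as a placeholder, and `is_local`, `AlertSuppressor` and `handle_match` are hypothetical names that stand in for is_local_addr and the t_suppress_hash_alert table.

```python
import ipaddress
import time
from typing import Dict, Optional, Tuple

# Placeholder subnet (reserved for documentation); replace with the real local networks.
LOCAL_NETS = [ipaddress.ip_network("192.0.2.0/24")]
SUPPRESS_SECONDS = 24 * 60 * 60  # keep each (host, hash) set for one day


def is_local(addr: str) -> bool:
    """Return True if addr belongs to one of the defined local networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in LOCAL_NETS)


class AlertSuppressor:
    """Remembers (host, hash) sets so each one alerts at most once per day."""

    def __init__(self, ttl: float = SUPPRESS_SECONDS) -> None:
        self.ttl = ttl
        self._seen: Dict[Tuple[str, str], float] = {}

    def should_alert(self, host: str, file_hash: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Drop entries older than the suppression window.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        key = (host, file_hash)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


def handle_match(dst_addr: str, file_hash: str, suppressor: AlertSuppressor,
                 now: Optional[float] = None) -> bool:
    """Return True if an alert should be raised for this blacklist match."""
    return is_local(dst_addr) and suppressor.should_alert(dst_addr, file_hash, now)
```

Keying the table on the pair of host and hash, rather than on the hash alone, means a second host downloading the same file still produces its own alert.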

Figure 3 shows how the blacklist.intel text file used in our methodology is updated automatically.

Fig. 3. Automatic update of the intelligence data.

We start from the user crontab file, which is configured to run blacklist_update.sh each day at 3:00 am. This shell script connects over the Internet to the data source server [17] and downloads the updated blacklist of malicious file hashes into a new blacklist.intel text file. This text file is connected to the Intelligence Framework, which consumes it as explained above. The automatic update is performed without stopping the live monitoring of network traffic.
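One way to replace the blacklist file while a monitor is reading it is to write the new content to a temporary file and then rename it over the old one. The Python sketch below illustrates that idea only; it is not the blacklist_update.sh script. The `fetch` callable is a hypothetical stand-in for the download from the data source server, and `update_blacklist` is an assumed name.

```python
import os
import tempfile
from typing import Callable


def update_blacklist(fetch: Callable[[], str], target_path: str) -> None:
    """Replace target_path with freshly fetched content in one atomic step.

    If fetching or writing fails, the existing blacklist is left untouched,
    so a reader never sees a missing or half-written file.
    """
    content = fetch()  # may raise; nothing has been modified at this point
    directory = os.path.dirname(os.path.abspath(target_path))
    # The temporary file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".blacklist-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A daily run at 3:00 am corresponds to the crontab schedule fields `0 3 * * *`.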

4 Evaluation and Results

We used three scenarios to evaluate our methodology. In the first one, we ran a script, downloading malicious files, on a computer connected to the monitored network. In the second scenario, we applied our method on a pcap file which contained malicious files transferred over the connections. In the third scenario, we monitored the campus network for hosts downloading malicious files.

Fig. 4. Detection delay of our method.

In the first scenario, a script downloading malicious files was installed on a computer connected to the Internet through a network monitored by our detection method. In fact, the downloaded files were not malicious; we added their hashes to the blacklist only for this test. In this scenario, we focused on the real-time detection capabilities of our method. The method was set up to send a report to RT (Request Tracker) as soon as a malicious file was detected. Request Tracker is often used to help incident handlers deal with events that need an automated action or human attention. The test consisted of the following steps. First, a script downloaded a malicious file and noted the connection time with millisecond precision. Second, the detection method detected the downloaded malicious file and automatically created an RT ticket. Third, we received the RT ticket and noted the time of arrival with millisecond precision. We compared the connection time with the time of RT ticket arrival and noted the detection delay. The average detection delay was 340 ms with a standard deviation of 60 ms. Figure 4 shows the results.

In the second scenario, we applied our methodology to a pcap file containing traffic infected by Nuclear EK malware, which has the MD5 file hash dc5c71aef24a5899f63c3f9c15993697 [18]. This pcap file had been analyzed by the provider, so we used this fact to set the ground truth. The infection was delivered by a drive-by download attack, and five malicious IPs were involved. We set up our method so that it consumed the pcap file and produced a log file. Applied to the pcap file, our detection method successfully detected the malicious file (Nuclear EK malware) and determined the connection over which the malware had been downloaded. Note that we did not provide the ground truth blacklist to our detection method. Figure 5 shows part of blacklist_detection_hash.log produced by our detection method. The log file contains more information about the malicious connection than is shown in the figure (such as the source IP address and the source and destination ports); the figure shows only the relevant part of the log.

Fig. 5. Part of the log produced by our detection method.

In the third scenario, we monitored the live campus traffic to detect hosts involved in downloading malicious files. The method was set up to create a log file of detected malicious file hashes. We set up a server hosting our detection method and passively analyzing the live campus traffic. The monitoring was performed for one month. We correlated the list of hosts involved in downloading malicious files with the results of a malicious IP address detection method. As shown in Figure 6, 19 hosts were detected downloading malicious files and 37 hosts were detected in connections to malicious IP addresses. Of these, 12 hosts were detected by both methods, indicating that there was a malware infection.
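The correlation of the two host lists amounts to a set intersection. The Python sketch below shows the idea with a hypothetical helper, `correlate_hosts`; it is not the correlation code used in the evaluation. With the counts reported above, 19 - 12 = 7 hosts were seen only by the file hash method, 37 - 12 = 25 only by the IP address method, and 19 + 37 - 12 = 44 distinct hosts were flagged overall.

```python
from typing import Dict, Iterable, Set


def correlate_hosts(hash_hosts: Iterable[str], ip_hosts: Iterable[str]) -> Dict[str, Set[str]]:
    """Split detected hosts into those seen by both methods or by only one."""
    hash_set, ip_set = set(hash_hosts), set(ip_hosts)
    return {
        "both": hash_set & ip_set,        # strongest indication of infection
        "hash_only": hash_set - ip_set,   # malicious download, no malicious IP contact
        "ip_only": ip_set - hash_set,     # malicious IP contact, no malicious download
    }
```

Hosts in the "both" group are the natural candidates to investigate first, since two independent detection methods agree on them.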

Fig. 6. Detected hosts by malicious file hash and IP detection methods.

Our detection method also sent an alert email about each malicious file hash detection to RT, where the network security team can perform additional forensics and respond to it. Figure 7 shows an example of a Bro_Malicious_Hash ticket that was sent by email to RT.

Fig. 7. Bro_Malicious_Hash ticket.

5 Conclusion and Future Work

Drive-by download attacks are among the most serious security threats to computer and network systems today. In this paper we have presented our methodology for detecting any malicious file downloaded by one of the network hosts. Our detection method is based on a blacklist of malicious file hashes. The blacklist is updated automatically each day, and detection takes place in real time. In future work, the output of this detection method will be correlated with the outputs of other detection methods to raise alerts on APT attacks.

Acknowledgments. This work has been supported by the project CYBER-2 funded by the Ministry of Defence of the Czech Republic under contract No. 1201 4 7110.

References

1. N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose, “All your iframes point to us,” in USENIX Security Symposium, 2008, pp. 1–16.
2. S. S. Silva, R. M. Silva, R. C. Pinto, and R. M. Salles, “Botnets: A survey,” Computer Networks, vol. 57, no. 2, pp. 378–403, 2013.
3. D. Moore, C. Shannon, D. J. Brown, G. M. Voelker, and S. Savage, “Inferring internet denial-of-service activity,” ACM Transactions on Computer Systems (TOCS), vol. 24, no. 2, pp. 115–139, 2006.
4. C. Kanich, C. Kreibich, K. Levchenko, B. Enright, G. M. Voelker, V. Paxson, and S. Savage, “Spamalytics: An empirical analysis of spam marketing conversion,” in Proceedings of the 15th ACM conference on Computer and communications security. ACM, 2008, pp. 3–14.
5. C. Seifert, “Cost-effective detection of drive-by-download attacks with hybrid client honeypots,” 2010.
6. F.-H. Hsu, C.-K. Tso, Y.-C. Yeh, W.-J. Wang, and L.-H. Chen, “Browserguard: A behavior-based solution to drive-by-download attacks,” Selected Areas in Communications, IEEE Journal on, vol. 29, no. 7, pp. 1461–1468, 2011.
7. J. Zhang, C. Seifert, J. W. Stokes, and W. Lee, “Arrow: Generating signatures to detect drive-by downloads,” in Proceedings of the 20th international conference on World wide web. ACM, 2011, pp. 187–196.
8. N. Provos, D. McNamee, P. Mavrommatis, K. Wang, N. Modadugu et al., “The ghost in the browser analysis of web-based malware,” in Proceedings of the first conference on First Workshop on Hot Topics in Understanding Botnets, 2007, pp. 4–4.
9. K. Rieck, T. Krueger, and A. Dewald, “Cujo: efficient detection and prevention of drive-by-download attacks,” in Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 2010, pp. 31–39.
10. A. Sotirov, “Heap feng shui in javascript,” Black Hat Europe, 2007.
11. P. Ratanaworabhan, V. B. Livshits, and B. G. Zorn, “Nozzle: A defense against heap-spraying code injection attacks,” in USENIX Security Symposium, 2009, pp. 169–186.
12. F. Gadaleta, Y. Younan, and W. Joosen, “Bubble: A javascript engine level countermeasure against heap-spraying attacks,” in Engineering Secure Software and Systems. Springer, 2010, pp. 1–17.
13. C. Song, J. Zhuge, X. Han, and Z. Ye, “Preventing drive-by download via inter-module communication monitoring,” in Proceedings of the 5th ACM symposium on information, computer and communications security. ACM, 2010, pp. 124–134.
14. The-Bro-Project, “The bro network security monitor,” https://www.bro.org/, accessed: 15-02-2015.
15. Bro-Project, “Intelligence framework,” https://www.bro.org/sphinx/frameworks/intel.html, accessed: 15-02-2015.
16. Y. B. Leau, S. F. Tan, S. Manickam et al., “A comparative study of alert correlations for intrusion detection,” in Proceedings-2013 International Conference on Advanced Computer Science Applications and Technologies, ACSAT 2013. IEEE, 2014, pp. 85–88.
17. Computer-Incident-Response-Center-Luxembourg, “Md5, sha1 and sha256 blocklist,” http://misp.circl.lu/, accessed: 15-02-2015.
18. Network-Traffic-Analysis, “Nuclear EK delivers digitally-signed cryptowall malware,” http://malware-traffic-analysis.net/2014/09/29/index.html, accessed: 15-02-2015.
