How to detect unknown malicious code efficiently?

Jaehee Lee
Department of Information Security, Korea University, Seoul, Republic of Korea

Hyoungjun Kim
Department of Consulting, Ahnlab, Seoul, Republic of Korea

Hyunsik Yoon
Department of Information Security, Korea University, Seoul, Republic of Korea

Kyungho Lee
Department of Information Security, Korea University, Seoul, Republic of Korea
Abstract—Recently, rapid developments in IT have led to the development of various platforms, and diverse malicious codes are being created to target these new platforms. Such new malicious code is a critical new threat to national infrastructure, especially to the important facilities whose compromise can lead to social chaos. In Korea, Korea Hydro and Nuclear Power was hacked and its blueprints were stolen and later posted online; this caused serious problems because the hacked organization was critical infrastructure. Antivirus (vaccine) solutions have therefore been sought as an effective means of analyzing the malicious codes that are created uncontrollably every day. However, the personnel who handle malicious code are few compared with the amount of newly created malicious code. How to detect unknown malicious code efficiently remains an open question. Answering it requires considering malicious code analysis methods, with the critical codes analyzed first. To analyze unknown malicious codes effectively, an Unknown malicious code detection model was introduced in our previous paper. However, this model sometimes treated normal files as malicious code, which decreased its effectiveness in finding and analyzing malicious code. It therefore became necessary to decrease the misdetection rate in order to increase the effectiveness of the model. In this research, we identified specific conditions that decrease the misdetection rate significantly, and in this paper we present a method that detects unknown malicious code more efficiently.
Keywords—IT security; Unknown malicious code detection model; Critical Infrastructure
I. INTRODUCTION
Information technology (IT) has become part of our lives and is involved in almost all of our daily activities; without it, we cannot live comfortably. With the advancement of technology, issues related to data and files have also arisen. With the widespread use of IT in society, various new IT-based platforms are being created, and malicious codes that target these platforms are being created as well, which makes preexisting signature-based malicious code detection difficult. Unknown malicious code has become an important Advanced Persistent Threat to critical infrastructure; one example is the hacking of Korea Hydro and Nuclear Power, which became a serious national security threat. Unknown malicious codes operate continuously without being detected by security systems. To achieve their malicious goals, they use a variety of mechanisms, such as stealing password information or generating traffic to attack other hosts through a connection to a command and control server [1]. However, because professionals in the field are limited in number, it is nearly impossible to analyze every malicious code in a short period of time. Thus, analyzing first the unknown malicious codes that cause severe problems in computer systems has become a priority. To solve this problem, an Advanced Unknown Malicious Code Detection Model that decides the priority of malicious code was presented in a previous paper [2]. However, after analyzing one hundred normal files and one hundred malicious files, we found cases in which normal files were treated as malicious, which decreased the effectiveness of malicious code detection. Finding a method to reduce the identification of normal files as malicious therefore became a critical issue and led to this research.
II. RELATED WORK
In this section, we introduce and discuss previous research on methods for effectively detecting malicious code, along with recent research and the problems of the previous methods.
A. Static Analysis of Executables to Detect Malicious Patterns
Detecting malicious code is an important part of information security [3]. Christodorescu and Jha [3] present a static analysis of executables to detect malicious code patterns. Malicious code writers try to obfuscate their code to evade anti-virus software. The authors tested the resilience of three commercial anti-virus products against code obfuscation and found that the products could not detect the obfuscated code.
They therefore present a new architecture for detecting obfuscated malicious code.
These days, the number of mobile malware samples has been increasing. Researchers have tried to detect malicious Java applets by using static analysis [4]. However, static analysis alone is limited in its ability to detect malicious code, so research on dynamic analysis was conducted to complement signature-based analysis.
B. Dynamic analysis of malicious code
Bayer et al. [5] present a dynamic analysis tool for analyzing the behavior of Windows executables. Dynamic analysis of executables is an important part of malicious code detection because static analysis has limits: for example, if anti-virus software lacks the pattern for a particular malicious code, it cannot detect that code, so the code's behavior must be analyzed and monitored. The authors developed a monitoring system for analyzing Windows executables, and because its accuracy was good, it is used for dynamic analysis of unknown malicious code. However, dynamic analysis alone is also limited in analyzing unknown malicious code, so researchers have used static and dynamic analysis together [6]. Some researchers also use the Cuckoo sandbox for dynamic analysis of malicious code [7].
C. Recent research on analysis of Android malicious code
Recently, Android phones have suffered from large-scale malware attacks, and much research focuses on detecting Android malware. The number of malicious codes has increased rapidly and many types have become more advanced, so analyzing malicious code is essential for defending systems. Because it is hard to detect malicious code by its behavior alone, the registry changes it makes must also be analyzed. Hong et al. proposed a new approach to malware analysis based on registry analysis [8].
III. ADVANCED UNKNOWN MALICIOUS CODE DETECTION MODEL
A. Applying Advanced Unknown malicious code detection model
Based on the malicious code detection model presented in our previous paper [2], we designed an actual malicious code detection program and used it to test the effectiveness of the model. In the previous study, we tested detection efficiency only against unknown malicious code. The rules used at that time included both static analysis and dynamic analysis rules, and we grouped them according to the characteristics of each rule [9]. The malware detection efficiency measured for each of the 24 rules is shown in Table I.
TABLE I. GROUPING RESULT
ID(a) | Rule Description | DR(b)
1 | Compaction of execution file or encryption (above entropy 7) | 29%
2 | Abnormal section form detection | 56%
3 | Hidden/system/read only files detection | 17%
4 | Unknown SECTION name detection | 37%
5 | Process search or control API usage detection | 2%
6 | Process INJECTION API usage detection | 10%
7 | Execution compaction (known packer) detection | 46%
8 | Execution compaction (unknown packer) detection | 37%
9 | File concealment attempt detection | 5%
10 | Process vulnerability attack attempt detection (Attempt of finding its execution place: Exploit Zero Call) | 2%
11 | Process vulnerability attack attempt detection (HEAP SPRAY) | 5%
12 | Attempt of generating thread in other process detection | 7%
13 | Possession of service or automatic start related registry handle detection | 15%
14 | Attempt of modifying memory of other process detection | 7%
15 | IAT hooking attempt detection | 2%
16 | Shell code API calling attempt detection | 2%
17 | PE generation in system path detection | 44%
18 | Compact executable file generation detection | 54%
19 | Registration of registry in service or automatic start detection | 68%
20 | Suspicious IP address connection attempt detection | 5%
21 | Key input information interception attempt detection | 10%
22 | System utility execution blocking | 7%
23 | Execution in suspicious path detection | 39%
24 | Batch file execution attempt detection | 10%
Detection groups, IDG(c): 1 Entropy calculation; 2 Malicious/Suspicious Scan; 3 Malicious/Suspicious String; 4 Packer; 5 Signature detection; 6 Process/Thread test; 7 Patch/Hook detection; 8 Stack/Heap test; 9 File generation/deletion/modification; 10 Registry generation/deletion/modification; 11 Network connection; 12 Execution attempt
a. Rule ID
b. Detection Rate
c. Group ID
TABLE III. CORRELATION ANALYSIS RESULT
Variable | Statistic | Severity | Detection Rate
Severity | Pearson correlation | 1 | -.721**
Severity | Significant probability (2-tailed) | | .000
Severity | N | 24 | 24
Detection Rate | Pearson correlation | -.721** | 1
Detection Rate | Significant probability (2-tailed) | .000 |
Detection Rate | N | 24 | 24
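Table III reports a Pearson correlation of -.721 between group severity and detection rate. The sketch below computes the sample Pearson coefficient from its definition, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)), and applies it, purely as an illustration, to the 12 group values of Table II. Because Table III was computed with N = 24, this is not expected to reproduce -.721 exactly, only its negative sign; the two-tailed significance test is omitted, and the function name is our own.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient r of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Severity (FGI score) and detection rate per group, copied from Table II (groups 1-12).
severity = [7.0, 4.0, 3.3, 3.5, 7.0, 6.0, 6.0, 6.0, 1.0, 4.0, 10.0, 1.0]
detection_rate = [0.32, 0.84, 0.51, 0.62, 0.08, 0.30, 0.08, 0.03, 0.92, 0.76, 0.05, 0.51]

if __name__ == "__main__":
    print(f"r = {pearson(severity, detection_rate):.3f}")
```

A negative r means that groups judged more severe tend to fire on fewer samples, which is why the ordering below weighs both quantities rather than detection rate alone.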
Reflecting the characteristics of the rules used at that time, we created a total of 12 detection groups (Table I). We then conducted a Focus Group Interview (FGI) with malicious code analysts at Ahnlab and calculated the severity of infection when the behavior corresponding to each rule occurs. The results are shown in Table II.
TABLE II. GROUP WEIGHT RESULT
IDG(d) | Group Description | Severity | Detection Rate
1 | Entropy calculation | 7.0 | 32%
2 | Malicious/Suspicious Scan | 4.0 | 84%
3 | Malicious/Suspicious String | 3.3 | 51%
4 | Packer | 3.5 | 62%
5 | Signature detection | 7.0 | 8%
6 | Process/Thread test | 6.0 | 30%
7 | Patch/Hook detection | 6.0 | 8%
8 | Stack/Heap test | 6.0 | 3%
9 | File generation/deletion/modification | 1.0 | 92%
10 | Registry generation/deletion/modification | 4.0 | 76%
11 | Network connection | 10.0 | 5%
12 | Execution attempt | 1.0 | 51%
d. Group ID
We used statistical analysis to decide the arrangement order of the groups with detection efficiency in mind. Arranging the groups by taking into account each group's severity as well as its infection detection rate makes it possible to determine an infection more quickly, because different malicious codes can affect a computer to different degrees depending on their severity. We therefore performed a correlation analysis between severity and detection rate; the results are shown in Table III.
On the basis of these results, we expressed the relationship between group detection rate and severity as an equation in the following variables:
Y : Group Detection Rate
X : FGI Score
C : Constant : 0.727
On the basis of this expression, we created a formula that determines the Ordering Score of each group.
As expected, arranging the malicious code detection rules according to this expression produced actual results with higher malware detection efficiency. We measured the infection detection rate on 41 samples while sequentially combining the groups in descending order of Ordering Score. The results are shown in Table IV.
TABLE IV. GROUP DETECTION RESULT
C(n)(e) | Detection number (n) | Detection rate (n/41)
2 | 34 | 82.9%
3 | 37 | 90.2%
4~5 | 38 | 92.7%
6~9 | 39 | 95.1%
10~14 | 40 | 97.6%
15 | 41 | 100%
e. Number of combination group
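Table IV reports how many of the 41 samples are detected as groups are combined one at a time in descending Ordering Score. Since the Ordering Score formula is not reproduced here, the sketch below takes the group order as an input rather than computing it. The sample data are made up, the input format and the function name `cumulative_detection` are hypothetical, and a sample is assumed to count as detected once any group combined so far has fired on it. The final loop converts the detection numbers of Table IV into rates out of 41.

```python
from typing import List, Sequence, Set, Tuple


def cumulative_detection(
    hits: Sequence[Set[int]], order: Sequence[int]
) -> List[Tuple[int, int, float]]:
    """Return (groups_used, detected, rate) after adding each group in `order`.

    hits[i] holds the IDs of the detection groups that fired on sample i
    (hypothetical input format); a sample counts as detected once any
    group combined so far has fired on it.
    """
    total = len(hits)
    active: Set[int] = set()
    rows: List[Tuple[int, int, float]] = []
    for k, group in enumerate(order, start=1):
        active.add(group)
        detected = sum(1 for h in hits if h & active)
        rows.append((k, detected, detected / total))
    return rows


if __name__ == "__main__":
    # Four made-up samples (not the paper's 41-sample set); `order` stands in
    # for a ranking by descending Ordering Score.
    samples = [{9}, {9, 10}, {2}, {12}]
    for k, n, rate in cumulative_detection(samples, order=[9, 2, 12]):
        print(f"{k} group(s): {n}/{len(samples)} detected ({rate:.0%})")

    # Detection numbers from Table IV, converted to rates out of 41 samples.
    for n in (34, 37, 38, 39, 40, 41):
        print(f"{n}/41 = {n / 41:.1%}")
```

The second loop prints 82.9%, 90.2%, 92.7%, 95.1%, 97.6% and 100.0%, matching the Detection rate column of Table IV.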
Figure 1 shows the series of steps we configured for arranging the groups used in malware analysis.
Figure 1. Malicious code detection Group Ordering Process
IV. EXPERIMENTAL RESULTS
A. Dataset
We planned experiments to improve the efficiency of the infection analysis program we developed. The Advanced Unknown malicious code detection model presented in our previous work was applied to one hundred normal files and one hundred malicious files in order to examine its effectiveness and to find and fix its limitations. The files, which have a variety of characteristics, were chosen randomly and provided by Ahnlab.
B. Implementation
To improve the program, we applied the ruleset to 100 unknown malicious code files and 100 normal files and analyzed each of them. The applied ruleset contained a total of 67 rules, created by adding rules to each group of the 24-rule ruleset mentioned above.
To reflect this process in an actual malware analysis program, we propose the algorithm shown in Figure 2, and on the basis of this algorithm we developed a program that detects unknown malicious code.
Figure 2. Advanced Unknown Malicious code Detection Model
The main purpose of the program was to observe whether it can detect unknown malicious code effectively; for that reason, proving its effectiveness through testing was important.
C. Results
After experimenting on these sample files, some problems appeared, as expected. In some instances, a normal file had characteristics similar to malicious code and was categorized as malicious code, which increased the misdetection rate. The results of the experiment are shown in Table V.
TABLE V. TEST RESULT
Index | Total Test Trial | Number of detection | Misdetection rate
Malicious code | 100 | 100 | 0%
Normal file | 100 | 45 | 45%
The experimental results show that the program detected malware with a probability of 100%, but misdetected normal files with a probability of 45%. The limitation of the previous model was its excessively high misdetection rate. Because the main aim of this study was to increase the effectiveness of malicious code detection, we examined which rules caused normal files to be identified as malicious code in order to reduce the misdetection rate. As a result, the rules listed in Table VI were found to reduce the malicious code detection efficiency.
TABLE VI. TROUBLED RULE
Rank | ID | Rule Description | Number of Misdetection
1 | 2 | Compression or encryption (over entropy 7) file detection | 30
2 | 6 | TLS callback function detection | 11
3 | 24 | Detection of execution file compression by known packer | 10
4 | 51 | Detecting copy of own file | 6
5 | 5 | Abnormal section characteristics detection | 5
6 | 26 | Detection of execution file compression by unknown packer | 5
7 | 53 | Detection of service or automatic execution related registry registration act | 4
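The top-ranked rule in Table VI (ID 2, 30 misdetections) flags files whose entropy is over 7. A minimal sketch of such a check follows, assuming Shannon entropy in bits per byte computed over the whole input; the paper does not say whether entropy is measured per file or per PE section, and the names `shannon_entropy` and `exceeds_entropy_threshold` are hypothetical. Any well-compressed or encrypted content, benign or not, approaches 8 bits per byte, which is consistent with this rule producing the most misdetections on normal files.

```python
import math
from collections import Counter

# Threshold from the rule "compression or encryption (over entropy 7) file detection".
ENTROPY_THRESHOLD = 7.0


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def exceeds_entropy_threshold(data: bytes, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Hypothetical rule check: True when entropy is strictly above the threshold."""
    return shannon_entropy(data) > threshold


if __name__ == "__main__":
    print(shannon_entropy(b"A" * 1024))  # 0.0
    print(shannon_entropy(bytes(range(256)) * 4))  # 8.0
    print(exceeds_entropy_threshold(b"hello world " * 500))  # False
```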
Taken together, these results show that behaviors such as packing or encryption, copying of the file itself, and registration for automatic execution are also common in normal files.
D. Improved process
We removed these rules and conducted the test again. The result is shown in Table VII.
TABLE VII. TEST RESULT 2
Index | Total Test Trial | Number of detection | Misdetection rate
Malicious code | 100 | 100 | 0%
Normal file | 100 | 5 | 5%
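Sections C and D amount to counting, for each rule, how many normal files it flags (Table VI), removing the worst offenders, and re-running the test (Table VII). A minimal sketch of that loop follows, under two assumptions not stated in the paper: a file is reported as malicious as soon as any enabled rule fires (the model of Figure 2 may combine rules differently), and rules are removed simply by rank. The hit sets and rule IDs are made up, and `flag_rate` and `rank_troubled_rules` are hypothetical names.

```python
from collections import Counter
from typing import AbstractSet, List, Sequence, Set, Tuple


def flag_rate(hits_per_file: Sequence[Set[int]], disabled: AbstractSet[int] = frozenset()) -> float:
    """Fraction of files flagged, assuming a file is flagged when any enabled rule fires."""
    flagged = sum(1 for hits in hits_per_file if hits - disabled)
    return flagged / len(hits_per_file)


def rank_troubled_rules(normal_hits: Sequence[Set[int]]) -> List[Tuple[int, int]]:
    """(rule_id, misdetections) pairs over normal files, most frequent first."""
    counts = Counter(rule for hits in normal_hits for rule in hits)
    return counts.most_common()


if __name__ == "__main__":
    # Made-up rule hits per file; rule IDs are illustrative only.
    normal = [{2}, {2, 6}, {6}, set(), {2, 17}]
    malicious = [{2, 17}, {19}, {2, 6, 19}]

    ranking = rank_troubled_rules(normal)
    print("ranking:", ranking)

    # Disable the two rules that cause the most misdetections, then re-test both sets.
    disabled = {rule for rule, _ in ranking[:2]}
    print("misdetection before:", flag_rate(normal))
    print("misdetection after: ", flag_rate(normal, disabled))
    print("detection after:    ", flag_rate(malicious, disabled))
```

Re-checking the malicious set after pruning matters as much as the misdetection rate: a removed rule is only acceptable if other rules still catch the malicious files, as they did in Table VII (100 of 100).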
These results confirm that the misdetection rate for normal files improved from the original 45% to 5%.
V. CONCLUSION
Unknown malicious code is either a modified version of preexisting malicious code or newly created code. Detection therefore has to consider both signatures and dynamic properties, which makes unknown malicious code detection difficult and time-consuming. In addition, dynamic analysis confirmed that normal files can exhibit behavior patterns similar to those of malicious code. By disregarding the rules that lowered the efficiency of the Unknown Malicious Code Detection Model, we successfully lowered the misdetection rate. Situations like this will occur very often when developing programs that analyze malware, so for future improvement we need to present a decision-making process that resolves them and reflect it in the program. Developing malware detection solutions is very important for national security, and several countries, including the United States, have invested heavily in training specialists to analyze malicious code. Nevertheless, professional malware analysis expertise remains very limited. For analyzing the unknown malicious code that appears constantly, with limited manpower, the advanced malicious code detection model can be a solution. We will also continue to improve the presented algorithm, since misdetections may occur in future dynamic analysis. If malicious code that has not yet been analyzed appears, new rules will be generated to increase the efficiency of the model. Conversely, when a newly added rule increases the misdetection rate, it will be removed through the same process and its role complemented by other rules. We are planning a new study using the model we have developed: during the test, we found that malicious code with dynamic characteristics acts differently on different versions of Windows. In the next study, we will analyze the reason for this behavior and develop a model that analyzes malicious code across different versions of Windows.
ACKNOWLEDGMENT
This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2015-R0992-15-1006) supervised by the IITP (Institute for Information & communications Technology Promotion).
REFERENCES
[1] Sikorski, Michael, and Andrew Honig. Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press, 2012.
[2] Hyoungjun et al. "Advanced Unknown Malicious Code Detection Model." Research Briefs on Information & Communication Technology Evolution 1.1 (2015).
[3] Christodorescu, Mihai, and Somesh Jha. Static analysis of executables to detect malicious patterns. Wisconsin Univ-Madison Dept of Computer Sciences, 2006.
[4] Armando, Alessandro, et al. "SAM: The Static Analysis Module of the MAVERIC Mobile App Security Verification Platform." Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 2015. 225-230.
[5] Bayer, Ulrich, et al. "Dynamic analysis of malicious code." Journal in Computer Virology 2.1 (2006): 67-77.
[6] Wang, Xiaolei, et al. "A Novel Hybrid Mobile Malware Detection System Integrating Anomaly Detection With Misuse Detection." Proceedings of the 6th International Workshop on Mobile Cloud Computing and Services. ACM, 2015.
[7] Vasilescu, Mihai, Lucian Gheorghe, and Nicolae Tapus. "Practical malware analysis based on sandboxing." RoEduNet Conference 13th Edition: Networking in Education and Research Joint Event RENAM 8th Conference, 2014. IEEE, 2014.
[8] Lee, Sungjin. "New Malware Analysis Method on Digital Forensics." Indian Journal of Science and Technology 8.17 (2015).
[9] Ligh, Michael, et al. Malware Analyst's Cookbook and DVD: Tools and Techniques for Fighting Malicious Code. Wiley Publishing, 2010.