The Computer Journal Advance Access published December 15, 2014
© The British Computer Society 2014. All rights reserved. For permissions, please email: [email protected]
doi:10.1093/comjnl/bxu148

Annotated Control Flow Graph for Metamorphic Malware Detection

Shahid Alam1*, Issa Traore2 and Ibrahim Sogukpinar3
1 Department of Computer Science, University of Victoria, Victoria, BC, Canada
2 Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada
3 Department of Computer Engineering, Gebze Institute of Technology, Gebze, Kocaeli, Turkey
*Corresponding author: [email protected]

Abstract: Metamorphism is a technique that mutates the binary code of a program using different obfuscations and never keeps the same sequence of opcodes in memory. This stealth technique enables a malware to evade detection by simple signature-based (such as instruction-sequence, byte-sequence and string-signature) anti-malware programs. In this paper, we present a new scheme named Annotated Control Flow Graph (ACFG) to efficiently detect such malware. An ACFG is built by annotating the CFG of a binary program and is used for graph and pattern matching to analyse and detect metamorphic malware. We also optimize the runtime of malware detection through parallelization and ACFG reduction, while maintaining the same detection accuracy as without ACFG reduction. The ACFG proposed in this paper: (i) captures the control flow semantics of a program; (ii) provides faster matching of ACFGs and can handle malware with smaller CFGs, compared with other such techniques, without compromising accuracy; (iii) contains more information than a plain CFG and hence provides more accuracy. Experimental evaluation of the proposed scheme using an existing dataset yields a malware detection rate of 98.9% and a false positive rate of 4.5%.

Keywords: annotated control flow graph; static binary analysis; malware detection; optimizations

Received 24 August 2014; revised 7 November 2014
Handling editor: Steven Furnell

1. INTRODUCTION

End point security is often the last defense against a security threat. An end point can be a desktop, server, laptop, kiosk or mobile device that connects to a network (the Internet). Recent statistics by the International Telecommunications Union [1] show that the number of Internet users (i.e. people connecting to the Internet using these end points) in the world increased from 20% of the population in 2006 to 40% (almost 2.7 billion in total) in 2013. A study carried out by Symantec about the impacts of cybercrime reports that worldwide losses due to malware attacks and phishing between July 2011 and July 2012 were $110 billion [2]. Because of the financial and other gains attached to the growing malware industry, there is a need to automate the process of malware analysis and provide real-time malware detection. In the early days, malware writers were hobbyists; now professionals have become part of this group, either for the financial gains [3] attached to it, or for political and military reasons.

One of the basic techniques used by a malware writer is obfuscation [4]. Such a technique obscures code to make it difficult to understand, analyze and detect the malware embedded in it. Initial obfuscators were simple and could be caught by simple signature-based detectors. These detectors work on simple signatures such as byte sequences, instruction sequences and string signatures (a pattern of a malware that uniquely identifies it); however, they lack information about the semantics or behavior of the malicious program. To counter these detectors, obfuscation techniques evolved, producing metamorphic malware that use stealthy mutation techniques [4–8]. To address effectively the challenges posed by metamorphic malware, we need to develop new methods and techniques that analyze the behavior of a program and make a better detection decision with few false positives.
The main goal of this paper is to extract behavioral and structural information from a program to detect the presence of malware in the program. Control flow analysis (CFA) [9, 10] is one of the techniques used in compilers for program analysis and optimization. The CFA of a program is expressed as a control flow graph (CFG). The CFG of a program represents all the paths that can be taken during the program's execution. Current techniques [11–20] that use CFGs for malware detection are compute intensive (or have a poor detection rate) and cannot handle malware with smaller CFGs. It is difficult to write a new metamorphic malware [21], and in general malware writers reuse old malware. To evade detection, the malware writers change the obfuscations (syntax) more than the behavior (semantics) of such a new metamorphic malware. If an unknown metamorphic malware uses all or some of the same classes of behaviors as are used by the training dataset (a set of old metamorphic malware), then it is possible to detect these types of malware.

Based on the assumption and motivation described above, we propose a new technique named Annotated Control Flow Graph (ACFG) that can enhance the detection of metamorphic malware and can handle malware with smaller CFGs. We also optimize the runtime of malware detection through parallelization and ACFG reduction, making the comparison (matching with other ACFGs) faster than other techniques that use CFGs for malware detection, while maintaining the same detection accuracy as without ACFG reduction. The annotations in an ACFG provide more information than a plain CFG and hence can provide more accuracy. The ACFG proposed in this paper (the work presented here is an expansion of the previously published work in [22, 23]): (i) captures the control flow semantics of a program; (ii) provides faster matching of ACFGs and can handle malware with smaller CFGs, in comparison with other such techniques, without compromising accuracy; (iii) contains more information and hence provides more accuracy than a CFG.

The rest of the paper is structured as follows. In Section 2, we discuss related research efforts that use CFGs for metamorphic malware detection. In Section 3, we describe in detail the patterns used for CFG annotation, illustrate how a binary program can be translated to an ACFG and introduce our malware detection approach. In Section 4, we show how parallelization and ACFG reduction reduce the runtime of a malware detector. In Section 5, we evaluate the performance of ACFG using an existing malware dataset and compare it with other such techniques. We conclude in Section 6.

2. RELATED WORKS

This section discusses previous research efforts that use CFGs for metamorphic malware detection. We cover only academic research efforts that claim to detect, or plan to extend their detectors to detect, metamorphic malware.

In [13], the authors use an intermediate language called CFGO-IL to simplify the transformation of a program in the x86 assembly language to a CFG. After translating a binary program to CFGO-IL, the program is optimized to make its structure simpler. The optimizations also remove various malware obfuscations from the program, including dead code elimination, removal of unreachable branches, constant folding and removal of fake conditional branches inserted by malware. The authors developed a prototype malware detection tool using CFGO-IL that takes advantage of the optimizations and the simplicity of the language. The size of a CFGO-IL program tends to increase compared with the original assembly program, and hence so does the CFG of the program. As mentioned in the paper [13], the proposed technique is not able to discriminate malware from a benign program if the size of the CFG is less than 100 nodes. ACFG can handle such smaller CFGs.

The method described in [14] uses model-checking to detect metamorphic malware. Model-checking techniques check if a given model meets a given specification. The paper [14] uses a pushdown system to build a model that takes into account the behavior of the stack. The authors use IDA Pro [24] (a closed-source disassembler) to build a CFG of a binary program. This CFG contains information about the register and memory location values at each control point of the program, and is translated into a pushdown system that stores the control points and the stack of the program. Model-checking is time consuming and can sometimes run out of memory, as was the case with the previous technique [25] by the same authors. The times reported range from a few seconds (for a program consisting of 10 instructions) to over 250 s (for a program consisting of 10 000 instructions). Real-life applications are much larger than the samples tested.

In [15], the authors compare the instructions of a basic block in the CFG of the original malware with those of its variants. The comparison is performed using the longest common subsequence [26]. Unlike other techniques, they do not compare the shape of two CFGs, and hence their method is more time efficient, but, as the preliminary results show, it produces a greater number of false positives. This work is still in progress and therefore complete results are not available.

The technique described in [16] is the closest to the work presented in this paper. After disassembling a program, the authors normalize the disassembled program and then build an interprocedural CFG of it. For malware detection, the CFG of a benign program is checked to find if it contains a subgraph that is isomorphic to the CFG of a malware program. The results reported in the paper are preliminary and incomplete.
The method presented in [11, 12] uses a CFG for visualizing the control structure and representing the semantic aspects of a program. The authors extend the CFG with extracted API calls to have more information about the executable program; this extended CFG is called the API-CFG. Their system consists of three components: a PE (portable executable) file disassembler, an API-CFG generator and a classification module. They build the API-CFG as follows. First, they disassemble a PE file using a third-party disassembler. Then the instructions that are not required for building the CFG are removed; the instructions kept are jumps, procedure calls, API calls and the targets of jump instructions. A feature vector is generated from the API-CFG, which is a sparse graph and can be represented by a sparse matrix; all the nonzero entries are stored in the feature vector, and an algorithm is given in the paper for converting the API-CFG to a feature vector. Different classifiers (Decision Stump, SMO, Naive Bayes, Random Tree, Lazy K-Star and Random Forest) process the data consisting of the feature vectors of the PE files and decide if a PE file is malware or not. The system implemented depends on a third-party closed-source disassembler, which cannot disassemble more than one file at a time, so the authors used a script to automate disassembling a set of files at once. The proposed system cannot be used as a real-time malware detector. Furthermore, the proposed techniques cannot yet detect metamorphic malware, although, as mentioned in the paper, this option will be explored in the future.

3. MALWARE ANALYSIS AND DETECTION USING ACFG

Before describing the proposed technique, we first define an ACFG as follows.

Definition 1. A basic block is a maximal sequence of instructions with a single entry and a single exit point. No instruction after the first instruction is the target of a jump instruction, and only the last instruction can jump to a different block. Instructions that can start a basic block include: the first instruction of the program, the target of a branch or a function call, and a fall-through instruction, i.e. an instruction following a branch, a function call or a return instruction. Instructions that can end a basic block include: a conditional or unconditional branch, a function call and a return instruction.

Definition 2. A control flow edge is an edge that represents a control flow between basic blocks. A control flow edge from block a to block b is denoted e = (a, b).

Definition 3. A CFG is a directed graph G = (V, E), where V is the set of basic blocks and E is the set of control flow edges. The CFG of a program represents all the paths that can be taken during the program's execution.

Definition 4. An Annotated Control Flow Graph (ACFG) is a CFG in which each statement of the CFG is assigned a MAIL pattern.

3.1. Patterns for annotation

We have recently proposed a new intermediate language named MAIL (Malware Analysis Intermediate Language) [23]. There are 21 patterns in the MAIL language. They are used to build an ACFG of a program that can then be used for graph and pattern matching during malware detection. The patterns are as follows (a sketch of how such annotation can be computed is given after this list):

ASSIGN: An assignment statement, e.g. EAX=EAX+ECX;
ASSIGN_CONSTANT: An assignment statement including a constant, e.g. EAX=EAX+0x01;
CONTROL: A control statement where the target of the jump is unknown, e.g. if (ZF == 1) JMP [EAX+ECX+0x10];
CONTROL_CONSTANT: A control statement where the target of the jump is known, e.g. if (ZF == 1) JMP 0x400567;
CALL: A call statement where the target of the call is unknown, e.g. CALL EBX;
CALL_CONSTANT: A call statement where the target of the call is known, e.g. CALL 0x603248;
FLAG: A statement including a flag, e.g. CF = 1;
FLAG_STACK: A statement including a flag register with the stack, e.g. EFLAGS = [SP=SP-0x1];
HALT: A halt statement, e.g. HALT;
JUMP: A jump statement where the target of the jump is unknown, e.g. JMP [EAX+ECX+0x10];
JUMP_CONSTANT: A jump statement where the target of the jump is known, e.g. JMP 0x680376;
JUMP_STACK: A return jump, e.g. JMP [SP=SP-0x8];
LIBCALL: A library call, e.g. compare(EAX, ECX);
LIBCALL_CONSTANT: A library call including a constant, e.g. compare(EAX, 0x10);
LOCK: A lock statement, e.g. lock;
STACK: A stack statement, e.g. EAX = [SP=SP-0x1];
STACK_CONSTANT: A stack statement including a constant, e.g. [SP=SP+0x1] = 0x432516;
TEST: A test statement, e.g. EAX and ECX;
TEST_CONSTANT: A test statement including a constant, e.g. EAX and 0x10;
UNKNOWN: Any unknown assembly instruction that cannot be translated;
NOTDEFINED: The default pattern; all new statements are assigned this value when created.

Here EAX, EBX and ECX are general purpose registers, ZF and CF are the zero and carry flags, respectively, and SP is the stack pointer.
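The following minimal sketch shows how a MAIL statement could be mapped to one of the above patterns, assuming a simplified textual form of MAIL statements; the function and the matching rules are illustrative, cover only a few of the 21 patterns, and are not the actual MARD implementation.

```python
import re

# A minimal sketch of pattern annotation, assuming a simplified textual
# form of MAIL statements; the rules below are illustrative and cover
# only a few of the 21 patterns, not the actual MARD implementation.
HEX = r"0x[0-9A-Fa-f]+"

def annotate(stmt: str) -> str:
    s = stmt.strip().rstrip(";")
    if s.lower() == "halt":
        return "HALT"
    if s.lower() == "lock":
        return "LOCK"
    if s.startswith("if"):                      # conditional control statement
        return "CONTROL_CONSTANT" if re.search(r"JMP\s+" + HEX, s) else "CONTROL"
    if s.startswith("JMP"):
        if "[SP=SP-" in s:                      # return jump through the stack
            return "JUMP_STACK"
        return "JUMP_CONSTANT" if re.search(r"JMP\s+" + HEX, s) else "JUMP"
    if s.startswith("CALL"):
        return "CALL_CONSTANT" if re.search(HEX, s) else "CALL"
    if "=" in s:                                # assignment-like statements
        return "ASSIGN_CONSTANT" if re.search(HEX, s) else "ASSIGN"
    return "UNKNOWN"

# Example: annotating the statements of one basic block.
block = ["EAX=EAX+ECX;", "EAX=EAX+0x01;", "if (ZF == 1) JMP 0x400567;"]
print([annotate(s) for s in block])  # ['ASSIGN', 'ASSIGN_CONSTANT', 'CONTROL_CONSTANT']
```

Each block of the ACFG then carries, besides its statements, the sequence of pattern names produced by such a classification.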
3.2. Approach overview

Almost all malware uses binaries to infiltrate a computer system. Binary analysis is the process of automatically analyzing the structure and behavior of a binary program; we use binary analysis for malware detection. A binary program is first disassembled (for this paper, we assume that the binary is unpacked) and translated to a MAIL program. We then build a CFG of the MAIL program and annotate it with the patterns described above. This annotated CFG (ACFG) becomes part of the signature of the program and is matched against a database of known malware samples to see if the program contains malware or not. This approach is very useful in detecting known malware but may not be able to detect unknown malware. To detect unknown malware, after a program sample is translated to MAIL, an ACFG is built for each function in the program. Instead of using one large ACFG as a signature, we divide a program into smaller ACFGs, one per function; a program signature is then represented by the set of corresponding (smaller) ACFGs. A program that contains part of the control flow of a training malware sample is classified as malware, i.e. if the percentage of the ACFGs in a malware signature that match the signature of a program exceeds a predefined threshold (computed empirically), then the program is classified as malware.

3.3. ACFG construction

To translate a binary program, we first disassemble it into an assembly program, and then translate this assembly into an ACFG. We use a sample program, in fact a malware sample, to illustrate the steps involved in the translation, as shown in Fig. 1. The binary analysis of the function shown in Fig. 1 identifies five blocks in this function, labeled 18–25. There are two sets of columns separated by →: the first lists the hex dump of the function and the second lists the corresponding translated MAIL statements with annotations. The example shows data embedded inside the code section in block 27. This block is used to store, load and process data by the function, as pointed out by the underlined addresses in the blocks. Five instructions change the control flow of the program, as indicated by the arcs in the figure. There are two back edges, 20 ← 22 and 19 ← 23, which indicate the presence of loops in the program. The jump in block 24 indicates a jump out of the function. Each MAIL statement in a block in the ACFG is annotated with a type, also called a pattern. In total, 21 patterns, as explained in Section 3.1, are used to build the ACFG. For example, an assignment statement with a constant value and an assignment statement without a constant value are two different patterns; jump statements can have up to three patterns, and so on. A sketch of how such basic blocks can be identified is given below.
FIGURE 1. Disassembly and translation to ACFG of one of the functions of a malware sample. The arcs are used to represent the control flow of the program and show the transfer of control from one block to another block.
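The following minimal sketch shows the classic leader-based construction of basic blocks (Definition 1), assuming a simplified instruction model in which each instruction is a tuple (address, mnemonic, branch target); the representation is illustrative and is not the MAIL translator.

```python
# A minimal sketch of basic-block construction (Definition 1), assuming a
# simplified instruction model: each instruction is (address, mnemonic,
# target). The representation is illustrative, not the MAIL translator.
def build_blocks(instrs):
    """Split a list of instructions into basic blocks using leaders."""
    ends = {"jmp", "jcc", "call", "ret"}         # instructions that end a block
    addrs = [a for a, _, _ in instrs]
    leaders = {addrs[0]}                         # first instruction is a leader
    for i, (addr, mnem, target) in enumerate(instrs):
        if mnem in ends:
            if target in addrs:                  # branch/call target is a leader
                leaders.add(target)
            if i + 1 < len(instrs):              # fall-through is a leader
                leaders.add(addrs[i + 1])
    blocks, current = [], []
    for instr in instrs:
        if instr[0] in leaders and current:      # start a new block at a leader
            blocks.append(current)
            current = []
        current.append(instr)
    if current:
        blocks.append(current)
    return blocks

# Example: a loop of four instructions produces three blocks.
prog = [(0, "mov", None), (1, "cmp", None), (2, "jcc", 1), (3, "ret", None)]
print([[a for a, _, _ in b] for b in build_blocks(prog)])  # [[0], [1, 2], [3]]
```

The back edge in the example (the jcc targeting address 1) is what would appear as a loop arc in Fig. 1.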
3.4. Subgraph matching

Before explaining the subgraph matching technique used in this paper for malware detection, we first define graph isomorphism [27] as follows. Let G = (V_G, E_G) and H = (V_H, E_H) be any two graphs, where V_G, V_H and E_G, E_H are the sets of vertices and edges of the graphs, respectively.

Definition 5. A vertex bijection (one-to-one mapping), denoted f_V : V_G → V_H, and an edge bijection, denoted f_E : E_G → E_H, are consistent if for every edge e ∈ E_G, f_V maps the endpoints of e to the endpoints of the edge f_E(e).

Definition 6. G and H are isomorphic graphs if there exist a vertex bijection f_V and an edge bijection f_E that are consistent. This relationship is denoted G ≅ H.

An example of isomorphism is shown in Fig. 2. The edges of graphs G and H1 are not consistent, e.g. edge {00, 10} in graph G is not mapped to any edge in graph H1, therefore graphs G and H1 are not isomorphic, whereas the edges of graphs G and H2 are consistent, therefore graphs G and H2 are isomorphic.

FIGURE 2. Example of graph isomorphism. Graphs G and H2 are isomorphic but not G and H1. (a) Graph G, (b) Graph H1 and (c) Graph H2.

In our malware detection approach, graph matching is defined in terms of subgraph isomorphism. Given two graphs, subgraph isomorphism determines if one of the graphs contains a subgraph that is isomorphic to the other graph. In general, subgraph isomorphism is an NP-complete problem [28] (NP refers to nondeterministic polynomial time). An ACFG of a program is usually a sparse graph, so it is possible to compute the isomorphism of two ACFGs in a reasonable amount of time. Based on the definition of graph isomorphism presented above, we formulate our ACFG matching approach as follows. Let P = (V_P, E_P) denote an ACFG of the program and M = (V_M, E_M) denote an ACFG of the malware, where V_P, V_M and E_P, E_M are the sets of vertices and edges of the graphs, respectively. Let P_sg = (V_sg, E_sg), where V_sg ⊆ V_P and E_sg ⊆ E_P (i.e. P_sg is a subgraph of P). If P_sg ≅ M, then P and M are considered matching graphs.

After the binary analysis performed in Section 3.3, we obtain a set of ACFGs (each corresponding to a separate function) of a program, as shown in Fig. 1. To detect if a program contains malware, we compare the ACFGs of the program with the ACFGs of known malware samples from our training database. If a percentage of the ACFGs of the program greater than a predefined threshold matches one or several of the ACFGs of a malware sample from the database, then the program is classified as malware. As part of the machine learning process, the threshold is computed empirically, by running the detector on the training malware samples with different values of the threshold and picking the threshold with the best detection rate. An example of malware detection using subgraph matching is shown in Fig. 3.

FIGURE 3. An example of subgraph matching. The graph in (a) is matched as a subgraph of the graph in (b). (a) A malware sample and (b) the malware embedded (egg-shaped nodes) inside a benign program.

3.5. Pattern matching

Very small graphs, when matched against a large graph, can produce a false positive. To alleviate the impact of small graphs on detection accuracy, we integrate a pattern matching sub-component within the subgraph matching component. If an ACFG of a malware sample matches an ACFG of a program (i.e. the two ACFGs are isomorphic), then we further use the annotations/patterns present in the ACFG to match each statement in the matching nodes of the two ACFGs. A successful match requires all the statements in the matching nodes to have the same (exact) annotations, although there can be differences in the corresponding statement blocks. An example of pattern matching of two isomorphic ACFGs is shown in Fig. 4. One of the ACFGs of a malware sample, shown in Fig. 4a, is isomorphic to a subgraph of one of the ACFGs of a benign program, shown in Fig. 4b. Considering these two ACFGs as a match for malware detection would produce a wrong result, a false positive, because the statements in the benign program do not match the statements in the malware sample. To reduce such false positives we have two options: (i) match each statement exactly, or (ii) assign patterns to the statements and match the patterns. Option (i) would not be able to detect unknown malware samples and is time consuming, so we use option (ii) in our approach, which, in addition to reducing false positives, has the potential of detecting unknown malware samples. For a successful pattern match, we require all the statements in the matching blocks to have the same patterns. In Fig. 4, only the statements in block 0 satisfy this requirement; the statements in all the other blocks do not, therefore these ACFGs fail the pattern matching. A small sketch combining the subgraph and pattern matching steps is given below.

FIGURE 4. Example of pattern matching of two isomorphic ACFGs. The ACFG in (a) is isomorphic to the subgraph (blocks 0–3) of the ACFG in (b).
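The following minimal sketch makes the combined matching concrete, assuming the networkx library and a node attribute "patterns" that holds each block's annotation sequence; it illustrates the approach of Sections 3.4 and 3.5 and is not the MARD implementation.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# A minimal sketch, assuming networkx and a node attribute "patterns" that
# holds the annotation sequence of each block; illustrative only.
def acfg_matches(program_acfg: nx.DiGraph, malware_acfg: nx.DiGraph) -> bool:
    # Two blocks match only if their statement-pattern sequences are equal,
    # which realizes the pattern matching sub-component of Section 3.5.
    same_patterns = lambda a, b: a["patterns"] == b["patterns"]
    gm = isomorphism.DiGraphMatcher(program_acfg, malware_acfg,
                                    node_match=same_patterns)
    # True if the program ACFG contains a subgraph isomorphic to the
    # malware ACFG with matching annotations (Sections 3.4 and 3.5).
    return gm.subgraph_is_isomorphic()

# Toy example: a two-block malware ACFG embedded in a three-block program.
m = nx.DiGraph()
m.add_node(0, patterns=["ASSIGN"]); m.add_node(1, patterns=["JUMP_STACK"])
m.add_edge(0, 1)
p = nx.DiGraph()
p.add_node("a", patterns=["ASSIGN"]); p.add_node("b", patterns=["JUMP_STACK"])
p.add_node("c", patterns=["CALL"]); p.add_edge("a", "b"); p.add_edge("c", "a")
print(acfg_matches(p, m))  # True
```

Passing the annotation check as the node_match predicate folds the pattern matching of Section 3.5 directly into the VF2 subgraph search, so non-matching blocks are pruned early.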
4. RUNTIME OPTIMIZATION

The subgraph matching component matches an ACFG against all the malware sample graphs. As the number of nodes in a graph increases, the subgraph matching runtime increases. The runtime also increases with the number of malware samples, but this structure provides some opportunities for optimization. We use two techniques to improve the runtime, namely parallelization and ACFG reduction, described in the following.
4.1. Parallelization

Figure 5 shows an example of matching ACFGs for malware detection as explained in Section 3.2. The malware program has four ACFGs (functions), represented as {m1, m2, m3, m4}, and the benign program has nine ACFGs (functions), represented as {b1, b2, b3, b4, b5, b6, b7, b8, b9}. Each ACFG of the malware program is matched against the ACFGs of the benign program. The example has the following successful matches: m1 → b5, m2 → b4 and m4 → b9. Three ACFGs of the malware program successfully match three of the nine ACFGs of the benign program, i.e. over 25%. Using a threshold of 25%, the benign program is detected as malware.

FIGURE 5. Matching of ACFGs for malware detection. The ACFGs at the top belong to a malware program and the ACFGs at the bottom belong to a benign program. A successful match is shown through a bold line with an arrow on both sides.

The example discussed above matches one benign sample with one malware sample. Similarly, each testing sample is matched against all the training malware samples in the dataset. Based on these matchings, there are two opportunities to parallelize the subgraph matching component (a sketch follows this list):

1. Each ACFG matching in a sample is independent of the other matchings, so these matchings can be performed in parallel.
2. Each sample can be processed independently of the other samples, so the processing of the samples can be performed in parallel.
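The following minimal sketch illustrates the first opportunity, assuming the acfg_matches() predicate sketched above; the early-stop behavior mimics the local thread manager described below, but the pool-based scheduling is illustrative, not the Windows-thread implementation used by MARD.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product

# A minimal sketch of the first parallelization opportunity, assuming a
# predicate acfg_matches() such as the one sketched in Sections 3.4/3.5;
# the pool logic here is illustrative, not the MARD Windows threads.
def is_malware(program_acfgs, malware_acfgs, threshold=0.25, workers=8):
    needed = max(1, int(threshold * len(malware_acfgs)))  # matches required
    pairs = list(product(malware_acfgs, program_acfgs))
    hits = set()                                          # matched malware ACFGs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(acfg_matches, p, m): id(m) for m, p in pairs}
        for fut in as_completed(futures):
            if fut.result():
                hits.add(futures[fut])
                if len(hits) >= needed:                   # early stop, like the
                    pool.shutdown(cancel_futures=True)    # local thread manager
                    return True
    return False
```

The second opportunity is analogous at the level of whole samples; in MARD the total thread count is bounded using Equation (1) below, whereas the sketch simply fixes a worker pool size.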
Multicore processors are now ubiquitous: current desktops, laptops and even energy-efficient mobile devices contain multicore processors, and Intel has announced its Single-Chip Cloud Computer [29], a processor with 48 cores on a chip. Keeping in view this ubiquity of multicores in host machines (the end points) and the two opportunities listed above, we decided to use Windows threads to parallelize the subgraph matching component. We create two sets of threads: one per sample under test in the dataset, called the global threads, and one per ACFG matching in the sample, called the local threads; i.e. each sample runs in its own thread, and this thread creates child threads, one for each ACFG in the sample. A local thread manager (LTM) synchronizes the threads local to a sample: it counts the number of successful matches and signals all other local threads to stop if the count crosses the threshold, i.e. the sample is malware and there is no need for further ACFG matching. A global thread manager (GTM) synchronizes the global threads created for the samples. The GTM and LTM keep the total number of threads in the system at or below MAX_THREAD, whose value is computed using Equation (1). Threads share and synchronize information (such as the thread count) through a shared variable. To decrease the wait time of a thread before its next matching, other information such as the ACFGs is not shared among threads but copied into the memory space of each thread.

To evaluate the effectiveness of our parallelization approach, we carried out an experiment using different numbers of threads, ranging from 2 to 250. Another reason for this experiment was to develop an empirical model to compute the number of threads to be used in the subgraph matching component. We used 250 malware samples and 30 benign applications, each with a different size of ACFG. The experiment was run on machines with 2 and 4 cores; details of the machines and the results are shown in Table 1. The percentage of CPU utilization was on average three times higher with threads than without, which is also confirmed by the improvement in the runtime. As we increased the number of threads, the reduction in runtime increased up to a certain number of threads (8 on 2 cores and 64 on 4 cores), after which it started decreasing. When we increased the number of benign samples from 30 to 1387 (the last row in Table 1), the runtime reduction only changed by approximately a factor of two, which indicates that increasing the number of samples does not affect the conclusions of this empirical study. Besides factors such as the number of CPUs/cores and the memory available to the application, the main reasons for the eventual decrease in the runtime reduction are that more time is spent on thread management and on the increased sharing of information among threads.

TABLE 1. Runtime optimization by parallelizing the subgraph matching component using different numbers of threads.

Number of threads (2 cores) | Runtime reduced by (times) | Number of threads (4 cores) | Runtime reduced by (times)
2 | 3.20 | 4 | 3.98
4 | 5.92 | 8 | 4.11
8 | 7.20 | 32 | 5.95
16 | 5.72 | 64 | 7.76
32 | 5.64 | 128 | 7.53
250 | 5.41 | 250 | 6.37
8 | 14.87 | 64 | 14.53 (last row: run with 1387 benign samples)

Machines used: Intel Core i5 CPU M 430 (2 cores) @ 2.27 GHz with 4 GB of RAM, running Windows 8 Professional; Intel Core 2 Quad CPU Q6700 (4 cores) @ 2.67 GHz with 4 GB of RAM, running Windows 7 Professional.

Based on this experiment, we developed Equation (1) to compute the maximum number of threads to be used by the subgraph matching component (the user can also choose the maximum number of threads used by the tool):

Threads = (NC)^3   (1)

where NC is the number of CPUs/cores.

4.2. ACFG reduction

One of the advantages of using an ACFG is that it contains annotations for malware detection. Our detection method uses both subgraph matching and pattern matching for metamorphic malware detection. Even if we reduce the number of blocks in an ACFG (the ACFG of some binaries can be reduced to a very small number of blocks), we still get a good detection rate because of the combination of the two techniques. To reduce the number of blocks (nodes) in an ACFG for runtime optimization, we carry out ACFG reduction, also called ACFG shrinking: we reduce the number of blocks in an ACFG by merging them together. Two blocks are merged only if the merging does not change the control flow of the program. Given two blocks A and B in an ACFG, if all the paths that reach block B pass through block A, and all the children of A are reachable through B, then A and B are merged. Figure 6 shows an example of an ACFG from the dataset described in this paper, before and after shrinking. The shrinking does not change the shape of the graph: as can be seen in the figure, the shapes of the graphs before and after shrinking are the same. More such examples are available at [30]. We were able to substantially reduce the number of nodes per ACFG (in total a 90.6% reduction), as shown in Table 4. This reduced the runtime on average by 2.7 times (for a smaller dataset) and 100 times (for a larger dataset), while still achieving a detection rate of 98.9% with a false positive rate of 4.5%, as later shown in Table 6. A sketch of the merging rule is given below.

FIGURE 6. ACFG of one of the functions of one of the samples of the MWOR class of malware, before and after shrinking. The ACFG has been reduced from 92 nodes to 47 nodes. (a) Before shrinking and (b) after shrinking.
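The following minimal sketch implements only the simplest safe case of the merging rule stated above (A is the sole predecessor of B and B is A's sole successor), on graphs represented as adjacency dicts; in the real tool the statements and annotations of merged blocks would also be concatenated, which is omitted here.

```python
# A minimal sketch of ACFG shrinking, assuming adjacency-dict graphs
# {node: set(successors)}. It implements only the simplest safe case of
# the merging rule: A is the sole predecessor of B and B is A's sole
# successor, so merging A and B cannot change the control flow.
def shrink(succ):
    preds = {n: set() for n in succ}
    for a, outs in succ.items():
        for b in outs:
            preds[b].add(a)
    changed = True
    while changed:
        changed = False
        for a in list(succ):
            outs = succ.get(a, set())
            if len(outs) == 1:
                b = next(iter(outs))
                if b != a and preds[b] == {a}:   # merge B into A
                    succ[a] = set(succ[b])       # A inherits B's edges
                    for c in succ[a]:
                        preds[c].discard(b)
                        preds[c].add(a)
                    del succ[b], preds[b]
                    changed = True
                    break
    return succ

# Example: a straight-line chain 0 -> 1 -> 2 collapses into a single node.
print(shrink({0: {1}, 1: {2}, 2: set()}))  # {0: set()}
```

Because only such linear chains are collapsed, the branching structure, and hence the shape of the graph, is preserved, which is the property relied on in Fig. 6.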
5. EXPERIMENTS

We conducted experiments to evaluate the performance of our malware detection technique. The evaluation was carried out using the prototype implementation of our detector, named MARD (Metamorphic malware Analysis and Real-time Detection) [22]. MARD fully automates the malware analysis and detection process, without any manual intervention during a complete run. In this section we present the evaluation metrics, the experimental settings, the results obtained and a comparison of ACFG with other such techniques.

5.1. Performance metrics

Before evaluating the performance of our malware detection technique, we first define the following performance metrics. True positive (TP) is the number of malware programs that are classified as malware. True negative (TN) is the number of benign programs that are classified as benign. False positive (FP) is the number of benign programs that are classified as malware. False negative (FN) is the number of malware programs that are classified as benign. Precision is the fraction of detected malware samples that are correctly detected. Accuracy is the fraction of samples, including malware and benign, that are correctly classified as either malware or benign. These two metrics are defined as follows:

Precision = TP / (TP + FP),   Accuracy = (TP + TN) / (P + N)

where P and N are the total numbers of malware and benign programs, respectively. We define the mean maximum precision (MMP) and mean maximum accuracy (MMA) for n-fold cross-validation as follows:

MMP = (1/n) Σ_{i=1}^{n} Precision_i   (2)

MMA = (1/n) Σ_{i=1}^{n} Accuracy_i   (3)

We also use two other metrics, the TP rate (TPR) and the FP rate (FPR). The TPR, also called the detection rate (DR), is the percentage of samples correctly recognized as malware out of the total malware dataset. The FPR is the percentage of samples incorrectly recognized as malware out of the total benign dataset. These two metrics are defined as follows:

TPR = TP / P   (4)

FPR = FP / N   (5)

A small sketch of computing these metrics is given below.
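The following minimal sketch computes Equations (2)–(5) from per-fold confusion counts; the function names are illustrative, not part of MARD.

```python
# A minimal sketch of Equations (2)-(5), assuming per-fold confusion
# counts; the names are illustrative.
def fold_metrics(tp, tn, fp, fn):
    p, n = tp + fn, tn + fp                  # total malware / benign tested
    return {"precision": tp / (tp + fp),
            "accuracy": (tp + tn) / (p + n),
            "tpr": tp / p,                   # detection rate (DR), Equation (4)
            "fpr": fp / n}                   # Equation (5)

def cross_validation_summary(folds):
    """folds: list of (tp, tn, fp, fn) tuples, one per fold."""
    ms = [fold_metrics(*f) for f in folds]
    n = len(ms)
    mmp = sum(m["precision"] for m in ms) / n   # Equation (2)
    mma = sum(m["accuracy"] for m in ms) / n    # Equation (3)
    return mmp, mma

# Example with two hypothetical folds.
print(cross_validation_summary([(220, 1050, 50, 5), (223, 1060, 40, 2)]))
```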
5.2. Dataset

The dataset used for the experiments consists of a total of 3350 sample Windows and Linux programs. Of the 3350 programs, 1020 are metamorphic malware samples collected from three different resources [31–33], and the other 2330 are benign programs. The 1020 malware samples belong to the following three metamorphic virus families: Next Generation Virus Generation Kit (NGVCK) [34], Second Generation Virus Generator (G2) [35] and Metamorphic Worm (MWOR) produced by the metamorphic generator of [31]. The NGVCK and MWOR families are further divided into two and seven classes, respectively. This class distribution is shown in Table 2.

TABLE 2. Class distribution of the 1020 metamorphic malware samples.

Class | Number of samples | Comments
NGVCK_1 | 70 | Generated by NGVCK with a simple set of obfuscations, such as dead code insertion and instruction reordering.
NGVCK_2 | 200 | Generated by NGVCK with a complex set of obfuscations, such as an indirect jump (e.g. a push followed by a ret instruction) to one of the data sections.
G2 | 50 | Generated by G2.
MWOR_1 | 100 | Generated by MWOR with a padding ratio of 0.5.
MWOR_2 | 100 | Generated by MWOR with a padding ratio of 1.0.
MWOR_3 | 100 | Generated by MWOR with a padding ratio of 1.5.
MWOR_4 | 100 | Generated by MWOR with a padding ratio of 2.0.
MWOR_5 | 100 | Generated by MWOR with a padding ratio of 2.5.
MWOR_6 | 100 | Generated by MWOR with a padding ratio of 3.0.
MWOR_7 | 100 | Generated by MWOR with a padding ratio of 4.0.

MWOR uses two morphing techniques: dead code insertion and equivalent instruction substitution. The padding ratio is the ratio of the number of dead code instructions to the core instructions of the malware; a padding ratio of 0.5 means that the malware has half as many dead code instructions as core instructions [31].

The dataset distributions based on the number of ACFGs per sample and on the size of each ACFG after normalization and shrinking are shown in Tables 3 and 4, respectively. The normalizations carried out are the removal of NOP (no operation) and other junk instructions. The dataset contains a variety of programs, with ACFGs ranging from simple to complex. As shown in Table 3, the number of ACFGs per malware sample ranges from 2 to 1272 and the number of ACFGs per benign program ranges from 0 to 1148. Some of the Windows dynamic link libraries used in the experiment contain no code but only data (i.e. they cannot be executed) and as a result have 0-node ACFGs. The sizes of the ACFGs are shown in Table 4: the ACFGs of the malware samples range from 1 to 301 nodes, and the ACFGs of the benign programs range from 1 to 521 nodes. This variety of ACFGs and malware classes provides a good testing platform for the proposed malware detection framework.

TABLE 3. Dataset distribution based on the number of ACFGs for each program sample.

Malware samples (1020): Number of ACFGs | Number of samples
2 | 250
5–32 | 204
33–57 | 222
58–84 | 133
85–133 | 105
140–249 | 94
133–1272 | 12

Benign program samples (2330): Number of ACFGs | Number of samples
0–50 | 1277
51–100 | 317
101–199 | 310
201–399 | 282
400–598 | 111
606–987 | 30
1001–1148 | 3

TABLE 4. Dataset distribution based on the size (number of nodes) of each ACFG after normalization and shrinking.

Malware samples (1020): Number of nodes | Number of ACFGs
1–10 | 60
11–20 | 284
21–39 | 1652
41–69 | 2177
70–96 | 288
104–183 | 207
221–301 | 254

Benign program samples (2330): Number of nodes | Number of ACFGs
1–10 | 242
11–20 | 240
21–39 | 3466
40–69 | 1086
70–99 | 368
100–194 | 61
245–521 | 60

Total number of nodes before ACFG shrinking: 4 908 422. Total number of nodes after ACFG shrinking: 462 974 (a 90.6% reduction). The runtime was reduced on average by 2.7 times (for the smaller dataset) and 100 times (for the larger dataset).

5.3. Empirical study

In this section we discuss the experiments and present the performance results obtained. The experiments were run on the following machine: Intel Core i5 CPU M 430 (2 cores) @ 2.27 GHz with 4 GB RAM, running Windows 8 Professional. We carried out two experiments: one with a smaller dataset using 10-fold cross-validation and the other with a larger dataset using 5-fold cross-validation. To measure the runtime improvement from ACFG reduction, both experiments were carried out with and without ACFG reduction. We were able to complete all 10 runs with and without ACFG reduction for the smaller dataset. For the larger dataset we completed all five runs with ACFG reduction, but only one run (which took over 34 h) without ACFG reduction; this is one more reason to use a smaller dataset alongside the larger dataset, namely to obtain accurate figures for the runtime improvement. In the following two sections, we present and describe these two experiments with their results.

5.3.1. Experiment with the smaller dataset using 10-fold cross-validation

Out of the 3350 programs we randomly selected 1351 Windows programs: 250 metamorphic malware samples and 1101 benign programs. The 10-fold cross-validation was conducted by selecting 25 of the 250 malware samples to train our detector. The remaining 225 malware samples along with the 1101 benign programs were then used to test the detector. These two steps were repeated 10 times, each time selecting a different set of 25 malware samples for training and the remaining samples for testing. The overall performance results were obtained by averaging the results of the 10 runs. Using a threshold value of 25%, we conducted a further evaluation by increasing the size of the training set from 25 to 125 malware samples (50% of the malware samples). The results obtained are listed in Table 5. The DR improved from 94% with a training set of 25 samples to 99.6% with a training set of 125 samples.

TABLE 5. Malware detection results for the smaller dataset.

Training set size | DR (%) | FPR (%) | MMP | MMA
25 | 94 | 3.1 | 0.86 | 0.96
125 | 99.6 | 4 | 0.85 | 0.97

On average, MARD took 15.2037 s with ACFG shrinking and 40.4288 s without ACFG shrinking to complete the malware analysis and detection for the 1351 samples, including the 25 training malware samples; this time excludes training. MARD achieved the same values for all the other performance metrics (DR, FPR, MMP and MMA) with and without ACFG shrinking.

5.3.2. Experiment with the larger dataset using 5-fold cross-validation

The 5-fold cross-validation was conducted by selecting 204 of the 1020 malware samples to train our detector. The remaining 816 malware samples along with the 2330 benign programs in our dataset were then used to test the detector. These two steps were repeated five times, each time selecting a different set of 204 malware samples for training and the remaining samples for testing. The overall performance results were obtained by averaging the results of the five runs. Using a threshold value of 25%, we conducted a further evaluation by increasing the size of the training set from 204 to 510 malware samples (50% of the malware samples). The results obtained are listed in Table 6. The DR improved from 97% with a training set of 204 samples to 98.9% with a training set of 510 samples.

TABLE 6. Malware detection results for the larger dataset.

Training set size | DR (%) | FPR (%) | MMP | MMA
204 | 97 | 4.3 | 0.91 | 0.96
510 | 98.9 | 4.5 | 0.91 | 0.97

On average, MARD took 946.5824 s with ACFG shrinking and over 125 400 s (over 34 h) without ACFG shrinking to complete the malware analysis and detection for the 3350 samples, including the 204 training malware samples; this time excludes training. Because of time constraints we did not perform the 5-fold cross-validation without ACFG shrinking; the time reported (over 34 h) is for a single run without ACFG shrinking.
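The following minimal sketch shows how such an n-fold evaluation with an empirical threshold sweep could be organized, assuming the is_malware() predicate sketched in Section 4.1; the harness, fold sizes and candidate thresholds are illustrative, not the MARD evaluation code.

```python
import random

# A minimal sketch of n-fold evaluation with an empirical threshold sweep
# (Section 3.4), assuming is_malware() from the Section 4.1 sketch;
# the harness is illustrative, not the MARD evaluation code.
def detect(sample_acfgs, signatures, threshold):
    # A sample is classified as malware if it matches any training signature.
    return any(is_malware(sample_acfgs, sig, threshold) for sig in signatures)

def pick_threshold(malware, n_folds=10, train_size=25,
                   candidates=(0.15, 0.25, 0.35)):
    best_t, best_dr = None, -1.0
    for t in candidates:
        drs = []
        for _ in range(n_folds):
            train = random.sample(malware, train_size)
            test = [m for m in malware if m not in train]
            tp = sum(detect(m, train, t) for m in test)
            drs.append(tp / len(test))        # detection rate for this fold
        dr = sum(drs) / n_folds
        if dr > best_dr:
            best_t, best_dr = t, dr           # keep the best-performing threshold
    return best_t, best_dr
```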
TABLE 7. Summary and comparison with ACFG of the metamorphic malware analysis and detection systems discussed in Section 2 and others.

System | Analysis type | DR (%) | FPR (%) | Dataset size (benign/malware) | Real-time | Platform
ACFG | Static | 98.9 | 4.5 | 2330/1020 | ✓ | Win & Linux 64
API-CFG [11, 12] | Static | 97.53 | 1.97 | 2140/2305 | ✗ | Win 32
Call-Gram [36] | Static | 98.4 | 2.7 | 3234/3256 | ✗ | Win 32
CFGO-IL [13] | Static | 100 | 12 | 100/100 | ✗ | Win 32
Chi-squared [37] | Static | ~98 | ~2 | 40/200 | ✗ | Win & Linux 32
Code-graph [38] | Static | 91 | 0 | 300/100 | ✗ | Win 32
DTA [39] | Dynamic | 100 | 3 | 56/42 | ✗ | Win XP 64
Opcode-HMM-Wong [40] | Static | ~90 | ~2 | 40/200 | ✗ | Win & Linux 32
Opcode-HMM-Austin [41] | Static | 93.5 | 0.5 | 102/77 | ✗ | Win & Linux 32
Opcode-SD [42] | Static | ~98 | ~0.5 | 40/800 | ✗ | Linux 32
Opcode-graph [33] | Static | 100 | 1 | 41/200 | ✗ | Win & Linux 32
Model-checking [14] | Static | 100 | 1 | 8/200 | ✗ | Win 32
MSA [43] | Static | 91 | 52 | 150/1209 | ✗ | Win 32
VSA-1 [44] | Static | 100 | 0 | 25/30 | ✗ | Win 32
VSA-2 [45] | Dynamic | 98 | 2.9 | 385/826 | ✗ | Win XP 64

Real-time here means the detection is fully automatic and finishes in a reasonable amount of time, as shown in Tables 5 and 6. Some of the reasons why the other techniques do not provide real-time detection are mentioned in Section 2; moreover, none of the other techniques compute or mention their runtime for malware detection. The perfect results should be validated with more samples than were tested in the respective papers. The values for Opcode-graph are not directly mentioned in the paper; we computed them by picking a threshold of 0.5 for the similarity score used in the paper.
5.4. Comparison with others

Table 7 compares ACFG with the research efforts for detecting metamorphic malware discussed in Section 2 and with others that do not use a CFG but claim to detect metamorphic malware. None of the other prototype systems implemented can be used as a real-time detector. Furthermore, the few systems that claim a perfect detection rate were validated using small datasets. Among all the research efforts, API-CFG, Call-Gram and VSA-2 show impressive results. However, API-CFG does not yet support detection of metamorphic malware, VSA-2 uses a controlled environment for detection, and Call-Gram is not fully automated and its performance overheads are not mentioned in the paper; the dataset used by VSA-2 is also comparatively smaller than those of the other two. Most of the other techniques, such as Chi-squared, Opcode-SD, Opcode-graph and Opcode-Histogram, show good results, and some of them may have the potential to be used in a real-time detector with an improved implementation. Table 7 reports the best DR results achieved by these detectors. Of the 15 systems, ACFG clearly shows superior results (the perfect results in Table 7 should be validated with more samples) and, unlike the others, is fully automatic, supports metamorphic malware detection for 64-bit Windows (PE binaries) and Linux (ELF binaries) platforms and has the potential to be used as a real-time detector.

We also compared our malware detection with four commercial antimalware programs: AVG AntiVirus, Microsoft Security Essentials, McAfee and Kaspersky. We used the G2 and NGVCK malware families (320 of the 1020 malware samples, Table 2) to test these antimalware programs. Of the 320 malware samples, Microsoft Security Essentials detected 20 (6.25%), McAfee detected 268 (83.75%), Kaspersky detected 268 (83.75%) and AVG detected 274 (85.63%). The combined results of these antimalware programs tagged all 320 samples as malware; however, none of the individual programs achieved a DR over 86%, whereas ACFG achieved a much higher DR.

6. CONCLUSION

In this paper, we have presented a new technique named ACFG and shown through experimental evaluation its effectiveness in metamorphic malware analysis and detection, including its ability to handle malware with smaller CFGs. We have also optimized the runtime of malware detection through parallelization and ACFG reduction, which makes the comparison (matching with other ACFGs) faster than other techniques that use CFGs for malware detection, while keeping the same detection accuracy as without ACFG reduction. The annotations in an ACFG provide more information than a plain CFG and hence can provide more accuracy. Currently, we are carrying out further research into using similar techniques for web malware analysis and detection. Since metamorphic malware is a challenging test case, we believe that our proposed technique can also be effectively used for the detection of other malware, and we would like to carry out such experiments in the future. Our future work will also consist of strengthening our existing algorithms by investigating and incorporating more powerful pattern recognition techniques.
REFERENCES

[1] International Telecommunications Union (ITU) (2011) The World in 2011: ICT Facts and Figures. http://www.itu.int/ITU-D/ict/facts/2011/material/ICTFactsFigures2011.pdf
[2] Symantec (2012) Norton Cybercrime Report. http://now-static.norton.com/now/en/pu/images/Promotions/2012/cybercrimeReport/2012_Norton_Cybercrime_Report_Master_FINAL_050912.pdf
[3] Bauer, J., Eeten, M. and Wu, Y. (2008) ITU Study on the Financial Aspects of Network Security: Malware and Spam. http://www.itu.int/ITU-D/cyb/cybersecurity/docs/itu-study-financial-aspects-of-malware-and-spam.pdf
[4] Collberg, C., Thomborson, C. and Low, D. (1998) Manufacturing Cheap, Resilient, and Stealthy Opaque Constructs. Proc. 25th ACM SIGPLAN-SIGACT Symp. Principles of Programming Languages, POPL'98, New York, NY, USA, January, pp. 184–196. ACM.
[5] O'Kane, P., Sezer, S. and McLaughlin, K. (2011) Obfuscation: the hidden malware. IEEE Secur. Priv., 9, 41–47.
[6] Linn, C. and Debray, S. (2003) Obfuscation of Executable Code to Improve Resistance to Static Disassembly. ACM CCS, New York, NY, USA, October, pp. 290–299. ACM.
[7] Kuzurin, N., Shokurov, A., Varnovsky, N. and Zakharov, V. (2007) On the Concept of Software Obfuscation in Computer Security. Proc. 10th Int. Conf. Information Security, ISC'07, Berlin, Heidelberg, October, pp. 281–298. Springer.
[8] Borello, J.-M. and Mé, L. (2008) Code obfuscation techniques for metamorphic viruses. J. Comput. Virol., 4, 211–220.
[9] Aho, A.V., Lam, M.S., Sethi, R. and Ullman, J.D. (2006) Compilers: Principles, Techniques, and Tools (2nd edn). Addison-Wesley Longman, Boston, MA, USA.
[10] Muchnick, S.S. (1997) Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco, CA, USA.
[11] Eskandari, M. and Hashemi, S. (2012) A graph mining approach for detecting unknown malwares. J. Vis. Lang. Comput., 23, 154–162.
[12] Eskandari, M. and Hashemi, S. (2012) ECFGM: enriched control flow graph miner for unknown vicious infected code detection. J. Comput. Virol., 8, 99–108.
[13] Anju, S.S., Harmya, P., Jagadeesh, N. and Darsana, R. (2010) Malware Detection Using Assembly Code and Control Flow Graph Optimization. A2CWiC 2010, New York, NY, USA, September, p. 65. ACM.
[14] Song, F. and Touili, T. (2012) Efficient malware detection using model-checking. Form. Methods, 7436, 418–433.
[15] Vinod, P., Laxmi, V., Gaur, M.S., Kumar, G.P. and Chundawat, Y.S. (2009) Static CFG Analyzer for Metamorphic Malware Code. Proc. 2nd Int. Conf. Security of Information and Networks, SIN'09, New York, NY, USA, October, pp. 225–228. ACM SIGSAC.
[16] Bruschi, D., Martignoni, L. and Monga, M. (2006) Detecting Self-Mutating Malware Using Control-Flow Graph Matching. DIMVA 2006, July, pp. 129–143. Springer, Berlin, Heidelberg.
[17] Cesare, S. and Xiang, Y. (2011) Malware Variant Detection Using Similarity Search over Sets of Control Flow Graphs. TrustCom 2011, Changsha, China, November, pp. 181–189. IEEE Computer Society.
[18] Guo, H., Pang, J., Zhang, Y., Yue, F. and Zhao, R. (2010) Hero: A Novel Malware Detection Framework Based on Binary Translation. ICIS 2010, Xiamen, China, October, pp. 411–415. IEEE.
[19] Kirda, E., Kruegel, C., Banks, G., Vigna, G. and Kemmerer, R.A. (2006) Behavior-Based Spyware Detection. Proc. 15th USENIX Security Symposium, USENIX-SS'06, Berkeley, CA, USA, April. USENIX Association.
[20] Flake, H. (2004) Structural Comparison of Executable Objects. In Flegel, U. and Michael, M. (eds), DIMVA, LNI 46, Dortmund, Germany, July, pp. 161–173. GI.
[21] Szor, P. (2005) The Art of Computer Virus Research and Defense. Addison-Wesley Professional, Boston, MA, USA.
[22] Alam, S., Horspool, R.N. and Traore, I. (2014) MARD: A Framework for Metamorphic Malware Analysis and Real-Time Detection. Advanced Information Networking and Applications, Research Track—Security and Privacy, AINA'14, Washington, DC, USA, May, pp. 480–489. IEEE Computer Society.
[23] Alam, S., Horspool, R.N. and Traore, I. (2013) MAIL: Malware Analysis Intermediate Language—A Step Towards Automating and Optimizing Malware Detection. Security of Information and Networks, SIN'13, New York, NY, USA, November, pp. 233–240. ACM SIGSAC.
[24] Eagle, C. (2008) The IDA Pro Book: The Unofficial Guide to the World's Most Popular Disassembler. No Starch Press, San Francisco, CA, USA.
[25] Song, F. and Touili, T. (2012) Pushdown model checking for malware detection. In Flanagan, C. and König, B. (eds), Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science 7214. Springer, Berlin, Heidelberg, Germany.
[26] Cormen, T.H., Leiserson, C.E., Rivest, R.L. and Stein, C. (2009) Introduction to Algorithms (3rd edn). The MIT Press, Cambridge, MA, USA.
[27] Gross, J.L. and Yellen, J. (2005) Graph Theory and Its Applications (2nd edn). Chapman & Hall/CRC, Boca Raton, FL, USA.
[28] Cook, S.A. (1971) The Complexity of Theorem-Proving Procedures. Proc. 3rd Annual ACM Symposium on Theory of Computing, STOC'71, New York, NY, USA, May, pp. 151–158. ACM.
[29] Mattson, T.G. et al. (2010) The 48-Core SCC Processor: The Programmer's View. SC'10, Washington, DC, USA, November, pp. 1–11. IEEE Computer Society.
[30] Alam, S. (2013) Examples of CFGs Before and After Shrinking. http://web.uvic.ca/~salam/PhD/cfgs.html
[31] Madenur Sridhara, S. and Stamp, M. (2013) Metamorphic worm that carries its own morphing engine. J. Comput. Virol. Hacking Tech., 9, 49–58.
[32] Rad, B., Masrom, M. and Ibrahim, S. (2012) Opcodes Histogram for Classifying Metamorphic Portable Executables Malware. ICEEE, Lodz, Poland, September, pp. 209–213. IEEE.
[33] Runwal, N., Low, R.M. and Stamp, M. (2012) Opcode graph similarity and metamorphic detection. J. Comput. Virol., 8, 37–52.
[34] NGVCK. Next Generation Virus Generation Kit. http://vxheaven.org/vx.php?id=tn02 (accessed November 7, 2014).
[35] G2. Second Generation Virus Generator. http://vxheaven.org/vx.php?id=tg00 (accessed November 7, 2014).
[36] Faruki, P., Laxmi, V., Gaur, M.S. and Vinod, P. (2012) Mining Control Flow Graph as API Call-Grams to Detect Portable Executable Malware. Security of Information and Networks, SIN'12, New York, NY, USA, October, pp. 130–137. ACM SIGSAC.
[37] Toderici, A. and Stamp, M. (2012) Chi-squared distance and metamorphic virus detection. J. Comput. Virol., 9, 1–14.
[38] Lee, J., Jeong, K. and Lee, H. (2010) Detecting Metamorphic Malwares Using Code Graphs. SAC 2010, New York, NY, USA, March, pp. 1970–1977. ACM.
[39] Yin, H. and Song, D. (2013) Privacy-Breaching Behavior Analysis. In Automatic Malware Analysis, SpringerBriefs in Computer Science, pp. 27–42. Springer, New York, NY, USA.
[40] Wong, W. and Stamp, M. (2006) Hunting for metamorphic engines. J. Comput. Virol., 2, 211–229.
[41] Austin, T.H., Filiol, E., Josse, S. and Stamp, M. (2013) Exploring Hidden Markov Models for Virus Analysis: A Semantic Approach. 46th Hawaii International Conference on System Sciences (HICSS), Hawaii, USA, January, pp. 5039–5048. IEEE.
[42] Shanmugam, G., Low, R.M. and Stamp, M. (2013) Simple substitution distance and metamorphic detection. J. Comput. Virol. Hacking Tech., 9, 159–170.
[43] Vinod, P., Laxmi, V., Gaur, M. and Chauhan, G. (2012) MOMENTUM: Metamorphic Malware Exploration Techniques Using MSA Signatures. IIT, Abu Dhabi, UAE, March, pp. 232–237. IEEE.
[44] Leder, F., Steinbock, B. and Martini, P. (2009) Classification and Detection of Metamorphic Malware Using Value Set Analysis. MALWARE 2009, Montreal, QC, Canada, October, pp. 39–46. IEEE.
[45] Ghiasi, M., Sami, A. and Salehi, Z. (2012) Dynamic Malware Detection Using Registers Values Set Analysis. Information Security and Cryptology, Tabriz, Iran, September, pp. 54–59. IEEE.