2011 International Joint Conference of IEEE TrustCom-11/IEEE ICESS-11/FCST-11

LoongChecker: Practical summary-based semi-simulation to detect vulnerability in binary code

Shaoyin Cheng1, Jun Yang1, Jiajie Wang2, Jinding Wang1 and Fan Jiang1

1 Information Technology Security Evaluation Center, University of Science and Technology of China, Hefei, 230027, P. R. China
2 China Information Technology Security Evaluation Center, Beijing, 100085, P. R. China

{sycheng, fjiang}@ustc.edu.cn, {yangjun1, jdwang}@mail.ustc.edu.cn, [email protected]

Abstract—The automatic detection of security vulnerabilities in binary code is challenging and lacks efficient tools. This paper presents a novel semi-simulation approach to statically detect potential vulnerabilities in binary code. The semi-simulation approach simulates address-related instructions accurately using value set analysis, and only traces data dependence on the other instructions using data dependence analysis. We have implemented this approach in a tool called LoongChecker, evaluated it on three real-world programs, and detected three known vulnerabilities and two zero-day vulnerabilities. The results show that our approach is practical and can be applied to large real-world software.

Keywords—semi-simulation; static analysis; binary code; vulnerability detection; taint analysis; function summary

I. INTRODUCTION

As security has become an important issue these days, large efforts have been made to develop automatic tools to find bugs and vulnerabilities [1-9]. There is a big demand for a practical tool which uses flow-sensitive static analysis to detect vulnerabilities in binary code. In one respect, directly analyzing binary code has advantages over source code; in another respect, other technologies, such as dynamic tainting, fuzzing, and dynamic symbolic execution, may have problems when applied to large-scale real-world programs. Compared to source code, directly analyzing binary code benefits from the following aspects:
- To an end user, the source code of software is often not available. Without assurance of its safety, such software cannot be used in security-sensitive areas.
- Commercial off-the-shelf (COTS) components and DLLs lack source code, which may largely lower the accuracy of source-level analysis of applications using these components.
- Compiler errors or optimizers may cause a WYSINWYX phenomenon: "What You See Is Not What You eXecute". Thus, source code analysis cannot find such low-level problems.
- While source-level tools need to support dozens of languages, binary-level ones only need to support several platform-specific instruction sets.

Dynamic analysis of binary code, such as dynamic tainting [1, 10], has caught the attention of the security community for its effectiveness. However, the path coverage of dynamic techniques is highly dependent on the inputs which drive the execution. If inputs are not well chosen, dynamic techniques may have low coverage and may thus miss important errors.

978-0-7695-4600-1/11 $26.00 © 2011 IEEE. DOI 10.1109/TrustCom.2011.22

Currently, fuzzing [11-12] remains the main and most efficient way of detecting vulnerabilities in binary code. Like other dynamic analyses, it has low code coverage and is very inefficient at exploring paths. To a hacker, discovering a single vulnerability in well-known software may be a great victory. To a software company or a security assurance department, however, no matter how many vulnerabilities have been detected, they need confidence that few new vulnerabilities will be discovered. Recently, a powerful constraint-based technique called dynamic symbolic execution has been applied to both source code and binary code [2-7]. Symbolic execution is much more efficient than fuzzing: consider a simple example with a branch "if x == 50"; random fuzzing may need up to 2^32 attempts to hit this branch, while symbolic execution needs only one. Though powerful, dynamic symbolic execution suffers from the path explosion problem, which hampers its application to large-scale programs (e.g., the conclusion in SAGE [7]). In contrast, flow-sensitive static analysis has high code coverage but may produce too many false positives. Previous work on static analysis of binary code focuses mainly on information recovery (such as Reps's work [13-14], and others [15-16]) or alias analysis [17-20]. Only a few works have focused on bug finding or vulnerability detection [8-9]. Unlike them, our work does not aim at detecting vulnerabilities directly; instead, we want to combine static analysis with dynamic symbolic execution, so that the problems of both can be mitigated.

Flow-sensitive static analysis is widely used in industrial-level bug finding and vulnerability detection on source code. However, when applied to binary code, it encounters the following challenges:
- Lack of information. The types and structures of variables, function information, and the logic of the original program have been totally lost.
- Indirect control transfers. Programs written with "switch" statements and designed in an object-oriented manner are prone to contain indirect jumps and indirect calls (together, indirect control transfers), which cannot be statically determined.
- Alias. The alias problem is much more serious in binary code than in source code, for all variables of the source program are accessed by address in binary code.
- False positives. It is much easier to rule out false positives in source code than in binary code. In one respect, the logic of a program is relatively straightforward to follow in source code, while understanding the logic of binary code lacking so

much information is quite difficult. In another respect, the developers of the software under test can be called on to do this work for source code, while for binary code it is often the hackers' work. These two facts make false positives hard to rule out statically.

To overcome these challenges, we apply the following existing technologies:
- We make use of the decompilation information from the state-of-the-art industrial tools IDA Pro [21] and Hex-Rays [22]. Moreover, we design a type inferring system to recover possible type information.
- We use an accurate Value Set Analysis (VSA) [23] to find out as many addresses as possible, to solve both the indirect control transfer and alias problems.
- We do not make static analysis the sole technology to detect vulnerabilities; instead, we combine it with other technologies. Currently, we have already combined it with fuzzing to confirm the potential vulnerabilities we have found. In the future, we will combine it with dynamic symbolic execution.

Moreover, to distinguish our work from previous ones, we make the following contributions:
- We propose a novel approach called semi-simulation to build summaries of binary code. The semi-simulation consists of an accurate VSA algorithm to trace variable addresses and a less precise Data Dependence Analysis (DDA) algorithm to trace data flow dependence. Thus, our approach is not only accurate in solving the indirect control transfer and alias problems, but also efficient when applied to large programs. Moreover, the semi-simulation is summary-based, which greatly speeds up our analysis.
- We apply a lazy taint checking technique to detect potential vulnerabilities. We do not use a taint lattice to trace whether a variable can be tainted by an unfiltered input; instead, we trace the possible data dependences from the entry point.
In earlier taint checking techniques, most time is wasted propagating taint between registers; our approach only changes taint states (taint states in our approach are actually data dependences) at memory store points. Furthermore, accurately ordered tainting results can be obtained, for we not only know whether a variable is tainted, but also from which source, and which kind of source, it is tainted, such as inputs from files, uninitialized parameters, and inputs from user interfaces. To those who want to use lattices other than a taint lattice, our approach is attractive, for we solely trace data dependence, and thus extensions based on our work need little change.
- We implement our approach in a practical tool called LoongChecker, and have already used it to verify three known vulnerabilities and discover two zero-day vulnerabilities. The preliminary results show not only the efficiency but also the effectiveness of our tool.

In a word, we use a context-sensitive, flow-sensitive abstract interpretation and dataflow analysis, called semi-simulation, to build summaries, and then check the taint property based on those summaries. Our approach is neither sound nor precise, but it is practical for detecting vulnerabilities in binary code.

The rest of this paper is organized as follows: Section II gives an overview of LoongChecker and an example which will be used several times throughout this paper. Section III describes

the general approach, including the basis for our analysis, intraprocedural analysis, interprocedural analysis and taint checking. The core algorithms of VSA and DDA will be described in intraprocedural analysis. Section IV evaluates LoongChecker on three real world programs. Section V discusses the limitations of LoongChecker. Section VI presents related work. Section VII concludes.
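The lazy, dependence-based taint checking described in the contributions can be pictured with a toy Python sketch (hypothetical instruction tuples and function names; not the tool's implementation): instead of a taint bit, each stored variable carries the set of input sources it depends on, and the sets are only consulted at sinks.

```python
# Hypothetical sketch of lazy, dependence-based taint checking:
# a variable's "taint state" is the set of input sources (file input,
# user input, ...) it depends on, updated only at store points.

def run(instructions):
    deps = {}       # variable name -> set of sources it depends on
    findings = []
    for op, src, dst in instructions:
        if op == "input":         # e.g. scanf: dst now depends on a source
            deps[dst] = {src}
        elif op == "store":       # dependence propagates only at stores
            deps[dst] = set(deps.get(src, set()))
        elif op == "sink":        # e.g. strcpy: report if argument is tainted
            sources = deps.get(src, set())
            if sources:
                findings.append((dst, sorted(sources)))
    return findings

# Mirrors the paper's example: b1 is read by scanf and flows into strcpy.
trace = [
    ("input", "user", "b1"),
    ("store", "b1", "b"),
    ("sink", "b", "strcpy"),
]
print(run(trace))   # → [('strcpy', ['user'])]
```

Because the result carries the source kinds rather than a single bit, the findings can be ordered by source, as the paper's ordered results suggest.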

II. OVERVIEW

This section gives the overall organization of LoongChecker, whose framework is shown in Figure 1. It is built upon two industrial-level tools: IDA Pro [21] and Binnavi [24].

[Figure 1 shows the framework as a pipeline: Binary Code → Parse Binary and Build CFGs (IDA Pro); decompile information and CFGs are stored in a database; Generate REIL (Binnavi) → Generate eREIL → Build Summary → Summary → Check Taint (with a configuration) → Ordered Results (LoongChecker).]

Figure 1. The framework of LoongChecker.

A binary is first parsed by IDA Pro to build Control Flow Graphs (CFGs) and then decompiled by Hex-Rays [22] to get decompilation information (e.g., variable types and function information). Binnavi then uses IDAPython to import the CFGs from the .idb file generated by IDA Pro into a database. After that, it translates the disassembled assembly code to REIL. Our tool LoongChecker actually runs as a plugin of Binnavi. The translations from assembly to REIL and from REIL to eREIL are demand driven: we do not translate all the functions in a module at once; instead, each function is translated lazily. A simple source code example is shown in Figure 2(a), and the corresponding eREIL of function foo is shown in Figure 2(b). eREIL is almost the same as REIL, except for some details that make it more suitable for static analysis [25]. For example, the "cmp" instruction of assembly code is translated into several lines of REIL to compute the eflags. This is needed for dynamic simulation; for static analysis, however, we gain no benefit and lose the simplicity of collecting constraints. Another example is the arithmetic instructions. After eREIL is generated, LoongChecker uses intraprocedural analysis and interprocedural analysis to build summaries. It then checks taints on these summaries, and finally gives an ordered list of potential vulnerabilities to analysts.

Consider the simple example in Figure 2. First, the main function is set as the entry point. The summary building process begins, and semi-simulation is used to simulate the current function main. Semi-simulation backs up the program state before each call instruction. When it encounters a system call or a standard library call, semi-simulation replaces the current program state with the return state of library summaries (these


summaries are manually simulated). When it encounters a call to a user-defined function, semi-simulation checks whether the summary of the callee has been built before replacing the current program state. Here, as the summary of foo has not been built yet, the process of building the summary of foo begins. The summary building process stops once all the summaries of user-defined callees have been recursively built. The taint checking process begins after the summaries have been built. At each call site, the backed-up program state is replaced using variables from the caller. The parameter b at line 6 in foo is replaced by the variable b1 in main. Then the parameters are checked for whether they have been tainted. As b1 is tainted by scanf, the strcpy at line 6 is reported as a potential vulnerability. Using the same technique, an integer overflow error is detected at line 17, for the parameter a is tainted by scanf.

 1: int gi = 100;
 2: int foo ( int a, char* b ){
 3:     int c = a;
 4:     char buf[8];
 5:     if ( c < gi ) {
 6:         strcpy(buf, b);
 7:     }else{
 8:         c = a + gi;
 9:     }
10:     return c; }

11: int main(){
12:     int a = 0;
13:     char b1[1024];
14:     scanf("%d", &a );
15:     scanf("%s", b1 );
16:     if ( foo(a, b1) > 0 ){
17:         malloc( foo(a, b1) );
18:     }
19:     return 1; }

(a) A source code example.

 1: ldm 0x407030, t0      ; mov eax, [407030] ; gi
 2: str t0, eax
 3: sub esp, 0x8, t0      ; sub esp, 8
 4: str t0, esp
 5: sub esp, 0x4, t0      ; push esi
 6: str t0, esp
 7: stm esi, esp
 8: add 0x10, esp, t0     ; mov esi, [esp+10] ; a
 9: ldm t0, t1
10: str t1, esi
11: jge esi, eax, larger  ; cmp esi, eax / jge short larger
12: add 0x14, esp, t0     ; mov eax, [esp+14] ; b
13: ldm t0, t1
14: str t1, eax
15: add 0x4, esp, t0      ; lea ecx, [esp+4]  ; buf
16: str t0, ecx
17: sub esp, 0x4, t0      ; push eax          ; b
18: str t0, esp
19: stm eax, esp
20: sub esp, 0x4, t0      ; push ecx          ; buf
21: str t0, esp
22: stm ecx, esp
23: sub esp, 0x4, t0
24: str t0, esp
25: stm strcpy, esp
26: call strcpy           ; call strcpy
    ...
larger:
27: add esi, eax, t0      ; add eax, esi      ; c = a + gi
28: str t0, eax
    ...

(b) eREIL of function foo in (a).

[Figure 2(c) shows the CFG of (b): Entry → A (lines 1-11); A → B (lines 12-26) and A → C (lines 27-28); B and C → Exit.]

(c) CFG of (b).

[Figure 2(d) shows the stack layout before the first instruction in (b): buf below E1 (around E1-8 to E1-4); the return address at E1 (= E2-420); a at E1+4 (= E2-41C); b at E1+8 (= E2-418); the caller's frame continues from E2-414 up to E2.]

(d) The stack layout before the first instruction in (b).

Figure 2. A simple example.

III. GENERAL APPROACH

In this section, the general approach is presented. First of all, the definitions of UIV and precondition are given, for ease of description.

Define UIV. Unknown Initial Value. UIVs are used in previous work [18] to represent memory blocks accessed but not initialized in the current function. In our approach, we treat all unknown values as UIVs (e.g., registers). Furthermore, each UIV is represented by an uppercase symbol; e.g., the UIV in eax is represented by EAX.

Define Precondition. A map constructed before a call instruction, mapping each UIV in the callee's summary to the current value of its corresponding variable in the caller. If the callee is not a user-defined function, but a system or standard library function, the precondition is the list of the callee's parameters. Such functions are simulated manually.

In the general approach (shown in Figure 3), we only show the fundamental elements of our analysis. It mainly consists of two phases: a summary building phase, which builds the module summary based on intraprocedural and interprocedural analysis, and a taint checking phase, which detects vulnerabilities based on the summaries. In intraprocedural analysis, backward slicing is first performed on the CFG of a function, which separates the instructions that should use VSA from those that only need data-flow tracing. In interprocedural analysis, most UIVs are resolved by top-down UIV replacement. Indirect control transfers are resolved by performing intraprocedural analysis again [26]. Thus, a module summary from a specific entry point is built. A taint checker then checks all possible sinks based on the summaries. Results are finally shown in an ordered manner.

Termination must be assured with respect to two factors: loops, and mutually or self-recursive functions. For loops, a simple algorithm [18] is applied. For recursive functions, a break-replace summary-based approach [26] is applied.
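To make the UIV and precondition definitions concrete, here is a small Python sketch (a hypothetical representation, not the paper's data structures): UIVs are symbolic names for unknown initial values, and a precondition maps each UIV of the callee's summary to the caller's current value at the call site.

```python
# Hypothetical sketch: UIVs as symbolic placeholders; the precondition
# maps callee UIVs to the caller's values at the call site.

def build_precondition(callee_uivs, caller_state):
    """Map each UIV in the callee's summary to the caller's current value."""
    return {uiv: caller_state.get(uiv) for uiv in callee_uivs}

def apply_summary(return_state, precondition):
    """Resolve a callee's return state by substituting UIVs with caller values."""
    resolved = {}
    for var, value in return_state.items():
        resolved[var] = precondition.get(value, value)  # replace known UIVs
    return resolved

# foo's summary says eax depends on its UIV "A" (parameter a);
# at the call site in main, "A" currently corresponds to the variable a.
caller_state = {"A": "a", "B": "b1"}
pre = build_precondition(["A", "B"], caller_state)
print(apply_summary({"eax": "A"}, pre))   # → {'eax': 'a'}
```

This top-down replacement of UIVs is what lets the parameter b in foo be resolved to b1 in main in the running example.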


[Figure 3 shows the general approach: Binary Code → Build CG and CFGs; intraprocedural analysis (Backward Slice, Semi-Simulation) and interprocedural analysis (Resolve Indirect, Resolve UIVs) build the Summary; Check Taint then produces the Ordered Results.]

Figure 3. General Approach.

A. Basis
For convenience, we will first define an intermediate language called eREIL (extended Reverse Engineering Intermediate Language) [27].

1) eREIL
The language definition of eREIL is shown in Figure 4. The instruction set of eREIL can be divided into four groups: aop performs arithmetic operations and bop performs bitwise operations; dop transfers data, including stm, str and ldm; jop transfers control, such as call, jmp, and jl. eREIL transfers data from left to right. For example, "add t0, eax, t1" takes t0 and eax as sources and t1 as the target.

program ::= instr*
instr ::= aop var, var, var | bop var, var, var | dop var, var | jop
var ::= reg<num> | t num<num> | num
aop ::= add | sub | mul | div | mod | bsh
bop ::= and | or | xor
dop ::= ldm | stm | str
jop ::= call var | jmp var | cjmp var, var, var

Figure 4. Definition of eREIL.

2) Abstract Model
Define Variable. All memory points and registers are treated as variables.
Define Base value. The base value is a UIV which represents the initial value of a stack pointer (e.g., esp or ebp). For example, suppose the first instruction of a function is "sub esp, 4". Using ESP as the base value, the value of esp after this instruction is ESP-4. Thus, all the addresses of local variables can be expressed as offsets to the base value.
There are five kinds of variables in our abstract model:
- Stack. Variables in the Stack region are called local variables. As it is not possible to statically decide the exact value of esp or ebp, we refer to local variables by their offsets to the current base value. Given a symbol representing the base value, such as BASE, variables accessed in the form "BASE-n" are treated as local variables of the current function; those accessed as "BASE+n" are treated as parameters passed from the caller. A tuple is used to represent a local variable, in which func represents the current function.
- Global. Variables in the Global region are called global variables, which are referenced directly by address. There are two ways to access a global variable: by an immediate number, or by an offset to an immediate number. We use a tuple to represent a global variable, and offset is set to 0 if the variable is not accessed by offset.
- Heap. Variables in the Heap region are called heap variables. In C/C++, their addresses are often generated by the library function "malloc" or a "new" operator. Thus, their addresses cannot be statically decided. We use a tuple to represent a heap variable, in which id is used as an identifier of the heap variable.
- Register. In a high-level view, registers are a set of temporary variables which perform computation for memory-related variables. We represent a register variable directly by its register name. For example, we use eax to represent the normal register eax, and t1 to represent the temporary register t1 (in eREIL, a register operand also carries its length; for simplicity, we ignore the length in our description).

3) Constructing Type Inferring System
Define Value. The possible values that each variable may contain during the abstract interpretation are treated as values in our type inferring system.
Define Type. A type is attached to each value so as to perform the execution correctly. Our type system is weakly typed, for type information has been lost at the assembly level, and most types are inexact inferences of our analysis. There are five types of values:
- Undefined value υ. There are two kinds of undefined value: 1) in interprocedural analysis, the UIVs of the entry procedure cannot be replaced by any other value; 2) inputs from the external environment, e.g., inputs obtained from the library functions scanf or fread, cannot be replaced by any other value either. To improve accuracy, the undefined value has several subtypes, such as Entry_Value and File_Input.
- UIV μ. As already mentioned, a UIV is a value unknown in the current function, which should be replaced during interprocedural analysis.
- Address value α. An address value is a pointer, which may refer to any kind of memory variable. An address value becomes a function pointer when used as the target operand of a jop group instruction. In semi-simulation, an address value is represented by a value set.
- Normal value η. A normal value is represented by a dependence list in semi-simulation.
- Immediate value ι. All integers encountered in assembly are first treated as immediate values.

B. Intraprocedural analysis
Intraprocedural analysis is the core part of building summaries. During it, backward slicing is first used to divide the instructions of the function under analysis into two sets. Then semi-simulation uses VSA on one set and DDA on the other to summarize the transformer of this function. The semi-simulation is introduced and implemented using the type inferring system constructed above.

1) Backward Slice
The algorithm of backward slicing (shown in Figure 5(a)) is quite simple. It traverses the CFG of a given function func in backward order. It is a work-list based algorithm. Each block maintains a set S of memory-related registers (we call S the register set). In other words, these registers may contain memory addresses, and thus VSA is used to


accurately trace them on these instructions in semi-simulation. The rn operands denote registers, either regular ones or temporary ones tn. The en operands denote a register or a number. For jop group instructions, only the target address operand is considered here, such as the first operand in "jmp eax" and the third operand in "jg t0, t1, eax". Using Figure 2(b) as a running example, its CFG (shown in Figure 2(c)) consists of only three normal nodes besides the Entry and Exit nodes. Each normal node is labeled with the start and end line numbers it contains. The results of performing Algorithm 1 on the normal nodes are shown in Table 1. As the example is relatively simple, the S of each block is esp or the temporary register t0. All the instructions dealing with addresses are included in R. Supposing the current block is A, Algorithm 1 first merges

Table 1. The results of Algorithm 1 on Figure 2(c).

  b | S(before) | S(after) | R
  C | ϕ         | ϕ        | ϕ
  B | ϕ         | esp      | 12, 15, 17, 18, 20, 21, 23, 24
  A | esp       | esp      | 3, 4, 5, 6, 8

all the register sets of the block's children in Merge_Slice. The register set of B is {esp} and that of C is ϕ, so the S of A is {esp}. Then, during Backward_Slice of A, each instruction of A is traversed in backward order. For "stm e1, r2", r2 is an address; for "str r1, r2", r1 is an address if r2 is; for "ldm r1, e2", r1 is an address; for "jop r3" or "jop e1, e2, r3", r3 is an address; for "aop e1, e2, r3" or "bop e1, e2, r3", e1 and e2 are addresses if r3 is.

Algorithm 1 Backward Slice

Backward_Slice( func ){
 1: W ← ϕ
 2: b ← exit block of func
 3: W ← W ∪ {b}
 4: while( W != ϕ ){
 5:   b ← the first node in W
 6:   W ← W \ {b}
 7:   Merge_Slice( b )
 8:   Backward_Slice( b )
 9:   foreach b's parent p {
10:     W ← W ∪ {p}
      } //for
    } //while
}
Merge_Slice( b ){
    // S: the set of memory-related registers of b
11: S ← ϕ
12: foreach b's child c {
13:   S ← S ∪ S'   // S' is c's register set
    }
}
Backward_Slice( b ){
14: foreach b's instruction i backward {
15:  switch( i ){
16:   stm e1, r2:
17:     S ← S ∪ {r2}
18:     if e1 is register and e1 ∈ S
19:     then R ← R ∪ {i}
20:   str r1, r2:
21:     if r2 ∈ S then {
22:       S ← S \ {r2} ∪ {r1}
23:       R ← R ∪ {i} }
24:   ldm r1, e2:
25:     S ← S ∪ {r1}
26:     if e2 is register and e2 ∈ S
27:     then R ← R ∪ {i}
28:   jop r3: jop e1, e2, r3:
29:     S ← S ∪ {r3}
30:   aop e1, e2, r3: bop e1, e2, r3:
31:     if r3 ∈ S then {
32:       if e1 is register
33:       then S ← S \ {r3} ∪ {e1}
34:       if e2 is register
35:       then S ← S \ {r3} ∪ {e2}
36:       R ← R ∪ {i} } //if
     } //switch
    } //for
}

(a) The algorithm of backward slicing.

Algorithm 2 Inferring Types

Infer_Types( i ){
 1: foreach i's operand ek {
 2:  if ek is a number
 3:  then vk : τk ← ek : ι
 4:  else if ek is a register
 5:  then if ek is a UIV then {
 6:    Ι ← Ι ∪ {Ek}
 7:    vk : τk ← Ek : μ } //if
    } //for
 8: switch( i ){
 9:  stm e1, e2:
10:   v2 : τ2 ← v2 : α
11:   if e2 ∉ Ο
12:   then Ο ← Ο ∪ e2
13:   *e2 ← v1 : τ1
14:  str e1, e2:
15:   v2 : τ2 ← v1 : τ1
16:  ldm e1, e2:
17:   v1 : τ1 ← v1 : α
18:   if e1 ∈ Ο then {
19:     v1 : τ1 ← *e1
20:     v2 : τ2 ← v1 : τ1 }
21:  bsh e1, e2, e3:
22:   v3 : τ3 ← v1 Δ v2 : τ1
23:  aop e1, e2, e3: bop e1, e2, e3:
24:   v3 : τ3 ← v1 Δ v2 : τ1 Θ τ2
    } //switch
}

(b) The algorithm of inferring types.

Figure 5. The algorithms of backward slicing and inferring types.
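A runnable sketch of Algorithm 1's per-instruction rules may help (a simplified Python rendering over a single straight-line block rather than a full CFG work list; the instruction tuples are our own hypothetical encoding, and all operands are assumed to be registers):

```python
# Simplified sketch of the backward-slice transfer rules of Algorithm 1,
# applied to one basic block. S is the set of memory-related registers;
# R collects indices of address-related instructions (the ones that
# semi-simulation will later run VSA on).

def backward_slice_block(instrs, s_init):
    S, R = set(s_init), set()
    for idx in range(len(instrs) - 1, -1, -1):    # traverse backward
        op, a, b = instrs[idx]
        if op == "stm":            # stm e1, r2: r2 holds an address
            S.add(b)
            if a in S:
                R.add(idx)
        elif op == "str":          # str r1, r2: r1 is an address if r2 is
            if b in S:
                S.discard(b); S.add(a); R.add(idx)
        elif op == "ldm":          # ldm r1, e2: r1 holds an address
            S.add(a)
            if b in S:
                R.add(idx)
    return S, R

# Lines 6-7 of Figure 2(b): "str t0, esp" then "stm esi, esp",
# starting from S = {esp} (esp is memory related below this block).
block = [("str", "t0", "esp"), ("stm", "esi", "esp")]
S, R = backward_slice_block(block, {"esp"})
print(sorted(S), sorted(R))   # → ['t0'] [0]
```

Note how "str t0, esp" replaces esp with t0 in S, mirroring lines 20-23 of Algorithm 1, while "stm esi, esp" joins R only if its source is itself memory related.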

2) Structure of Summary
The structure of a summary consists of three fundamental elements:
- Input Table Ι. The input table contains all UIVs, including all of the unknown memory variables and register variables.
- Call Table Γ. The call table contains all invocations of callees in the current function. For later tracing, each element in this table contains the information needed to locate the call instruction; for simplicity, we use i to represent all this information here. M represents the precondition of the call instruction.
- Return State Σ. In binary code, it is hard to tell which memory variables and register variables in the return state of a callee will be used by its caller, considering aliasing and the passing of parameters in registers. Thus, all the variables except the local variables with negative offsets are included in the return state.

As we only present a general approach in this section, other issues, such as sink patterns, are not considered. Besides the three elements needed for the summary, a variable tuple Ο is maintained during the semi-simulation of the current function. Π, Λ, Ν and Η stand for the tables which contain register, local, global and heap variables, respectively.

3) Inferring Types
The operands in assembly can either be registers or numbers. After building the type system, an algorithm for inferring the different types is needed (described in Figure 5(b)).
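The three summary elements can be pictured as a small record (a hypothetical Python rendering with names of our choosing, not the tool's actual layout):

```python
# Hypothetical rendering of a function summary: input table Ι (UIVs),
# call table Γ (call-site info i plus precondition M), return state Σ.

from dataclasses import dataclass, field

@dataclass
class CallEntry:
    site: str           # i: information locating the call instruction
    precondition: dict  # M: map from callee UIV to caller value

@dataclass
class Summary:
    inputs: set = field(default_factory=set)           # Ι: all UIVs
    calls: list = field(default_factory=list)          # Γ: invocations
    return_state: dict = field(default_factory=dict)   # Σ: vars visible to caller

s = Summary()
s.inputs.add("ESP")
s.calls.append(CallEntry(site="line 26: call strcpy",
                         precondition={"EAX": "b1"}))
s.return_state["eax"] = "A"   # foo's return value depends on UIV A
print(len(s.inputs), len(s.calls))   # → 1 1
```

The return state deliberately keeps everything except negative-offset locals, matching the paper's conservative choice about what a caller might observe.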


In the algorithm, we use vk : τk to represent the value and type of the operand ek. For simplicity, access to a variable by address e2 is abbreviated *e2. For the same reason, adding a new variable to the variable table is abbreviated "Ο ← Ο ∪ e2", which ignores the detail of which of the four tables (Π, Λ, Ν, Η) e2 will be added to. We give a partial order for the casting Θ between two types: α > η > υ > μ > ι (e.g., if type α meets type η, the result will be type α). The symbol Δ on two values represents operations in the semi-simulation.
The algorithm first figures out all the immediate values and UIVs of the current instruction i. Then, according to the type of i's opcode, different actions are taken. For "stm e1, e2", it first tries to get the variable at address e2 from Ο. If it is not found, a new variable is added to one of the tables Π, Λ, Ν, Η. It then assigns the value of e1 to the variable. For "str e1, e2", it directly assigns the value of e1 to e2. For "ldm e1, e2", it decides whether a variable with address e1 exists in Ο; if not, it ignores this operation. For the aop and bop group instructions, the casting between types is decided by the partial order. The "bsh e1, e2, e3" instruction is special, in that the type of e3 is decided solely by e1.
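The casting partial order Θ can be rendered directly (a minimal Python sketch; the numeric ranks are ours, only the ordering α > η > υ > μ > ι comes from the paper):

```python
# Minimal sketch of the casting partial order Θ: α > η > υ > μ > ι.
# When two typed values meet in an aop/bop instruction, the result
# takes the higher-ranked type (e.g., an address α absorbs a normal η).

RANK = {"α": 5, "η": 4, "υ": 3, "μ": 2, "ι": 1}

def theta(t1, t2):
    """Cast between two types: the higher type in the partial order wins."""
    return t1 if RANK[t1] >= RANK[t2] else t2

print(theta("α", "η"))   # address meets normal value → α
print(theta("ι", "μ"))   # immediate meets UIV → μ
```

This is why adding an immediate to an address still yields an address, so VSA keeps tracking it as a pointer.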

4) Semi-Simulation
Define semi-simulation. During the abstract interpretation, instructions related to addresses are simulated in an accurate way, while for the other instructions only the possible data dependences are traced.

We call it "semi-simulation" because instructions not related to addresses are not really simulated; instead, only their data dependence is traced. We first give the framework of semi-simulation in Figure 6(a). This is a forward work-list based algorithm. Each block maintains a tuple Ο to record variables. We use an operator Φ to represent the merge of two Ο. Semi-simulation uses VSA to simulate address manipulations and DDA to trace data dependences. VSA and DDA differ only in the operators Φ and Δ of the framework. Below, we introduce them respectively.

Algorithm 3 Semi Simulation

Semi_Simulation( func ){
 1: W ← ϕ
 2: b ← entry block of func
 3: W ← W ∪ {b}
 4: while( W != ϕ ){
 5:   b ← the first node in W
 6:   W ← W \ {b}
 7:   Merge_Simulation( b )
 8:   Semi_Simulation( b )
 9:   foreach b's child c {
10:     W ← W ∪ {c}
      } //for
    } //while
}
Merge_Simulation( b ){
11: Ο ← ϕ
12: foreach b's parent p {
13:   Ο ← Ο Φ Ο'   // Ο' is the tuple maintained by p
    }
}
Semi_Simulation( b ){
14: foreach b's instruction i {
15:   Infer_Types( i )
    }
}

(a) The framework of semi-simulation.

a) VSA
The Value Set Analysis (VSA) technology applied by Reps et al. [13-14] is mainly used to recover variable and structure information. Although they intend to find bugs [9] accurately based on their analysis platform, their approach is quite fine-grained and thus time consuming. For practicality, we take a simple approach to implementing VSA. Each memory point is recognized as a variable, and memory points accessed by an offset to an address are grouped into a structure. We do not recognize arrays, which may have little impact on our analysis. The algorithm of VSA is shown in Figure 6(b). Each variable maintains a value set, represented as Vn in the algorithm. We introduce a new operator Ξ to represent the aop or bop group operations. Since it is relatively easy, we will not explain it further.

Algorithm 4 VSA

Φ( Ο1, Ο2 ){
 1: foreach Ο2's variable v2 {
 2:   if v2 ∈ Ο1 then {
 3:     foreach V2's value u {   // Vn is the value set of vn
 4:       V1 ← V1 ∪ {u}
        } //for
      } //if
 5:   else copy v2 to Ο1
    } //for
}
Δ( v1, v2, Ξ ){
 6: V3 ← ϕ
 7: foreach V1's value u1 {
 8:   foreach V2's value u2 {
 9:     V3 ← V3 ∪ {u1 Ξ u2}
      } //for
    } //for
10: return V3
}

(b) The algorithm of VSA.

b) DDA
Unlike VSA, DDA only traces possible data dependences. The algorithm is shown in Figure 6(c). Each variable maintains a list of possible dependences, represented as Ln. We use C to represent an immediate value.

Algorithm 5 DDA

Φ( Ο1, Ο2 ){
 1: foreach Ο2's variable v2 {
 2:   if v2 ∈ Ο1 then {
 3:     foreach L2's dependence d {   // Ln is the dependence list of vn
 4:       L1 ← L1 ∪ {d}
        } //for
      } //if
 5:   else copy v2 to Ο1
    } //for
}
Δ( v1, v2, Ξ ){
 6: L3 ← ϕ
 7: if τ1 == ι then L1 ← {C}
 8: if τ2 == ι then L2 ← {C}
    // C is a symbol representing an immediate value
 9: foreach L1's dependence d1 {
10:   L3 ← L3 ∪ {d1}
    } //for
11: foreach L2's dependence d2 {
12:   L3 ← L3 ∪ {d2}
    } //for
13: return L3
}

(c) The algorithm of DDA.

Figure 6. The algorithms of semi-simulation.

After introducing DDA, we have finished explaining all the algorithms of semi-simulation. For ease of understanding, we now apply semi-simulation to Figure 2(b). The results of performing semi-simulation on node A are shown in Table 2.
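As a concrete rendering of the DDA operators, the following Python sketch (a hypothetical data layout: each variable's abstract state is its dependence list, modeled as a set of sources) implements Φ as a pointwise union over the variable tables and Δ as the union of the two operands' dependences:

```python
# Sketch of DDA's merge (Φ) and transfer (Δ) operators. A variable's
# abstract state is its dependence list, modeled here as a set.

def phi(o1, o2):
    """Merge two variable tables: union the dependence lists pointwise."""
    merged = {v: set(d) for v, d in o1.items()}
    for v, deps in o2.items():
        merged.setdefault(v, set()).update(deps)
    return merged

def delta(l1, l2):
    """Transfer for aop/bop: the result depends on both operands."""
    return set(l1) | set(l2)

a = {"eax": {"ESI"}, "esp": {"ESP"}}
b = {"eax": {"File_Input"}}
print(sorted(phi(a, b)["eax"]))       # → ['ESI', 'File_Input']
print(sorted(delta({"ESI"}, {"C"})))  # → ['C', 'ESI'] (C marks an immediate)
```

Compared with VSA's Δ, which must combine every pair of values with the concrete operation Ξ, this transfer is a plain set union, which is why DDA is cheap on instructions that do not manipulate addresses.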


Table 2.The results of performing semi-simulation on Node A in Figure 2(c). Instruction 1

VSA

2 3

4 5 6 7

8 9

The algorithm of translating a value set into a dependence list is easy, i.e. simply translating each UIV in each value to one dependence in a dependence list. If the value which represents an address is translated into a dependence list, accesses of this address will be ignored (e.g., “ldm t0,t1” will not be simulated if t0 is a dependence list).

c) Combining VSA with DDA
The problems seem to be perfectly solved by performing VSA and DDA separately on two different instruction sets. However, that is the ideal situation. In practice:
x DDA needs VSA. As mentioned above, we apply a backward slicing before semi-simulation. For efficiency, the algorithm is quite simple: it only traces registers, and it does not slice interprocedurally. The alias problem thus appears. Suppose eax and ebx both point to the same variable, and eax is used as the target address of an stm instruction while ebx is not; instructions related to eax will then use VSA while those related to ebx will use DDA. An interprocedural example is a parameter that is never used as an address in the caller but may be used as an address in the callee.
x VSA needs DDA. Consider the situation where each value in the value set has a complex expression, or where the value set is very large. Computing these values consumes a great deal of time.
x VSA may meet DDA. One operand may maintain a value set while the other maintains a dependence list.
We apply a trade-off method to solve these problems:
x DDA does not start directly with a dependence list. It first maintains a value set restricted to no more than two values, where each value is an expression containing no more than one UIV (e.g., one UIV plus an immediate value). If either condition is violated, DDA translates the values into a dependence list.
x VSA restricts the value set to no more than 16 values, and the expression of each value may contain no more than three UIVs. Since VSA is used to compute addresses, these restrictions are rarely violated in practice; if they are, the values are translated into a dependence list.
x If a dependence list meets a value set, the value set is translated into a dependence list.

C. Interprocedural analysis
Considering a call from caller f to callee g, interprocedural analysis involves two phases: constructing the precondition of each invocation, and resolving g's return state so that f can continue to build its summary. The core part of both phases is how to resolve UIVs; the other issues are trivial.

1) Resolving UIVs
The most difficult part is mapping a UIV to its relative local variable; the mapping of other variables is straightforward. Go back to the example in Figure 2(a), and suppose we have just simulated the call instruction at line 6, so that we are now before the first instruction in Figure 2(b). The current stack layout is shown in Figure 2(d). Using E1 and E2 to represent the initial value of esp in the callee and the caller respectively, we get the equation E1 = E2 - 420. To be general, we use H to replace -420; thus E1 = E2 + H (*). We use <E1, O1> to represent a local variable in the callee and <E2, O2> one in the caller. If the two refer to the same address, another equation E1 + O1 = E2 + O2 (**) is obtained. Replacing E1 in (**) using (*), we get O2 = O1 + H.

Thus, if we know the offset H of esp relative to its base value before entering the callee, the offset of each local variable in the callee can easily be mapped to the offset of the relative variable in the caller by adding H to it. Resolving the return state is simple: UIVs in the tuple Ο (ignoring local variables with negative offsets) are replaced by their relative values in the caller.
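The offset mapping O2 = O1 + H derived above amounts to a one-line remapping of frame offsets. A minimal sketch, under our own naming:

```python
# Hedged sketch of the callee-to-caller offset mapping O2 = O1 + H.
# H is the offset of the callee's esp base (E1) relative to the caller's
# (E2): E1 = E2 + H, so a callee local at offset O1 maps to O1 + H.

def map_to_caller(callee_offsets, h):
    """Map callee-frame offsets to the corresponding caller-frame offsets."""
    return [o1 + h for o1 in callee_offsets]

# With a 420-byte frame (H = -420), callee locals at ESP+4 and ESP+8
# correspond to caller offsets ESP-416 and ESP-412:
print(map_to_caller([4, 8], -420))  # [-416, -412]
```

The negative results are exactly the caller-frame locals with negative offsets that the return-state resolution ignores.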

2) Resolving addresses and indirect control transfers
We apply a break-replace algorithm [26] to solve these problems; we only introduce indirect calls here. If a call instruction i's target address A cannot be decided during semi-simulation, the current function f is broken into two pieces: one before the call instruction and one after. Semi-simulation resumes after the call instruction as if that piece were a new function f'. During interprocedural analysis, the target function g of i is decided once A can be resolved into an immediate value. After resolving the return state of g, we replace all UIVs in f' by the values resolved from g's return state.
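The "replace" half of the algorithm can be sketched as a substitution pass. All names here (the state dictionaries, the `UIV_*` placeholders) are our own assumptions for illustration, not LoongChecker's API:

```python
# Hedged sketch of the replace step of the break-replace algorithm:
# once callee g's return state is resolved, the placeholder UIVs that f'
# was simulated with are substituted by concrete values.

def replace_uivs(f_prime_state, g_return_state):
    """Replace each UIV placeholder in f' with the value g's resolved
    return state gives it; values g does not mention pass through."""
    return {var: g_return_state.get(val, val)
            for var, val in f_prime_state.items()}

# f' was simulated with placeholders UIV_eax and UIV_m0:
f_prime = {"eax": "UIV_eax", "t0": "UIV_m0", "ebx": 7}
g_ret = {"UIV_eax": 0, "UIV_m0": "ESP+4"}
print(replace_uivs(f_prime, g_ret))  # {'eax': 0, 't0': 'ESP+4', 'ebx': 7}
```

ebx was never broken, so it survives unchanged; only the UIVs introduced at the break point are rewritten.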

D. Taint Checking
Taint checking consists of two phases: replacing preconditions and checking sink points. Before taint checking, some basic elements must be introduced:
x Source recognition. Source recognition happens in both intra- and interprocedural analysis, though we have not mentioned it in the process of building summaries. In intraprocedural analysis, a call instruction to an IO input function is recognized as a source point; taints are brought in through parameters or the return value. In interprocedural analysis, the parameters of the entry point and other uninitialized variables are recognized as tainted.
x Sink pattern. Sink points are sometimes difficult to recognize. For a dangerous system or standard library function, recognition is trivial (simply by its function name). However, some dangerous functions are translated into several lines of assembly code by the compiler or optimizer; for example, strcpy in Figure 2(a) is often translated into several lines of assembly code. Even manually written code may contain "dangerous points". We design several sink pattern search engines to find such potential dangerous points.
x Vulnerability pattern. There are two important patterns. The first is the buffer overflow pattern: 1) there is a loop; 2) there is an iterator, which compares itself with a constant; 3) there are write operations from the iterator to the target; 4) the begin address of the iterator is tainted. The second is the integer-overflow-caused buffer overflow pattern: 1) the length parameter of malloc or new is a tainted variable combined with a constant; 2) dangerous points or buffer overflow patterns use the return value of malloc or new as the target address.
x Library simulation. It would be easy to trace into library functions, since our analysis is based on binary code and we already have the binary code of the libraries. However, we do not analyze library functions, for two reasons: first, we only care about taints, and most libraries, except those related to IO, neither introduce new taints nor remove old ones; second, our analysis only traces data dependences and is thus imprecise. In other words, simulation according to API documents is more precise than our analysis.
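A quick arithmetic illustration of why the integer-overflow-caused buffer overflow pattern matters: a tainted length combined with a constant can wrap to a tiny allocation size in 32-bit arithmetic. The helper below is ours, for illustration only.

```python
# 32-bit wrap-around of the length expression passed to malloc/new.
MASK32 = 0xFFFFFFFF

def alloc_size(tainted_len, const):
    """The 32-bit value malloc actually sees for `tainted_len + const`."""
    return (tainted_len + const) & MASK32

# 0xFFFFFF9C + 100 = 2**32, which wraps to 0: malloc(0) succeeds, and a
# subsequent copy guided by the original huge length overflows the buffer.
print(alloc_size(0xFFFFFF9C, 100))  # 0
```

This is the same wrap-around exploited in the line-17 example of Figure 2(a).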

1) Replacing and Checking
After building summaries, each function summary contains an input table, a map of preconditions at each call site, and a return state. Replacing preconditions and checking taints can be performed simultaneously in a top-down manner. Beginning at the entry point, all the preconditions of each callee of the entry point are replaced using values from the entry point. Recursively repeating this process, the preconditions of every callee on the CG are replaced by values passed from its caller. At the same time, taint checking is performed immediately after each replacement at sink points. For example, at the sink point strcpy in Figure 2(a), its parameters are checked for taint. As the parameter b is tainted in main by scanf at line 15, this sink point is reported as a stack overflow vulnerability. Another potential error is an integer overflow at line 17. From the return state of foo, we can infer that the return value of foo depends on a and gi. As a is tainted by scanf at line 14, an integer overflow may happen by setting a to a specific value: in this example, a + 100 wraps around to 0 if a is set to 0xFFFFFF9C.

IV. EVALUATION

In this section, we evaluate our tool LoongChecker on three real world applications. Before that, some implementation details are presented.

A. Implementation Details
We have implemented our approach in a tool called LoongChecker, which consists of more than 30K LOC of Java. Several details deserve attention:
x Taint checking. We eliminate taint checking on call sites with the same precondition. This optimization saves nearly 20% of taint checking time (50% at most).
x Calls between libraries. As LoongChecker is based on Binnavi, which saves all information in a database, dealing with a call to another DLL file is straightforward: simply load the information of that DLL from the database.

Table 3. The evaluation on three real world programs.

Application            | Module        | Entry               | Func | MaxDep | ASMIns | eREILIns | SumT      | CheckT   | Memory(1) | VulNum | Detected?
Serenity player        | Serenity.exe  | Sub_401270          | 86   | 6      | 5.0K   | 26.5K    | 49s38ms   | 15ms     | 19.6M     | 1/8    | Yes
Fox Player             | FoxPlayer.exe | __tmainCRTStartup   | 274  | 19     | 20.0K  | 103.8K   | 34s188ms  | 93ms     | 33.0M     | 1/27   | Yes
Kingsoft Office Writer | DocReader.dll | _dr_CreateSource3EX | 1261 | 22     | 48.6K  | 237.1K   | 65s906ms  | 94ms     | 77.1M     | 0/42   |
Kingsoft Office Writer | TypeCore.dll  | Sub_488AFE20        | 4046 | 62     | 384.9K | 1832.3K  | 623s203ms | 15s844ms | 94.0M     | 3/103  | Yes

(1) Memory here represents the max memory consumption, including memory taken up by Binnavi during analysis.

B. Verifying Known Vulnerabilities
The results of the evaluation are shown in Table 3. Since our analysis is summary based, the function number, ASM instructions and eREIL instructions are each counted only once, and the max depth is counted only when the callee's summary has not yet been built; the real counts are therefore much larger. Below, we explain the three known vulnerabilities respectively.
x Serenity player [28]. The vulnerability is discovered in sub_404870. Using our tool, we find that the taint is introduced by fopen at 0x40497F and then used as the source of sscanf at 0x4049AC. As the length of the source is not limited, a source of any length can be copied to the target, causing a stack overflow.
x FoxPlayer [29]. The vulnerability is discovered in sub_40DE00. The taint comes from ReadFile at 0x40DFEE. As with Serenity, it is used as the source of sscanf at 0x40E01C and, for the same reason, causes a stack overflow.
x Kingsoft Office Writer [30]. The vulnerability is discovered in sub_4864E8C0. The taint is introduced from another library, numberfmt, in sub_485E9CB0, and is then passed to sub_4862B7F0. From 0x48523DF1 to 0x48523DFE, the code matches the buffer overflow pattern: 1) it is a loop; 2) cx is an iterator that compares itself with 0; 3) there is a write from cx to [edx+eax]; 4) eax is tainted from the library numberfmt. If inputs from the relevant file section do not contain 0, a stack overflow takes place.
We discovered the other two vulnerabilities in a similar way to [30]; both were found by the buffer overflow pattern.
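The four conditions of the buffer overflow pattern reduce to a simple conjunctive check over facts collected during simulation. The record and field names below are ours, for illustration; LoongChecker's actual matcher works on eREIL loops.

```python
# Hedged sketch of the buffer overflow pattern match.
from dataclasses import dataclass

@dataclass
class LoopInfo:
    is_loop: bool             # 1) there is a loop
    iter_cmp_const: bool      # 2) the iterator compares itself with a constant
    writes_iter_to_target: bool  # 3) write operations from iterator to target
    start_tainted: bool       # 4) the begin address of the iterator is tainted

def matches_buffer_overflow(info):
    return (info.is_loop and info.iter_cmp_const
            and info.writes_iter_to_target and info.start_tainted)

# The Kingsoft Office loop at 0x48523DF1: cx compares with 0, writes to
# [edx+eax], and eax is tainted from numberfmt, so it is reported.
print(matches_buffer_overflow(LoopInfo(True, True, True, True)))  # True
```

Dropping any one condition (e.g., an untainted start address) suppresses the report, which is what keeps the pattern's false positive rate manageable.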

V. DISCUSSION

Although LoongChecker has shown that it is powerful in detecting vulnerabilities, several limitations hamper its ability.

x Convicting vulnerabilities. This is the most serious problem. Xu et al.'s work [31] shows a suitable way to solve it. Currently, we use a fuzzing tool to randomly generate inputs and manually choose those that can reach a given dangerous point. Then we manually trace the parameters of this point back to its source to see whether it is controllable. This process is quite inefficient and makes convicting vulnerabilities difficult. In future, we need a dynamic tool to trace data flow from source to sink point, for manual tracing is both tedious and heavy handed. Our tool already provides a call string from source to sink; however, it provides all the possibilities, which may be hard to follow in dynamic debugging. We also need a convicting tool like [31], which can provide a detailed source. In practice, even if you have found the source of a sink point in memory, you often still cannot decide whether it is controllable from its final source (e.g., from a file).
x False positives. As mentioned above, false positives are the biggest problem of flow-sensitive static analysis. Currently, we are trying two ways to mitigate this problem. First, we collect simple constraints on inputs and dangerous points: the current return state is replaced by a map from a constraint to its relative return state, and constraints are put upon dangerous points, so that only tainted inputs which satisfy the constraint of a dangerous point are reported. We do not solve these constraints with a constraint solver; instead, we simply compare them. Second, we prune branches which may be useless. It is hard to decide whether a branch is useful or useless, so we make it configurable: users can give a branch pattern which they believe useless. Neither of the two has been implemented in LoongChecker yet. As our final target is to combine our current work with dynamic symbolic execution, false positives can be largely pruned by dynamic symbolic execution.
x Library simulation. We only simulate some IO and memory related library functions. In most situations this is enough; sometimes, however, it causes big problems. Consider a map implemented by a C++ library. We do not simulate the library operations on this map, such as "put" and "remove". Thus, a key-value pair added into this map is ignored by our analysis, and if the value appears again later, we treat it as an undefined value.
x Entry point. For a more accurate analysis, the entry point should be well chosen. If a function too high in the call graph is chosen as the entry point, our analysis may produce many false alarms because of the large widening of results. Moreover, for message based applications, such as most GUI applications, the control flow is hard to trace, and choosing an upper function may soon lead to the termination of the analysis. However, if a bottom function is chosen, the results may be useless for lack of taint sources. We leave the entry point configurable too.

VI. RELATED WORK

There are two pieces of work most similar to ours: Guo's [18] and Reps' [9, 13-14, 32-33].

Although we use a summary structure similar to Guo's, ours still differs from theirs in the following aspects:
x SSA. They use an SSA (Static Single Assignment) form. SSA can ease analysis in several ways: input recognition, expression propagation and ease of use-define analysis [16]. However, recognizing input is simple; expression propagation is less useful, for we only use DDA for most instructions; and although use-define information plays an important role in our analysis, our analysis needs only one pass, while there would be two with SSA, plus the time of translating out of SSA.
x UIV. They do not treat registers as UIVs. This makes the analysis quite inaccurate if some parameters are passed in registers. In contrast, we treat all unknown values as UIVs.
x Address parameters. They assume that there are no address parameters. This assumption makes their work quite limited in practice; moreover, this is the most difficult part of summary based approaches. We deal with indirect jumps, indirect calls and variables with unknown addresses, which are all caused by address parameters, using a break-replace algorithm [26].
Reps' work focuses on the recovery of variables and structures, in order to provide a platform for further analysis. We differ from them in the following ways:
x VSA. They use intervals to approximate array-like structures and improve efficiency, while we do not recognize arrays and use a simple implementation of VSA. We find this inaccuracy has little impact on detecting vulnerabilities.
x Predicates. In their recent work [9, 34], they use a meet-over-all-paths (MOP) solution instead of a maximal-fixed-point (MFP) one to increase accuracy. Their approach is similar to ESP [35] and Fischer et al.'s [36], both of which we consider impractical. ESP selects paths related to the properties of interest, which is useful when checking a specific property but may encounter the path explosion problem when checking many; Fischer et al.'s may be too expensive in refining predicates.
Summary-based approaches have long been used in static analysis [37-40]. While most of these approaches work on source code, little work has been done on binary code [8, 18, 34, 41]. Cova et al.'s work [8] is based on Kruegel's [41], using a symbolic execution technology which ignores alias information.

VII. CONCLUSION
This paper presents a novel semi-simulation approach to statically detect potential vulnerabilities in binary code. A tool called LoongChecker has been implemented and evaluated on real world programs, and the results are promising. In the future, we will develop an assistant dynamic tool and apply our approach to more real world programs.


ACKNOWLEDGMENTS
This research was supported in part by the Fundamental Research Funds for the Central Universities of China (No. WK0110000007).

REFERENCES
[1] J. Newsome and D. Song, "Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software," 2005.
[2] C. Cadar, D. Dunbar, and D. Engler, "KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs," 2008.
[3] C. Cadar, et al., "EXE: Automatically generating inputs of death," ACM Transactions on Information and System Security, vol. 12, Dec 2008.
[4] P. Godefroid, N. Klarlund, and K. Sen, "DART: Directed automated random testing," ACM SIGPLAN Notices, vol. 40, pp. 213-223, Jun 2005.
[5] K. Sen, D. Marinov, and G. Agha, "CUTE: A concolic unit testing engine for C," 13th ACM SIGSOFT, 2005.
[6] D. Song, et al., "BitBlaze: A new approach to computer security via binary analysis," Information Systems Security, Proceedings, vol. 5352, pp. 1-25, 2008.
[7] P. Godefroid, M. Levin, and D. Molnar, "Automated whitebox fuzz testing," in Proceedings of Network and Distributed Systems Security (NDSS 2008), 2008.
[8] M. Cova, V. Felmetsger, G. Banks, and G. Vigna, "Static detection of vulnerabilities in x86 executables," in Proceedings of the Annual Computer Security Applications Conference (ACSAC), 2006, pp. 269-278.
[9] G. Balakrishnan and T. Reps, "Analyzing stripped device-driver executables," in TACAS, 2008, pp. 124-140.
[10] J. Clause, W. Li, and A. Orso, "Dytan: A generic dynamic taint analysis framework," 2007, p. 206.
[11] V. Ganesh, T. Leek, and M. Rinard, "Taint-based directed whitebox fuzzing," 2009, pp. 474-484.
[12] M. Sutton and A. Greene, "The art of file format fuzzing," 2005.
[13] G. Balakrishnan and T. Reps, "Analyzing memory accesses in x86 executables," in Proc. Int. Conf. on Compiler Construction, 2004.
[14] G. Balakrishnan and T. Reps, "DIVINE: Discovering variables in executables," in VMCAI, 2007, pp. 1-28.
[15] L. Harris and B. Miller, "Practical analysis of stripped binary code," ACM SIGARCH Computer Architecture News, vol. 33, p. 68, 2005.
[16] M. Van Emmerik, "Static Single Assignment for Decompilation," PhD thesis, University of Queensland, 2007.
[17] B. De Sutter, et al., "On the static analysis of indirect control transfers in binaries," in PDPTA, 2000, pp. 1013-1019.
[18] B. Guo, et al., "Practical and accurate low-level pointer analysis," in 3rd Int. Symp. on Code Gen. and Opt., 2005, pp. 291-302.
[19] D. Brumley and J. Newsome, "Alias analysis for assembly," Carnegie Mellon University, School of Computer Science Technical Report CMU-CS-06-180, 2006.
[20] S. Debray, R. Muth, and M. Weippert, "Alias analysis of executable code," in Princ. of Prog. Lang., 1998, pp. 12-24.
[21] IDAPro. http://www.hex-rays.com/idapro/
[22] Hexrays. http://www.hex-rays.com/
[23] G. Balakrishnan, R. Gruian, T. Reps, and T. Teitelbaum, "CodeSurfer/x86 - A platform for analyzing x86 executables," Compiler Construction, Proceedings, vol. 3443, pp. 250-254, 2005.
[24] Binnavi. http://www.zynamics.com/binnavi.html
[25] J. Yang, S. Cheng, J. Wang, and F. Jiang, "eREIL: An intermediate representation for static analysis of binary code," technical report.
[26] J. Yang, S. Cheng, J. Wang, and F. Jiang, "A break-replace algorithm for summary based static analysis of binary code," technical report.
[27] T. Dullien and S. Porst, "REIL: A platform-independent intermediate representation of disassembled code for static code analysis," CanSecWest, 2009.
[28] Serenity Player exploit. http://www.exploit-db.com/exploits/10226/
[29] FoxPlayer exploit. http://www.exploit-db.com/exploits/11333/
[30] Kingsoft Office Writer exploit. http://www.exploit-db.com/exploits/12710/
[31] Z. Lin, X. Zhang, and D. Xu, "Convicting exploitable software vulnerabilities: An efficient input provenance based approach," in Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2008, pp. 247-256.
[32] T. Reps, G. Balakrishnan, and J. Lim, "Intermediate-representation recovery from low-level code," 2006, p. 111.
[33] G. Balakrishnan, T. Reps, D. Melski, and T. Teitelbaum, "WYSINWYX: What you see is not what you execute," Verified Software: Theories, Tools, Experiments, pp. 202-213, 2008.
[34] D. Gopan and T. Reps, "Low-level library analysis and summarization," in CAV 2007, 2007, pp. 68-81.
[35] M. Das, S. Lerner, and M. Seigle, "ESP: Path-sensitive program verification in polynomial time," in Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, Berlin, Germany, 2002.
[36] J. Fischer, R. Jhala, and R. Majumdar, "Joining dataflow with predicates," in Proceedings of the 10th European Software Engineering Conference held jointly with the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Lisbon, Portugal, 2005.
[37] P. Cousot and R. Cousot, "Static determination of dynamic properties of recursive procedures," in IFIP Conference on Formal Description of Programming Concepts, 1977, p. 237.
[38] P. Cousot and N. Halbwachs, "Automatic discovery of linear restraints among variables of a program," in 5th ACM POPL, 1978, pp. 84-96.
[39] T. Reps, S. Horwitz, and M. Sagiv, "Precise interprocedural dataflow analysis via graph reachability," 1995, pp. 49-61.
[40] M. Sharir and A. Pnueli, "Two approaches to interprocedural data flow analysis," Program Flow Analysis: Theory and Applications, pp. 189-234, 1981.
[41] C. Kruegel, et al., "Automating mimicry attacks using static binary analysis," presented at the USENIX Security Symposium, 2005.
