A Model for Security Vulnerability Pattern

Hyungwoo Kang¹, Kibom Kim¹, Soonjwa Hong¹, and Dong Hoon Lee²

¹ National Security Research Institute, 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350, Korea
{kanghw, kibom, hongsj}@etri.re.kr
² Center for Information Security Technologies (CIST), Korea University, Seoul, 136-704, Korea
[email protected]
Abstract. Static analysis is used to find programming errors before run time. Unlike dynamic analysis, which examines the application state while the application is running, static analysis does not require the application to be executed. In this paper, we classify security vulnerability patterns in source code and design a model that expresses various security vulnerability patterns by means of pushdown automata. On the basis of this model, a security vulnerability can be found with an Abstract Syntax Tree (AST) based pattern matching technique at the parsing level.

Keywords: Static analysis, Software security, Buffer overflow, Abstract Syntax Tree (AST), Pushdown Automata (PDA).
1 Introduction

Static analysis tools are extremely beneficial to small and large businesses alike, although the way they are deployed and used may differ. They help keep development costs down by finding bugs as early as possible in the product cycle, and they help ensure that the final application has far fewer exploitable security flaws. Static analysis tools are very good at code-level discovery of bugs and can help enforce coding standards and keep code complexity down. Metrics can be generated to analyze the complexity of the code and to discover ways to make it more readable and less complex.

Buffer overflows account for more than half of all security vulnerabilities and are the single most important security problem. The classic buffer overflow is a result of misusing string manipulation functions in the standard C library. An example of a buffer overflow resulting from misuse of strcpy() is shown in Figure 1.
1 char dst[256];
2 char *s = read_string();
3 strcpy(dst, s);

Fig. 1. Classic buffer overflow

M. Gavrilova et al. (Eds.): ICCSA 2006, LNCS 3982, pp. 375–384, 2006. © Springer-Verlag Berlin Heidelberg 2006
The string s is read from the user on line 2 and can be arbitrarily long. The strcpy() function copies it into the dst buffer. If the length of the user string is 256 or more, strcpy() writes past the end of the dst array, because the terminating null byte is copied as well. If the array is located on the stack, a buffer overflow can be used to overwrite the return address of a function and execute code specified by the attacker [1]. However, the mechanics of exploiting software vulnerabilities are outside the scope of this work.

A number of static analysis techniques have been used to detect specific security vulnerabilities in software. Most of them are not suitable for large-scale software. In this paper, we propose a new mechanism that can detect security vulnerability patterns in large-scale source code by making use of compiler technologies.

This paper is organized as follows. Section 2 reviews related research. In Section 3, a new mechanism for checking security vulnerability patterns is introduced. The implementation of the proposed mechanism and an experiment with it are shown in Section 4. Finally, in Section 5, we draw conclusions.
2 Related Works

Previous static analysis techniques can be classified into two types: the lexical analysis based approach and the semantic based approach.

2.1 Lexical Analysis Based Approach

Lexical analysis turns the source code into a stream of tokens, discarding white space. The tokens are matched against known vulnerability patterns in a database. Well-known tools based on lexical analysis are Flawfinder [2], RATS [3], and ITS4 [4]. For example, these tools find the strcpy() vulnerability by scanning the security-critical source code in Figure 1. While these tools are certainly a step up from the UNIX utility grep, they produce a hefty number of false positives because they make no effort to account for the semantics of the target code. A stream of tokens is better than a stream of characters, but it is still a long way from understanding how a program will behave when it executes.

To overcome the weakness of the lexical analysis approach, a static analysis tool must leverage more compiler technology. By building an abstract syntax tree (AST) from source code, such a tool can take into account the basic semantics of the program being evaluated. Lexical analysis based tools can be confused by a variable that has the same name as a vulnerable function, but AST analysis accurately distinguishes the different kinds of identifiers. At the AST level, complicated expressions are analyzed, which can reveal vulnerabilities hidden from lexical analysis based tools.

2.2 Semantic Based Approach

BOON [5] applies integer range analysis to detect buffer overflows. The range analysis is flow-insensitive and generates a very large number of false alarms. CQUAL [6] is a type-based analysis tool that provides a mechanism for specifying and checking properties of C programs. It has been used to detect format string vulnerabilities. SPLINT [7] is a tool for checking C programs for security vulnerabilities and programming mistakes. It uses a lightweight data-flow analysis to verify assertions about
the program. Most of the checks performed by SPLINT require the programmer to add source code annotations that guide the analysis.

An abstract interpretation approach [8, 9] is used for verifying the Airbus software. The basic idea of abstract interpretation is to infer information about programs by interpreting them with abstract values rather than concrete ones, thus obtaining safe approximations of program behavior. Although this technique is a mature and sound mathematical approach, the static analysis tool that uses abstract interpretation cannot scale to larger code segments. According to an evaluation of static analysis tools [10], the target program has to be manually broken up into blocks of 20-40k lines of code before the tool can be used. We therefore need a static analysis approach that handles large and complex programs.

Microsoft set up the SLAM [11, 12, 13] project, which uses software model checking to verify temporal safety properties of programs. It validates a program against a well-designed interface using an iterative process. However, SLAM does not yet scale to very large programs because it relies on data-flow analysis. MOPS [14, 15] is a tool for verifying the conformance of C programs to rules that can be expressed as temporal safety properties. It represents these properties as finite state automata and uses a model checking approach to find out whether any insecure state is reachable in the program. MOPS, however, is not applicable to many security rules because it considers order constraints only. Many security vulnerabilities cannot be expressed as temporal safety properties. Therefore, a new static analysis technique that can express various security properties is needed.
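The limitation of the lexical approach described in Section 2.1 can be illustrated with a minimal sketch. It is not taken from Flawfinder, RATS, or ITS4; the rule list RISKY and the function count_risky_tokens are hypothetical names chosen for illustration, and the scanner deliberately ignores comments and string literals.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule list; real scanners ship much larger databases. */
static const char *const RISKY[] = { "strcpy", "strcat", "gets", "sprintf" };

/* Returns 1 if the identifier src[0..len) is on the rule list. */
static int is_risky(const char *src, size_t len)
{
    for (size_t i = 0; i < sizeof RISKY / sizeof RISKY[0]; i++)
        if (strlen(RISKY[i]) == len && strncmp(RISKY[i], src, len) == 0)
            return 1;
    return 0;
}

/* Counts identifier tokens in src that match the rule list.
   No syntax or type information is used, so a variable that shares
   its name with a risky function is flagged like a call to it. */
int count_risky_tokens(const char *src)
{
    int hits = 0;
    size_t i = 0;
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        if (isalpha(c) || c == '_') {
            size_t start = i;
            while (isalnum((unsigned char)src[i]) || src[i] == '_')
                i++;
            hits += is_risky(src + start, i - start);
        } else {
            i++;
        }
    }
    return hits;
}
```

Under these assumptions, the input "int strcpy = 0; x = strcpy + 1;" yields two hits although it contains no call at all, which is the kind of false positive an AST-based tool can avoid by distinguishing variables from function calls.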
3 Mechanism for Checking Security Vulnerability Pattern

In this section, we propose a mechanism for checking security vulnerability patterns in source code. The mechanism uses AST-based pattern matching and a PDA to find vulnerability patterns in the target source code.

3.1 Problem

Many security vulnerability patterns in source code can cause a system crash or an illegal acquisition of system privileges. We classify the source-level vulnerability patterns into the following three types.

Vulnerability pattern type 1

Type 1 is the simplest pattern and involves only one function, such as strcpy() or strcat(). Figure 1 is a typical example of vulnerability pattern type 1. This type is easily detected by lexical analysis tools, which match single tokens at the code level. In this case, such tools can detect the vulnerability by using strcpy() as a token.

Vulnerability pattern type 2

Type 2 is a pattern with an order constraint, expressed by two or more tokens such as function names. For example, Figure 2(a) shows an insecure program with pattern type 2. A setuid-root program should drop root privilege before executing an untrusted program. Otherwise, the untrusted program may execute with root privilege and compromise the system. Pattern type 2 can be detected by MOPS or SLAM, which use model checking with finite state automata (FSA) to check temporal safety properties. Figure 2(b) shows a simplified FSA describing the setuid-root vulnerability.
// The program has root privilege
if ((passwd = getpwuid(getuid())) != NULL) {
    fprintf(log, "drop priv for %s", passwd->pw_name);
    setuid(getuid()); // drop privilege
}
execl("/bin/sh", "/bin/sh", NULL); // risky syscall

(a) setuid-root program
[Figure 2(b): state diagram with the states unpriv, priv, and error and transitions labeled setuid() and execl()]

(b) An FSA describing setuid-root vulnerability

Fig. 2. Security vulnerability pattern type 2
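The order constraint of pattern type 2 can be sketched as a small state machine over the sequence of calls along one execution path. The transition directions are an assumption drawn from the description above (setuid() leads from priv to unpriv, execl() in state priv leads to error), and the names step and violates_setuid_rule are hypothetical; this is not the MOPS or SLAM implementation.

```c
#include <stddef.h>
#include <string.h>

enum state { PRIV, UNPRIV, ERROR };

/* One step of the assumed automaton: setuid() drops privilege,
   execl() while still privileged leads to the error state. */
static enum state step(enum state s, const char *call)
{
    if (s == PRIV && strcmp(call, "setuid") == 0)
        return UNPRIV;
    if (s == PRIV && strcmp(call, "execl") == 0)
        return ERROR;
    return s; /* all other calls leave the state unchanged */
}

/* Returns 1 if the call sequence of one path reaches ERROR. */
int violates_setuid_rule(const char *const calls[], size_t n)
{
    enum state s = PRIV;
    for (size_t i = 0; i < n && s != ERROR; i++)
        s = step(s, calls[i]);
    return s == ERROR;
}
```

Read this way, Figure 2(a) has two paths: the one through the if body calls setuid() before execl() and stays safe, while the one on which getpwuid() returns NULL skips setuid() and reaches the error state.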
Vulnerability pattern type 3

Type 3 is the most complex pattern and includes function names, operators, and variables. Figure 3 shows an example of a buffer overflow (BoF) vulnerability in Microsoft Windows published in 2003 [16, 17]. The program is typical of vulnerability pattern type 3.
1 // buffer copy while the condition is TRUE
2 while ( *id2 != '\\')
3     *id1++ = *id2++; // buffer copy

Fig. 3. Example of security vulnerability pattern type 3
The program in Figure 3 copies from a buffer id2 to another buffer id1 while the condition on line 2 is true. The copy does not consider the size of the target buffer. With ordinary input the program works correctly. However, a buffer overflow occurs when an attacker fills the buffer id2 with data in which no '\' character appears before the size of the target buffer id1 is exceeded. The BoF vulnerability in Figure 3 cannot be detected by previous model checking tools such as MOPS and SLAM, because these tools consider order constraints only. Real source code, however, may contain pattern type 3 vulnerabilities. Therefore, a new mechanism that can detect vulnerability pattern type 3 is needed.
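For contrast with Figure 3, the following is a minimal sketch of a copy loop whose condition does take the target buffer into account. The function name copy_until_separator, its parameters, and its return convention are assumptions made for illustration and do not come from the vulnerable programs.

```c
#include <stddef.h>

/* Copies src into dst until a '\\' separator or the end of src,
   never writing more than dst_size bytes including the terminator.
   Returns the number of bytes copied, or -1 if dst was too small
   to hold everything that precedes the separator. */
long copy_until_separator(char *dst, size_t dst_size, const char *src)
{
    size_t n = 0;
    if (dst_size == 0)
        return -1;
    /* The condition refers to the capacity of the target buffer,
       which the condition on line 2 of Figure 3 does not. */
    while (*src != '\\' && *src != '\0' && n + 1 < dst_size)
        dst[n++] = *src++;
    dst[n] = '\0';
    return (*src == '\\' || *src == '\0') ? (long)n : -1;
}
```

Input without a separator is truncated and reported with -1 instead of overrunning dst, so the loop no longer matches the pattern of a copy whose condition ignores the target buffer.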
We call the property that must be checked to detect vulnerability pattern type 3 an expressional safety property. An expressional safety property constrains a set of security-relevant operations that includes function names, operators, and variables.

3.2 Model of Security Vulnerability Pattern

We introduce a new model for expressional safety properties in order to detect pattern type 3, which includes various tokens such as functions, operators, and variables. The model is based on the parse tree (AST), which is a structural representation of the sentences (expressions) in the program source code.
[Figure 4: parse tree of the while statement in Figure 3, with root S, the condition subtree E1 and the buffer copy subtree E2]

Fig. 4. Parse tree describing the program in Figure 3
S (while)
    E1 (condition): no consideration about size of target buffer (id1)
    E2: buffer copy (id2 => id1)
        Pattern (1) *id1++ = *id2++;
        Pattern (2) id1[i++] = id2[j++]

Fig. 5. Simplified parse tree for original tree in Figure 4
Figure 4 shows the parse tree produced by parsing the program in Figure 3, which has the BoF vulnerability. S stands for a sentence and E stands for an expression. We can simplify the parse tree to an abstracted model, as in Figure 5. E1 expresses that the size of the target buffer id1 is not considered. No vulnerability exists when E1 takes the size of the target buffer id1 into account. Therefore, we simply check whether the target buffer id1 occurs in E1 or not.
E2 expresses that there is a copy statement from the source buffer id2 into the target buffer id1. We check whether pattern (1) or pattern (2) occurs in E2. The parse tree is generated by a context-free grammar (CFG), which is a type 2 grammar in the Chomsky hierarchy, and a context-free language is recognized by a PDA. That means we need to construct a PDA in order to recognize security vulnerability pattern type 3. Figure 6 shows a PDA that recognizes the parse tree in Figure 5.
PDA M = (Q, Σ, Γ, δ, q0, Z, F), where
Q = {q0, q1, q2, q3, q4, q5, q6, q7, q8},
Σ = {while, id1, id2, *, =, ++, [, ]},
Γ = {Z, id1, id2},
F = {q8}

δ(q0, while, Z) = {(q1, Z)},    δ(q1, id2, Z) = {(q2, id2 Z)}
δ(q2, *, id2) = {(q3, id2)},    δ(q3, id1, id1) = {(q0, ε)}
δ(q3, id1, id2) = {(q4, ε)},    δ(q4, ++, Z) = {(q5, Z)}
δ(q5, *, Z) = {(q6, Z)},        δ(q6, id2, Z) = {(q7, Z)}
δ(q7, ++, Z) = {(q8, ε)}

Fig. 6. PDA M being able to recognize simplified parse tree in Figure 5
A PDA is needed to recognize security vulnerability pattern type 3 because an FSA has no stack storage. The stack is used to store the variable id2 seen in E1 and to retrieve it in E2 in Figure 4. To check for the vulnerability, we need to know whether id1 occurs in E1 when E2 is processed. Therefore, a PDA with stack storage is needed. The program in Figure 3 is vulnerable because the variable id1 does not occur in the condition of the while loop in Figure 4. That means the PDA M, using its stack storage, recognizes the program in Figure 3. In short, a target program has a vulnerability that violates the expressional safety property when it reaches a final state (that is, an error state) of the PDA.

We model the expressional safety property as a PDA M. Then, we use AST-based pattern matching to determine whether a state violating the expressional safety property is reachable in the target program. We can check for various security properties by using the AST-based pattern matching technique at the parsing level. This approach determines at compile time whether the target program contains any security vulnerability pattern that may violate a security property. The proposed mechanism is therefore suitable for checking security properties of large-scale software.

Our model can express not only vulnerability pattern type 3 but also vulnerability pattern types 1 and 2, which can be expressed by an FSA. According to the Chomsky hierarchy, the set of context-free languages recognized by PDAs includes the set of regular languages recognized by FSAs.

3.3 Formal Expression

We present a formal mechanism for checking expressional safety properties. Let Σ be the set of security-relevant operations. Let B ⊆ Σ* be the set of sequences of security operations that violate the security property (B stands for bad). An expression e ∈ Σ*
will represent a sequence of operations expressed in the target program. Let E ⊆ Σ* denote the set of all feasible expressions, extracted from all statements of the program (E stands for expression). The problem is to decide whether E ∩ B is empty. If so, the security property is satisfied. If not, some expressions in the program may violate the security property.

In the above model, B and E are arbitrary languages. First, we showed that B, the set of sequences of security operations that violate the expressional safety property, is a context-free language. B is not a regular language because a pushdown automaton is needed to recognize the various vulnerability patterns that include functions, operators, and variables. In particular, temporary storage, such as a stack, is used to remember the variables. We showed that most expressional safety properties can be described by context-free languages (see Section 3.2).

Since B is a context-free language, there exists a PDA M that accepts B (M stands for model); in other words, B = L(M). We need to decide whether L(M) ∩ E is empty. Since E is the set of all feasible expressions in the target program, we have to check whether any expression in E is accepted by the PDA M. If M accepts a feasible expression of the target program, the target program contains a security vulnerability that violates the security property defined in advance. According to automata theory, there are efficient algorithms to determine whether a string is accepted by a PDA [18]. Hence we obtain a means to verify whether the security property is satisfied by the program.

Because B = L(M), we are guaranteed that E ∩ B = E ∩ L(M). Consequently, if E ∩ L(M) is empty, we can conclude that E ∩ B is also empty, hence the program definitely satisfies the security property; in contrast, if L(M) ∩ E is non-empty, then we can only say that E ∩ B is non-empty, hence the program may not satisfy the security property. This means that our analysis is sound and that the proposed mechanism produces no false negatives.
However, if an inappropriate expressional safety property is used, the proposed mechanism can produce false positives. It is therefore very important to use a well-defined expressional safety property.

3.4 Identification of Vulnerability

Figure 7 shows a concrete process of recognizing a security vulnerability with the proposed mechanism. The problem is to check whether the target program in Figure 3 violates the security property, that is, whether it performs a buffer copy without considering the size of the target buffer.

[Figure 7: state diagram of PDA M with states q0 (start) through q8; the edges carry the transitions listed in Figure 6]

Fig. 7. Process of identification for security property modeled by PDA M in Figure 5

In this problem, the set of security operations is Σ = {while, id1, id2, *, =, ++, [, ]}. The set B ⊆ Σ*, the sequences of security-relevant operations that violate the security property, is accepted by the PDA M shown in Figure 6. The set E ⊆ Σ*, the feasible expressions of the program in Figure 3, is E = {[while, id2, *, id1, ++, =, *, id2, ++], [….] …}. According to Figure 3, the sequence [while, id2, *, id1, ++, =, *, id2, ++] in E is accepted by PDA M. Therefore, we find that E ∩ L(M) ≠ ∅; in other words, we can recognize the existence of an expression in the target program that violates the security property. This indicates the presence of a security vulnerability.
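The acceptance of this sequence can be replayed with a minimal simulation of PDA M. The rule table is transcribed from the transition list in Figure 6; everything else (the type and function names, the array-based stack) is hypothetical. One point is an explicit assumption: the published transition list contains no move on the token =, so this sketch skips = in state q5 without changing the state or the stack.

```c
#include <stddef.h>

enum tok { T_WHILE, T_ID1, T_ID2, T_STAR, T_ASSIGN, T_INC, T_LBRACKET, T_RBRACKET };
enum sym { S_Z, S_ID1, S_ID2 };

struct rule {
    int from; enum tok in; enum sym top;  /* delta(from, in, top)        */
    int to; int npush; enum sym push[2];  /* = (to, push[0] push[1])     */
};

/* Transition list of Figure 6; push[0] ends up on top of the stack,
   and npush == 0 encodes epsilon (the top symbol is only popped). */
static const struct rule RULES[] = {
    {0, T_WHILE, S_Z,   1, 1, {S_Z}},
    {1, T_ID2,   S_Z,   2, 2, {S_ID2, S_Z}},
    {2, T_STAR,  S_ID2, 3, 1, {S_ID2}},
    {3, T_ID1,   S_ID1, 0, 0, {S_Z}},
    {3, T_ID1,   S_ID2, 4, 0, {S_Z}},
    {4, T_INC,   S_Z,   5, 1, {S_Z}},
    {5, T_STAR,  S_Z,   6, 1, {S_Z}},
    {6, T_ID2,   S_Z,   7, 1, {S_Z}},
    {7, T_INC,   S_Z,   8, 0, {S_Z}},
};

#define STACK_MAX 16

/* Returns 1 if the token sequence ends in the final state q8. */
int pda_accepts(const enum tok *input, size_t n)
{
    enum sym stack[STACK_MAX];
    size_t depth = 0;
    int state = 0;
    stack[depth++] = S_Z;                 /* initial stack symbol Z */

    for (size_t i = 0; i < n; i++) {
        /* ASSUMPTION: '=' is consumed in q5 without any effect. */
        if (state == 5 && input[i] == T_ASSIGN)
            continue;
        if (depth == 0)
            return 0;                     /* empty stack: no move */
        const struct rule *hit = NULL;
        for (size_t r = 0; r < sizeof RULES / sizeof RULES[0]; r++)
            if (RULES[r].from == state && RULES[r].in == input[i] &&
                RULES[r].top == stack[depth - 1]) {
                hit = &RULES[r];
                break;
            }
        if (hit == NULL)
            return 0;                     /* no matching rule: reject */
        depth--;                          /* pop the matched top symbol */
        if (depth + (size_t)hit->npush > STACK_MAX)
            return 0;
        for (int k = hit->npush - 1; k >= 0; k--)
            stack[depth++] = hit->push[k];
        state = hit->to;
    }
    return state == 8;                    /* F = {q8} */
}
```

With the sequence [while, id2, *, id1, ++, =, *, id2, ++] the simulation passes through q1 to q7 and stops in q8, while a prefix of that sequence stops in a non-final state and is rejected.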
4 Implementation and Experiment

In this section, we present the implementation of the proposed mechanism and the results of an experiment with it.

4.1 Implementation

The implemented tool consists of a parser and a model checker. The parser takes C source code as the target program and outputs its AST. The model checker takes the AST and a PDA describing an expressional safety property, and decides whether any expression in the target program may violate the security property. If so, the tool reports these expressions. Figure 8 shows a brief architectural overview of the implemented tool, which checks source code for security vulnerabilities. Our tool is implemented as a module in CIL [19], a front end for C written in OCaml [20]. CIL parses C code into an AST format and provides a framework for performing passes over this AST.
[Figure 8: the parser (CIL library) turns the target source code into a parse tree (AST); the model checker (OCaml language) takes the parse tree and the security vulnerability model and outputs the vulnerable code results]

Fig. 8. Architecture of implemented tool for checking expressional safety property
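The kind of check the model checker performs on an AST can be sketched on a deliberately tiny tree. The sketch below is written in C for consistency with the listings in this paper and is not the CIL/OCaml module itself; the node layout and all names (struct node, mentions, is_unbounded_copy_loop) are hypothetical. It covers only pattern (1) of Figure 5 and flags a while node whose body is *dst++ = *src++ and whose condition never mentions dst.

```c
#include <stddef.h>
#include <string.h>

enum kind { N_WHILE, N_ASSIGN, N_BINOP, N_DEREF, N_POSTINC, N_VAR, N_CONST };

struct node {
    enum kind kind;
    const char *name;        /* variable name for N_VAR, otherwise NULL */
    const struct node *lhs;  /* condition of N_WHILE, or left operand   */
    const struct node *rhs;  /* body of N_WHILE, or right operand       */
};

/* Returns 1 if the variable `name` occurs anywhere in the subtree. */
static int mentions(const struct node *n, const char *name)
{
    if (n == NULL)
        return 0;
    if (n->kind == N_VAR && strcmp(n->name, name) == 0)
        return 1;
    return mentions(n->lhs, name) || mentions(n->rhs, name);
}

/* If n has the shape *p++ for a variable p, returns the name of p. */
static const char *deref_postinc_var(const struct node *n)
{
    if (n && n->kind == N_DEREF && n->lhs && n->lhs->kind == N_POSTINC &&
        n->lhs->lhs && n->lhs->lhs->kind == N_VAR)
        return n->lhs->lhs->name;
    return NULL;
}

/* Flags `while (E1) *dst++ = *src++;` when E1 never mentions dst. */
int is_unbounded_copy_loop(const struct node *w)
{
    if (w == NULL || w->kind != N_WHILE || w->rhs == NULL ||
        w->rhs->kind != N_ASSIGN)
        return 0;
    const char *dst = deref_postinc_var(w->rhs->lhs);
    const char *src = deref_postinc_var(w->rhs->rhs);
    if (dst == NULL || src == NULL)
        return 0;
    return !mentions(w->lhs, dst);
}
```

A tree built for the loop in Figure 3 is flagged, whereas the same loop with an additional comparison on id1 in its condition is not. Whether such a comparison really bounds the copy is beyond what this syntactic check can decide.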
4.2 Result of the Experiment

We ran an experiment on two source programs that contain the security vulnerability shown in Figure 3. The target programs are sample source code containing the critical security vulnerabilities in WebDav [16] (Figure 9) and the RPCSS service [17] (Figure 10), respectively. The second vulnerability, in the RPCSS service, is well known because of the Blaster worm. We model the security vulnerability in the two target programs as the PDA M shown in Figure 6. The code set in bold italic type in Figures 9 and 10 has the vulnerability pattern that is modeled by PDA M. The implemented tool succeeds in detecting the security vulnerability in both target programs.
long PathName(long nBufferLength, char *Buffer2){
    while ( *Buffer2 && !IS_HAVE_SEPERATOR(*Buffer2) ) {
        if ( *Buffer2 == ';' ) {
            Buffer2++;
            *Buffer1++ = *Buffer2++;
        }
        else
            *Buffer1++ = *Buffer2++;
    }
}

Fig. 9. First target source codes having security vulnerability in WebDav

int GetComputerName( char *InputBuffer, char MachineName[MAX_LENGTH] ){
    char *Buffer2 = InputBuffer + 2;
    while ( *Buffer2 != ';' )
        *Buffer1++ = *Buffer2++;
}

Fig. 10. Second target source codes having security vulnerability in RPCSS (Blaster Worm)
5 Conclusion

Static analysis is a proven technology in the implementation of compilers and interpreters. Recently, static analysis techniques have begun to be applied in novel areas such as software validation and bug checking. We conclude by summarizing the main contributions of our work:

• We classify security vulnerability patterns into three types. Type 1 is a simple pattern consisting of a single suspicious token. Type 2 consists of two or more tokens and captures temporal safety properties. Type 3 is the most complex pattern and captures various expressional safety properties.
• A model expressing various security vulnerability patterns is provided. The model is used to express the expressions related to a security property when the proposed mechanism checks a target program for security vulnerabilities. All of the vulnerability pattern types can be expressed by our model. No previous work has attempted to model these vulnerability types.
• A mechanism based on formal language theory is proposed. The mechanism provides an infrastructure for checking a target program against security properties that are defined by users in advance.
• We implemented the proposed mechanism as a tool for checking expressional safety properties. As mentioned in Section 4.2, the tool detected the security vulnerability in both target programs.

The proposed mechanism has several advantages:

• It is sound because it can reliably detect all bugs of the specified properties.
• It can check for various security vulnerability patterns by making use of a PDA.
• It is scalable because it uses an AST-based pattern matching technique at the parsing level.
References

1. Aleph One: Smashing the stack for fun and profit. Phrack 49-14 (1996)
2. D. A. Wheeler: Flawfinder. http://www.dwheeler.com/flawfinder/.
3. RATS. http://www.securesw.com/rats/.
4. J. Viega, J. T. Bloch, T. Kohno and G. McGraw: ITS4: A static vulnerability scanner for C and C++ code. ACM Transactions on Information and System Security 5(2) (2002).
5. D. Wagner, J. S. Foster, E. A. Brewer and A. Aiken: A first step towards automated detection of buffer overrun vulnerabilities. In Network and distributed system security symposium, 3–17. San Diego, CA (2000)
6. J. Foster: Type qualifiers: Lightweight specifications to improve software quality. Ph.D. thesis, University of California, Berkeley (2002)
7. D. Evans: SPLINT. http://www.splint.org/.
8. B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Mine, D. Monniaux and X. Rival: A Static Analyzer for Large Safety-Critical Software (2003)
9. Abstract interpretation. http://www.polyspace.com/downloads.htm (2001)
10. M. Zitser, R. Lippmann, T. Leek: Testing Static Analysis Tools using Exploitable Buffer Overflows from Open Source Code, pp. 97-106, SIGSOFT'04 (2004)
11. T. Ball, R. Majumdar, T. Millstein, and S. Rajamani: Automatic predicate abstraction of C programs. PLDI. ACM SIGPLAN Not. 36(5) (2001), 203–213.
12. T. Ball, A. Podelski, and S. Rajamani: Relative completeness of abstraction refinement for software model checking. TACAS (2002), LNCS 2280, Springer, 158–172.
13. T. Ball and S. Rajamani: The SLAM project: debugging system software via static analysis. 29th ACM POPL (2002), LNCS 1254, Springer, 72–83.
14. H. Chen and D. Wagner: MOPS: an infrastructure for examining security properties of software. In Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS), Washington, DC (2002)
15. H. Chen, D. Wagner, and D. Dean: Setuid demystified. In Proceedings of the Eleventh Usenix Security Symposium, San Francisco, CA (2002)
16. Microsoft Security Bulletin MS03-007. http://www.microsoft.com/technet/security/bulletin/MS03-007.mspx. Microsoft (2003)
17. Microsoft Security Bulletin MS03-026. http://www.microsoft.com/technet/security/bulletin/MS03-026.mspx. Microsoft (2003)
18. J. Hopcroft and J. Ullman: Introduction to automata theory, languages, and computation. Addison-Wesley (1979)
19. G. C. Necula, S. McPeak, S. P. Rahul and W. Weimer: CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. In Proceedings of CC 2002: 11th International Conference on Compiler Construction. Springer-Verlag, Apr. 2002.
20. D. Rémy and J. Vouillon: Objective ML: An effective object-oriented extension of ML. Theory and Practice of Object Systems, 4(1):27–52, 1998.