Mitigating Program Security Vulnerabilities: Approaches and Challenges
H. Shahriar and M. Zulkernine
School of Computing, Queen's University, Kingston, Canada
http://cs.queensu.ca/~shahriar

Background
Programs are accessible to both legitimate users and hackers
Ideally, a program should conform to expected behaviors: provide desired outputs based on supplied inputs
In practice, a program's behavior can be unexpected: the program crashes or leaks sensitive information
For ACM SAC 2011 Attendees Only
Unexpected program behavior
A very long URL in the address bar of Microsoft Internet Explorer 6.0 (Source: CVE-2006-3869)
Cause: Internet Explorer fails due to a buffer overflow
Impact: with a specially crafted request, an attacker can execute arbitrary code
Source: Common Vulnerabilities and Exposures (CVE), http://cve.mitre.org

Information deletion
"The existence of an SQL injection attack on the RIAA's site came to light via social network news site … hackers turned the site into a blank slate, among other things."
"The RIAA has restored RIAA.org, although whether it's any more secure than before remains open to question" - January 2008
Source: http://www.theregister.co.uk/2008/01/21/riaa_hacktivism/
More incidents
Common Vulnerabilities and Exposures (CVE), http://cve.mitre.org
Open Source Vulnerability Database (OSVDB), http://osvdb.org
Web Application Security Consortium (WASC), http://www.webappsec.org/projects/whid/

Information modification
"The Web sites of Australia's two major political parties contain cross-site scripting (XSS) flaws" - October 2007
Source: http://www.builderau.com.au/news/soa/XSS-flaw-makes-PM-say-I-want-to-suck-your-blood-/0,339028227,339282682,00.htm

Attack method, outcome, location, …
What is wrong in these programs?
These programs are developed and tested by the brightest engineers in the industry
Some interesting aspects might be missing

Tutorial objectives
Provide an overview of common program security vulnerability exploitations (Part I)
Introduce mitigation techniques: program security vulnerability mitigations in the pre- and post-deployment phases
Ensure that programs cannot be exploited with malicious inputs
Program security testing (Part II), static analysis (Part III), monitoring (Part IV), and hybrid analysis (Part V)
Part I: Program security vulnerability
Vulnerabilities are flaws in programs that allow attackers to expose or alter sensitive information (Dowd et al. 2007)
Exploitations of vulnerabilities are attacks
Common examples of vulnerabilities: Buffer Overflow (BOF), Format String Bug (FSB), SQL Injection (SQLI), Cross-Site Scripting (XSS)

Buffer overflow (BOF)
Specific flaws in implementations that allow attackers to overflow data buffers
Consequences of BOF attacks: program state corruption; program crash due to overwriting of sensitive neighboring variables such as return addresses
Buffer overflow example
C code snippet of the foo function (adapted from [2]):

    void foo (int a) {
        int var1;
        char buf[8];
        strcpy(buf, "AAAAAAAAAAAAAA");
        …
        return;
    }

(Stack layout figure: buf[0] … buf[7], followed by sensitive neighboring variables such as var1 and the return address, which the overly long string overwrites.)

Format string bug (FSB)
Invocation of format functions with non-validated, user-supplied inputs that contain format strings [3]
E.g., format functions of the ANSI C standard library
Consequences: arbitrary reading from and writing to format function stacks
BOF and FSB are special classes of bugs that may cause security breaches
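As a hedged illustration (my own minimal sketch, not from the slides), the classic FSB pattern is a format function whose format argument comes directly from an untrusted source; the function echo_unsafe below is assumed, not part of any cited tool:

    #include <stdio.h>

    /* FSB: user_input is interpreted as a format string, so input such as
     * "%x %x %n" can read from or write to the format function's stack. */
    void echo_unsafe(const char *user_input) {
        printf(user_input);
    }

    /* Safe variant: a constant format string; the input is treated purely as data. */
    void echo_safe(const char *user_input) {
        printf("%s", user_input);
    }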
How are these bugs introduced into programs?
Implementation language features
Library APIs (ANSI C library)
Inputs (network protocol data units)

Part II: Program security testing
Test cases are run against implementations
The presence of vulnerabilities is confirmed by comparing program states or outputs with known malicious states or outputs under attacks
Program states and response messages are used to identify the presence or absence of attacks

Test coverage
Does a test suite reveal specific vulnerabilities?
Example coverage for program security testing: a test suite should reveal all BOF and SQLI vulnerabilities
Classification of security testing approaches*
Test case generation method
Source of test case
Test case input type
Classification feature: Test case generation method
Example techniques: fault injection, attack signature, mutation analysis, search-based
We will discuss these techniques in more detail later
*Shahriar and Zulkernine 2011, ACM CSUR
Classification feature: Source of test case
Artifacts that are used for generating test cases:
Program artifacts: vulnerable APIs (ANSI C library functions), runtime states (registers or variable values)
Vulnerabilities
Inputs: data received by programs, e.g., valid files, protocol data units (PDUs)
Environments: file systems, networks, and processors

Classification feature: Test case input type
Varies based on data types and vulnerabilities:
Exploiting BOF involves generating strings of particular lengths
Exposing FSB requires strings containing format specifiers
A stored XSS requires two URLs (one storing a malicious script and one downloading the script)
Test case generation: Fault injection
One of the most widely used test case generation techniques
Suitable for black box-based testing; widely used to discover BOF and FSB vulnerabilities
The objective: corrupt input data and variables, supply them to executable programs, and observe unexpected responses to confirm vulnerabilities
Fault injection can be divided into three types based on the target of corruption: input, environment, and program state
Test case generation: Fault injection (cont.)
Fault injection in input data
Malform a valid data structure into an invalid one
E.g., lexical structure modification (replace a non-printable character with a printable character)
E.g., semantic structure modification (change the date format "mmddyyyy" to "yyyyddmm")
Fault injection in environment
Modify the attributes of program inputs at different execution stages
E.g., during program execution, delete a file or toggle file permissions
Fault injection in program state
Test whether program code can handle malformed states
E.g., corrupt data variables that control program execution flow
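A minimal sketch (my own, not from the slides) of lexical input fault injection: a hypothetical helper in a test harness flips one byte of a valid input before the input is supplied to the program under test:

    #include <string.h>

    /* Hypothetical helper: corrupt one byte of a valid input to create a
     * lexically malformed test case (e.g., turn a delimiter into a letter). */
    void inject_lexical_fault(char *input, size_t pos, char replacement) {
        if (pos < strlen(input))
            input[pos] = replacement;  /* the corrupted input is then fed to the program */
    }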
Test case generation: Attack signature
Widely used in security testing of web-based programs
Replace some parts of normal inputs with attack signatures; signatures are developed from experience and vulnerability reports
E.g., http://www.xyz.com/login.php?name=john&pwd=secret
SQLI: http://www.xyz.com/login.php?name=' or 1=1&pwd=
XSS: http://www.xyz.com/login.php?name=">...&pwd=

Test case generation: Attack signature (cont.)
Two ways to traverse web pages
Request and response-based method
Requests web pages, fills input fields with attack inputs, and submits them to websites
Vulnerability presence is checked by confirming known response messages under attacks
Widely used to discover SQLI and XSS vulnerabilities
The request and response-based method fails to reach the inner logic of programs where vulnerabilities might be present
Use case-based method
Relies on interactive user inputs to perform functionalities; can alleviate the above limitation
Generates a collection of user inputs (or URLs); the inputs are replayed and malicious inputs are injected at specific points of the URLs
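A toy C sketch (mine, not any particular tool's code) of the attack-signature idea: substitute a normal parameter value with a signature before submitting the request and comparing the response:

    #include <stdio.h>

    int main(void) {
        const char *base = "http://www.xyz.com/login.php";
        const char *sqli_signature = "' or 1=1";   /* assumed attack signature */
        char url[256];
        /* Replace the normal "name" value with the signature. */
        snprintf(url, sizeof url, "%s?name=%s&pwd=", base, sqli_signature);
        printf("%s\n", url);   /* the request is then sent and the response checked */
        return 0;
    }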
Test case generation: Mutation analysis
Testing of test cases: assessment of test suite adequacy to reveal faults
Modifies the source code of programs to create mutants
A mutant is killed if at least one test case generates different outputs between the implementation and the mutant
If no test case can kill a mutant, the mutant is said to be equivalent to the original implementation
Compute the adequacy with the mutation score (MS):
MS = (total # of mutants killed) / (total # of non-equivalent mutants)
Test case generation: Mutation analysis (cont.)
Example of mutation analysis: relational operator replacement (ROR)

Original program:
    int foo (int a, int b) {
        if (a > b)
            return (a - b);
        else
            return (a + b);
    }

Mutant:
    int foo (int a, int b) {
        if (a >= b)   // ΔROR
            return (a - b);
        else
            return (a + b);
    }

Before killing the mutant: T = {(3, 2)}
After killing the mutant: T = {(3, 2), (4, 4)}

Mutation-based security testing
Define mutation operators to inject vulnerabilities
Identify mutant killing criteria to generate test cases
The mutation operators are related to attack type coverage
The killing criteria specify how to compare outputs between a program and a mutant:
Strong: the end outputs of the program and the mutant differ
Weak: the intermediate program states of the program and the mutant differ
Firm: states differ at any point between the mutated code and the end of the program
Test case generation: Mutation analysis (cont.)

Category                         Operator   Brief description
Mutating library function calls  S2UCP      Replace strncpy with strcpy.
                                 S2UCT      Replace strncat with strcat.
                                 S2UGT      Replace fgets with gets.
                                 S2USN      Replace snprintf with sprintf.
                                 S2UVS      Replace vsnprintf with vsprintf.
Mutating buffer size arguments   RSSBO      Replace buffer size with destination buffer size plus one.
Mutating format strings          RFSNS      Replace "%ns" format string with "%s".
                                 RFSBO      Replace "%ns" format string with "%ms", where m = size of the destination buffer plus one.
                                 RFSBD      Replace "%ns" format string with "%ms", where m = size of the destination buffer plus Δ.
                                 RFSIFS     Replace "%s" format string with "%ns", where n is the size of the destination buffer.
Mutating buffer variable sizes   MBSBO      Increase buffer size by one byte.
Removing statements              RMNLS      Remove null character assignment statement.
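For illustration (my own sketch, not taken from the slides), this is what an S2UCP mutant looks like on a small C function; the tester then searches for an input that makes the mutant behave differently from the original (e.g., crash), thereby killing it:

    #include <string.h>

    /* Original: bounded copy into an 8-byte buffer. */
    void copy_orig(const char *src) {
        char buf[8];
        strncpy(buf, src, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
    }

    /* S2UCP mutant: strncpy replaced with strcpy, so any src longer than
     * 7 characters overflows buf; a long input can kill this mutant. */
    void copy_mutant(const char *src) {
        char buf[8];
        strcpy(buf, (char *)src);
    }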
Test case generation: Mutation analysis (cont.)
Each of the above operators is assigned a killing criterion: C1, C2, or C1 or C2
E.g., the library function call operators (S2UCP, S2UCT, S2UGT, S2USN, S2UVS) are killed by C1 or C2
C1: ESP ≠ ESM (strong)
C2: Len(BufP) ≤ N && Len(BufM) > N (weak)
P: the original implementation; M: the mutant version; ESP: the exit status of P; ESM: the exit status of M; Len(BufP): the length of Buf in P; Len(BufM): the length of Buf in M; N: the buffer size
Test case generation: Mutation analysis (cont.)
An example of mutation analysis for a BOF vulnerability:

String length (src): [32]
Original program (P): char dest[32]; ….. sscanf(src, "%32s", dest);
Mutated program (M): char dest[32]; ….. sscanf(src, "%40s", dest); //ΔRFSBD
Output (P): No crash; Output (M): No crash; Status: Live

String length (src): [256]
Original program (P): as above; Mutated program (M): as above
Output (P): No crash; Output (M): Crash; Status: Killed
Killing criteria: C1 or C2
Test case generation: Mutation analysis (cont.)
Relationship between BOF attack type and mutation operator:

Attack                          Operator
Overwriting return addresses    All except RSSBO and RFSBO
Overwriting stack and heap      All the proposed operators except RMNLS
One-byte overflow               RSSBO and RFSBO
Arbitrary reading of the stack  RMNLS
Test case generation: Search-based technique
Program test input space might be huge
Finding good test cases becomes time consuming; e.g., discovering BOF requires generating large strings
Search algorithms can be applied to generate test cases
A random input is chosen as the initial solution; the solution is evolved until an objective function (fitness function) value remains unchanged
Fitness functions are designed so that good test cases obtain better values than bad test cases

Test case generation: Search-based technique (cont.)
Fitness function: assesses generated test cases based on a numeric score and guides the choice of test cases to reveal vulnerabilities
Mutation operator: modification of some random elements in the population to allow the solution to evolve
Genetic algorithm (illustrated in the worked example below):
1. Generate the initial population
2. Assess the fitness of each member of the population
3. Select a subset of the population to perform crossover and mutation
4. Repeat steps 2 and 3 until the maximum number of iterations is reached or the fitness of the population remains unchanged
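A compact C sketch (mine, under the assumptions of the worked example that follows: the chromosome encodes len as a 6-bit integer and the fitness is F = max(32 − len, 0), with lower values closer to overflowing a 32-byte buffer):

    #include <stdio.h>

    /* Fitness from the example below: distance of len from the 32-byte boundary. */
    static int fitness(int len) {
        return (32 - len > 0) ? (32 - len) : 0;
    }

    int main(void) {
        int population[] = {5, 10, 25};          /* initial population of len values */
        for (int i = 0; i < 3; i++)
            printf("F(%d) = %d\n", population[i], fitness(population[i]));
        /* Selection, one-point crossover, and bit-flip mutation would follow,
         * as in the genetic algorithm steps listed above. */
        return 0;
    }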
Test case generation: Search-based technique (cont.)
Example code:

    char * foo (int len, char *str) {
        char buf[32];
        if (len > 0) {
            strncpy(buf, str, len);   // no checking of buf length
            return buf;
        } else {
            return NULL;
        }
    }

Fitness function: F = max(32 − len, 0)
Let us assume that a member is represented with 6 bits (max. number is 63)
Initial population (len): {5, 10, 25}
F(5) = 27, F(10) = 22, F(25) = 7
Choose chromosomes with lower fitness values: {25, 10, 5}
Perform one-point crossover on 10 and 25: 10 = [001010], 25 = [011001]
Randomly replace 0 with 1 in the first gene: M1: 001001 -> 101001 (41), M2: 010011 (19)
New population: {41, 25, 19, 10, 5}; (len = 41) is a test case that reveals BOF

Part II: Summary
Security testing is analogous to software testing, except for some changes in defining requirements, coverage, and the oracle
Fault injection is the most widely used technique, but it provides no information related to vulnerability and code coverage
Attack signature-based techniques are mainly applied to web-based programs
References
[1] H. Shahriar and M. Zulkernine, "Mitigating Program Security Vulnerabilities: Approaches and Challenges," ACM Computing Surveys (to appear).
[2] H. Shahriar and M. Zulkernine, "Assessing Test Suites for Buffer Overflow Vulnerabilities," International Journal of Software Engineering and Knowledge Engineering (IJSEKE), World Scientific, Vol. 20, Issue 1, February 2010, pp. 73-101.
[3] A. Mathur, Foundations of Software Testing, First edition, Pearson Education, 2008.
[4] L. Morell, "A Theory of Fault-based Testing," IEEE Transactions on Software Engineering, Volume 16, Issue 8, 1990, pp. 844-857.
[5] B. Potter and G. McGraw, "Software Security Testing," IEEE Security & Privacy, 2004, pp. 32-34.
[6] I. Sommerville, Software Engineering, Addison Wesley, 2001.

Part III: Static analysis
Classification of static analysis-based approaches
Inference techniques
Static analysis: Overview
Identify potential vulnerabilities by analyzing source code
Static analysis techniques do not execute program code
Ignore related issues such as environment variables and the test cases needed for reaching specific program points

Classification of static analysis approaches*
Inference type
Analysis sensitivity
Completeness
Soundness
*Shahriar and Zulkernine 2011, ACM CSUR
Classification feature: Inference type
Four common types:
Tainted data flow-based
Constraint-based
Annotation-based
String pattern matching-based

Classification feature: Analysis sensitivity
A static analysis approach might
Generate false positive warnings that must be examined manually
Suffer from false negatives (i.e., vulnerabilities are unreported)
Sensitive approaches use pre-computed information to reduce both false positive and false negative rates
E.g., control flow, context, point-to, value range
Classification feature: Analysis sensitivity (cont.)
Control flow: an analysis applies the inference technique based on the statement execution sequence, derived from the control flow graph (CFG) of a program unit
A node represents data defined and used in a basic block; an edge represents control flow among blocks
Gen[n]: block n assigns a tainted value to a variable
Kill[n]: block n sanitizes or checks a tainted variable
IN[n] = ∪ OUT[p], where p is a predecessor node of n
OUT[n] = GEN[n] ∪ (IN[n] \ KILL[n])

Example code (tainted variables str, size, and dest tracked per block):

    0. void foo (char *str) {
    1.     int size;
    2.     char dest[8];
    3.     if (str != NULL) {       // b1
    4.         size = 16;
    5.         memcpy(dest, src, size);
    6.         printf(dest);
    7.     } else {                 // b2
               …
           }
       }

(Slide table: per-block In, Gen, Kill, and Out sets over str, size, and dest for blocks b0 (lines 0–2) and b1 (lines 3–5).)
(Value-range figure residue: size, with bounds −∞, 16, +∞.)

Classification feature: Analysis sensitivity (cont.)
Context: multiple call sites of a function with arguments are analyzed separately

Example code:

    void foo (int len, char *str) {
        char buf[32];
        if (len > 0) {
            strcpy(buf, str);        // call site 1
        } else {
            strcpy(buf, "hello");    // call site 2
        }
    }

Context insensitive: call site 1 (BOF), call site 2 (BOF)
Context sensitive: call site 1 (BOF), call site 2 (No BOF)

Classification feature: Analysis sensitivity (cont.)
Point-to: compute the set of memory objects that a pointer data type might access in program code
Flow sensitive: a program's control flow is taken into account to compute the point-to graph
Flow insensitive (Andersen 1994): statements can be analyzed in any order while computing the point-to graph
(Point-to example residue: char x[16], *p, z[16]; p = &x; …)
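A rough C sketch (my own illustration, with an assumed Gen/Kill assignment for blocks b0 and b1) of how the Gen/Kill propagation above can be computed when each tracked variable's taint is one bit of a mask:

    #include <stdio.h>

    /* One bit per tracked variable: str, size, dest. */
    enum { STR = 1, SIZE = 2, DEST = 4 };

    /* OUT[n] = GEN[n] | (IN[n] & ~KILL[n]) */
    static unsigned out_set(unsigned in, unsigned gen, unsigned kill) {
        return gen | (in & ~kill);
    }

    int main(void) {
        /* b0 (assumed): str becomes tainted as a parameter; size and dest are
         * freshly defined and therefore killed. */
        unsigned out_b0 = out_set(0, STR, SIZE | DEST);
        /* b1 (assumed): dest derives its value from tainted data via memcpy. */
        unsigned out_b1 = out_set(out_b0, DEST, 0);
        printf("dest tainted after b1: %s\n", (out_b1 & DEST) ? "yes" : "no");
        return 0;
    }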
Classification feature: Soundness
An approach is sound if it generates no false negative warnings
Most approaches are not sound
Unsupported language features, such as specific data types
E.g., no BOF detection for buffers that are members of union data structures (Wagner et al. 2000)
Lack of analysis sensitivity introduces false negatives
E.g., lack of point-to analysis (Weber et al. 2001)

Classification feature: Soundness (cont.)
No warning is provided if an analysis is not certain that a vulnerability can be exploited (result interpretation)
Reduces false warnings, but sacrifices soundness
Underlying assumptions limit the scope of vulnerabilities
E.g., Vulncheck assumes that most BOF vulnerabilities are present in library function calls accepting unsanitized arguments
Limitation of analysis sensitivity (e.g., flow-insensitive point-to analysis): some BOF vulnerabilities remain undetected
Classification feature: Completeness
An approach is said to be complete if it generates no false positive warnings
Analysis sensitivity: inference approaches based on insensitive analysis result in false positive warnings
E.g., all unsafe library function calls can be warned as BOF vulnerabilities; however, unsafe function calls present in infeasible paths cannot be exploited
Assumption on program code: approaches assume that programmers write specific patterns of code; excluded code patterns result in false positive warnings
E.g., inference based on known loop structures (Evans et al. 2002), such as for (i = 0; i < n; i++) buf[i] = x; an equivalent loop written as for (;;) { buf[i] = x; if (..) break; } is missed

Inference techniques
Tainted data flow-based
Constraint-based
Annotation-based
String pattern matching-based
Inference technique: Tainted data flow
Input variables are marked as tainted and their propagations are tracked
Warnings are generated if values derived from tainted inputs are used in sensitive operations
Check if a labeled source, or any value derived from a labeled source, participates in sensitive operations:
Writing to buffers (BOF)
Writing to files/consoles (FSB)
Generating query statements (SQLI)
Writing HTML contents (XSS)
Some common techniques: static data type, implicit tainting

Inference technique: Tainted data flow (cont.)
Static data type is extended to include tainted or untainted information (Shankar et al. 2001)
int main (int argc, char *argv[]) becomes int main (int argc, tainted char *argv[])
Compiler type-checking techniques can be leveraged to identify expected type deviations for warning generation
Type checking is guided by a qualifier lattice that represents subtyping relationships among variables
A subtyping relationship is analogous to the relationship between a class and a subclass
If Q is a subclass of P (Q < P), then an object of class Q can be used whenever an object of class P is expected, but not vice versa
Example subtyping for FSB: untainted char < tainted char
A variable labeled as "untainted char" results in a warning if it is assigned a "tainted char" variable

Inference technique: Tainted data flow (cont.)
Implicit tainting
Suitable for languages that require no static type declarations of variables (e.g., PHP)
Program variables are not explicitly labeled as tainted or untainted
Tainting is guided by the nature of vulnerabilities
E.g., tainted data derived from HTTP request parameters are vulnerable to XSS (Jovanovic et al. 2006)
Tainted information flow is performed over a data flow graph
Identify locations where tainted data reach sensitive sinks; if a variable is sanitized, it is no longer considered tainted
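A hedged illustration (mine) of qualifier-style tainting in the spirit of Shankar et al. 2001; here the qualifier keywords are just empty macros for readability, whereas a real checker treats them as type annotations:

    /* Illustrative qualifiers only; a real tool recognizes these as annotations. */
    #define tainted
    #define untainted

    int printf(untainted const char *fmt, ...);   /* sink demands an untainted format */

    int main(int argc, tainted char *argv[]) {
        if (argc > 1) {
            printf(argv[1]);        /* tainted char where untainted char is expected: warning */
            printf("%s", argv[1]);  /* constant format string: no warning */
        }
        return 0;
    }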
Inference technique: Constraint-based
Generate safety constraints from program code
E.g., for BOF, len(s) > alloc(s), for all string variables s
Constraints are generated and propagated while traversing a program
If a solution can be found for a set of constraints, a warning is raised; constraint solvers compute input values that violate the constraints

Inference technique: Constraint-based (cont.)
Integer range analysis (Wagner et al. 2000)
A constraint is expressed by a pair of integer ranges, e.g., the allocation size and the current size of an allocated buffer
gets(s): the length of s can be in the range [1, ∞]
char s[10]: the allocation size of s is [10, 10]
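A small illustration (my own annotations, following the gets/char s[10] example above) of the ranges such an analysis would associate with a buffer:

    #include <stdio.h>

    int main(void) {
        char s[10];                        /* allocation size of s: [10, 10] */
        /* gets(s); */                     /* length range [1, +inf] overlaps the
                                              allocation range -> BOF warning */
        if (fgets(s, sizeof s, stdin))     /* length range [1, 10], bounded by alloc */
            printf("%s", s);
        return 0;
    }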
Inference technique: Constraint-based (cont.)
The obtained constraints are solved for each buffer declared and accessed in a program
We obtain a range [a, b] for the allocation of a buffer: the allocation size can vary from a bytes to b bytes
We obtain a range [c, d] for the current size of a buffer: the current size can vary from c bytes to d bytes
Case 1: b > c, a non-vulnerable buffer is present; Case 2: …

Inference technique: Annotation-based
Programmers annotate source code with safety conditions that the analyzer verifies
E.g., a copy from s2 into s1 can be annotated /*@ requires maxSet(s1) >= maxRead(s2) @*/
The highest lvalue accessed by s1 must be equal to or greater than that of s2

CBT vs. annotation-based technique
In CBT, safety conditions are generated automatically
In annotation, the burden of writing conditions is on programmers
Inference technique: String pattern matching
Program code is tokenized and scanned to identify patterns of strings that represent vulnerable function calls and arguments
E.g., format function calls that include non-constant strings as the last (format string) argument
Executable files can also be analyzed to detect vulnerabilities (Tevis et al. 2006)
Instead of tokens, the approach analyzes symbol tables
Checks for the presence of vulnerable library functions (e.g., strcpy)
Part III: Summary
Introducing analysis sensitivity results in better detection of vulnerabilities
Each inference mechanism is valuable
Annotation verifies that certain vulnerabilities are not present
Constraint-based inference can prioritize vulnerabilities
Tainted data flow can detect multiple vulnerabilities

References
[1] H. Shahriar and M. Zulkernine, "Mitigation of Program Security Vulnerabilities: Approaches and Challenges," ACM Computing Surveys (to appear).
[2] A. Dekok, "Pscan (1.2-8) Format String Security Checker for C Files," http://packages.debian.org/etch/pscan
[3] D. Evans and D. Larochelle, "Improving Security Using Extensible Lightweight Static Analysis," IEEE Software, January 2002, pp. 42-51.
[4] A. Sotirov, Automatic Vulnerability Detection Using Static Analysis, MSc Thesis, The University of Alabama, 2005, http://gcc.vulncheck.org/sotirov05automatic.pdf
[5] Y. Xie and A. Aiken, "Static Detection of Security Vulnerabilities in Scripting Languages," Proceedings of the 15th USENIX Security Symposium, Vancouver, Canada, July 2006, pp. 179-192.
[6] N. Jovanovic, C. Kruegel, and E. Kirda, "Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities," Proceedings of the IEEE Symposium on Security and Privacy, Oakland, May 2006, pp. 258-263.
[7] U. Shankar, K. Talwar, J. Foster, and D. Wagner, "Detecting Format String Vulnerabilities with Type Qualifiers," Proceedings of the 10th USENIX Security Symposium, 2001, pp. 201-220.
[8] J. Viega, J. Bloch, T. Kohno, and G. McGraw, "Token-based Scanning of Source Code for Security Problems," ACM Transactions on Information and System Security (TISSEC), Volume 5, Issue 3, August 2002, pp. 238-261.
[9] D. Wagner, J. Foster, E. Brewer, and A. Aiken, "A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities," Proceedings of the Network and Distributed System Security Symposium, San Diego, February 2000, pp. 3-17.
[10] FlawFinder, http://www.dwheeler.com/flawfinder
[15] J. Tevis and J. Hamilton, "Static Analysis of Anomalies and Security Vulnerabilities in Executable Files," Proceedings of the 44th Annual Southeast Regional Conference, Melbourne, Florida, 2006, pp. 560-565.
Part IV: Monitoring
Classification of program monitors; monitoring aspects

A generic program monitor
(Figure: program execution before monitoring — a program without a monitor takes input, processes it, and generates output within the runtime environment.)
(Figure: program execution after monitoring — a monitor adds an extra layer between the program and the execution environment.)
Classification features of monitors*
Monitoring aspect
Program state utilization
*Shahriar and Zulkernine 2011, Journal of Systems and Software

Classification feature: Monitoring aspect
Program runtime properties that are checked:
Program operation
Code execution flow and origin
Code structure
Value integrity
Unwanted value
Invariant
Classification feature: Monitoring aspect (cont.)
Program operation: memory access, function call, output generation

Memory access with strict bound
Do not allow read, write, and free operations on invalid memory locations
E.g., tagging of memory bytes for readability and writability status (BOF, Hastings et al. 1992)
E.g., tracking the base and size of all memory objects at runtime (BOF, Jones et al. 1997, Ruwase et al. 2004)

Memory access with flexible bounds
Accidental or intentional memory operations outside valid address spaces are allowed
E.g., randomize the location of memory objects and increase the size of allocated objects to at least twice their requested size (BOF, Berger et al. 2006)

Function call
Examine argument counts and argument lists of functions
E.g., match the number of arguments and their types with the format specifiers in format strings (FSB, Cowan et al. 2001)

Output generation
Program output points are monitored for the presence of tainted inputs or values derived from tainted inputs
E.g., input data contains a format string (FSB, Newsome et al. 2005)
Classification feature: Monitoring aspect (cont.)
Code execution flow and origin: check allowable and unallowable control flows and program code locations
E.g., information flow tracking
Data sources (e.g., files) are marked as tainted
Tainted data sources should not compute jump locations
Detect exploits that change control flows (BOF, FSB, SQLI)
Classification feature: Monitoring aspect (cont.)
Code structure: monitor whether code conforms to the desired syntactic structure
The interpreter executes implemented code and throws an exception for injected code (SQLI, Boyd et al. 2005)
Add random numbers to legitimate SQL keywords: SELECT becomes SELECT123
Attackers fail to correctly add the random numbers to keywords in injected code
De-randomization is performed before queries are parsed by query processors: SELECT123 becomes SELECT, while an injected SELECT is not recognized by the parser and generates a warning
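A toy sketch (mine, with a hard-coded key and a single keyword; real SQL keyword randomization is applied to whole query languages) of the randomization/de-randomization idea described above:

    #include <stdio.h>
    #include <string.h>

    #define KEY "123"   /* assumed random key appended to legitimate keywords */

    /* De-randomizing proxy: a keyword is accepted only if it carries the key. */
    static int keyword_is_legitimate(const char *token) {
        size_t n = strlen(token), k = strlen(KEY);
        return n > k && strcmp(token + n - k, KEY) == 0;
    }

    int main(void) {
        printf("SELECT%s -> %s\n", KEY,
               keyword_is_legitimate("SELECT" KEY) ? "accepted" : "rejected");
        printf("SELECT   -> %s\n",          /* injected keyword lacks the key */
               keyword_is_legitimate("SELECT") ? "accepted" : "rejected");
        return 0;
    }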
Classification feature: Monitoring aspect (cont.)
Value integrity: monitor sensitive program variables or environment values that are modified during attacks
Sensitive memory location without modification
Values stored in sensitive memory locations of programs are checked for corruption
E.g., the integrity of a function's return address can be checked to detect a BOF attack
Sensitive memory location with modification
To avoid guessing of the content of memory locations, sensitive values are stored in encrypted form
E.g., return addresses are XORed with random keys
Injected value: check the integrity of injected values
E.g., a canary value is injected before a return address (Cowan et al. 1998)
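An informal sketch (mine, not StackGuard's actual implementation, and with no guarantee about real stack layout) of the injected-value idea: a canary placed next to a buffer is checked before the function returns:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    static unsigned long canary_key = 0xA5A5A5A5UL;   /* assumed random per-run key */

    void handle_input(const char *src) {
        unsigned long canary = canary_key;   /* injected value next to the buffer */
        char buf[16];
        strncpy(buf, src, sizeof buf - 1);   /* an unbounded copy (e.g., strcpy) here
                                                could overwrite the canary */
        buf[sizeof buf - 1] = '\0';
        if (canary != canary_key) {          /* integrity check before returning */
            fprintf(stderr, "stack smashing detected\n");
            abort();
        }
    }

    int main(void) {
        handle_input("hello");
        return 0;
    }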
Classification feature: Monitoring aspect (cont.)
Unwanted value: data is examined for the presence of unwanted values
Input value: inputs are checked for black- and white-listed characters
E.g., SQL meta-characters, HTML characters, format specifiers (blacklisted)
E.g., JavaScript code implemented in a program (whitelisted)
Output value: check if output reveals sensitive information
E.g., opening a connection from the browser that includes a cookie (XSS, Kirda et al. 2005)
Classification feature: Monitoring aspect (cont.)
Invariant: monitor the violation of desired properties
Requires executing a program with a set of normal (non-attack) inputs (i.e., profiling)
A monitor identifies any deviation from the learned invariants (or profiles) during an actual program run
E.g., legitimate API invocation sequences (BOF, Han et al. 2007)

Classification features of monitors (cont.)
Monitoring aspect
Program state utilization
Classification feature: Program state utilization
A program state needs to be compared with a known program state under attack
Known states are derived through information extraction, addition, or modification of program states
Information extraction
Program code can be analyzed to extract useful information
E.g., a list of known JavaScript code (Johns et al. 2007)
Information addition
Information is added to the current program state, which is later retrieved and compared with a future program state
E.g., the return address of a function is stored in a safe location
E.g., a canary is added before a return address (Cowan et al. 1998)

Classification feature: Program state utilization (cont.)
Information modification
Modify the program (e.g., code, memory) to alter program behaviors during attacks
E.g., code randomization (Boyd et al. 2005)
E.g., reorganization of program variables, such as placing all buffers before function pointers (Etoh et al. 2005)
Part IV: Summary
Each monitoring aspect is useful for specific vulnerabilities
Program operation and value integrity-based monitoring aspects have mainly addressed BOF attacks
Unwanted value and code structure-based monitoring have addressed SQLI and XSS attacks
Some approaches monitor at a fine-grained level, which results in high overhead

References
[1] H. Shahriar and M. Zulkernine, "Taxonomy and Classification of Automatic Monitors for Program Security Vulnerabilities," Journal of Systems and Software, Elsevier Science, Vol. 84, Issue 2, February 2011, pp. 250-269.
[2] R. Hastings and B. Joyce, "Purify: Fast Detection of Memory Leaks and Access Errors," Proceedings of the USENIX Winter Conference, San Francisco, January 1992, pp. 125-138.
[3] E. Berger and B. Zorn, "DieHard: Probabilistic Memory Safety for Unsafe Languages," Proceedings of the Conference on Programming Language Design and Implementation, Ottawa, Canada, June 2006, pp. 158-168.
[4] C. Cowan, C. Pu, D. Maier, H. Hinton, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, and Q. Zhang, "Automatic Adaptive Detection and Prevention of Buffer Overflow Attacks," Proceedings of the 7th USENIX Security Conference, San Antonio, Texas, January 1998.
[5] H. Han, X. Lu, L. Ren, B. Chen, and N. Yang, "AIFD: A Runtime Solution to Buffer Overflow Attack," International Conference on Machine Learning and Cybernetics, August 2007, Hong Kong, pp. 3189-3194.
[6] E. Kirda, C. Kruegel, G. Vigna, and N. Jovanovic, "Noxes: A Client-Side Solution for Mitigating Cross-Site Scripting Attacks," Proceedings of the 21st ACM Symposium on Applied Computing, Dijon, France, April 2006, pp. 330-337.
[7] J. Newsome and D. Song, "Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software," Proceedings of the Network and Distributed System Security Symposium, 2005.
[8] D. Dhurjati and V. Adve, "Detecting All Dangling Pointer Uses in Production Servers," Proceedings of the International Conference on Dependable Systems and Networks, Philadelphia, June 2006, pp. 269-280.
[9] M. Rinard, C. Cadar, D. Dumitran, D. Roy, and T. Leu, "A Dynamic Technique for Eliminating Buffer Overflow Vulnerabilities (and Other Memory Errors)," Proceedings of the 20th Annual Computer Security Applications Conference, Tucson, AZ, December 2004, pp. 82-90.
[10] G. Zhu and A. Tyagi, "Protection against Indirect Overflow Attacks on Pointers," Proceedings of the 2nd International Information Assurance Workshop, North Carolina, April 2004, pp. 97-106.
[11] G. Buehrer, B. Weide, and P. Sivilotti, "Using Parse Tree Validation to Prevent SQL Injection Attacks," Proceedings of the 5th Workshop on Software Engineering and Middleware, Lisbon, Portugal, 2005, pp. 106-113.
[12] S. Boyd and A. Keromytis, "SQLrand: Preventing SQL Injection Attacks," Proceedings of the 2nd International Conference on Applied Cryptography and Network Security, 2004, pp. 292-302.
[13] M. Johns, B. Engelmann, and J. Posegga, "XSSDS: Server-Side Detection of Cross-Site Scripting Attacks," Proceedings of the Annual Computer Security Applications Conference (ACSAC), Anaheim, California, December 2008, pp. 335-344.
[14] C. Cowan, M. Barringer, S. Beattie, G. Hartman, M. Frantzen, and J. Lokier, "FormatGuard: Automatic Protection From printf Format String Vulnerabilities," Proceedings of the 10th USENIX Security Symposium, August 2001, Washington, D.C., pp. 191-200.
[15] H. Etoh, GCC Extension for Protecting Applications from Stack-smashing Attacks, http://www.trl.ibm.com/projects/security/ssp
Part V: Hybrid analysis
Static analysis suffers from false positive warnings; programmers spend significant time examining warnings
The combination of static and dynamic analysis is hybrid analysis (Ernst et al. 2003)
Static analysis identifies program locations that need to be examined
Exploitations of vulnerabilities are confirmed by executing programs with test cases and examining runtime states (dynamic analysis)
E.g., the runtime value stored in a sensitive program location

Hybrid analysis vs. monitoring
Similarity
Both techniques examine runtime program states; the examined states vary (data structures, memory locations, etc.)
Dissimilarity
Dynamic analysis is an active analysis: applied before the release of a program; used in a wide range of applications (profiling, debugging)
Monitoring is a passive analysis technique: applied after the release of a program; does not stop a program execution unless there is a deviation from expected program behaviors; used in attack/intrusion detection
Classification features of hybrid analysis*
Static inference
Static information
Dynamic checking
*Shahriar and Zulkernine 2011, ACM Computing Surveys

Classification feature: Static inference
Three common static inference techniques:
Tainted data flow: discussed in Part III (static analysis)
String pattern matching: discussed in Part III (static analysis)
Untainted data flow: the extraction of legitimate information that is valid in sensitive program operations
E.g., the instruction sets that can define (or assign) the value of a variable (Castro et al. 2006)
Tainted vs. untainted data flow
An untainted data flow-based technique tracks valid data
A tainted data flow-based technique tracks suspected data
Classification feature: Static information
Indicates the information gathered in a static analysis
Depends on vulnerability types and analysis techniques
E.g., unsafe library function calls having pointer arguments (BOF, Aggarwal et al. 2006)
E.g., trusted strings (SQLI, Halfond et al. 2006)

Classification feature: Dynamic checking
Program states that are checked to confirm exploitations in a dynamic analysis
Examples of program state checking:
Program operation: memory allocation, release, and access (e.g., BOF, Castro et al. 2006)
Code structure: validate the known structure of program code during runtime (e.g., SQLI, Wei et al. 2006)
Unwanted value: check the presence of unwanted values in program outputs (e.g., XSS, Lucca et al. 2004)
Part V: Summary
The precision of static analysis is important and needs to be carefully considered before applying it in a hybrid approach
Most of the works apply string pattern matching and tainted data flow analysis in the static analysis phase
Program operation and code integrity are widely used in the dynamic analysis phase

References
[1] H. Shahriar and M. Zulkernine, "Mitigating Program Security Vulnerabilities: Approaches and Challenges," ACM Computing Surveys (to appear).
[2] M. Ernst, "Static and Dynamic Analysis: Synergy and Duality," Proceedings of the ICSE Workshop on Dynamic Analysis, Portland, May 2003, pp. 24-27.
[3] W. Halfond and A. Orso, "Combining Static Analysis and Runtime Monitoring to Counter SQL-injection Attacks," Proceedings of the 3rd International Workshop on Dynamic Analysis, Missouri, May 2005, pp. 1-7.
[4] M. Castro, M. Costa, and T. Harris, "Securing Software by Enforcing Data-flow Integrity," Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, 2006, pp. 11-11.
[5] A. Aggarwal and P. Jalote, "Integrating Static and Dynamic Analysis for Detecting Vulnerabilities," Proceedings of the 30th Annual International Computer Software and Applications Conference, July 2006, pp. 343-350.
[6] S. Yong and S. Horwitz, "Protecting C Programs from Attacks via Invalid Pointer Dereferences," ACM SIGSOFT Software Engineering Notes, Volume 28, Issue 5, September 2003, pp. 307-316.
[7] W. Halfond, A. Orso, and P. Manolios, "Using Positive Tainting and Syntax-aware Evaluation to Counter SQL Injection Attacks," Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Portland, Oregon, November 2006, pp. 175-185.
[8] M. Ringenburg and D. Grossman, "Preventing Format-string Attacks via Automatic and Efficient Dynamic Checking," Proceedings of the 12th Conference on Computer and Communications Security, November 2005, Alexandria, pp. 354-363.
[9] G. Lucca, A. Fasolino, M. Mastoianni, and P. Tramontana, "Identifying Cross Site Scripting Vulnerabilities in Web Applications," Proceedings of the 6th International Workshop on Web Site Evolution, Chicago, September 2004, pp. 71-80.
Conclusions
Program security vulnerabilities exist in the real world
Each mitigation technique is important and offers unique advantages
The classification features indicate that each technique can address a subset of vulnerabilities