Mitigating Program Security Vulnerabilities: Approaches and Challenges

Background

Programs are accessible to both legitimate users and hackers
Ideally, a program should conform to expected behaviors
 Provide desired outputs based on supplied inputs

H. Shahriar and M. Zulkernine
School of Computing
Queen’s University, Kingston, Canada
http://cs.queensu.ca/~shahriar

In practice, a program’s behavior can be unexpected
 Program crashes or leaks sensitive information

For ACM SAC 2011 Attendees Only

© Shahriar and Zulkernine, 2011


Unexpected program behavior

2011, ACM SAC-2

Information deletion

A very long URL in the address bar of Microsoft Internet Explorer (6.0) (Source: CVE 2006-3869)
 Cause
> Internet Explorer fails due to a buffer overflow

 Impact
> With a specially crafted request, an attacker can execute arbitrary code

“The existence of an SQL injection attack on the RIAA's site came to light via social network news site … hackers turned the site into a blank slate, among other things.”

Source: Common Vulnerabilities and Exposures (CVE) http://cve.mitre.org

“The RIAA has restored RIAA.org, although whether it's any more secure than before remains open to question” - January 2008 Source: http://www.theregister.co.uk/2008/01/21/riaa_hacktivism/


More lists of incidents

Information modification

Common Vulnerabilities and Exposures (CVE), http://cve.mitre.org
Open Source Vulnerability Database (OSVDB), http://osvdb.org
Web Application Security Consortium (WASC), http://www.webappsec.org/projects/whid/

“The Web sites of Australia's two major political parties contain cross-site scripting (XSS) flaws” - October 2007
Source: http://www.builderau.com.au/news/soa/XSS-flaw-makes-PM-say-I-want-to-suck-your-blood-/0,339028227,339282682,00.htm


> Attack method
> Outcome
> Location
> …


What is wrong in these programs?


Tutorial objectives

These programs are developed and tested by the brightest engineers in the industry

Provide an overview of common program security vulnerability exploitations (Part I)

Some interesting aspects might be missing

Introduce mitigation techniques

 Program security vulnerability mitigations in pre- and post-deployment phases
 Ensure that programs cannot be exploited with malicious inputs


 Program security testing (Part II)
 Static analysis (Part III)
 Monitoring (Part IV)
 Hybrid analysis (Part V)


Part I: Program security vulnerability

Vulnerabilities are flaws in programs that allow attackers to expose or alter sensitive information (Dowd et al. 2007)
Exploitations of vulnerabilities are attacks

Buffer overflow (BOF)

Specific flaws in implementations that allow attackers to overflow data buffers
Consequences of BOF attacks
 Program state corruption
 Program crash due to overwriting of sensitive neighboring variables such as return addresses

Common examples of vulnerabilities
 Buffer Overflow (BOF)
 Format String Bug (FSB)
 SQL Injection (SQLI)
 Cross Site Scripting (XSS)


Buffer overflow example


Format string bug (FSB)

Invocation of format functions with non-validated, user-supplied inputs that contain format strings [3]

void foo (int a) {
  int var1;
  char buf [8];
  strcpy(buf, "AAAAAAAAAAAAAA");
  …
  return;
}
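A bounds-respecting variant (a sketch of ours, not from the tutorial; foo_safe and the caller-supplied destination are hypothetical) truncates the copy instead of overflowing:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Safe counterpart of the vulnerable foo above: snprintf writes at most
   outsz bytes, including the terminating '\0', so a 14-character source
   is truncated rather than overflowing an 8-byte buffer. */
void foo_safe(char *out, size_t outsz, const char *src) {
    snprintf(out, outsz, "%s", src);
}
```

With an 8-byte destination, the 14-character string "AAAAAAAAAAAAAA" is cut to 7 characters plus the terminator, leaving neighboring variables intact.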

 E.g., format functions of the ANSI C standard library

Consequences
 Arbitrary reading and writing to format function stacks
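The difference between a vulnerable and a safe format function call can be sketched as follows (illustrative helpers; log_vulnerable, log_safe, and render_safe are our own names):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Vulnerable: user input becomes the format string, so specifiers such
   as %x (read the stack) or %n (write to it) are interpreted. */
void log_vulnerable(const char *msg) {
    printf(msg);
}

/* Safe: the format string is a constant; msg is treated purely as data. */
void log_safe(const char *msg) {
    printf("%s", msg);
}

/* Same safe pattern, rendered into a buffer so it can be inspected. */
void render_safe(char *out, size_t n, const char *msg) {
    snprintf(out, n, "%s", msg);
}
```

With the safe pattern, an input of "%x %n" comes out literally; with the vulnerable call it would be interpreted as format directives.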

C code snippet of the foo function (adapted from [2])

[Stack layout figure: buf[0] … adjacent to sensitive neighboring variables]

Special classes of bugs that may cause security breaches

 How are these bugs introduced in programs?

> Program states and response messages are used to identify the presence or absence of attacks

> Implementation language features
> Library APIs (ANSI C library)
> Inputs (network protocol data unit)

Test cases are run against implementations
 The presence of vulnerabilities is confirmed by comparing program states or outputs with known malicious states or outputs under attacks

Test coverage
 Does a test suite reveal specific vulnerabilities?
 Example coverage of program security testing

> A test suite should reveal all BOF and SQLI vulnerabilities


Classification of security testing approaches*
 Test case generation method
 Source of test case
 Test case input type


Classification feature: Test case generation method

Example techniques
 Fault injection
 Attack signature
 Mutation analysis
 Search-based

We will discuss these techniques in more detail later

*Shahriar and Zulkernine 2011, ACM CSUR


Classification feature: Source of test case

Artifacts that are used for generating test cases
Program artifacts
 Vulnerable APIs (ANSI C library functions)
 Runtime states (registers or variable values)
Vulnerabilities
Inputs
 Valid files, protocol data units (PDUs)
Environments
 File systems, networks, and processors

Classification feature: Test case input type

It varies based on data types and vulnerabilities
Data received by programs
 Exploiting BOF involves generating strings of particular lengths
 Exposing FSB requires strings containing format specifiers
 A stored XSS requires two URLs (storing a malicious script and downloading the script)


Test case generation

Test case generation method
 Fault injection
 Attack signature
 Mutation analysis
 Search-based

Test case generation: Fault injection

One of the most widely used test case generation techniques
 Suitable for black-box testing
 Widely used to discover BOF and FSB vulnerabilities

The objective
 Corrupt input data and variables
 Supply them to executable programs
 Observe unexpected responses to confirm vulnerabilities

Fault injection can be divided into three types based on the target of corruption
 Input
 Environment
 Program state


Test case generation (cont.)

Test case generation method
 Fault injection
 Attack signature
 Mutation analysis
 Search-based

Test case generation: Fault injection (cont.)

Fault injection in input data
 Malform a valid data structure into an invalid one
 E.g., lexical structure modification (replace a non-printable character with a printable character)
 E.g., semantic structure modification (replace the date format “mmddyyyy” with “yyyyddmm”)

Fault injection in environment
 Modify the attributes of program inputs at different execution stages
 E.g., during program execution, delete a file or toggle file permissions

Fault injection in program state
 Test whether program code can handle malformed states or not
 E.g., data variables that control program execution flow
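The semantic structure modification above can be sketched as a tiny fault injector (swap_date_format is our own hypothetical helper, not code from the tutorial):

```c
#include <string.h>

/* Semantic fault injection: rewrite a date from "mmddyyyy" to
   "yyyyddmm", producing a value the program under test may not expect. */
void swap_date_format(char date[9]) {
    char out[9];
    memcpy(out, date + 4, 4);      /* yyyy moves to the front */
    memcpy(out + 4, date + 2, 2);  /* dd stays in the middle  */
    memcpy(out + 6, date, 2);      /* mm moves to the end     */
    out[8] = '\0';
    memcpy(date, out, 9);
}
```

Feeding the rewritten date back to the program checks whether its parser handles a malformed semantic structure.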


Test case generation: Attack signature

Widely used in security testing of web-based programs
Replace some parts of normal inputs with attack signatures
 Signatures are developed from experience and vulnerability reports
 E.g., http://www.xyz.com/login.php?name=john&pwd=secret
 SQLI: http://www.xyz.com/login.php?name=‘ or 1=1&pwd=
 XSS: http://www.xyz.com/login.php?name=“>...&pwd=
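To see why the SQLI signature works, consider a login handler that assembles its query by string pasting (an illustrative sketch; build_query and the table layout are assumptions, not code from the tutorial):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical server-side query construction: the user-supplied name
   and pwd are pasted directly into the SQL text with no escaping. */
void build_query(char *out, size_t n, const char *name, const char *pwd) {
    snprintf(out, n,
             "SELECT * FROM users WHERE name='%s' AND pwd='%s'",
             name, pwd);
}
```

Injecting name = "' or 1=1 --" closes the quoted literal and turns the WHERE clause into a tautology, so the query matches every row regardless of the password.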

Two ways to traverse web pages


Test case generation: Attack signature (cont.)

 Request and response
 Use case

Request and response-based method
 Web pages are requested, input fields are filled with attack inputs, and forms are submitted to websites
 Vulnerability presence is checked by confirming known response messages under attacks
 Widely used to discover SQLI and XSS vulnerabilities

The request and response-based method fails to reach the inner logic of programs where vulnerabilities might be present
The use case-based method can alleviate this limitation

Use case-based method
 Relies on interactive user inputs to perform functionalities
 Generates a collection of user inputs (or URLs)
 The inputs are replayed and malicious inputs are injected at specific points of URLs


Test case generation (cont.)

Test case generation method
 Fault injection
 Attack signature
 Mutation analysis
 Search-based

Test case generation: Mutation analysis

Testing of test cases
 Assessment of test suite adequacy to reveal faults

Modifies the source code of programs to create mutants
A mutant is killed if at least one test case generates different outputs between the implementation and the mutant
If no test case can kill a mutant, it is said to be equivalent to the original implementation
Compute the adequacy with the mutation score (MS)
 MS = (Total # of mutants killed) / (Total # of non-equivalent mutants)
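The mutation score formula above is a direct ratio; a minimal transcription (the function name is ours):

```c
#include <assert.h>

/* Mutation score from the slide: killed mutants over non-equivalent
   mutants. equivalent counts mutants no test case can ever kill. */
double mutation_score(int killed, int total, int equivalent) {
    return (double)killed / (double)(total - equivalent);
}
```

A score of 1.0 means the test suite kills every killable mutant, i.e., it is adequate with respect to the injected faults.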


Test case generation: Mutation analysis (cont.)

Mutation-based security testing
 Define mutation operators to inject vulnerabilities
 Identify mutant killing criteria to generate test cases

The mutation operators are related to attack type coverage
The killing criteria specify how to compare outputs between a program and a mutant
 The end outputs of a program and a mutant (strong)
 The intermediate program states of a program and a mutant (weak)
 At any point between the mutated code and the end of a program (firm)

Example of mutation analysis
 Relational operator replacement (ROR)

Original program
int foo (int a, int b){
  if (a > b)
    return (a - b);
  else
    return (a + b);
}

Mutant
int foo (int a, int b){
  if (a >= b) //∆ROR
    return (a - b);
  else
    return (a + b);
}

Before killing the mutant: T = {(3, 2)}
After killing the mutant: T = {(3, 2), (4, 4)}


Test case generation: Mutation analysis (cont.)

Category: Mutating library function calls
 S2UCP: Replace strncpy with strcpy.
 S2UCT: Replace strncat with strcat.
 S2UGT: Replace fgets with gets.
 S2USN: Replace snprintf with sprintf.
 S2UVS: Replace vsnprintf with vsprintf.
Category: Mutating buffer size arguments
 RSSBO: Replace buffer size with destination buffer size plus one.
Category: Mutating format strings
 RFSNS: Replace “%ns” format string with “%s”.
 RFSBO: Replace “%ns” format string with “%ms”, where m = size of the destination buffer plus one.
 RFSBD: Replace “%ns” format string with “%ms”, where m = size of the destination buffer plus Δ.
 RFSIFS: Replace “%s” format string with “%ns”, where n = size of the destination buffer.
Category: Mutating buffer variable sizes
 MBSBO: Increase buffer size by one byte.
Category: Removing statements
 RMNLS: Remove null character assignment statement.


Test case generation: Mutation analysis (cont.)

Each operator above is paired with a killing criterion; e.g., the library function call mutants (S2UCP, S2UCT, S2UGT, S2USN, S2UVS, RSSBO) are killed under criterion C1 or C2

Killing criteria
 C1: ESP ≠ ESM (strong)
 C2: Len(BufP) ≤ N && Len(BufM) > N (weak)


Test case generation: Mutation analysis (cont.)

An example of mutation analysis for a BOF vulnerability

String length (src): [32]
 Original program (P): char dest [32]; … sscanf (src, “%32s”, dest);
 Mutated program (M): char dest [32]; … sscanf (src, “%40s”, dest); //ΔRFSBD
 Output (P): No crash; Output (M): No crash; Status: Live
String length (src): [256]
 Original program (P) and mutated program (M): as above
 Output (P): No crash; Output (M): Crash; Status: Killed

P: the original implementation. M: the mutant version. ESP: the exit status of P. ESM: the exit status of M. Len(BufP): length of Buf in P. Len(BufM): length of Buf in M. N: buffer size.


Test case generation: Mutation analysis (cont.)

Relationship between BOF attack type and mutation operator

Attack  Operator
 Overwriting return addresses: all except RSSBO and RFSBO
 Overwriting stack and heap: all the proposed operators except RMNLS
 One byte overflow: RSSBO and RFSBO
 Arbitrary reading of stack: RMNLS


Test case generation: Search-based technique

Test case generation method
 Fault injection
 Attack signature
 Mutation analysis
 Search-based

Program test input space might be huge
 Finding good test cases becomes time consuming
 E.g., discovering BOF requires generating large-sized strings

Search algorithms can be applied to generate test cases
 A random input is chosen as the initial solution
 The solution is evolved until the objective function (fitness function) value remains unchanged
 Fitness functions are designed so that good test cases obtain higher values than bad test cases

Test case generation: Search-based technique (cont.)

Fitness function
 Assesses generated test cases based on a numeric score
 Guides the choice of test cases to reveal vulnerabilities

Mutation operator
 Modification of some random elements in the population to allow evolution of the solution

Genetic algorithm
 1. Generate initial population
 2. Assess fitness of each member of the population
 3. Select a subset of the population to perform crossover and mutation
 4. Repeat steps 2 and 3 until the maximum iteration is reached or the fitness value of the population remains unchanged


Part II: Summary

Security testing is analogous to software testing, except for some changes in defining requirements, coverage, and oracles
Fault injection is the most widely used technique
 Provides no information related to vulnerability and code coverage
Attack signature-based techniques are mainly applied to web-based programs

Test case generation: Search-based technique (cont.)

Example code
char * foo (int len, char *str){
  char buf[32];
  if (len>0){
    strncpy(buf, str, len); //no checking of buf length
    return buf;
  }else{
    return NULL;
  }
}

Fitness function: F = max(32 - len, 0)
Initial population (len): {5, 10, 25}
 Let us assume that a member is represented with 6 bits; the maximum number is 63

F(5) = 27, F(10) = 22, F(25) = 7
Choose chromosomes with lower fitness values: {25, 10, 5}
 Perform one-point crossover on 10 and 25
> 10 = [001010], 25 = [011001]
> M2: 010011 (19)
 Randomly replace 0 with 1 in the first gene
> M1: 001001 -> 101001 (41)
 New population: {41, 25, 19, 10, 5}
 (len=41) is a test case that reveals BOF
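The fitness and crossover arithmetic above can be sketched directly (6-bit encoding as in the example; the cut point and helper names are our own choices, since the slide does not fix them):

```c
#include <assert.h>

/* Fitness from the example: F = max(32 - len, 0). A BOF here needs
   len > 32, so candidates with smaller F are closer to an exploit. */
int fitness(int len) {
    return (32 - len) > 0 ? (32 - len) : 0;
}

/* One-point crossover on 6-bit chromosomes: keep the high `cut` bits
   of a and the low (6 - cut) bits of b. */
unsigned crossover(unsigned a, unsigned b, int cut) {
    unsigned low_mask = (1u << (6 - cut)) - 1u;
    return (a & ~low_mask) | (b & low_mask);
}

/* Bit mutation: set bit `pos` (0 = least significant). */
unsigned mutate_set_bit(unsigned x, int pos) {
    return x | (1u << pos);
}
```

With a cut after the top three bits, crossover of 10 (001010) and 25 (011001) yields 001001 (9); setting its highest gene reproduces the example's 101001 (41), a length that overflows the 32-byte buffer.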


References

Part III: Static analysis

[1] H. Shahriar and M. Zulkernine, “Mitigating Program Security Vulnerabilities: Approaches and Challenges,” ACM Computing Surveys (to appear).
[2] H. Shahriar and M. Zulkernine, “Assessing Test Suites for Buffer Overflow Vulnerabilities,” International Journal of Software Engineering and Knowledge Engineering (IJSEKE), World Scientific, Vol. 20, Issue 1, February 2010, pp. 73-101.
[3] A. Mathur, Foundations of Software Testing, First edition, Pearson Education, 2008.
[4] L. Morell, “A Theory of Fault-based Testing,” IEEE Transactions on Software Engineering, Volume 16, Issue 8, 1990, pp. 844-857.
[5] B. Potter and G. McGraw, “Software Security Testing,” IEEE Security & Privacy, 2004, pp. 32-34.
[6] I. Sommerville, Software Engineering, Addison Wesley, 2001.


Classification of static analysis-based approaches
 Inference techniques


Static analysis: Overview

Classification of static analysis approaches*

Identify potential vulnerabilities by analyzing source code
Static analysis techniques do not execute program code

 Ignore related issues such as environment variables and test cases for reaching specific program points

 Inference type
 Analysis sensitivity
 Completeness
 Soundness

*Shahriar and Zulkernine 2011, ACM CSUR


Classification feature: Inference type

Four common types
 Tainted data flow-based
 Constraint-based
 Annotation-based
 String pattern matching-based

Classification feature: Analysis sensitivity

A static analysis approach might
 Generate false positive warnings to be examined manually
 Suffer from false negatives (i.e., vulnerabilities are unreported)

Approaches use pre-computed information
 Reduce both false positive and negative rates
 E.g., control flow, context, point-to, value range


Classification feature: Analysis sensitivity (cont.)

Control flow: an analysis applies an inference technique based on statement execution sequence
 Derived from the control flow graph (CFG) of a program unit

Classification feature: Analysis sensitivity (cont.)

Point-to: compute a set of memory objects that a pointer data type might access in program code
 Flow sensitive: a program’s control flow is taken into account to compute the point-to graph
 Flow insensitive: statements can be analyzed in any order while computing the point-to graph

Context: multiple call sites of a function with arguments are analyzed separately

Example code
void foo (int len, char *str){
  char buf[32];
  if (len>0){
    strcpy(buf, str); //call site1
  }else{
    strcpy(buf, “hello”); //call site2
  }
}

Context insensitive: Call site 1 (BOF), Call site 2 (BOF)
Context sensitive: Call site 1 (BOF), Call site 2 (No BOF)

Flow insensitive (Andersen 1994)*

Example code
char x[16], *p, z[16];
p = @x
for(i=0; i<…; i++) …

[Garbled value-range example: if (…>0){ size = 16; } … memcpy (dest, src, size), with size(min)/size(max) ranging over -∞, 16, +∞]

 Node represents data defined and used in a basic block
 Edge represents control flow among blocks
 Gen[n]: block n assigns a tainted value to a variable
 Kill[n]: block n sanitizes or checks a tainted variable
 IN[n] = ∪ OUT[p], where p is a predecessor node of n
 OUT[n] = GEN[n] ∪ (IN[n] \ KILL[n])

[Tainted data flow table for the code below: blocks b0(0-2), b1(3-5), and b1(6), each with its In, Gen, Kill, and Out sets drawn from {str, size, dest, Φ}]

0. void foo (char *str){
1.  int size;
2.  char dest [8];
3.  if (str != NULL){ //b1
4.    size = 16;
5.    memcpy (dest, src, size);
6.    printf (dest);
7.  } else{ //b2
      …
    }
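The data flow equations above can be written directly over bitmask-encoded taint sets (the encoding and names are our own, for illustration):

```c
#include <assert.h>

/* Taint sets encoded as bitmasks over the variables {str, size, dest}. */
enum { STR = 1, SIZE = 2, DEST = 4 };

/* Transfer function from the slide: OUT[n] = GEN[n] U (IN[n] \ KILL[n]).
   Set union is bitwise OR; set difference is AND with the complement. */
unsigned out_set(unsigned gen, unsigned in, unsigned kill) {
    return gen | (in & ~kill);
}
```

Iterating this transfer function over the CFG blocks until the In/Out sets stop changing yields the fixpoint the table records.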


Classification feature: Soundness

Classification feature: Soundness (cont.)

An approach is sound if it generates no false negative warnings
 Most approaches are not sound

Unsupported language features, such as specific data types
 E.g., no BOF detection for buffers that are members of union data structures (Wagner et al. 2000)

Lack of analysis sensitivity introduces false negatives
 E.g., lack of point-to analysis (Weber et al. 2001)
 Limitation of analysis sensitivity (e.g., flow-insensitive point-to analysis)
> Some BOF vulnerabilities remain undetected

No warning is provided if an analysis is not certain that a vulnerability can be exploited (result interpretation)
 Reduces false warnings, but sacrifices soundness

Underlying assumptions limit the scope of vulnerabilities
 Vulncheck assumes that most BOF vulnerabilities are present in library function calls accepting unsanitized arguments


Classification feature: Completeness

An approach is said to be complete if it generates no false positive warnings
Analysis sensitivity: inference approaches based on insensitive analysis result in false positive warnings


Inference technique
 Tainted data flow-based
 Constraint-based
 Annotation-based
 String pattern matching-based

 All unsafe library function calls can be warned as BOF vulnerabilities
 However, unsafe function calls present in infeasible paths cannot be exploited

Assumption on program code: approaches assume that programmers write specific patterns of code
 Excluded code patterns result in false positive warnings
 E.g., inference based on known loop structures (Evans et al. 2002)
> for (i=0; i<…; i++) … vs. for(; ; ){buf[i] = x; if (..) break;}


Inference technique: Tainted data flow

Input variables are marked as tainted and their propagations are tracked
Warnings are generated if values derived from tainted inputs are used in sensitive operations

Check if a labeled source, or any value derived from a labeled source, participates in sensitive operations
 Writing to buffers (BOF)
 Writing to files/consoles (FSB)
 Generating query statements (SQLI)
 Writing HTML contents (XSS)

Some common techniques
 Static data type
 Implicit

Static data type is extended to include tainted or untainted information (Shankar et al. 2001)
 int main (int argc, char *argv[])  int main (int argc, tainted char *argv[])


Inference technique: Tainted data flow (cont.)


Inference technique: Tainted data flow (cont.) C il type t h ki techniques t h i b leveraged l d to t Compiler checking can be identify expected type deviation for warning generation Type checking h k is guided d d by b a qualifier l f lattice l that h represents subtyping relationships among variables A subtyping relationship l h is analogous l to the h relationship l h between a class and a subclass

 If Q is a subclass of P (Q < P), then an object of class Q can be used whenever h an object bj t off class l P is i expected, t d but b t nott the th vice i versa


Inference technique: Tainted data flow (cont.) Implicit tainting  Suitable for languages that require no static type declarations of variables (e.g., PHP)  Program P variables i bl are nott explicitly li itl labeled l b l d as tainted t i t d or untainted t i t d  Tainting is guided based on the nature of vulnerabilities  E.g., tainted data derived from HTTP request parameters are vulnerable to XSS (Jovanovic et al. 2006)  Tainted information flow is performed based on a data flow graph  Identify y locations where tainted data reach to sensitive sinks > If a variable is sanitized, then it is not considered as tainted

Example subtyping for FSB: untainted char < tainted char  A variable bl labeled l b l d as “untainted d char” h results l in a warning, iff it is assigned with a “tainted char” variable


Inference technique (cont.)

Inference technique: Constraint-based

 Tainted data flow-based
 Constraint-based
 Annotation-based
 String pattern matching-based


Generate safety constraints from program code
 E.g., for BOF, len(s) > alloc(s), for all string variables s
 Constraints are generated and propagated while traversing a program
 If a solution can be found for a set of constraints, a warning is raised
 Constraint solvers compute input values that violate constraints


Inference technique: Constraint-based (cont.)

Integer range analysis (Wagner et al. 2000)
 A constraint is expressed by a pair of integer ranges
 E.g., buffer allocation size and current size of an allocated buffer
 gets(s)  length of s can be in the range [1, ∞]
 char s[10]  allocation size of s is [10, 10]


Inference technique (cont.) T t dd t fl b d Tainted data flow-based Constraint-based Annotation-based String pattern matching-based

Obtained constraints are solved for each buffer declared and accessed in a program We obtain a range of allocation of a buffer [a, [a b]

 allocation size of a buffer can vary from a bytes to b byte

We obtain current size of a buffer [c, d]

 current size of a buffer can vary from c bytes to d bytes

Case 1: b > c, a non-vulnerable buffer is present
Case 2: …

Annotation example: /*@ requires maxSet(s1) >= maxRead(s2) @*/
 The highest lvalue accessed by s1 must be equal to or greater than that of s2
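A range comparison of this kind can be sketched as a small check (our own names; this conservative variant flags safety only when even the largest possible current size d fits within the smallest possible allocation a, which is stricter than the slide's Case 1):

```c
#include <assert.h>

/* Integer ranges as [lo, hi] pairs, following the slide's notation:
   allocation range [a, b], current-size range [c, d]. */
struct range { int lo, hi; };

/* Conservative check (illustrative): the buffer is certainly safe when
   the maximum current size fits in the minimum allocation. */
int buffer_safe(struct range alloc, struct range len) {
    return len.hi <= alloc.lo;
}
```

Any buffer failing this test would be reported as a potential BOF for manual examination.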

CBT vs. annotation-based technique
 In CBT, safety conditions are generated automatically
 In annotation, the burden of writing conditions is on programmers


Inference technique (cont.)


Inference technique: String pattern matching

 Tainted data flow-based
 Constraint-based
 Annotation-based
 String pattern matching-based

Program code is tokenized and scanned to identify patterns of strings that represent vulnerable function calls and arguments
 E.g., format function calls that include non-constant format strings
 E.g., strings and format strings as the last argument

Executable files are analyzed to detect vulnerabilities (Tevis et al. 2006)
 Instead of tokens, an approach analyzes symbol tables
 Check for the presence of vulnerable library functions (e.g., strcpy)
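Token-level pattern matching can be sketched as a deny-list scan (an illustrative sketch; the deny-list and function name are our own, and such crude matching also shows why these tools produce false positives):

```c
#include <assert.h>
#include <string.h>

/* Deny-list of unsafe ANSI C call patterns (illustrative subset). */
static const char *unsafe_calls[] = { "strcpy(", "strcat(", "gets(", "sprintf(" };

/* Flag a source line that appears to call a deny-listed function.
   Note the crudeness: "fgets(" would match "gets(", a classic
   false-positive of pure string pattern matching. */
int line_is_suspicious(const char *line) {
    for (size_t i = 0; i < sizeof unsafe_calls / sizeof unsafe_calls[0]; i++)
        if (strstr(line, unsafe_calls[i]) != NULL)
            return 1;
    return 0;
}
```

A line calling strcpy is flagged, while the bounded strncpy passes the scan.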


Part III: Summary

References

Introducing analysis sensitivity results in better detection of vulnerabilities
Each inference mechanism is valuable
 Annotation verifies that certain vulnerabilities are not present
 Constraint-based inference can prioritize vulnerabilities
 Tainted data flow can detect multiple vulnerabilities


[1] H. Shahriar and M. Zulkernine, “Mitigation of Program Security Vulnerabilities: Approaches and Challenges,” to appear in ACM Computing Surveys (ACM CSUR).
[2] A. Dekok, “Pscan (1.2-8) Format String Security Checker for C Files,” http://packages.debian.org/etch/pscan
[3] D. Evans and D. Larochelle, “Improving Security Using Extensible Lightweight Static Analysis,” IEEE Software, January 2002, pp. 42-51.
[4] A. Sotirov, Automatic Vulnerability Detection Using Static Analysis, MSc Thesis, The University of Alabama, 2005, http://gcc.vulncheck.org/sotirov05automatic.pdf
[5] Y. Xie and A. Aiken, “Static Detection of Security Vulnerabilities in Scripting Languages,” Proceedings of the 15th Conference on USENIX Security Symposium, Vancouver, Canada, July 2006, pp. 179-192.
[6] N. Jovanovic, C. Kruegel, and E. Kirda, “Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities,” Proc. of the IEEE Symposium on Security and Privacy, Oakland, May 2006, pp. 258-263.
[7] U. Shankar, K. Talwar, J. Foster, and D. Wagner, “Detecting Format String Vulnerabilities with Type Qualifiers,” Proceedings of the 10th USENIX Security Symposium, 2001, pp. 201-220.
[8] J. Viega, J. Bloch, T. Kohno, and G. McGraw, “Token-based scanning of source code for security problems,” ACM Transactions on Information and System Security (TISSEC), Volume 5, Issue 3, August 2002, pp. 238-261.
[9] D. Wagner, J. Foster, E. Brewer, and A. Aiken, “A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities,” Proceedings of Network and Distributed System Security Symposium, San Diego, February 2000, pp. 3-17.
[10] FlawFinder, http://www.dwheeler.com/flawfinder
[15] J. Tevis and J. Hamilton, “Static Analysis of Anomalies and Security Vulnerabilities in Executable Files,” Proceedings of the 44th Annual Southeast Regional Conference, Melbourne, Florida, 2006, pp. 560-565.


A generic program monitor

Part IV: Monitoring

Classification of program monitors
 Monitoring aspects

[Figure: program execution before monitoring — Input → Program → Output, running on the runtime environment]
A program without a monitor takes inputs, processes them, and generates outputs

[Figure: program execution after monitoring — Input → Program → Monitor → Output]
A program with a monitor: the monitor adds an extra layer between the program and the execution environment

Classification features of monitors*

Classification feature: Monitoring aspect

 Monitoring aspect
 Program state utilization

Program runtime properties that are checked
 Program operation
 Code execution flow and origin
 Code structure
 Value integrity
 Unwanted value
 Invariant

*Shahriar and Zulkernine 2011, Journal of Systems and Software


Classification feature: Monitoring aspect (cont.)

Program operation: Memory access
 Strict bound
 Flexible bound
 Function call
 Output generation

Memory access with strict bound
 Do not allow read, write, and free operations on invalid memory locations
 E.g., tagging of memory bytes for readability and writability status (BOF, Hastings et al. 1992)
 E.g., tracking the base and size of all memory objects at runtime (BOF, Jones et al. 1997, Ruwase et al. 2004)

Memory access with flexible bounds
 Accidental or intentional memory operations outside valid address spaces are allowed
 E.g., randomize the location of memory objects and increase the size of allocated objects to at least twice the requested size (BOF, Berger et al. 2006)
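The strict-bound aspect can be illustrated with a small simulation that records the base and size of every object at allocation time, in the spirit of the base-and-size tracking cited above. The flat heap, bump allocator, and API names are assumptions of this sketch, not the cited tools.

```python
# Strict-bound memory monitoring, simulated: every allocation records its
# base address and size; every read/write is checked against live objects.

class BoundsMonitor:
    def __init__(self, heap_size=1024):
        self.heap = bytearray(heap_size)
        self.objects = {}          # base address -> size, recorded at malloc
        self.next_free = 0         # trivial bump allocator for the demo

    def malloc(self, size):
        base = self.next_free
        self.objects[base] = size
        self.next_free += size
        return base

    def _check(self, addr):
        # An access is valid only if it falls inside some live object.
        for base, size in self.objects.items():
            if base <= addr < base + size:
                return
        raise MemoryError(f"out-of-bounds access at address {addr}")

    def write(self, addr, value):
        self._check(addr)
        self.heap[addr] = value

    def read(self, addr):
        self._check(addr)
        return self.heap[addr]

mon = BoundsMonitor()
buf = mon.malloc(8)
mon.write(buf + 7, 0x41)          # last valid byte: allowed
try:
    mon.write(buf + 8, 0x41)      # one byte past the end: detected
except MemoryError as e:
    print("blocked:", e)
```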

Function call
 Examine argument counts and argument lists of functions
 E.g., match the number of arguments and types with format specifiers in format strings (FSB, Cowan et al. 2001)

Output generation
 Program output points are monitored for the presence of tainted inputs or values derived from tainted inputs
 E.g., input data contains a format string (FSB, Newsome et al. 2005)
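The function-call aspect can be sketched as a wrapper that compares the format specifiers in a format string against the arguments actually supplied, loosely following the FormatGuard idea cited above. The specifier regex and the crude rendering are simplifications assumed for this sketch.

```python
import re

# Function-call monitoring: count format specifiers and compare with the
# argument list before letting a printf-style call proceed.

# Simplified C format-specifier pattern (ignores %% escapes).
SPECIFIER = re.compile(r"%(?!%)[-+ #0]*\d*(?:\.\d+)?[diouxXeEfgGcspn]")

def checked_printf(fmt, *args):
    n_spec = len(SPECIFIER.findall(fmt))
    if n_spec != len(args):
        raise ValueError(
            f"format string expects {n_spec} arguments, got {len(args)}")
    if "%n" in fmt:
        raise ValueError("%n specifier rejected")   # classic FSB write vector
    print(fmt.replace("%d", "{}").format(*args))    # crude rendering for demo

checked_printf("pid = %d", 42)                      # counts match: allowed
try:
    checked_printf("%x %x %n")                      # user-smuggled specifiers
except ValueError as e:
    print("blocked:", e)
```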


Classification feature: Monitoring aspect (cont.)

Code execution flow and origin: Check allowable and unallowable control flows and program code locations
 E.g., information flow tracking
 Data sources (e.g., files) are marked as tainted
 Tainted data should not compute jump locations
 Detects exploits that change control flows (BOF, FSB, SQLI)
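A minimal sketch of this taint-tracking policy follows; the `Tainted` wrapper and `jump()` helper are illustrative assumptions, not a real information-flow engine.

```python
# Information-flow tracking: untrusted data is marked tainted, taint
# propagates through computation, and tainted values must not be used
# as jump targets.

class Tainted:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):          # taint propagates through arithmetic
        v = other.value if isinstance(other, Tainted) else other
        return Tainted(self.value + v)

def jump(target):
    # Policy check at every indirect control transfer.
    if isinstance(target, Tainted):
        raise RuntimeError("tainted value used as jump target")
    print(f"jumping to {target:#x}")

base = 0x400000                         # trusted: from the program image
offset = Tainted(0x41414141)            # untrusted: read from a file

jump(base + 0x10)                       # trusted target: allowed
try:
    jump(offset + base)                 # derived from tainted input: blocked
except RuntimeError as e:
    print("blocked:", e)
```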

Classification feature: Monitoring aspect (cont.)

Code structure: Monitor if code conforms to a desired syntactic structure
 The interpreter executes implemented code and throws an exception for injected code (SQLI, Boyd et al. 2004)
 Add random numbers to legitimate SQL keywords
> SELECT becomes SELECT123
 Attackers fail to correctly add the random numbers to keywords
> an injected SELECT remains un-randomized
 De-randomization is performed before queries are parsed by query processors
> SELECT123 becomes SELECT
> an injected bare SELECT is not recognized by the parser and generates a warning

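The keyword-randomization scheme can be sketched as follows. The fixed suffix, keyword set, and helper names are assumptions of this illustration, not SQLrand's actual implementation (which uses a secret per-application key and a modified parser).

```python
import re

# SQL keyword randomization: programmer-written keywords carry a secret
# suffix; injected keywords lack it and are rejected at de-randomization.

KEYWORDS = {"SELECT", "FROM", "WHERE", "OR", "AND", "UNION"}
KEY = "123"   # stand-in for a per-application secret

def randomize(query):
    # Applied to the application's own query templates at build time.
    return re.sub(r"\b(" + "|".join(KEYWORDS) + r")\b",
                  lambda m: m.group(1) + KEY, query)

def derandomize(query):
    # Applied by a proxy just before the real SQL parser: any bare
    # keyword (without the suffix) must have arrived via user input.
    for kw in KEYWORDS:
        if re.search(r"\b" + kw + r"\b(?!" + KEY + r")", query):
            raise ValueError(f"injected keyword detected: {kw}")
    return re.sub(r"\b(" + "|".join(KEYWORDS) + r")" + KEY, r"\1", query)

template = randomize("SELECT name FROM users WHERE id = ")
print(derandomize(template + "42"))          # legitimate query: restored

try:
    derandomize(template + "42 OR 1=1")      # attacker injects a bare OR
except ValueError as e:
    print("blocked:", e)
```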

Classification feature: Monitoring aspect (cont.)

Value integrity: Monitor sensitive program variables or environment values which are modified during attacks
 Sensitive location without modification
 Sensitive location with modification
 Injected value

Sensitive memory location without modification
 Values stored in sensitive memory locations of programs are checked for corruption
 E.g., the integrity of a function’s return address can be checked to detect a BOF attack

Sensitive memory location with modification
 To avoid guessing of the contents of memory locations, sensitive values are stored in encrypted form
 E.g., return addresses are XORed with random keys

Injected value: Check the integrity of injected values
 E.g., a canary value is injected before a return address (Cowan et al. 1998)

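Both value-integrity variants can be simulated together: a canary kept in a safe location, plus an XOR-encrypted return address. The `Monitor` class, frame layout, and method names are illustrative assumptions of this sketch, not StackGuard itself.

```python
import secrets

# Value-integrity monitoring, simulated: a canary is injected before the
# return address (injected value) and the return address itself is stored
# XOR-encrypted (sensitive location with modification).

class Monitor:
    """Keeps safe copies of canaries, out of reach of overflowed buffers."""
    def __init__(self):
        self.key = secrets.randbits(32)       # random XOR key per run
        self.safe_canaries = {}

    def enter_function(self, frame_id, return_addr):
        canary = secrets.randbits(32)
        self.safe_canaries[frame_id] = canary
        # Frame layout: [buffer][canary][encrypted return address]
        return {"buffer": bytearray(16),
                "canary": canary,
                "ret": return_addr ^ self.key}

    def leave_function(self, frame_id, frame):
        # On function exit: verify the canary, then decrypt the address.
        if frame["canary"] != self.safe_canaries[frame_id]:
            raise RuntimeError("canary corrupted: BOF attack detected")
        return frame["ret"] ^ self.key

mon = Monitor()
frame = mon.enter_function("f1", return_addr=0x8048A00)

# A contiguous overflow of the buffer also overwrites the canary:
frame["canary"] = 0xDEADBEEF
try:
    mon.leave_function("f1", frame)
except RuntimeError as e:
    print("blocked:", e)
```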

Classification feature: Monitoring aspect (cont.)

Unwanted value: Data is examined for the presence of unwanted values
 Input value
 Output value

Input value: Inputs are checked for black- and white-listed characters
 E.g., SQL meta-characters, HTML characters, format specifiers (black-listed)
 E.g., JavaScript code implemented in a program (white-listed)

Output value: Check if output reveals sensitive information
 E.g., opening a connection from the browser that includes a cookie (XSS, Kirda et al. 2006)

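Input-value checking can be sketched with a black list of SQL meta-characters and a white list of known scripts. The specific patterns and the allowed-script set below are illustrative assumptions, not a complete filter.

```python
import re

# Unwanted-value monitoring at input points: reject black-listed
# characters and allow only white-listed scripts.

SQL_META = re.compile(r"['\";]|--|\b(OR|AND|UNION)\b", re.IGNORECASE)  # black list
ALLOWED_SCRIPTS = {"function show(){document.title='ok';}"}            # white list

def check_input(value):
    """Reject inputs containing black-listed SQL meta-characters."""
    if SQL_META.search(value):
        raise ValueError("black-listed characters in input: " + value)
    return value

def check_script(code):
    """Allow only scripts the application itself implements."""
    if code not in ALLOWED_SCRIPTS:
        raise ValueError("script not on the white list")
    return code

print(check_input("42"))                       # clean input: passes
for bad in ("1 OR 1=1", "'; DROP TABLE users; --"):
    try:
        check_input(bad)
    except ValueError:
        print("blocked:", bad)
```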

Classification feature: Monitoring aspect (cont.)

Invariant: Monitor the violation of desired properties
 Requires executing a program with a set of normal (non-attack) inputs (i.e., profiling)
 A monitor identifies any deviation from the learned invariants (or profiles) during an actual program run
 E.g., legitimate API invocation sequence (BOF, Han et al. 2007)

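Invariant monitoring can be sketched as learning allowed API call pairs from normal runs (the profiling phase) and flagging unseen pairs at detection time. The training traces and the pair-based profile are assumptions of this sketch, not Han et al.'s actual model.

```python
from itertools import tee

# Invariant monitoring via a learned profile of legitimate API call
# sequences: profile from normal runs, then flag deviations.

def bigrams(trace):
    """Consecutive call pairs appearing in a trace."""
    a, b = tee(trace)
    next(b, None)
    return set(zip(a, b))

# Profiling phase: learn allowed call pairs from normal (non-attack) runs.
normal_runs = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
]
profile = set().union(*(bigrams(t) for t in normal_runs))

def monitor(trace):
    # Detection phase: any call pair never seen during profiling
    # is a deviation from the learned invariant.
    deviations = bigrams(trace) - profile
    if deviations:
        raise RuntimeError(f"invariant violated: {sorted(deviations)}")

monitor(["open", "read", "close"])            # conforms to the profile
try:
    monitor(["open", "system", "close"])      # never seen during profiling
except RuntimeError as e:
    print("blocked:", e)
```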

Classification feature: Program state utilization

A program state needs to be compared with a known program state under attack. Known states are derived through information extraction, addition, or modification of program states.

Information extraction
 Program code can be analyzed to extract useful information
 E.g., a list of known JavaScript code (Johns et al. 2007)

Information addition
 Information is added to the current program state, which is retrieved and compared with a future program state
 E.g., the return address of a function is stored in a safe location
 E.g., a canary is added before a return address (Cowan et al. 1998)

Information modification
 Modify a program (e.g., code, memory) to alter program behaviors during attacks
 E.g., code randomization (Boyd et al. 2004)
 E.g., reorganization of program variables, such as placing all buffers before function pointers (Etoh et al. 2005)


Part IV: Summary

Each monitoring aspect is useful for specific vulnerabilities
Program operation and value integrity-based monitoring aspects have mainly addressed BOF attacks
Unwanted value and code structure-based monitoring aspects have addressed SQLI and XSS attacks
Some approaches monitor at a fine-grained level
 Results in high overhead

References


[1] H. Shahriar and M. Zulkernine, “Taxonomy and Classification of Automatic Monitors for Program Security Vulnerabilities,” Journal of Systems and Software, Elsevier Science, Vol. 84, Issue 2, February 2011, pp. 250-269.
[2] R. Hastings and B. Joyce, “Purify: Fast Detection of Memory Leaks and Access Errors,” Proceedings of the USENIX Winter Conference, San Francisco, January 1992, pp. 125-138.
[3] E. Berger and B. Zorn, “DieHard: Probabilistic Memory Safety for Unsafe Languages,” Proc. of the Conf. on Programming Language Design and Implementation, Ottawa, Canada, June 2006, pp. 158-168.
[4] C. Cowan, C. Pu, D. Maier, H. Hinton, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, and Q. Zhang, “Automatic Adaptive Detection and Prevention of Buffer Overflow Attacks,” Proc. of the 7th USENIX Security Conference, San Antonio, Texas, January 1998.
[5] H. Han, X. Lu, L. Ren, B. Chen, and N. Yang, “AIFD: A Runtime Solution to Buffer Overflow Attack,” International Conference on Machine Learning and Cybernetics, August 2007, Hong Kong, pp. 3189-3194.
[6] E. Kirda, C. Kruegel, G. Vigna, and N. Jovanovic, “Noxes: A Client-Side Solution for Mitigating Cross-Site Scripting Attacks,” Proc. of the 21st ACM Symposium on Applied Computing, Dijon, France, April 2006, pp. 330-337.
[7] J. Newsome and D. Song, “Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software,” Proceedings of the Network and Distributed System Security Symposium, 2005.
[8] D. Dhurjati and V. Adve, “Detecting all Dangling Pointer Uses in Production Servers,” Proceedings of the International Conference on Dependable Systems and Networks, Philadelphia, June 2006, pp. 269-280.
[9] M. Rinard, C. Cadar, D. Dumitran, D. Roy, and T. Leu, “A Dynamic Technique for Eliminating Buffer Overflow Vulnerabilities (and Other Memory Errors),” Proceedings of the 20th Annual Computer Security Applications Conference, Tucson, AZ, December 2004, pp. 82-90.
[10] G. Zhu and A. Tyagi, “Protection against Indirect Overflow Attacks on Pointers,” Proc. of the 2nd Intl. Information Assurance Workshop, North Carolina, April 2004, pp. 97-106.
[11] G. Buehrer, B. Weide, and P. Sivilotti, “Using Parse Tree Validation to Prevent SQL Injection Attacks,” Proc. of the 5th Workshop on Software Engineering and Middleware, Lisbon, Portugal, 2005, pp. 106-113.
[12] S. Boyd and A. Keromytis, “SQLrand: Preventing SQL Injection Attacks,” Proc. of the 2nd Intl. Conf. on Applied Cryptography and Network Security, 2004, pp. 292-302.
[13] M. Johns, B. Engelmann, and J. Posegga, “XSSDS: Server-Side Detection of Cross-Site Scripting Attacks,” Proc. of the Annual Computer Security Applications Conference (ACSAC), Anaheim, California, December 2008, pp. 335-344.
[14] C. Cowan, M. Barringer, S. Beattie, G. Hartman, M. Frantzen, and J. Lokier, “FormatGuard: Automatic Protection From printf Format String Vulnerabilities,” Proceedings of the 10th USENIX Security Symposium, August 2001, Washington, D.C., pp. 191-200.
[15] H. Etoh, GCC Extension for Protecting Applications from Stack-smashing Attacks, http://www.trl.ibm.com/projects/security/ssp

2011, ACM SAC-108

Part V: Hybrid analysis

Static analysis suffers from false positive warnings
 Programmers spend significant time examining warnings

The combination of static and dynamic analysis (hybrid analysis, Ernst et al. 2003)
 Static analysis identifies program locations that need to be examined
 Exploitations of vulnerabilities are confirmed by executing programs with test cases and examining runtime states (dynamic analysis)
> E.g., the runtime value stored in a sensitive program location

Hybrid analysis vs. monitoring

Similarity
 Both techniques examine runtime program states
> States vary, such as data structures, memory locations, etc.

Dissimilarity
 Dynamic analysis is an active analysis
> Applied before the release of a program
> Used in a wide range of applications (profiling, debugging)
 Monitoring is a passive analysis technique
> Applied after the release of a program
> Does not stop a program execution unless there is a deviation from expected program behaviors
> Used in attack/intrusion detection


Classification features of hybrid analysis*
 Static inference
 Static information
 Dynamic checking

Classification feature: Static inference

Three common static inference techniques
 Tainted data flow: Discussed in Part III (static analysis)
 String pattern matching: Discussed in Part III (static analysis)
 Untainted data flow: The extraction of legitimate information that is valid in sensitive program operations
> E.g., the instruction sets that can define (or assign) the value of a variable (Castro et al. 2006)

Tainted vs. untainted data flow
 An untainted data flow-based technique tracks valid data
 A tainted data flow-based technique tracks suspected data

*Shahriar and Zulkernine 2011, ACM Computing Surveys

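Untainted data-flow tracking can be sketched in the style of data-flow integrity: a static phase lists the statements allowed to define each sensitive variable, and every runtime read checks who wrote it last. The statement ids, variables, and helper functions below are assumptions of this sketch.

```python
# Untainted (valid) data-flow tracking: only statically identified
# writers may define a sensitive variable; any other writer is caught
# at the next read.

ALLOWED_WRITERS = {            # from a static reaching-definitions analysis
    "is_admin": {"stmt_init", "stmt_login_check"},
}

last_writer = {}               # runtime record of who wrote each variable

def write(var, value, stmt_id, store):
    last_writer[var] = stmt_id
    store[var] = value

def read(var, store):
    # Data-flow integrity check: the last writer must be on the
    # statically computed valid-writer list.
    if last_writer.get(var) not in ALLOWED_WRITERS[var]:
        raise RuntimeError(f"data-flow integrity violated for {var}")
    return store[var]

mem = {}
write("is_admin", False, "stmt_init", mem)
print(read("is_admin", mem))                    # legitimate definition

# A buffer overflow corrupting is_admin acts as an unlisted writer:
write("is_admin", True, "overflow_in_strcpy", mem)
try:
    read("is_admin", mem)
except RuntimeError as e:
    print("blocked:", e)
```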

Classification feature: Static information

Indicates the information gathered in a static analysis
 Depends on vulnerability types and analysis techniques
 E.g., unsafe library function calls having pointer arguments (BOF, Aggarwal et al. 2006)
 E.g., trusted strings (SQLI, Halfond et al. 2006)


Classification feature: Dynamic checking

Program states that are checked to confirm exploitations in a dynamic analysis

Examples of program state checking
 Program operation: Memory allocation, release, and access (e.g., BOF, Castro et al. 2006)
 Code structure: Validate the known structure of program code during runtime (e.g., SQLI, Wei et al. 2006)
 Unwanted value: Check the presence of unwanted values in program outputs (e.g., XSS, Lucca et al. 2004)


Part V: Summary

The precision of static analysis is important and needs to be carefully considered before applying it in a hybrid approach
Most of the works apply string pattern matching and tainted data flow analysis in the static analysis phase
Program operation and code integrity are widely used in the dynamic analysis phase

References


[1] H. Shahriar and M. Zulkernine, “Mitigating Program Security Vulnerabilities: Approaches and Challenges,” ACM Computing Surveys (to appear).
[2] M. Ernst, “Static and Dynamic Analysis: Synergy and Duality,” Proceedings of the ICSE Workshop on Dynamic Analysis, Portland, May 2003, pp. 24-27.
[3] W. Halfond and A. Orso, “Combining Static Analysis and Runtime Monitoring to Counter SQL-injection Attacks,” Proc. of the 3rd Intl. Workshop on Dynamic Analysis, Missouri, May 2005, pp. 1-7.
[4] M. Castro, M. Costa, and T. Harris, “Securing Software by Enforcing Data-flow Integrity,” Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, 2006, pp. 11-11.
[5] A. Aggarwal and P. Jalote, “Integrating Static and Dynamic Analysis for Detecting Vulnerabilities,” Proceedings of the 30th Annual International Computer Software and Applications Conference, July 2006, pp. 343-350.
[6] S. Yong and S. Horwitz, “Protecting C Programs from Attacks via Invalid Pointer Dereferences,” ACM SIGSOFT Software Engineering Notes, Volume 28, Issue 5, September 2003, pp. 307-316.
[7] W. Halfond, A. Orso, and P. Manolios, “Using Positive Tainting and Syntax-aware Evaluation to Counter SQL Injection Attacks,” Proc. of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Portland, Oregon, November 2006, pp. 175-185.
[8] M. Ringenburg and D. Grossman, “Preventing Format-string Attacks via Automatic and Efficient Dynamic Checking,” Proc. of the 12th Conference on Computer and Communications Security, November 2005, Alexandria, pp. 354-363.
[9] G. Lucca, A. Fasolino, M. Mastoianni, and P. Tramontana, “Identifying Cross Site Scripting Vulnerabilities in Web Applications,” Proc. of the 6th Intl. Workshop on Web Site Evolution, Chicago, September 2004, pp. 71-80.


Conclusions

Program security vulnerabilities exist in the real world
Each mitigation technique is important and offers unique advantages
 The classification features indicate that each technique can address a subset of vulnerabilities
