PhD Dissertation
International Doctorate School in Information and Communication Technologies
DISI - University of Trento
Security Testing of Web and Smartphone Applications
Andrea Avancini
Advisor: PhD. Paolo Tonella, Fondazione Bruno Kessler - FBK
Co-Advisor: PhD. Mariano Ceccato, Fondazione Bruno Kessler - FBK
3 December 2013
Abstract
Web applications have become an integral part of everyday life. They are used by a huge number of customers on a regular basis for daily operations in business, leisure, government and academia, so their correctness is fundamental. Security is a crucial concern, especially because these applications are constantly exposed to potentially malicious environments. Cross-site scripting (XSS for short) is considered one of the major threats to the security of web applications: missing input validation can be exploited by attackers to inject malicious code into the application under attack. Static analysis supports manual security review in mitigating the impact of XSS-related issues by suggesting a set of potential problems, expressed in terms of candidate vulnerabilities. A security problem spotted by static analysis, however, only consists of a list of (possibly complicated) conditions that should be satisfied to concretely exploit a vulnerability. Static analysis does not provide examples of the input values that make the application follow the sometimes complex execution path that causes an XSS vulnerability. Executable test cases, on the contrary, are runnable and reproducible evidence of the vulnerability mechanics. Test cases are therefore a valuable support for developers, who need to understand security problems concretely and in detail before fixing them.

The urge for reliable and secure web applications motivates the development of automatic, inexpensive and effective security testing methods, whose aim is to verify the presence of security-related defects. Security tests consist of two major parts: the input values that need to be generated to run the application in the hope of exposing the vulnerabilities, and the decision whether the obtained output actually exposes the vulnerabilities. The latter is known as the “oracle”. However, current approaches both to generate security tests and to define security oracles have limitations.

To address the shortcomings of approaches for input value generation for security, this dissertation proposes a structured approach, inspired by software testing, based on the combination of genetic algorithms and concrete symbolic execution. This combined strategy is compared with genetic algorithms and with concrete symbolic execution used in isolation, in terms of coverage and efficiency on four case study web applications, and it proves to be effective for security testing. In fact, when not combined with other approaches, genetic algorithms were able to generate input values only for a few simple vulnerabilities. However, their contribution is fundamental to improve the coverage of the input values generated by concrete symbolic execution.

The dissertation also explores the possibility of defining oracle components that can be integrated with input generation strategies to perform security testing of web applications, so as to expose security-related faults. A security oracle can be seen as a classifier able to detect when a vulnerability is exploited by a test case, i.e. to verify whether a test case is an instance of a successful attack. This dissertation presents two distinct approaches to define security oracles: (1) applying tree kernel methods, and (2) resorting to a model of the application under analysis when run in harmless situations. In the former approach, the classifier is trained on a set of test cases containing both safe executions and successful attacks, with the aim of learning important structural properties of web pages. In the latter, the learning phase analyzes web pages generated only in safe conditions, in order to build a “safe” model of their syntactic structure. Then, in the actual testing phase, both oracles are used to classify new output pages either as “safe tests” or as “successful attacks”.

Furthermore, the dissertation takes a few steps into the world of smartphone applications, in an attempt to extend our research and bring the lessons learned in the domain of web applications to a new domain. Our motivation is that an important reason behind the popularity of smartphones and tablets is the huge number of applications available for download, which extend the functionality of the devices with brand new features. Official stores provide a plethora of applications developed by third parties, for entertainment and business, mostly for free. Again, security is a fundamental requirement: for example, confidential data (e.g., phone contacts, GPS position, banking data and emails) might be disclosed by vulnerable applications, so sensitive applications should be carefully tested to avoid security problems. The dissertation proposes a novel approach to perform security testing with respect to the communication among applications on mobile devices, with the objective of spotting errors in the routines that validate incoming messages.
Keywords [Security testing, test case generation, security oracle, smartphone applications]
Contents
1 Introduction
1.1 Input Value Generation for Security Testing
1.1.1 Genetic Algorithms for Input Value Generation
1.1.2 Concrete Symbolic Execution for Input Value Generation
1.2 The Security Oracle
1.2.1 Oracle Based on Tree Kernels
1.2.2 Oracle Based on Safe Model
1.3 Security Testing Smartphone Applications
1.4 Structure of the Thesis
2 Problem Definition
2.1 Cross-site Scripting Vulnerabilities
2.2 Running Example
2.3 Taint Analysis
2.4 The Input Generation Problem
2.5 The Oracle Problem
3 Input Value Generation for Security Testing
3.1 Input Value Generation with Genetic Algorithms
3.1.1 Chromosomes
3.1.2 Fitness Function
3.1.3 Selection
3.1.4 Crossover
3.1.5 Mutation Operators
3.1.6 Generation of Random Values
3.2 Input Value Generation with Concrete Symbolic Execution
3.2.1 Symbolic Values
3.2.2 Symbolic Conditions
3.2.3 Constraint Selection
3.2.4 Solver Integration
3.3 Tool Support
4 The Security Oracle
4.1 Motivating Example
4.2 The Security Oracle Based on Tree Kernels
4.2.1 Tree Kernels
4.2.2 Oracle Construction
4.2.3 Static Analysis and Coverage Criterion
4.2.4 Learning: Test Case Generation
4.2.5 Learning: Attack Generation
4.2.6 Learning: Manual Filtering
4.2.7 Learning and Classification
4.3 Security Oracle Based on the Safe Model
4.3.1 Test Case Perturbation
4.3.2 Parse Tree Abstraction
4.3.3 Abstraction Merge
4.3.4 Oracle Decision Procedure
5 Security Testing Smartphone Applications
5.1 Background
5.1.1 Android Design
5.1.2 Communication among Applications
5.1.3 Intent Filters
5.1.4 Taint Analysis
5.2 Threat Model
5.2.1 Attack Scenario
5.2.2 Testing Conditions
5.3 Test Case Generation
5.3.1 Adequacy Criterion
5.3.2 Filter Violation
5.3.3 Trace Analysis
5.3.4 Dynamic Taint Analysis
5.4 Tool Support
5.4.1 Static Analysis
5.4.2 Trace Analysis
5.4.3 Dynamic Taint Analysis
5.4.4 Test Case Generation
6 Experimental Results
6.1 Case Study Applications
6.1.1 Web Applications
6.1.2 Android Applications
6.2 Input Value Generation
6.2.1 Experimental Settings
6.2.2 RQ1: What Is the Role of Configuration Parameters in the Genetic Algorithm?
6.2.3 RQ2: How Effective Are the Different Strategies in Generating Security Test Cases?
6.2.4 RQ3: How Fast Are the Different Strategies in Generating Security Test Cases?
6.2.5 Discussion
6.3 Security Oracle Based on Tree Kernels
6.3.1 Experimental Settings
6.3.2 Results on a PHP Prototype
6.3.3 Empirical Assessment on a Real Application
6.4 Security Oracle Based on the Safe Model
6.4.1 Assessment Test Cases
6.4.2 Empirical Results
6.4.3 Discussion
6.5 Security Testing for Smartphone Applications
6.5.1 Testing Results
6.5.2 Results on AnkiDroid
6.5.3 Results on Jamendo
6.5.4 Results on OpenSudoku
7 Related Works
7.1 Security Test Case Generation
7.2 Security Oracle
7.3 Security Testing Smartphone Applications
8 Conclusion
8.1 Summary of the Research
8.2 Future Works
8.2.1 Security Input Value Generation
8.2.2 Security Oracles
8.2.3 Security Testing Smartphone Applications
Bibliography
List of Tables
2.1 Taint analysis result for the running example, vulnerabilities are $a@10 and $b@12.
3.1 Symbolic conditions collected during execution.
4.1 Rule for merging pattern attributes of tags.
5.1 Examples of execution traces for OpenSudoku.
6.1 Case study web applications.
6.2 Android case study applications.
6.3 ANOVA of Fitness by Vulnerability, Mutation probability and Crossover probability.
6.4 Sanity check of Fitness.
6.5 Comparison of fitness (Wilcoxon test).
6.6 Descriptive statistics of the time to generate test cases that cover vulnerabilities.
6.7 ANOVA of Time by Strategy.
6.8 Comparison of time (Mann-Whitney test).
6.9 ANOVA of Productivity by Strategy.
6.10 Comparison of Productivity (Mann-Whitney test).
6.11 Experimental results for mock-up application.
6.12 Test cases automatically generated for Yapig.
6.13 Experimental results for Yapig.
6.14 Empirical data on Yapig.
6.15 Empirical data on PhpPlanner.
6.16 Empirical results on the case studies.
List of Figures
2.1 Running example of a XSS vulnerability on PHP code.
2.2 Another example of XSS vulnerability on PHP code.
2.3 Control flow graph for the example in Figure 2.2.
3.1 Running example for input value generation.
3.2 Genetic algorithm for path sensitization.
3.3 Example of open-point crossover.
3.4 Two examples of mutation of parameter value.
3.5 Examples of mutation by new parameter insert.
3.6 Examples of mutation by parameter remove.
3.7 Example of retrieving the symbolic value of a program variable.
3.8 Two examples of retrieving the symbolic value of an expression.
3.9 Example of instrumentation to trace symbolic values.
3.10 Example of instrumentation to collect symbolic constraints.
4.1 Running example of a XSS vulnerability on PHP code.
4.2 Parse trees of output pages for the running example. Trees (a) and (b) represent safe executions. Tree (c) represents an injection attack.
4.3 Overview of the process to construct the security oracle.
4.4 Compact list pattern.
4.5 Compact list with same children pattern.
4.6 Abstraction transformation on the running example: (a) HTML output, (b) remove text, (c) compact list, (d) compact list with same children.
4.7 Merge equal tags rule.
4.8 Merge different tags rule.
4.9 Compute difference on child tags rule.
4.10 Example of combination of two parse tree reductions (a) and (b) into the safe model (c).
5.1 Example of Android manifest file.
5.2 Scenario of security threat.
5.3 Example of AspectJ code to adapt tainting.
5.4 Example of AspectJ code to adapt a sink.
5.5 Overview of the tool implementation.
6.1 Boxplot of Coverage.
6.2 Boxplot of Time (only covered cases).
6.3 Boxplot of Productivity (only covered cases).
“Space, the final frontier. These are the voyages of the starship Enterprise. Her five-year mission: to explore strange new worlds, to seek out new life and new civilizations, to boldly go where no one has gone before.” [James T. Kirk]
Chapter 1
Introduction

The Internet is expanding rapidly, offering dedicated services to more and more customers, especially from the commercial point of view. The continuous growth of the number of users shifts the focus from an Internet used for sharing knowledge to an Internet of (possibly commercial) services. The innovation rate of the techniques used to implement commercial Internet services is very high, but this often introduces vulnerabilities that affect, for example, the continuity of the services (e.g. a Denial of Service (DoS) attack against a web server) or the privacy of customer data (e.g. user data stolen from an attacked web site). These problems may generate a significant revenue loss for the organizations involved.

Carefully testing applications, however, is expensive and requires great effort. In fact, according to [Bei90], testing represents 30-90% of the labor cost, and evaluating results when tests are large and voluminous can easily lead to errors. Furthermore, web applications introduce additional challenges because of their specificity, the extremely short time-to-market, the distribution of the components, and the high accessibility (24 hours a day, 7 days a week) that must be granted to customers [Off02].

An established practice to identify vulnerabilities in the source code of web applications is manual code review [NS04], a task that is usually performed by experienced professionals who possess the required skills. Static analysis techniques [JKK06] support this task by identifying candidate vulnerabilities as starting points for expert review. In certain cases, however, such analysis returns false alarms. False alarms are due to an intrinsic limitation of static analysis which is, by definition, over-conservative. Static analysis can also produce false negatives, where concrete, existing vulnerabilities are not reported. This might occur when, for example, static analysis cannot cope with external components of the application. Moreover, static analysis produces only a list of candidate vulnerabilities, without generating real, executable test cases that would help programmers better understand how such vulnerabilities can be exploited.

Test cases would give developers a chance to understand and fix problems in a faster and more structured way, and they can also be re-executed in batch after major changes or maintenance. However, the manual definition of test cases may be a time-consuming task, so automation is highly desirable. The problem of finding test inputs for the application under analysis can then be modeled as a search problem, where the goal of the search is to find valid inputs that make the application fall into a vulnerability. The search space, however, is too wide to be explored exhaustively, or it involves constraints that are complex (or even undecidable) to solve. User input data are often combinations of symbols, letters and digits (like user ids and passwords) and so, under these conditions, generating test cases by exploring the entire search space is not feasible. Automated input value generation strategies are needed that can be concretely applied in production to test the security of web applications.

Furthermore, the generation strategy has to be coupled with a so-called security oracle [KGJE09], an automatic decision process to validate the generated test cases and to check whether real vulnerabilities are concretely exposed. The problem of defining a security oracle for web applications is open, and the approaches presented in the literature still suffer from some limitations. Ideally, a security oracle should be able to automatically decide whether a test case represents a safe execution or a successful attack. It should therefore be tied to a specific vulnerability class, but at the same time be independent of the input generation strategies and of the technologies used to implement the applications.

The dissertation contributes to this endeavor by presenting novel approaches to security testing, which can be summarized as:
• Automatic generation of test inputs for security with solutions based on genetic algorithms and concrete symbolic execution;
• Definition of a security oracle based on tree kernels;
• Definition of a security oracle based on the safe model;
• Automatic security testing of smartphone (Android) applications;
• Empirical assessments of the proposed techniques.
1.1 Input Value Generation for Security Testing
Our starting point for a contribution in the domain of web applications was the problem of how to automatically generate a set of security test cases. We focused our attention on one of the most common, damaging and widespread classes of vulnerabilities, cross-site scripting (XSS). XSS vulnerabilities [OWA] are caused by improper or missing validation of input data that come from external sources, like the user or a database. Such input data may contain pieces of malicious code that could be injected into the content of a response web page. When the user’s browser executes the injected code, sensitive data may be disclosed to the attackers or the user session may be hijacked. In this scenario, a suite of security test cases is required to properly test web applications and to show how potentially dangerous inputs can reach security-sensitive statements while avoiding input validation. An automatic input generation strategy is required to find those values that make the execution of the application traverse the particular sequence of statements that makes a vulnerability manifest. The search space of possible inputs, however, is too wide to be explored exhaustively, or it may involve constraints that are hard (or even undecidable) to solve. Under these conditions, generating security inputs by exploring the entire search space is not feasible. Suitable search strategies, like heuristics, are required to explore it more effectively.
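To make the idea of heuristic search over inputs more concrete, the following is a minimal sketch of one such heuristic, a genetic algorithm, written in Python. Everything in it is an assumption made for illustration: the toy target path in `approach_level`, the parameter names and value pool, the uniform crossover and the truncation selection are hypothetical choices, not the operators or the fitness function defined later in this dissertation.

```python
import random

# Hypothetical toy application: counts how many nested conditions on the
# request parameters are satisfied on the way to an unsanitized print
# statement (3 means the target path is fully traversed).
def approach_level(params):
    if params.get("action") != "view":
        return 0
    if not params.get("id", "").isdigit():
        return 1
    if "msg" not in params:
        return 2
    return 3

NAMES = ["action", "id", "msg", "page"]      # assumed parameter names
VALUES = ["view", "edit", "42", "abc", ""]   # assumed pool of candidate values

def random_individual(rng):
    # An individual is a set of parameter/value pairs (one HTTP request).
    return {n: rng.choice(VALUES) for n in NAMES if rng.random() < 0.5}

def crossover(rng, a, b):
    # Uniform crossover: each parameter is inherited from either parent.
    child = {}
    for n in NAMES:
        parent = a if rng.random() < 0.5 else b
        if n in parent:
            child[n] = parent[n]
    return child

def mutate(rng, individual):
    mutant = dict(individual)
    n = rng.choice(NAMES)
    if n in mutant and rng.random() < 0.3:
        del mutant[n]                    # remove a parameter
    else:
        mutant[n] = rng.choice(VALUES)   # insert a parameter or change its value
    return mutant

def search(seed=0, pop_size=20, generations=50):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=approach_level, reverse=True)
        if approach_level(population[0]) == 3:
            break
        parents = population[: pop_size // 2]   # truncation selection, best kept
        children = [
            mutate(rng, crossover(rng, rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=approach_level)
    return best, approach_level(best)
```

Calling `search()` returns the best request found together with its fitness. Like any heuristic, the search is not guaranteed to reach the maximum fitness within the given budget.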
1.1.1 Genetic Algorithms for Input Value Generation
Our first attempt to develop an effective technique for generating security tests for web applications was presented in [AC10], in which static taint analysis is combined with a heuristic, genetic algorithms [Mcm04], with the aim of improving testing coverage and exposing XSS vulnerabilities. In the proposed approach, static taint analysis is used to identify families of vulnerable paths (target paths) in the application control flow, by detecting when data coming from the external environment (i.e. tainted sources, such as inputs provided by the users) are not properly validated before being used in sensitive statements, like print statements. Then, a genetic engine is run to generate security test cases that make the execution flow traverse those target paths. Even if the generated test cases do not represent actual attacks, they can demonstrate how validation and sanitization statements may be bypassed by an attacker to inject malicious code. These test cases should represent the starting point for understanding and patching security issues. The approach was validated on a case study application, showing encouraging preliminary results that motivated further research on the problem.

1.1.2 Concrete Symbolic Execution for Input Value Generation
Because genetic algorithms are optimization heuristics, they are not guaranteed to converge to a solution: for example, the optimization could end in a local optimum, a point of the search space from which the algorithm is not able to improve further. Then, with the aim of defining more effective test case generation strategies, the genetic algorithm was coupled with a strategy based on a constraint solver, inspired by concrete symbolic execution, to possibly escape from local optima.

Concrete symbolic execution [SMA05] consists of tracing symbolic values of input variables while the program is executed concretely. This allows boolean expressions at decision points (e.g., in if statements) to be interpreted as conditions on symbolic inputs. A constraint solver is then used to reason on symbolic values and to compute new input values that satisfy conditions not satisfied by previous executions, so as to synthesize new test cases that cover parts of the application not yet covered by old tests.

Capitalizing on concrete execution presents two main advantages. First, it limits the propagation of symbolic conditions to the paths that are concretely executed, thus avoiding the exponential explosion of the symbolic state that affects purely symbolic execution. Second, it is still possible to resort to concrete values when conditions involving operators not supported by solvers need to be handled. Concrete values, for instance, can be used to simplify formulas by replacing a subset of the symbolic values with the corresponding concrete values available at run-time.

Concrete symbolic execution can then be applied in combination with genetic algorithms, to exploit the advantages of the two techniques while, at the same time, attempting to mitigate their limitations. This approach was proposed in [AC11], and later extensively assessed experimentally in [AC13c]. In the latter work, a wider experimentation was performed, comparing various input generation techniques such as pure random testing, genetic algorithms, concrete symbolic execution and their combinations.
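The loop of concrete execution, condition tracing and constraint solving described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: `Tracer`, `program`, `solve` and `explore` are hypothetical names, the only supported condition is string equality on an input parameter, and the hand-written `solve` function stands in for the real constraint solver that an actual implementation would call.

```python
# Each traced condition is (input name, constant, outcome of "input == constant").

class Tracer:
    """Runs comparisons concretely while recording them symbolically."""
    def __init__(self, inputs):
        self.inputs = inputs
        self.path = []

    def eq(self, name, const):
        outcome = self.inputs.get(name, "") == const
        self.path.append((name, const, outcome))
        return outcome

def program(t):
    # Hypothetical application under test: the sink is reached only when
    # both conditions hold.
    if t.eq("action", "view"):
        if t.eq("mode", "raw"):
            return "SINK"
    return "safe"

def solve(constraints):
    """Toy solver for equalities and disequalities on string inputs."""
    required, forbidden = {}, {}
    for name, const, must_hold in constraints:
        if must_hold:
            if required.setdefault(name, const) != const:
                return None          # two different required values
        else:
            forbidden.setdefault(name, set()).add(const)
    model = dict(required)
    for name, banned in forbidden.items():
        if name in model:
            if model[name] in banned:
                return None          # required value is also forbidden
        else:
            value = "x"
            while value in banned:   # pick any value outside the banned set
                value += "x"
            model[name] = value
    return model

def explore(program, seed_inputs, max_runs=20):
    worklist, seen_paths, results = [seed_inputs], set(), []
    while worklist and len(results) < max_runs:
        inputs = worklist.pop()
        tracer = Tracer(inputs)
        output = program(tracer)             # concrete execution
        key = tuple(tracer.path)
        if key in seen_paths:
            continue
        seen_paths.add(key)
        results.append((inputs, output))
        # Negate each condition in turn, keeping the prefix that led to it.
        for i, (name, const, outcome) in enumerate(tracer.path):
            flipped = tracer.path[:i] + [(name, const, not outcome)]
            new_inputs = solve(flipped)
            if new_inputs is not None:
                worklist.append(new_inputs)
    return results
```

Starting from an empty request, `explore(program, {})` executes the toy program, negates the conditions it met, and derives new inputs until no unexplored path remains or the run budget is exhausted.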
1.2 The Security Oracle
After having conducted experiments on input value generation for security testing, it was clear that a fundamental ingredient of the security testing process was still missing: an oracle able to determine whether a generated security test case triggers a safe execution or produces the manifestation of a concrete attack. A security oracle to judge whether the system under analysis passes security test cases is an overlooked problem. In the literature, it was initially defined as a manual task [TBM+05], while automatic processes have been proposed later. The oracle, however, has always been very specific to the approach adopted for generating test cases. Such oracles are closely tied to the corresponding generation algorithm, and they can hardly be used outside their test case generation context. For example, oracles proposed in the literature [KGJE09] often consist of checking whether a web page contains the same JavaScript fragment that was used to generate the corresponding test case. However, the construction of a security oracle is a problem in itself, and a reusable security oracle should not depend too much on how test cases have been generated, but just on the specific class of vulnerability to tackle. In this dissertation, the problem of constructing a security oracle for XSS vulnerabilities of web applications was addressed in two ways: by applying a binary classifier [CST00] to distinguish between safe and malicious executions, and by defining a safe model [AC12a] of harmless executions of the web page that is currently under analysis.

1.2.1 Oracle Based on Tree Kernels
A web page can be represented by the parse tree of the corresponding HTML code, so any code injection corresponds to a change in the structure of the parse tree with respect to the intended one, i.e. the structure of the page under normal conditions. In general, a security oracle should distinguish between those page variations that are safe, due to the dynamic behavior of the application under test, and those variations caused by code injection due to successful attacks. A classifier can be trained with instances of parse trees taken from both classes, attacks and safe executions. Under these assumptions, the security oracle problem can be formulated as a binary classification problem that can be addressed by relying on kernel methods. In particular, since parse trees are the structures to manage, the kernel methods that best fit this problem definition are tree kernels. We proposed an oracle implementation ([AC13d], [AC12b]) that relies on tree kernels to classify HTML parse trees either as safe executions or as successful attacks. Promising results were obtained, with optimal recall (at least for a subset of the kernel methods adopted) and fair precision.
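As a rough illustration of classifying pages by the structure of their parse trees, the Python sketch below computes a simple subtree kernel (the number of identical complete subtrees shared by two trees) and labels a new page with the class of its most similar training example. These choices are assumptions made to keep the example self-contained: the 1-nearest-neighbour rule is a stand-in for a trained kernel classifier, the training pages are invented, and the parser assumes well-formed HTML without void elements such as `<br>`.

```python
from collections import Counter
from html.parser import HTMLParser

class TreeBuilder(HTMLParser):
    """Builds a (tag, children) parse tree; text content is ignored."""
    def __init__(self):
        super().__init__()
        self.root = ("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

def _collect(node, bag):
    # Canonical string of the complete subtree rooted at this node.
    tag, children = node
    signature = "(" + tag + "".join(_collect(c, bag) for c in children) + ")"
    bag[signature] += 1
    return signature

def features(page):
    builder = TreeBuilder()
    builder.feed(page)
    builder.close()
    bag = Counter()
    _collect(builder.root, bag)
    return bag

def kernel(a, b):
    # Dot product of subtree counts: number of matching subtree pairs.
    return sum(count * b[sig] for sig, count in a.items())

def similarity(a, b):
    # Normalized kernel, in [0, 1].
    return kernel(a, b) / (kernel(a, a) * kernel(b, b)) ** 0.5

# Invented training set: (output page, label).
TRAINING = [
    ("<html><body><p>Hello Bob</p></body></html>", "safe"),
    ("<html><body><p>Hello Al</p><p>Bye</p></body></html>", "safe"),
    ("<html><body><p>Hello <script>alert(1)</script></p></body></html>", "attack"),
]

def classify(page, training=TRAINING):
    f = features(page)
    scored = [(similarity(f, features(example)), label) for example, label in training]
    return max(scored, key=lambda pair: pair[0])[1]
```

A page whose tree contains an injected `script` subtree shares more subtrees with the attack example than with the safe ones, so `classify` labels it as an attack, while a page with the usual structure is labelled as safe.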
1.2.2 Oracle Based on Safe Model
When attack examples are not available for properly training the binary classifier, different approaches must be taken into account. The solution proposed in [AC12a] and later in [AC13b] is based on the same assumption that a successful code injection attack, i.e. the malicious content that has been injected in the page, can be recognized as a structural change in the web page. However, the approach builds a safe model of the structural properties that characterize the web pages generated by an application when it runs only under harmless conditions. This model is obtained by merging together, according to several merging rules, a set of parse trees that represent the output pages generated by the application under test when running with safe inputs, i.e. inputs with no malicious code injected. Before being merged, the parse trees are also abstracted, both to model those peculiarities of a page that are common across different executions and to remove those syntactic elements that are too tied to a specific test case.

When the model is available, it is used to assess whether a new test is a successful attack. The parse tree obtained by executing the new test is checked against the safe model. If the parse tree matches the safe model, the oracle cannot detect any structural difference between the new execution and the safe executions, so the test is classified as pass, meaning that no attack has taken place. A difference between the test and the model, instead, represents a structural change with respect to safe executions. In this case, the oracle detects a potential code injection and classifies the test as non-pass, i.e. an attack. For the empirical validation, the precision and recall scored by the oracle were measured, in order to check its performance in classifying attacks and safe tests for two case study applications written in PHP.
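The pass/non-pass decision described above can be sketched as follows in Python. The sketch is deliberately simplified and rests on assumptions: the abstraction only drops text and compacts runs of identical sibling subtrees, and the merge step is replaced by a plain set of abstracted trees, which is much cruder than the merging rules of the actual approach. The names `abstract`, `build_safe_model` and `oracle` are hypothetical, and the parser assumes well-formed HTML without void elements.

```python
from html.parser import HTMLParser

class TreeBuilder(HTMLParser):
    """Builds a (tag, children) parse tree; text content is dropped."""
    def __init__(self):
        super().__init__()
        self.root = ("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

def _shape(node):
    # Abstract a subtree: keep tags only and compact runs of identical
    # siblings, so that lists of different lengths get the same shape.
    tag, children = node
    shapes = []
    for child in children:
        shape = _shape(child)
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return (tag, tuple(shapes))

def abstract(page):
    builder = TreeBuilder()
    builder.feed(page)
    builder.close()
    return _shape(builder.root)

def build_safe_model(safe_pages):
    # Simplification: the "model" is the set of abstracted safe trees.
    return {abstract(page) for page in safe_pages}

def oracle(model, page):
    # pass: no structural difference from safe executions; non-pass otherwise.
    return "pass" if abstract(page) in model else "non-pass"
```

With a model built from pages that list one and three items, a page listing two items still passes, because the abstraction compacts the repeated `li` elements. A page where one item contains an injected `script` element produces a different shape and is reported as non-pass.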
1.3 Security Testing Smartphone Applications
As illustrated by market studies [Kah10], tablets and smartphones meet bigger and bigger success, replacing desktops and laptops for many daily activities. Android is recognized as the world’s leading smartphone platform (59% of the market share in 2012 [Sig12]) with 400 millions of Android devices activated [Bar12] at the beginning of the third quarter of 2012, with a rate of 1 million of new activations per day [Bur12]. The increasing success of smartphones and tablets embroiled software producers in a race to be the first to publish newer and content-rich applications. Being the first application available in official application stores guarantees higher visibility, bigger market share and, eventually, may impact the life-and-death struggle among companies. Given the fast time-to-market model of mobile applications, most of the effort is devoted to develop new features and just short time is spent on the assessment of code quality. While crashes and faults are detrimental for the user experience, subtle bugs that involve security features are dangerous for the security and the confidentiality of user data. In fact, malicious applications started to spread, attempting to steal sensitive information and to threat the user privacy [Jun12] (e.g. passwords, GPS locations, contacts, browser history) or trying to directly attack applications that involve paying services [She11] or that contain private data [EGDD11]. The underlying operating system faces the challenge to keep the system secure and so a very restrictive isolation is imposed among applications. When accessing sensitive data sources, policies are accurately checked and enforced.
Despite the mechanisms deployed to enforce security, poorly designed applications may still contain defects that expose vulnerabilities. Inter-application messages are a potential source of attacks on vulnerable code. An aggressor application, in fact, may rely on faulty validation of input data to take control of benign applications. Software testing represents a valuable strategy to identify program faults, but security testing that addresses the specifics of Android applications is still preliminary [MEK+12]. Although some tools are available to test the graphical user interface of mobile applications [HN11, AFT+12, AFT11], approaches to identify security problems in mobile applications are less structured [MS12, EGgC+10, FWM+11]. In fact, they only produce reports on candidate vulnerabilities and/or descriptions of errors. Even though these contributions move in the direction of security review, they are still too far from the peculiar development model of mobile applications. Security review should be integrated in the development environment, and security faults should be easy to detect and fast to replicate. Test cases represent a way to resolve the tension between high security and fast development. To achieve this objective, we elaborated a novel approach [AC13a] to test Android applications with respect to inter-application communication. Routines devoted to validating input values coming from inter-application communication messages are intensively tested. In particular, we defined (i) an adequacy criterion for testing data validation routines, (ii) a test case generation strategy and (iii) an oracle to validate test cases. The technique has been shown to be able to discover three previously unknown security-related bugs, which have been confirmed by the developers of the subject applications.
1.4 Structure of the Thesis
The dissertation is structured as follows.
Problem Definition
Chapter 2 presents the problem of automatically generating security tests for web applications, in terms of issues and challenges. Cross-site scripting vulnerabilities are presented as one of the most relevant classes of security flaws in the domain of web applications. The limitations of static analysis, one of the most widely used solutions for detecting vulnerabilities in web applications, must be overcome in order to offer developers effective approaches.

Input Value Generation for Security Testing
Chapter 3 describes two strategies for generating security inputs for web applications. Approaches based on genetic algorithms and concrete symbolic execution are presented in detail in terms of their design and implementation.

The Security Oracle
Chapter 4 presents the challenges of automatically determining whether a security test case for a web application passes or not. The dissertation proposes two solutions for implementing a security oracle able to detect code injection. The first solution provides a binary classifier built on top of a set of kernel methods, while the second is based on the concept of safe model.

Security Testing Smartphone Applications
Chapter 5 introduces the topic of security testing for Android applications. The field was chosen to broaden the boundaries of our research and for its affinities with the domain of web applications. The chapter describes the problem of generating and validating security tests for Android applications, defines a threat model and proposes a possible solution, which has proven promising in terms of the results obtained.

Experimental Results
Chapter 6 presents empirical evaluations of security testing for web and smartphone applications. Input generation strategies are compared, both individually and in combination, by measuring their effectiveness in producing security input values. Oracle strategies are also assessed by looking at their ability to distinguish between those test cases that produce safe executions and those that are real, concrete attacks, able to compromise security. The solution proposed for security testing of smartphone applications is also validated on three publicly available Android applications.

Related Works
Chapter 7 describes the relevant literature on test case generation and on security oracles. A survey of the relevant works in the area of testing smartphone applications closes the chapter.

Conclusion
Chapter 8 summarizes the contributions of this dissertation and describes potential future research directions.
Chapter 2
Problem Definition
2.1 Cross-site Scripting Vulnerabilities
Cross-site scripting vulnerabilities [OWA] (XSS hereafter) are among the most common and damaging threats in the domain of web applications. Even though they are well known, XSS vulnerabilities were found in major sites in 2013 ([Kug13], [Gol13]). Three different types of XSS vulnerabilities exist in practice: reflected, stored and DOM-based. In this dissertation we focus on the most widespread, the reflected variant. Reflected XSS vulnerabilities are caused by improper or missing validation of the input data, which may contain pieces of malicious code that could be injected into the content of a response web page. When the user’s browser executes the injected code, sensitive data may be disclosed to the attacker, or the user session may be hijacked.
The mechanics behind the attack are the following. An attacker crafts a special URL that includes malformed inputs (strings containing scripts) for a web application that contains an XSS vulnerability. Then, the attacker sends the URL to a victim, for example in the form of a link in an email, and induces the victim to click on that malicious link. The victim, unaware of the potential risks, clicks the link, passing the malicious inputs to the vulnerable server. The server echoes the inputs back to the browser of the victim, which executes the scripts embedded in the response received from the vulnerable server, causing a security fault, for example by sending the user’s cookie to the attacker. The cookie can then be used by the attacker to hijack the user session.
In stored XSS vulnerabilities, the malicious code is injected on the server in a persistent way. The malicious code is executed when a user requests a previously attacked resource, such as a database record, a message forum or a comment field. Unlike reflected and stored XSS attacks, DOM-based attacks are executed by modifying the document object model in the browser of the designated victim, for example by forcing the client-side code to execute in an unexpected manner.
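The reflected XSS mechanics described above can be sketched in a few lines. The sketch is written in Python rather than PHP and is not taken from the dissertation: the page functions, the parameter name, the host name and the payload are all hypothetical, and HTML escaping is shown only as one common way to neutralize the echoed input.

```python
import html
from urllib.parse import urlparse, parse_qs


def read_username(url):
    """Extract the (attacker-controlled) 'username' query parameter."""
    params = parse_qs(urlparse(url).query)
    return params.get("username", [""])[0]


def vulnerable_page(url):
    """Echo the input back with no validation: the reflected XSS case."""
    return "<p>Welcome, " + read_username(url) + "</p>"


def escaped_page(url):
    """Same page, but the input is HTML-escaped before being echoed."""
    return "<p>Welcome, " + html.escape(read_username(url)) + "</p>"


# URL crafted by the attacker and sent to the victim, e.g. as an email link.
# The payload is the URL-encoded form of a script element.
crafted_url = (
    "http://vulnerable.example/welcome"
    "?username=%3Cscript%3Esteal(document.cookie)%3C%2Fscript%3E"
)

print(vulnerable_page(crafted_url))  # response contains a live script element
print(escaped_page(crafted_url))     # response contains only inert text
```

In the first response the browser would parse the echoed string as a script element and run it in the context of the vulnerable site; in the second the same characters arrive as entities and are rendered as text.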
2.2 Running Example
Generally speaking, XSS vulnerabilities are caused by the improper validation of input data (e.g., data coming from the external world). In the case of reflected XSS, input data may contain HTML fragments that flow into a web page and alter the resulting content, so that malicious code is injected. When executed by the user’s browser, the injected code may disclose sensitive data to third parties. While the dissertation focuses on web applications written mainly in PHP, this class of vulnerabilities is one of the most widespread and affects applications written in any programming language.
$user = $_GET["username"];
$pass = $_GET["password"];
$pass2 = $_GET["password2"];
if (strpos($user, "