2014 Eighth International Conference on Software Security and Reliability
Security Test Generation by Answer Set Programming
Philipp Zech, Student Member, IEEE, Michael Felderer, Basel Katt, and Ruth Breu
Institute of Computer Science, University of Innsbruck, Tyrol, Austria
Email: {philipp.zech,michael.felderer,basel.katt,ruth.breu}@uibk.ac.at

Most of the security flaws in current applications are due to already well-known implementation bugs. A search for the year 2012 in the Open Source Vulnerability Database (OSVDB) [28] shows a total of 9762 newly reported vulnerabilities. Although this number is critical already, the vulnerabilities constituting it are even more serious. More than half of these are cross-site scripting (2303; 23.59%), SQL injection (908; 9.3%), denial-of-service (879; 9.0%), code execution (663; 6.79%) and overflow errors (632; 6.47%). The problem with this data is that we have basically known about these vulnerabilities for more than a decade, yet we still see modern systems break down due to poorly and improperly performed security testing. A common reason for this is, most often, a lack of expert-level security knowledge when security testing is done.
Abstract—Security testing is still a hard task, especially when focusing on non-functional security testing. The two main reasons for this are, first, most often a lack of the knowledge necessary for security testing and, second, the need to manage the almost infinite number of negative test cases that result from potential security risks. To the best of our knowledge, the automatic incorporation of security expert knowledge, e.g., known vulnerabilities, exploits and attacks, into the process of security testing is not well considered in the literature. Furthermore, well-known "de facto" security testing approaches, like fuzzing or penetration testing, lack systematic procedures regarding the order of execution of test cases, which renders security testing a cumbersome task. Hence, in this paper we propose a new method for generating negative security tests by logic programming, which applies a risk analysis to establish a set of negative requirements for later test generation.
To fulfill this need, we designed and implemented a novel method for non-functional security testing. Our method allows generating efficient negative security test cases. Based on formalized security knowledge and a declarative system model of a SUT, we perform a risk analysis by answer set programming (ASP) [12], [22]. ASP is a form of logic programming founded on the stable model semantics [14] for not necessarily stratifiable logic programs. Using ASP for our risk analysis is motivated by the fact that ASP programs, in principle, always terminate [22]. The application of our risk analysis results in a risk profile for the SUT. We next use this risk profile to generate executable negative security test cases in Scala, more specifically, scalatest [33]. We use scalatest due to its behavior-driven development style, which makes it useful not only for unit testing but also for higher test levels, i.e., integration, system and acceptance testing, as well as for security testing. The test cases are then ready to be executed against the SUT. We demonstrate the feasibility and efficiency of our proposed method in a case study using a web application written in PHP.
Keywords—Security Testing, Test Generation, Software Testing, Security Engineering, Logic Programming, Knowledge Representation, Answer Set Programming
I. INTRODUCTION
In security testing one can distinguish between functional, i.e., positive, and non-functional, i.e., negative, security testing [31]. Whereas functional security testing aims at verifying positive, i.e., stakeholder-defined, requirements, non-functional security testing aims at revealing hidden security bugs. Ideally, non-functional security testing thereby relies on potential security risks, i.e., negative requirements, for the system under test (SUT). Whereas well-established techniques exist for functional security testing [25], in doing non-functional security testing two problematic issues arise, viz.
1) how to integrate the necessary expert-level security knowledge efficiently into a testing process to establish valid test cases, and
2) how to manage the almost infinite amount of negative security test cases.
One of the key features of our method is the successful integration of expert knowledge and the automation of the risk analysis process. The application of logic programming, and thus knowledge representation, both makes security testing feasible for laypersons and, at the same time, generates efficient negative security test cases by searching the space of negative test cases in an automated manner.
The application of logic programming and knowledge representation is promising for bundling the work of security experts. Moreover, to address the problem of how to manage the almost infinite amount of negative test cases, non-functional security testing is optimally combined with risks [2], [13], [34]. Ideally, these risks stem from a risk analysis. Such a risk analysis can then also be implemented by logic programming, i.e., by using rules to apply and match the previously formalized knowledge onto a system model to derive, and later generate, negative security test cases.

978-1-4799-4296-1/14 $31.00 © 2014 IEEE DOI 10.1109/SERE.2014.22
A. Contributions
This paper contributes as follows:
C1
A novel method to generate negative security test cases by logic programming, which successfully decreases the level of expertise necessary to perform successful non-functional security testing. At the same time, our method increases the level of automation for non-functional security testing by fully automated test case derivation and generation based on logic programming and knowledge representation. The application of knowledge representation immediately yields an extensible, deductive database for a risk analysis, e.g., a vulnerability knowledge base. As a partial result, this also yields a logic-based security language for formalizing vulnerability knowledge and risks.
In contrast to SLDNF resolution, the stable model semantics can deal with non-stratifiable programs. The underlying intuition is to treat negated atoms, which are a source of contradiction or instability, in a special way [12]. By applying the Gelfond–Lifschitz (GL) reduct [14] to a normal, grounded program Π for any set of atoms M, we retrieve another normal, yet negation-free, grounded program Π^M by the following two steps:
C2 A risk analysis to derive a risk profile, based on ASP and thus capable of dealing with incomplete knowledge.
C3 Automatic generation of executable negative security test cases in Scala from a risk profile. We also present a case study showing the feasibility of our proposed method.
B. Paper Organization
The remainder of our paper is structured as follows. Section II provides the theoretical foundations of our approach. Section III then introduces our method for generating security test cases by answer set programming (ASP). Section IV presents initial results for our test generation method, discusses related work and positions our method with respect to it. Finally, Section V concludes and presents future work.
For example, consider program (2) which, after grounding, results in the following program Π:

p(1),
q(1) ← p(1),
q(1) ← ¬r(1),
r(1) ← q(1).

(3)
II. FOUNDATIONS
In this section, we introduce the necessary theoretical and practical foundations of our proposed method. We first discuss the stable model semantics [14], the underlying logic of ASP, followed by the necessary extensions that lead to ASP. Finally, we sketch Disjunctive Datalog [11] and the DLV system [21], a language and a tool for ASP.
p(1),
q(1) ← p(1),
q(1),
r(1) ← q(1).

(4)
The stable model semantics, as introduced by Gelfond and Lifschitz [14], is a declarative semantics for normal logic programs with negation-as-failure (i.e., weak negation). Programs are sets of rules of the form
During inference, the stable model semantics allows two modes, viz. brave and cautious reasoning. An atom a is a brave consequence of Π, i.e., Π |=_b a, iff a is true in at least one stable model of Π. Under cautious reasoning, on the other hand, a is a cautious consequence of Π, i.e., Π |=_c a, iff a is true in all stable models of Π. Both |=_b and |=_c are non-monotonic, as a change to the set of rules may invalidate any prior conclusion [12].
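To make the two reasoning modes concrete, the following minimal Python sketch checks brave and cautious consequences over a hand-coded set of stable models; the two example models are illustrative and not taken from the paper.

```python
# Hypothetical sketch of brave vs. cautious reasoning, assuming the set
# of stable models of some program is already known.
stable_models = [{"p", "q"}, {"p", "r"}]

def brave(atom):
    """Brave consequence: true in at least one stable model."""
    return any(atom in m for m in stable_models)

def cautious(atom):
    """Cautious consequence: true in every stable model."""
    return all(atom in m for m in stable_models)

print(brave("q"), cautious("q"))  # True False
print(brave("p"), cautious("p"))  # True True
```

Adding a rule can add or remove stable models, so both relations are non-monotonic, as noted above.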
A rule with an empty body, i.e., consisting only of its head, is called a fact. Using classic SLDNF resolution [20], program (2) cannot be solved, as it is not stratified, i.e., there exists no consistent assignment S of natural numbers to predicates satisfying
The minimal Herbrand model of Π^M is {p(1), q(1), r(1)}, which clearly does not coincide with M; hence, M is not stable. On the other hand, assuming M = {p(1), q(1), r(1)} results in the same minimal Herbrand model, {p(1), q(1), r(1)}, yet this time it coincides with M; hence M is a stable model.
where A is an atom and L1 , . . . , Lm are literals (atoms or negated atoms), e.g.,

p(1),
q(x) ← p(x),
q(x) ← ¬r(x),
r(x) ← q(x).

(2)
Assuming M = {p(1)} results, after applying the GL reduct, in the program Π^M
A. Stable Model Semantics
A ← L1 , . . . , Lm    (1)
1) remove all rules from Π which have a negative literal ¬B in their body with B ∈ M, and
2) remove all negative literals from the bodies of the remaining rules.
The answer of the resulting negation-free program Π^M can then be retrieved by calculating its unique minimal Herbrand model, i.e., H_m^Π ⊂ H_B. Here, H_m denotes the minimal Herbrand model, a consistent subset (i.e., containing no refutable goal) of the Herbrand base H_B. The Herbrand base H_B contains the set of all possible ground goals which can be represented by some logic program (e.g., Π^M). If the resulting model coincides with M, then M is called a stable model of Π.
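The GL reduct and the stable-model check can be sketched in a few lines of Python; this is an illustrative re-implementation for ground normal programs, not the paper's tooling, with rules encoded as (head, positive body, negative body) triples. Running it on the ground program (3) reproduces the discussion above.

```python
# Hypothetical minimal sketch of the Gelfond-Lifschitz reduct and a
# stable-model check for ground normal programs.

def gl_reduct(rules, m):
    """Drop rules whose negative body intersects m; strip remaining negation."""
    return [(h, pos) for (h, pos, neg) in rules if not (set(neg) & m)]

def minimal_model(positive_rules):
    """Least Herbrand model of a negation-free ground program (fixpoint)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if set(pos) <= model and head not in model:
                model.add(head)
                changed = True
    return model

def is_stable(rules, m):
    return minimal_model(gl_reduct(rules, m)) == set(m)

# Ground program (3): p(1).  q(1) <- p(1).  q(1) <- not r(1).  r(1) <- q(1).
prog = [
    ("p(1)", [], []),
    ("q(1)", ["p(1)"], []),
    ("q(1)", [], ["r(1)"]),
    ("r(1)", ["q(1)"], []),
]

print(is_stable(prog, {"p(1)"}))                  # False: M = {p(1)} is not stable
print(is_stable(prog, {"p(1)", "q(1)", "r(1)"}))  # True: this M is stable
```

The first check fails because the reduct for M = {p(1)} keeps the rule q(1) ← ¬r(1) as the fact q(1), so the minimal model grows beyond M.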
• S(P) > S(Q) iff P is the head of a rule and is derived from some negated predicate Q.
B. Answer Set Programming
ASP is a form of declarative programming, based on the stable model semantics, which makes three extensions to normal logic programs. Before focusing on ASP, we briefly discuss these extensions.
• S(P) ≥ S(Q) iff P is the head of a rule and is derived from a predicate Q (i.e., Q occurs in the body of the rule), or
a) Integrity Constraints: Constraints are headless rules, i.e., rules consisting only of a body. The idea is to define a so-called kill-clause, which eliminates all models that satisfy the constraint. Put another way, constraints reduce the potential set of stable models by killing invalid candidate models resulting from solving a program Π.
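The pruning effect of a constraint such as ← a, b can be illustrated by filtering candidate models in Python; the enumeration below is a toy emulation, not how an ASP solver works internally.

```python
# Hypothetical illustration of an integrity constraint ":- a, b." as a
# kill-clause: any candidate model satisfying the constraint body is killed.
from itertools import combinations

atoms = ["a", "b"]
candidates = [set(c) for r in range(len(atoms) + 1)
              for c in combinations(atoms, r)]

def violates(model):
    # Body of the constraint ":- a, b.": both atoms hold simultaneously.
    return {"a", "b"} <= model

surviving = [m for m in candidates if not violates(m)]
print(surviving)  # [set(), {'a'}, {'b'}]
```

Only the candidate containing both a and b is eliminated; the remaining candidates stay eligible as stable models.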
Consider the following grounded, normal logic program Π_s:

module(mymodule),
operation(mymodule, myoperation1, [a, b], int),
parameter(myoperation1, a, int),
parameter(myoperation1, b, int),
operation(mymodule, myoperation2, [x], bool),
parameter(myoperation2, x, char).

(5)
b) Strong Negation: Strong (or true) negation is an extension to normal logic programs. Contrary to negation-as-failure, which indicates something that cannot be proved (derived) and thus is assumed to be false, strong (or true) negation allows proving that something is false. Knowing that something is false (strong negation) is clearly different from just assuming it to be false (negation-as-failure).
Beyond using only facts, what is more interesting about program Π_s is the knowledge it represents. Its facts, e.g., module/1 or operation/4, are declarative counterparts of well-known UML model elements (e.g., Class or Operation). Due to a model's set-based nature, this type of declarative modeling is easily enabled by just defining the relevant predicates (e.g., module/1) and the necessary constants (e.g., mymodule, and the like), which constitute the knowledge of the real world to be formalized, the domain of discourse. Program Π_s thus describes a small software system with one module, mymodule, and two operations, viz. myoperation1 and myoperation2. Each of the operations has associated input parameters, viz. a and b for myoperation1 and x for myoperation2, and return types, e.g., int for myoperation1. Further, each parameter also has a type, e.g., int for a of operation myoperation1. Such a dedicated predicate for declaring a parameter is favorable, as it "transforms" a parameter from a plain constant into something we can reason about.
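A declarative system model of this kind is just a finite set of facts; the following Python sketch (an illustrative encoding, not part of our tool chain) stores the facts of program (5) as tuples and shows a simple query over them.

```python
# Hypothetical encoding of the declarative system model from program (5)
# as a finite set of fact tuples, plus a small query over the domain.
facts = {
    ("module", "mymodule"),
    ("operation", "mymodule", "myoperation1", ("a", "b"), "int"),
    ("parameter", "myoperation1", "a", "int"),
    ("parameter", "myoperation1", "b", "int"),
    ("operation", "mymodule", "myoperation2", ("x",), "bool"),
    ("parameter", "myoperation2", "x", "char"),
}

def param_types(op):
    """Collect the declared parameter types of an operation."""
    return sorted(f[3] for f in facts if f[0] == "parameter" and f[1] == op)

print(param_types("myoperation1"))  # ['int', 'int']
print(param_types("myoperation2"))  # ['char']
```

Because parameters are declared by a dedicated predicate, a query like param_types can reason about them rather than treating them as opaque symbols.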
c) Disjunction: Disjunction is another extension to normal logic programs, allowing disjunctions in rule heads. The resulting, extended logic programs are thus able to deal with indefinite knowledge. The usage of disjunctive information enables the notion of a guess in the resulting extended logic programs, thus introducing non-determinism. In the case of disjunctive rules, the semantics is that one of the alternatives is concluded to be true (minimality principle) [12].

One of the main advantages of ASP is its search algorithm. Classic SLDNF resolution (as, e.g., used by Prolog) always selects the leftmost goal and thus can enter an endless loop, depending on the order of rules. ASP, on the other hand, uses enhancements of the Davis–Putnam–Logemann–Loveland (DPLL) algorithm [8], [9], which operates after binding the variables that range over the domain of discourse (i.e., the Herbrand universe HU of a program Π). Contrary to SLDNF resolution, such algorithms generally terminate regardless of the order of rules [22]. Such a termination property is favorable when focusing on automating a process, i.e., a risk analysis for later security test generation.
In the Herbrand theory, Π_s describes a subset of the Herbrand base H_B, i.e., Π_s ⊂ H_B, which is computed on the basis of Π_s's Herbrand universe H_U. In the stable model semantics, a minimal Herbrand model, i.e., H_m^Π ⊂ H_B, is computed as a solution for any normal logic program Π (see Section II). As this computation is done on the Herbrand base H_B, our intuition of a declarative system model fits naturally into the notion of the stable model semantics, and thus ASP.
In the remainder of this paper, we use the DLV system [21], a deductive database system based on disjunctive logic programming, for implementing our method by ASP. We decided to use DLV as it is free for academic use, works in stand-alone mode, and is under active development (and thus stable). Disjunctive Datalog is the language used by DLV for representing logic programs. Contrary to the syntax of normal logic programs, programs written in Disjunctive Datalog, i.e., disjunctive logic programs, support the extensions to normal logic programs necessary for ASP.
Due to its obvious foundations, we skip any further discussion of our intuition of declarative system modeling and move on by introducing our method in the next section. There, we also give further examples of declarative system models.
III. SECURITY TEST GENERATION BY ANSWER SET PROGRAMMING
This section is devoted to presenting our proposed security test generation method. After introducing the necessary terminology, we first sketch the main ideas and then continue with an in-depth discussion of our method.
C. Declarative System Modeling
Generally, a software model is considered a formal abstraction of some real-world entity or process. In this sense, a model can be defined as a finite, enumerable set of facts with different properties, which formalize some real-world knowledge (e.g., on some entity or process).
Terminology: In the following, we refer to a declarative system model as the security problem. The stable model resulting from our risk analysis by ASP is henceforth called the risk profile. The risk profile is a set of risks, where a risk is defined as an entity which encapsulates information regarding a certain threat to the SUT. For a generated negative security test case in scalatest, we use the term (negative) test case in the following (where negative may be omitted).
As introduced in Section II, logic programs consist of sets of rules in the form of (1). Supporting both rules and facts (i.e., rules with empty bodies), the semantics of logic programs naturally supports both dynamic and static modeling of software (i.e., process and entity modeling). Put another way, the semantics of logic programs allows defining the domain of discourse, i.e., describing a software model with a declarative syntax.
Overview: Figure 1 shows a schematic overview of our method. To generate negative security test cases, our method
accepts a security problem as input and works in two steps as specified below.

[Fig. 1. Proposed Security Test Generation Method: the Security Problem, together with the Extensional Database and the Intensional Database, is processed by the Solver into the Risk Profile, from which the Test Case Generator derives the Test Cases.]

Listings 1 and 2 show sample fragments of vulnerable PHP and C code.

1 if($_GET["act"] == "get_db_entry") {
2   mysql_connect($DBHOST, $DBUSER, $DBPASS);
3   mysql_select_db($DBNAME);
4   $query = "SELECT * FROM `test` WHERE `id` = '" . $_GET["var"] . "';";
5   $result = mysql_query($query);
6   //...
7 }

Listing 1. Fragments of Vulnerable PHP Code

The PHP fragment from Listing 1 is vulnerable to an SQL injection attack. This is possible due to missing input validation of $var. Depending on what get_db_entry does, malicious input (e.g., 1' OR '1'='1 for attempting an SQL injection) could lead to serious damage.

1 void process(char* data) {
2   char store[10];
3   strcpy(store, data);
4   //...
5 }

Listing 2. Fragments of Vulnerable C Code

The C code fragment from Listing 2 shows a classic vulnerability: a call to strcpy without checking the actual size of data. This introduces a buffer overflow vulnerability, which can result in serious damage if exploited. Although many modern, managed languages avoid such vulnerabilities, there still exist scenarios where buffer overflows are possible, e.g., when performing system calls.
S1 First, a risk analysis computes a risk profile from the security problem. This risk analysis is based on ASP and requires three components, viz.:
1) The extensional database (EDB) stores formalized security knowledge for the risk analysis by normal logic programs, as introduced in Section II-A.
2) The intensional database (IDB) contains extended logic programs, as introduced in Section II-B, which describe matching rules. These matching rules are used to identify potential security flaws in the security problem.
3) An (ASP) solver finally computes the risk profile by matching the rules of the IDB against the security problem using the knowledge of the EDB. This results in the risk profile, which describes potential security risks to the SUT.
These concepts are discussed in Sections III-A and III-B.
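The interplay of security problem, EDB, and IDB in step S1 can be sketched as follows; this Python fragment is a heavily simplified, illustrative emulation of the matching performed by the solver, with all facts hand-encoded as tuples.

```python
# Hypothetical sketch of step S1: an IDB-style rule matches EDB exploit
# knowledge against the operations of the security problem (cf. program (7)).
# All names and the matching criterion are illustrative.

# Security problem (declarative system model)
operations = [("dbutils", "get_db_entry", ["var"], "any")]
parameters = {("get_db_entry", "var"): "text"}

# EDB: types an exploit commonly employs, cf. vul_type/2
vul_types = {"sql_injection": {"text", "string"}}

def risk_analysis():
    """Blacklist operations whose parameter types match an exploit's types."""
    profile = []
    for module, op, params, _ret in operations:
        for p in params:
            ptype = parameters[(op, p)]
            for exploit, types in vul_types.items():
                if ptype in types:
                    profile.append((module, op, exploit, p))
    return profile

print(risk_analysis())  # [('dbutils', 'get_db_entry', 'sql_injection', 'var')]
```

In the real method, this join of security problem and EDB knowledge is expressed by IDB rules and computed by the ASP solver rather than by explicit loops.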
A. Security Problem
The security problem is a reduced, normal logic program Π. It is a set of grounded rules of the form

A ← .
(6)
where A is an atom and ← . denotes an empty body. An atom consists of a predicate symbol applied to the appropriate number of ground terms, i.e., terms containing no free variables. Such rules are called facts in the following; for brevity, ← is omitted. The security problem comprises a declarative model of a SUT subjected to non-functional security testing, hence its name: it describes a security problem by a declarative system model. Put another way, the modeled SUT may contain security flaws and thus represents a security problem. We define four predicates for describing a system model of a SUT using declarative semantics, viz.
S2 As a second step, our method uses the generated risk profile to generate executable, negative security test cases. This is done by the test case generator, which implements a model-to-text transformation by parsing the risk profile and generating executable test code in the Scala programming language. The test case generator is discussed in Section III-C.
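Step S2 is essentially a model-to-text transformation; the following Python sketch renders a risk entry into a scalatest-style test case. The template, and the invoke/R harness names it assumes from Listing 3, are illustrative only.

```python
# Hypothetical sketch of the model-to-text step (S2): render one risk
# entry into a scalatest-style test case string. Template is illustrative.
def generate_test(name, module, operation, payload):
    lines = [
        f'test("{name}") {{',
        f'  val result = invoke("{module}", "{operation}", List("{payload}"), R)',
        '  assert(result != null)',
        '}',
    ]
    return "\n".join(lines)

code = generate_test("TC1", "dbutils", "get_db_entry", "' OR '1' = '1")
print(code)
```

In our tool, the same idea is implemented by first building ASTs of the test cases from the risk profile and then serializing them to Scala source code.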
We designed our method to detect hidden security vulnerabilities in software which are not covered by classic, positive security testing techniques.
• module/2 is a predicate, which takes two terms: first, the module's name and, second, a flag indicating whether the SUT is local or remote (by method calling). It declares a software component.
• uri/2 is a predicate to declare a remote module's URI. The first term is the module's name and the second some URI enclosed in double quotes. For a local module, this predicate can be omitted.
• operation/4 is a predicate, which takes as terms the name of the containing module, the name of the operation, a list of parameter names, and the return type. It declares module-owned, user-accessible (controllable) operations.
goals, its attack pattern, and the like. The following provides a list of the predicates we use for formalizing security-related knowledge:
• parameter/3 is a predicate, which takes as a first term the name of the operation declaring the parameter, followed by the parameter name and its type. It declares operation parameters and their types, which carry (potentially malicious) user input. This predicate is necessary to assign parameters some meaning instead of only declaring some symbol.
Using these definitions allows modeling software as the domain of discourse with a purely declarative syntax. Considering get_db_entry from Listing 1 as a function controllable by its user input, it can be modeled using the above predicates, resulting in program Π_PHP =¹

module(dbutils, r),
uri(dbutils, "http://jarvis/index.php"),
operation(dbutils, get_db_entry, [var], any),
parameter(get_db_entry, var, text).
(7)
As a security problem, program (7) allows us, provided that the necessary knowledge and matching rules are given, to reason about potential exploits (see Section III-B). This is due to the fact that, when testing user-accessible functions, their parameters define the attack vectors, the means by which a malicious agent attempts to exploit a system. As another example, consider again the vulnerable C code fragment from Listing 2. Using our predicates results in the following program Π_C =

module(utils, l),
operation(utils, process, [data], void),
parameter(process, data, char_pointer).

(8)
• exploit/1 is a predicate, which declares an exploit and requires as its only term the exploit's name (i.e., a constant).
• attack/4 is a predicate, which declares a manifestation of an exploit. It requires four terms: the base exploit; the concrete attack manifestation (e.g., bypassing filters or evading the signature in the case of SQL injection); an attack pattern; and a list of potential attack goals (e.g., denial of service or information disclosure).
• vul_type/2 is a predicate to declare a type the declared exploit commonly employs (e.g., string for SQL injection). It requires two terms: first the exploit's name and second the type's name.
• data/3 is a predicate to declare malicious data. It requires three terms: first the exploit's name, second an attack manifestation, and third some data string².
• type/2 is a general, non-exploit-specific predicate. As part of the EDB, its purpose is to declare globally known types (e.g., type(string, primitive)). Its first term is a type name and the second a constant indicating whether it is a complex (i.e., user-defined) or primitive (i.e., built-in) type.
• attack_pattern/5 is the core predicate of a declared exploit. It requires five terms: a unique id; the exploit's name; a module name; an operation name; an intrusion point (e.g., a parameter to be employed in the exploit). attack_pattern/5 describes an attack vector which is searched for during the risk analysis, i.e., the risk analysis tries to match the attack_pattern/5 predicate with operations of the security problem.
B. Risk Analysis
Due to its declarative nature, our risk analysis is done by pattern matching. Using attack patterns, a method for finding potential security holes, we identify the set of potentially flawed operations, for which we then compute a risk profile from which test cases are later generated.
Of those predicates, attack_pattern/5 is the only one which requires a body; the others are simple facts. In this sense, it employs existing knowledge of the EDB, yet cannot be completed (i.e., satisfied) without rules from the IDB and further knowledge from the security problem (attack_pattern/5 lacks necessary knowledge from the security problem; joining these two portions of knowledge happens during risk profile computation).
In our following discussion of the risk analysis and its predicates, we follow an API-like style. Hence, we do not give usage examples of our predicates, but rather keep our discussion theoretical by focusing on the syntax and semantics of our logic programs. Section IV shows their usage.
2) Intensional Database: The IDB contains the rules we use to match knowledge from the EDB against the security problem to compute the risk profile. In more detail, the rules of the IDB first establish a set of potential threats to the system, which are then assessed, resulting in the risk profile. We implemented the following predicates for this:
1) Extensional Database: The EDB contains security-related knowledge, i.e., it stores formal descriptions of attacks and their attack patterns by logic programs. Such a formal description, among others, contains knowledge on the different manifestations of an exploit by different attacks, its potential
¹ The necessary information for the module name and the URL is not contained in Listing 1, simply because it is not part of the source code; the module name is just the name of the PHP file implementing the operation, and the URL is known by the modeler but generally not available from source code. The type for parameter var is retrieved from the input form declaration for the get_db_entry action (again, not shown in Listing 1 due to space restrictions). The same, i.e., regarding the module name, applies to program Π_C.
• blacklist/4 is a predicate, which requires four terms: a module name; an operation name; the name of the exploit; the matched attack vector.
² At this time we only support types built into logic programming systems, i.e., numerals and strings. As future work we plan to support user-defined types (see Section V).
When blacklist/4 is used as a rule head, its body contains the predicates necessary to identify a potentially flawed operation, viz. module/2, operation/4 and attack_pattern/5.
• threat/6 is a predicate to declare an identified threat. It requires six terms: an exploit; the exploit's manifestation; the matched attack vector; an attack goal; a module name; an operation name. Again, this predicate is a rule head; it is used to collect all available information on a potential threat. Its body contains the predicates blacklist/4 and attack/4.
Contrary to the EDB, the rules from the IDB all have a body. This is because their purpose is to infer new knowledge based on the security problem and the knowledge from the EDB. The main goal for these rules to match is threat/6. We use its collected information to compute the risk profile for later test case generation.

Risk Profile Computation: The computation of the risk profile is based on the successful matches of the threat/6 predicate. The main task is to generate, for each identified threat t_i, a risk r_i and perform its assessment. This is done by calculating a complexity factor c for an operation o_ti(p_1 : t_1, . . . , p_n : t_n) of threat t_i³. The complexity factor c reflects the complexity of an operation o_i by its signature and is denoted c(o_i). The more complex the set of input parameter types, the higher the overall complexity c. The complexity of a type t_i considers whether the type is primitive or complex (i.e., has an internal structure) and is denoted c(t_i). The dependence factor between t_1, . . . , t_n is denoted by d. The overall complexity c of an operation o with signature o(p_1 : t_1, . . . , p_n : t_n) then is the sum of the input parameter type complexities c(t_i) and the dependence factor d(t_1, . . . , t_n), i.e.,

c(o) = Σ_{i=1}^{n} c(t_i) + d(t_1, . . . , t_n)

(9)
• dcomp/3 is a helper predicate used by comp/3 for calculating the complexity of an operation. It requires three terms: an operation complexity c(o), a type complexity c(t_i), and the resulting new operation complexity. dcomp/3 uses a table similar to Table I for its computation.
• risk_level/3 is a predicate to calculate the overall risk level of a risk. It requires three terms: the computed impact (first), the probability (second) and, third, the resulting risk level.
• risk/7 is a predicate to declare a risk and the main goal to match during risk assessment. It requires seven terms, where the first six are the same as for threat/6 and the last is a triplet with the results of the risk assessment. Thus, risk/7 represents a valued threat.
Program (9) starts by checking whether an operation O has no or only one parameter. If this applies, rules one and two match, and program (9) has found a solution. If an operation O instead has a parameter list of length greater than one, comp/3 descends recursively (rule 3) until reaching the empty list. When returning, in each step comp/3 uses the currently computed parameter complexity C1, which results from compt/2, and the intermediary operation complexity C2 to compute the new complexity C of operation O using dcomp/3. After terminating with a solution, comp/3 returns the operation complexity c(o) in its last term.
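The recursion of comp/3 can be mirrored in Python as follows; the complexity scale, the set of primitive type names, and the dcomp look-up policy (taking the maximum of the two levels) are illustrative assumptions, not the paper's exact tables.

```python
# Hypothetical Python mirror of the comp/3 recursion from program (9),
# with an illustrative quantified complexity scale and dcomp policy.

LEVELS = ["low", "medium", "high"]

def compt(ptype):
    """compt/2: primitive types get low, complex ones medium complexity."""
    return "low" if ptype in ("int", "bool", "char", "string") else "medium"

def dcomp(c1, c2):
    """dcomp/3: combine a parameter complexity with the running complexity."""
    return max(c1, c2, key=LEVELS.index)  # illustrative look-up policy

def comp(param_types):
    """comp/3: fold parameter type complexities into an operation complexity."""
    if not param_types:                # rule 1: no parameters
        return "low"
    if len(param_types) == 1:          # rule 2: single parameter
        return compt(param_types[0])
    head, rest = param_types[0], param_types[1:]
    return dcomp(compt(head), comp(rest))  # rule 3: recursive descent

print(comp(["int", "int"]))      # low
print(comp(["int", "userdef"]))  # medium
```

As in program (9), the base cases handle empty and singleton parameter lists, and the recursive case combines the head parameter's complexity with the complexity of the remaining list.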
We defined the following predicates for this:
• compt/2 is a helper predicate to compute a type's complexity. At this time, it assigns primitive types low complexity and complex ones medium complexity, i.e., compt(T, low) ← type(T, primitive).
• probability/3 is a predicate, which calculates the probability of a risk becoming reality. It requires three terms: the complexity c, the computed impact, and the resulting probability.
comp(O, [], C) ← operation(_, O, [], _), C = low,
comp(O, [P], C) ← operation(_, O, [P], _),
                  parameter(O, P, T), compt(T, C),
comp(O, [P|R], C) ← operation(_, O, [P|R], _),
                  parameter(O, P, T), compt(T, C1),
                  dcomp(C1, C2, C), comp(O, R, C2).
Program (9) shows a simplified version of the complexity calculation (we skipped the dependence factor d, due to only supporting built-in types at this time; also, program (9) returns a quantified complexity c, e.g., low, medium or high, instead of a numeric value).
Based on the complexity c, we next calculate impact, probability and overall risk level for each generated risk in the risk profile. We motivate this complexity factor by its direct relation to the attack vector, which, in most attacks, is based on the signature of operations, i.e., their list of parameters.
• comp/3 is a predicate, which calculates the complexity c of an operation (see above). For this, it requires three terms: an operation name, its parameter list, and the resulting complexity.
• impact/2 is a predicate to calculate the impact of a risk. It requires two terms: an operation's complexity c and the resulting impact.
For assessing a risk, probability/3 and risk_level/3 use Table I.
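A Table-I-style look-up can be sketched as a simple dictionary; the matrix entries below are illustrative and may deviate from the exact values in Table I.

```python
# Hypothetical sketch of a Table-I-style look-up: risk level as a function
# of probability and impact. Matrix entries are illustrative only.
RISK_MATRIX = {
    ("low", "low"): "low",       ("low", "medium"): "low",       ("low", "high"): "medium",
    ("medium", "low"): "low",    ("medium", "medium"): "medium", ("medium", "high"): "high",
    ("high", "low"): "high",     ("high", "medium"): "high",     ("high", "high"): "high",
}

def risk_level(probability, impact):
    """risk_level/3 analogue: look up the overall risk level."""
    return RISK_MATRIX[(probability, impact)]

print(risk_level("medium", "high"))  # high
```

In the logic program, the same look-up is realized by facts relating probability, impact and risk level, which risk_level/3 matches during assessment.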
³ Usually such a computation is done once, when operation o_ti is first encountered. Using memoization allows skipping further calculations.
TABLE I. LOOK-UP TABLE FOR RISK ASSESSMENT

                       Impact
Probability    LOW      MEDIUM    HIGH
LOW            LOW      LOW       MEDIUM
MEDIUM         LOW      MEDIUM    HIGH
HIGH           HIGH     HIGH      HIGH
3) Risk Profile: The risk profile results as the answer of our risk analysis. It does not define any custom predicates but, being again a normal logic program, reuses one predicate of the risk analysis, viz. risk/7. Program (10) shows a risk as it would result from running the risk analysis on the security problem from program (7),

risk(sql_injection, evasion, var,
     [authentication, exception, leakage, tampering],
     dbutils, get_db_entry, [high, high, high]).

(10)

threat(sql_injection, evasion, var,
     [authentication, exception, leakage, tampering],
     dbutils, get_db_entry).

(11)
1 2 3 4
5 6 7 8 9
(10)
1  test("TC1") {
2    try {
3      val var1 = "' OR '1' = '1"
4      val result = invoke("dbutils", "get_db_entry", List(var1), R)
5      assert(result != null)
6    } catch {
7      case e : Exception => println(e)
8    }
9  }

Listing 3. Test case as it results from Program (10)
One rather difficult problem we ran into during test generation was the automated generation of a valid test oracle. The reason is that for non-functional security testing we cannot rely on a specification for oracle generation, as we go “beyond” the specification. Put another way, we no longer show that a system behaves according to a specification, but that it somehow violates the specification. However, prior to executing the test, we do not know how a system violates the specification. Thus, we cannot check against a certain outcome value, but rather monitor whether the system produces output or not; hence the assert(result != null) statement in Line 5 of Listing 3, which monitors whether the system returns some output or null, i.e., whether it crashes.
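The null-check oracle and the reflective invoke helper can be mimicked as follows. This is a hedged Python sketch: DbUtils and its get_db_entry stub are hypothetical stand-ins for the SUT, and the Success/Fail outcomes are simplified.

```python
# Sketch of the generated test's oracle (cf. Listing 3): we do not check a
# specific expected value, only whether the SUT returns *any* output.
# DbUtils and get_db_entry are hypothetical stand-ins for the real SUT.
class DbUtils:
    def get_db_entry(self, key):
        return {"id": key}           # a real SUT would query the database

def invoke(module, operation, args):
    # reflective dispatch, analogous to the Scala invoke() helper
    return getattr(module, operation)(*args)

def run_negative_test():
    var1 = "' OR '1' = '1"           # malicious input from data/3
    try:
        result = invoke(DbUtils(), "get_db_entry", [var1])
        assert result is not None    # oracle: the SUT produced some output
        return "Success"
    except Exception:
        return "Fail"                # the SUT raised, i.e., rejected the input

print(run_negative_test())  # Success
```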
Program (10) describes the potential risk of an SQL injection on the get_db_entry operation by signature evasion, potentially leading to information disclosure. Program (11) shows the corresponding threat. We skip any further discussion of the risk profile, as its predicates were already discussed earlier. Instead, we move on to the last part of this section, namely Test Generation.
C. Test Generation

As stated earlier, we generate test cases in scalatest. Thus, naturally, we also use Scala to generate those test cases. However, the main rationale behind this decision is Scala's parser combinators [27]. Parser combinators emerged from the domain of functional programming and support writing a language parser in a declarative manner. Additionally, Scala supports the generation of an in-memory AST (Abstract Syntax Tree) of the parsed text. For us, this means that test generation is done in two steps:

S1 First, the risk profile is parsed to generate in-memory representations (ASTs) of the test cases. This also includes querying the EDB for test data, which is then also generated into the corresponding test cases.

S2 Second, using serialization techniques, we generate executable negative security test cases in Scala using scalatest.

IV. PROOF-OF-CONCEPT AND DISCUSSION

After discussing the theoretical aspects of our method, in this section we present a proof of concept by a tool implementation of our non-functional security testing method⁴. As a development and test environment we chose IntelliJ IDEA [19] due to its good support for Scala. As mentioned earlier, we use DLV [21] as the underlying knowledge representation system and for implementing our risk analysis. The rationale for this decision, besides DLV's ease of use and support for running in stand-alone mode (i.e., no installation is necessary), is its termination property. We skip further implementation specifics of our tool, as this would go beyond the scope of our paper. Our main intention now is to show our method's feasibility and effectiveness.
A. Results

We have performed first experiments with our tool. As a SUT we used a web application written in PHP, which contains the vulnerable code fragment from Listing 1. The SUT was designed and implemented by a project-external researcher, who is experienced in PHP and modeling, but not in security issues.
Using this procedure, i.e., the steps just mentioned for generating test cases, results in executable test cases of the form shown in Listing 3. Listing 3 shows a test case as generated for the risk from program (10). It declares some malicious data (var1), which is then passed to the operation in question (get_db_entry) by invoking it. The invocation is done by the invoke function, a custom function that uses reflection to dynamically invoke the proper operation on a SUT. As parameters it requires the name of a module and an operation, the list of parameter values, and a flag indicating whether this is a remote (R) or local (L) call.
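The generation procedure just described (parse the risk profile, then serialize test cases) can be sketched as follows. This is an illustrative Python version: the regex-based parser stands in for Scala's parser combinators, and the emitted scalatest snippet is simplified.

```python
import re

# S1: parse a risk/7 fact (cf. program (10)) into an in-memory record.
# The regex-based parser is a simplification of Scala's parser combinators.
RISK_FACT = ("risk(sql_injection, evasion, var, "
             "[authentication, exception, leakage, tampering], "
             "dbutils, get_db_entry, [high, high, high]).")

def parse_risk(fact):
    m = re.match(r"risk\((\w+), (\w+), (\w+), \[([^\]]*)\], "
                 r"(\w+), (\w+), \[([^\]]*)\]\)\.", fact)
    exploit, manifestation, data, goals, module, operation, levels = m.groups()
    return {"exploit": exploit, "module": module, "operation": operation,
            "goals": [g.strip() for g in goals.split(",")]}

# S2: serialize the in-memory record into an executable scalatest case.
def serialize(risk, name, test_data):
    return (f'test("{name}") {{\n'
            f'  val var1 = "{test_data}"\n'
            f'  val result = invoke("{risk["module"]}", '
            f'"{risk["operation"]}", List(var1), R)\n'
            f'  assert(result != null)\n'
            f'}}')

risk = parse_risk(RISK_FACT)
print(serialize(risk, "TC1", "x"))
```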
The application communicates with a MySQL database server and is thus vulnerable to SQL injection attacks. Hence, to test for SQL injection using our method, we formalized the notion of an SQL injection attack in the EDB, which resulted in program (12). It contains the relevant predicates for declaring attacks, as introduced in Section III-B. First, we define the exploit using exploit/1, followed by the concrete attack manifestations (attack/4), which can lead to the exploit. Using data/3 we declare malicious data to be used by an attack. Finally, after declaring the types vulnerable to SQL injection using vul_type/2, we declare the attack pattern used to search for SQL injection by attack_pattern/5. Informally, what attack_pattern/5 states is to look for any operation which has parameters whose types are among the vulnerable types affected by SQL injection. This is motivated by the fact that SQL injection requires user-controllable operation parameters of primitive types like text or password (at least in the case of a PHP application; this would change if we were to test some Java based web application which, e.g., uses String or Integer for declaring parameter types. For this, changes in the EDB regarding attack_pattern/5 and vul_type/2 would be necessary).

exploit(sql_attack).
attack(sql_attack, filter_bypass, sqlap,
       [authentication, exception, leakage, tampering]).
attack(sql_attack, type_handling, sqlap,
       [authentication, exception, leakage, tampering]).
attack(sql_attack, signature_evasion, sqlap,
       [authentication, exception, leakage, tampering]).
attack(sql_attack, blind, sqlap,
       [authentication, exception, leakage, tampering]).
data(sql_attack, signature_evasion, "' OR '1' = '1").
data(sql_attack, filter_bypass, "0x27204f5220273127203d202731").
data(sql_attack, type_handling, "user' AND '1' = '2").
data(sql_attack, blind, "1; SLEEP(10); --").
vul_type(sql_attack, text).
vul_type(sql_attack, password).
attack_pattern(sqlap, sql_attack, M, O, IP) ←
       module(M, _), operation(M, O, _, _),
       parameter(O, IP, T), vul_type(sql_attack, T).          (12)

The number of generated test cases equals the number of potential goals of a risk, i.e., we unfold the list of goals and generate a test case for each extracted goal.

⁴ Our tool is available for download at https://github.com/lokalmatador/CCS.git. IntelliJ IDEA is required to run it.
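The goal unfolding described above, one test case per goal of a risk, can be sketched as follows (an illustrative Python sketch; the global TC numbering via start mimics the numbering in Table II and is an assumption):

```python
# One negative test case per goal in a risk's goal list: a risk with n goals
# yields n (test-case id, attack data, goal) triples, cf. Table II.
def unfold(data, goals, start=1):
    return [(f"TC{start + i}", data, goal) for i, goal in enumerate(goals)]

goals = ["authentication", "exception", "leakage", "tampering"]
cases = unfold("' OR '1' = '1", goals, start=9)   # R3 in Table II: TC9..TC12
print(len(cases))   # 4
print(cases[0])     # ('TC9', "' OR '1' = '1", 'authentication')
```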
TABLE II. RESULTS OF A TEST RUN AGAINST THE PHP APPLICATION

Risk (exploit/manifestation)   Test Case (data/goal)                      Outcome
R1 {SQLI/filter bypass}        TC1  {0x27204f.../authentication}          Success
                               TC2  {0x27204f.../exception}               Fail
                               TC3  {0x27204f.../leakage}                 Success
                               TC4  {0x27204f.../tampering}               Success
R2 {SQLI/type handling}        TC5  {user' AND '1' = '2/authentication}   Success
                               TC6  {user' AND '1' = '2/exception}        Fail
                               TC7  {user' AND '1' = '2/leakage}          Success
                               TC8  {user' AND '1' = '2/tampering}        Success
R3 {SQLI/signature evasion}    TC9  {'OR '1' = '1/authentication}         Success
                               TC10 {'OR '1' = '1/exception}              Fail
                               TC11 {'OR '1' = '1/leakage}                Success
                               TC12 {'OR '1' = '1/tampering}              Success
R4 {SQLI/blind}                TC13 {1;SLEEP(10);--/authentication}       Success
                               TC14 {1;SLEEP(10);--/exception}            Fail
                               TC15 {1;SLEEP(10);--/leakage}              Success
                               TC16 {1;SLEEP(10);--/tampering}            Success
Using program (12) together with the rules of the IDB and for risk computation from Section III-B resulted in sixteen test cases for testing against SQL injection when using program (7) as input. Next, the test cases were executed against the SUT (the earlier mentioned PHP application) during online testing. Of these sixteen test cases, twelve successfully revealed the SUT's vulnerability to SQL injection attacks. Table II shows our full results.

The generated test cases vary by attack manifestation, test data and attempted goal. The data from Table II clearly shows the effectiveness of our approach in detecting and revealing hidden vulnerabilities (if they exist). The fact that four tests failed (TC2, TC6, TC10 and TC14) is also interesting. For the SUT, this means that it consumes input of any kind without validating or parsing it for correctness. The circumstance that no exception can be triggered is interesting: on the one hand, it allows deducing that the SUT hides error messages so as not to expose any system internals, which can actually be considered a security measure; on the other hand, one can also deduce that the SUT is indeed vulnerable, as it just consumes any input without checking or sanitization. The number of successful test cases (75%), however, shows that the SUT is vulnerable to SQL injection attacks of different types. Thus, the results from Table II clearly endorse the feasibility and effectiveness of our method by detecting existing, hidden security bugs.

In our case study, we successfully demonstrated the application of knowledge representation for harnessing expert-level security knowledge during an automated risk analysis by logic programming, for then establishing a set of valid test cases for non-functional security testing. Further, due to our risk assessment by comp/3 (and its related predicates, i.e., impact/2, probability/3, and risk_level/3), we not only value the identified threats in the system by their severity, but also allow for later prioritization of the resulting test cases by those very assessment values. To the best of our knowledge, there do not exist any approaches in non-functional security testing which use knowledge representation and logic programming for risk analysis and subsequent security test generation and, further, allow prioritizing the resulting test cases, as we have shown in this section. In our following discussion of related work, we confirm this statement.

B. Positioning with Respect to Related Work

In testing, logic programming has mainly been applied for two purposes: test data generation and test case generation. For test data generation, constraint solving techniques [16] together with either symbolic execution [24] or feasible path analysis [18] have been applied. At this time, our method only supports generation of test data by relying on the data/3 predicate, i.e., declared test data. Thus, our work currently falls outside this category.
For test case generation, current approaches using logic programming build on constraint solving techniques. In this sense, various approaches exist. Vemuri and Kalyanaraman [32] present a method for generating design tests using path enumeration and constraint programming for VHDL programs. Using annotated control-flow graphs, paths are selected for which, next, constraints corresponding to the statements along the path are generated. Solving the constraints yields design test specifications. Denney [10] suggests a method for generating test cases from Prolog based specifications. A custom meta-interpreter allows monitoring and controlling the execution of programs using specified paths. This enables the generation of tests for specification-based testing. Bieker and Marwedel [4] investigated retargetable self-test program generation for embedded processors. Their method works by matching test patterns onto a hardware description of a processor. In this manner, using constraint logic programming, their method generates executable test cases as self-test programs. Gómez-Zamalloa et al. [15] suggest using constraint logic programming as a symbolic execution mechanism to generate test cases for object-oriented programs. Using a declarative notation of the input program, their method generates test cases according to some given coverage criterion (e.g., path or statement coverage). Lötzbeyer and Pretschner [23] introduce an approach for testing executable system specifications (system models) of reactive systems. Their method translates such system models into constraint logic programs, which are then executed w.r.t. some predefined constraint to produce meaningful test sequences for specification-based testing. The work of Caballero et al. [6], in contrast to the above, focuses on a specific language, viz. SQL. Given a database schema and an SQL query, their method generates a set of domain constraints which, when solved, represent test database instances. These database instances allow verifying the correctness of, e.g., correlated SQL queries.
This discussion of related work reveals that, so far, logic programming has not been harnessed for security testing or security test generation. What has been done so far is applying logic programming to security protocol analysis and verification [1], [7], [17]. This circumstance clearly indicates the relevance of our work.

In the face of existing security testing techniques, our approach falls into the category of negative testing techniques, i.e., it focuses on negative requirements specifying hidden security vulnerabilities. Whereas for its counterpart, functional (or positive) security testing, well-established techniques exist [25], for negative security testing such techniques are scarce. In practice, negative security testing simulates attacks as performed by hackers, which is called penetration testing. The main goal of this kind of testing is to compromise the security of a system [3], [5]. Another approach to negative security testing is fuzzing [26], [30], which was initially designed for testing protocol implementations for possible security flaws due to improper input handling. One of the main disadvantages of these techniques is their lack of support for a systematic procedure regarding the order of execution of test cases. Thus, negative testing is a tedious and time-consuming task. A solution to this lack of a systematic procedure is to incorporate risks into testing [2], [34]. Risk-based testing considers risks to drive the design, selection and prioritization of test cases [2]. Stallbaum et al. [29] propose RiteDAP, a tool to automatically generate system tests from activity diagrams, which considers risks for test case prioritization. They also investigated different prioritization strategies w.r.t. the resulting fault detection rate.

Our discussion on negative testing again clearly indicates the relevance of our work. Current techniques lack support for a systematic procedure, which is of high relevance in security testing. We successfully address this problem by computing valued threats, which allow prioritizing the resulting test cases. Table III summarizes our discussion.

TABLE III. COMPARISON OF OUR WORK WITH SIMILAR, EXISTING WORK

Approach             Security Testing  Risk Integration  Knowledge Integration  Test Case Generation  Ordered Execution
Our Work                    x                 x                    x                     x                    x
RiteDAP                                       x                                         x                    x
Penetration Testing         x

V. CONCLUSION

In our paper we have presented a novel method for security testing building upon logic programming. Using a preliminary risk analysis to establish a set of negative requirements (i.e., our deduced risks), it generates a set of executable test cases for non-functional security testing. By ASP and knowledge representation, we have designed our method to supply the knowledge necessary for successful security testing. At the same time, our approach automates security testing, especially the tedious task of negative security testing, to a high degree. Additionally, by computing valued threats, i.e., risks, our method addresses the problem of managing the almost infinite amount of negative test cases. Using the results of comp/3 allows prioritizing the resulting test cases and thus improving the efficiency of the testing process w.r.t., e.g., fault detection rate or coverage.

We have performed first experiments to deliver a proof of concept for our method, which we successfully did. The retrieved results show that our method, if provided with the necessary knowledge, successfully detects existing security bugs in software systems which result from unstated, negative requirements.

Future improvements of our method focus, on the one side, on the integration of support for custom, user-defined types and thus also on reimplementing and improving comp/3. On the other side, such declarative data type definitions allow improving test data generation by, e.g., using definite-clause grammars (DCGs) for textual test data synthesis. DCGs have so far mainly been used for language processing; however, we intend to use them for language synthesis, i.e., using rules to generate malicious input data or concrete data type instances.

Regarding our tool, we intend to extend its knowledge base (i.e., the EDB) and, as part of improving our method, also improve the IDB and the actual test case generation. We plan to extend the knowledge in two directions: first, by adding support for more exploits and corresponding attacks, and second, by supporting further system types, e.g., object-oriented or mobile applications. As part of this improvement process, we continue to do experiments to show the efficiency of our method by an empirical case study.

ACKNOWLEDGMENTS

This research was partially funded by the research projects MBOSTECO (FWF P 26194-N15), and QE LaB—Living Models for Open Systems (FFG 822740).
REFERENCES

[1] M. Alberti, F. Chesani, M. Gavanelli, E. Lamma, P. Mello, and P. Torroni. Security protocols verification in abductive logic programming: a case study. In Engineering Societies in the Agents World VI, pages 106–124. Springer, 2006.
[2] S. Amland. Risk-based testing: Risk analysis fundamentals and metrics for software testing including a financial application case study, 2000.
[3] B. Arkin, S. Stender, and G. McGraw. Software penetration testing. IEEE Security & Privacy, 3(1):84–87, 2005.
[4] U. Bieker and P. Marwedel. Retargetable self-test program generation using constraint logic programming. In Design Automation, 1995. DAC'95. 32nd Conference on, pages 605–611. IEEE, 1995.
[5] M. Bishop. About penetration testing. IEEE Security & Privacy, 5(6):84–87, 2007.
[6] R. Caballero, Y. García-Ruiz, and F. Sáenz-Pérez. Applying constraint logic programming to SQL test case generation. In Functional and Logic Programming, pages 191–206. Springer, 2010.
[7] L. Carlucci Aiello and F. Massacci. Verifying security protocols as planning in logic programming. ACM Transactions on Computational Logic (TOCL), 2(4):542–580, 2001.
[8] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem-proving. Communications of the ACM, 5(7):394–397, 1962.
[9] M. Davis and H. Putnam. A computing procedure for quantification theory. Journal of the ACM (JACM), 7(3):201–215, 1960.
[10] R. Denney. Test-case generation from Prolog-based specifications. IEEE Software, 8(2):49–57, 1991.
[11] T. Eiter, G. Gottlob, and H. Mannila. Disjunctive datalog. ACM Transactions on Database Systems (TODS), 22(3):364–418, 1997.
[12] T. Eiter, G. Ianni, and T. Krennwallner. Answer set programming: A primer. In Reasoning Web. Semantic Technologies for Information Systems, pages 40–110. Springer, 2009.
[13] M. Felderer, B. Agreiter, P. Zech, and R. Breu. A classification for model-based security testing. In The Third International Conference on Advances in System Testing and Validation Lifecycle (VALID 2011), pages 109–114, 2011.
[14] M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Proceedings of the 5th International Conference on Logic Programming, volume 161, 1988.
[15] M. Gómez-Zamalloa, E. Albert, and G. Puebla. Test case generation for object-oriented imperative languages in CLP. Theory and Practice of Logic Programming, 10(4-6):659–674, 2010.
[16] A. Gotlieb, B. Botella, and M. Rueher. Automatic test data generation using constraint solving techniques. In ACM SIGSOFT Software Engineering Notes, volume 23, pages 53–62. ACM, 1998.
[17] J. Y. Halpern and R. Pucella. Modeling adversaries in a logic for security protocol analysis. In Formal Aspects of Security, pages 115–132. Springer, 2003.
[18] R. Jasper, M. Brennan, K. Williamson, B. Currier, and D. Zimmerman. Test data generation and feasible path analysis. In Proceedings of the 1994 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 95–107. ACM, 1994.
[19] Jetbrains Inc. IntelliJ IDEA, April 2013.
[20] R. Kowalski. Predicate logic as programming language. Edinburgh University, 1973.
[21] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello. The DLV system for knowledge representation and reasoning. ACM Transactions on Computational Logic (TOCL), 7(3):499–562, 2006.
[22] V. Lifschitz. What is answer set programming. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1594–1597, 2008.
[23] H. Lötzbeyer and A. Pretschner. Testing concurrent reactive systems with constraint logic programming, 2000.
[24] C. Meudec. ATGen: automatic test data generation using constraint logic programming and symbolic execution. Software Testing, Verification and Reliability, 11(2):81–96, 2001.
[25] C. Michael and W. Radosevich. Risk-based and functional security testing. Build Security In, 2005.
[26] B. Miller, L. Fredriksen, and B. So. An empirical study of the reliability of UNIX utilities. Communications of the ACM, 33(12):32–44, 1990.
[27] A. Moors, F. Piessens, and M. Odersky. Parser combinators in Scala. CW Reports, 2008.
[28] Open Security Foundation (OSF). Open Source Vulnerability Database (OSVDB), April 2013.
[29] H. Stallbaum, A. Metzger, and K. Pohl. An automated technique for risk-based test case generation and prioritization. In Proceedings of the 3rd International Workshop on Automation of Software Test, pages 67–70. ACM, 2008.
[30] A. Takanen, J. DeMott, and C. Miller. Fuzzing for software security testing and quality assurance. Artech House, 2008.
[31] G. Tian-yang, S. Yin-sheng, and F. You-yuan. Research on software security testing. World Academy of Science, Engineering and Technology, 69:647–651, 2010.
[32] R. Vemuri and R. Kalyanaraman. Generation of design verification tests from behavioral VHDL programs using path enumeration and constraint programming. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 3(2):201–214, 1995.
[33] B. Venners, G. Berger, C. S. Chuah, et al. scalatest, April 2013.
[34] C. Wysopal, L. Nelson, E. Dustin, and D. Dai Zovi. The Art of Software Security Testing: Identifying Software Security Flaws. Addison-Wesley Professional, 2006.