Testing Security Requirements with Non-Experts: Approaches and Empirical Investigations

Bernhard Peischl, Institute for Software Technology, Graz University of Technology, Graz, Austria ([email protected])
Michael Felderer, Institute for Computer Science, University of Innsbruck, Innsbruck, Austria ([email protected])
Armin Beer, Beer Test Consulting, Baden, Austria ([email protected])
Abstract—Security testing has become a critical quality assurance technique for providing a sufficient degree of security. However, it is regarded as too complex to be performed by system testers, who are non-experts in security. This paper provides two approaches to testing security requirements, one based on a Failure Modes, Vulnerabilities and Effect Analysis (FMVEA) and the other based on misuse cases, both suitable for testers who have domain knowledge but are not security experts. We perform a controlled experiment to empirically compare the two testing approaches based on the quality of the derived test cases. The results of the experiment show that the use of attack patterns in the misuse-case-based approach delivers test cases with a better alignment between requirements and security test cases as well as a higher number of correct test cases.

Keywords—security testing; system testing; requirements-based testing; attack patterns; FMVEA; misuse cases; traceability; software quality; controlled experiment.
I. INTRODUCTION

Requirements-based testing is recognized as the key for aligning business value and risks in industry. For instance, in the banking and insurance domain, according to the industrial experience of one of the authors, up to 70% of the development effort is invested in testing. A quality model applied in this domain nowadays is FURPS (Functionality, Usability, Reliability, Performance, Security) [4]. Functionality, reliability and performance are quality attributes that are tested in integration and system tests. For example, in the Austrian social insurance institution where one of the authors works, the fulfillment of the following security aspects is mandatory:
- Implementation of web-security guidelines in software development; for example, the validity of the web service security headers is tested in the integration test phase.
- Confidentiality: compliance with privacy legislation.
- Architecture: trust boundaries protecting the internet portal.

However, security testing is regarded as too expensive and complex to be used in software development on a regular basis. Recently, hackers were able to break authentication and session management with severe impact, for instance by locking computers [21]. Therefore, the risk of a hacker gaining access to sensitive data has to be mitigated. This trend is taken into account by the ISTQB (International Software Testing Qualifications Board, [22]), which has recently created a syllabus on security testing for the advanced level curriculum. It was released in a beta version by the ISTQB General Assembly in February 2016 [2]. In general, test engineers who perform system and integration tests are typically skilled in the design and execution of functional tests and have good domain knowledge. However, they have no experience in security testing. It is a well-known fact that good testing goes beyond happy-day scenarios to explore boundary conditions and exceptions. Considering today's state of the practice, a number of factors promote extending system testing activities towards security testing:
- Successful testing projects often go hand in hand with the skills, intuition and experience of test engineers. In the recent past, many companies have invested in the qualification of their testing teams, so that these skills are readily available.
- Nowadays, most companies have already established tool-supported testing, and a well-defined body of knowledge on testing (e.g., ISTQB) is in place.
- The habit of thinking through negative scenarios is arguably an essential skill of test engineers [7], and testers share the attitude that such tests are important for detecting faults.
- The security testing techniques proposed herein include the application of systematic functional testing, and thus security testing is an extension of the skills of test engineers.
- There is a need to bridge the gap between software engineers and staff dealing with enterprise security.

For these reasons, we investigate whether test engineers should extend their scope towards security testing.
To address this issue, this paper provides two approaches to testing security requirements, one based on a Failure Modes, Vulnerabilities and Effect Analysis (FMVEA) and the other based on misuse cases, and investigates in an empirical study whether (system) testers, who are able to design functional tests systematically and are familiar with the domain, can design security tests. The main objective of this study is to empirically evaluate and compare the use of two methods for the design of security test cases: the method Failure Modes, Vulnerabilities and Effect Analysis
(FMVEA), an approach similar to FMEA (Failure Modes and Effects Analysis) [6], and misuse cases [7]. For this purpose, a controlled student experiment [1] was conducted and replicated at the Graz University of Technology (Austria). The experimental object is a web application actually used to manage course participants. The experiment has two treatments, i.e., the creation of security test cases on the basis of FMVEA using a set of threat modes (i.e., security-relevant failure modes) and on the basis of misuse cases by employing attack patterns defined by OWASP (Open Web Application Security Project, [23]). The results obtained from this study should foster the application of security testing in industry.

II. BACKGROUND

In this section we briefly discuss the relationship between system testing and security testing and the various approaches to security testing. Afterwards we introduce the two procedures that are the subject of our evaluation.

A. Approaches to Security Testing

Security requirements are defined and reviewed as part of the requirements engineering phase in system development. The security testing process starts with security test planning. The objective is to verify that unintended vulnerabilities have been avoided. Scope, test strategy, schedule, test environment and organization are defined in a security test plan. In the next step, security test design is performed and test cases are created based on the security requirements, a risk analysis and a threat model. Felderer et al. [9] present a survey on security testing including properties, vulnerabilities, and attackers. Attacker models can be seen as models of the environment of a system under test, with the purpose of generating misuse cases. Models of vulnerabilities explicitly encode weaknesses in a system. They can be seen as failures or faults that are used for the generation of test cases. Potter and McGraw [24] distinguish between security functional testing and security vulnerability testing.
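Potter and McGraw's distinction can be sketched with a toy example (the `authenticate` function, the user names and the credential store below are purely illustrative and not taken from any of the cited works): a security functional test checks that a security mechanism behaves as specified, whereas a security vulnerability test probes negative scenarios beyond the happy path that an attacker might exploit.

```python
def authenticate(user: str, password: str, store: dict) -> bool:
    """Toy authentication mechanism standing in for the system under test."""
    stored = store.get(user)
    return stored is not None and stored == password

store = {"alice": "s3cret"}

# Security functional test: the mechanism behaves as specified.
assert authenticate("alice", "s3cret", store) is True
assert authenticate("alice", "wrong", store) is False

# Security vulnerability tests: negative scenarios, e.g. probing the
# handling of unknown users and empty credentials.
assert authenticate("mallory", "", store) is False
assert authenticate("", "", store) is False
```

Both kinds of tests exercise the same mechanism; only the second kind deliberately takes the attacker's perspective.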
In requirements-based testing, the attacker's approach is simulated. Testing security mechanisms can be performed by test organizations with functional test techniques, whereas risk-based security testing requires specific expertise. A comprehensive view of safety and security in industry is given in a journal on reliability engineering and system safety [10]. Safety and security are associated with dependability, which means "the ability to deliver service that can justifiably be trusted". Safety is an attribute of dependability, together with availability, reliability, integrity and maintainability. Security primarily refers to availability, integrity and confidentiality. However, assessing a (security) threat is different from assessing a (safety) hazard. In security there is an extremely broad range of possible scenarios (e.g., from randomly generated test cases derived from the user interface [25] to systematic test cases derived from requirements), whereas in safety [19, 11] the characteristics of the hazards are more accessible.
Gorbenko et al. [12] adapted FMEA to security and renamed it IMEA (Intrusion Modes and Effect Analysis). The method is also used to analyze a web service architecture. Schmittner et al. [6] introduce the method Failure Modes, Vulnerabilities and Effect Analysis (FMVEA), in which security threat analysis is considered for security-critical systems such as database servers and corporate networks. In the FMVEA concept, security-based failure modes (threat modes) are considered. A threat mode is similar to the failure mode in safety analysis and describes the manner in which security fails. These threat modes allow one to anticipate potential threats first, to assess the consequences and then to identify potential causes. FMVEA is thus a suitable instrument for performing a high-level analysis of a system in the early design phase. Großmann et al. [13] introduce a Test-Based Security Risk Assessment (TBRA) to improve security risk analysis with the help of security risk testing. A model-based and risk-driven approach for deriving test cases is used. Threat scenarios from the risk model are mapped to security test patterns, for instance SQL injections. The Open Web Application Security Project (OWASP) Testing Guide [8] also takes security risks into account and is applied in Java web projects. With its focus on interconnectedness, the Open Source Security Testing Methodology Manual (OSSTMM) [8] is continually being developed as more is learned about what it means to be secure. In the experiment, FMVEA and misuse cases [11], together with systematic test case design techniques, were used to design test cases. De Grood [3] proposes the use of functional test techniques, e.g., syntax testing, for security testing. After test case design, the security tests are executed. Security tests should be executed in an isolated or virtual environment to avoid malware corrupting a server that is used by other parties. Security test evaluation is performed during test execution.
Afterwards, a security test report is issued. Security test cases should also be maintained, because security testing targets and threats change frequently.

B. Testing Security Requirements with FMVEA

For the remainder of this article we denote this procedure as the security testing technique with FMVEA (STT-F). Testing requirements with FMVEA is performed to preventively determine the occurrence of threats in hardware and software systems. It can be applied at an early stage during the development of a system. The engineering team starts by identifying and enumerating threat modes. For each threat mode, the probability of occurrence of the threat, the probability of detecting the threat and its severity are estimated. The multiplication of these three factors results in a risk priority number (RPN), which defines a ranking on the list of threats. After having obtained this prioritization of threats, a test engineer continues by covering each threat in terms of test cases. In refining a threat into a number of executable test cases, the test engineer performs the transition from specific threats
to concrete test cases with associated test data that can be executed against the application. This transition is supported by a portfolio of systematic techniques (i.e., boundary value testing, syntax testing, etc.) that help the test engineer to adequately cover the specific threat being considered. Finally, the obtained test cases are associated with the requirements to enable traceability between the security tests and the requirements specification, as is also done in functional testing. FMVEA [6] is applicable in the development of web applications with the aim of identifying security risks. Each threat mode represents a potential security threat that can occur. Relying on this technique, test engineers can systematically derive security test cases. For instance, a threat mode captures that user authentication credentials in a web service are not protected when stored. The causal factors are associated with a defect in the security constraints or the software component. The failure effect may be that an attacker can frequently target privileged accounts. To evaluate the risk, a risk priority number is calculated by taking the severity, the probability of occurrence, and the detectability of the threat into account. In the next step, preventive measures to mitigate the risk are defined. For instance, the test engineer derives specific security test cases that deal with authentication credentials and, in addition, the password policy is improved.

C. Testing Security Requirements with Misuse Cases

For the remainder we refer to this procedure as the security testing technique with misuse cases (STT-M). Misuse cases, a form of negative use cases, help document negative scenarios. Use and misuse cases, employed together, are valuable in threat and hazard analysis, system design, eliciting requirements, and creating test cases [7]. Attack patterns relate to misuse cases in a similar way as design patterns relate to software designs.
In that sense, we consider attack patterns as generalized misuse cases that allow test engineers to re-use a body of well-known malicious attacks. When using STT-M, we start with security hazards, i.e., situations that pose a level of threat to the application. Relying on these hazards, we can select a number of attack patterns (e.g., we used the OWASP Top 10, as there is a simple one-page description of each attack pattern together with UML sequence diagrams that illustrate the attack scenarios). Attack patterns illustrate risks such as attacks via SQL injection, cross-site scripting, account harvesting and password cracking. Concrete examples are DOM-based cross-site scripting and stored cross-site scripting [23]. Test engineers then refine each attack pattern into specific test cases in the context of the application. In general, several test cases are needed to cover the malicious scenarios sketched by an attack pattern. Many of the obtained test cases correspond to a negative use of a well-defined use case of the application under test, i.e., testers can associate the obtained tests with a specific misuse case. In many cases, it is also possible to associate the misuse case with requirements, i.e., there is traceability between requirements and security tests.
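As an illustration of how such a refinement might look, the following sketch turns the OWASP injection attack pattern into concrete negative test cases. The payload list, the `rejects_input` validator and the test-case structure are hypothetical stand-ins for the application under test, not artifacts from the experiment:

```python
import re

# Concrete test data refined from the OWASP "injection" attack pattern;
# the payloads and the validator below are illustrative only.
SQLI_PAYLOADS = [
    "' OR '1'='1",               # classic tautology
    "admin'--",                  # comment out the password check
    "1; DROP TABLE courses;--",  # stacked query
]

def rejects_input(value: str) -> bool:
    """Toy stand-in for the application's input validation: flags
    typical SQL metacharacters in a search or login field."""
    return bool(re.search(r"('|--|;)", value))

def derive_test_cases(payloads):
    """Each payload becomes one negative test case: the misuse scenario
    is executed and the expected result is that the input is rejected."""
    return [(p, "expected: input rejected") for p in payloads]

for payload, expected in derive_test_cases(SQLI_PAYLOADS):
    assert rejects_input(payload), f"vulnerable to: {payload}"
```

Each generated test case can then be associated with the misuse case (and, via the misuse case, with a requirement) from which its payload was derived.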
Using STT-F, security is addressed from the viewpoint of the system and its components. The requirements, the architecture of the system and their potential threat modes are taken into account. In contrast, STT-M focuses on re-using attack patterns to derive test cases by taking the point of view of an attacker.

III. RELATED WORK

Our work is related to empirical studies in the field of security testing in the context of web service development and to the introduction of systematic security test design techniques in a system test team. Deak [16] investigates the motivating factors for testers working in traditional or agile projects, comparing testing in traditional and in agile or iterative development. In iterative development, testers enjoy challenges and variety in their testing activities. The study therefore provides recommendations to companies for motivating testing personnel, which leads to better productivity and quality of the developed product. The importance of domain knowledge and experience was evaluated in an industrial case study [14]. This study emphasized the importance of good domain knowledge of a tester. To investigate the role of defect taxonomies in the design of requirements-based test cases, controlled experiments were conducted at the Graz University of Technology and the University of Innsbruck [5]. The subjects were trained in test case design in the classroom and had to create defect taxonomies and assign the defect categories to requirements in the experiment. On the basis of the requirements and defect taxonomies they had to create test cases for a web application. The results foster the application of defect taxonomies in test case design with the aim of improving the quality and reducing the number of test cases in industry. The importance of a mutual knowledge transfer between industry and academia was emphasized in [15].
Improvements with respect to greater efficiency in testing in industrial projects are facilitated if results of a controlled experiment are available. None of the mentioned studies investigates the influence of the skills testers need to create security tests in relation to the test design method applied. In this paper, we address this problem of high practical relevance to system testing in the context of a controlled student experiment.

IV. EXPERIMENT DESCRIPTION

In order to investigate the role of security testing, we performed a student experiment and replicated it. The experiment was carried out at the Graz University of Technology (Austria) in May 2015 and its replication at the same university in December 2015. The replication was performed under the same conditions as the experiment and was used to strengthen the evidence of the results by extending the number of participants.

A. Research Goal and Questions

The goal of this experiment is to investigate whether testers, who are typically not familiar with in-depth security issues, can systematically design security test cases
using two different approaches, STT-M and STT-F. From the overall goal of investigating how testers who are non-experts in security testing can bridge the gap between software engineering and security, we derive the following two research questions that we address in the student experiment and its replication:

(RQ1) Does the STT influence the alignment of requirements to test cases?
(RQ2) Does the STT influence the quality of test cases?

Depending on the STT used, RQ1 addresses the quality of the alignment between requirements and test cases and RQ2 addresses the quality of the tests. RQ3 adds a further dimension to evaluate the adequacy of the test cases by evaluating the coverage of known vulnerabilities.

B. Context Selection

The experimental object is a real web application to manage course participants, consisting of 37 requirements. In the application, courses can be edited, and the application supports search and retrieval functionality and printing masks. There are 6 main use cases and 14 business cases. Students are familiar with the procedures covered by the application, so that the participants can be considered domain experts: just as bank employees are familiar with the banking software they use regularly, a university course participation management system is regularly used by all students and therefore lies within their domain of expertise. Overall, i.e., in the experiment and its replication, the subjects were 18 German-speaking master students in computer science at the Graz University of Technology participating in a software engineering seminar. All students have basic knowledge of software engineering, including basic knowledge of the purpose and process of testing. None of the students had knowledge of security issues, e.g., knowledge about applying misuse cases. Likewise, applying the concept of FMVEA to derive security test cases was a new concept to all students.
In the course of the seminar, the students were trained in applying systematic test design techniques to given requirements. After these training units, the students were introduced to the requirements of the web application and asked to develop test cases that cover the functional aspects of the application. All students developed functional test cases for a selection of 8 requirements of the given application as homework. The students had to provide the abstract test cases, concrete test cases, test data and the alignment of test cases to requirements. After submitting this homework, all students were asked to respond to a questionnaire regarding functional test case design techniques (Questionnaire 1). In the second step, we introduced the participants to the concept of FMVEA and asked them to provide causes and effects for a number of given threat modes. The 10 threat modes are related to web security. All participants were asked to come up with potential causes, potential effects and a prioritization of the given threat modes by
relying on the severity of the threat, the probability of detecting the vulnerability and the probability of occurrence of the vulnerability. Furthermore, we trained the students in using systematic test case design techniques [18] to cover specific scenarios induced by the threat modes. In a third step, the students were introduced to deriving test cases from misuse cases. This included a discussion of selected attack patterns. In particular, we provided selected patterns from the OWASP Top 10 repository (various forms of injections, cross-site scripting and cross-site request forgery) and showed how to refine the introduced patterns into specific test cases. Thus, the students were familiar with the given requirements as well as with applying the systematic test design techniques. As the students had received the same information on security test design techniques, we randomly divided them into two groups. While each member of the first group (Group A) derived security test cases by using FMVEA, the members of the second group (Group B) were asked to come up with security test cases by relying on given misuse cases. After 60 minutes, all participants submitted their test cases. Afterwards, every student of Group A developed security test cases using misuse cases as guidelines and Group B applied the FMVEA-based technique. The task to be performed did not require a high level of industrial expertise, and the domain (course participation management system) was familiar to the students. The use of a student experiment can be considered appropriate, as suggested in the literature [1], and it was the sole practical possibility to observe the influence of the STT as we did in our experiment, which would have caused too much effort and bias in an industrial context. Working with students also implies various advantages, such as the fact that their prior knowledge of misuse cases and FMVEA is homogeneous and the number of participants is larger.

C. Variable Selection

The independent variable of the experiment is the specific STT, which is either the approach that applies FMVEA to the security field (STT-F) or the approach that is guided by misuse cases (STT-M). The dependent variables differ for each research question. For RQ1, the dependent variables measure the alignment of security-related requirements to test cases, i.e., whether every such requirement is adequately covered by test cases. We evaluated the alignment of security-related requirements and test cases on a five-point Likert scale, specifically considering two criteria:
- Requirements identifier, i.e., the correct and unique identification of security-related requirements (Table I, req. id)
- Adequacy of the test cases, i.e., the number of correctly specified test cases considering these requirements (Table I, adeq.)

The evaluation is derived from our industrial experience and considers that good (security) tests can be systematically derived using test case design techniques.
Regarding RQ2, the dependent variables measure the quality of the security test cases in terms of the following criteria:
- Definition of the test goal, i.e., the defined test objectives are understandable, unique and reasonable (Table II, def. test goal)
- Application of the test design technique, i.e., correct application of test design techniques to obtain security tests (Table II, Table III, app. test des. techn.)
- Correctness of the test cases, i.e., the test cases are syntactically and semantically correct (Table II, Table III, correctness)
- Overall number of test cases, i.e., the overall number of correct test cases within the security test suite (Table II, Table III, overall no. tc.)

Except for the overall number of test cases, all these variables are measured on a five-point Likert scale from 1 (very good) to 5 (very bad) in relation to the reference test set created by the experimenters.

D. Experimental Design

We selected a between-subjects balanced design in which each treatment has an equal number of subjects. The 18 individuals independently created functional test cases, potential causes and effects for the given threat modes, and two portfolios of test cases using STT-M and STT-F. The elaboration of the FMVEA for the given set of threat modes was carried out as homework, as this task is too time-consuming to be performed in an experiment session. As the individual effects and causes and the prioritization of the threat modes were the foundation for deriving test cases with STT-F, the students did not affect each other. At least two experimenters always evaluated the quality of the resulting test cases. The task sheet, the template for the FMVEA, the requirements specification of the course participation management system and the pre- and post-experiment questionnaires are available online at http://1drv.ms/1Rnx7td.

E. Experimental Procedure

In two 120-minute preparation units, the 18 students participating in the experiment were trained in applying test design techniques. For this purpose the students had to apply the presented techniques in concrete exercises. All students were able to solve these exercises. Prior to the experiment, we introduced the course participation management system to the students by explaining typical use cases. After discussing the specific requirements, the students were asked to apply test design techniques in order to come up with test cases for the functional aspects (8 pre-selected requirements) of the course participation management system. Furthermore, for a pre-defined number of security-related threat modes, the students had to elaborate potential causes and effects. All students carried out this task as homework. The experiment started with a post-homework questionnaire to check the comprehensibility and the subjectively perceived difficulty of the task. The
questionnaire (Questionnaire 1) contained questions regarding the subjectively perceived difficulty of applying systematic test case design techniques and of applying FMVEA to security issues. The experiment and its replication were carried out in a controlled manner and all students had exactly 60 minutes for each task. In the experiment and its replication we followed exactly the same procedure. The students were randomly divided into Group A and Group B, and both groups conducted the experiment in different rooms. In each room the students worked under controlled conditions and the teams were strictly separated so as not to influence each other. Each student received four printed documents: one document with all requirements including the security-related requirements, one document with the test case design techniques, one document with the use cases and business rules, and a document summarizing the misuse cases. In addition, every student used the FMVEA submitted as homework. Note that all students used the pre-defined set of threat modes, so that the differences in the FMVEAs between the students mainly resulted in a different ranking of the threat modes. Furthermore, we prepared a spreadsheet that defined the structure of the security test cases: each test case has a unique identifier, a specific test goal, pre-conditions, a number of test steps with corresponding input data, an expected result, the technique used to create the test case, and a column for comments. The language of the material is German; it was created and reviewed by all experimenters. Whereas we asked Group A to create security test cases by relying on STT-M, Group B applied STT-F. After 60 minutes the roles were changed, that is, Group A applied STT-F and Group B applied STT-M.

F. Analysis Procedure

All three experimenters have several years of industrial experience in test process improvement and in applying systematic functional [18] and security testing techniques in practice.
Two experimenters measured the dependent variables listed in Section IV.C in relation to the reference solution created by the three experimenters, taking test goals, test design techniques, and STT-M and STT-F into account. For each research question, the results of the measured values for STT-M and STT-F are compared qualitatively and, in addition, quantitatively by Mann-Whitney U tests [19] to determine whether the differences between STT-M and STT-F are significant.

V. RESULTS

In this section we present the measurement results for each research question based on the metrics defined in Section IV.C.

A. RQ1: Does the STT influence the alignment of requirements to test cases?

We evaluated the alignment of requirements to test cases in terms of two criteria: requirements identification and adequacy of the test cases. For both STT-F and STT-M, the measurement results from the 18 participants are shown in Table I.
TABLE I. ALIGNMENT OF REQUIREMENTS AND TEST CASES FOR STT-F AND STT-M (STUD. NO = STUDENT NUMBER, REQ.ID = REQUIREMENTS IDENTIFIER, ADEQ.=ADEQUACY OF TEST CASES).
stud. no.   STT-F req. id.   STT-F adeq.   STT-M req. id.   STT-M adeq.
 1                2               1              1               2
 2                2               3              1               1
 3                2               2              1               1
 4                1               2              1               1
 5                2               3              1               2
 6                2               1              1               2
 7                2               2              1               1
 8                2               4              1               1
 9                2               2              1               1
10                1               1              1               1
11                1               1              1               1
12                2               3              3               4
13                2               2              1               1
14                1               2              1               1
15                1               1              2               1
16                1               2              1               1
17                1               3              1               1
18                1               2              1               2
With the measurements given in Table I we performed Mann-Whitney U (MWU) tests (significance level 0.05) [7] to show whether the differences for the measured attributes are significant. Regarding the attribute requirements identification, the MWU test showed that the populations differ, with a one-tailed p-value of 0.02 and a two-tailed p-value of 0.04. With respect to the adequacy criterion, the MWU test also confirmed the difference between the populations, with a one-tailed p-value of 0.004 and a two-tailed p-value of 0.008.

B. RQ2: Does the STT influence the quality of test cases?

We assessed the influence of the STT on the quality of test cases in terms of four dependent variables: definition of the test goal, application of the test design technique, correctness, and overall number of test cases. Regarding the quality of the security test cases, Table II lists the measurement results for STT-F and Table III lists the results obtained with STT-M.

TABLE II. DEPENDENT VARIABLES FOR STT-F REGARDING QUALITY OF TEST CASES (STUD. NO = STUDENT NUMBER, DEF. TEST GOAL = DEFINITION OF THE TEST GOAL, APP. TEST DES. TECH. = APPLICATION OF THE TEST DESIGN TECHNIQUE, CORRECTNESS = CORRECTNESS OF THE TEST CASES, OVERALL NO. TC = OVERALL NUMBER OF TEST CASES)
stud. no.   def. test goal   app. test des. tech.   correctness   overall no. tc.
 1                1                   1                  2               6
 2                1                   2                  1               4
 3                1                   1                  2               6
 4                1                   1                  1               5
 5                1                   1                  1               4
 6                1                   1                  2               8
 7                1                   1                  2               8
 8                1                   1                  1               3
 9                1                   1                  1               6
10                1                   1                  1               5
11                1                   1                  2               8
12                1                   2                  4               9
13                3                   1                  3               8
14                1                   1                  2               6
15                1                   1                  2               6
16                2                   3                  3               4
17                1                   1                  1               3
18                1                   1                  1               6
TABLE III. DEPENDENT VARIABLES FOR STT-M REGARDING QUALITY OF TEST CASES (STUD. NO = STUDENT NUMBER, DEF. TEST GOAL = DEFINITION OF THE TEST GOAL, APP. TEST DES. TECH. = APPLICATION OF THE TEST DESIGN TECHNIQUE, CORRECTNESS = CORRECTNESS OF THE TEST CASES, OVERALL NO. TC = OVERALL NUMBER OF TEST CASES)
stud. no.   def. test goal   app. test des. tech.   correctness   overall no. tc.
 1                1                   1                  1               4
 2                1                   1                  1               4
 3                1                   1                  1               6
 4                1                   1                  1               5
 5                1                   1                  1               4
 6                1                   1                  1               4
 7                1                   2                  1               4
 8                1                   2                  2               3
 9                1                   1                  1               5
10                1                   1                  1               4
11                1                   1                  1               5
12                3                   3                  1               1
13                1                   1                  1               4
14                1                   1                  1               6
15                1                   2                  1               4
16                1                   1                  1               4
17                1                   1                  1               6
18                1                   1                  1               2
With the measurements listed in Table II and Table III we performed MWU tests (p-value = 0.05) to determine whether the differences of the respective dependent variables are significant. Table IV outlines the relevant quality dimension of the test case and corresponding one-tailed and two-tailed p-values. TABLE IV.
SIGNIFICANCE LEVELS FOR MWU TESTS (SIGNIFICANCE LEVEL 0.05; ONE- AND TWO-TAILED P-VALUES)

                                one-tailed p-value   two-tailed p-value
def. test goal                  0.400                0.800
app. of test design techniques  0.398                0.796
correctness                     0.009                0.017
overall no. tc                  0.009                0.018
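The MWU computation on the per-student measurements can be sketched as follows. This is a minimal pure-Python implementation of the tie-corrected normal approximation (the paper does not state which software or variant of the test was used, so the exact p-values may differ slightly from Table IV); the data rows are taken from Tables II and III.

```python
import math

def mann_whitney_u(x, y):
    """Return (U1, one-tailed p) via the normal approximation with tie
    correction, which matters here because the ordinal scores are heavily
    tied. scipy.stats.mannwhitneyu gives comparable results."""
    combined = sorted(x + y)
    # Assign midranks: each tied group gets the average of its ranks.
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # average of ranks i+1..j
        i = j
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[v] for v in x)          # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2            # U statistic for sample 1
    mu = n1 * n2 / 2                       # mean of U under H0
    n = n1 + n2
    # Tie-corrected variance of U.
    tie_sum = sum(t**3 - t for t in (combined.count(v) for v in set(combined)))
    var = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    z = (u1 - mu) / math.sqrt(var)
    p_one = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return u1, p_one

# Correctness scores per student, from Table II (STT-F) and Table III (STT-M).
corr_f = [2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 4, 3, 2, 2, 3, 1, 1]
corr_m = [1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Overall number of test cases per student.
no_f = [6, 4, 6, 5, 4, 8, 8, 3, 6, 5, 8, 9, 8, 6, 6, 4, 3, 6]
no_m = [4, 4, 6, 5, 4, 4, 4, 3, 5, 4, 5, 1, 4, 6, 4, 4, 6, 2]

u_c, p_c = mann_whitney_u(corr_f, corr_m)
u_n, p_n = mann_whitney_u(no_f, no_m)
print(f"correctness: U={u_c}, one-tailed p={p_c:.4f}")  # significant at 0.05
print(f"overall no.: U={u_n}, one-tailed p={p_n:.4f}")  # significant at 0.05
```

Both comparisons come out significant at the 0.05 level, consistent with the correctness and overall-number rows of Table IV.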
VI. DISCUSSION
The student experiment and its replication strive to answer the crucial question of whether it is meaningful for testers to create security test cases. Testers usually have knowledge of the application domain and of applying systematic test design techniques. Given the pressure of time in a typical test project, it is a worthwhile question whether testers should address security aspects themselves or whether a security engineer should carry out this task. On the one hand, testers would have to acquire knowledge about software security; on the other hand, dedicated security experts would need to acquire knowledge about systematic testing. In our experiments, all students who managed to write functional test cases for the course participation management system were also able to provide security test cases. As the required security attributes are partially covered in the requirements specification, RQ1 in particular evaluates the influence of the STT on the alignment of requirements to security test cases. In this respect, our experiments confirm the hypothesis that STT-M results in a better alignment of requirements and test cases than STT-F. One reason for this could be that STT-F relies on an 'inside view' of the system under test, whereas STT-M primarily focuses on specific attack patterns that are applied to the system with an 'outside view'. The post-experiment questionnaire further supports this point of view, as the students specifically mentioned having difficulties in applying the threat modes in the context of the given architecture. For example, some students were not able to refine specific threat modes into test cases because of a lack of clarity regarding trust boundaries. RQ2 deals with the quality of the obtained security tests. To close the gap between software engineers and security engineers it might be worthwhile to extend the scope of testers into the security field. In this respect, the quality of the obtained test cases is a critical factor. Our experiments show that the specific STT influences the correctness of the test cases and the overall number of test cases.
Regarding these two specific quality attributes, STT-M resulted in fewer test cases, whereas the application of STT-F yielded a higher number of test cases. However, the two experiments also revealed that the correctness of the (smaller number of) test cases derived on the basis of misuse cases was significantly better than the correctness of the (larger number of) test cases derived on the basis of an FMVEA. The qualitative observation that students reported intricacies in refining some threat modes and in applying systematic test case design methods when using STT-F further supports this argument. To sum up, our experiment indicates that, compared to applying an FMVEA in the security field, the use of attack patterns results in
- a better alignment between requirements and security test cases,
- a higher amount of correct test cases, and
- smaller test suites.
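To illustrate the kind of test case the attack-pattern-based approach yields, the following sketch applies an SQL-injection pattern (cf. the OWASP material [22]) from the 'outside view' to a login check of a course participation system. The system under test is mocked, and all function names and payloads are hypothetical illustrations, not material from the experiment.

```python
# Hypothetical sketch: a security test case derived from an SQL-injection
# attack pattern, in the spirit of STT-M. The two login functions are mocked
# stand-ins, not the experiment's actual system.

REGISTERED = {"alice": "s3cret"}  # assumed user store for the mock

def login_naive(user, password):
    """Insecure stand-in: emulates string-concatenated SQL, where a
    tautology in the password field bypasses the authentication check."""
    query = f"SELECT * FROM users WHERE name='{user}' AND pw='{password}'"
    if "' OR '1'='1" in query:  # emulate the classic tautology slipping through
        return True
    return REGISTERED.get(user) == password

def login_parameterized(user, password):
    """Secure stand-in: emulates a parameterized query, so the payload is
    treated as literal data and never matches a stored password."""
    return REGISTERED.get(user) == password

# Test case derived from the attack pattern: authentication must fail
# when the password field carries the injection payload.
payload = "x' OR '1'='1"
assert login_naive("alice", payload) is True           # vulnerable variant
assert login_parameterized("alice", payload) is False  # hardened variant
```

The test goal (reject the tautology payload) follows directly from the attack pattern, without requiring the tester to reason about the internal architecture or trust boundaries.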
In general, many domains use pattern-oriented approaches (e.g., software architecture, software design, enterprise architecture) to support re-use and to transfer knowledge between organizational units. Extending the scope of testers to identify security leaks thus appears to be a
viable procedure to narrow the gap between software engineering teams and security experts.
VII. THREATS TO VALIDITY
In this section we discuss threats to external, internal, construct and conclusion validity according to Wohlin et al. [1] and present the measures we took to mitigate these threats. External validity. External validity can be threatened when experiments are performed with students, as the representativeness of these subjects may be doubtful in comparison to that of test professionals. However, our experimental setting is similar to system-level testing in industry, where it is common to recruit test personnel with some domain knowledge but only little experience in systematic testing and security engineering. The required skills in test design are then provided in short trainings (i.e., similar to our preparation units). Like students, testers in industry have to familiarize themselves with the requirements when a new project starts. Therefore, we think our student experiment can be considered appropriate, because students and test professionals are in a comparable situation. Another threat to external validity concerns the experimental material used. The system is a real system and was selected carefully from a domain familiar to the students, i.e., course participation management. The students received an in-depth introduction to the system and confirmed its understandability. The size of the system could also threaten the external validity of the results, as only a limited number of requirements have been considered. The rationale for selecting this system was the need, due to time constraints, to design security test cases based on techniques, i.e., FMVEA and attack patterns, that are useful for practice. Internal validity. Threats to internal validity were mitigated by the design of the experiment. The students were trained in all skills necessary to perform the experimental task and had the same prerequisites.
The students designing security test cases based on different techniques worked in different rooms, and the students within one room were not allowed to communicate with each other. Due to time constraints in the lecture itself, the students created the underlying FMVEA as homework; this was uncritical, as we checked that each student had created the FMVEA on his or her own, and each student had to use his or her own FMVEA as the starting point for designing test cases. Construct validity. This validity may be influenced by the measures used to obtain a quantitative evaluation of the subjects' performance, and by the post-experiment questionnaire. The metrics to evaluate the quality of the designed security test cases were selected on the basis of the literature and an in-depth discussion between all three experimenters. The post-experiment questionnaire had the purpose of identifying the difficulty of the tasks from the subjective viewpoint of the students. Social threats (e.g.,
evaluation apprehension) have been avoided, since the students were not graded on the results obtained. Conclusion validity. Conclusion validity concerns the data collection, the reliability of the measurement, and the validity of the statistical tests, all or any of which might affect the ability to draw a correct conclusion. The results of the experiment, i.e., the spreadsheet with security test cases as well as the questionnaire, were directly collected and sent to the experimenters by the students. The data was reviewed and analyzed by all three experimenters. Statistical tests were performed to identify whether the differences between STT-F and STT-M are significant.
VIII. CONCLUSION AND FUTURE WORK
In this paper we presented a controlled student experiment to compare two system security test design techniques, based on FMVEA and misuse cases, respectively. The experiment and its replication were carried out at Graz University of Technology (Austria). The experimental object was a web application actually used to manage course participants. The experiment had two treatments, i.e., designing security tests with an FMVEA-based technique and with a misuse-case-based technique, respectively. Results show that the use of attack patterns in the misuse-case-based approach delivers test cases with a better alignment between requirements and security test cases as well as a higher amount of correct test cases. In future work, we will try to further triangulate the results by replications of the controlled experiment and by application of the approach in industry.
ACKNOWLEDGMENT
This work has been supported by the project QE LaB – Living Models for Open Systems (www.qe-lab.at), MOBSTECO (mobsteco.info), the competence network Softnet Austria (www.soft-net.at) and SPLIT (Security Interaction Testing in Practice) under grant 851205 (Austrian Research Promotion Agency).
REFERENCES
[1] M. Höst, B. Regnell, and C.
Wohlin, “Using students as subjects—a comparative study of students and professionals in lead-time impact assessment,” Empirical Software Engineering, 5(3):201-214, 2000.
[2] International Software Testing Qualifications Board, “Certified Tester Advanced Level Syllabus – Security Tester”, Version 2016.
[3] D. J. de Grood, “TestGoal – Result-Driven Testing”, Collis B.V., 2008.
[4] R. Grady, “Practical Software Metrics for Project Management and Process Improvement”, Prentice Hall, 1992.
[5] M. Felderer, A. Beer, B. Peischl, “On the role of defect taxonomy types for testing requirements: Results of a controlled experiment”, Proceedings of the 40th Euromicro Conf. Softw. Eng. and Advanced Applications, pp. 377-384, IEEE, 2014.
[6] Ch. Schmittner, Th. Gruber, P. Puschner, and E. Schoitsch, “Security Application of Failure Mode and Effect Analysis (FMVEA)”, 33rd Int. Conf. on Computer Safety, Reliability, and Security, Vol. 8666, pp. 310-325, Springer, 2014.
[7] I. Alexander, “Misuse cases: use cases with hostile intent”, IEEE Software, 20(1):58-66, Jan/Feb 2003.
[8] P. Herzog, “The Open Source Security Testing Methodology Manual 3”, 2010, http://www.isecom.org/research/osstmm.html (last access April 3rd, 2016).
[9] M. Felderer, M. Büchler, M. Johns, A. D. Brucker, R. Breu, A. Pretschner, “Security Testing: A Survey”, in: Atif Memon (Ed.), Advances in Computers, Vol. 101, pp. 1-51, Elsevier, 2016, ISSN 0065-2458, ISBN 9780128051580.
[10] L. Piètre-Cambacédès, M. Bouissou, “Cross-fertilization between safety and security engineering”, Journal of Reliability Engineering and System Safety, pp. 111-123, Elsevier, 2013.
[11] G. Sindre and A. L. Opdahl, “Eliciting security requirements with misuse cases”, Requirements Engineering, 10(1):34-44, 2005.
[12] A. Gorbenko, A. Kharchenko, V. Tarasyuk, A. Furmanov, “F(I)MEA-technique of web-services analysis and dependability ensuring”, Lecture Notes in Computer Science, Vol. 4157, pp. 153-157, Springer, 2006.
[13] J. Grossmann, M. Schneider, J. Viehmann, M. F. Wendland, “Combining risk analysis and security testing”, Fraunhofer FOKUS, Springer, 2014.
[14] A. Beer, R. Ramler, “The role of experience in software testing practice”, Proceedings of the 34th Euromicro Conf. Softw. Eng. and Advanced Applications, pp. 258-265, IEEE, 2008.
[15] M. Felderer, A. Beer, “Mutual knowledge transfer between industry and academia to improve testing with defect taxonomies”, Software Engineering (SE 2015), GI, 2015.
[16] A. Deak, “A comparative study of testers’ motivation in traditional and agile software development”, 15th Int. Conf. on Product-Focused Softw. Process Improvement, Springer, 2014.
[17] B.
Beizer, “Software Testing Techniques”, Thomson Computer Press, 1990.
[18] A. Beer and B. Peischl, “Testing of Safety-Critical Systems – a Structural Approach to Test Case Design”, Safety-Critical Systems Symposium (SSS 2011), 2011.
[19] G. Argyrous, “Statistics for Research: With a Guide to SPSS”, Sage Publications Ltd, 2011.
[20] List of cyber-attacks, https://en.wikipedia.org/wiki/List_of_cyber-attacks (last access March 28th, 2016).
[21] ISTQB – International Software Testing Qualifications Board, http://www.istqb.org/ (last access March 28th, 2016).
[22] The Open Web Application Security Project, http://www.owasp.org/ (last access March 28th, 2016).
[23] D. Schadow, “Java Web Security”, dpunkt Verlag, 2014 (in German).
[24] B. Potter, G. McGraw, “Software security testing”, IEEE Security & Privacy, 2(5), 2004.
[25] B. Hofer, B. Peischl and F. Wotawa, “GUI savvy end-to-end testing with smart monkeys”, ICSE Workshop on the Automation of Software Test (AST ’09), pp. 130-137, 2009.