Using Defect Taxonomies for Testing Requirements

Michael Felderer, University of Innsbruck, Austria
Armin Beer, Beer Test Consulting, Austria

Abstract: Defect taxonomies collect and organize the domain knowledge and project experience of experts and are a valuable instrument for requirements-based testing for several reasons. They provide systematic backup for the design of tests, support decisions for the allocation of testing resources, improve the review of requirements and offer a suitable basis for measuring product quality. In this article, we present a method of requirements-based testing with defect taxonomies that exploits these advantages. We point out how defect taxonomies can be seamlessly integrated into a standard test process and discuss results and lessons learned with reference to industrial projects from a public health insurance institution where this approach has been successfully applied.

Keywords: Software quality; test management; requirements-based testing; defect taxonomy; requirements validation
Introduction

Systematic defect management based on bug tracking systems such as Bugzilla [1] is well established and successfully used in many software organizations. Defect management classifies the failures observed during test execution according to their severity and forms the basis for the implementation of effective defect taxonomies. In practice, most defect taxonomies are only used for the a-posteriori allocation of testing resources, i.e. to prioritize failures for debugging purposes. However, the full potential of these taxonomies to control and improve all steps of the overall test process has so far remained unexploited. This is especially the case when testing the user requirements of a system, as system-level defect taxonomies improve the design of requirements-based tests, the tracing of defects to requirements and the control of the relevant defect management. Here, we present a system testing approach with defect taxonomies, named requirements-based testing with defect taxonomies (RTDT). This approach is aligned with the standard test process and uses defect taxonomies to support all phases of testing requirements, i.e. test planning, design, execution, and evaluation. On the basis of a project from the public health insurance domain where the approach has been successfully applied, we present each step of this approach, show how defect taxonomies can improve the effectiveness of requirements-based testing (RT), and present the lessons learned.
In the public health insurance institution where RTDT has been applied, an iterative and incremental development process as well as a standard test process based on the test process of the International Software Testing Qualifications Board (ISTQB) [2] are mandatory for all projects. In the presented project, named project A, a web application was developed to support the employees of the public health insurance institution in caring for handicapped people and managing these cases. The project consists of 41 requirements and 14 use cases; it has a development time of about nine months, four iterations and a project staff of about seven. The implemented product is a web application using a web browser as a client to display the GUI and a server for data management and program control. The architecture is service-oriented, various applications support the different business processes, and the users have role-specific access to the various applications. In the following, we first discuss the individual steps and specific artifacts of RTDT with reference to project A. We then present the results of the application of RTDT to this project as well as the lessons learned.
Requirements Testing with Defect Taxonomies

Requirements-based testing is the process of planning, designing and executing test cases to dynamically validate whether the system fulfills its specification [3]. The standard requirements-based test process comprises the steps test planning, design, execution and evaluation on the basis of the specified requirements. In the test planning phase a test plan is defined. A test plan describes the scope, approach, resources, and schedule of the intended test activities, as well as a test strategy describing the features to be tested, the test design techniques and exit criteria to be used, and the rationale of their choice. In the test design phase tangible test cases and test conditions are derived on the basis of the test strategy. In the test execution phase test cases are then executed manually or automatically; this includes tasks that enable the test execution, such as preparing test harnesses or writing automated test scripts. The failures detected during test execution are entered in a defect management tool. During the test evaluation phase the exit criteria are evaluated and the test results are summarized in a test report. Our requirements-based testing process, RTDT, extends the standard test process by using defect taxonomies. Figure 1 displays the steps of RTDT, the core artifacts, and the involved roles. Based on the requirements specification, the estimated cost of the standard test process is compared to the cost of RTDT. If the estimated cost of RTDT is lower than that of RT, then RTDT is applied. In this case a defect taxonomy consisting of a hierarchy of defect categories is created under due consideration of the requirements. The requirements are then linked to defect categories and validated based on the linkage. Then test planning, design, execution and evaluation are performed under due consideration of defect taxonomies. The observed failures are finally linked to defect categories and interpreted as part of the test evaluation.
Figure 1. Process of requirements-based testing with defect taxonomies (RTDT). Process steps are displayed as rounded rectangles, core artifacts as rectangles, and the stakeholders test manager (TM), project manager (PM), analyst (A) and tester (T) as circles.
Starting with the structure of the initial requirements specification, in the following we explain the steps of RTDT in more detail, based on project A where the approach was successfully applied.
Requirements Specification

In our projects, the analyst elicits and specifies requirements in cooperation with the domain experts. The resulting system requirements specification of project A consists of 41 requirements. Each requirement has an identifier, a name, a description, assigned artifacts and a priority value. The assigned artifacts can be use cases (USC) including process descriptions, business rules (BR) or graphical user interfaces (GUI), and are essential for the design of test cases. The priority has the value low, medium or high, and reflects the importance of the requirement as well as the impact of malfunctioning. For instance, REQ_0027 "A search function has to be provided" of project A is linked to the use case description USC_search_client, the graphical user interfaces search and search_results, and the business rule BR_client_name. The use case USC_search_client specifies how a case manager selects a case by entering the client's insurance number or name, and which client data are to be displayed. The flow is specified by an activity diagram with 11 paths and 9 decision points. Each of the two GUI descriptions, search and search_results, consists of a GUI screenshot and the specification of the GUI controls with name, type and, if data can be entered, input space. The business rule BR_client_name defines the constraints for entering a name, for example the usage of wildcards.
Compare Costs and Decide on Application

The test manager decides, together with the project manager and based on a comparison of the estimated costs of RTDT and RT, whether to apply requirements testing with or without defect taxonomies in a specific project. The cost can be estimated on the basis of the cumulated time effort along the testing phases. According to our experience, this cost comparison based on the time effort is feasible, provides realistic estimation results and can be performed very early in the project life-cycle. Our estimation procedure, which is explained in detail in [4], is parameterized with the project-specific factors number of requirements and estimated number of test cases. In project A, the break-even point, i.e. the point at which the total time effort for RTDT becomes smaller than for RT, is already reached after the first test cycle. We therefore decided to apply RTDT in this project.
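To illustrate the shape of this comparison, the following minimal Python sketch accumulates the time effort of both variants over test cycles. All effort parameters (one-time taxonomy overhead, per-test-case design and execution hours) and the RT test case count are illustrative assumptions, not the calibrated model of [4]; only the RTDT count of 148 test cases is taken from project A.

```python
# Sketch of the RT-vs-RTDT cost comparison; all effort parameters are
# assumptions for demonstration, not calibrated values from [4].

def cumulative_effort(cycles, setup_hours, design_hours_per_tc,
                      exec_hours_per_tc, num_test_cases):
    """Cumulative effort (hours) after each test cycle: one-time setup
    and design effort, plus per-cycle execution of all test cases."""
    efforts = []
    total = setup_hours + design_hours_per_tc * num_test_cases
    for _ in range(cycles):
        total += exec_hours_per_tc * num_test_cases
        efforts.append(total)
    return efforts

# Assumed figures: RTDT carries extra setup (taxonomy creation and
# linkage) but needs fewer, more goal-oriented test cases.
rt = cumulative_effort(cycles=4, setup_hours=0, design_hours_per_tc=1.0,
                       exec_hours_per_tc=0.5, num_test_cases=249)
rtdt = cumulative_effort(cycles=4, setup_hours=40, design_hours_per_tc=1.0,
                         exec_hours_per_tc=0.5, num_test_cases=148)

for cycle, (a, b) in enumerate(zip(rt, rtdt), start=1):
    print(f"cycle {cycle}: RT={a:.0f}h RTDT={b:.0f}h -> apply RTDT: {b <= a}")
```

With these assumed parameters the break-even already falls in the first cycle, mirroring the situation observed in project A.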
Create Defect Taxonomy

A defect taxonomy is a system of (hierarchical) categories designed to be a useful aid for reproducibly classifying faults and failures [2]. Such a classification is concerned with removing the subjectivity of the classifier and creating distinct categories, with the goal of better measuring and controlling testing, defect management and product quality. A generic defect taxonomy widely used in software testing has been defined by Beizer [5]. It consists of the top-level categories (1) requirements, (2) features and functionality, (3) structural defects, (4) data, (5) implementation and coding, (6) integration, (7) system and software architecture, (8) test definition and execution, and (9) unclassified defects. RTDT is independent of a specific type of defect taxonomy. For instance, defect taxonomies based on the established IEEE Standard 1044-1993 [6] may also be used as a basis for RTDT.
Defect category of Beizer: 1xxx Requirements (11xx Requirements incorrect; 12xx Logic; 13xx Completeness; 16xx Requirement changes)
Product-specific category: Unsuitability of the system, taking the organizational processes and procedures into account
  R1: Client not identified correctly (critical)
  R2: Goals and measures of case manager are not processed correctly (normal)
  R3: Update and termination of case incorrect (normal)
  R4: Interface to external components (major)

Defect category of Beizer: 2xxx Functionality as implemented (21xx Correctness; 22xx Completeness, features)
Product-specific category: Incorrect handling of the syntactic or semantic constraints of processes and GUI
  F1: Client not identified correctly (critical)
  F2: Goals and measures of case manager, e.g. case termination, are not processed correctly (normal)
  F3: Check of termination of case not correct (critical)
  F4: Erroneous identification of client: wrong/missing error messages (normal)
  F5: Wrong/missing error message: Save button etc. (critical)
Product-specific category: Syntactic specifications of input fields; error messages
  F6: GUI behaviour; wrong/missing error message: status, domain limitations (major)
  F7: Display of incorrect data on screen/report (normal)
Product-specific category: GUI layout
  F8: Incorrect behaviour of GUI: disable/enable of controls, navigation; default values (normal)
  F9: Display of syntactically incorrect data, minor errors in layout (minor)

Defect category of Beizer: 4xxx Data (42xx Data access and handling)
  D1: Incorrect access/update of client information, states etc. (normal)
  D2: Erroneous save of critical data (critical)

Defect category of Beizer: 6xxx Integration (62xx External interfaces and timing; 623x I/O timing or throughput)
  I1: Data are incorrect or incomplete because of an error in a service call (normal)
  I2: Data of clients are not available because a partner application is not available (critical)

Defect category of Beizer: 9xxx Unclassified bugs
  U1: e.g. sporadic failures during performance testing (normal)
Figure 2. Defect taxonomy of project A.
The defect taxonomy of project A is based on the Beizer taxonomy because of its suitability for system testing and the experience available in the development organization. Figure 2 shows the defect taxonomy of project A created by the test manager. It has three levels of abstraction, starting with selected high-level defect categories from the top-level categories of Beizer, i.e. the categories (1), (2), (4), (6), and (9) relevant for system testing in our context. The high-level categories are mapped to product-specific categories, which are then further refined to concrete low-level defect categories (DC) with an assigned identifier and severity. The possible values for the severity of defect categories, and also of failures, follow Bugzilla [1] and may be blocker, critical, major, normal, minor, or trivial. In the defect taxonomy of project A shown in Figure 2, the Beizer category "incorrect/incomplete feature/functionality" is, for instance, mapped to the product-specific defect category "Incorrect handling of syntactic and semantic constraints of processes and GUI", with assigned low-level defect categories F1 to F9, each having a concrete severity value. These defect categories specialize the original defect categories of Beizer for a specific domain and technology. The defect taxonomy of project A considers a specific domain (case management in public health insurance) and technology (web application) and was adjusted iteratively in the course of interpreting the new defects in the defect management system. Our experience of using defect taxonomies in several projects showed that up to nine sub-categories on each level are convenient and manageable. When creating defect taxonomies, defect data of completed projects and the feedback of affected roles such as developers or testers should be taken into account.
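Such a three-level taxonomy can also be represented directly as a data structure, which eases tool support later on. The following Python sketch models the levels with plain dataclasses; the class and field names are our own illustration, not part of RTDT.

```python
from dataclasses import dataclass, field

@dataclass
class DefectCategory:
    """Low-level defect category (DC) with identifier and severity."""
    dc_id: str        # e.g. "F1"
    description: str  # e.g. "Client not identified correctly"
    severity: str     # Bugzilla-style: blocker, critical, major, normal, minor, trivial

@dataclass
class ProductSpecificCategory:
    """Middle level: product-specific refinement of a Beizer category."""
    description: str
    categories: list[DefectCategory] = field(default_factory=list)

@dataclass
class BeizerCategory:
    """Top level: selected high-level category from Beizer's taxonomy."""
    code: str  # e.g. "2xxx"
    name: str  # e.g. "Functionality as implemented"
    product_specific: list[ProductSpecificCategory] = field(default_factory=list)

# A fragment of project A's taxonomy expressed in this structure:
taxonomy = [
    BeizerCategory("2xxx", "Functionality as implemented", [
        ProductSpecificCategory(
            "Incorrect handling of the syntactic or semantic constraints "
            "of processes and GUI",
            [DefectCategory("F1", "Client not identified correctly", "critical")]),
    ]),
]
```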
Link and Validate Requirements and Defect Taxonomy

The test manager assigns the prioritized requirements to defect categories in cooperation with the analyst. Requirements are assigned to defect categories such that failures occurring when testing a specific requirement fall into one of the assigned defect categories. For instance, REQ_0027, i.e. the search functionality, is assigned to defect category F1 because the erroneous identification of a client is critical during search. Note that requirements can be assigned to more than one defect category; for instance, REQ_0027 is assigned to the defect categories F1 and F7. Based on the linkage between defect categories and requirements, RTDT enables specific checks of requirements quality criteria. The use of defect taxonomies allows additional anomalies to be detected compared to the standard review process on the basis of IEEE 1028 [7] in place in the studied public health insurance institution. With defect taxonomies, the requirements quality criteria completeness, ranked for importance, verifiability, traceability, comprehensibility and right level of detail can additionally be reviewed by utilizing the assignment of weighted defect categories to requirements. For instance, REQ_0027 with priority medium is assigned to defect category F1 with severity critical. This suggests that the priority is not adequately assigned and that the quality criterion ranked for importance is not fulfilled. After consulting the analyst, the test manager changed the priority of REQ_0027 to high. In addition, we noticed that defect category U1 has no assigned requirement. This suggests that the requirements specification is incomplete and that a load requirement and an assigned load test definition have to be added. This was done by the test manager in coordination with the analyst.
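The two anomaly checks described above, priority/severity mismatches and defect categories without assigned requirements, become mechanical once the linkage exists. A minimal sketch follows; the numeric rankings of priorities and severities are our own assumptions.

```python
# Sketch of the two linkage-based requirement checks; the numeric
# rankings of priority and severity values are assumed for illustration.
PRIORITY_RANK = {"low": 1, "medium": 2, "high": 3}
SEVERITY_RANK = {"trivial": 0, "minor": 1, "normal": 2,
                 "major": 3, "critical": 4, "blocker": 5}

def check_ranked_for_importance(req_priority, linked_severities):
    """Flag severities of linked defect categories that look too high
    for the requirement's priority (e.g. medium priority vs. critical DC)."""
    return [s for s in linked_severities
            if SEVERITY_RANK[s] >= SEVERITY_RANK["critical"]
            and PRIORITY_RANK[req_priority] < PRIORITY_RANK["high"]]

def check_completeness(all_dcs, linked_dcs):
    """Defect categories without any assigned requirement hint at an
    incomplete specification (like U1 and the missing load requirement)."""
    return set(all_dcs) - set(linked_dcs)

# REQ_0027 (priority medium) linked to F1 (critical) and F7 (normal):
print(check_ranked_for_importance("medium", ["critical", "normal"]))  # ['critical']
print(check_completeness({"F1", "F7", "U1"}, {"F1", "F7"}))           # {'U1'}
```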
Plan Testing

Testing is planned by the test manager. The test strategy is based on the list of test design techniques shown in Figure 3.
ID, test design technique, and coverage at test strength 1 (low), 2 (normal) and 3 (high):

TDS: Sequence-oriented
  TDS1 Process cycle testing: main paths (1); branch coverage (2); loop coverage (3)
  TDS2 State transition testing: state coverage (1); state transition coverage (2); path coverage (3)

TDD: Data-oriented
  TDD1 CRUD testing: data cycle tests (2); data cycle tests (3)
  TDD2 Equivalence partitioning (EP): EP valid (1); EP valid + invalid (2); EP valid + invalid (3)
  TDD3 Boundary value analysis (BVA): BVA valid (1); BVA valid + invalid (2); BVA values at boundaries (3)
  TDD4 Cause-effect graphing (CE): CE (2); CE (3)
  TDD5 Syntax testing: syntax valid (1); syntax valid + invalid (2); syntax valid + invalid (3)
  TDD6 Condition testing: simple condition coverage (2); modified condition decision coverage (3)

TDE: Experience-oriented
  TDE1 Heuristic testing: experience-based criteria (2); experience-based criteria (3)
  TDE2 Exploratory testing: experience-based criteria (2); experience-based criteria (3)

TDP: Performance-oriented
  TDP1 Load testing: low load (1); medium load (2); high load (3)
  TDP2 Stress testing: low load (1); medium load (2); high load (3)
Figure 3. Test design techniques used in project A.
These techniques derive from a list provided by De Grood [8] and from the experience of the test management in the institution where RTDT was successfully applied. Each test design technique has three test strengths, i.e. 1 (low), 2 (normal) and 3 (high). The test strength refines the test depth of a technique, for instance by applying different coverage levels. The test design techniques are assigned to combinations of defect categories and requirements, taking into account their focus on finding specific types of defects. The categories of these techniques complement each other, and combinations of defect categories and requirements can be assigned to more than one test design technique, especially from different categories. Test design techniques allow the test strength to be varied on the basis of the priority of requirements and the severity of defect categories. This allows the test strength to be determined in a very specific and goal-oriented way. The test design technique, together with the test strength assigned to the combination of a requirement and a defect category, is used to estimate the number of test cases. In the test strategy of project A, the test strength is determined on the basis of the priority of requirements and the severity of defect categories. For instance, we assigned test strength 3 if the priority is high, or if the priority is medium and the severity is blocker or critical. Defect category F1 of project A is linked to test design technique TDS2 (state transition testing) because this technique is highly suitable for revealing failures of database retrievals; failures of category F1 are not necessarily found by other techniques. For instance, TDD1 (CRUD testing) focuses on finding failures in the life cycle of data elements and on verifying actions on them. TDD2 (equivalence partitioning) must also be applied to define the entry values and reduce the number of test cases; TDD2 thereby detects erroneous behavior of the user interface, such as disabling/enabling of controls and input fields, layout errors, or failures in the order of lists. Because requirement REQ_0027 has high priority and the severity of F1 is critical, test strength 3 has to be assigned and 14 test cases are needed: 11 test cases to cover all paths of the activity diagram and three test cases for searching with the asterisk wildcard.
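The strength-assignment rule just quoted can be stated compactly. The sketch below encodes only the two clauses given in the text; the fallback strengths for the remaining priority/severity combinations are our own assumption.

```python
def test_strength(priority: str, severity: str) -> int:
    """Strength rule from project A's strategy: strength 3 if the
    requirement priority is high, or if it is medium and the severity
    of the defect category is blocker or critical. The remaining
    cases are assumed, as the article does not spell them out."""
    if priority == "high" or (priority == "medium"
                              and severity in ("blocker", "critical")):
        return 3
    if priority == "medium":
        return 2  # assumed default for remaining medium-priority cases
    return 1      # assumed default for low-priority requirements

# REQ_0027 (priority high) combined with F1 (critical) gets strength 3:
assert test_strength("high", "critical") == 3
assert test_strength("medium", "critical") == 3
```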
Design Tests

In the test design phase, testers derive test cases by applying the previously defined test strategy based on the defect taxonomy. A good structure and clear references to the test basis are preconditions for the design of effective and maintainable test cases. In our approach, test cases are therefore derived on the basis of the linked defect categories, the requirements (and the assigned requirements artifacts of type USC, GUI and BR), the test design techniques, and the estimated number of test cases. The test cases created in this phase define the test purpose and have a unique identifier linked to the requirement and the covered functionality. The test design phase starts with the definition of abstract test cases on the basis of the use case description, annotated with hypertext links to the requirements, user interfaces and business rules. Physical test cases with preconditions as well as concrete test steps and test data are then created from each abstract test case. To cover REQ_0027, for instance, the 14 estimated test cases have to be specified. In addition, eight negative test cases have to be created to check all error messages of the business rule BR_client_name, such as "a name has to have at least two characters". Physical test cases, with preconditions and concrete test steps derived from the use case description and test data derived with test design technique TDD2 (equivalence partitioning), are then created on the basis of the resulting 22 abstract test cases.
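To illustrate how a business rule such as BR_client_name drives equivalence partitioning and negative tests, the following sketch classifies search-name values into partitions. Only the minimum-length rule and the wildcard usage are taken from the article; the other partitions and message texts are illustrative assumptions.

```python
# Sketch: equivalence partitioning (TDD2) of the search-name input.
# Only the minimum-length rule and wildcard support come from the
# article; the remaining partitions and messages are assumptions.
def classify_name(name: str):
    """Return (partition, expected outcome) for a search-name value."""
    if len(name) < 2:
        return ("too short", "error: a name has to have at least two characters")
    if any(ch.isdigit() for ch in name):
        return ("contains digits", "error: invalid character in name")  # assumed rule
    if "*" in name:
        return ("wildcard search", "ok: search with wildcard")
    return ("plain name", "ok: exact search")

for value in ["A", "Mai*", "Maier", "Ma1er"]:
    print(value, "->", classify_name(value))
```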
Execute and Evaluate Tests

In the test execution phase, all test cases are first made executable. This comprises the maintenance of existing test cases and of the test infrastructure. The test cases are then executed manually or automatically. In project A, all system test cases were executed manually by testers; the reduced number of test cases therefore significantly lowers the overall test execution time. Additionally, if test execution resources are limited, the test cases can be prioritized on the basis of the requirements priority and the severity of the defect category.
Finally, after the execution of a test cycle, the test manager evaluates the test exit criteria. To check the quality of the system and the test progress, not only the ratio of passed test cases to the overall number of executed test cases but also the defects and their severity have to be taken into account. As the failures are traceable to defect categories, the severity values of failures, which are assigned by testers and have varying accuracy, can be checked and adapted. Thus more realistic statements on release quality and more precise planning of additional hotfixes or iterations are possible. For instance, in the first test cycle of project A, the test cases linked to REQ_0027 detected five failures. One of these was assigned to defect category F1 with severity major. However, the severity of this failure should be critical instead of major, because it concerns an erroneous identification of a handicapped client.
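Because every failure is traceable to a defect category, the tester-assigned severity can be cross-checked automatically. A minimal sketch of this consistency check follows; the record layout is an assumption for illustration.

```python
# Sketch: cross-check tester-assigned failure severities against the
# severity of the linked defect category (record layout is assumed).
failures = [
    {"id": "FAIL-031", "dc": "F1", "reported_severity": "major"},
]
dc_severity = {"F1": "critical", "F7": "normal"}

for f in failures:
    expected = dc_severity[f["dc"]]
    if f["reported_severity"] != expected:
        print(f"{f['id']}: reported {f['reported_severity']}, "
              f"but DC {f['dc']} suggests {expected} -> review severity")
```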
Results and Lessons Learned

The application of RTDT in project A showed an improved effectiveness of test cases; this and other lessons learned are discussed in the following paragraphs.
Improved Effectiveness of Test Cases

Compared to similar projects with RT in the public health insurance institution, the overall number of test cases with RTDT in project A is lower, but the generated test cases are more goal-oriented with respect to detecting specific categories of defects, with a coverage that takes the severity of an expected failure into account. The test cases generated with RTDT are therefore more effective, i.e. the test cases find more failures on average and their total number is reduced. In project A the overall number of test cases (NOT) is 148 and the number of registered failures (NOF) is 169. The number of test cases per requirement is 148/41 = 3.61. In a similar project of the same organization (project B), the number of test cases per requirement was considerably higher, namely 6.07. The effectiveness measured by the ratio NOF/NOT is 169/148 = 1.14 in project A; in project B, the effectiveness was only 0.67 [9]. These quantitative results are validated qualitatively by the feedback of the testers, who emphasized that RTDT helps them to improve their test cases so as to find more defects of severity major, critical or blocker. In addition, the feedback of the test management confirms that the severity values of failures are more realistic and that, as a consequence, the number of unplanned releases was reduced.
Tool Support

In the studied public health insurance institution, professional tools for requirements management, modeling, test management and defect management are used to implement the standard development and test process. The additional tool support required to implement RTDT is quite low, as all activities specific to RTDT can be done with spreadsheets. We create the defect taxonomy as well as its links to requirements (exported from the requirements management tool into a separate worksheet), test design techniques (stored in a separate worksheet) and failures (exported from the defect management tool into a separate worksheet) with spreadsheets. This works well in our projects, as spreadsheets are easy to use and adapt.
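Such a spreadsheet-based linkage can also be processed with standard tooling. Assuming one worksheet per artifact type, a pandas sketch joining requirements, defect categories and failures might look like this; the workbook name, sheet names and column names are all assumptions, not the institution's actual export format.

```python
# Sketch: joining the RTDT worksheets with pandas; the workbook layout
# (file, sheet and column names) is assumed for illustration.
import pandas as pd

workbook = "rtdt_project_a.xlsx"                               # assumed file name
reqs = pd.read_excel(workbook, sheet_name="requirements")      # REQ_ID, PRIORITY, DC_ID
dcs = pd.read_excel(workbook, sheet_name="defect_taxonomy")    # DC_ID, SEVERITY
fails = pd.read_excel(workbook, sheet_name="failures")         # FAIL_ID, DC_ID, SEVERITY

# Trace failures to requirements via the shared defect-category column.
linked = fails.merge(dcs, on="DC_ID", suffixes=("_reported", "_dc"))
report = linked.merge(reqs, on="DC_ID")
print(report[["FAIL_ID", "REQ_ID", "SEVERITY_reported", "SEVERITY_dc"]])
```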
Quality of Requirements

With defect taxonomies, especially the requirements quality attributes completeness, ranked for importance, verifiability, traceability, comprehensibility and right level of detail can be reviewed for individual requirements, requirements artifacts or complete requirements specifications. In project A, we found several additional anomalies compared to the standard review process based on IEEE 1028 and took appropriate countermeasures, which increased the quality of the requirements and consequently also of the released product.
Cost-Benefit Analysis

Once RTDT has been introduced in an organization by providing a description of the procedure, respective templates and training for the testers, the effort for applying the method in a concrete project is, in our experience, manageable. The main benefits of using RTDT are (1) the increased quality of the requirements, the tests and especially the released product, as well as (2) the increased process quality providing decision support for the release and test process. The main costs of the approach are (1) the effort to create and maintain the defect taxonomy as well as (2) the effort to link it to requirements and failures. In addition, we defined a feasible and realistic procedure for deciding whether to apply RTDT. The decision procedure compares the cumulated estimated time effort for RTDT and RT along the phases of RTDT. We conducted this procedure in project A and found that the break-even of RTDT was reached already after the first iteration. With a growing number of iterations, the cost difference to RT and thus also the benefit of RTDT increases further. The full potential of RTDT can be exploited in iterative and incremental development processes, as the cost advantage of RTDT compared to RT grows with the number of iterations, and decisions on whether additional hotfixes are needed or on when and what to release are supported by more precise information.
Conclusion

In this article, we presented a method of requirements-based testing with defect taxonomies (RTDT). This approach extends the standard requirements-based test process by seamlessly integrating defect taxonomies into all phases of the test process. On the basis of the requirements specification, the estimated cost of the standard test process is compared with the cost of RTDT. If RTDT has a lower cost and is applied, a defect taxonomy consisting of a hierarchy of defect categories is created under due consideration of the requirements. The requirements are then linked to defect categories and validated on the basis of the linkage. Test planning, design, execution and evaluation are then performed under due consideration of the defect taxonomies. Finally, the observed failures are linked to defect categories and interpreted as part of the test evaluation. The application of RTDT to a project from a public health insurance institution revealed the improved effectiveness of test cases and several other lessons learned relating to the required tool support, the quality of the requirements, the preconditions of the development and test process, and the costs and benefits of the approach.
Acknowledgements

Parts of the work described in this article were supported by the project QE LaB – Living Models for Open Systems (FFG 822740).
References

[1] N. Serrano and I. Ciordia, "Bugzilla, ITracker, and other bug trackers," IEEE Software 22(2), pp. 11-13, IEEE, 2005.
[2] ISTQB, "Standard glossary of terms used in software testing, Version 2.2," International Software Testing Qualifications Board, Glossary Working Party, 2012.
[3] C. Denger and T. Olsson, "Quality assurance in requirements engineering," in Engineering and Managing Software Requirements, pp. 163-185, Springer, 2005.
[4] M. Felderer and A. Beer, "Estimating the Return on Investment of Defect Taxonomy Supported System Testing in Industrial Projects," Euromicro SEAA 2012, IEEE, 2012.
[5] B. Beizer, Software Testing Techniques, International Thomson Computer Press, 1990.
[6] IEEE, "IEEE Std 1044-1993: IEEE Standard Classification for Software Anomalies," IEEE, 1993.
[7] IEEE, "IEEE Std 1028-1997: IEEE Standard for Software Reviews," IEEE, 1997.
[8] D. J. de Grood, TestGoal: Result-Driven Testing, Springer, 2008.
[9] M. Felderer and A. Beer, "Using Defect Taxonomies to Improve the Maturity of the System Test Process: Results from an Industrial Case Study," Software Quality Days 2013, Springer LNBIP, 2013.
About the authors

Michael Felderer is a senior research associate at the University of Innsbruck, Austria. His research interests include software testing, requirements engineering and empirical software engineering. Besides his research activities, he transfers his research results into practice as a consultant. Contact him at [email protected].
Armin Beer has been working in the area of test management and test automation for about 20 years. He is now an independent consultant and lecturer. Armin Beer is a member of the ISTQB working parties Glossary and Expert Level Test Automation. Contact him at [email protected].