2012 38th Euromicro Conference on Software Engineering and Advanced Applications
Estimating the Return on Investment of Defect Taxonomy Supported System Testing in Industrial Projects

Michael Felderer
Institute for Computer Science, University of Innsbruck, Innsbruck, Austria
[email protected]

Armin Beer
Independent Consultant, Beer Test Consulting, Baden, Austria
[email protected]
Abstract—Defect taxonomies collect and organize the domain knowledge and project experience of experts and are a valuable instrument of system testing for several reasons. They provide systematic backup for the design of tests, support decisions for the allocation of testing resources, and provide a suitable basis for measuring product and test quality. In this paper, we present a method of system testing based on defect taxonomies and an appropriate estimation procedure for its return on investment, which depends on several parameters such as the average test design time or the number of test cycles, as well as on experience values of a test organization. The estimated return on investment provides decision support on whether to apply defect taxonomy supported system testing for a specific product or not. We develop the estimation procedure in the context of an industrial project from a public health insurance institution, where the return on investment was positive after the first test cycle. From the experience of this project we extract guidelines and heuristics for precise estimation and interpretation of the return on investment of defect taxonomy supported system testing in the context of other projects.

Keywords: system testing; test management; defect taxonomy; estimation; return on investment
I. INTRODUCTION
Systematic defect management based on bug tracking systems like Bugzilla [1] is successfully used in many software organizations and forms the basis for the implementation of effective defect taxonomies. In practice, most defect taxonomies are only used for the a-posteriori allocation of testing resources to prioritize failures for debugging purposes. But the full potential of these taxonomies to control the overall system test design and release quality is not exploited in standard test processes like the ISTQB test process [2]. The reason for this may be that there are no proven decision support procedures available for deciding whether to apply defect taxonomies in such a test process. Thus, in this paper we investigate the following question: Is there an easily applicable procedure to decide whether to apply defect taxonomies in a standard test process or not? To answer this question, we first define a system testing approach based on the standardized ISTQB test process that uses defect taxonomies. We then develop an estimation procedure for the return on investment of testing with defect taxonomies and show how this procedure helps to decide whether to apply defect taxonomies in a standard test process or not. The estimation procedure is applied in an industrial project from a public health insurance institution.

We call our system testing approach that utilizes defect taxonomies for requirements testing defect taxonomy supported testing (DTST). In DTST, we adopt the established Beizer taxonomy [3] because of its suitability for requirements testing. Although we use this taxonomy as a starting point to define our product-specific defect taxonomy, the approach is not bound to a specific classification schema, and other defect taxonomies can also be considered. The return on investment (ROI) effectively compares the costs and benefits of an arbitrary activity [4]. For software quality practices like DTST, ROI analysis has several advantages [5]. If the ROI is estimated before deployment of a quality practice, it helps to decide whether to perform a specific technique or not and to choose among competing software quality techniques. The ROI is also suitable for informing management about potential savings from implementing a specific software quality technique, and for deciding whether to improve, replace or expand a quality practice in an organization. We develop an ROI estimation procedure to decide whether to perform DTST in a standard test process or not. This paper is structured as follows. In the next section we give an overview of related work. We then present our defect taxonomy supported testing approach in Section III, and the ROI estimation and interpretation in Section IV. Finally, in Section V we conclude and present future work.
II. RELATED WORK
Our approach is related to previous work on the application of defect taxonomies for testing purposes, and the return on investment of testing methods. Several generic and application-specific defect taxonomies with various objectives have been listed by Vijayaraghavan and Kaner [6]. The most influential generic defect taxonomies in the software testing literature have been defined by Beizer [3], as applied in this paper, and Kaner [7]. The defect taxonomy proposed by Kaner distinguishes defects related to the user interface, error handling, boundaries, calculation, race conditions, load conditions, hardware, version control, and testing. The IEEE Standard 1044-1993 provides a classification scheme for software anomalies [8]. It defines an anomaly as any condition that deviates from expectations based on requirements specifications, design documents, user documents, standards, etc., or from someone's perceptions or experience. The orthogonal defect classification (ODC) scheme [9] defines attributes for the classification of failures with the aim of analyzing the software development lifecycle.
Additionally, there are various application-specific defect taxonomies, e.g. for web applications [10]. Although many defect distribution statistics are based on defect taxonomies [3], there are only a few empirical investigations of their properties. For instance, Vallespir et al. [11] define a framework to evaluate and compare different defect taxonomies based on a meta taxonomy including attributes, structure type and properties. A promising approach for defect taxonomy-based testing of web applications is defined by Marchetto et al. [10]. First, specific defect categories are selected. Then, at least one usage scenario is defined for each selected category. Finally, for each scenario a test case is defined and executed. Compared to Marchetto et al., our defect taxonomy-based testing approach additionally considers traceability between requirements and defect categories, and weighted artefacts. This supports more accurate methods for test design and release quality assessment in our approach than in other testing approaches based on defect taxonomies. But due to the additional overhead, estimating the ROI becomes more important. ROI analysis aims to achieve clarity in the decision-making process [12] and is of high importance for software engineering, especially when economic aspects are considered as in value-based software engineering [13]. In this respect, accurate estimation procedures for the ROI are essential. El Emam [5] investigates the ROI of software quality and provides several ROI calculation examples for software quality techniques like inspection, defect detection, and test-driven development. Especially for software testing, several authors highlight the importance of the ROI and provide estimations for specific testing techniques. Ramler et al. propose a framework for value-based test management [14] and highlight the importance of the ROI for planning test targets of test cycles. Black [15] illustrates the ROI in testing considering detection costs, external failure costs and internal failure costs. In Graham and Fewster [4] several case studies with ROI calculations for test automation are provided, e.g. for model-based test case generation. But so far, an exemplary or generic ROI estimation procedure for defect taxonomy supported testing has not been given.
III. DEFECT TAXONOMY SUPPORTED TESTING APPROACH
In this section we give some basic definitions and explain the test process of DTST. A defect taxonomy is a system of (hierarchical) categories designed to be a useful aid for reproducibly classifying faults and failures [2]. Failures, i.e. malfunctions identified during the testing of requirements, are assigned to defect categories. Each failure also has a severity reflecting its potential impact. The possible values for the severity of defect categories and failures follow Bugzilla [1], and may be blocker, critical, major, normal, minor or trivial. A test strategy contains several test pattern elements. Each test pattern defines a test technique such as use case-based testing or state transition testing, is assigned to a defect category, and is linked to several requirements which are tested on the basis of the test pattern. Additionally, a test pattern is linked to a set of test case elements designed with the test technique for the pattern. Therefore, a test pattern defines a scheme used to determine the test design technique applicable to a specific requirement whose observed failures are in a specific defect category. Each test technique has three test strengths assigned to it, namely low, normal, and high. The test strength [16] refines the test technique, e.g. by coverage criteria or test methods, and is determined by the priority of a requirement and the severity of defect categories or failures.
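To make these definitions concrete, the following minimal sketch shows one possible representation of defect categories, test patterns and test strengths as data structures, together with one possible rule for deriving the test strength from a requirement's priority and a defect category's severity. This is our own illustration: the paper prescribes neither an implementation nor a concrete derivation rule, and all names are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Severity(Enum):
    # Severity values of defect categories and failures follow Bugzilla
    BLOCKER = 1
    CRITICAL = 2
    MAJOR = 3
    NORMAL = 4
    MINOR = 5
    TRIVIAL = 6

class Strength(Enum):
    # Each test technique can be applied with one of three test strengths
    LOW = 1
    NORMAL = 2
    HIGH = 3

@dataclass
class DefectCategory:
    identifier: str        # e.g. "F1" (see Table I)
    name: str              # e.g. "Client not identified correctly"
    severity: Severity

@dataclass
class Requirement:
    identifier: str        # hypothetical requirement id, e.g. "R-07"
    priority: int          # assigned in Step 1 of DTST; 1 = highest (assumption)

@dataclass
class TestPattern:
    technique: str                    # e.g. "state transition testing"
    defect_category: DefectCategory   # the category the pattern is assigned to
    requirements: List[Requirement] = field(default_factory=list)
    test_cases: List[str] = field(default_factory=list)  # designed with the technique

def derive_strength(req: Requirement, dc: DefectCategory) -> Strength:
    """One assumed way to combine priority and severity into a test strength."""
    severe = dc.severity in (Severity.BLOCKER, Severity.CRITICAL, Severity.MAJOR)
    if req.priority == 1 and severe:
        return Strength.HIGH
    if req.priority >= 3 and not severe:
        return Strength.LOW
    return Strength.NORMAL
```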
In the following paragraphs, we outline the phases of our standard test process and the integrated steps of DTST.
Test Process
In the organization where DTST was implemented, a standard test process based on the ISTQB test process was already in place. This standard test process consists of the following phases: (1) test planning, (2) test analysis and design, (3) test execution, and (4) test evaluation and reporting. DTST was set up to improve iterative testing, traceability between artefacts, test design, and the release quality assessment of the standard test process. Figure 1 shows its steps, their input and output, and their integration into the standard test process. Steps 1 to 4 are part of the standard test process phase (1), and Step 5 is integrated into phase (4). DTST does not directly affect phases (2) and (3) of the standard test process.

Figure 1. Integration of DTST into the Standard Test Process. The figure maps the steps of DTST to the phases of the standard test process: Step 1, analysis and prioritization of requirements (input: requirements; output: requirements with priority), Step 2, creation of a product-specific defect taxonomy (output: defect taxonomy and test patterns), Step 3, linkage of requirements and defect categories (input: requirements with priority; output: requirements with assigned defect categories), and Step 4, definition of a test strategy with test patterns (input: requirements, use cases and test patterns; output: test strategy), are integrated into phase (1) test planning; Step 5, analysis and categorization of failures after a test cycle (input: test cases and failures; output: categorized failures and a statement of quality), is integrated into phase (4) test evaluation and reporting.
(1) Test Planning
In this phase, a test strategy and a test plan are defined. In the integrated Step 1 of DTST, the requirements and use cases are reviewed by analysts in cooperation with domain experts, and a table linking the requirements with their assigned priority to the use cases and the GUI is generated. In Step 2, a product-specific defect taxonomy is created by the test manager on the basis of the Beizer taxonomy [3]. This taxonomy has three levels of abstraction, starting with 8 top-level categories. Only the categories (1) requirements, (2) features/functionality, (4) data, (6) interfacing, and (9) unclassified defects are relevant for system testing in our context. The high-level categories are mapped to product-specific categories which are then further refined to concrete low-level defect categories (DC) with an assigned identifier and severity (see Table I). In Step 3, the tester assigns the prioritized requirements to defect categories, drawing on the experience of domain experts.
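Building on the data structures sketched above, the following fragment illustrates Steps 2 and 3 for the excerpt shown in Table I. The requirement identifiers and the concrete linkage are hypothetical; only categories F1 and F9 and their severities are taken from the table.

```python
# Step 2: refine a Beizer top-level category into product-specific low-level
# defect categories (DCs) with identifier and severity (excerpt of Table I).
beizer_category = "(2) Incorrect/Incomplete Feature/Functionality"
product_taxonomy = {
    beizer_category: {
        "Incorrect handling of constraints of processes and GUI": [
            DefectCategory("F1", "Client not identified correctly", Severity.CRITICAL),
            # ... further defect categories F2 to F8 ...
            DefectCategory("F9", "Display of incorrect data/minor layout errors", Severity.MINOR),
        ],
    },
}

# Step 3: the tester links prioritized requirements to defect categories
# (hypothetical requirement identifiers).
requirement_to_categories = {
    "R-07": ["F1"],
    "R-12": ["F9"],
}
```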
TABLE I. PART OF A DEFECT TAXONOMY

Category of Beizer: (2) Incorrect/Incomplete Feature/Functionality
Category in Project: Incorrect handling of constraints of processes and GUI
  DC F1: Client not identified correctly (Severity: Critical)
  ...
  DC F9: Display of incorrect data/minor layout errors (Severity: Minor)
In Step 4, test patterns are defined and assigned to defect categories. With this test pattern concept, we manage to relate the specific defect-detection capability of each testing technique, as pointed out by de Grood [16], to defect categories. As a consequence, the coverage level and therefore also the number of test cases can be optimized by varying the test strength. The characterization schema of Vegas and Basili [17] and the experience of previous projects were also considered in determining and recommending test techniques to testers.

(2) Test Analysis and Design
In this phase, testers define test cases on the basis of the test plan, taking the use case and GUI specifications into account.

(3) Test Execution
A set of executable test cases for the system under test is created and then executed. The detected failures are documented in Bugzilla.

(4) Test Evaluation and Reporting
In this phase, the test exit criteria are evaluated and the test results are summarized in a report. In the integrated Step 5 of DTST, test managers and analysts review the severities defined by testers for all failures detected during test execution. On the basis of the review, a statement on the release quality is made and failures to be corrected in the next release are selected. In additional test cycles, the defect taxonomy has to be maintained and selected tests have to be executed.
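As an illustration of Step 5, the following fragment (again our own sketch, reusing the Severity enumeration from above; the paper does not prescribe a particular aggregation or exit criterion) groups the categorized failures of a test cycle and derives a simple release statement.

```python
from collections import Counter

# Failures documented in Bugzilla during test execution; each is assumed to carry
# the identifier of its defect category and a severity reviewed in Step 5.
failures = [
    ("F1", Severity.CRITICAL),   # hypothetical failures of one test cycle
    ("F9", Severity.MINOR),
    ("F9", Severity.TRIVIAL),
]

per_category = Counter(dc for dc, _ in failures)
per_severity = Counter(sev for _, sev in failures)

# One possible exit criterion: no blocker or critical failures may remain open.
release_ok = per_severity[Severity.BLOCKER] == 0 and per_severity[Severity.CRITICAL] == 0
print(per_category)
print("statement of quality:", "release candidate" if release_ok else "critical failures open")
```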
IV. ROI ESTIMATION AND INTERPRETATION
In this section, we present our ROI estimation procedure for DTST. We first describe the project setting, then perform the ROI estimation, and finally interpret the results.

A. Project Setting
In the institution where this research was performed, a new generation of Java-based web applications has been developed: it aims to improve the efficiency of internal processes and automate services for the clients. Compliance with legislation applicable to social insurance institutions in Austria, such as correctness and completeness of business cases, security and refunding in time, must be guaranteed. The architecture is service-oriented. Business objects, backend and external systems are networked via an Enterprise Service Bus. To manage the increasing effort of testing a large number of iterations and interfaces, measures to reduce the number of test cases and to achieve a more precise assessment of the detected defects were needed. The studied project is an application developed to support the employees of the public health insurance institution in caring for handicapped people and managing these cases. Table II gives an overview of the characteristics of the studied project. To test this project, we have estimated and applied DTST on the basis of the standard development and test process established by the institution. Note that the requirements and the defect taxonomy are defined by different persons, and are consequently also prioritized independently of each other. The relationships between defect categories, failures and requirements were defined and maintained by a test manager who analysed the defect entries in Bugzilla with the aim of creating and optimizing the product-specific defect taxonomy.

TABLE II. OVERVIEW OF THE STUDIED PROJECT
  Area: Application for case managers
  Staff: About 7
  Duration: 9 months of development, now maintenance
  Number of iterations: 4
  Size: ~40 requirements
  Ratio of system testing: 27% of overall project effort

B. Estimation Procedure
In this section, we explain our estimation procedure for the ROI of DTST. The main goal of the estimation is to decide whether to apply DTST in a concrete test project based on the standard test process (henceforth abbreviated to ISTQB) or not. A practical approach to estimating the ROI of test process improvement measures such as test automation compares their estimated benefits and costs [4]. We adapt this pragmatic approach to DTST and determine the ROI as follows: the ROI of DTST is the difference between the efforts of ISTQB (benefits) and DTST (costs). As we do not compare different projects or organizations, the difference of benefits and costs is not divided by the costs as done in the classical ROI definition [12]. Experience gained so far indicates that our ROI estimation is feasible and easy to interpret for practitioners, leading to the same decision concerning the application of DTST as the classical ROI definition. The estimation is performed very early in the project life cycle to provide decision support on whether to apply DTST or not. Thus, we do not consider benefit factors which are difficult to estimate in that phase, such as a shorter time to market or a reduced number of defects. As our approach is independent of any specific test organization, we consider the abstract time ROI and not the monetary ROI, which can be derived from the time ROI by a cost model. The estimation results for the studied project are summarized in Table III. The estimation procedure is structured according to the phases of ISTQB and additionally considers test maintenance because the project was developed in several iterations. The efforts of ISTQB and DTST are compared for each testing phase. For one test organization, the estimated efforts of ISTQB and DTST are based on the experience of several projects and depend on project-specific factors such as the number of requirements, test cases and test cycles, and the average design, execution and maintenance time of a test. The estimation procedure for the various phases of ISTQB and the attached steps of DTST is explained below in the context of the studied project. All efforts are estimated in person hours (Ph).

(1) Test Planning
The estimation of the number of test cases with DTST is based on the requirements prioritization also performed in ISTQB. The prioritization of a requirement takes an average of 0.5 Ph in the studied test organization because each requirement has to be analyzed in depth and the prioritization has to be confirmed by several stakeholders. So the overall effort for 40 requirements is 20 Ph, while the estimated effort for Step 2 is 30 Ph. This value was derived from the experience gained in creating several defect taxonomies in the studied test organization. It is constant because the number of defect categories is more or less constant to keep the defect taxonomy usable for testers and maintainable for test managers. Linking a requirement to a defect category takes an average of 0.25 Ph per requirement, and a total of 10 Ph for Step 3. The definition of a test strategy takes 30 Ph according to the experience of the test organization, and the additional definition of test patterns takes 10 Ph. All in all, about 100 Ph are required to prepare a test plan for DTST, and only 50 Ph are needed for ISTQB. This reflects the fact that DTST requires an additional initial effort which is then compensated in the later phases.

(2) Test Analysis and Design
First, the number of test cases for ISTQB and DTST is systematically estimated. For ISTQB, the requirements and attached use cases are manually analyzed and appropriate testing techniques such as state transition testing or equivalence partitioning are assigned. Then, the number of test cases is estimated in order to reach 100% coverage of each requirement according to the selected test technique. The estimation of the number of test cases in DTST additionally considers different coverage levels depending on the priority and severity values. In the studied project, 182 test cases are estimated for ISTQB and 148 for DTST, with an overall effort of 182 Ph for ISTQB and 148 Ph for DTST, provided that the design time of one test averages 1 Ph. According to our experience, the number of test cases is 15-20% lower for DTST than for ISTQB. Additionally, skills in systematic test case design are needed and are trained in both test process variants during an 18 Ph workshop.

(3) Test Execution (per test cycle)
The average time required to execute a test case manually is about 0.5 Ph, most of which is dedicated to checking the actual results against the expected ones and entering the failure descriptions in Bugzilla.

(4) Test Evaluation and Reporting (per test cycle)
After a test cycle, the results have to be analyzed, which requires a constant effort of 4 Ph. The categorization requires an additional constant effort of 2 Ph. We consider defect prediction to refine the analysis and categorization as a suitable topic for future work.

(5) Test Maintenance (per test cycle)
For DTST, a constant effort of 5 Ph per test cycle has to be considered for the maintenance of the defect taxonomy. In the studied project, an average of 60 test cases per test cycle is maintained in ISTQB, and 30 test cases in DTST. Two test cycles are planned for each of the four iterations of the studied project. We therefore consider eight test cycles in our estimation procedure shown in Table III. The ROI of a specific test phase is the difference between the accumulated efforts of ISTQB and DTST in that test phase.
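The per-phase figures in Table III follow directly from these parameters; the short sketch below (our own, using the numbers quoted above) reproduces them.

```python
# Effort estimation in person hours (Ph) for the studied project, derived from
# the parameters given in the text (40 requirements, 182/148 test cases, ...).
REQS, TC_ISTQB, TC_DTST = 40, 182, 148

# (1) Test planning
tp_istqb = REQS * 0.5 + 30                            # prioritization + strategy = 50 Ph
tp_dtst = REQS * 0.5 + 30 + REQS * 0.25 + (30 + 10)   # Steps 1-4: 20+30+10+40 = 100 Ph

# (2) Test analysis and design (1 Ph per test case, 18 Ph training workshop)
tad_istqb = TC_ISTQB * 1 + 18                         # 200 Ph
tad_dtst = TC_DTST * 1 + 18                           # 166 Ph

# (3)-(5) per test cycle: execution, evaluation/reporting, maintenance
cycle_istqb = TC_ISTQB * 0.5 + 4 + 60 * 0.25          # 91 + 4 + 15 = 110 Ph
cycle_dtst = TC_DTST * 0.5 + (4 + 2) + 5 + 30 * 0.25  # 74 + 6 + 5 + 7.5 = 92.5 Ph
```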
TABLE III. ROI ESTIMATION OF DTST IN THE STUDIED PROJECT

(1) Test Planning
  ISTQB: Analysis and prioritization of requirements, 40 requirements @ 0.5 Ph: 20.00 Ph; Definition of test strategy: 30.00 Ph; Total: 50.00 Ph
  DTST: Step 1: Analysis and prioritization of requirements, 40 requirements @ 0.5 Ph: 20.00 Ph; Step 2: Creation of a product-specific defect taxonomy: 30.00 Ph; Step 3: Linkage of requirements and defect categories, 40 requirements @ 0.25 Ph: 10.00 Ph; Step 4: Definition of a test strategy with test patterns: 40.00 Ph; Total: 100.00 Ph
  ROI: -50.00 Ph

(2) Test Analysis and Design
  ISTQB: Design and implementation of 182 test cases @ 1 Ph: 182.00 Ph; Training of testers: 18.00 Ph; Total: 200.00 Ph
  DTST: Design and implementation of 148 test cases @ 1 Ph: 148.00 Ph; Training of testers: 18.00 Ph; Total: 166.00 Ph
  ROI: 34.00 Ph

(3) Test Execution (per cycle)
  ISTQB: Execution of 182 test cases @ 0.5 Ph: 91.00 Ph
  DTST: Execution of 148 test cases @ 0.5 Ph: 74.00 Ph

(4) Test Evaluation and Reporting (per cycle)
  ISTQB: Analysis of failures after one test cycle: 4.00 Ph
  DTST: Step 5: Analysis and categorization of failures after a test cycle: 6.00 Ph

(5) Test Maintenance (per cycle)
  ISTQB: Maintenance of 60 test cases @ 0.25 Ph: 15.00 Ph
  DTST: Maintenance of defect taxonomies: 5.00 Ph; Maintenance of 30 test cases @ 0.25 Ph: 7.50 Ph

Total effort for test execution, evaluation and maintenance per cycle: ISTQB: 110.00 Ph; DTST: 92.50 Ph; ROI: 17.50 Ph

Accumulated effort and ROI per test phase (in Ph):
  Phase                            ISTQB      DTST       ROI
  Test Planning (TP)               50.00     100.00    -50.00
  Test Analysis and Design (TAD)  250.00     266.00    -16.00
  Test Cycle 1 (TC 1)             360.00     358.50      1.50
  Test Cycle 2 (TC 2)             470.00     451.00     19.00
  Test Cycle 3 (TC 3)             580.00     543.50     36.50
  Test Cycle 4 (TC 4)             690.00     636.00     54.00
  Test Cycle 5 (TC 5)             800.00     728.50     71.50
  Test Cycle 6 (TC 6)             910.00     821.00     89.00
  Test Cycle 7 (TC 7)            1020.00     913.50    106.50
  Test Cycle 8 (TC 8)            1130.00    1006.00    124.00
C. Interpretation of the Results
The ROI estimation procedure presented in the last section can be summarized as follows. The overall estimated effort of DTST, denoted by E_DTST and depending on the number of test cycles n, can be calculated by the following formula: E_DTST(n) = TP + TAD + (TE + TER + TM) · n, where TP is the test planning effort, TAD the test analysis and design effort, TE the test execution effort per test cycle, TER the test evaluation and reporting effort per test cycle, and TM the test maintenance effort per test cycle. For the ISTQB effort, denoted by E_ISTQB, a similar formula can be defined. The ROI as a function of the number of test cycles n is then defined as ROI(n) = E_ISTQB(n) - E_DTST(n). The break-even for DTST is reached as soon as the ROI is positive, i.e. its accumulated effort is lower than that of ISTQB in the same phase. The specific ROI values for our studied project are shown in the lower part of Table III and in Figure 2. The data shows that the break-even in the example project is already reached after the first test cycle. The estimation procedure can be easily adapted to different projects in the same test organization, where constant efforts such as the definition of the test strategy and the training of testers do not change, but also to projects in other test organizations, where these constant values may be adapted. For the financial and insurance sectors at least, our experience shows that the proposed constant efforts are a good starting point for estimations in other organizations. Defect taxonomies of the same domain and technology can be re-used and customized, thus shifting the DTST break-even to an earlier test phase, as constant efforts such as the creation of the defect taxonomy are reduced. Simulations may be applied on the basis of the @-values in Table III, such as the execution time of a test case. For instance, if the test execution time were only 0.1 Ph due to test automation, the break-even point would be reached only after TC 5.
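The formula can be turned into a small calculation. The sketch below (our own illustration, with the studied project's values from Table III as defaults) computes ROI(n), determines the break-even cycle, and reproduces the simulation with a test execution time of 0.1 Ph.

```python
def roi(n, tp=(50, 100), tad=(200, 166), cycle=(110, 92.5)):
    """ROI(n) = E_ISTQB(n) - E_DTST(n) for n executed test cycles.
    Each pair holds (ISTQB effort, DTST effort) in Ph; the defaults are the
    values of the studied project (Table III)."""
    e_istqb = tp[0] + tad[0] + cycle[0] * n
    e_dtst = tp[1] + tad[1] + cycle[1] * n
    return e_istqb - e_dtst

def break_even(max_cycles=8, **kwargs):
    """First test cycle after which the accumulated DTST effort is lower."""
    return next((n for n in range(1, max_cycles + 1) if roi(n, **kwargs) > 0), None)

print(break_even())  # 1: break-even after the first test cycle
# Simulation: execution time of 0.1 Ph instead of 0.5 Ph per test case
print(break_even(cycle=(182 * 0.1 + 4 + 15, 148 * 0.1 + 6 + 5 + 7.5)))  # 5
```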
Figure 2. Comparison of DTST and ISTQB efforts for ROI estimation (accumulated effort in Ph over the phases TP and TAD and the test cycles TC 1 to TC 8).

With regard to the raised question, the presented estimation procedure and the interpretation of the results for the studied project show how an easily applicable estimation procedure to decide whether to apply defect taxonomies in a standard test process can be defined.

V. CONCLUSIONS
In this paper, we presented a defect taxonomy supported testing approach based on the standard ISTQB test process and an estimation procedure for its return on investment (ROI). The defect taxonomy supported testing (DTST) process considers links between prioritized requirements, defect categories, and failures. The ROI estimation procedure for DTST helps managers to decide whether to apply the approach in a standard test process or not. DTST and the ROI estimation procedure were applied to an industrial project from a public health insurance institution with the following results. (1) The ROI of DTST in a specific project can be estimated effectively by comparing the accumulated efforts of the ISTQB test process and DTST in a specific test phase. (2) The ROI estimation procedure is parameterized and can be adapted to the needs of specific projects and test organizations. Our results provide support for the management of ISTQB-based test projects. This approach allows more precise statements to be made about the quality of a release, while the ROI estimation procedure provides support for deciding whether to apply DTST or not. So far, DTST has only been applied to system testing based on the defect taxonomy of Beizer. In future, we will investigate how our approach scales for different defect taxonomies such as those of Kaner or IEEE. Finally, we will also integrate defect prediction into the estimation procedure and conduct further empirical studies in other industrial projects.

Acknowledgment
This work was sponsored by the project "QE LaB–Living Models for Open Systems (FFG 882740)".

REFERENCES
[1] N. Serrano and I. Ciordia, "Bugzilla, ITracker, and other bug trackers," IEEE Software, vol. 22, pp. 11-13, 2005.
[2] ISTQB, "Standard glossary of terms used in software testing, Version 2.1," International Software Testing Qualifications Board, Glossary Working Party, 2010.
[3] B. Beizer, "Software Testing Techniques," International Thomson Computer Press, 1990.
[4] D. Graham and M. Fewster, "Experiences of Test Automation: Case Studies of Software Test Automation," Addison-Wesley Professional, 2012.
[5] K. El Emam, "The ROI from Software Quality," CRC Press, 2005.
[6] G. Vijayaraghavan and C. Kaner, "Bug taxonomies: Use them to generate better tests," STAR EAST, 2003.
[7] C. Kaner, J. Falk, and H. Q. Nguyen, "Testing Computer Software," Van Nostrand Reinhold, 1993.
[8] IEEE, "IEEE Std 1044-1993: IEEE Standard Classification for Software Anomalies," 1993.
[9] R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray, and M. Y. Wong, "Orthogonal defect classification - a concept for in-process measurements," IEEE Transactions on Software Engineering, vol. 18, pp. 943-956, 1992.
[10] A. Marchetto, F. Ricca, and P. Tonella, "An empirical validation of a web defect taxonomy and its usage for web testing," Journal of Web Engineering, vol. 8, pp. 316-345, 2009.
[11] D. Vallespir, F. Grazioli, and J. Herbert, "A framework to evaluate defect taxonomies," Argentine Congress on Computer Science, 2009.
[12] H. Erdogmus, J. Favaro, and W. Strigel, "Return on Investment," IEEE Software, vol. 21, pp. 18-22, 2004.
[13] B. Boehm, "Value-Based Software Engineering," ACM SIGSOFT Software Engineering Notes, vol. 28, p. 3, 2003.
[14] R. Ramler, S. Biffl, and P. Grünbacher, "Value-Based Management of Software Testing," in Value-Based Software Engineering, 2006.
[15] R. Black, "Managing the Testing Process: Practical Tools and Techniques for Managing Hardware and Software Tests," Wiley, 2002.
[16] D.-J. de Grood, "TestGoal - Result-Driven Testing," Springer, 2008.
[17] S. Vegas and V. Basili, "A characterisation schema for software testing techniques," Empirical Software Engineering, vol. 10, 2005.