When to Automate Software Testing? Decision Support Based on System Dynamics: An Industrial Case Study

Zahra Sahaf1, Vahid Garousi1,2, Dietmar Pfahl1,3, Rob Irving4, Yasaman Amannejad1

1: Department of Electrical and Computer Engineering, University of Calgary, Calgary, Canada

{zahras, vgarousi, dpfahl}@ucalgary.ca

2: Department of Software Engineering, Atilim University Ankara, Turkey

3: Institute of Computer Science, University of Tartu Tartu, Estonia

[email protected]

4: Pason Systems Corporation Calgary, Canada

[email protected]

ABSTRACT

Software test processes are complex and costly. To reduce testing effort without compromising effectiveness and product quality, automation of test activities has been adopted as a popular approach in the software industry. However, since test automation usually requires substantial upfront investments, automation is not always more cost-effective than manual testing. To support decision-makers in finding the optimal degree of test automation in a given project, we propose in this paper a simulation model using the System Dynamics (SD) modeling technique. With the help of the simulation model, we can evaluate the performance of test processes with varying degrees of automation of test activities and help testers choose the most cost-effective option. As a case study, we describe how we used our simulation model in the context of an Action Research (AR) study conducted in collaboration with a software company in Calgary, Canada. The goal of the study was to investigate how the simulation model can help decision-makers decide whether and to what degree the company should automate their test processes. As a first step, we compared the performance of the current fully manual testing with several cases of partly automated testing as anticipated for implementation in the partner company. The development of the simulation model as well as the analysis of simulation results helped the partner company to get a deeper understanding of the strengths and weaknesses of their current test process and supported decision-makers in the cost-effective planning of improvements of selected test activities.

Categories and Subject Descriptors

D.2.5 [Software Engineering]: Testing and Debugging; D.2.9 [Software Engineering]: Management – Software quality assurance; K.6.3 [Management of Computing and Information Systems]: Software Management – Software development, software process.

General Terms

Management, Measurement, Experimentation, Verification.

Keywords

Software testing, automated testing, manual testing, decision support, process simulation, system dynamics.

1. INTRODUCTION

As of 2002, software quality issues in the form of defects (bugs) cost the United States economy an estimated $59.5 billion annually, and it is estimated that improved testing practices could reduce this cost by $22.5 billion [1]. The steps of a software test process can be conducted manually or automated. For example, in manual testing, the tester may assume the role of an end-user, executing the software manually to identify unexpected behavior and defects. In automated testing, the tester may write test code scripts (e.g., using the JUnit framework) which are executed automatically to check the software under test (SUT) [2].

Deciding when to automate testing is a frequently asked and challenging question for testers in industry [3]. As test automation requires an initial investment of effort, testers are interested in finding the answer to this question. More specifically, testers are interested in understanding the return on investment (ROI) of test automation, i.e., when a positive ROI is reached and what total ROI can be achieved.

Several studies have been carried out to investigate the ROI of test automation [4-9]. However, these studies are limited: they either focus exclusively on the process step of test execution, or calculate the ROI exclusively based on variables which must be converted to one unit (typically ‘time’, ‘effort’ or ‘cost’) and thus ignore influencing factors that cannot be transformed into that unit (e.g., variables representing ‘size’, ‘skill’, or ‘quality’), or merely present theoretical models without empirical evaluation.

The objective of our study is to assess the ROI of test automation with a holistic approach, i.e., incorporating essential aspects of software testing processes that, according to the existing literature, influence the performance of software testing processes. Furthermore, our holistic view on the problem aims at evaluating both manual and (at least partly) automated testing, considering all steps typically involved in software testing, from test-case design, to test-case scripting, to test execution, test evaluation and reporting of test results.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

ICSSP’14, May 26–28, 2014, Nanjing, China.
Copyright 2014 ACM 978-1-4503-2754-1/14/05...$15.00.
http://dx.doi.org/10.1145/2600821.2600832

We develop in this work a System Dynamics (SD) process simulation model that serves as a tool to investigate various configurations of test processes for the purpose of decision support on when to automate software testing. SD is a modeling technique to model, study, and manage complex systems [10]. It aims at identifying the key variables representing the system states, and defines their linkages to establish a representation of the system structure. For our study, we developed a generic SD model representing the key steps of software testing and then customized and calibrated this model to the test processes of Pason Systems Corporation, the industrial partner in this study. By instantiating the SD model for the cases of manual and automated testing, we could simulate each process option separately. Based on an in-depth analysis of the simulation results, we then compared the strengths and weaknesses of each process option and provided answers to the questions of when a positive ROI is reached and what total ROI is achieved. In addition, the simulation model helped us investigate whether proposed process changes have the potential to increase the ROI of test automation.

In this paper, we present the following contributions to our industrial partner and the research community:

1. A conceptual test process reference model comprising the typical steps of a comprehensive software test process, the relations between steps, and the key parameters influencing test process performance. (Section 3)
2. A modular, customizable and scalable SD simulation model implementing the conceptual test process reference model. (Section 4)
3. A demonstration that an SD-based simulation model can be successfully used as a cost-benefit analysis tool for comparing manual and (semi-)automated testing. (Section 5)

The rest of the paper is organized as follows. In Section 2, we discuss background and related work. In Section 3, we present the steps contained in a typical software test process and use them as a reference model for the SD modeling stage. In Section 4, we present the SD model representing the typical software test process. In Section 5, we present the industrial case study utilizing the approach, and summarize and discuss the simulation results for different scenarios, i.e., manual versus automated testing. Finally, Section 6 presents conclusions and directions for future work.

2. BACKGROUND AND RELATED WORK

We present a brief overview of published literature related to our research topic, i.e., the investigation of costs and benefits of full or partial automation of software testing, and software testing simulation methods.

2.1 Cost-Benefit Analysis Methods

The standard approach to comparing manual with automated testing and analyzing the strengths and weaknesses of each option is cost-benefit analysis [5-7]. The limitations of this method include: (a) typically, in a cost-benefit analysis all cost and benefit factors are converted into a single unit of comparison, i.e., time, effort, or cost (using a monetary unit such as the US dollar). This inevitably results in a predominant focus on immediate time, effort, or cost-saving benefits of the automated test activities. However, test automation may have other, more indirect or long-term benefits such as improved maintenance, defect detection effectiveness, and reusability of product code and test code. (b) Moreover, the necessity of expressing all factors influencing cost or benefit in terms of a single unit might result in ignoring or overlooking important factors that cannot (easily) be converted into the single unit of choice (e.g., skills of engineers, size and quality of artifacts, and familiarity with tools). (c) Finally, cost-benefit analysis methods are limited to considering only direct (or ‘first-order’) factors influencing the cost and/or benefit of a test activity; indirect (or ‘higher-order’) factors, side-effects, and feedback loops are usually excluded from the analysis.

In addition to publications reporting the results of cost-benefit analyses of test automation, we found several literature reviews and surveys focusing on collecting evidence about benefits and limitations of manual and automated testing [11, 12]. For example, in a recent in-depth study combining a literature review and a survey, Rafi et al. [11] found that benefits of test automation mainly relate to better reusability and repeatability of tests as well as higher test coverage and lower test execution cost. The biggest impediment to test automation was the high initial investment into test automation setup, tool selection and training. The authors also indicate that benefits of test automation often had support from experiments and case studies, while statements about limitations and impediments of test automation often originate from experience reports. Furthermore, the authors noticed that in their survey and follow-up interviews, responses from students and faculty in academia did not always align with responses received from engineers and managers in industry. With a stronger focus on the industry perspective, Taipale et al. [12] explored the current state of automation in software test organizations by collecting and analyzing views and observations of managers, testers and developers in several organizations. The authors found that although test automation is viewed as beneficial, it is not utilized widely in the companies. According to Taipale et al., the main benefits of test automation are improved code quality, the possibility to execute more tests in less time, and easy reuse of testware. The major disadvantages of test automation are related to the costs associated with developing test automation, especially in dynamic environments. In addition, properties of tested products, attitudes of employees, resource limitations, and the types of customers influenced the level of test automation found in the surveyed organizations.

While the results from the surveys and literature reviews are helpful to gain an understanding of the available empirical evidence and the opinions of researchers, engineers and managers on strengths and limitations of test automation, they do not provide guidance for companies on what to automate (and what not) in their specific context.

2.2 Software Testing Simulation Methods

Beginning with the seminal work by Kellner and Hansen [13] and Abdel-Hamid and Madnick [14] in the late 1980s, software process simulation has become an active and growing area of research [15] and many companies have experimented with using process simulation models as a management and decision-support tool. However, only a small number of the published simulation models focus on the analysis/improvement of software test processes. For example, Collofello et al. [16] proposed using process simulation (using an SD model) to identify and analyze factors responsible for testing costs. Among other factors, the type of software to be developed and associated quality requirements as well as the degree of thoroughness of testing contribute to costs and benefits of testing. Rus et al. [17] developed a process simulator that aimed to investigate the impact of management and reliability engineering decisions on the reliability of software products. Raffo and Wakeland [18] developed a process simulator that was used to investigate costs and benefits of inspections, verification, and validation activities in NASA software development projects. Garousi et al. [19] presented an extensible and customizable SD process simulation model that can be used to assess trade-offs of different combinations of verification and validation activities, including inspections and various types of test levels, with regard to project performance parameters such as effort, duration, and product quality. None of the aforementioned


publications investigated to what extent test automation influences the costs and benefits of testing.
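The break-even question underlying these cost-benefit analyses can be made concrete with a first-order cost model. The sketch below is our illustration, not a model taken from any of the cited studies; the function name and all numbers are invented, and it deliberately has the limitation criticized above, i.e., it reduces everything to one unit (person-hours) and ignores higher-order factors:

```python
def breakeven_runs(setup_cost: float, cost_auto_run: float,
                   cost_manual_run: float) -> float:
    """Number of test-suite runs after which cumulative automated cost
    drops below cumulative manual cost (first-order model: ignores
    maintenance, skills, quality, and other feedback effects)."""
    if cost_manual_run <= cost_auto_run:
        return float("inf")  # automation never pays off in this model
    return setup_cost / (cost_manual_run - cost_auto_run)

# Illustrative figures in person-hours: 200 h automation setup,
# 0.5 h per automated run, 8 h per manual run.
n = breakeven_runs(200, 0.5, 8)  # break-even after ~27 suite runs
```

Even in this toy form, the model makes visible why a suite that is executed rarely may never amortize its setup cost.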

During the past five years, several process simulators were developed modeling agile and lean software development processes (e.g., [20, 21]). This is related to our research, since most agile and lean processes suggest automating test activities (in particular unit testing). However, the published simulation models representing agile and lean development processes were neither designed (i.e., they lack structural detail about testing activities) nor used to investigate how the automation of one or several test activities influences the cost-benefit balance of software development projects.

2.3 Background on System Dynamics

System Dynamics (SD) was introduced by Forrester as a simulation modeling technique that applies the engineering principles of feedback and control to socio-technical systems [10]. In SD, a system is defined as a self-contained collection of elements that continually interact with each other. The two important aspects of SD models are structure and behavior. Structure is defined as the collection of components of a system and their relationships. The structure of the system may also include parameters representing external influences on the system’s behavior. Behavior is defined as the way in which the elements (or variables) composing the modeled system vary over time.

SD models describe a system in terms of ‘levels’ representing the states of the system (refer to Fig. 1). Levels are accumulators (or stocks) of incoming and outgoing ‘flows’ of material. The flow ‘rates’ describe how much material can flow into or out of a level per time step. For internal calculations, SD models can have ‘auxiliary’ variables. Model parameters are represented by so-called ‘constant’ variables. Constant variables have fixed values that cannot change due to internal system behavior. Their values can only be changed by the model user, and thus they are either used to represent external influence on the modeled system or to calibrate the model to empirically observed data which cannot be changed by the modeled system itself.

Figure 1. Symbols of System Dynamics Model

Typically, flows are calculated as mathematical functions of levels, (other) rates, auxiliary variables and constant variables. As the simulation advances in small time increments, it computes at each increment the changes in levels and flow rates. Thus, an SD process can be imagined as the approximation of a continuous, fluid-like process of a liquid accumulating into and flowing out of a container. In software development, for example, the defect generation rate may be treated as a flow rate and the current number of defects at any point in time could be treated as a level. When a flow originates from outside the scope of the system to be modeled, the flow’s origin is called a ‘source’ and represented by a cloud symbol. Similarly, when the destination of a flow is outside the scope of modeling, it is called a ‘sink’.

There exist several professional SD modeling and simulation tools, e.g., Vensim, PowerSim, iThink and STELLA. In our case study, we used Vensim as it seems to be the SD tool with the most comprehensive set of analysis functionality.
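The level/rate mechanics described above can be illustrated in a few lines of code. This is a generic sketch of time-incremented stock-and-flow integration (it is ours, not an excerpt of our Vensim model), using the defect-level example mentioned earlier; all names and numbers are invented:

```python
def simulate_defect_level(gen_rate: float, detect_frac: float,
                          steps: int, dt: float = 0.25) -> float:
    """One SD level integrated in small time increments:
    d(defects)/dt = inflow (generation) - outflow (detection)."""
    defects = 0.0  # the level (stock), initially empty
    for _ in range(steps):
        inflow = gen_rate                # flow rate from a 'source'
        outflow = detect_frac * defects  # flow rate to a 'sink'
        defects += (inflow - outflow) * dt
    return defects

# With 4 defects/day generated and 50% of open defects detected per
# day, the level approaches the equilibrium 4 / 0.5 = 8 open defects.
level = simulate_defect_level(gen_rate=4.0, detect_frac=0.5, steps=400)
```

Note how the outflow depends on the level itself; this self-regulating feedback is exactly the kind of structure that single-unit cost-benefit calculations cannot express.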

[Figure 2 (see caption below) is a UML activity diagram contrasting manual testing (test-case design, test scripting producing manual test scripts, manual test execution, exploratory testing, test evaluation, fault localization, and co-maintenance of the test suite) with automated testing (test-case design, test scripting producing test scripts, test execution, test evaluation, test-code refactoring a.k.a. perfective maintenance, and co-maintenance a.k.a. test repair). The diagram annotates ‘uses’ and ‘triggers’ relations among activities, the SUT artifact, developer changes and SUT maintenance, pass/fail test results, fault detection effectiveness, cost-incurring activities (effort), and activities/data providing benefit.]

Figure 2. Software testing reference process

3. TEST PROCESS REFERENCE MODEL

Based on standard textbooks on software testing and incorporating different views and classifications [22-24], we developed a test process reference model which is shown in the form of a UML activity diagram in Fig. 2. As per their nature, manual and automated testing include different activities. By synthesizing the views and classifications found in the literature, we divide the testing process into the following activities:

1. Test-case design: preparing test data and developing test procedures.
2. Test scripting: documenting test cases in manual test scripts or test code for automated testing.
3. Test execution: running test cases on the software under test (SUT) and recording the results.
4. Test evaluation: evaluating the results of testing (pass or fail), also known as test oracle or test verdict.
5. Test co-maintenance: repairing existing manual test cases and test scripts when the SUT is maintained, or adding new test cases.
6. Test-code refactoring (only for automated testing): perfective maintenance of code that is needed for automated test script execution.

Note that the software testing reference model as presented in Fig. 2 does not distinguish between various test levels (e.g., unit testing versus system testing). This distinction is not needed since the model can simply be instantiated to the test level of interest.
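As a toy illustration (ours, not part of the reference model itself), the activities above can be encoded as an ordered pipeline in which test-code refactoring appears only in the automated variant:

```python
from enum import Enum

class TestActivity(Enum):
    DESIGN = "test-case design"
    SCRIPTING = "test scripting"
    EXECUTION = "test execution"
    EVALUATION = "test evaluation"
    CO_MAINTENANCE = "test co-maintenance"
    REFACTORING = "test-code refactoring"

def activities(automated: bool) -> list[TestActivity]:
    """Activities of one test cycle; refactoring applies to
    automated testing only (cf. activity 6 in the list above)."""
    core = [TestActivity.DESIGN, TestActivity.SCRIPTING,
            TestActivity.EXECUTION, TestActivity.EVALUATION,
            TestActivity.CO_MAINTENANCE]
    return (core + [TestActivity.REFACTORING]) if automated else core
```

This mirrors the point made below that the manual and automated processes share a common structure and differ mainly in parameterization.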

Fig. 2 also shows related activities performed by testers and developers in response to the test activities, i.e., fault localization and maintenance of the SUT. In each test activity, artifacts are produced. First, in the test-case design step, test cases are designed using either black-box or white-box test approaches. The output of this step is a test suite, forming the input to the next step, test scripting. Test scripts, the output of the test scripting step, are the set of instructions that are performed on the SUT. When the test scripts are executed, test results are generated and – based on a test oracle – verdicts (pass or fail decisions) are reported to developers. Maintenance activities on test suites are needed for two reasons: (1) new test cases and scripts have to be added due to changes in software requirements, or (2) it turns out that test cases and/or scripts have to be corrected. In the case of automated testing, in addition to maintenance, test-code refactoring activities might also be needed.

4. DEVELOPMENT OF SYSTEM DYNAMICS (SD) MODEL

The activities defined for software testers in the software testing reference process set the scope of our SD model, which thus reflects the basic building blocks of a testing process. Similar to what we mentioned at the end of the previous section about the possibility to instantiate the software testing reference model, the SD model described in this section can be instantiated accordingly. For the specific application at Pason, our case company (cf. Section 5), we focus mainly on system testing activities.

To make the SD model structurally uniform, independent of the decision whether a test activity is performed manually or automatically, we subsumed the activity ‘test-code refactoring’ under ‘test maintenance’. By parameterizing and initializing elements (i.e., testing activities) of the SD model differently for manual and automated testing, we can compare these two options, for example, by checking how many test cases can be executed in the same period of time, how much time can be devoted to each activity, how much time can be saved as a whole, or any other aspect of interest from the perspective of the test manager. In order to represent the software testing reference process in our SD model, we had to identify levels (or: stocks) and flow rates (attached to the in- and out-flows of a level). This was done in a straightforward manner. After subsuming the refactoring step under maintenance, and after sub-dividing the execution step into

[Figure: excerpt of the SD model, showing levels (Planned Test Cases, Test Suite, Scripts, Test Results, Passing Tests, Failing Tests, New Test Cases, Reports) connected by flow rates (Initial Test Case rate, Designing, Scripting, Executing, Passing, Failing, Evaluating and Reporting Passed/Failed), plus the auxiliary elements Reusing and Scripts Cycle Checker.]
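To give a flavor of how such a level/rate structure supports comparing differently parameterized process variants, here is a deliberately tiny two-level pipeline (our illustration with invented rates, not the calibrated Pason model): scripting moves test cases from a planned pool into a scripted stock, and execution drains the scripted stock into an executed count.

```python
def executed_after(days: int, planned: int,
                   script_rate: float, exec_rate: float) -> int:
    """Two-level pipeline: planned -> scripted -> executed.
    Rates are test cases per day; each flow is capped by what is
    available in its upstream level (invented illustration)."""
    scripted, executed = 0.0, 0.0
    remaining = float(planned)
    for _ in range(days):
        s = min(script_rate, remaining)  # scripting flow this day
        e = min(exec_rate, scripted)     # execution flow this day
        remaining -= s
        scripted += s - e
        executed += e
    return int(executed)

# Invented parameterizations: 'manual' scripts quickly (no test code)
# but executes slowly; 'automated' scripts slowly up front but then
# executes in bulk.
manual = executed_after(days=30, planned=100, script_rate=10, exec_rate=2)
automated = executed_after(days=30, planned=100, script_rate=4, exec_rate=50)
```

With these made-up rates, the automated variant finishes all 100 planned cases within the period while the manual variant does not, which is the kind of trade-off the full SD model exposes across many more variables.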
