
Multi-Objective Test Case Prioritization for GUI Applications

Wei Sun, Zebao Gao, Weiran Yang, Chunrong Fang, Zhenyu Chen
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Software Institute, Nanjing University, Nanjing, China

[email protected]

ABSTRACT


Test case prioritization techniques are proposed to schedule the execution of test cases in order to improve testing effectiveness. Various coverage criteria are used as surrogates for test case prioritization; they are expected to improve testing effectiveness by satisfying the surrogates as early as possible. In test case prioritization, statement coverage is widely used for conventional applications, and event coverage is applied for GUI applications. GUI applications differ from conventional applications in the interactions between front-end events and back-end code. Such complex interactions make a single-objective strategy insufficient for testing GUI applications in many cases. In this paper, we propose a multi-objective test case prioritization strategy that combines event coverage and statement coverage for GUI applications. The preliminary experimental results show that our multi-objective strategy can outperform the single-objective strategies.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging - Testing tools

General Terms
Measurement, Reliability, Experimentation

Keywords
GUI testing, test case prioritization, statement coverage, event coverage, multi-objective strategy

1. INTRODUCTION

Graphical User Interfaces (GUIs) have been widely applied in modern software. GUIs make interactions between users and software easy and flexible. In order to ensure the quality of GUI applications, we need manual or automatic testing techniques [1]. Some techniques have been proposed to automatically generate GUI test cases [2]. Still, in most current industrial practices, testers manually create a large number of scripts for GUI system testing with tools such as Quick Test Professional (QTP)¹ and Selenium². In regression testing, the test tools can automatically execute the test cases to simulate actual user interactions with GUI applications. For daily smoke testing or frequent deployment, test cases need to be executed repeatedly in some software systems. It is unreasonable to execute all test cases due to the high cost. A common way is to execute early the strong test cases, which are more likely to reveal more faults in the applications. Many test case prioritization strategies have been proposed in the past years [3, 4, 5]. Coverage criteria, including statement coverage, branch coverage, etc., are used as surrogates for test case prioritization. Among the existing strategies, statement coverage may be the most widely used one [6]. On the other hand, users trigger a sequence of events as the input of GUI applications. To test the various GUI states adequately, testers use tools to record event sequences, which are categorized according to system functions to create test scripts [2]. So event coverage is also used as a criterion for test case prioritization. Previous work treated test case prioritization as a single-objective optimization problem of event coverage [7]. GUI applications always have many interactions between front-end events and back-end code. The risk of insufficient testing of the back-end code increases if only the event coverage strategy is applied. Hence, we are inspired to introduce a multi-objective test case prioritization strategy for GUI applications.

In this paper, we focus on the problem of test case prioritization for GUI applications. We present an empirical study and analysis to compare the effectiveness of two single-objective test case prioritization strategies: statement-based and event-based. In order to achieve more sufficient testing of both front-end events and back-end code, we propose a new multi-objective strategy that combines these two single objectives. A novel feature of our strategy is that it takes cost into consideration. The execution time of testing has been considered an important cost driver in past research [8, 9]. In our experiment, a fixed time interval between events is set when test cases are created, so we use the number of events to estimate the cost of a test case. As a result, a new fitness function for multi-objective test case prioritization is proposed. The main contributions of this paper are as follows.


• We propose a novel multi-objective prioritization strategy that combines two objectives, statement coverage and event coverage, for GUI applications. The time cost is used to balance the two objectives, and a new multi-objective fitness function is proposed.

• An empirical study was conducted to show the effectiveness of our multi-objective strategy. We also analyze the effectiveness of the statement-based strategy and the event-based strategy. The inconsistency in the fault detection capability of these two single-objective strategies motivates our multi-objective strategy.

The rest of the paper is organized as follows. In the next section, we describe an example motivating our multi-objective strategy. In Section 3, we present the framework of our strategy and then describe the techniques in detail. The empirical study is described in Section 4. Section 5 describes related work. Section 6 presents the conclusion and future work.

¹ http://www.hp.com/go/qtp
² http://selenium.openqa.org


2. MOTIVATING EXAMPLE

2.1 An Example of Calculator

Our novel multi-objective strategy is designed to balance the inconsistency between, and improve the performance of, the single-objective strategies. In this section, we take the well-known GUI application Calculator as an example to illustrate the inconsistency between statement coverage and event coverage. Table 1 shows two test cases in the test pool of Calculator. Test case 1 is a simple addition operation calculating 12345 + 678. In this test case, a total of 9 events are triggered, while only 2 functions, getInputNum() and add(o1, o2), are covered in the back-end code. Test case 2 uses the M+ operation to calculate 1 M+ 1. M+ is a basic function of Calculator that enables users to add a number to the previous result stored in memory. Test case 2 proceeds as follows. The user presses 1, then presses M+, and Calculator writes the current result into memory. The user then presses 1 again, and Calculator reads the value stored in memory and adds it to the new input number. So in this test case, only 2 distinct events are triggered, while 4 functions, getInputNum(), writeMem(), readMem(), and add(o1, o2), are covered. This example shows that high statement coverage does not necessarily lead to high event coverage, and high event coverage does not necessarily result in high statement coverage.

Table 1: Statement Coverage and Event Coverage
Test Case       | Triggered Events                                                                 | Covered Statements
TC1 (12345+678) | press 1, press 2, press 3, press 4, press 5, press +, press 6, press 7, press 8 | getInputNum(), add(o1, o2)
TC2 (1 M+ 1)    | press 1, press M+                                                                | getInputNum(), writeMem(), readMem(), add(o1, o2)

Table 2: Information of Test Cases
TCs | Covered Statement Blocks (of 10) | Cost
TC1 | 5                                | 5
TC2 | 6                                | 4
TC3 | 5                                | 4
TC4 | 2                                | 2

2.2 Additional Greedy Algorithm

To validate a GUI application, a possible approach is to rerun all test cases in the test pool. However, this retest-all approach is extremely time consuming. The time limitation motivates test case selection and prioritization approaches. In the experiment, we implement the three strategies presented in the previous sections as Java programs using the additional greedy algorithm. Test cases in the test pool are prioritized by our programs. In this section, we take statement coverage as an example to illustrate how our approach works. Table 2 shows the statement coverage and cost of the test pool, which contains four alternatives. The additional greedy algorithm is used to maximize additional coverage per cost, where coverage is measured as the fraction of the not-yet-covered statement blocks. The first choice is TC2, which has an additional coverage per cost value of 0.6/4 = 0.15 (TC1, TC3, and TC4 have 0.1, 0.125, and 0.1, respectively). After we choose TC2, 4 statement blocks are still not covered. The second choice is TC4, with an additional coverage per cost value of 0.5/2 = 0.25 (TC4 covers 2 of the 4 statement blocks left uncovered by TC2), while TC1 has 0.75/5 = 0.15 and TC3 has 0.75/4 = 0.1875. Finally, we choose TC3, with an additional coverage per cost value of 1.0/4 = 0.25. At this point the algorithm reaches 100 percent statement coverage. With the additional greedy algorithm, we have selected TC2, TC4, and TC3 from the test pool. A runnable sketch of this selection is shown below.
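As a concrete illustration, here is a minimal Python sketch of the cost-aware additional greedy selection described above. This is not the paper's implementation (which is in Java), and the block identities B0 to B9 are hypothetical, chosen only to be consistent with the coverage counts and round-by-round values in the worked example.

```python
def additional_greedy(coverage, cost, universe):
    """Repeatedly pick the test case with the highest additional coverage
    per cost; coverage is measured as the fraction of not-yet-covered blocks."""
    selected, covered = [], set()
    remaining = set(coverage)
    while covered != universe and remaining:
        def fitness(tc):
            gained = len(coverage[tc] - covered)
            return gained / len(universe - covered) / cost[tc]
        best = max(remaining, key=fitness)
        if fitness(best) == 0:
            break  # no remaining candidate adds new coverage
        selected.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return selected

# The Table 2 scenario: ten statement blocks; per-test-case coverage counts
# (TC1: 5, TC2: 6, TC3: 5, TC4: 2) recovered from the worked example.
# The concrete block assignments below are assumptions for illustration.
universe = {f"B{i}" for i in range(10)}
coverage = {
    "TC1": {"B2", "B3", "B6", "B7", "B8"},
    "TC2": {"B0", "B1", "B2", "B3", "B4", "B5"},
    "TC3": {"B0", "B1", "B6", "B8", "B9"},
    "TC4": {"B6", "B7"},
}
cost = {"TC1": 5, "TC2": 4, "TC3": 4, "TC4": 2}
print(additional_greedy(coverage, cost, universe))  # ['TC2', 'TC4', 'TC3']
```

Running the sketch reproduces the three rounds above: TC2 (0.15), then TC4 (0.25), then TC3 (0.25), after which all ten blocks are covered and TC1 is never selected.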


3. MULTI-OBJECTIVE STRATEGY

The test case prioritization procedure is shown in Figure 1. We first initialize the test pool. As we traverse the whole test pool, we get the test case t with the maximum f(t, T), where f(t, T) indicates the fitness of a candidate test case with respect to the objective, based on the currently selected test suite T; the formal definition is given below. If t finds new faults, we record the newly found faults. The procedure ends when all faults are found. Otherwise, test case t is added to the current test suite and removed from the test pool. Then the new coverage of the current test suite is calculated. If the coverage reaches 100 percent, it is reset to 0. The procedure then restarts the traversal with the reduced test pool.

Figure 1: Procedure of Multi-objective Strategy

To determine which test case has priority to be selected, we define fitness functions that evaluate the fitness of a test case with respect to the objectives of the different strategies. The fitness functions for the three strategies mentioned above are defined as follows. Let S denote the set of all statements and E the set of all events in an application. A Covered Statement Set (CSS) is the set of statements covered by a test case t or a test suite T, and a Covered Event Set (CES) is the set of events triggered by a test case t or a test suite T. That is, CSS(T) = {Si | Si is covered by a test case in T} and CES(T) = {Ei | Ei is triggered by a test case in T}. We introduce the Additional Statement Coverage ASC(t, T) and the Additional Event Coverage AEC(t, T) of a candidate test case t on the current test suite T, where T is formed by all the currently selected test cases. ASC(t, T) is the number of statements that are covered by t but not covered by T, and AEC(t, T) is the number of events that are triggered by t but not triggered by T. That is, ASC(t, T) = |CSS({t}) - CSS(T)| and AEC(t, T) = |CES({t}) - CES(T)|. We use Cost(t) to denote the execution cost of test case t, measured by the number of events of t. We define the fitness function of the statement coverage strategy, fSC(t, T), and the fitness function of the event coverage strategy, fEC(t, T), for a test case t on a selected test suite T as follows:

fSC(t, T) = ASC(t, T) / Cost(t)    (1)

fEC(t, T) = AEC(t, T) / Cost(t)    (2)

For the multi-objective formulation, the two objectives are combined into a single objective using the classical weighted-sum approach. For most test cases, the value of ASC is much larger than the value of AEC, so the sum of the two would be decided mainly by ASC. We therefore standardize ASC and AEC by dividing them by the number of all statements in S and the number of all events in E, respectively. The standardized ASC and AEC are combined with coefficients of 0.5 and 0.5, so that equal weight is given to each objective. We define the standardized Additional Multi-objective Coverage of the candidate test case t on test suite T, st_AMC(t, T), as:

st_AMC(t, T) = ASC(t, T) / (2 Card(S)) + AEC(t, T) / (2 Card(E))    (3)

Now we define the fitness function fMO(t, T) of the multi-objective strategy for a test case t on a selected test suite T:

fMO(t, T) = st_AMC(t, T) / Cost(t)    (4)

To measure the effectiveness of the three strategies, we record the number of selected test cases when each fault is found. In this way, we can compare the fault detection ability of each strategy. A Python sketch of these fitness functions and of the selection procedure is given below.
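To make the definitions concrete, the following Python sketch implements the three fitness functions and the selection loop of Figure 1. It is a minimal sketch under stated assumptions: the authors' tooling is Java, and each test case is represented here as a dict with its covered statement set css, triggered event set ces, and cost; these names are illustrative, not the paper's code.

```python
import random

# Fitness functions (1), (2) and (4); suite_css/suite_ces hold the
# statements and events already covered by the selected suite T.
def f_sc(t, suite_css):                      # statement coverage strategy (1)
    return len(t["css"] - suite_css) / t["cost"]

def f_ec(t, suite_ces):                      # event coverage strategy (2)
    return len(t["ces"] - suite_ces) / t["cost"]

def st_amc(t, suite_css, suite_ces, n_stmts, n_events):    # definition (3)
    # standardize ASC and AEC by Card(S) and Card(E), with 0.5/0.5 weights
    return (len(t["css"] - suite_css) / (2 * n_stmts)
            + len(t["ces"] - suite_ces) / (2 * n_events))

def f_mo(t, suite_css, suite_ces, n_stmts, n_events):      # definition (4)
    return st_amc(t, suite_css, suite_ces, n_stmts, n_events) / t["cost"]

def prioritize(pool, n_stmts, n_events):
    """Order the pool by repeatedly taking a candidate with maximum f_MO,
    resetting the accumulated coverage once it reaches 100 percent."""
    order, css, ces = [], set(), set()
    pool = list(pool)
    while pool:
        best = max(f_mo(t, css, ces, n_stmts, n_events) for t in pool)
        # ties are broken randomly, as in the sampling procedure of Section 4.5
        t = random.choice([u for u in pool
                           if f_mo(u, css, ces, n_stmts, n_events) == best])
        pool.remove(t)
        order.append(t)
        css |= t["css"]
        ces |= t["ces"]
        if len(css) == n_stmts and len(ces) == n_events:
            css, ces = set(), set()          # coverage reset
    return order
```

The procedure in Figure 1 additionally records newly found faults and stops once all faults are found; the sketch simply orders the entire pool.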

4. EMPIRICAL STUDY

To explore the effectiveness of the different strategies, an empirical experiment was carried out. Section 4.1 sets out the research questions. Sections 4.2 and 4.3 describe the subjects and their faults. Section 4.4 explains how the test cases were created. Section 4.5 describes the sampling technique of the experiment. In Section 4.6, we evaluate the results of the experiment.

4.1 Research Questions

RQ1: A test case prioritization strategy with good performance should achieve early fault detection with fewer test cases. What are the performances of the statement-coverage-based and event-coverage-based prioritization strategies in finding all faults?

RQ2: What is the effectiveness of the multi-objective test case prioritization strategy? Does the multi-objective strategy achieve a significant improvement over the two single-objective strategies?

4.2 Applications Under Test (AUTs)

We choose two popular GUI applications, Crossword Sage (CS)³ and OmegaT (OT)⁴, as our AUTs for this empirical study. Both applications are open source and were obtained from SourceForge⁵. In a previous study, both applications were used in experiments and worked well [10]. Table 3 shows the information of the two GUI applications. For each AUT, the lines of code (LOC), the total number of different widgets and events on the GUIs, and the number of faults are counted and listed.

Table 3: Information of the two AUTs
AUT          | LOC   | Widgets | Events | Faults
CS-v0.3.3    | 1419  | 129     | 158    | 10
OT-v1.8.1_07 | 15474 | 337     | 376    | 20

³ http://crosswordsage.sourceforge.net/
⁴ http://www.omegat.org/
⁵ http://www.sourceforge.net/

4.3 Faults

Fault reports for both GUI applications were collected from users and reported to the developers on SourceForge. In this study, there are two categories of faults to be detected: failures and unhandled exceptions. Failures are originally located in the applications and may result in malfunctions or unexpected breakdowns. Besides, we choose some unhandled exceptions that are considered to threaten the behavior of the application. For example, some unhandled exceptions in event listeners may cause the application to fail to respond properly to end users. Also, unhandled exceptions in the business logic code may lead to an incorrect system state when exceptional inputs are given. A recent study has shown that exceptions can be regarded as an important type of software fault [11]. In CS, we collected 3 original failures and 7 unhandled exceptions as faults. In OT, 20 unhandled exceptions were found and listed as faults in our experiment.


4.4 Test Cases

Undergraduate students majoring in software engineering were organized to create the original test cases. Before they started to create test cases, training was given on how to use QTP. After that, the participants were guided to create test cases referring to a given list of function sets of the AUTs. Later, we manually examined the quality of all test cases and discarded the broken ones. In this way, the qualified test cases were collected to form the test pool for the two AUTs. All participants were asked to start the applications in a known start state.

Table 4: Test Pool of Crossword Sage and OmegaT
AUT          | Test Pool | Total Events | Detected Faults
CS-v0.3.3    | 455       | 12304        | 10
OT-v1.8.1_07 | 429       | 15474        | 20

Table 4 shows basic information about the test case pool for the two AUTs. Column Test Pool gives the number of test cases in the test pool. Column Total Events gives the total number of events triggered by the test pool.

4.5 Sampling

For each research question, the test case prioritization procedure is sampled many times. As shown in Section 3, in each selection cycle there may be several test cases with the same fitness value f(t, T) based on the current test suite T. If several candidates have the same maximum f(t, T), we randomly pick one test case from the candidates. A previous study showed that the sampling frequency should be at least 30 to reach a reliable value [12]. Following this guideline, for each AUT we run 100 iterations of the procedure shown in Figure 1 for all three strategies. A sketch of this sampling loop is given below.
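The sketch below illustrates this sampling loop under stated assumptions: it reuses the prioritize function from the Section 3 sketch, and faults_of is a hypothetical mapping from a test case to the set of faults it reveals. It assumes, as in the paper's procedure, that every run eventually finds all faults.

```python
import statistics

# Illustrative sampling loop: repeat the randomized prioritization `runs`
# times; in each iteration record, for every fault, the number of test
# cases selected when it is first found, then average across iterations.
def sample_runs(pool, n_stmts, n_events, faults_of, all_faults, runs=100):
    samples = []
    for _ in range(runs):
        order = prioritize(pool, n_stmts, n_events)
        first_found, found = {}, set()
        for i, t in enumerate(order, start=1):
            for fault in faults_of(t) - found:
                first_found[fault] = i   # position at which fault is first found
            found |= faults_of(t)
            if found == all_faults:
                break
        samples.append(first_found)
    # average number of selected test cases per fault across all runs
    return {f: statistics.mean(s[f] for s in samples) for f in all_faults}
```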

4.6 Result and Analysis

An effective strategy can find more faults than others with equal or less cost. To measure fault detection capability, we keep track of the test cases selected by the procedure. In this experiment, we use random sampling with a sampling frequency of 100. In each iteration, we record the number of test cases selected when each fault is found, until all faults are found; this forms one sample. We then average the 100 samples to get the final result. Figure 2 illustrates the comparison of the three strategies, based on statement coverage, event coverage, and multiple objectives, for each AUT. In this study, we apply a T-test (α = 0.05), i.e., with a confidence level of 95% (a sketch of this comparison follows Figure 2). The statistical results for the two RQs are shown in Table 5 and Table 6. In both tables, Stmt denotes the statement coverage strategy, Evt denotes the event coverage strategy, and MO denotes the multi-objective strategy. Figure 2 shows that in both AUTs, the multi-objective strategy uses the fewest test cases to find all faults. The effectiveness of the statement coverage strategy and the event coverage strategy differs between CS and OT. Statistical analysis is provided in the answers to the RQs below.

Figure 2: Fault Detection Capability
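Such a comparison can be run with a standard one-sided two-sample t-test. The sketch below uses SciPy and assumes apfd_a and apfd_b are the 100 per-iteration APFD values of the two strategies being compared; the paper does not name its statistics tooling, and the Welch variant (which does not assume equal variances) is our choice here.

```python
from scipy import stats

# One-sided two-sample t-test at alpha = 0.05: reject H0 (mean(a) <= mean(b))
# when the one-sided p-value is below alpha.
def compare(apfd_a, apfd_b, alpha=0.05):
    t, p_two_sided = stats.ttest_ind(apfd_a, apfd_b, equal_var=False)
    # convert the two-sided p-value into the one-sided p-value for mean(a) > mean(b)
    p = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
    return p, p < alpha
```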

4.6.1 To Address RQ1

Comparing statement coverage and event coverage: does one coverage criterion perform better than the other in test case prioritization? The experimental results provide a mixed answer to RQ1. In Figure 3, the Average Percentage of Faults Detected (APFD) metric is used to determine the effectiveness of the test case prioritization (a sketch of the computation follows Table 5). According to the box plot in Figure 3, we give the alternative hypotheses:
H1: In CS, event coverage performs better than statement coverage.
H2: In OT, statement coverage performs better than event coverage.
Then we formulate the corresponding null hypotheses:
H01: In CS, event coverage performs worse than or equal to statement coverage.
H02: In OT, statement coverage performs worse than or equal to event coverage.
Table 5 shows the statistical result of the comparison of the two single-objective strategies on the two AUTs. In CS, the event coverage strategy performs significantly better than the statement coverage strategy. However, the result is the opposite in OT. The explanation for this phenomenon lies in the different features of the two AUTs. As shown in Table 3, the LOC of OT is ten times larger than the LOC of CS. On the other hand, compared with OT, the complexity of the GUIs in CS is greater than the complexity of its business logic. The event coverage strategy is more likely to trigger faults in the GUIs or in the code of event listeners. On the contrary, the statement coverage strategy may perform better at finding faults in the business logic. So the inconsistency between the two single-objective strategies is revealed in the two AUTs.

Figure 3: APFD

Table 5: Statistical Test for RQ1
AUT | Strategy | Mean | SD     | p-value | H0
CS  | Stmt     | 0.87 | 0.011  | 9E-63   | Reject
    | Evt      | 0.92 | 0.0092 |         |
OT  | Stmt     | 0.80 | 0.0016 | 2E-21   | Reject
    | Evt      | 0.79 | 0.015  |         |
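The paper does not spell out the APFD formula, so the sketch below uses the standard definition: for n test cases and m faults, with TF_i the position in the prioritized order of the first test case revealing fault i, APFD = 1 - (TF_1 + ... + TF_m)/(n*m) + 1/(2n).

```python
# Standard APFD: `positions` holds, for each fault, the 1-based index of
# the first test case in the prioritized order that reveals it.
def apfd(positions, n_tests):
    m = len(positions)
    return 1 - sum(positions) / (n_tests * m) + 1 / (2 * n_tests)

# Example: 5 prioritized test cases and 3 faults first revealed at
# positions 1, 2 and 4 give APFD = 1 - 7/15 + 1/10, about 0.633.
print(round(apfd([1, 2, 4], 5), 3))  # 0.633
```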

4.6.2 To Address RQ2

The result provides a positive answer to RQ2. According to the box plot, although either statement coverage or event coverage may perform better in different AUTs, the multi-objective strategy steadily outperforms both single-coverage-based strategies. Since the performances of the two single-objective strategies have already been compared, for RQ2 we compare the multi-objective strategy with the winner of RQ1. We propose the following hypotheses:
H1: In CS, the multi-objective strategy performs better than event coverage.
H2: In OT, the multi-objective strategy performs better than statement coverage.
Then we formulate the corresponding null hypotheses:
H01: In CS, the multi-objective strategy performs worse than or equal to event coverage.
H02: In OT, the multi-objective strategy performs worse than or equal to statement coverage.


Table 6 shows the result of the T-test validating the significance of the performance.

Table 6: Statistical Test for RQ2
AUT | Strategy | Mean | SD     | p-value | H0
CS  | MO       | 0.93 | 0.0036 | 9E-10   | Reject
    | Evt      | 0.92 | 0.0092 |         |
OT  | MO       | 0.87 | 5E-4   | 0       | Reject
    | Stmt     | 0.80 | 0.0016 |         |

Some previous work [13] pointed out that the testing criteria for GUI testing differ from those for traditional console application testing. GUIs provide great facilities to users but bring extra difficulty in testing the system as a whole, including the GUI code at the front end, the business logic code at the back end, and the code of the “controllers” connecting the two ends. We find that in our AUTs, a simple GUI event sequence may need a great amount of logic code to respond, and the logic code can differ greatly when the tester simply changes a parameter on the GUI. Thus faults in the logic code are more likely to be revealed by statement checking. In some other cases, a small segment of business logic can be triggered only after a complex series of GUI events; such test cases are more likely to be selected by the event coverage strategy. In addition, many parts of the business logic code are designed for reuse, so simply executing the code many times is not enough to reveal some hidden faults in abnormal states of the GUI application, while triggering the reused code via various events on the GUIs in different contexts may help to detect such faults.

4.7 Threats to Validity

One threat to external validity comes from treating exceptions as faults. Besides the original faults reported for both applications, we select some of the unhandled exceptions as faults to better evaluate the effectiveness of the three test case prioritization strategies. The exceptions we select are shown to have the potential to result in malfunctions in both AUTs, so these unhandled exceptions share features with faults to some extent. Another threat to external validity is mainly due to the two AUTs used as our subjects; these applications may not represent the wide range of possible GUI applications. A threat to construct validity lies in the measurement of the cost of each test case. To evaluate whether the event number can be used to approximate the cost of a test case, we conducted a sampling survey. The result showed that event numbers can properly approximate the execution time of test cases.


5. RELATED WORK

Test suite minimization, test case selection, and prioritization have different focuses and concerns in the testing area. Yoo and Harman surveyed various test suite minimization, selection, and prioritization techniques in [3]. While rich coverage criteria exist for conventional testing, Memon et al. presented new coverage criteria using events to measure the adequacy of GUI tests [14]. With novel algorithms, they constructed event flows to evaluate the adequacy of tests on different events. They also proposed interaction-coverage-based test suite prioritization [15] and, in another work, studied a call-stack coverage strategy for GUI test suite reduction [16]. To the best of our knowledge, however, the existing techniques treat coverage of the code and coverage of the GUI separately. Previous work treated test case selection as a single-objective optimisation problem. The Pareto efficiency approach, which takes multiple objectives such as code coverage, past fault-detection history, and execution cost, was introduced to test case selection by Yoo and Harman [17]. In their later work [18], a hybrid multi-objective genetic



algorithm was proposed to produce higher-quality Pareto fronts in test suite minimisation. Harman presented several examples to argue that regression test optimization problems such as selection and prioritization require multi-objective optimization in order to adequately cater for real-world regression testing scenarios [19]. In our work, a multi-objective algorithm is proposed for test case prioritization for GUI applications. Considering the special characteristics of GUI applications, we take two objectives, statement coverage and event coverage, to help reveal faults in front-end events and back-end code, respectively. In addition, unlike Yoo and Harman's previous work [17], since we care more about how effectively the different strategies find faults, we use fault detection capability instead of Pareto fronts to evaluate the effectiveness of the different test case prioritization strategies.


6. CONCLUSION AND FUTURE WORK

In this paper we compare the effectiveness of two single-objective test case prioritization strategies, one based on statement coverage and the other based on event coverage. An experiment was conducted to evaluate the fault detection capability of the two strategies, and it reveals the inconsistency between them. In addition, a novel multi-objective strategy based on these two strategies is proposed. Our experiment shows that the new strategy performs better than both single-objective strategies. In our current experiment, we chose statement coverage and event coverage as the representatives of code coverage and GUI coverage, respectively, mainly because they are simple and efficient, and both criteria are well recognized [13]. More strategies will be applied, and the fitness function will be optimized, in future work.


7. ACKNOWLEDGMENTS The work described in this article was partially supported by the National Natural Science Foundation of China (61003024, 61170067).

8. REFERENCES

[1] A. Bertolino, "Software testing research: Achievements, challenges, dreams," in Proceedings of Future of Software Engineering (FOSE'07), 2007, pp. 85–103.
[2] X. Yuan, M. Cohen, and A. Memon, "GUI interaction testing: Incorporating event context," IEEE Transactions on Software Engineering, vol. 37, no. 4, pp. 559–574, 2011.
[3] S. Yoo and M. Harman, "Regression testing minimization, selection and prioritization: a survey," Software Testing, Verification and Reliability, vol. 22, no. 2, pp. 67–120, 2012.
[4] S. Li, N. Bian, Z. Chen, D. You, and Y. He, "A simulation study on some search algorithms for regression test case prioritization," in Proceedings of the International Conference on Quality Software (QSIC'10), 2010, pp. 72–81.
[5] C. Fang, Z. Chen, and B. Xu, "Comparing logic coverage criteria on test case prioritization," Science China Information Sciences, 2012.
[6] S. Elbaum, A. Malishevsky, and G. Rothermel, "Test case prioritization: A family of empirical studies," IEEE Transactions on Software Engineering, vol. 28, no. 2, pp. 159–182, 2002.
[7] R. C. Bryce, S. Sampath, and A. M. Memon, "Developing a single model and test prioritization strategies for event-driven software," IEEE Transactions on Software Engineering, vol. 37, no. 1, pp. 48–64, 2011.
[8] K. Walcott, M. Soffa, G. Kapfhammer, and R. Roos, "Time-aware test suite prioritization," in Proceedings of the 2006 International Symposium on Software Testing and Analysis. ACM, 2006, pp. 1–12.
[9] D. You, Z. Chen, B. Xu, B. Luo, and C. Zhang, "An empirical study on the effectiveness of time-aware test case prioritization techniques," in Proceedings of the ACM Symposium on Applied Computing (SAC'11), 2011, pp. 1451–1456.
[10] A. Memon, "Automatically repairing event sequence-based GUI test suites for regression testing," ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 18, no. 2, p. 4, 2008.
[11] P. Zhang and S. Elbaum, "Amplifying tests to validate exception handling code," in Proceedings of the 2012 International Conference on Software Engineering. IEEE Press, 2012, pp. 595–605.
[12] A. Arcuri and L. Briand, "A practical guide for using statistical tests to assess randomized algorithms in software engineering," in Proceedings of the 33rd International Conference on Software Engineering (ICSE). IEEE, 2011, pp. 1–10.
[13] A. Memon, M. Soffa, and M. Pollack, "Coverage criteria for GUI testing," in ACM SIGSOFT Software Engineering Notes, vol. 26, no. 5. ACM, 2001, pp. 256–267.
[14] A. Memon, "GUI testing: Pitfalls and process," IEEE Computer, vol. 35, no. 8, pp. 87–88, 2002.
[15] R. Bryce and A. Memon, "Test suite prioritization by interaction coverage," in Workshop on Domain-Specific Approaches to Software Test Automation (DOSTA), in conjunction with the 6th ESEC/FSE joint meeting. ACM, 2007, pp. 1–7.
[16] S. McMaster and A. Memon, "Call-stack coverage for GUI test suite reduction," IEEE Transactions on Software Engineering, vol. 34, no. 1, pp. 99–115, 2008.
[17] S. Yoo and M. Harman, "Pareto efficient multi-objective test case selection," in Proceedings of the International Symposium on Software Testing and Analysis (ISSTA'07), 2007, pp. 140–150.
[18] S. Yoo and M. Harman, "Using hybrid algorithm for pareto efficient multi-objective test suite minimisation," Journal of Systems and Software, vol. 83, no. 4, pp. 689–701, 2010.
[19] M. Harman, "Making the case for MORTO: Multi objective regression test optimization," in Proceedings of the Fourth IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 2011, pp. 111–114.