
International Journal of Software Engineering and Knowledge Engineering
Vol. 26, No. 6 (2016) 1001–1026
© World Scientific Publishing Company
DOI: 10.1142/S0218194016500340

ComboRT: A New Approach for Generating Regression Test Cases for Evolving Programs

Xiaobing Sun*,†,‡,¶, Xin Peng*,†, Hareton Leung§ and Bin Li‡

*School of Computer Science, Fudan University, Shanghai, China
†Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China
‡School of Information Engineering, Yangzhou University, Yangzhou, China
§Department of Computing, Hong Kong Polytechnic University, Hong Kong, China
¶[email protected]

Received 20 April 2015
Revised 21 May 2015
Accepted 25 September 2015

Regression testing is essential to ensure software quality during software evolution. Two widely used regression testing techniques, test case selection and prioritization, are used to maximize the value of the continuously enlarging test suite. However, few works consider these two techniques together, which decreases the usefulness of the independently studied techniques in practice. In the presence of changes during program evolution, regression testing is usually conducted by selecting the test cases that cover the impact results of the changes. It seldom considers the false-positives in the information covered. Hence, the effectiveness of such regression testing techniques is decreased. In this paper, we propose an approach, ComboRT, which combines test case selection and prioritization to directly generate a ranked list of test cases. It is based on the impact results predicted by the change impact analysis (CIA) technique, FCA–CIA, which generates a ranked list of impacted methods. Test cases which cover these impacted methods are included in the new test suite. As each method predicted by FCA–CIA is assigned an impact factor value corresponding to the probability that the method is impacted, test cases are then ordered according to the impact factor values of the impacted methods. Empirical studies on four Java-based software systems demonstrate that ComboRT can be effectively used for regression testing in object-oriented Java-based software systems during their evolution.

Keywords: Change impact analysis; empirical studies; regression testing; test case selection; test case prioritization.



1. Introduction

Software change is a continuous fact in software maintenance and evolution [1]. During the change process, changes will inevitably induce unpredictable and undesirable effects on the other parts of the software, and thus may introduce faults into the modified software. Ideally, these faults should be detected before the modified software is released. The new faults induced during software evolution can be identified and isolated by software change impact analysis [2] and regression testing [3]. Change impact analysis (CIA) is an approach to identify potential ripple effects caused by changes made to software [2]. Whether these potential effects inject faults into the program is often validated by regression testing.

Regression testing is a type of software testing that seeks to uncover new software bugs, or regressions, in existing functional and nonfunctional areas of a system after changes have been made to them [3]. It is used to provide confidence that: (1) the modified parts of the program can run consistently with the new system, and (2) the unmodified parts are not affected by the modifications and behave correctly as before.

A number of different approaches have been developed in regression testing [3], such as test case selection and test case prioritization. Test case selection is used to reduce testing cost by selecting a subset of test cases from the original test suite that are necessary to test the modified software [4], while test case prioritization is used to identify an `ideal' ordering of test cases to enable earlier feedback to testers and earlier fault detection [5].

While the research community has made considerable progress in regression testing, one important issue has been overlooked: most current regression testing techniques are studied independently. For example, some works focus on test case selection [6, 7] while others focus on test case prioritization [5, 8]. In practice, however, test case selection and prioritization are often performed together to generate a ranked list of test cases for regression testing of the modified program [3]. In addition, traditional test case selection and prioritization techniques usually rely on coverage information to identify the relevant test cases [9]. They seldom consider that the covered elements may not be the elements that are really impacted. Hence, the effectiveness of such regression testing techniques may be decreased.

This paper attempts to address the above issues. We propose an approach, ComboRT, which combines test case selection and prioritization to generate a ranked list of test cases from the original test suite. Moreover, it considers the probability that the elements covered by the test cases are really impacted. To perform our regression testing, we first use a CIA technique, FCA–CIA [10], to compute a ranked list of impacted methods from a set of changed classes based on the formal concept analysis (FCA) technique. ComboRT then builds on the results of FCA–CIA. On the one hand, test cases are selected based on the coverage information of these impacted methods; on the other hand, test cases are ordered based on


the probability of the methods to be impacted. These two procedures are integrated together.

The main contributions of this paper are as follows:

- We propose a novel regression testing approach, ComboRT, which is based on an effective CIA technique, FCA–CIA. ComboRT combines test case selection and prioritization, and directly generates a ranked list of test cases for regression testing.
- We conducted empirical studies on four Java programs to show the effectiveness of test case selection in terms of efficiency, inclusiveness and precision.
- We also conducted additional empirical studies on four subject programs to show the effectiveness of test case prioritization, and compared ComboRT with the commonly used coverage-based test case prioritization strategies [11–13].

This paper is organized as follows: in the next section, we introduce the background of regression testing and the FCA–CIA technique used to support our approach. Section 3 presents details of ComboRT. In Sec. 4, an empirical study is performed to show the effectiveness of ComboRT. In Sec. 5, related work on regression testing techniques is discussed. Finally, we conclude and suggest future work in Sec. 6.

2. Background

ComboRT is based on FCA–CIA to predict the change effects induced by class-level changes. In this section, we give the background of regression testing and FCA–CIA.

2.1. Regression testing

Regression testing is performed between two different versions of the software to show that changes in the new version do not interfere with the unchanged part of the original version. During software evolution, the test suite for the software also evolves. A commonly used regression testing approach is to reuse the existing test suite for testing the modified version; choosing which existing test cases to reuse is test case selection. During the testing process, we need to run the test cases in an order that effectively identifies the induced faults or change effects in the new version; choosing this order is test case prioritization. Test case selection techniques usually attempt to minimize the cardinality of a test suite, and they are modification-aware [3].
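To make the two notions concrete, the following is a minimal Python sketch of a modification-aware selection step followed by an ordering step. It is an illustration under stated assumptions, not the ComboRT algorithm itself: the names `select_tests` and `prioritize_tests`, the coverage map, and the per-method scores are hypothetical, and ranking a test by the maximum score among the methods it covers is an assumed rule chosen only for this example.

```python
from typing import Dict, List, Set


def select_tests(coverage: Dict[str, Set[str]], impacted: Set[str]) -> List[str]:
    """Selection: keep the tests that cover at least one impacted method."""
    return [test for test, methods in coverage.items() if methods & impacted]


def prioritize_tests(
    selected: List[str],
    coverage: Dict[str, Set[str]],
    score: Dict[str, float],
) -> List[str]:
    """Prioritization: order tests by the highest score among covered methods.

    Taking the maximum is an assumption made for this sketch only.
    """
    def key(test: str) -> float:
        return max((score.get(m, 0.0) for m in coverage[test]), default=0.0)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(selected, key=key, reverse=True)


# Hypothetical coverage data: test name -> methods it executes.
coverage = {
    "t1": {"A.foo", "A.bar"},
    "t2": {"B.baz"},
    "t3": {"A.bar", "C.qux"},
}
# Hypothetical per-method scores (e.g. a probability of being impacted).
score = {"A.foo": 0.5, "A.bar": 0.4, "C.qux": 0.9}
impacted = set(score)

selected = select_tests(coverage, impacted)
ranked = prioritize_tests(selected, coverage, score)
print(ranked)
```

With this toy data, `t2` is dropped because it covers no impacted method, and `t3` is ranked ahead of `t1` because it covers the method with the highest score.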
Following the definition of Rothermel and Harrold [14], the problem of test case selection is defined as follows:

Definition 1 [Test Case Selection Problem]. Given a program $P$, with test suite $T$, and the modified version of $P$, $P'$, test case selection is to find a subset of $T$, $T'$, which is used to test $P'$.

To evaluate the effectiveness of test case selection, Rothermel and Harrold developed an evaluation framework that includes four metrics: inclusiveness, precision,


efficiency and generality [14]. Inclusiveness measures the safety of a test selection technique, that is, the extent to which a technique selects tests from $T$ that reveal faults in $P'$. Precision measures the extent to which a technique omits tests in $T$ that cannot reveal faults in $P'$. There is a problem with this measure. For example, if an omitted test case $t$ can identify a fault $f$ in $P'$, and $f$ can also be identified by a test case $t'$ in $T'$, we still consider the removal of $t$ reasonable, because $t$ is redundant for identifying $f$. So, we make a small change to this definition. We identify the omitted test cases ($T''$) that can reveal faults in $P'$ which have not been identified by $T'$. Then, we define precision as an inverse measure of $|T''| / |T - T'|$. The key difference between the traditional precision and ours lies in the consideration of similar fault identification ability between the omitted test cases and the test cases in the new test suite. Efficiency represents the space and time requirements of a technique, and is usually measured by the extent to which a technique selects test cases from $T$. Generality measures the ability of a technique to be applied in a practical and sufficiently wide range of situations. It is often difficult to find adequate subject programs for study because the empirical studies should include a number of subject programs, each with several versions, test suites and fault data. Obtaining these materials is a nontrivial task in practice. Thus, empirical studies of regression testing are performed in a controlled environment with a limited number of subject programs and faults, which makes it difficult to measure the generality of regression testing. So in this paper, we evaluate test case selection based on the inclusiveness, precision and efficiency measures. They are defined as follows:

\[ \text{Inclusiveness} = \frac{|T'_F|}{|T_F|}. \tag{1} \]

\[ \text{Precision} = 1 - \frac{|T''|}{|T - T'|}. \tag{2} \]

\[ \text{Efficiency} = 1 - \frac{|T'|}{|T|}. \tag{3} \]
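As a worked illustration of measures (1)–(3), the following Python sketch computes them from a fault matrix. It is a minimal sketch under stated assumptions: the function names, the `reveals` mapping (test case to the set of faults it reveals) and the toy data are hypothetical, and returning 1.0 when a denominator is zero is a convention assumed here, not one stated in the text.

```python
from typing import Dict, Set, Tuple


def faults_revealed(tests: Set[str], reveals: Dict[str, Set[str]]) -> Set[str]:
    """Union of the faults revealed by the given test cases."""
    found: Set[str] = set()
    for test in tests:
        found |= reveals.get(test, set())
    return found


def selection_metrics(
    suite: Set[str], selected: Set[str], reveals: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Return (inclusiveness, precision, efficiency) for a selected subset."""
    if not suite:
        raise ValueError("the original test suite T must not be empty")
    omitted = suite - selected                      # T - T'
    faults_all = faults_revealed(suite, reveals)    # T_F
    faults_sel = faults_revealed(selected, reveals)  # T'_F
    # T'': omitted tests revealing at least one fault that T' does not reveal.
    missed = {t for t in omitted if reveals.get(t, set()) - faults_sel}

    # Assumed convention: a zero denominator yields 1.0.
    inclusiveness = len(faults_sel) / len(faults_all) if faults_all else 1.0
    precision = 1 - len(missed) / len(omitted) if omitted else 1.0
    efficiency = 1 - len(selected) / len(suite)
    return inclusiveness, precision, efficiency


# Hypothetical data: five tests, two selected, two faults in P'.
suite = {"t1", "t2", "t3", "t4", "t5"}
selected = {"t1", "t2"}
reveals = {"t1": {"f1"}, "t3": {"f1"}, "t4": {"f2"}}

inclusiveness, precision, efficiency = selection_metrics(suite, selected, reveals)
print(inclusiveness, precision, efficiency)
```

In this toy data, omitting `t3` does not hurt precision because fault `f1` is still revealed by `t1` in the selected suite, whereas omitting `t4` does, because `f2` is revealed by no selected test.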

In (1), $T_F$ and $T'_F$ represent the faults identified by the test cases in the original test suite $T$ and the new test suite $T'$, respectively. Higher inclusiveness values indicate higher safety. When Inclusiveness = 1, the test case selection approach is totally safe, i.e. it detects all the faults that the original test suite detects. For (2), a precision value approaching 1 is desirable. The same holds for (3): the higher the efficiency, the better the test case selection. However, there is a trade-off between these measures. For example, when the efficiency is high, the precision and inclusiveness may be low. In this case, the test case selection is not effective in identifying the faults in the modified program. Therefore, test case selection attempts to achieve high efficiency while maintaining high inclusiveness and precision.

Test case prioritization focuses on ordering test cases for early maximization of desirable properties based on defined criteria, for example, the rate of fault detection.


It is used to find the optimal permutation of the sequence of test cases. More formally, the prioritization problem is defined as follows [14]:

Definition 2 [Test Case Prioritization Problem]. Given a test suite $T$, the set of permutations of $T$, $PT$, and a function from $PT$ to real numbers, $f$: $PT$ ↦