Regression Test Selection on System Requirements ∗

Pavan Kumar Chittimalli
Tata Research Development and Design Center, Tata Consultancy Services Ltd.
[email protected]

Mary Jean Harrold
College of Computing, Georgia Institute of Technology
[email protected]

ABSTRACT


Regression testing, which is performed after changes are made to a software system, can be used before release of new versions of the system. However, practitioners often have little time to perform this regression testing because of the quick-release cycles of such modified systems. Thus, they may use a random-testing approach or perform little regression testing. This lack of adequate regression testing can cause bugs in untested parts of the program to be exposed only during production or field usage. To improve the efficiency of regression testing, and thus enable its use before release, techniques have been presented that select and run only those test cases that are related to the changes, or that prioritize the test cases based on criticality or perceived effectiveness. These techniques typically use some representation of the software, such as a system model or the source code, to perform the test selection and prioritization. However, in practice, access to a system model or the source code may not be possible. To provide regression test selection in practice, we have developed, and present in this paper, a novel approach to regression test selection that, instead of using a system model or the source code, uses the system requirements and their associated test cases, which are typically available to developers/testers. The approach uses the set of system requirements, usually in natural language or some informal notation, that represents what is to be tested about the system. The technique uses these requirements, along with the set of test cases and their criticality that are associated with the requirements, to select test cases for use in regression testing. In this paper, we also present a case study that shows the potential effectiveness of our technique.

Categories and Subject Descriptors: D.2.5 Software Engineering: Testing and Debugging - Testing tools (e.g., data generators, coverage testing)

General Terms: Experimentation

Keywords: Requirements, traceability, regression testing, regression test selection, test case prioritization

1. INTRODUCTION

Software systems are continually changed during development and maintenance for a variety of reasons, including correcting errors, adding new features, porting to new environments, and improving performance. After changes are made to the software, it should be tested to ensure that it behaves as intended and that the modifications have not had an adverse impact on the quality of the software. This testing after changes, or regression testing, is expensive and, if performed, can account for as much as one-half of the cost of software maintenance [3, 8]. Because of the quick-release cycles for modified software, however, practitioners often have little time to perform regression testing.


To improve the efficiency of regression testing so that it can be used in practice to test changed software, researchers have developed techniques that attempt to reduce its cost. One approach to regression testing saves the test suite Ti used to test one version of the software Pi, and uses it to test the next (modified) version of the software Pi+1. Instead of rerunning all test cases in Ti on Pi+1, selective regression testing approaches attempt to improve the efficiency of the retesting using regression-test-selection (RTS) techniques. These techniques select a subset Ti′ of Ti and use it to test Pi+1 (e.g., [2, 4, 6, 7, 9, 10, 11, 14]). Under certain conditions, these RTS techniques are safe in that the test cases that they omit (i.e., Ti − Ti′) will give the same results on Pi and Pi+1, and thus do not need to be rerun on Pi+1 [10]. Studies have shown that RTS techniques can be effective in reducing the cost of regression testing (e.g., [4, 9, 10, 14]). Most RTS techniques use some representation of the software, such as a system model or the source code, on which to gather coverage data. The coverage data is collected when testing Pi using Ti and is used to assist in identifying test cases to rerun on Pi+1. For example, several RTS techniques collect coverage data, such as statement, branch, or method coverage, to use in selecting test cases to include in Ti′ for testing Pi+1 (e.g., [2, 4, 6, 7, 9, 10, 11, 14]). However, in practice, access to a system model or the source code is not always possible. For example, for commercial off-the-shelf (COTS) software, source code may not be available, and for languages such as COBOL and RPG, in which many legacy systems are written, there may not be effective analysis techniques or tools. Thus, these existing RTS techniques may not be applicable in practical scenarios. Because of this lack of available techniques and tools, practitioners often use random regression testing or little regression testing before releasing the modified versions of the software. This insufficient testing can lead to bugs in untested parts of the system being exposed only during production or field usage.

∗Visiting researcher at Georgia Institute of Technology.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISEC’08, February 19-22, 2008, Hyderabad, India. Copyright 2008 ACM 978-1-59593-917-3/08/0002 ...$5.00.



To improve regression testing techniques so that they can be used in practice, we have begun a project to develop RTS techniques that work on the representations used in industry. A common representation that is often used for testing, and thus regression testing, is a set of system requirements for the software. This set of requirements, typically available in natural language or some informal notation, represents what is to be tested about the system. Designers/architects create high-level and low-level designs of the system using these requirements. Developers implement the system using these design documents. Developers/testers create test cases to meet the initial requirements and associate the test cases with the requirements. Requirements that relate to the critical (important) parts of the system are given higher priority while developing the test cases.

The goal of our work is to provide RTS techniques that can be used in practice. To do this, we are developing techniques that can be used with the system-requirements representation of the program. In this paper, we present the results of the first part of this project. Our work consists of an RTS technique, adapted from existing RTS techniques, for use with system requirements. Our new technique provides a way to use system requirements and their associations with test cases to select the important test cases for rerunning on the changed versions of the system. In addition, our technique uses the criticality of the test cases to order the selected test cases. The main benefit of our approach is that it provides a way to select a subset of test cases for use on the changed software that can provide confidence in the changes made to the software. Another benefit is that this technique uses only the association (coverage) of requirements by the test cases, instead of requiring model or code coverage, which is often not practical to gather. In this paper, we also present the description of a system and the results of a study that uses that system to evaluate the potential effectiveness and benefit of our approach in practice. We performed our study using two real-world systems developed at Tata Consultancy Services, Ltd (India). Our study shows the extent of the savings in regression testing that can be achieved on these real systems using our approach. Our study also shows that, for the subjects we studied, our technique compares well to a code-based approach that provides safe regression test selection.

The main contributions of this paper are:

• A description of a novel technique that is adapted from existing RTS techniques, and that selects and orders test cases for regression testing using changed system requirements, their associated test cases, and the criticality of the test cases;

• A description of a tool that implements the technique and that can be used in practice for selecting and ordering test cases to rerun after changes;

• The results of a case study that shows, for the real-world subjects we studied, that our technique provides an effective way to select test cases using the system requirements that differs little from the results obtained using a safe code-based RTS technique.

The next section presents an example that we use throughout the rest of the paper, along with a description of a safe RTS technique and its application to the example. In Section 3, we present our requirements-based RTS technique. Then, in Section 4, we present an overview of our tool and our empirical evaluation on two real-world systems. Finally, in Section 5, we discuss our results and provide some directions for future work.

2. SELECTIVE REGRESSION TESTING

This section describes an example that we use for illustration throughout the rest of the paper. The section also presents background on safe regression test selection with which we compare our requirements-based regression test selection.

2.1 Example

Figure 1 lists a program segment, written in Java, that implements a bank withdrawal transaction. The transaction implements requirements r1 (Account Validation), r2 (ATM transaction limit validation), r3 (Funds availability validation), and r4 (Successful withdrawal). The program segment consists of two classes: Account and Transaction. The Account class contains details of the account holder, such as accountNumber and accountBalance. The Transaction class contains methods for processing the transactions. Method validateAccount() implements requirement r1 by checking the validity of the user's account number (accountNumber), and outputting an error message if the account number is invalid (lines s1-s2). Method validateLimit() implements requirement r2 by determining whether the amount that the user wants to withdraw is valid for the kind of transaction (lines s6-s11). There are two kinds of withdrawals: (1) an ATM withdrawal, where kind is 0 and the withdrawal is limited to $1000 and the user's account balance, and (2) a non-ATM withdrawal, where kind is 1 and the withdrawal is limited to the user's account balance. If the transaction is an ATM withdrawal and the limit is violated, the method issues a message and returns an error code. Otherwise, if the transaction is valid, the method returns "true". Method validateFundsAvailability() implements requirement r3 by determining whether the requested withdrawal amount is less than the user's account balance, and raises an error message if the request is over the limit (lines s13-s18). Three other methods (withdrawAny(), validateAmount(), and withdraw()) implement requirement r4. Method withdrawAny() decreases the user's account balance by the withdrawal amount. Method validateAmount() calls validateLimit() to check the ATM withdrawal limit and, if this limit is valid, calls validateFundsAvailability() to ensure that the user has a sufficient balance for the withdrawal. The main transaction method, withdraw(), checks the validity of the user's account number and account balance, and, if these are valid, processes the withdrawal and returns success for the transaction.


Based on its functionality, each requirement is also assigned a criticality level (the criticality level also depends on the problem and its complexity). Table 1 shows the details of requirements r1-r4. The first column in the table lists the requirement number, the second column shows the criticality of the requirement, and the third column provides the validation needed for that requirement. In our example, the criticality ranges from 1 to 4: the higher the value, the higher the importance of the requirement. For example, requirement r2 has criticality 2, and it must verify that the withdrawal, if it is an ATM withdrawal, is less than or equal to $1000.


The program segment also contains two changes, c1 and c2. Change c1 is in method validateAccount, and it represents a change in valid account numbers—a change to requirement r1. Change c2 is in method validateLimit, and it represents an increase in the allowable withdrawal using an ATM—a change to requirement r2.


public class Account {
    int accountNumber;
    int accountBalance;
    ...
}

public class Transaction {
    Account account;
    int errorCode;
    int kind;
    int amount;

    // ---------------- Requirement r1: Account Validation ----------------
    public bool validateAccount(int accountNumber) {
s1      if (accountNumber < 100 && accountNumber > 899) {   // change c1: if (accountNumber < 100 && accountNumber > 999)
s2          System.err.println("Invalid account Number");
s3          errorCode = 3;
s4          System.exit(errorCode);
        }
s5      return Account.getAccount(accountNumber);
    }

    // ---------------- Requirement r2: ATM transaction limit validation ----------------
    public bool validateLimit(int amount) {
s6      bool retCode = true;
s7      if (kind == 0) {                                    // ATM withdrawal
s8          if (amount > 1000) {                            // change c2: if (amount > 1200)
s9              System.err.println("Invalid amount for ATM transaction");
s10             errorCode = 2;
s11             retCode = false;
            }
        }
s12     return retCode;
    }

    // ---------------- Requirement r3: Funds availability validation ----------------
    public bool validateFundsAvailability(int amount) {
s13     bool retCode = true;
s14     if (Account.accountBalance < amount) {
s15         System.err.println("Insufficient funds in the account");
s16         errorCode = 5;
s17         retCode = false;
        }
s18     return retCode;
    }

    // ---------------- Requirement r4: Successful withdrawal ----------------
    public void withdrawAny() {
s19     account.balance -= amount;
    }

    public bool validateAmount(int amount) {
s20     bool retCode = false;
s21     retCode = validateLimit(amount);
s22     if (!retCode) {
s23         return retCode;
        }
s24     retCode = validateFundsAvailability(amount);
s25     return retCode;
    }

    public bool withdraw(long accountNumber, int _amount, int _kind) {
s26     accnt = validateAccount(accountNumber);
s27     amount = _amount;
s28     kind = _kind;
s29     bool isValidAmnt = validateAmount(amount);
s30     if (isValidAmount) {
s31         withdrawAny();
        } else {
s32         System.exit(errorCode);
        }
s33     return true;    // Valid transaction
    }
    ...
}

Figure 1: Withdrawal transaction example used throughout the paper.


Table 1: Requirements and their criticality for the example program in Figure 1.

Requirement | Criticality | Description
r1 | 4 | Account Validation: Account number ranges from 100 to 999
r2 | 2 | ATM transaction limit validation: $1000 withdrawal limit per transaction
r3 | 3 | Funds availability validation: The withdrawal amount should be less than the available amount in the account
r4 | 1 | Successful withdrawal: Completion of the withdrawal successfully

Table 2: Test suite T for the program segment in Figure 1.

Test case | Account (number, balance) | Transaction (number, withdrawal, kind)
t1 | 101, $10000 | 101, $100, 0
t2 | 102, $5000 | 101, $1000, 1
t3 | 103, $5000 | 103, $1001, 0
t4 | 104, $6000 | 104, $6001, 1
t5 | 105, $6000 | 105, $6001, 0

Using requirements r1-r4, we created a test suite, T, for use in testing the program segment. Table 2 shows the details of T. The first column of the table shows the test-case number. The second column shows information about a user's account: account number and account balance. The third column shows information about a transaction: account number, requested withdrawal amount, and kind of withdrawal. For example, t3 checks the transaction, which is an ATM withdrawal on account number 103 for $1001, against a user with account number 103 and balance $5000.
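As a concrete illustration (not part of the paper's artifacts), the following minimal sketch encodes test case t3 from Table 2 as a self-contained check of requirement r2 against the v0 logic of validateLimit() in Figure 1. The driver class, the extracted predicate, and the expected outcome are our assumptions based on Table 2.

// Minimal sketch (assumption, not the paper's test harness): test case t3 requests an ATM
// withdrawal (kind 0) of $1001, which requirement r2 says must be rejected under the $1000 limit of v0.
public class T3Sketch {
    // Mirrors the v0 predicate of validateLimit() in Figure 1 (before change c2).
    static boolean validateLimitV0(int amount, int kind) {
        return !(kind == 0 && amount > 1000);
    }

    public static void main(String[] args) {
        int amount = 1001;   // t3's requested withdrawal (Table 2)
        int kind = 0;        // 0 = ATM withdrawal
        boolean accepted = validateLimitV0(amount, kind);
        System.out.println("t3 accepted under v0? " + accepted);   // expected: false
    }
}

Under change c2 (limit raised to $1200), the same input would be accepted, which is why a change to r2 forces t3 to be rerun.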

To represent the associations between test cases and requirements, we created a traceability matrix that shows which test cases cover which requirements. Table 3 shows the traceability matrix for our example.

Table 3: Requirements' traceability matrix for test suite T in Table 2; the criticality values of the test cases are also shown.

Requirement | t1 | t2 | t3 | t4 | t5
r1 | 0 | 4 | 0 | 0 | 0
r2 | 2 | 0 | 2 | 0 | 2
r3 | 0 | 0 | 0 | 3 | 3
r4 | 1 | 0 | 1 | 0 | 0

In the table, the rows represent the requirements and the columns represent the test cases in T. A zero entry means that the test case does not cover the requirement. A non-zero entry means that the test case does cover the requirement, and the value of the entry represents the criticality of the test case. The value of the criticality assigned to each requirement represents the importance of the requirement. For our example, criticality values range from 1 to 4, with 1 being the least critical and 4 being the most critical. To illustrate, consider the first row in the table, which shows that requirement r1 is covered by test case t2 and has criticality level 4. This criticality value is also used to order the selected test cases, to help find bugs in the early stages of testing [12, 13]. The table shows that the test suite T covers all four requirements and that some test cases cover multiple requirements.

2.2 Safe Regression Test Selection

Under certain conditions, RTS techniques are safe in that the test cases that they omit (Ti − Ti′) will give the same results on Pi and Pi+1, and thus do not need to be rerun on Pi+1 [9, 10].

To illustrate a safe RTS technique, we use the technique implemented in the tool DEJAVOO [9]. DEJAVOO creates control-flow graphs (a control-flow graph is a representation of a program in which each node represents a statement and each edge represents the flow of control between statements) for the original (Porig) and modified (Pmod) versions of a program. The technique traverses these graphs synchronously over like-labeled edges, where the edge in Porig and the edge in Pmod have no label, a "true" label, or a "false" label. The technique performs the traversal in a depth-first order to identify dangerous edges: edges whose sinks differ and for which the test cases in T that executed the edge in Porig should be rerun on Pmod because they may behave differently in Pmod.



To illustrate the safe RTS algorithm, consider running DEJAVOO on the code fragments of versions v0 and v1 of the validateLimit() method in Figure 1. The control-flow graphs for v0 and v1 are shown in Figure 2, with the graph for v0 on the left and the graph for v1 on the right. Using the control-flow graphs, DEJAVOO traverses like-labeled edges starting at the Entry node of each graph, comparing the sinks on the path s6, s7, s8, s9, s10, s11, s12, Exit in v0 with the sinks on the corresponding path in v1. When the traversal continues at s6 in the two graphs, it finds that s7 in v0 and s7 in v1 match. However, when the traversal reaches edge e3 in the two graphs, it finds that the sinks, s8 in both graphs, differ (an inspection of the code shows that the statements corresponding to s8 in v0 and v1 differ). The RTS algorithm then marks e3 in v0 as a dangerous edge. The edge-coverage matrix for v0, given in Table 4, shows that test cases t1, t3, and t5 traverse e3 in v0, and thus they are selected to run on v1.
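The following is a minimal sketch, not DEJAVOO's implementation, of the synchronous depth-first traversal over like-labeled edges described above; the Node type, the two-node graph fragment around s7 and s8, and the abbreviated coverage map are simplified assumptions based on Figures 1-2 and Table 4.

import java.util.*;

// Sketch of safe RTS by dangerous-edge detection (assumption: simplified, not DejaVOO's code).
public class DangerousEdges {
    static class Node {
        final String stmt;                                    // statement text, used to compare sinks
        final Map<String, Node> out = new LinkedHashMap<>();  // edge label ("", "true", "false") -> successor
        Node(String stmt) { this.stmt = stmt; }
    }

    // Synchronously traverse like-labeled edges of the original and modified graphs in depth-first
    // order; an edge is dangerous when the sinks reached in the two graphs differ.
    static void collect(Node orig, Node mod, Set<Node> visited, List<String> dangerous) {
        if (orig == null || mod == null || !visited.add(orig)) return;
        for (Map.Entry<String, Node> e : orig.out.entrySet()) {
            Node oSink = e.getValue();
            Node mSink = mod.out.get(e.getKey());
            if (mSink == null || !oSink.stmt.equals(mSink.stmt)) {
                dangerous.add(orig.stmt + " -[" + e.getKey() + "]-> " + oSink.stmt);
            } else {
                collect(oSink, mSink, visited, dangerous);
            }
        }
    }

    public static void main(String[] args) {
        // Fragment of validateLimit() in v0 and v1: only the predicate at s8 changes (1000 -> 1200).
        Node s7a = new Node("if (kind == 0)"), s8a = new Node("if (amount > 1000)");
        Node s7b = new Node("if (kind == 0)"), s8b = new Node("if (amount > 1200)");
        s7a.out.put("true", s8a);
        s7b.out.put("true", s8b);
        List<String> dangerous = new ArrayList<>();
        collect(s7a, s7b, new HashSet<>(), dangerous);
        System.out.println("Dangerous edges: " + dangerous);   // the edge into s8, i.e., e3 in Table 4

        // Select every test case whose edge-coverage row marks the dangerous edge e3 (cf. Table 4, abbreviated).
        Map<String, Set<String>> coverage = Map.of(
                "t1", Set.of("e1", "e2", "e3"), "t2", Set.of("e1", "e2", "e4"),
                "t3", Set.of("e1", "e2", "e3"), "t4", Set.of("e1", "e2", "e4"),
                "t5", Set.of("e1", "e2", "e3", "e4"));
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> row : coverage.entrySet())
            if (row.getValue().contains("e3")) selected.add(row.getKey());
        System.out.println("Selected: " + selected);           // [t1, t3, t5]
    }
}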



3. OUR TECHNIQUE

Our technique for regression test selection is based on system requirements. Our algorithm, REQUIREMENTSRTS, shown in Figure 3, implements the technique.



ALGORITHM REQUIREMENTSRTS()
Input:   Rorig: original requirements for the program Porig
         T: set of test cases used to test Porig
         C: set of changes from Porig to Pmod
Output:  T′: ordered set of test cases that need to be rerun
Use:     covers(): inputs test case t and requirement r; returns true if test case t meets requirement r
         affects(): inputs a change ch and requirement r; returns true if change ch affects requirement r
         sort(): sorts the array in descending order of criticality values and returns an ordered set of test cases
Declare: crr: criticality value for requirement r
         t, t0: test case in test suite T
         T0: set of test cases
         r: requirement for the program Porig
         array: an array of criticality values indexed by test case
         cr: criticality value of the test case
         Rmod: set of modified requirements
         ch: change in the program Porig
         m: a matrix of size [i × j], where i is the number of requirements and j is the number of test cases in test suite T

// Step 1
(1)  foreach r ∈ Rorig
(2)    foreach t ∈ T
(3)      if covers(t, r)
(4)        m[r, t] = crr
(5)      endif
(6)    endfor
(7)  endfor
// Step 2
(8)  Rmod = ∅
(9)  foreach ch ∈ C
(10)   foreach r ∈ Rorig
(11)     if affects(ch, r)
(12)       Rmod = Rmod ∪ r
(13)     endif
(14)   endfor
(15) endfor
// Step 3
(16) T0 = ∅
(17) foreach r ∈ Rmod
(18)   foreach t ∈ T
(19)     if m[r, t] ≠ 0
(20)       T0 = T0 ∪ t
(21)     endif
(22)   endfor
(23) endfor
// Step 4
(24) foreach t0 ∈ T0
(25)   cr = 0
(26)   foreach r ∈ Rmod
(27)     cr = cr + m[r, t0]
(28)   endfor
(29)   array[t0] = cr
(30) endfor
(31) T′ = sort(array)
(32) return T′

Figure 2: Control-flow graphs for v0 (left) and v1 (right).

Table 4: Edge coverage matrix for v0.

Edge | t1 | t2 | t3 | t4 | t5
e1 | 1 | 1 | 1 | 1 | 1
e2 | 1 | 1 | 1 | 1 | 1
e3 | 1 | 0 | 1 | 0 | 1
e4 | 0 | 1 | 0 | 1 | 1
e5 | 1 | 0 | 0 | 0 | 0
e6 | 0 | 0 | 1 | 0 | 1
e7 | 1 | 0 | 0 | 0 | 0
e8 | 1 | 0 | 0 | 0 | 0
e9 | 1 | 0 | 0 | 0 | 0
e10 | 1 | 1 | 1 | 1 | 0

REQUIREMENTSRTS takes three inputs: Rorig, the original requirements for program Porig; T, a set of test cases that was created to test Porig; and C, the set of changes from Porig to Pmod. REQUIREMENTSRTS outputs T′, an ordered set of test cases that need to be rerun on program Pmod. The algorithm uses three external functions: covers(), which returns "true" if test case t meets requirement r; affects(), which returns "true" if change ch affects requirement r; and sort(), which inputs an array of criticality values and returns an ordered set of test cases. The algorithm also declares and uses some local variables, which are described in Figure 3.

REQUIREMENTSRTS consists of four main steps: (1) creating and initializing the requirements coverage matrix, m, for Porig (lines 1-7); (2) identifying Rmod, the set of modified requirements (lines 8-15); (3) identifying T0, the set of test cases in T to rerun on Pmod (lines 16-23); and (4) ordering the selected test cases, based on the criticality values associated with each test case, to produce T′ (lines 24-31). REQUIREMENTSRTS then returns T′.

Figure 3: REQUIREMENTSRTS algorithm.


In Step 1, the algorithm initializes m, the coverage matrix for program Porig, represented as a |Rorig| × |T| matrix, where Rorig is the set of initial requirements and T is the set of test cases. Table 5 shows a sample traceability matrix m. This matrix shows requirements ri (1 ≤ i ≤ n) of Rorig in rows and test cases tj (1 ≤ j ≤ m) of T in columns. The criticality crk is shown as the value of m[ri, tj]. If test case tj covers requirement ri, the algorithm assigns the value crk to m[ri, tj]. For example, when REQUIREMENTSRTS is run on program Porig in Figure 1, m is a 4 × 5 matrix because there are four requirements and five test cases. In the example, the matrix element m[1, 2] shows the coverage of requirement r1 by test case t2, with criticality value 4.

Table 5: Sample requirements traceability matrix M.

Requirement | t1 | t2 | ... | ti | ... | tm
r1 | c1 | c2 | ... | ck | ... | cl
r2 | c1 | 0 | ... | c1 | ... | 0
... | ... | ... | ... | ... | ... | ...
rj−1 | 0 | c2 | ... | c2 | ... | 0
rj | ck | ck | ... | 0 | ... | ck
... | ... | ... | ... | ... | ... | ...
rn−1 | 0 | c3 | ... | 0 | ... | 0
rn | cl | 0 | ... | 0 | ... | cl

In Step 2, REQUIREMENTSRTS identifies Rmod. Recall that DEJAVOO (described in Section 2.2) first traverses the control-flow graphs for the original and modified versions to identify the dangerous entities, and then uses the coverage matrix to identify test cases that need to be rerun. REQUIREMENTSRTS instead considers C, the set of changes. For each change, it checks whether the change affects any requirement. Each affected requirement is added to Rmod, the set of modified requirements (line 12). To illustrate, consider the program in Figure 1. In the example program, the changes {c1, c2} affect requirements r1 and r2. Thus, the algorithm computes Rmod as {r1, r2}.

In Step 3, REQUIREMENTSRTS inspects the coverage matrix m to identify the test cases that exercise the requirements in Rmod. The test cases are identified for each affected requirement r in Rmod by traversing the marked (non-zero) values of the coverage matrix m. The test cases corresponding to the marked values are added to the set of test cases to be rerun (line 20). For example, consider the program in Figure 1. For Rmod = {r1, r2}, the selection is computed as the union of the test cases that cover r1 (i.e., {t2}) and r2 (i.e., {t1, t3, t5}). Thus, the test cases that need to be rerun on Pmod are {t1, t2, t3, t5}.

In Step 4, REQUIREMENTSRTS orders the selected test cases based on criticality values. For example, consider the program in Figure 1. The cost array is calculated as {t1 → 3, t2 → 4, t3 → 3, t5 → 5}. The array is sorted on cost in descending order to compute T′ as {t5, t2, t1, t3}.
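To make the four steps concrete, the following is a minimal, self-contained Java sketch, not the authors' tool, that hard-codes the traceability matrix of Table 3 and the change impact of c1 and c2. Following the worked example above, the ordering step sums the criticality of every requirement a selected test case covers before sorting; that detail is our assumption based on the cost values in the text.

import java.util.*;

// Sketch of requirements-based RTS (cf. Figure 3), using the example data from Tables 1-3.
public class RequirementsRts {
    public static void main(String[] args) {
        // Step 1: traceability matrix m[r][t] holding the test case's criticality (0 = no coverage), cf. Table 3.
        Map<String, Map<String, Integer>> m = new LinkedHashMap<>();
        m.put("r1", Map.of("t1", 0, "t2", 4, "t3", 0, "t4", 0, "t5", 0));
        m.put("r2", Map.of("t1", 2, "t2", 0, "t3", 2, "t4", 0, "t5", 2));
        m.put("r3", Map.of("t1", 0, "t2", 0, "t3", 0, "t4", 3, "t5", 3));
        m.put("r4", Map.of("t1", 1, "t2", 0, "t3", 1, "t4", 0, "t5", 0));

        // Step 2: map each change to the requirements it affects (c1 -> r1, c2 -> r2).
        Map<String, Set<String>> affects = Map.of("c1", Set.of("r1"), "c2", Set.of("r2"));
        Set<String> rMod = new LinkedHashSet<>();
        for (Set<String> rs : affects.values()) rMod.addAll(rs);

        // Step 3: select every test case with a non-zero entry for a modified requirement.
        Set<String> selected = new LinkedHashSet<>();
        for (String r : rMod)
            m.get(r).forEach((t, cr) -> { if (cr != 0) selected.add(t); });

        // Step 4: order the selected test cases by total criticality over the requirements they cover
        // (assumption: the worked example sums over all covered requirements, not only the modified ones).
        Map<String, Integer> score = new HashMap<>();
        for (String t : selected) {
            int sum = 0;
            for (Map<String, Integer> row : m.values()) sum += row.get(t);
            score.put(t, sum);
        }
        List<String> ordered = new ArrayList<>(selected);
        ordered.sort(Comparator.comparingInt((String t) -> score.get(t)).reversed());
        System.out.println("Rmod = " + rMod + ", scores = " + score);
        System.out.println("T' = " + ordered);   // expected: [t5, t2, t1, t3] (the t1/t3 tie is broken arbitrarily)
    }
}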

4. EMPIRICAL EVALUATION

In this section, we describe our experimental framework, study, and evaluation, and we address the research questions that correspond to the goals of the study.

4.1 Experimental Framework



To evaluate our requirements-based regression-test-selection technique, we created a tool. The tool is partially automated and also requires input from the user. We used the tool to perform a case study using two real-world applications. In this section, we first describe our evaluation tool, then describe our empirical setup, and finally present our study.


The main goal of our experimental study is to evaluate our requirements-based regression-test-selection technique using a prototype tool. The motivation of the study is to evaluate the effectiveness, precision, and safety of the technique.


Our experimentation framework consists of a prototype tool that implements our technique, a set of real-world subjects on which to evaluate the technique using the prototype tool, a presentation of the results of the prototype tool that shows the effectiveness of the technique, and the lessons learned during the experimentation, which guide future work and extensions.

4.1.1 Tool

Figure 4 shows the architecture of our tool, which consists of four components: Traceability Matrix Builder, Change Analyzer, Requirements Based RTS, and Prioritizer.


Traceability Matrix Builder, a manual step that requires input from the user, inputs the system requirements Rorig for each version v of program P. For each requirement ri of Rorig, the developer/tester creates a test case that will exercise the requirement. While creating these test cases, the developer/tester also creates mappings between requirement ri of Rorig and test case tj of test suite T. This process of associating requirements and test cases is commonly done in development organizations, so building the matrix does not add significant expense to the process. Test cases that are created to test one requirement may also be used to test other requirements. This component outputs the requirements coverage matrix M[r × t] and the test suite T.
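The paper does not specify the input format of the Traceability Matrix Builder, so the following is only a sketch under the assumption of a simple comma-separated mapping file ("requirement,testCase,criticality"); the class name and file layout are hypothetical.

import java.io.*;
import java.nio.file.*;
import java.util.*;

// Sketch of a Traceability Matrix Builder step (assumption: the real tool's input format is not
// described in the paper). Each non-comment input line is "requirement,testCase,criticality", e.g. "r1,t2,4".
public class MatrixBuilder {
    public static Map<String, Map<String, Integer>> build(Path mappingFile) throws IOException {
        Map<String, Map<String, Integer>> m = new LinkedHashMap<>();
        for (String line : Files.readAllLines(mappingFile)) {
            if (line.isBlank() || line.startsWith("#")) continue;      // skip comments and blank lines
            String[] parts = line.split(",");
            String requirement = parts[0].trim(), testCase = parts[1].trim();
            int criticality = Integer.parseInt(parts[2].trim());
            m.computeIfAbsent(requirement, r -> new LinkedHashMap<>()).put(testCase, criticality);
        }
        return m;   // M[r][t]: criticality of t for requirement r; absent entries mean "not covered"
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("traceability", ".csv");
        Files.writeString(f, "# requirement,test,criticality\nr1,t2,4\nr2,t1,2\nr2,t3,2\nr2,t5,2\n");
        System.out.println(build(f));   // {r1={t2=4}, r2={t1=2, t3=2, t5=2}}
    }
}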


Change Analyzer, a manual inspection step, inputs the requirements Rorig for version vi of program P, and a set of changes C from vi to vi+1. The changes in C can be bug fixes, additional features, or enhancements to existing features in the system. The changes need to be mapped back to the requirements level. During this step, the practitioner analyzes the changes thoroughly and maps/traces them back to requirements. We call these changed requirements the modified requirements Rmod. This process of identifying which requirements have changed because of the modifications is also commonly done in development organizations, so it should not add significant expense.


Table 6: Details of the subjects used in the study.

Name | Size in KLOC | Number of versions | Test-suite size | Description
Cobol2C | 20-25 | 3 | 216 | Translates Cobol to C
ProAX | 27-30 | 4 | 312 | Performs program analysis and transformation using user specifications




Requirement Based RTS, an automatic step, inputs the set of changed requirements Rmod, the set of test cases T, and the requirements coverage matrix M[r × t], where r is a requirement from Rorig and t is a test case from T. In this step, the tool inspects the requirements coverage matrix M for the modified requirements and selects the test cases corresponding to them. This component of the tool outputs T′, a subset of test suite T.

Prioritizer, an automatic step, inputs the set of selected test cases T′ and the requirements coverage matrix M[r × t], where r is a requirement from Rorig and t is a test case from the original set of test cases T. In this step, the tool computes the criticality value for each selected test case and sorts the selected test cases in decreasing order of criticality values. This component of the tool outputs T′, an ordered set of test cases.

4.1.2 Subjects




For our study, we used two subjects as shown in Table 6—Cobol2C and ProAX. These subjects are real-world applications developed at Tata Consultancy Services Ltd, India (TCS) [1]. Cobol2C is a migration tool written in the C language. Cobol2C migrates COBOL programs to equivalent ANSI C programs. Cobol2C has 20-25 KLOC, depending on the version, and a test suite consisting of 216 test cases. The application’s development started with 38 initial requirements, which we obtained from the developers.


ProAX is a programmable analysis and transformation framework [5]. ProAX inputs a file containing specifications to perform various kinds of program analyses and transformations, and outputs a Java class file. ProAX has 30-35 KLOC, depending on the version, and a test suite consisting of 312 test cases. ProAX has 46 initial requirements, which we also obtained from the developers. Most of these requirements are overlapping in nature, and the test suite developed for the requirements reuses test cases across requirements. For example, consider the requirement specification in ProAX to parse a set union and generate equivalent Java classes and Java source code. The ProAX test suite was designed to have eight test cases to test this requirement; however, six of these eight test cases are also used for other requirements.

4.2 Study

We performed a study using our experimental setup. For the study, we used both Cobol2C and ProAX to evaluate our technique. We performed three experiments to evaluate the effectiveness, safety, and precision of our technique.

4.2.1 Experiment 1

The goal of this experiment is to address research question RQ1.

RQ1: What is the effectiveness of our requirements-based regression test selection in selecting the test cases that need to be rerun on the changed versions of the program?

We conducted the first experiment using the requirements that were developed with the subjects. We created a traceability matrix using the test suite created to test these requirements. We then ran our tool on three versions of Cobol2C (the original and two modified versions) and four versions of ProAX (the original and three modified versions). For each run, we ran our tool on two consecutive versions (vi, vi+1) of our subjects: versions (v0, v1) and (v1, v2) of Cobol2C, and versions (v0, v1), (v1, v2), and (v2, v3) of ProAX. Figure 5 shows the results of the study. The horizontal axis represents the two version pairs for Cobol2C and the three version pairs for ProAX. The vertical axis represents the percentage (%) of the test suite that our tool selected to run on the changed version of the program. For regression test selection on versions (v0, v1) of Cobol2C, our tool selected 14.81% of the original test suite, whereas for versions (v1, v2), it selected only 8.79% of the initial test suite. When we ran our tool on ProAX, it selected 16.35%, 9.29%, and 20.19% for versions (v0, v1), (v1, v2), and (v2, v3), respectively.

Our tool selects on average 11.8% of the original test suite for Cobol2C and 15.28% of the original test suite for ProAX. The results of the first experiment show the effectiveness of our technique in selecting, for the subjects we studied, only a small percentage of the original test suite to test the changed requirements.

4.2.2 Experiment 2

The goal of the second experiment is to address research question RQ2.

RQ2: What is the safety and precision of our requirements-based regression test selection technique?

This experiment uses the traceability matrix created in the first experiment, and we also used the results that were obtained in the first experiment. For regression test selection, a safe regression-test-selection technique includes all test cases that might reveal differences between the original and modified versions of the program. A safe technique does not omit important test cases from the test suite to be rerun, and thus the reduced test suite that is selected will provide the same information about the program as running the entire test suite. Test cases that should have been selected but that a technique omits are false negatives. A technique is safe if the reduced test suite it selects has no false negatives. For regression test selection, a precise regression-test-selection technique includes no test cases that do not need to be rerun. A precise technique is efficient because unnecessary test cases will not be run on the modified program. Test cases that are selected by a technique but are not necessary are false positives. A technique is precise if the reduced test suite it selects has no false positives.
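To make the safety and precision criteria operational, the following small sketch computes false positives and false negatives of a selection against a reference set of tests that truly need to rerun (for example, a safe code-based selection); the concrete sets used here are illustrative assumptions, not the study's data.

import java.util.*;

// Sketch: false positives/negatives of a selection relative to a reference ("truly needed") set.
public class SelectionMetrics {
    public static void main(String[] args) {
        Set<String> needed   = Set.of("t1", "t2", "t3", "t5");        // assumed reference: tests that must be rerun
        Set<String> selected = Set.of("t1", "t2", "t3", "t4", "t5");  // assumed selection made by a technique

        Set<String> falsePositives = new TreeSet<>(selected);
        falsePositives.removeAll(needed);       // selected but not needed -> hurts precision
        Set<String> falseNegatives = new TreeSet<>(needed);
        falseNegatives.removeAll(selected);     // needed but not selected -> violates safety

        System.out.println("false positives = " + falsePositives);   // [t4]
        System.out.println("false negatives = " + falseNegatives);   // []
        System.out.println("safe = " + falseNegatives.isEmpty() + ", precise = " + falsePositives.isEmpty());
    }
}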



Figure 4: Architecture for the requirements coverage based RTS.

Figure 5: Results to show the effectiveness of our technique.

To measure the safety and precision of our technique in practice, we ran our tool on three versions of Cobol2C (the original and two modified versions) and four versions of ProAX (the original and three modified versions). As in Experiment 1, we used pairs of consecutive versions of the programs. The results are shown in Table 7. The table has five columns. The first column shows the versions on which our tool was run; the second column shows the number of modified requirements for the version; the third column shows the number of test cases that were selected using our technique; the fourth column shows the false positives (test cases that do not need to be rerun but were selected by our technique); and the fifth column shows the false negatives (test cases that need to be rerun but were not selected by our technique). For example, consider the results in Table 7: our technique selects 32 test cases, with 11 false positives and 0 false negatives, when run on versions v0 and v1 of Cobol2C. Our second experiment shows that the use of our tool results in few false positives and no false negatives. It also shows that, for the subjects we studied, only a small percentage of the original test suite is required to test the changes in the requirements.



Table 7: Results to show the efficiency and safety of the technique.

Versions | Number of changed requirements | Test cases selected | False positives | False negatives
Cobol2C (v0, v1) | 2 | 32 | 11 | 0
Cobol2C (v1, v2) | 1 | 19 | 8 | 0
ProAX (v0, v1) | 3 | 51 | 14 | 0
ProAX (v1, v2) | 2 | 29 | 10 | 0
ProAX (v2, v3) | 4 | 63 | 18 | 0

Table 8: Results to show the precision of our technique in comparison with DEJAVOO (ProAX).

Versions | Number of changed requirements | Our tool: test cases selected | Our tool: false positives | Our tool: false negatives | DEJAVOO: test cases selected | DEJAVOO: false positives | DEJAVOO: false negatives
(v0, v1) | 3 | 51 | 14 | 0 | 45 | 8 | 0
(v1, v2) | 2 | 29 | 10 | 0 | 32 | 13 | 0
(v2, v3) | 4 | 63 | 18 | 0 | 61 | 16 | 0



4.2.3 Experiment 3




The goal of this experiment is to address research question RQ3.

RQ3: How does the precision of our requirements-based regression test selection technique compare to safe regression test selection techniques based on system models or source code?

In the third experiment, to measure the precision of our requirements-based regression-test-selection tool, we ran DEJAVOO [9] on consecutive versions of ProAX and compared its results with those obtained with our technique. We conducted this study only on ProAX because it is written in Java and DEJAVOO works on Java applications; Cobol2C is written in C, so we did not use it for this experiment. Table 8 shows the results. In the table, the first column shows the versions (vi, vi+1) on which the experiments were performed, and the second column shows the number of changed requirements. The remaining columns show, for our tool and for DEJAVOO, the number of test cases selected, the false positives, and the false negatives. For example, consider the first row in Table 8, for versions (v0, v1): v1 has three modified requirements over v0. Our tool selects 51 of the 312 test cases for these three requirement changes, whereas DEJAVOO selects 45 test cases that need to be rerun. There are 14 false positives for our tool and 8 false positives for DEJAVOO; however, these false positives are not the same because of the way we built the requirements coverage matrix. There are no false negatives using either our technique or DEJAVOO. The results of the study on (v1, v2), shown in the second row, reveal that our technique can even be more precise than DEJAVOO: our technique shows only 10 false positives, whereas DEJAVOO shows 13 false positives. Such scenarios happen when the mapping built in the traceability-matrix building step is close to ideal. This experiment shows that, for the subjects studied, our technique can achieve precision levels close to those of RTS using DEJAVOO.

4.3 Threats to Validity

There are several threats to the validity of our study and to what can be drawn from it. Threats to internal validity arise when factors affect the dependent variables without the researchers' knowledge. Our technique does not use code coverage or any other type of static dependence information, such as that used by code-coverage-based tools like DEJAVOO. Thus, our results depend significantly on the quality of the traceability matrix and of the test-case development. If the requirements and test cases are accurately associated, in the sense that test cases are associated with all requirements they exercise, even when they were not originally created for those requirements, then our technique will have few false negatives. If this association is not well done, and test cases that do exercise some requirements are not associated with them in the matrix, or test cases are associated with requirements that they do not exercise, then our technique may miss test cases or identify false positives.


Threats to external validity arise when the results of the experiment cannot be generalized to other situations. We studied only two subjects from industry, including the requirements and test cases created for them. Without many additional experiments, we cannot know how well these subjects represent industrial software, how their requirements are represented, and how test cases are created and associated with these requirements. However, we believe that in good development environments, requirements are created and recorded, and test cases are created to test them.

Threats to construct validity arise when the metrics used for evaluation do not accurately capture the concepts that they are meant to evaluate. Selecting all modified requirements associated with test cases for inclusion in the set of test cases to rerun, T′, does not guarantee that the test cases in T − T′ can be safely ignored for testing the modified requirements. Our technique is based on a traceability matrix that is built using requirements coverage. At the code level, the implementation of these requirements may use dependent variables or global data, so the requirements mapping may not represent the actual code-to-test-case mapping. A test case tj created for requirement ri can also be reused for requirement rk, depending on the kind of requirement; reusing a test case created for one requirement for another requirement is a common policy in test-case design and development. Our technique is more effective if there are no overlapping requirements and test cases. We believe that test-case designers/developers use such a policy while creating test cases, but the test cases may still overlap, and currently our technique does not account for this overlap.


5. DISCUSSION AND FUTURE WORK

In this paper, we have presented our technique for regression test selection using system requirements. The technique uses the associations between the system requirements and test cases, along with the changes, to select a subset of the original test suite for use in retesting the modified software. We also presented a tool, which is partially manual, that directs the process of selecting the test cases. The paper also presented an empirical evaluation on two real subjects from industry; these subjects came with modified versions and with sets of system requirements and test cases that were used to test the requirements.

Our results show that, for the subjects we studied, our approach was able to reduce the test cases that need to be rerun to test the changed software. The results show a significant reduction (86%) in the test cases to be rerun to test the modified requirements. Our results also show that our approach selects few unnecessary test cases and compares well with a code-based approach that requires code coverage. In the second experiment, the number of false positives that our technique computed shows that our technique performs similarly to code-coverage-based RTS techniques, such as DEJAVOO. Also, this experiment produced no false negatives, which means that, for the subjects, requirements, test cases, and changes we studied, our technique omitted no test cases that could show a difference from the original to the modified version of the software, and thus omitting them is safe.

Encouraged by these results, we are currently investigating several areas of future work. First, we are planning more experiments with other systems. We especially want to investigate the quality of the matrix that is created to associate requirements with test cases. For those associations, we want to determine the quality of matrices created in practical scenarios and provide guidelines for creating matrices that help the technique make better test selections. Second, in these studies, we discovered situations in which test cases were used for multiple requirements. In practice, architectures are often multi-tiered, with application modules built using different technologies, such as J2EE, Mainframe, and .NET. In this case, each of these modules will have a different regression test suite, and the suites need to be integrated to produce fewer false positives in the retesting. We plan to investigate how we can use our technique for such architectures.


6. ACKNOWLEDGEMENTS

This work was supported in part by National Science Foundation awards under CCR-0096321, CCR-0205422, and SBE-0123532 to Georgia Tech, and by Tata Consultancy Services, Ltd.


7. REFERENCES

[1] Tata Consultancy Services Limited. http://www.tcs.com/AboutUs/AboutUs.html.
[2] T. Ball. On the limit of control flow analysis for regression test selection. In ACM International Symposium on Software Testing and Analysis, pages 134–142, March 1998.
[3] B. Beizer. Software Testing Techniques. Van Nostrand Reinhold, 1990.
[4] Y. F. Chen, D. S. Rosenblum, and K. P. Vo. TestTube: A system for selective regression testing. In Proceedings of the 16th International Conference on Software Engineering, pages 211–222, May 1994.
[5] P. K. Chittimalli, M. Bapat, and R. D. Naik. ProAX: A program analysis and transformation framework. In TCS Technical Architect's Conference (TACTiCS 2004), Hyderabad, December 2004.
[6] P. K. Chittimalli and M. J. Harrold. Re-computing coverage information to assist regression testing. In International Conference on Software Maintenance (ICSM 2007), pages 164–173, October 2007.
[7] D. Kung, J. Gao, P. Hsia, Y. Toyashima, and C. Chen. Firewall regression testing and software maintenance of object-oriented systems. 1994.
[8] H. K. N. Leung and L. J. White. Insights into regression testing. In Proceedings of the Conference on Software Maintenance, pages 60–69, 1989.
[9] A. Orso, N. Shi, and M. J. Harrold. Scaling regression testing to large software systems. In Proceedings of the 12th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE 2004), pages 241–252, November 2004.
[10] G. Rothermel and M. J. Harrold. A safe, efficient regression test selection technique. ACM Transactions on Software Engineering and Methodology, 6(2):173–210, April 1997.
[11] G. Rothermel, M. J. Harrold, and J. Dedhia. Analyzing regression test selection techniques. IEEE Transactions on Software Engineering, 22(8):529–551, August 1996.
[12] G. Rothermel, R. Untch, C. Chu, and M. J. Harrold. Test case prioritization. Technical Report GIT-99-28, College of Computing, Georgia Institute of Technology, December 1999.
[13] G. Rothermel, R. Untch, C. Chu, and M. J. Harrold. Test case prioritization: An empirical study. In Proceedings of the International Conference on Software Maintenance (ICSM 1999), pages 179–188, Oxford, England, September 1999.
[14] F. Vokolos and P. Frankl. Pythia: A regression test selection tool based on text differencing. In International Conference on Reliability, Quality and Safety of Software Intensive Systems, May 1997.

