Locating Program Features by using Execution Slices

W. Eric Wong, Bellcore, Morristown, NJ ([email protected])
Swapna S. Gokhale, University of California, Riverside, CA ([email protected])
Joseph R. Horgan, Bellcore, Morristown, NJ ([email protected])
Kishor S. Trivedi, Duke University, Durham, NC ([email protected])
Abstract
An important step toward efficient software maintenance is to locate the code relevant to a particular feature. In this paper we report a study applying an execution slice-based technique to a reliability and performance evaluator to identify the code which is unique to a feature, or is common to a group of features. Supported by tools called ATAC and Vue, the program features in the source code can be tracked down to files, functions, lines of code, decisions, and then c- or p-uses. Our study suggests that the technique can provide software programmers and maintainers with a good starting point for quick program understanding.
Keywords: program comprehension, program feature, execution slice, invoking test, excluding test, unique code, common code
1 Introduction

The modules in a well-designed software system should exhibit a high degree of cohesion and a low degree of coupling, such that each module addresses a specific subfunction of the requirements and has a simple interface when viewed from other parts of the program structure [15, 16]. Cohesion is a measure of the relative functional strength of a module and is a natural extension of the concept of information hiding. A cohesive module should ideally do just one thing. Coupling is a measure of interconnection among the modules in a program structure that depends on the interface complexity between the modules. Although high cohesion and low coupling are very desirable characteristics, achieving them in practice is extremely difficult. Low cohesion and high coupling are inevitable because most language parameters are ubiquitous, which results in program features being mixed together in the code. Programmers, in the early development stage of the system, may try to follow certain standards [8, 14] to ensure a clear mapping between each feature and its corresponding code segments. However, as development continues, the pressure to keep a system operational will probably lead to the exclusion of such
traceability. Thus, as the software ages it is more likely that the program features are implemented across modules which are seemingly unrelated. A programmer's/maintainer's understanding can deteriorate because of such delocalized structures, and this can often lead to serious maintenance errors [18, 21].

An important step toward efficient software maintenance is to locate the code relevant to a particular feature. There are two methods for achieving this [12]. First, a systematic approach can be followed that requires a complete understanding of the program behavior before any code modification. Second, an as-needed approach can be adopted that requires only a partial understanding of the program so as to locate, as quickly as possible, certain code segments that need to be changed for the desired enhancement or bug-fixing. The systematic approach provides a good understanding of the existing interactions among program features, but is often impractical for large and complex systems which can contain millions of lines of code. On the other hand, the as-needed approach, although less expensive and less time-consuming, tends to miss some of the non-local interactions among the features. These interactions can be critical for avoiding unexpected side-effects during code modification. Thus, the need arises to identify those parts of the system that are crucial for the programmer and maintainer to understand. A possible solution is to read the documentation, and studies have been conducted on the effective design of documentation to compensate for delocalized plans [18]. However, it is not uncommon to find inadequate and incomplete documentation of a system. Even when a document is available, programmers and maintainers may be reluctant to read it. Perhaps a faster and more efficient way of identifying the important segments of the code is to let the system speak for itself. This paper reports our study of this issue.

Both static and dynamic slices can be used as an abstraction to help programmers and maintainers locate the implementation of different features in a software system [3, 9, 10, 13, 19]. However, a static slice is less effective in identifying code that is uniquely related to a given feature because it, in general, includes a larger portion of the program code with a great deal of common utility code. On the other hand, collecting dynamic slices may consume excessive time and file space. In this paper we use an execution slice-based technique, where an execution slice is the set of program components (either basic blocks, decisions, c-uses, or p-uses; these terms are explained in Section 2.4) executed by a test input. Compared with static- or dynamic-based techniques, a clear advantage of our approach is that if the complete traceability of each test has been collected properly during testing (see Section 3 for details), such information can be reused directly without any additional sophisticated slicing analysis. As a result, not only can testers use code coverage to improve their confidence in quality, but programmers and maintainers can also benefit from the same information to efficiently locate code that implements features.
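To make the set operations used throughout this paper concrete, an execution slice can be modeled as a bit vector over numbered program components. The sketch below, in C (the language of the programs we study), is our illustration only: it does not depict ATAC's internal data structures, and all names (Slice, N_COMPONENTS, and so on) are ours.

    #include <limits.h>
    #include <string.h>

    #define N_COMPONENTS 11752   /* e.g., all basic blocks of the program under study */
    #define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
    #define N_WORDS ((N_COMPONENTS + WORD_BITS - 1) / WORD_BITS)

    /* An execution slice: bit i is set iff component i (a basic block,
       decision, c-use, or p-use) is executed by the test. */
    typedef struct { unsigned long bits[N_WORDS]; } Slice;

    static void slice_clear(Slice *s)       { memset(s->bits, 0, sizeof s->bits); }
    static void slice_mark(Slice *s, int i) { s->bits[i / WORD_BITS] |= 1UL << (i % WORD_BITS); }
    static int  slice_has(const Slice *s, int i) {
        return (int)((s->bits[i / WORD_BITS] >> (i % WORD_BITS)) & 1UL);
    }

With such a representation, comparing the slices of different tests reduces to cheap bitwise operations, which is part of what makes reusing per-test coverage data attractive.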
The effectiveness of using the execution slice-based technique depends on several important factors, as shown in Figure 1.
Heuristics: Several studies have been reported [20, 21, 22] using heuristics to identify code from the feature point of view. However, these studies touch only a very specific part of the problem. Since, in general, the notion of an execution slice-based technique is very abstract, it is incorrect to assume that a single heuristic is applicable to all scenarios. In fact, depending on the need, different heuristics have to be applied in mapping program features to code. As a result, developing novel heuristics and experimenting with them to explore their potential will provide maximum benefits. In this paper we present several heuristics to identify code that is either unique to a given feature (Section 2.1) or common to a group of features (Section 2.2). We also discuss, in Section 5.4, the impact of the feature implementation and of the invoking and excluding tests on selecting the heuristic.
Test cases: As indicated above, the code identified by the execution slice-based technique depends not only on which heuristic is applied and how the features are implemented, but also on how tests are selected. In Section 2.3, we discuss the relation between the invoking and excluding tests in terms of their execution slices. Such a relationship depends heavily on the goal, that is, whether we are looking for the code unique to a given feature or common to a group of features. In Section 5.4, we explore a work-around for when no focused invoking tests exist with respect to the feature being located, i.e., no test exhibits only this feature and no others. These discussions and explorations, either completely ignored or only slightly acknowledged in previous work, provide important guidance in selecting appropriate tests.
Granularity: Program features in our study are located by using execution slices with respect to both control flow and data flow analyses to provide different levels of detail, whereas other studies use only control flow. This implies we can present code that is unique to a feature or common to a group of features at different levels of granularity (basic blocks, decisions, c-uses, and p-uses) rather than only with respect to a branch or a line of code. The importance of locating feature-related code at finer granularity levels such as c-uses and p-uses is explained in Section 5.1.
Tool support: A very important factor for a technique to be applicable in real-life contexts is that it be supported by automated tools. The tools used in our study can generate the execution slice with respect to any given test. They also provide a graphical interface for
visualizing code related to a feature (or features) (see Figures 4 and 5 in Section 3). Compared with the tool in references [20, 21, 22], which only shows such code in an annotated ASCII interface, we believe these tools will make our technique more attractive to practitioners.

[Figure 1: Important factors of the execution slice-based technique: heuristics, test cases, granularity, tool support, and how features are implemented in the program.]

It is clear that there is a close relationship between the heuristic applied, the invoking and excluding tests selected, and the way in which a feature is implemented. Each of these three has a profound impact on the code identified by an execution slice-based technique. In addition to the heuristics, we provide a thorough discussion of the impact of the last two factors on the quality of the code identified, whereas most prior efforts have focused only on the definition, demonstration, and selection of heuristics.

The remainder of this paper is organized as follows. Section 2 explains the general concepts, including heuristics for finding code, selection of invoking and excluding tests, and division of programs into components. Section 3 describes tool support for the execution slice-based technique. Section 4 presents a case study to show the merits of our technique. Lessons learned from our study are discussed in Section 5. Our conclusions appear in Section 6.
2 General Concepts

A program feature can be viewed as an abstract description of a functionality given in the specification. One good way to describe the features of a given program is through its specification. For example, the specification of the UNIX wordcount program (wc) is to count the number of lines, words, and/or characters given to it as input. Based on this, we can specify three features (with respect to three functionalities): one which returns the number of lines, another which returns the number of words, and one which returns the number of characters.
Suppose programmers and maintainers understand the features of the program being considered and can determine whether a feature is exercised by a test. For a given program P and a feature F, an invoking test is a test which when executed on P shows the functionality of F; an excluding test is one that does not. For example, consider the UNIX wordcount program again, and suppose F is the functionality to count the number of lines. A test (say t1) such as "wc -l data" that returns the number of lines in the file data is an invoking test, whereas another test (say t2) "wc -w data" that gives the number of words (instead of the number of lines) in the file data is an excluding test. An invoking test is said to be focused on a given feature if it exhibits only this feature and no other features. Following the same example, t1 is also a focused test with respect to F, which counts the number of lines. But this is not true for the test "wc data" (referred to as t3), even though it also returns the number of lines in the file data. This is because, in addition to the number of lines, t3 also returns the number of words and characters. That is, t3 also exhibits the features which count the number of words and characters, respectively.

There are many ways in which execution slices of invoking and excluding tests may be compared to identify pieces of code that are related to F. For example, we can form a union of the execution slices of all the invoking tests to find a set of code that is used to implement F. One clear problem of this approach is that some code which has nothing to do with F may also be included unless all the invoking tests exhibit only F and no other features (i.e., all the invoking tests are focused on F). We can also create an intersection of these execution slices to identify code that is executed by every test which exhibits F. Since it is impossible, in general, to identify all the invoking and/or excluding tests for a given P and F, a practical alternative is to run P using a small, carefully selected set of tests, say T, with some exhibiting F and others not (ideally, we would like all such tests to be focused invoking tests on F). Hereafter, we consider tests in T instead of all possible tests for the program. Let S_invoking and S_excluding represent the program components that are executed by at least one test in T that exhibits F and by at least one test in T that does not exhibit F, respectively. Similarly, T_invoking is the set of components that are executed by every invoking test in T.
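Using the bit-vector sketch from the Introduction, S_invoking, S_excluding, and T_invoking reduce to unions and an intersection over per-test slices. A minimal sketch; the function names are ours:

    /* acc := acc UNION s (used to build S_invoking or S_excluding) */
    static void slice_union(Slice *acc, const Slice *s) {
        for (int w = 0; w < (int)N_WORDS; w++) acc->bits[w] |= s->bits[w];
    }

    /* acc := acc INTERSECT s (used to build T_invoking) */
    static void slice_intersect(Slice *acc, const Slice *s) {
        for (int w = 0; w < (int)N_WORDS; w++) acc->bits[w] &= s->bits[w];
    }

    /* Summarize n >= 1 invoking tests: S_invoking is the union of their
       slices and T_invoking is their intersection. */
    static void summarize_invoking(const Slice tests[], int n,
                                   Slice *s_inv, Slice *t_inv) {
        *s_inv = tests[0];
        *t_inv = tests[0];
        for (int k = 1; k < n; k++) {
            slice_union(s_inv, &tests[k]);
            slice_intersect(t_inv, &tests[k]);
        }
    }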
2.1 Heuristics for finding code unique to a feature

One simple approach is to compare the execution slice of just one invoking test with that of one excluding test. To minimize the amount of code identified as relevant, the invoking test selected may be the one with the smallest execution slice (in terms of the number of blocks, decisions, c-uses, or p-uses in the slice) and the excluding test selected may be the one with the largest execution slice. Another approach is to identify code that is executed by any invoking test but not by any excluding test (i.e., S_invoking - S_excluding). In other words, code that is in the union of the invoking tests, but not in the union of the excluding tests, is identified. In a third approach, similar to the second, we can identify program components that are commonly executed by all invoking tests but not by any excluding test (i.e., T_invoking - S_excluding). This implies that the identified program components are in the intersection of all invoking tests but not in the union of the excluding tests. As a result, we only select those program components that are always executed when the feature is exhibited, but not otherwise.
Depending on how features are implemented, programmers and maintainers may have to try all of these approaches (see Section 5.4), or construct their own, in order to find the best set of code related to a given feature. In the study reported in Section 4, we found the second approach works best for identifying code uniquely related to each of the five features in Figure 3. We summarize this approach as follows (a sketch in code follows the list):

1. Find S_invoking: this includes components that implement F; some of them may also be used to implement other features.

2. Find S_excluding: this includes components that implement features exhibited by the set of excluding tests. It may also include code segments that are common to F and other features.

3. Subtract S_excluding from S_invoking: components that are executed by tests that exhibit F, but not by those that do not exhibit F, are uniquely related to F.
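In the bit-vector sketch, the second approach is a single set difference over the summaries built above; the third approach is obtained by passing T_invoking in place of S_invoking:

    /* Components unique to feature F: executed by some invoking test but
       by no excluding test, i.e., out = S_invoking minus S_excluding. */
    static void unique_to_feature(const Slice *s_inv, const Slice *s_exc,
                                  Slice *out) {
        for (int w = 0; w < (int)N_WORDS; w++)
            out->bits[w] = s_inv->bits[w] & ~s_exc->bits[w];
    }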
2.2 Heuristics for finding code common to more than one feature

Code common to a pair of features, say F1 and F2, is code that is executed both by at least one test that exhibits F1 and not F2, and by at least one test that exhibits F2 and not F1. One way to find such code is to run the program using a few carefully selected tests which exhibit only F1 and no other features (or at least not F2), and a few other tests which exhibit only F2 and no other features (or at least not F1). We then take the intersection of the set of code related to feature F1 with the set of code related to feature F2 (i.e., the intersection of S_invoking for F1 and S_invoking for F2). Similarly, code common to all features can be identified by taking the intersection of the sets of code executed for each individual feature, that is, the intersection of S_invoking for Fi, i = 1, ..., n, where n is the number of features in the application and S_invoking for Fi is the union of the code executed by the tests which exhibit only Fi.
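Likewise, code common to a group of features is an intersection of per-feature unions. A sketch, assuming s_inv[f] already holds S_invoking for the (f+1)-th feature:

    /* Components common to n features: intersect each feature's S_invoking. */
    static void common_to_features(const Slice s_inv[], int n, Slice *out) {
        *out = s_inv[0];
        for (int f = 1; f < n; f++)
            for (int w = 0; w < (int)N_WORDS; w++)
                out->bits[w] &= s_inv[f].bits[w];
    }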
2.3 Selection of invoking and excluding tests

In theory, to find S_invoking or S_excluding, one may have to run all possible tests for the program. In practice, only a few carefully selected tests are needed. Different sets of code may be identified by different sets of invoking and excluding tests. Poorly selected tests will lead to inaccurate identification, by either including code that is not unique to a given feature or excluding code that should not be excluded. When the invoking tests for a given feature are selected, they should be focused, as much as possible, with respect to this feature. To find code unique to a feature, the excluding tests should be as similar as possible, in terms of the execution slice, to the invoking tests so that as much common code as possible can be filtered out. Similarly, while identifying code common to a group of features, one would like the invoking tests for a feature to be as dissimilar as possible, in terms of the execution slice, to the invoking tests for the other features in the group. This enables the exclusion of the maximum amount of uncommon code.

To illustrate this concept, we use the sample code in Figure 2. Note that the code is written in a free format to explain its functionality; it is not intended to follow the syntax of any computer programming language. Suppose we want to find the code that is uniquely used to compute the area of an equilateral triangle. We first construct an invoking test t1 that exhibits this feature and two excluding tests t2 and t3 that compute the area of an isosceles triangle and a rectangle, respectively. Clearly, t1 is closer to t2 than to t3. The difference between the execution slices of t1 and t2 shows that only statements s10 and s13 are unique to this feature, whereas additional code, such as statements s3 to s7, would also be identified (but should not be) if t3 were used in place of t2 as the excluding test. Furthermore, this example also indicates that we do not even need to use the feature that computes the area of a rectangle to find code that is unique to computing the area of an equilateral triangle. The ability to identify program components unique to a feature without the necessity of knowing all the program's features greatly enhances the feasibility of using the execution slice-based technique to quickly highlight a small number of program components that are important to programmers and maintainers following the as-needed program understanding strategy.
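"As similar as possible" can be quantified in several ways; one plausible choice, which is our illustration and not prescribed by the technique, is the Jaccard ratio between two slices. An excluding test scoring highest against the invoking test then filters out the most common code, just as t2 does relative to t3 above:

    /* Portable population count for one word. */
    static int popcount_word(unsigned long x) {
        int c = 0;
        while (x) { x &= x - 1; c++; }
        return c;
    }

    /* Jaccard similarity |A intersect B| / |A union B| of two slices,
       in [0, 1]; higher means more shared components. */
    static double slice_similarity(const Slice *a, const Slice *b) {
        long inter = 0, uni = 0;
        for (int w = 0; w < (int)N_WORDS; w++) {
            inter += popcount_word(a->bits[w] & b->bits[w]);
            uni   += popcount_word(a->bits[w] | b->bits[w]);
        }
        return uni ? (double)inter / (double)uni : 0.0;
    }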
2.4 Division of programs into components

Natural components of a C program might be files or functions within a file (since the program used in our study is written in C, see Section 4.1, we use the term function instead of procedure or subroutine), but in many cases such a classification may not be satisfactory. For example, code in a function may be used to implement more than one feature, which precludes that function from being unique to any of these features. To solve this problem, we need to further decompose the program. Four additional, finer categories are used in this paper: blocks, decisions, c-uses, and p-uses. If necessary, programs can also be decomposed into a broader category such as a subsystem, which may contain many files and functions.

[Figure 2: Sample code written in a free format (not reproduced here).]

A basic block, or a block, is a sequence of consecutive statements or expressions, containing no branches except at the end, such that if one element of the sequence is executed, all are. This definition assumes that the underlying hardware does not fail during the execution of a block. A decision is a conditional branch from one block to another. A definition of a variable is a statement (or an expression) that assigns a value to it, and a use of that variable is an occurrence of it in another statement (or expression). Uses are classified as c-uses if the variable appears in a computational statement (or expression) and p-uses if the variable appears inside a predicate. (It is not our intent to enumerate all possible ways to define or use a variable; readers interested in the details of how blocks, decisions, c-uses, and p-uses are defined in our study should refer to [6, 7].) Note that while a c-use is a pair of a definition and its corresponding use with respect to a variable, a p-use includes not only the definition and the use of a variable but also a decision on the predicate in which the variable is used. For example, the definition and the use of w at s18 and s19, respectively, in Figure 2 and the true branch (s20) constitute a p-use. Similarly, the same definition-use pair and the false branch (s22) make another p-use.

We now use the same code in Figure 2 to illustrate c-use components that are unique to the equilateral triangle feature. We compute the difference between the set of c-uses executed by a test for the equilateral triangle (t1 in Section 2.3) and that executed by a test for the isosceles triangle (t2 in Section 2.3). We find that the definition of the variable a via the read statement at s3 is uniquely used at s13 for this feature (i.e., the one to compute the area of an equilateral triangle). However, the definition of the variable s at s14 and its use at s15 have nothing to do with the feature. This type of information provides a finer view of whether a def-use pair of a variable has anything to do with a given feature.
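Because Figure 2 is not reproduced here, the short C fragment below (our own example, not the paper's sample code) shows the same component kinds in concrete syntax:

    int classify(int a, int b)
    {
        int s = a + b;    /* one basic block; a definition of s and c-uses of a, b  */
        if (s > 10)       /* a decision; paired with the definition of s above,     */
                          /* its true branch is one p-use, its false branch another */
            return s * 2; /* a c-use of s: the definition above, used in a computation */
        return 0;
    }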
3 Tool Support for the Execution Slice-Based Technique

Collecting the execution slice of each test input by hand can be very time-consuming and prone to errors. Therefore, a tool which automatically analyzes and collects such information is necessary before any studies can be conducted on large, complicated systems. In our study, we used Vue [1, 4] (a tool built on top of ATAC [6, 7]) for visualizing features in code. Given a program, ATAC computes its set of testable attributes (blocks, decisions, c-uses, and p-uses) and instruments it at compile time. Once the instrumentation is complete, an executable is built from the instrumented code. Each time the program is executed on a test, new trace information with respect to that test, in terms of how many times each block, decision, c-use, and p-use is executed by that test (i.e., the complete traceability matrix of that test), is appended to a corresponding trace file. With this information, the execution slice of a test can be represented in terms of the blocks (decisions, c-uses, or p-uses) executed by that test. Through the Vue interface, tests that exhibit the feature being examined are moved into the invoking category and those that do not, into the excluding category. However, not every test has to be categorized in this way. Some tests can stay in the default "don't know" category, which means we either do not care about these tests or simply do not know whether they exhibit the feature. Code identified using the heuristics discussed earlier is displayed in a graphical interface to allow programmers and maintainers to visualize where a feature is implemented in the program. Two examples of this appear in Figures 4 and 5. Both ATAC and Vue are part of Suds(TM), a Software Understanding System developed at Bellcore. More information about Suds can be found at http://xsuds.bellcore.com.
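Conceptually, each per-test trace is a vector of execution counts, and the test's execution slice is the set of components with a nonzero count. The record layout below sketches that idea only; it does not reflect ATAC's actual trace-file format:

    /* Hypothetical per-test traceability record: how many times each
       component was executed by one test. */
    typedef struct {
        char     test_name[64];
        unsigned count[N_COMPONENTS];
    } TraceRecord;

    /* A component is in the test's execution slice iff its count > 0. */
    static void slice_from_trace(const TraceRecord *t, Slice *out) {
        slice_clear(out);
        for (int i = 0; i < N_COMPONENTS; i++)
            if (t->count[i] > 0)
                slice_mark(out, i);
    }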
4 A Case Study

We now present a case study to show our experience of using the execution slice-based technique to identify code that is uniquely related to a feature, or is common to a pair or a group of features.
4.1 The target system

SHARPE, a Symbolic Hierarchical Automated Reliability and Performance Evaluator [17] which analyzes stochastic models, was used in this study. SHARPE contains 35,412 lines of C code in 30 files, and has a total of 373 functions. It was first developed in 1986 for three groups of users: practicing engineers, researchers in performance and reliability modeling, and students in engineering and science courses. Since then, several major revisions have been made to fix bugs and adopt new requirements. SHARPE provides a specification language and analysis algorithms for nine different model types, such as Markov Chains (MC), Markov Reward Models (MRM), Generalized Stochastic Petri-Nets (GSPN), Product-Form Queuing Networks (PFQN), and Fault Trees (FT). Various transient and steady-state measures of interest can be computed using the built-in functions in SHARPE.

  FEATURES   SUB-FEATURES
  MC         cdf, prob, tvalue
  MRM        sreward, exrss, exrt, cexrt, rvalue, prob, cdf
  GSPN       etok, prempty, util, etokt, premptyt, tputt, utilt, tavetokt, tavtputt, cdf
  PFQN       tput, rtime, qlength, util, mtput, mrtime, mqlength, mutil
  FT         cdf, pqcdf

Figure 3: Features of SHARPE

In this study, we view the specification of a model and the built-in functions that can be used for computing the measures of interest pertaining to that model as a feature. For example, the specification of Markov chains and the grouping of the built-in functions cdf, prob, and tvalue, which facilitate the analysis process, constitute a single feature, while the specific functions (cdf, prob, tvalue) form the subfeatures. The five features used in this study are shown in Figure 3. We will refer to MC, MRM, GSPN, PFQN, and FT along with their respective built-in functions as features F1, F2, F3, F4, and F5, respectively. A subfeature j of a feature i is denoted as Fi,j. For example, etok is denoted as F3,1 and prempty as F3,2. The subfeatures need not be unique to a particular feature and can be shared among features. An example of this is the subfeature cdf, which is shared by features F1, F2, F3, and F5 as F1,1, F2,7, F3,10, and F5,1.
4.2 Data collection

For each feature, a small set of invoking and excluding tests was carefully selected. Heuristics, as discussed in Sections 2.1 and 2.2, were used to find the blocks, decisions, c-uses, and p-uses that are unique to each feature, or are common to a pair of features or to all five features.
4.3 Observations

Data collected in our experiments, such as the identified files, functions, blocks, and decisions, were given to experts who were familiar with SHARPE for verification. The results are very promising: the identified program components are unique to a feature as they should be, shared by a pair of features, or common to all five features. No complete verification was done with respect to the identified c-uses or p-uses, because it is very difficult for humans to have a complete understanding of a complicated system like SHARPE at such fine granularity. Nevertheless, our experts did examine some of the identified c-uses and p-uses and agreed with the descriptions assigned to each of them.

Table 1: Number of blocks unique to F3,j, 1 ≤ j ≤ 10

[The per-file body of this table (rows for the 30 SHARPE source files, columns for the ten subfeatures of F3) was garbled in extraction and is omitted. Per-subfeature totals range from 18 blocks (0.15%) to 798 blocks (6.79%); 226 blocks (1.92%) are unique to F3,1.]

Note: A blank entry means no block in the corresponding file is unique to F3,j.
Based on these data, we analyzed the code unique to every subfeature for all five features. Due to space constraints, we report the results only for the subfeatures of F3; the choice of F3 was completely arbitrary. Table 1 shows the number of blocks identified on a per-file basis for the subfeatures F3,j, 1 ≤ j ≤ 10. As can be seen, a total of 226 blocks were identified as unique to feature F3,1 out of the 11,752 blocks that constitute the entire application. This implies only about 1.9% of the total blocks are identified as unique to F3,1. Blocks so identified are important to programmers and maintainers because they provide a good starting point for understanding how F3,1 is implemented. Analysis at different granularities also finds only a small percentage: 148 of the 7,093 decisions (2.09%), 418 of the 21,936 c-uses (1.91%), and 264 of the 15,122 p-uses (1.75%) are identified as unique to F3,1. A similar observation (i.e., only a small percentage is identified) applies to the other subfeatures of F3 as well. A file-wise summary of decisions, c-uses, and p-uses unique to the subfeatures of F3 is shown in Appendix A.

The code unique to each subfeature can be viewed via the graphical interface of Vue. For the purpose of illustration, the unique blocks and decisions identified for feature F3,1, based on the three steps discussed in Section 2.1, are shown in Figures 4 and 5.

[Figure 4: Display of blocks unique to feature F3,1. The highlighted blocks are unique to the feature.]

As for the subfeatures of F1, F2, F4, and F5, although detailed data are not shown here, analyses of these data yield observations similar to those for the subfeatures of F3: at every granularity level, only a small percentage of the program components defined at that level are identified as unique to the corresponding subfeature.
[Figure 5: Display of decisions unique to feature F3,1. For one highlighted decision the true branch, not the false branch, is unique to the feature; for another, the false branch, not the true branch, is unique.]

Code common to pairs of features was also analyzed. All five features were used for this purpose, giving us a total of ten pairs. Table 2 shows the number of blocks common to the various possible pairs of features for every file. Similar summaries of decisions, c-uses, and p-uses are presented in Appendix B. Such information can be very useful in preventing certain types of unexpected maintenance errors. For example, a programmer may make a change to a function with one feature in mind without realizing that such a change may also affect another feature. In our case, 65 blocks in file analyze.c are used to implement both features F1 and F2. As a result, a change in these blocks may affect not only F1 but also F2. After being shown these data, experts who know SHARPE well indicated that results collected using our technique could provide some surprising insights about the target program. This is especially true when we move the granularity to c-uses and p-uses, because it is very difficult to have a complete understanding of a complicated system like SHARPE at such fine granularity.

Next, we present another view by showing in Table 3 the number of blocks, decisions, c-uses, and p-uses common to all five features. This information can help programmers and maintainers understand whether a change in certain code will have a global impact on all the features. In addition, such code in some sense represents the "utility code" and may have a potential for reuse if the program were to be expanded at a later stage.
Table 2: Number of blocks common to each pair of features

[The per-file body of this table (rows for the 30 SHARPE source files, columns for the ten feature pairs F1/F2, F1/F3, F1/F4, F1/F5, F2/F3, F2/F4, F2/F5, F3/F4, F3/F5, and F4/F5) was garbled in extraction and is omitted. Pairwise totals range from 1,058 blocks (9.00%) to 3,237 blocks (27.54%).]

Note: The notation F1/F2 indicates code common to features F1 and F2. A blank entry means no common block in the corresponding file.
5 Lessons Learned

In this section, based on our case study, we consider some pros and cons of using the execution slice-based technique to map program features to code.
Table 3: Code common to all five features

[The per-file body of this table (number of blocks, decisions, c-uses, and p-uses per file) was garbled in extraction and is omitted. Totals: 788 of 11,752 blocks (6.71%), 468 of 7,093 decisions (6.60%), 888 of 21,936 c-uses (4.05%), and 569 of 15,122 p-uses (3.76%). A blank entry means no common code in the corresponding file.]

5.1 Identification of starting points for program understanding

The code identified using the execution slice-based technique (either unique to a feature or common to a group of features) can be used as a starting point for studying program features. This is especially useful for features that are implemented in many different modules of a large, complicated, but poorly documented system. However, it is also important to realize that our technique may not find all the relevant code that makes up a feature, because the tests used in computing the execution slices may be incomplete. In fact, to find the complete set of code for a given feature, one may have to use many more invoking and excluding tests, in contrast to our technique, which requires only a few carefully selected tests. Nevertheless, experts who know SHARPE well have indicated that the small number of program components identified at various granularity levels using our technique provided some surprising insights that could be very useful in preventing certain types of unexpected maintenance errors (see Section 4.3). A particularly interesting reaction of these experts was that they were amazed by the results at the c-use and p-use levels, as they had never had such detailed information before. We have not collected any real data on how information at such fine granularities can help programmers and maintainers avoid making unexpected errors. However, our intuition leads us to believe that knowing the possible impact on other features, while changing the definitions of certain program variables and how they are used with one feature in mind, is always a plus and a good practice in software maintenance.
5.2 Ease of use

One advantage of using the execution slice-based technique is that programmers and maintainers only have to compute the execution slices of a few carefully selected tests (some invoking and some excluding) and can ignore the others. This significantly reduces the amount of effort required to use this technique. Moreover, the process can be automated by using ATAC and Vue (see Section 3). Code so identified can be displayed in a user-friendly graphical interface (see Figures 4 and 5). All of this simplifies the transfer of this technique into the real world. Furthermore, ATAC differs from many other test coverage measurement tools in that it collects the complete trace information (in terms of how many times each block, decision, c-use, and p-use is executed) with respect to each test, whereas others remember only whether a line of code (or a decision) has been executed or not. With such detailed traceability, not only can programmers and maintainers effectively locate code that implements features, but testers can use the same set of information to compute code coverage and improve their confidence in quality. As a result, we can combine program understanding and code coverage measurement into an integrated process.
5.3 Need for more objective verification

An important part of our case study is to verify whether the identified program components have the descriptions we assigned. The ideal approach is to ask those who are familiar with the application to highlight the code segments they think are important to each feature. Such information then serves as the basis for the verification. An obvious difficulty of this approach is that different segments might be highlighted by different people, which raises another series of problems about how to summarize such divergent information. As explained earlier, our goal is not to find the complete set of code used to implement a feature. Instead, the objective is to provide a small number of carefully selected program components, which can be at various levels of granularity, as the starting point for programmers and maintainers to get a quick understanding of how a feature is implemented in a system. To accomplish this, we only have to ask experts who are well-versed in the implementation details of the application to confirm whether the identified code is either unique to a feature or common to a group of features, as predicted. We realize that a more objective way to verify the data collected from our experiment is desirable, and more effort should be devoted to this task.
5.4 Variations of code identified

As with any dynamic technique, the code identified in our study depends on which heuristic is applied, which itself is affected not only by how the features are implemented in the program but also by how invoking and excluding tests are selected.

We first discuss how the feature implementation can decide which heuristic should be used. Let us assume that we want to find code that is unique to a given feature. Suppose a feature (say F_a) can only be exhibited if either feature F_b or feature F_c is also exhibited. This implies that all the invoking tests for F_a must also exhibit at least F_b or F_c, and perhaps many other features. Under this condition, the best way to find the code that is uniquely related to F_a is to use T_invoking - S_excluding. On the other hand, if F_a is not bundled with F_b or F_c in the way we just described (i.e., F_a can be exhibited by itself without other features being exhibited simultaneously), we probably should use a different heuristic, such as S_invoking - S_excluding. Similar arguments apply to other cases, such as finding code that is common to a group of features.

Next, we explain how invoking and excluding tests can affect the decision of which heuristic should be used. Suppose we are looking for code unique to a given feature (say F_a). If we can easily identify invoking tests that exhibit F_a only and no other features (i.e., we can easily identify some focused invoking tests with respect to F_a), then it is better to use S_invoking - S_excluding. On the other hand, if we have trouble identifying tests that exhibit only F_a (i.e., it is difficult to obtain focused invoking tests with respect to F_a), we probably should use T_invoking - S_excluding. Once the invoking tests are selected, we can follow the guidelines in Section 2.3 to select excluding tests. That is, to find the code unique to a feature, we would like the invoking and the excluding tests to be as similar as possible, in terms of the execution slice, in order to filter out as much common code as possible. And, to find the code common to a group of features, we would like the invoking tests for a feature to be as dissimilar as possible from the invoking tests for the other features in the group, so that the maximum amount of uncommon code can be excluded.

In short, there is no universal heuristic good for all cases. Programmers and maintainers have to use their own judgment to determine which approach best meets their needs. Although different heuristics and different invoking and excluding tests give different mappings between features and code, our experience suggests that, in general, they all provide good starting points to help programmers and maintainers understand the system being analyzed. The only difference is that the program components identified give different (1) recall, defined as the percentage ratio of the number of identified program components to the total number of program components that should be identified, and (2) false positive rates, determined as the percentage ratio of the number of identified program components which do not have the descriptions we assigned to the total number of identified program components.
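Stated as code, with the component counts assumed to come from a verification oracle (the names are ours), the two measures read:

    /* Recall: identified components as a percentage of the components
       that should be identified. */
    static double recall_pct(int n_identified, int n_should_identify) {
        return 100.0 * n_identified / n_should_identify;
    }

    /* False positive rate: identified components that do not have the
       descriptions we assigned, as a percentage of all identified ones. */
    static double false_positive_pct(int n_misidentified, int n_identified) {
        return 100.0 * n_misidentified / n_identified;
    }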
5.5 Extension to program debugging

Two very costly activities in software maintenance are enhancements and the removal of development bugs [11, 23]. Although the emphasis of our study is on identifying code unique to a particular feature, or common to a pair or group of features, from the point of view of program understanding, a similar approach could be used for program debugging. Excluding tests would be those that result in successful executions of the program, while invoking tests would be those that cause the program to fail. The excluding tests are thus "successful tests" and the invoking tests are "failing tests." The code components that are executed by the failing tests but not by the successful tests are the most likely places for the location of the fault [2].
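Under the bit-vector sketches given earlier, this debugging variant reuses the unique-code heuristic verbatim, with failing tests playing the role of invoking tests:

    /* Fault-localization sketch: components executed by failing tests
       but by no successful test are the most suspicious [2]. */
    static void fault_candidates(const Slice *s_failing,
                                 const Slice *s_successful, Slice *out) {
        unique_to_feature(s_failing, s_successful, out);  /* S_fail minus S_pass */
    }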
5.6 Enhancing results by including information from static slices

Execution slices extract the code contained within the bodies of functions and methods. Some features may introduce new structs and classes, or may add a few attributes to existing structs and classes. To include such information, it would be necessary to examine some static slices in addition to the results obtained using execution slices. The bottom line is that this is a trade-off between following the as-needed program understanding strategy (as discussed in Section 1) to locate, as quickly and as easily as possible, certain code segments as starting points, and looking for all the code related to a given feature. A study is underway to develop a hybrid method which can effectively combine the above static information with the code identified by the execution slice-based technique.
6 Conclusions

In our study, the program components identified as unique to a feature were, in general, unique, as verified by experts who understood the application thoroughly. Our technique can also reveal code common to a group of features, which can assist developers in recovering a design description from an implementation description [5]. Although a more rigorous and objective verification process still needs to be developed, our experience indicates that the execution slice-based technique is simple, yet very informative. It could be a useful part of a programmer's/maintainer's tool-kit, providing a starting point for understanding a large, complicated software system. The technique is valuable for finding code components unique to a functionality; however, it may be less effective in identifying all the components that are relevant or important from the point of view of a functionality. In fact, it is dangerous to assume that all the code for a feature can be identified by only a few invoking and excluding tests. One of the greatest advantages of our methodology is that it is supported by tools, ATAC and Vue, which makes it a viable option for immediate application to large-scale systems. Also, for large and complicated systems, mere identification of unique (or shared) files, functions, blocks, and decisions may not be sufficient. Identification of unique (or shared) def-uses can provide an in-depth understanding of the application and help prevent maintenance errors that could occur due to subtle interactions among the various features, which can easily be overlooked. Our tools support the identification of unique (or shared) def-uses in addition to files, functions, blocks, and decisions.

The fact that maintenance requires understanding a very small percentage of a very large system is exemplified by the Year 2000 problem. The "date-sensitive" code has to be identified, understood, and modified to ensure a smooth transition from the twentieth to the twenty-first century. The code unique to this feature may be only a few lines in a system consisting of millions of lines of code. The Year 2000 problem drives home the significance of highlighting unique components of code. Once again, the success of the execution slice-based technique in identifying the "date-sensitive" code depends on how effectively the invoking and the excluding tests can be designed: invoking tests execute segments pertaining to the date portion of the code, while excluding tests execute segments pertaining to the non-date portion.

To conclude, our experience suggests the execution slice-based technique can be immediately applicable in industry. It can answer, although not perfectly, a very important and difficult problem in large legacy systems, that is, to provide software programmers and maintainers with a good starting point for quick program understanding.
Acknowledgements

The authors are extremely grateful to Dr. Robin Sahner for all her help with SHARPE. The authors would also like to thank all the Research Scientists on the Suds team in the Software Environment Research group at Bellcore for their effort in making ATAC and Vue available for this study.
References

[1] H. Agrawal, J. L. Alberi, J. R. Horgan, J. J. Li, S. London, W. E. Wong, S. Ghosh, and N. Wilde, "Mining system tests to aid software maintenance," IEEE Computer, pp. 64-73, July 1998.
[2] H. Agrawal, J. R. Horgan, S. London, and W. E. Wong, "Fault localization using execution slices and dataflow tests," in Proceedings of the Sixth IEEE International Symposium on Software Reliability Engineering, pp. 143-151, Toulouse, France, October 1995.
[3] T. Ball, "Software visualization in the large," IEEE Computer, pp. 33-43, April 1996.
[4] "Suds User's Manual," Bellcore, 1998.
[5] S. C. Choi and W. Scacchi, "Extracting and restructuring the design of large systems," IEEE Software, 7(1):66-71, January 1990.
[6] J. R. Horgan and S. A. London, "Data flow coverage and the C language," in Proceedings of the Fourth Symposium on Software Testing, Analysis, and Verification, pp. 87-97, Victoria, British Columbia, Canada, October 1991.
[7] J. R. Horgan and S. A. London, "ATAC: A data flow coverage testing tool for C," in Proceedings of the Symposium on Assessment of Quality Software Development Tools, pp. 2-10, New Orleans, LA, May 1992.
[8] "IEEE Guide to Software Requirements Specifications," ANSI/IEEE Std 830-1984, 1984.
[9] B. Korel and J. W. Laski, "Dynamic program slicing," Information Processing Letters, 29(3):155-163, 1988.
[10] B. Korel and J. Rilling, "Dynamic program slicing in understanding of program execution," in Proceedings of the Fifth International Workshop on Program Comprehension, pp. 80-89, Dearborn, MI, May 1997.
[11] B. P. Lientz and E. B. Swanson, "Software Maintenance Management," Addison-Wesley, New York, 1980.
[12] D. Littman, J. Pinto, S. Letovsky, and E. Soloway, "Mental models and software maintenance," in Empirical Studies of Programmers (E. Soloway and S. Iyengar, Eds.), Ablex Publishing Corp., Norwood, NJ, 1986.
[13] A. D. Malony, D. H. Hammerslag, and D. J. Jablonowski, "Traceview: A trace visualization tool," IEEE Software, pp. 19-28, September 1991.
[14] "Military Standard: Defense System Software Development (DOD-STD-2167A)," Department of Defense, February 1988.
[15] M. Page-Jones, "The Practical Guide to Structured Systems Design (Yourdon Press Computing Series)," Prentice Hall, Englewood Cliffs, New Jersey, 1988.
[16] R. S. Pressman, "Software Engineering: A Practitioner's Approach," McGraw-Hill, New York, 1997.
[17] R. A. Sahner, K. S. Trivedi, and A. Puliafito, "Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package," Kluwer Academic Publishers, Boston, 1996.
[18] E. Soloway, J. Pinto, S. Letovsky, D. Littman, and R. Lampert, "Designing documentation to compensate for delocalized plans," Communications of the ACM, 31(11):1259-1267, November 1988.
[19] M. Weiser, "Program slicing," IEEE Transactions on Software Engineering, SE-10(4):352-357, July 1984.
[20] N. Wilde and C. Casey, "Early field experience with the software reconnaissance technique for program comprehension," in Proceedings of the International Conference on Software Maintenance, pp. 312-318, Monterey, CA, November 1996.
[21] N. Wilde, J. A. Gomez, T. Gust, and D. Strasburg, "Locating user functionality in old code," in Proceedings of the International Conference on Software Maintenance, pp. 200-205, Orlando, FL, November 1992.
[22] N. Wilde and M. S. Scully, "Software reconnaissance: Mapping program features to code," Software Maintenance: Research and Practice, 7(1):49-62, 1995.
[23] E. Yourdon, "Introduction to the March 1994 issue," American Programmer, 7(3), March 1994.
Appendix A: Number of decisions, c-uses, and p-uses unique to F3,j, 1 ≤ j ≤ 10

Note: A blank entry means no unique decision, c-use, or p-use in the corresponding file.

[The per-file bodies of the three parts of this appendix were garbled in extraction and are omitted.

Part I (unique decisions): per-subfeature totals range from 13 (0.18%) to 480 (6.77%) of the 7,093 decisions; 148 decisions (2.09%) are unique to F3,1.

Part II (unique c-uses): per-subfeature totals range from 26 (0.12%) to 1,496 (6.82%) of the 21,936 c-uses; 418 c-uses (1.91%) are unique to F3,1.

Part III (unique p-uses): per-subfeature totals range from 15 (0.10%) to 910 (6.02%) of the 15,122 p-uses; 264 p-uses (1.75%) are unique to F3,1.]
Appendix B: Number of decisions, c-uses, and p-uses shared by each pair of features

Note: The notation Fi/Fj indicates decisions, c-uses, or p-uses shared by features Fi and Fj. A blank entry means no shared decision, c-use, or p-use in the corresponding file.

[The per-file bodies of the three parts of this appendix were garbled in extraction and are omitted.

Part I (shared decisions): per-pair totals range from 579 (8.16%) to 1,667 (23.50%) of the 7,093 decisions.

Part II (shared c-uses): per-pair totals range from 1,156 (5.27%) to 4,739 (21.60%) of the 21,936 c-uses.

Part III (shared p-uses): per-pair totals range from 719 (4.75%) to 2,780 (18.38%) of the 15,122 p-uses.]