Identifying Comprehension Bottlenecks Using Program Slicing and Cognitive Complexity Metrics

Juergen Rilling and Tuomas Klemola
Department of Computer Science, Concordia University, Canada
[email protected], [email protected]

Abstract

Achieving and maintaining high software quality depends most on how easily the software engineer least familiar with the system can understand the system's code. Understanding the attributes of cognitive processes can lead to new software metrics that allow the prediction of human performance in software development and the assessment and improvement of the understandability of text and code. In this research we present novel metrics, based on the current understanding of short-term memory performance, to predict the location of high frequencies of errors and to evaluate the quality of a software system. We further enhance these metrics by applying static and dynamic program slicing to provide programmers with additional guidance during software inspection and maintenance efforts.

Keywords: complexity measures, program comprehension, program slicing

1. Introduction

As software ages, the task of maintaining and comprehending it becomes increasingly complex and expensive. Poor design, unstructured programming methods, and crisis-driven maintenance can contribute to poor code quality, which in turn affects program comprehension. The essence of comprehending existing systems lies in identifying program artifacts and understanding their relationships through reverse engineering; this process is essentially pattern matching at various abstraction levels, performed through mental pattern recognition by the software engineer and the aggregation of these artifacts into more abstract system representations [7,32,38,39]. In this context, the comprehension of source code plays a prominent role in ensuring quality during software maintenance and evolution.

The effectiveness in identifying and recognizing these source code artifacts depends on a programmer's past experience with the system or similar systems and on the size of the system to be comprehended. Filtering and interpreting enormous quantities of information is a problem for humans. From a mass of data they need to extract the knowledge that will allow them to make informed decisions. Software defects are often the result of incomplete or incorrect comprehension of a program segment [26]. Therefore, locating a program segment that presents a comprehension challenge to the software developer can be the basis for isolating code at a greater risk of defects. There is a variety of support mechanisms for aiding program comprehension, which can be grouped into three categories: unaided browsing, leveraging corporate knowledge and experience, and computer-aided techniques like reverse engineering. In this paper we focus on the latter and on how reverse engineering, in connection with source code analysis techniques, can be applied to drive software inspection based on cognitive complexity metrics. The main motivation of this research¹ is to present novel cognitive complexity metrics to guide programmers in localizing system parts with high cognitive complexity. Reducing the complexity of system parts with unnecessarily high cognitive complexity will not only improve future comprehension of the system but also increase the system's overall value to an organization. The added value lies not only in improved maintainability, but also in faster time to market and a reduction in system defects [4]. We present how the cognitive complexity metrics can be applied at the system level and further refined through program slicing to guide programmers during software inspection. The remainder of this paper is organized as follows: Section 2 introduces additional background information with respect to cognitive aspects of program comprehension and introduces program slicing.

¹ This research is supported by NSERC grant 227680.


Section 3 defines our cognitive complexity driven approach to software inspection using program slicing. Section 4 presents applications of cognitive complexity driven software inspection. Section 5 concludes the paper and describes some future challenges.

2. Background

Here we will review background from cognitive psychology, complexity metrics, program slicing, program slice metrics, and software inspection.

2.1. Cognitive psychology

Numerous theories have been formulated and empirical studies conducted to explain and document the problem-solving behavior of software engineers engaged in program comprehension [32,38,39]. It has become an axiom in the psychology of programming that "individuals differ" and that results will differ among individual users and program domains. Traditionally, the following cognitive models are used to group approaches to how programmers comprehend software: the bottom-up, top-down, and opportunistic cognitive models. The bottom-up approach reconstructs a higher level of abstraction derived through reverse engineering of the source code. The top-down approach is goal-oriented: domain- or application-specific knowledge is used to identify the parts of the program that are relevant to the goal, leading to the identification of the relevant source code artifacts. Both top-down and bottom-up comprehension models have been used in attempts to define how a software engineer understands software systems. However, studies have shown that in reality software engineers switch between these models depending on the problem-solving task [32,38]. This opportunistic approach can be described as exploiting both top-down and bottom-up cues as they become available. It utilizes the previous knowledge and experience of the programmer, as well as the association of visualized program parts with program goals. Both top-down and bottom-up comprehension approaches involve the necessary comprehension of lines of code with their identifiers and other language elements.

The notion of cognitive complexity can be subjective. Our discussion requires an objective view; we therefore review some relevant aspects of cognitive psychology so that this subjectivity can be treated objectively. All information processed for comprehension must at some time occupy short-term memory (STM). STM limitations vary depending on the individual and on what kind of information is being retained [23].

For the purposes of understanding text, STM has been measured at 4 concepts [8]. This suggests that any statement using more than 4 concepts to make a point unfamiliar to the reader might not be immediately understood. STM has a duration of 20-30 seconds [35]. When reading new material one must retain the known concepts in STM, using them to define new concepts. When the density of such terms is high, the risk of comprehension error increases [24]. Tracing is the process of searching through text to find other references to the same term or symbol. The significance of a concept is derived by tracing the text for instances of its usage. Tracing has been observed as a fundamental activity in program comprehension [5]. A person's experience and general knowledge can have a significant impact on their performance, especially when their background is relevant to the task [13]. The measurement of effort and error rate, as products of human activity, is subject to the differences in human performance between individuals and from the same individual at different times.

2.2. Complexity metrics

Fenton [16] describes software engineering as a "collection of techniques that apply an engineering approach to the construction and support of software products." Software metrics were introduced to support the most critical issues in software development and to provide support for planning, predicting, monitoring, controlling, and evaluating the quality of both software products and processes [7,14,15]. Many concepts have been presented to evaluate the process of software development and the quality of software design. Software measures enable the early identification of maintenance and reuse issues in existing systems. It has been shown that software metrics can provide software engineers and maintainers with guidance in analyzing the quality of their code and design and its likely maintainability and comprehensibility [1,2,6,7,30].

Some traditional complexity metrics are supported by the fact that they are clearly related to cognitive limitations [26]. These include LOC, fan-out or external coupling [16], and decision points such as McCabe's Cyclomatic Complexity [33]. A well-known effort to define metrics that correspond to cognitive effort and complexity is the work of Halstead [18]. Halstead proposed a series of metrics based on his studies of human performance in programming tasks. In one experiment his subjects were given enough time to completely familiarize themselves with a short program and were then asked to perform tasks that demonstrate comprehension. He then validated his metrics based on their performance.
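To make these relationships concrete, the following small C++ function is a constructed illustration (not one of the paper's examples; riskWeight and all other names are assumptions of this sketch), annotated with the values that LOC, McCabe's Cyclomatic Complexity, and fan-out would take for it:

#include <cmath>
#include <vector>

// Stands in for a routine that, in a real system, would live in another module.
double riskWeight(double balance) { return std::fabs(balance) * 0.05; }

// Illustrative values of the traditional metrics for totalRisk():
//   LOC (body lines)               : 7
//   decision points (for, if)      : 2  -> McCabe's Cyclomatic Complexity = 2 + 1 = 3
//   fan-out / external references  : 2  (riskWeight, std::fabs)
double totalRisk(const std::vector<double>& balances) {
    double risk = 0.0;
    for (double b : balances) {                // decision point 1
        if (b < 0.0) {                         // decision point 2
            risk += riskWeight(std::fabs(b));  // references to external routines
        }
    }
    return risk;
}

int main() {
    std::vector<double> balances = {120.0, -40.0, -15.0};
    return totalRisk(balances) > 0.0 ? 0 : 1;  // trivial use so the sketch is runnable
}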


Halstead's experimental design represents a situation different from the usual maintenance scenario, where a programmer has a limited amount of time to perform a task that requires understanding of the program. Halstead also did not use LOC as a measure of length, choosing instead the total number of operators and operands. In practice, programmers are limited in the time available to understand a program segment. An empirical study comparing measures of the solutions to examination questions with average scores demonstrated that neither Halstead's metrics nor LOC consistently predict error rate, and that identifier density is in fact a better predictor of comprehension error when time is constrained for the comprehension of small program segments [25]. This corresponds with text comprehension performance, where a higher density of concepts increases the rate of comprehension error [24]. Identifiers consist of all programmer-defined names, such as variable, procedure, and class names; measuring their density in a repeatable way depends on a consistent style. When a program segment has a high LOC count, it will contain enough features, such as identifiers, that it becomes difficult to understand correctly when time is limited. When a program segment has a high fan-out or external coupling, the time spent tracing references increases and comprehension is compromised. Finally, when there are many possible paths through a module, as indicated by a high value of McCabe's Cyclomatic Complexity, tracing again becomes demanding and it can be difficult to interpret the code correctly.

As programmers gain experience, they will avoid repeating mistakes that the complexity of their particular domain precipitates in a novice. The task of predicting the errors an experienced problem solver might make becomes dependent on their past experience and their current project [3]. This consequence of individual performance characteristics poses a challenge to the design and use of complexity metrics [26]. When a programmer encounters an unfamiliar code segment, the language component of the code is not new. The use of identifiers such as variables and method labels must be carefully observed to arrive at a correct understanding. Identifier density can be tuned to account for familiarity, for instance by deciding whether to count class-library related names based on the familiarity of the programmers working on the code. An indicator of how much tracing between modules is involved in understanding a program is the external coupling, or references to external blocks of code. This measure was found to be an indicator of defect density in a study of a Java implementation [16].
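As a sketch of what such counting involves, consider the following constructed C++ fragment (all names are assumptions); only programmer-defined names are counted, while keywords, operators, and literals are not, and whether library names such as std::vector are counted is exactly the kind of familiarity-based tuning decision mentioned above:

#include <vector>

// Per-line counts of programmer-defined identifier occurrences (keywords,
// operators, and literals excluded). Over the four body lines: 10 occurrences
// in 4 lines, an identifier density of 2.5.
double applyRate(const std::vector<double>& rates, double base, int i) {
    double total = base;                  // total, base                  -> 2
    if (i >= 0 && total > 0.0)            // i, total                     -> 2
        total = total * rates[i] + base;  // total, total, rates, i, base -> 5
    return total;                         // total                        -> 1
}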

2.3. Program slicing

Typically, a program computes a large set of functions/outputs. Rather than trying to comprehend all of a system's functionality, programmers tend to focus on selected functions (outputs) and on those parts of a program that are directly related to that function. An important step toward efficient software comprehension is to locate the code relevant to a particular feature or function. Two major approaches can be distinguished: a systematic approach, requiring a complete understanding of the program behavior and its source code, and an "as needed" approach that requires only a partial understanding of the program to quickly locate related code segments.

Program slicing, a program reduction technique, identifies code that is related to a given function or variable of interest by identifying only those parts of the original program that are relevant to the computation of a particular function/output. Static slicing [40] derives its information through the analysis of the source code. Applications of program slicing can be found in software testing, debugging, and maintenance [20,28,36], where it reduces the amount of data that has to be analyzed in order to comprehend a program or parts of its functionality. Program slicing supports program comprehension by capturing the computation of a chosen set of variables/functions at some point in the original program (static slicing) or at a particular execution position (dynamic slicing), which leads to a simplified version of the original program that maintains a projection of its semantics. A dynamic slice, as originally presented in [27], is an executable part of the program whose behavior, for the same program input, is identical to that of the original program with respect to a variable of interest at some execution position. A slicing criterion of a program P executed on input x is a tuple C = (x, y_q), where y_q is a variable y at execution position q. An executable dynamic slice of program P on slicing criterion C is any syntactically correct and executable program P' that is obtained from P by deleting zero or more statements and that, when executed on program input x, produces an execution trace T'_x for which there exists a corresponding execution position q' such that the value of y_q in T_x equals the value of y_q' in T'_x. A dynamic slice P' thus preserves the value of y for a given program input x.

Chunking is another abstraction mechanism that can be used in bottom-up approaches to allow code chunks to be associated with descriptions that are more abstract. Code chunks are grouped together to form larger chunks, until the entire program is understood. In this way a hierarchical internal semantic representation of the program is built from the bottom up.


As a part of this research we focus on how assessing the complexity of program chunks can serve as a criterion for slicing. One of the key activities in inspecting software for its quality is the task of rapidly building a conceptual model of the pieces of software being maintained. Usually such a conceptual model must exist before the maintenance task can be performed [32,38]. Therefore, it suffices to obtain only some partial (in most cases local) understanding that is sufficient to build a mental model one can trust when performing a particular change, or a model for locating the places where such a change should be applied. Program slicing allows software engineers to focus on particular aspects of restructuring and maintenance. Tool support can be provided, based on a model that preserves data flow dependence and control flow dependence, to automate the repetitive, error-prone, and cognitively demanding task of identifying good candidates for re-engineering.

Thus, the conceptual model to be built needs to be rather detailed at critical spots, although it might not address all portions of the system that could be affected by far-reaching side effects of the intended modification.
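As a simple illustration of slicing (a constructed example rather than one taken from the paper), the comments below mark the statements that a static backward slice on the final value of sum would retain; everything involved only in computing product can be removed without changing that value:

#include <iostream>

int main() {
    int n = 0, sum = 0, product = 1;   // in slice (n, sum); product not needed
    std::cin >> n;                     // in slice: n controls the loop
    for (int i = 1; i <= n; ++i) {     // in slice: controls the updates of sum
        sum += i;                      // in slice: defines sum
        product *= i;                  // NOT in slice: cannot affect sum
    }
    std::cout << product << "\n";      // NOT in slice
    std::cout << sum << "\n";          // slicing criterion: the value of sum here
    return 0;
}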

2.4. Slicing and software metrics

Most of the current research effort in the area of software measurement focuses on measuring software design and on calculating class or functional cohesion [6,9,10,11,12,17,21,22,29,30,31,37]. Weiser presented in [40] several slicing-based metrics, such as coverage, clustering, parallelism, and tightness. These slicing-based metrics are mainly concerned with the cohesiveness of the program with respect to its slices. The coverage measurement can be used to differentiate between different types of cohesion. Ott and Thuss introduced a metric slice called the data slice [34]. Their study showed that an association between cohesion and the data slice can be made and that the data slice could be used as a measure of cohesion. Harman et al. introduced in [19,20] metrics for evaluating the complexity of an expression and applied them to some standard metrics introduced by Ott and her colleagues. Harman et al. also introduced an expression metric to calculate the significance of the intersection of program slices, and they raised issues concerning the computability of a cohesion metric based on program slicing. To our knowledge, the only research conducted in the area of program slicing based coupling measures is by Harman et al. [20]. In this research, Harman and colleagues apply static program slicing to define code-level metrics for assessing coupling, and they outline the use of the metrics as components of prediction systems.

Their information-flow based coupling metrics utilize both static backward and forward slicing to measure function coupling.
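A minimal sketch of how two of Weiser's slice-based metrics could be computed, under the assumption that each slice is represented as the set of line numbers it retains and using the commonly cited definitions (coverage as mean slice size over module length, tightness as the fraction of lines common to every slice); the names and representation are choices of this sketch, not of the cited work:

#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// A slice is represented here as the set of line numbers it retains;
// both functions assume at least one slice and a positive module length.
using Slice = std::set<int>;

// Coverage: mean slice size divided by module length.
double coverage(const std::vector<Slice>& slices, int moduleLength) {
    double total = 0.0;
    for (const Slice& s : slices) total += static_cast<double>(s.size());
    return (total / slices.size()) / moduleLength;
}

// Tightness: number of lines present in every slice, divided by module length.
double tightness(const std::vector<Slice>& slices, int moduleLength) {
    Slice common = slices.front();
    for (const Slice& s : slices) {
        Slice next;
        std::set_intersection(common.begin(), common.end(), s.begin(), s.end(),
                              std::inserter(next, next.begin()));
        common = next;
    }
    return static_cast<double>(common.size()) / moduleLength;
}

int main() {
    std::vector<Slice> slices = {{1, 2, 3, 5}, {1, 3, 4, 5}, {1, 3, 5}};
    // For a 6-line module: coverage = ((4+4+3)/3)/6, tightness = 3/6.
    return coverage(slices, 6) > tightness(slices, 6) ? 0 : 1;
}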

2.5. Software inspection

Software inspection has established itself as an effective technique for finding defects [2,7]. However, the vast majority of reports in the literature relate to inspections carried out on procedural languages, the predominant paradigm when inspections were originally proposed in the 1970s. Over the last decade or more, the object-oriented (OO) paradigm has been widely adopted, made popular by C++ and now even more so by Java. It is generally thought that OO programming best practice involves creating methods that are relatively small. Together with inheritance, polymorphism, and dynamic binding, this can lead to the code required to complete a single task being dispersed widely around a program. As a result, program understanding requires tracing chains of method invocations through many classes, traversing both up and down the inheritance hierarchy.

Systematically inspecting (and understanding) all code and its dependencies is a possible solution, but it would be expensive and time consuming. Due to limitations on the amount of information that can be usefully retained at one time in short-term memory, performing a systematic inspection is typically unrealistic. When inspecting OO code, an as-needed reading approach has to be adopted to deal with the possibly large amounts of de-localized information. However, the danger is that an as-needed approach will force inspectors to make unverified assumptions. A related problem is how to select the code to be inspected. Due to the large number of dependencies within OO code, it becomes very difficult to isolate a chunk of code of an appropriate size. Selecting by size alone is inappropriate due to the many links and dependencies one class may have. The aim must be to limit these dependencies, which can be effectively achieved with program slicing. Applications of program slicing can traditionally be found in software testing, debugging, and maintenance. In the context of this research, we focus on program slicing and its application in cognitive complexity driven code inspection. In the next section, we introduce novel measurements that can be applied in connection with program slicing to compute the cognitive complexity of a slice, and to focus on the program subset that is more likely to represent a comprehension challenge than an arbitrary program subset.
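The following constructed C++ sketch (class and method names are assumptions) illustrates the dispersal problem: understanding the single call account->applyFee() requires tracing the inheritance hierarchy to determine which override actually executes:

#include <iostream>
#include <memory>

class Account {                                // base class, typically in one file
public:
    virtual ~Account() = default;
    virtual double applyFee(double balance) const { return balance - 1.0; }
};

class SavingsAccount : public Account {        // override located elsewhere
public:
    double applyFee(double balance) const override { return balance; }  // no fee
};

class OverdraftAccount : public Account {      // yet another location
public:
    double applyFee(double balance) const override { return balance - 25.0; }
};

int main() {
    std::unique_ptr<Account> account = std::make_unique<OverdraftAccount>();
    // Dynamic binding: which applyFee() runs is only visible by tracing the
    // object's concrete type, not from this call site alone.
    std::cout << account->applyFee(100.0) << std::endl;   // prints 75
    return 0;
}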


3. Cognitive complexity driven software inspection

The roots of cognitive complexity can be found in cognitive psychology. For the purpose of cognitive complexity metrics, we are interested in unfamiliar term usage, as this will tax STM and lead to error. Measuring the density of unfamiliar term usage predicts comprehension difficulty in both text [24] and code [25].

3.1. Measuring cognitive complexity

In computer programs, identifiers represent defined concepts. In program code, some terms belong to the programming language and others are terms defined within the program. During typical program comprehension activities the programmer is familiar with the language of implementation but not with the specific application of the identifiers defined within the program. Variables, classes, methods, and other programmer-defined labels must be traced to identify their definition and application. Identifier density can be defined as all occurrences of programmer-defined labels divided by lines of code [26]. More formally, identifier density can be defined as:

    Identifier Density (ID) = ( Σ_{i=1}^{k} S_i(I) ) / k

where S_i(I) represents the number of identifiers at statement i in the source code and k corresponds to the total number of lines of code. Note that a consistent style of code writing improves the accuracy of this measure.

The granularity of the code segment used for the computation of the metric is relevant. A small block of highly dense code may be buried in a large block of simple code, resulting in a low value for the large block. The small dense block will be more difficult to correctly interpret than the overall measure would suggest. Consequently, the best use of the metric is to locate high concentrations of identifiers and to inspect that code and the code that depends on it.

Size and static complexity metrics are poor predictors of defect density [15]. One would expect that the total number of defects would in general be related to comprehension difficulty, and that defect density would correspond with comprehension difficulty. Measures of cognitive complexity, when sufficiently reliable, can aid in making the best use of inspection and testing time by focusing on high-risk parts of a system [16,26]. A Bayesian Belief Network represents a causal model that can be used to identify factors that can be controlled or influenced to reduce risks to the success of a project [15]. A causal model helps managers understand how to evaluate and react to project risks. From the Bayesian net in Figure 1, a manager can deduce that coding errors are related to difficulties in comprehension that arise out of domain complexity, a complex coding style, or time constraints, and that identifier density is an indicator of code complexity.

[Figure 1. Bayesian net for Identifier Density. Nodes: Complex Coding Style, Domain Complexity, Lacking Specific Knowledge or Experience, Time Constraints, Code Complexity, Identifier Density, Error in Code Comprehension, Coding Error in Module.]

Once we have assessed a body of code for its cognitive complexity, we can use the measures in different ways. We can decide whether the cognitive complexity indicates a high risk of comprehension error. Code that has been divided into many small parts will have a higher concentration of method invocations with parameters, leading to a higher concentration of identifiers and more tracing activity during the comprehension process. On the other hand, code that is complex because the computation is complex would be easier to understand when divided into smaller, more recognizable blocks. For example, when a computation involves many variables, a line of code can easily involve more than 3 or 4 identifiers. Breaking up such lines so that they are composed of no more than 3 or 4 identifiers makes them more familiar and easier to understand.
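A minimal sketch of how Identifier Density might be computed and used to locate dense regions, assuming per-line identifier counts are already available (for example, from a parser); the window size and threshold are illustrative assumptions rather than values proposed in the paper:

#include <cstddef>
#include <iostream>
#include <vector>

// ID of a whole segment: total identifier occurrences divided by line count.
double identifierDensity(const std::vector<int>& idsPerLine) {
    int total = 0;
    for (int n : idsPerLine) total += n;
    return static_cast<double>(total) / idsPerLine.size();
}

// Report windows of `windowSize` consecutive lines whose local density exceeds
// `threshold`, so that small dense blocks are not hidden inside large simple ones.
void reportDenseBlocks(const std::vector<int>& idsPerLine,
                       std::size_t windowSize, double threshold) {
    for (std::size_t start = 0; start + windowSize <= idsPerLine.size(); ++start) {
        int total = 0;
        for (std::size_t i = start; i < start + windowSize; ++i) total += idsPerLine[i];
        double density = static_cast<double>(total) / windowSize;
        if (density > threshold)
            std::cout << "lines " << start + 1 << "-" << start + windowSize
                      << ": local ID = " << density << "\n";
    }
}

int main() {
    // Hypothetical per-line identifier counts for a 10-line segment.
    std::vector<int> idsPerLine = {1, 1, 2, 1, 6, 7, 5, 1, 0, 1};
    std::cout << "segment ID = " << identifierDensity(idsPerLine) << "\n";  // 2.5
    reportDenseBlocks(idsPerLine, 3, 4.0);   // flags the dense block around lines 5-7
    return 0;
}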


It is not always feasible to improve the cognitive complexity of a difficult code segment, as some domains are inherently complex. In such situations we should allocate enough time for individuals to understand the material without error; programmer experience plays a larger role here, and the programmer's familiarity with the application domain and the type of implementation should be considered when setting time constraints. It may be appropriate in some application domains to use more formal methods in developing segments of software systems with inherently high Identifier Density.

3.2. Cognitive complexity measurements

The term cognitive complexity metric refers to a software metric that has a basis in cognitive psychology. Identifier density [25] and external coupling [16] have supporting cognitive models and empirical studies. In a program segment, a rise in external coupling will result in a rise in identifier density. A rise in identifier density may be caused by increased external coupling, complex condition statements, complex expressions, multi-dimensional arrays, and so on.
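To illustrate, the following constructed C++ fragment (all names are assumptions) shows a single statement whose identifier density is driven up by a complex condition, a complex expression, and a two-dimensional array, followed by an equivalent version broken into lines of no more than 3-4 identifiers each, as suggested in Section 3.1:

#include <vector>

double adjustedRate(const std::vector<std::vector<double>>& rateTable,
                    int region, int tier, double base, double penalty, bool overdue) {
    // Dense version, about 10 identifier occurrences on one line:
    // return (overdue && rateTable[region][tier] > 0.0)
    //            ? rateTable[region][tier] * base + penalty : base;

    // Split version: no line uses more than 4 identifiers.
    double tableRate = rateTable[region][tier];   // tableRate, rateTable, region, tier -> 4
    if (!overdue || tableRate <= 0.0)             // overdue, tableRate                 -> 2
        return base;                              // base                               -> 1
    double rate = tableRate * base;               // rate, tableRate, base              -> 3
    return rate + penalty;                        // rate, penalty                      -> 2
}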

Class Bank
Bank::Deposit() {
  …
  1.  cin >> amt >> acct;
  2.  lastdep = amt;
  3.  while (amt > 0) {
  4.    lastdep = act.Dep(amt, acct);
  5.    act.PrintBal(acct, lastdep);
  6.    transactions++;
  7.    cin >> amt >> acct;
      }
  8.  cout << acct;
  …
  11. last_deposit = amt;
  12. return(last_deposit);
}

  3.  while (amt > 0) {
  4.    lastdep = act.Dep(amt, acct);
  7.    cin >> amt >> acct;
      }
  8.  cout << …