PARTITIONING OBJECT-ORIENTED SOURCE CODE FOR INSPECTIONS

Submitted to the Department of Computer and Information Sciences, The University of Strathclyde, Glasgow, for the degree of Doctor of Philosophy.

By Neil Walkinshaw
June 2006
’The copyright of this thesis belongs to the author under the terms of the United Kingdom Copyright Acts as qualified by University of Strathclyde Regulation 3.49. Due acknowledgement must always be made of the use of any material contained in, or derived from, this thesis.’
© Copyright 2006
Abstract

Software inspections are used for the detection of defects in various types of documents produced during the software development life-cycle. An inspection consists of the manual scrutiny of a software document, with the aim of detecting defects. Inspections complement existing automated fault detection techniques such as testing because they find different types of faults and the inspector has the capacity to provide qualitative feedback about the product, which can be used to improve the software's maintainability. It is particularly important that source code is subject to inspection, because it is the most literal representation of the system that will ultimately be deployed. Object-oriented source code is however particularly difficult to inspect because the object-oriented decomposition approach encourages the distribution of functionally related code elements across the system. Paradigm features such as polymorphism and dynamic binding make it virtually impossible to predict how these dispersed source code elements will interact at runtime. This thesis presents an analysis technique for object-oriented source code that is designed to help the inspector focus on source code elements that are relevant to a particular element of functionality, which is identified by 'landmark methods' (methods that must be invoked when the feature under inspection is executed). The technique is based on a combination of call graph analysis and code slicing. A tool has been developed to demonstrate that the technique is feasible. It also forms the basis for a thorough evaluation on open-source software projects. The evaluation establishes that the proposed technique is effective at removing irrelevant source code, thus helping the inspector to focus on the feature under inspection. The precision is however (at least to an extent) dependent on the topology of the call graph. One potential approach that could be used to remedy this is to introduce a new type of input that can be used to mark out methods that are never invoked when a given feature is executed.
Acknowledgements I owe a debt of gratitude to my supervisors Dr. Marc Roper and Dr. Murray Wood for their patient support and advice. They have gone above and beyond their call of duty, providing me with constructive advice on my work as well as personal encouragement. Their support has been a real privilege and I am truly grateful. I would like to thank the Department of Computer and Information Sciences at Strathclyde for generously sponsoring my research in the second and third years of my degree. I would also like to thank the secretarial and support staff for making my stay as trouble-free as possible. Professor Roel Wuyts of the Universite Libre de Bruxelles suggested that I use abstract and interface methods as a basis for selecting landmark methods. This advice was duly taken and forms an important part of the final evaluation. Professor Mark Harman of King’s College introduced me to the slicing community through the VASTT and ASTRENET networks and generously provided me with accommodation on two trips to London. Navindra Umanee from the SABLE group at McGill University provided some important advice on using the SOOT framework for the computation of control dependencies. This saved many painful hours of debugging and implementation. I have had a lot of valuable support and advice from the other members of EFoCS. Doug, Mike and Kostas have all been excellent company in the lab as well as on conference trips. Doug’s insights into JHotDraw were vital during the first evaluation study. There are many people who have not been directly involved with my thesis work, but have provided valuable support all the same. Thanks Ric for being an excellent flat mate and especially for helping me through the ‘dark zone’ of write up. Thanks also to the rest of the PhD students and staff in the department who have provided many an entertaining coffee break. My parents have provided me with a lot of encouragement, without which I doubt I would even have started, let alone finished, this thesis. Finally I am most indebted to my girlfriend Emma, whose astonishing levels of tolerance and patience saw this thesis through to completion.
Publications

Technical Reports
• Neil Walkinshaw, The Java System Dependence Graph, Technical Report (EFoCS-46-2003), Department of Computer and Information Sciences, The University of Strathclyde, Glasgow, UK, 2003.
• Neil Walkinshaw, Statically Partitioning Object-Oriented Systems for Use-case Driven Code Inspections, Technical Report (EFoCS-55-2004), Department of Computer and Information Sciences, The University of Strathclyde, Glasgow, UK, 2004.

Publications
• Neil Walkinshaw, Marc Roper, Murray Wood, The Java System Dependence Graph, Proceedings of the 3rd International IEEE Workshop on Source Code Analysis and Manipulation (SCAM'03), Amsterdam, Netherlands, 2003.
• Neil Walkinshaw, The Application of Slicing to Software Inspections, Poster Proceedings of the 5th Postgraduate Research Conference in Electronics, Photonics, Communications & Networks, and Computing Science (PREP'04), Hatfield, UK, 2004.
• Neil Walkinshaw, Marc Roper, Murray Wood, Understanding Object-Oriented Source Code from the Behavioural Perspective, Proceedings of the 13th International IEEE Workshop on Program Comprehension (IWPC'05), St. Louis, USA, 2005.
• Neil Walkinshaw, Inspecting Object-Oriented Code from the Behavioural Perspective, ECOOP Doctoral Symposium, Glasgow, UK, 2005.
• Neil Walkinshaw, Marc Roper, Murray Wood, Extracting User-Level Functions from Object-Oriented Source Code, Proceedings of the 6th International Workshop on Object-Oriented Re-engineering (WOOR'05), Glasgow, UK, 2005.
Contents

Abstract
Acknowledgements
Publications
    Technical Reports
    Publications
Chapter 1. Introduction
    1.1. Overview
    1.2. Contributions
    1.3. Thesis Structure
Chapter 2. Inspecting Object-Oriented Source Code
    2.1. Introduction
    2.2. The Software Inspection Process
    2.3. Code Reading
    2.4. Object-Oriented Source Code Inspection and its Problems
    2.5. Tool Support for Inspections
    2.6. The Case for Slicing
    2.7. Conclusions
Chapter 3. Code Extraction Techniques
    3.1. Introduction
    3.2. Definitions
    3.3. Static Slicing
    3.4. Dynamic Slicing
    3.5. Bridging the Gap between Static and Dynamic Slicing
    3.6. Feature Identification and Analysis
    3.7. Conclusions
Chapter 4. Using Landmark Methods to Partition Object-Oriented Source Code
    4.1. Introduction
    4.2. Truncating the Call Graph with Landmark Methods
    4.3. Implementation
    4.4. Using the Call Graph to Guide the Inspector: A Case Study
    4.5. Conclusions
Chapter 5. Evaluation
    5.1. Introduction
    5.2. Study of Landmark Methods
    5.3. Selection of Landmark Methods
    5.4. Using the Reduced Call Graph to Read and Understand the Source Code
    5.5. Tool Evaluation
    5.6. Conclusions
Chapter 6. Conclusions and Future Work
    6.1. Conclusions
    6.2. Summary of Contributions
    6.3. Future Work
    6.4. Concluding Remarks
Appendix A. The Java System Dependence Graph
    A.1. Introduction
    A.2. The Java System Dependence Graph
    A.3. Representing the Graph as a Layered Architecture
    A.4. Constructing the Graph
    A.5. Example
Appendix B. The Hotel System
    B.1. Hotel System Class Diagram
    B.2. go() method
Appendix C. NanoXML
    C.1. Code to Print XML Data as it is Parsed
    C.2. Code to Construct Parse Tree Before Printing it
Appendix D. Sample Document of Reduced Call Graph
Bibliography
CHAPTER 1
Introduction

1.1. Overview

It is well publicised that, for most software projects, maintenance is responsible for a significant proportion of the cost. Koskinen [KLT03] illustrates the scale of the problem in an overview of maintenance studies, most of which conclude that 50-75% of the costs are spent on maintenance. A large part of this cost is attributed to the detection and correction of faults, many of which are inevitably introduced by the addition of new features. Software inspection [Fag76] is a technique that was developed by Fagan in 1976. It involves the manual scrutiny of documents that are a product of the software development process with the aim of detecting faults. As a manual task it also benefits from the inspector's capacity to reason about the document from a qualitative perspective. Hence inspections are also advocated as a means for ensuring that documents are readable, reducing the potential for the introduction of new faults and, in turn, the maintenance costs [SV01]. Arguably the most important document type for any software inspection is source code: the most literal representation of the system that will ultimately be deployed. Understanding and reading source code is challenging because it demands a substantial cognitive overhead [CJHS95]. This is particularly the case with object-oriented systems, as they introduce two problems that are not as prevalent in procedural programs: delocalisation [SPL+88, DRW03] (the distribution of functionally related program elements across the system) and unpredictability [JE94] (interactions between program elements are determined at run time, making it difficult to predict system behaviour from a static view of the source code). This thesis contends that tool support is needed to manage the high level of complexity that is introduced by object-oriented software systems. One of the most important challenges for any source code inspection tool lies in isolating or extracting the source code that is relevant to the artifact under inspection. With object-oriented source code the challenge is amplified by delocalisation and the unpredictability of how elements will interact at runtime.
Various automated approaches have been developed that attempt to locate the source code corresponding to a particular feature or use-case. Many require dynamic information (information that is captured as the system is executed). This is impractical for software inspections because the system may not necessarily be executable (e.g. the system may be a framework) and it is difficult to determine a complete set of test cases that are representative of the feature to be inspected. Purely static approaches on the other hand are too conservative; the subset of possible method invocations returned is too large to be of use to the inspector. The approach adopted in this thesis is to specify a set of key methods that must be executed for a given feature (referred to as "landmark methods"). This small amount of lightweight dynamic information allows for an analysis approach that, depending on the number of methods specified by the inspector, provides a useful compromise between a sound (in this context, soundness guarantees that analysis results hold regardless of the input and environment in which a program is executed) but conservative static analysis and a precise but unsound dynamic analysis.

A two-stage approach has been developed that takes as input the call graph of the system to be inspected and a set of landmark methods specified by the inspector. In the first stage direct paths on the call graph are identified between landmark methods. A set of direct paths between two nodes takes the form of a hammock graph [Kas75, KE00] (which is a single-entry single-exit directed graph). In the second stage the approach uses code slicing [Wei81, Wei84] (a code analysis technique that highlights the semantic relationships between statements with respect to a single point in the code) to identify further call sites that may influence the execution of methods that have been identified in the first stage.

To establish whether or not the approach is useful a tool has been developed as a proof of concept. It uses the Soot framework [VCG+99] to produce call graphs and slices from Java byte code. The JUNG framework [OFWB03] is used to provide visualisations of the call graph as landmark methods are selected. The tool is used to evaluate the approach on four software systems, three of which are under active development by the open source community. The evaluation is based on four factors: the feasibility of using landmark methods to extract a useful set of edges from the call graph, the practicality of selecting suitable landmark method combinations by hand, the understandability of the resulting code base and the performance of the prototype tool.

1.2. Contributions

The work presented in this thesis makes the following contributions:
(1) An investigation into the issues that hamper the inspection of object-oriented code.
(2) The Java System Dependence Graph (JSysDG), a graph that represents various dependencies and relationships between the components of a Java program.
(3) A flexible source code analysis technique that is designed to extract code that is relevant to a software inspection.
(4) An implementation that is a proof-of-concept for the technique and serves as a basis for the evaluation.
(5) A novel evaluation approach that uses precision and recall to evaluate the technique on several open-source software systems.

1.3. Thesis Structure

Chapter two: Inspecting Object-Oriented Source Code. This chapter introduces source code inspections and the rôle they play in software inspections as a whole. It provides an overview of the traditional reading techniques that have been used for procedural software and discusses the reasons that render conventional reading techniques less effective when applied to the now dominant object-oriented paradigm. It provides an overview of techniques that have been developed to address the problem of object-oriented source code comprehension and concludes that, although useful when applied specifically to the tasks for which they were developed, they are not suitable for the inspection of object-oriented source code. It asserts that a tool-supported code inspection technique is required to address the two key problems of delocalisation and run-time unpredictability. Finally it concludes that code slicing is by its nature a suitable analysis technique that has the potential to address these problems.

Chapter three: Code Extraction Techniques. This chapter provides an overview of code slicing and extraction techniques. It starts by providing an overview of the theory that underlies slicing and presents the two most popular slicing approaches (data flow slicing and dependence graph slicing). Having done so, it presents an overview of the various slicing paradigms, based mainly on the relationship between the input (slicing criteria) and the output and whether they are static, dynamic or based on some hybrid of the two.

Chapter four: Using Landmark Methods to Partition Object-Oriented Source Code. This chapter presents the "landmark method" technique, which is the main contribution of this thesis. To demonstrate that it is feasible, a tool has been developed that implements the suggested analysis. It provides details of how the tool was implemented, illustrating its use on a small case study.
Chapter five: Evaluation. This chapter evaluates the approach that is presented in chapter four. It consists of two studies that analyse four software systems. The results are measured in terms of their precision and recall and the reduction in the number of edges that belong to the call graph. The results from these studies provide a number of insights into the technique and expose areas where the technique could be improved. A sample of the tool’s output is provided, which is used as a basis for discussing the readability of the extracted code. Finally the tool is evaluated against a set of established comprehension tool criteria. Chapter six: Conclusions and Future Work. This chapter presents a brief summary of the work. Numerous avenues for future research are discussed with respect to the results from the evaluation.
CHAPTER 2
Inspecting Object-Oriented Source Code

2.1. Introduction
Software inspection has existed for almost thirty years [Fag76]. Inspection is an approach to manually examining a software product in detail, by following a prescribed process. The ultimate aim is to ensure that it contains no faults and is fit for deployment. It is perceived as a complement to testing, because it has been shown to detect different kinds of faults. Because it is not a mechanical process (it cannot be entirely automated and requires human involvement), inspection also provides qualitative feedback about the software, thus ensuring that the software documents are readable and easier to maintain for the future. A reading technique is a systematic means of reading software documents (design, specification and source code) and is integral to an inspection. A successful technique will guide the inspector towards areas in the document that are most likely to contain a fault. Traditional reading techniques were developed for procedural programs, which are commonly constructed via functional decomposition. When a system is decomposed into objects, as is the case in the now dominant object-oriented paradigm, source code that is related to a single user-level function can be distributed across multiple objects (this is referred to as delocalisation). Object-oriented systems are also dynamic (i.e. polymorphic object types are not known until runtime), making it very difficult to predict how the system will execute from a static perspective. New reading techniques must take this distribution of functionality and unpredictability introduced by dynamic binding into account. This chapter introduces software inspections and some of the most popular reading techniques. It provides an overview of some of the problems that were introduced with the advent of object-oriented programming and reviews some reading techniques that have been specifically designed to address them. It concludes that, even with the development of specialised reading techniques, object-oriented source code is essentially difficult to read and understand manually. Tool support is necessary if inspections are to remain a viable option for the software industry.
Planning:
• Materials to be inspected must meet inspection entry criteria
• Arrange suitable meeting place and time
• Arrange the availability of the right participants

Overview:
• Group education of participants in what is to be inspected
• Assign inspection roles to participants

Preparation:
• Participants learn the material and prepare to fulfill their assigned roles

Inspection:
• Find defects (solution hunting and discussion of design alternatives is discouraged)

Rework:
• The author reworks all defects

Follow-up:
• Verification by the inspection moderator or the entire team to assure that all fixes are effective and no secondary defects have been introduced

Figure 2.2.1. The software inspection process [Fag86]
2.2. The Software Inspection Process
Software inspections were devised by Fagan in 1976 [Fag76] and have since become a firmly established means of detecting and preventing software errors. According to Fagan, software inspection is "a formal, efficient, and economical means of finding errors in design and code". He asserts that enforcing discipline and adhering to a set of rules and processes during the software development cycle can drastically reduce the number of errors in the finished product. By following a framework of rules and processes the software development cycle can be broken down into a sequence of clearly defined phases. Each phase is assigned a set of exit-criteria, where the satisfactory completion of each phase is judged by an inspection. Fagan's inspection process [Fag86] consists of six stages, each of which has its own objectives. These are shown in figure 2.2.1. Subsequent research has proposed various refinements to this process [PW85, MT90, GG93, KM93, Vot93]. There are several comprehensive reviews of software inspection that provide a more elaborate description of these modifications [Lai00, RDW03]. Inspections are applicable to any development cycle that can be broken into discrete phases, where the documents can be inspected before they are used in the following phase. By ensuring that faults are detected early in the development life-cycle, the cost of locating and rectifying them is substantially reduced. Figure 2.2.2 shows where inspections would take place in a development cycle that follows the waterfall model (they can be integrated into any type of development cycle; the waterfall model is simply illustrative). As indicated in the figure, this thesis focusses on the inspection of documents produced by the implementation phase (i.e. source code documents).

Figure 2.2.2. Applying inspections to the waterfall development cycle: requirements definition, system and software design, implementation, and integration and system testing are each followed by an inspection of the corresponding documents (requirements documents, design and specification, source code, and test plans) before operation and maintenance.
2.3. Code Reading

Reading techniques are crucial for effective inspections. They guide the inspector towards areas of the document that are particularly relevant and fault prone by following specific guidelines. This section explains why code reading is necessary (as opposed to simply testing the source code) and provides an overview of the popular code reading techniques.

2.3.1. The Importance of Code Inspections. The importance of inspecting design documents is well established. It is more economical to find faults early on in the software development cycle, because their origins can be traced more easily and require less effort to fix. Recent advances in static analysis techniques and statically typed programming languages have enabled the detection of many errors at compile-time, raising the question of whether code inspections still offer a justifiable return on investment [SV01]. Where, then, is the value in researching a practice that may be perceived as redundant? When software inspection was conceived thirty years ago, the software engineering landscape was completely different. Processes were less formal and software was developed with simple text editors. Compiler technology was relatively simple and testing environments were only in their genesis. Today there are several well structured development processes (e.g. the
Unified process [JBR99]). Software is developed using advanced integrated development environments such as Eclipse [OTI03]. Languages such as Java are statically typed, detecting many errors at compile-time. Testing approaches have been tightly integrated with development processes (e.g. unit testing and extreme programming). With these substantial advances in both technology and practice it is unsurprising to find that code inspections are seen by many as obsolete. Faced with this popular opinion Siy and Votta [SV01] conducted a study into the efficacy of code inspections, investigating whether they are still valuable and worth pursuing. By analysing data from a code inspection experiment they determined that 60% of the issues raised could not have been uncovered by testing or field usage. Inspections can be used to find qualitative code faults as well as those that cause software to fail. Their work supports the fact that inspections are important for finding faults and can have a major impact on the maintainability and evolvability of the product. A static counterpart to testing is still vital for a comprehensive verification and validation approach.

2.3.2. Popular Reading Techniques. Reading techniques provide the inspector with a structured means of reading the document under inspection and aim to provide a sufficient level of understanding to guide the inspector towards faults in the system. An overview of the most prominent reading techniques is provided below. Techniques are described in the context of their application to source code documents, but many can be applied to any document produced by the development process.

Ad-hoc. The inspector is given no direction about how the code should be read. Success is based on the inspector's skill and ability to root out any defects. This freedom can be advantageous to the experienced inspector, because it eliminates any overhead required by a more formal reading technique. It is however particularly unsuitable for large inspections that involve many inspectors, because it is difficult to maintain the consistency between individuals that would usually be imposed by a more formal reading technique. Having studied the techniques employed by developers when they investigate source code, Robillard et al. [RCM04] conclude that ad-hoc reading is less effective than other structured reading techniques.

Checklists. Checklists were first introduced by Fagan [Fag76], and later refined by Gilb and Graham [GG93]. They consist of a list of questions that can each be answered 'yes' or 'no', the latter indicating a fault in the code. Questions are aimed at focusing the inspector's attention towards common sources of defects. To correctly answer checklist questions, the inspector has to closely scrutinise various aspects of the document. This encourages a greater understanding of the system, increasing the probability of detecting defects. As pointed out by Laitenberger [Lai00] the problem with this approach is that, because the inspector's attention is focused on the specific checklist items, any fault types omitted by the checklist are unlikely to be detected by the inspector. An example of a checklist for object-oriented systems by Dunsmore et al. [DRW01] is shown in figure 2.3.1.

For each class:
  1. Inheritance: Is all the inheritance required by the design implemented in the class?
  2. Inheritance: Is the inheritance appropriate?
  3. Class Constructor: Are all instance variables initialised with meaningful values?
  4. Class Constructor: If a call to super is required in the constructor, is it present?
For each method:
  5. Data Referencing: Are all parameters used within a method?
  ...
  14. Method Behaviour: Are all assignments and state changes made correctly?
  15. Method Behaviour: For each return statement, is the value returned and its type correct?
  16. Method Behaviour: Does the method match the specification?
For each class:
  17. Method Overriding: If inherited methods need to behave differently, are they overridden?
  18. Method Overriding: Are all uses of method overriding correct?

Figure 2.3.1. Sample checklist [DRW01]

Stepwise Abstraction. The idea of stepwise abstraction by Linger et al. [LMW79] is that, as the reader inspects the code, its functionality is abstracted. This is very similar to the bottom-up comprehension approach that was devised by Shneiderman and Mayer [SM79] at the same time. Starting from the functions of the fundamental components, the abstraction process is continued to higher levels until the inspector is left with a fully abstracted notion of how the code functions. This abstracted understanding can then be compared to the system specification. Any deviations from the specification indicate the existence of a defect. Linger et al. also suggest that this approach is particularly valuable if the code is poorly specified or documented. One drawback is that it takes the entire system into account. Focussing on particular components or use cases can however be difficult, particularly when functionality is interdependent and distributed (as is the case with object-oriented systems). Dunsmore et al. [DRW03] recognise that applying Linger et al.'s stepwise abstraction technique to object-oriented code is problematic. The reason for this is that object-oriented source code is not developed by functional decomposition; as a result, source code that pertains to a particular unit of functionality tends to be distributed across the system (this problem is elaborated in section 2.4). Dunsmore et al. propose that the inspector produces a summary for the
functionality of every method encountered, following dependencies as they arise. These summaries can then be reused for subsequent inspections. As more methods are summarised, the reading process becomes easier. Dunsmore et al.'s approach is referred to as systematic reading.

Scenario-based Reading. Scenario-based reading [PVB95] is founded on work by Parnas and Weiss [PW85], who suggest that in order to maximise the coordination of different inspectors, each inspector should be given a different angle from which to approach the inspection. A scenario is either a set of questions or a set of instructions that focus the inspector's attention on a particular aspect or feature of the system. Ideally inspectors are allocated scenarios that suit their areas of expertise to increase the chance of detecting defects. There have been several variations of this approach that have been developed to improve the generation of scenarios:

• Defect-based Reading (DBR): Porter et al. [PVB95] propose that each inspector is given a set of questions that focus on the detection of a different defect type. The types of faults that are the target of DBR are the same as checklist-based reading. A survey of inspection experiments on the effectiveness of DBR by Regnell et al. [RRT00] is inconclusive about whether or not it outperforms other mainstream reading techniques.
• Function Point Analysis (FPA): Cheng and Jeffrey [CJ96] propose the use of function point scenarios. Function point scenarios are written in terms of the program's inputs, files, enquiries and outputs. In an experiment (on requirements documents) that compares its performance to ad-hoc reading, they determined that ad-hoc reading outperforms FPA, but suggest that the results are biased by the prior experience of the subjects.
• Perspective-based Reading (PBR): Basili et al. [BGL+96] suggest that scenarios ought to be based on the perspectives of various stakeholders. Software quality is composed of factors such as correctness, maintainability and testability. Scenarios are based on these individual properties and are inspected by stakeholders in each factor (e.g. tester, designer and user). PBR has been evaluated extensively on a variety of software documents and has generally performed favourably [BGL+96, LEEH01]. It has however yet to be evaluated on object-oriented source code (as is the case with most other reading techniques).

Usage-based Reading (UBR). Thelin et al. [TRW03] propose UBR as an approach that is based on the idea that inspections should be governed by the expected usage of the system. The faults detected by this technique are deemed to have a higher impact on software quality
because they are most likely to be encountered as the system is executed. Thelin states: “UBR utilises the set of use cases as a vehicle for focussing the inspection effort, much in the same way as a set of test cases focusses the testing effort”. A problem that naturally arises in this case is that it is often difficult to accurately trace use cases to the document under inspection, particularly to source code. This problem is elaborated in the following section.
2.4. Object-Oriented Source Code Inspection and its Problems

An understanding of the functionality of the system under inspection is fundamental to its verification. Most of the reading techniques in section 2.3 are based on helping the inspector to comprehend some aspect of system functionality. It is also the case that most of these reading techniques were developed for systems that are constructed by functional decomposition, where a system is decomposed into units of atomic functionality. This is particularly appropriate for the procedural programming paradigm, which was dominant until the emergence of popular object-oriented languages in the nineties. Object-oriented systems are not decomposed in this fashion; object-oriented decomposition often results in the distribution of functionality across the system. Traditional reading techniques have to be adjusted if they are to be effective when applied to object-oriented systems. This section examines the problems presented by this new decomposition approach and gives an overview of some proposed object-oriented reading techniques.
2.4.1. The Challenge of Reading and Understanding Object-Oriented Source Code. Dunsmore et al. carried out a study to identify the performance of various reading techniques when applied to object-oriented source code. They observed a number of problems that hampered the inspector when conventional reading techniques (for procedural code) were applied to object-oriented systems [DRW00a, DRW01, Dun02, DRW03]:

The problem of delocalisation: Functionally related source code is non-contiguously scattered around the system. Figure 2.4.1 illustrates the extent to which the functionality that results from a mouse click in JHotDraw is delocalised. Every node in the call graph represents a single method, and every edge represents a call to that method. The methods in the call graph belong to a multitude of different classes. In total the graph contains 251 methods and 719 edges (representing potential method calls). Comprehensively exploring graphs of this complexity and picking out the relevant paths is extremely time consuming to perform manually. This problem is presented in more detail in section 2.4.2.1.

Figure 2.4.1. Call graph from PaletteButton.mouseReleased method in JHotDraw

Chunking: The number of interdependencies in object-oriented systems (particularly due to delocalisation), coupled with the lack of a hierarchy, presents the inspector with the problem of how to partition a system. In their work on the cognitive complexity of object-oriented systems, Cant et al. [CJHS95] allude to this problem. Besides the practical difficulties of partitioning an object-oriented system, they note that the code inspector also has to deal with a substantial cognitive burden, caused by continually chunking and tracing through related code artifacts.

Reading strategy: Systematically reading through object-oriented source code in every possible context of execution is unrealistic. A reading strategy is required that will result in a low cognitive overhead for the inspector, dealing with the problem of delocalisation and presenting the system in manageable chunks.
These three problems identified by Dunsmore et al. are diverse in nature. Delocalisation is part of the inherent nature of object-oriented systems. The other two problems are tasks that must be carried out during an inspection (which are both made harder by delocalisation). This section is particularly concerned with the reasons that make conventional reading techniques inefficient when applied to object-oriented systems. It explores the problem of delocalisation as well as unpredictability, a closely related problem that was implicit in Dunsmore et al.'s work.

2.4.2. Delocalisation and Unpredictability. The problem of delocalisation is that functionally related program elements tend to be spread across the system. The problem of unpredictability is that object-oriented systems are executed in such a way that many of the decisions that determine the interactions between objects are taken at run-time, making it virtually impossible to predict them on a static basis. These two problems are well established in object-oriented software engineering research. Gamma et al. [GHJV99] state the following: "In fact, the two structures [run-time and compile-time] are largely independent. Trying to understand one from the other is like trying to understand the dynamism of living ecosystems from the static taxonomy of plants and animals". Jorgensen and Erickson [JE94] also refer to the problem of unpredictability in the context of object-oriented integration testing: "The declarative aspect of object-oriented software lies primarily in its event-driven nature. Dynamic binding also creates an indefiniteness that resembles that of declarative programs".

2.4.2.1. Delocalisation. In object-oriented systems the object is the primary and sole unit of decomposition. Objects model important entities within a system and consist of data tightly bound with the methods that manipulate them. The system executes by passing messages (invoking methods) between objects. A consequence of this decomposition strategy is that function and structure are no longer coincidental: functionally related code is distributed over many different objects. This distribution of functionality is termed by Soloway et al. a delocalised plan [SPL+88] and described as follows: "In a delocalised plan, pieces of code that are conceptually related are physically located in non-contiguous parts of the program". It is important that this problem of delocalisation is not confused with the problem of cross-cutting concerns, which forms the basis of Aspect-Oriented Programming [KLM+97]. Cross-cutting concerns are also elements of functionality that are distributed across objects. A common example of a cross-cutting concern is logging, where the same logging action is carried out in multiple objects. This means that multiple objects have similar source code that is concerned with logging. The delocalised functionality that is addressed in this thesis is slightly different.
Here we are particularly concerned with functionality that is the result of objects interacting with each other, where each object contributes a different element of functionality (as is predominantly the case in object-oriented systems).

To illustrate the problem of delocalisation, a small example is presented in figure 2.4.2. The screen shots and sequence diagram are from the JHotDraw framework, which can be used for the construction of drawing tools. The 'PERT Editor' is an example framework application that allows the user to construct work-flow diagrams for business tasks. The sequence diagram shows the key method invocations that are involved in the simple task of selecting a different tool (the two screen-shots illustrate the simple action of selecting a different tool on the tool bar). This involves at least nine (key) method calls between four objects (PaletteButton, DrawApplication, ToolButton and Tool). The Tool object is simply an interface class; in JHotDraw itself there are seventeen classes that implement the interface, many of which have their own implementations of the activate and deactivate methods and are therefore all (potentially) relevant to the selection of a different tool. As functions become more complicated, the number of objects and object-interactions becomes intractable.

Figure 2.4.2. Illustration of how functionality is distributed in JHotDraw

Much of the work carried out by Murphy and her colleagues [MKRC05, KM05, RCM04, RM02] is concerned with the identification of code (and other material) that is relevant to a given software engineering task (e.g. implementing a change). They observe that "developers may be spending more time looking for relevant information amongst the morass presented than working with it" and find that relevant information tends to be scattered across the system. They illustrate this by analysing CVS commits for both the Mozilla and Eclipse projects. For both systems, over 90% of the transactions (concerning a single task) involve changes to more than one file [MKRC05]. Murphy's work is presented in more detail in section 2.5.3.

2.4.2.2. Unpredictability. It is almost impossible to statically determine how delocalised elements of functionality will interact at run-time. This is due to the interplay of inheritance, polymorphism and dynamic binding. Methods belonging to an object that is instantiated at run-time may belong to different classes at different locations in the class hierarchy. Polymorphism is one of the main mechanisms of object-oriented programming (though it is not restricted to object-oriented languages). In essence a polymorphic object varies its behaviour depending on its run-time type. If class x is a subclass of y, an object that is defined to be of type x may bind to methods that are defined in y if there are no overriding implementations in x. To illustrate this a small extract from the JHotDraw class hierarchy is shown in figure 2.4.3 (in reality the hierarchy is much deeper and the
classes contain more methods and data members; this is simplified for the sake of illustration). StandardDrawingView.mousePressed invokes the mouseDown method on an object (representing the currently active tool) that implements the Tool interface. In the class diagram three classes implement the mouseDown method (in reality there are twelve mouseDown implementations in this hierarchy). Depending on the type of tool that is selected in JHotDraw at run-time, a different mouseDown method will be executed. If a tool does not have its own implementation, it inherits mouseDown from a class that does implement it further up the hierarchy (here PertFigureCreationTool inherits mouseDown from ConnectionTool).

Figure 2.4.3. Extract from the JHotDraw class diagram [JHo] (dashed arrow represents possible calls)

public void changeCursorBehaviour(CreationTool ct) {
    mouseCursor.setSnapToGrid(true);
    ct.setCursor(mouseCursor);
}

public void changeCursorBehaviour(ConnectionTool ct) {
    mouseCursor.setSnapToGrid(false);
    ct.setCursor(mouseCursor);
}

Figure 2.4.4. Overloaded methods that result in different run-time behaviour (refer to the class hierarchy in figure 2.4.3)

As well as affecting the type of the destination object, polymorphism also determines the run-time type of method parameters. Variable parameter types can introduce further uncertainty if the destination method is overloaded. An example that illustrates this problem is shown in figure 2.4.4 (these methods do not belong to JHotDraw, but simply serve to illustrate the uncertainty introduced by polymorphic method parameters). If for example there exists some method in JHotDraw that takes an object of the type CreationTool as an argument, and that method is overloaded by another method that takes an object of type ConnectionTool, it is difficult for an inspector to determine which method will be executed. The decision is only taken at run-time, once the type of the argument has been established. The mechanism that permits the precise type of an object to be determined at run-time is called dynamic binding. It binds objects and methods together as the system is executed, using
the run-time object types to decide which methods and attributes they inherit. This makes it almost impossible to ascertain which elements in the system will interact and contribute to a particular element of functionality without actually executing the system or exhausting every possibility during inspection.
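To make this uncertainty concrete, the following minimal sketch shows a call site whose target cannot be resolved by reading the source statically. The classes and method bodies are hypothetical stand-ins written purely for this illustration (they merely echo the names from the JHotDraw example above and are not taken from JHotDraw itself):

// Hypothetical illustration: which mouseDown runs depends on the run-time type of currentTool.
interface Tool {
    void mouseDown(int x, int y);
}

class ConnectionTool implements Tool {
    public void mouseDown(int x, int y) { System.out.println("ConnectionTool handles the click"); }
}

// Does not override mouseDown, so it inherits ConnectionTool's implementation.
class PertFigureCreationTool extends ConnectionTool { }

class SelectionTool implements Tool {
    public void mouseDown(int x, int y) { System.out.println("SelectionTool handles the click"); }
}

public class DispatchExample {
    static Tool currentTool; // set elsewhere, e.g. when the user presses a palette button

    public static void main(String[] args) {
        currentTool = (args.length > 0) ? new SelectionTool() : new PertFigureCreationTool();
        // The target of this call is bound at run time: it may be SelectionTool.mouseDown or the
        // ConnectionTool.mouseDown that PertFigureCreationTool inherits. An inspector reading this
        // line statically must consider every implementation of Tool.mouseDown in the system.
        currentTool.mouseDown(10, 20);
    }
}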
2.4.3. Object-Oriented Reading Techniques. The challenges of delocalisation and unpredictability introduced by object-oriented programming demand specialised code reading techniques. Recent software inspection research has attempted to address this. This section presents some reading techniques that have been specifically developed for object-oriented systems (some of these have already been mentioned in section 2.3.2).

Dunsmore et al. [Dun02, DRW03] produced a modified reading technique for object-oriented systems that is based on Linger et al.'s [LMW79] stepwise-abstraction reading technique. The idea is that the inspector inspects the system in a bottom-up fashion, as was originally suggested by Linger et al. Dunsmore et al. acknowledge that, because of the sheer complexity inherent in object-oriented systems (large numbers of small methods that extensively communicate with each other), it is virtually impossible to adopt the traditional approach in a practical manner. They suggest that, for each method encountered, the inspector produces a concise summary of the method's functionality. Instead of reading in the traditional bottom-up manner (which presumes that functions are organised in some hierarchy that corresponds to different levels of abstraction), they propose that the order in which methods are read is determined by their dependencies on other methods. By adopting this approach, methods that interact with few other methods (i.e. they are relatively easy to understand in isolation) are read first, and methods that rely heavily on interaction with other methods are read once the methods they interact with have been summarised. This should mean that for a given method, even if it is executed in multiple contexts, its summary will suffice for the inspector to comprehend how it contributes to the behaviour of the system as a whole.

Laitenberger and Atkinson [LA99] adapted Perspective-Based Reading to make it applicable to object-oriented systems. When applied to code documents [LEEH01], PBR takes a more focussed approach. The code that is relevant to the function under inspection is isolated and then inspected by the inspector. Interestingly Laitenberger et al. also draw comparisons between the process of understanding the source code and Linger et al.'s stepwise abstraction technique. They do not, however, propose an approach that can be used to identify and isolate the relevant source code.
Thelin et al.'s [TRW03] Usage-Based Reading proceeds by identifying the elements of a system that belong to a particular user-level feature. This approach is particularly applicable to object-oriented software, because such systems are often designed in terms of their use (i.e. in terms of use-cases [JCJO92]). The rationale for adopting this approach is that, if the elements of a system are inspected in the context of their use, emphasis is placed on detection of the most important defects. Although empirical results from design inspections look promising, this technique has yet to be applied to source code. The apparent problem, as with PBR and Dunsmore et al.'s technique, lies with detecting the relevant code and code dependencies (e.g. calls between relevant methods).

There is a lack of evidence for the success of any of these reading techniques when applied to object-oriented source code. Perspective-Based Reading and Usage-Based Reading have both outperformed the conventional checklist-based approach in detecting defects in object-oriented designs [LASE00, TRW03], but there has been very little empirical work to support the efficacy of these techniques when applied to object-oriented source code. In his thesis Dunsmore evaluated the abstraction-based technique on the source code of three object-oriented systems and detected fewer defects than checklist-based reading (the base-line technique against which new reading approaches are evaluated); it was however noted anecdotally that the abstraction-based technique encouraged a deeper understanding of the code ([Dun02], p.119). His findings were supported by Skoglund et al. [SK04], who also carried out two experiments comparing Dunsmore's technique to checklist-based reading and found no significant increase in effectiveness, but did find that it provided more support in code comprehension.

2.5. Tool Support for Inspections

This thesis contends that the inherent complexity of object-oriented source code cannot be addressed by manual reading techniques alone. A (semi-)automated approach is required to assist the inspector in focussing on the relevant parts of the system. There exists a large collection of source code analysis tools that can be used to reduce this substantial burden on the inspector. This section provides an overview of the key existing tools (some of which are not explicitly intended for inspections). Many tools that are advertised for inspections are geared towards supporting the logistics of Internet-based inspection meetings [BSM90, MM99, vGvDSV01, TG02]. Although their importance is acknowledged, this thesis focusses on tool support that can specifically facilitate reading through source code and identifying the code that is relevant to the requirement that is under inspection.
2.5.1. Coding standards, bug patterns and smells. Tools that ensure the conformance of source code to set standards are known as lint-like tools (based on the popular lint program for C source code [Dar88]). Areas in the code that do not conform to these standards are highlighted as areas that may be defective. Popular integrated development environments such as Eclipse [OTI03] automatically check that (user-specified) coding standards are adhered to as the code is typed. Findbugs [HP04] is a more advanced checker that can be used as a plugin for Eclipse. It checks the source code against a library of "bug patterns", flagging areas of the code that match these patterns. Detecting code smells is another automatable process. 'Code Smell' is a term from Fowler's book on refactoring [Fow99], which is used to describe a set of source code properties that suggest bad design. Refactorings are a collection of source code transformations that can be combined to address code smells. Several smells described by Fowler would also be of interest to the code inspector. A few examples are: duplicate code, methods that are too long, classes that contain too much functionality, classes that violate data hiding or encapsulation rules and classes that delegate the majority of their functionality to other classes. Van Emden and Moonen [vEM02] have produced a tool called jCosmo that attempts to automatically locate code smells by statically analysing the source code. Ensuring that the source code adheres to coding standards and contains no bad smells or bug patterns is useful for inspections but faces the same issues as checklist-based reading. Any issues that are to be flagged by the tool must be defined before it is used. As with checklists, any bugs, smells or defects that are not predefined will not be detected. There is also a limit to the number of fault types that can be defined as a static pattern in the source code.

2.5.2. Slicing. Slicing is a source code analysis technique that identifies statements that can affect the behaviour of a set of variables at a given point in the source code. CodeSurfer is a slicing tool for C and C++ [ART03]. It provides a static graph representation of the program (the system dependence graph), which the user can slice (slicing is introduced in detail in the following chapter). The resulting slice can be navigated via hyperlinks that are embedded in the source code. Several other slicing tools are openly available, but CodeSurfer is the only tool that openly advertises itself as a code inspection tool. The ability to semantically relate source code statements to each other is clearly beneficial to inspections. According to Dunsmore [Dun02], a lot of the overhead that is inherent in manual code reading strategies is due to delocalised artifacts (see section 2.4.3). Slicing can remove the need to trace related source code, substantially reducing this overhead. CodeSurfer is
primarily a slicing tool and only supports ad-hoc inspection, unsupported by any structured reading approach. A key problem with the use of conventional slices is that they can be too large to be of practical use. If a slice is too large, it will highlight too much source code as being ‘relevant’ to the artefact that is being inspected.
2.5.3. Code comprehension and visualisation tools. Research into software inspections suggests that an increased understanding of the product under inspection increases the inspector's engagement with the process, resulting in a higher yield of defects [RD94, DRW00b]. There exists a multitude of tools that aim to improve program comprehension. Storey [Sto05] notes that comprehension is not an end in itself; it is a necessary step to achieving some other objective. She points out that it is the type and the scope of programming task that ultimately determine which comprehension technique is adopted. Tool construction is generally driven by a given set of program comprehension theories and programmer tasks. As an example, Murphy et al.'s Reflexion tool [MNS95] is designed to support top-down comprehension (as proposed by Brooks [Bro83]). Using this approach the developer provides a set of hypotheses that they perceive to be true about some aspect of the program's functionality and the Reflexion tool attempts to determine whether or not these are true. The Rigi tool on the other hand has support for multiple views, cross-referencing and queries to support bottom-up comprehension (as proposed by both Shneiderman and Mayer [SM79] and Linger et al. [LMW79]). Using this approach the developer can build up a full understanding of how the system functions by reading the low-level entities of the system (individual functions), and abstracting away their functionality by examining how they interact with the rest of the system. Storey [Sto05] uses the term "Recommender system" for yet another form of comprehension tool, which is aimed at guiding the developer towards parts of the system that are particularly relevant to a specific task. Some of these tools are presented below (all are implemented as plug-ins for Eclipse [OTI03]):

• Mylar (by Kersten and Murphy [KM05]): Mylar monitors the programmer's activity to generate a degree-of-interest model. This is used to highlight files that are particularly relevant to the current focus of the IDE, reducing the navigational overhead on the programmer.
• NavTracks (by Singer et al. [SES05]): NavTracks monitors the navigation between files in Eclipse. Following Wexelblat's assumption that "path information gathered from
an information space can reveal the user’s model of how information should be connected”, the tool records navigation patterns and feeds them back to the programmer. • FEAT (by Robillard and Murphy [RM02]): FEAT enables the user to manually construct “concern graphs”, which relate source code that contributes to a particular concern. This graph can then be used to navigate source code in terms of its concerns. For each node in the graph the user can apply queries, e.g. if the node is a class the superclass can be obtained, or if the node is a method the user can obtain any other methods that are called by that method. • Hipikat (by Cubranic and Murphy [CM03]): Hipikat uses resources such as CVS, bug tracking systems, emails / newsgroups and documentation to represent knowledge about a software system. As the developer navigates the system, Hipikat is able to recommend elements of these resources that may be of use in the context of the current focus of attention. It provides light-weight information (e.g. bug report documents), as a means for enhancing developer knowledge about a specific fragment of source code.
2.6. The Case for Slicing

The tool-supported approaches presented above are all useful for code inspections, but only to a certain extent. None of them explicitly address the problems of delocalisation and unpredictability (as introduced in section 2.4.3). Scanning the source code for predefined patterns and smells can eliminate known defects, but cannot offer help in terms of comprehension and code reading. There is also not sufficient evidence to suggest that an inspector can rely on the existence of programmer behaviour patterns or expertise (as is required for Mylar, NavTracks and FEAT) to act as a reliable guide towards source code that is relevant to a particular task. Also, these tools in their current state (with the exception of FEAT) simply present results in terms of files. This does not provide enough information to the inspector about where to look within the file, and about the context in which a given file is relevant. This thesis aims to construct a tool that enables the inspector to focus on a specific user-level function, taking into account the problems of delocalisation and unpredictability. These can only be addressed by considering the nature of the relationships between functionally related, but structurally disjointed, methods. An effective code reading tool needs to make use of more sophisticated code analysis techniques that can reliably guide the inspector towards source code that is relevant to the system function that is being inspected.
There exists a host of techniques for analysing relationships within the source code. Many of them were developed as analysis tools for compilers, to generate intermediate representations that could be used to facilitate program synthesis. These have produced relationships such as data flow between statements [ASU86] and the control dependence of a statement on its predicate [CFRW91].
In his thesis [Wei79] Weiser observed that, whilst debugging, developers find it difficult to navigate these hidden dependencies. He used data flow analysis as a means to compute code slices [Wei79, Wei81, Wei84], which include all of the statements in the program that are related to a statement specified by the user (slices are introduced in-depth in the following chapter). He concluded that developers indeed use mental slices as they debug [Wei82].
Although the problem of debugging is different to that of inspecting source code, the problems encountered are similar. Just as with Weiser’s developers, inspectors also have problems navigating hidden dependencies in source code [DRW03]. Indeed, the problem is amplified because the number of dependencies has proliferated with the advent of object-oriented programming. It would therefore seem that slicing would be a very useful technology to apply to code comprehension and inspection. Storey [Sto05] supports this argument, alluding to its potential application to the location of programming concepts: “the user may need to indicate a starting point and then use slicing techniques to find related code”.
The idea of using slicing for inspections is not new. Anderson et al. [ART03] have promoted their CodeSurfer tool as a useful accompaniment to source code inspection. The work in this thesis gives slicing a more central rôle in inspections, instead of it being simply a passive tool that can be used during ad-hoc inspections.
Conventional static slicing approaches become impractical when applied to object-oriented code because they are inherently conservative. As a result, slices are often too large to be of practical use (this problem will be discussed in more detail in the following chapter). To circumvent this, an alternative approach is required that allows the inspector to restrict the results in some way. The rest of this thesis presents a source code analysis technique that allows the inspector to supply information about the execution of the program. This is then used to eliminate as much irrelevant source code as possible, helping the inspector to focus on the parts of the system that are particularly relevant.
2.7. Conclusions Over the past thirty years inspections [Fag76, Fag86] have made an important contribution to ensuring that software systems are correct and maintainable. Although they can be applied to any document that is produced during the development of a software system, source code has proven to be a particularly challenging document type to inspect, because it is inherently difficult to navigate and comprehend. Object-oriented source code is particularly difficult to read from a static perspective because (a) it encourages the distribution of functionally related code across multiple classes in the system [Dun02, GHJV99] and (b) the use of polymorphism coupled with dynamic binding renders it practically non-deterministic [JE94], making it virtually impossible to accurately predict how software elements will interact at run time. Several manual reading techniques have been devised to address these issues [DRW03, TRW03]. There has however been very little empirical evidence to suggest that they merit the inevitable increase in effort from the inspector when compared to traditional reading techniques (such as checklist-based reading [SK04]). A major reason for this (as emphasised by Dunsmore [Dun02]) is the fact that inspectors often find it difficult to trace through the source code, identifying those elements that are relevant. This forms the rationale for the argument that object-oriented source code inspections could greatly benefit from tool support that addresses the inherent weaknesses of manual inspection. There exist a number of tools that can be used to support inspections by various means. Many tools that have been developed specifically to aid inspections have so far only considered the logistical aspect of supporting meetings between inspectors [BSM90, MM99, vGvDSV01, TG02]. To the best of the author’s knowledge no tools have been developed specifically to reduce the manual overhead that is inherent in object-oriented source code reading techniques. Various source code analysis tools that are not specifically intended as inspection aids can however still be useful in this respect (such as bug pattern or code smell detectors [HP04, vEM02], slicing tools [ART03] and general source code visualisation/comprehension tools [Sto05]). Recently, research has focussed on the development of “Recommender systems” [KM05, SES05, RM02, CM03], which are aimed at providing guidance towards areas of a system that are particularly relevant in some respect. This is particularly interesting for source code inspections, because these tools aim to reduce the lack of guidance and ensuing confusion that arises with object-oriented systems. Although these tools fulfill a useful purpose in terms of program
comprehension, most of them rely on a history of programmer interaction or expert input to provide their feedback. A tool is required that relies on only a minimal amount of developer expertise, but can still provide useful feedback to the inspector in the form of guidance through the source code. Code slicing [Wei81, Wei84] highlights (hidden) dependencies between software elements that may interact at run time. Empirical studies by Weiser [Wei82] have shown that programmers use mental slices as they debug source code. Based on the premise that the mental processes of debugging and inspecting source code are similar, this chapter concludes that tool support must incorporate slicing if it is to address the problems of navigating and understanding object-oriented source code.
CHAPTER 3
Code Extraction Techniques
3.1. Introduction
Slicing is a source code analysis technique designed to extract semantically related components from a program. A slice returns a fragment of the program, consisting of statements that may contribute to the computation of a variable (or set of variables) at a given point. Weiser first introduced the notion of a slice in his thesis [Wei79], which investigated how programmers debug their source code. Since its conception, slicing has grown into an active field of academic research. A large variety of slicing paradigms exist. Approaches may vary in their precision, input, output and computational efficiency. Jackson and Rollins observe that “not all questions even when restricted to the vocabulary of program dependences can be cast as [conventional] slice criteria” [JR94]. For this reason new slicing techniques are developed to fit the needs of a specific application (e.g. a slicing technique for a debugger may need to allow for added precision around the suspected location of a fault). This chapter is primarily an overview of slicing techniques, but takes care to include other code extraction techniques that are relevant to the following chapters. Techniques are discussed in terms of the questions they answer about a program. The actual mechanics of computing the slices themselves are covered to an extent, but emphasis is placed upon their ‘shallow features’: their input requirements and their output. When a slicing approach is presented, there should be sufficient information about its computation to make this chapter self-contained. If the reader requires more information about slice computation and alternative intermediate representations, Tip [Tip95] and Binkley and Gallagher [BG96] provide overviews that are more complete in this respect. A survey of slicing approaches with emphasis on their empirical results is provided by Binkley and Harman [BH03]. Krinke also provides a more up-to-date survey of slicing and its applications [Kri05].
3.2. Definitions These are definitions that are generally relevant to source code analysis and slicing. More complex and specific definitions will be provided in their relevant context.

DEFINITION 3.2.1. A digraph (or directed graph) D is a structure <N, E>, where N is a set of nodes and E is a set of edges. A path from n to m of length k is a list of nodes p0, p1, ..., pk such that p0 = n, pk = m and for all i where 0 ≤ i ≤ k − 1, (pi, pi+1) is in E. A node nx precedes a node ny (and ny succeeds nx) if there exists a path from nx to ny. If they are adjacent then nx directly precedes ny. The set of nodes preceding ny on a graph D is denoted Pre(ny, D) and the set of nodes succeeding nx is denoted Succ(nx, D).

DEFINITION 3.2.2. A slicing criterion is denoted <p, V>, where p denotes the point of interest in the program and V denotes the set of variables. When slicing based on a program’s data flow alone (as will be presented in section 3.3.1) the variable(s) and point do not have to be collocated. However, when slicing dependence graphs (this will be presented in section 3.3.3) the restriction that variables are either defined or referenced at the criterion point is enforced.

DEFINITION 3.2.3. A flow graph F is a structure <N, E, n0>, where <N, E> is a digraph and there is a path from n0 (the initial node) to all other nodes in N. If m and n are two nodes in N, m dominates n if m is on every path from n0 to n.

DEFINITION 3.2.4. A control-flow graph CFG is a flow graph representing the control structure of a procedure1. N contains units in the procedure, where a unit can be a basic block, a single instruction or a statement. For the sake of simplicity it is assumed that every node in N is a statement. E contains edges that represent the flow of control between individual units. n0 is the entry point to the procedure. For a node n in the CFG, the set DEF(n) contains all variables that are assigned a value at n (are on the left hand side of an assignment). REF(n) contains all variables that are read at n (are on the right hand side of an assignment).

DEFINITION 3.2.5. A control dependence exists between two statements x and y if the predicate represented by x controls the execution of statement y.

DEFINITION 3.2.6. A data dependence exists between a statement x that defines variable v and a statement y that uses v if control can reach y from x and there is no intervening definition of v. The process of computing data dependences makes use-definition chains [ASU86] explicit.
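For illustration (the fragment below is invented for this purpose and is not drawn from any of the systems analysed later), the comments show how the DEF and REF sets and the two kinds of dependence apply to a few simple Java statements:

    int y = 0;          // DEF = {y}, REF = {}
    int x = a + b;      // DEF = {x}, REF = {a, b}
    if (x > 0) {        // DEF = {}, REF = {x}: data dependence on the definition of x above
        y = 1;          // DEF = {y}, REF = {}: control dependent on the predicate (x > 0)
    }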
[Figure 3.3.1: an example method, divideByEight, with its statements numbered 1–8; only the opening of the listing (public void divideByEight(int a){ int result=0; if(a ...) is recoverable here.]

A slice is computed with respect to a slicing criterion <p, V> (definition 3.2.2), where p is the program point and V is the set of criterion variables. In the case of figure 3.3.1 the slicing criterion is therefore <7, {result}>. There are two popular approaches to the computation of a slice. Weiser’s original approach is an iterative algorithm that computes the slice as the solution to a set of data-flow equations [Wei84]. The other approach is based on representing the program as a dependence graph [FOW87], where vertices represent expressions and edges represent control and data dependencies between them (see definitions 3.2.5 and 3.2.6). This section provides an introduction to their underlying theory and the intermediate program representations upon which they are based. Other slicing approaches introduced in the following sections will be based upon either one or the other2.

3.3.1. Slicing using Data-Flow. Weiser’s original approach [Wei81, Wei84] computes a slice by using the control flow graph as an intermediate representation of the program. A slice is computed via a two-step approach.

1The terms ‘procedure’ and ‘method’ are used interchangeably.
2There are other underlying representations, such as information-flow analysis [BC85], but this chapter emphasises types of information that can be obtained from slices, not how they are computed.
(1) Initialise all relevant sets to ∅
(2) Insert variable v into node n’s set relevant(n)
(3) For m, n’s immediate predecessor, assign relevant(m) the value
    relevant(m) = (relevant(n) − DEF(m)) ∪ (REF(m) if relevant(n) ∩ DEF(m) ≠ ∅)
    (see definition 3.2.4 for definitions of DEF and REF)
(4) Working backwards, repeat step (3) for m’s immediate predecessors until n0 is reached

FIGURE 3.3.2. Process for intra-procedural slicing without considering structured control flow (where n represents the slicing criterion node on the CFG and v represents the criterion variable)
Line   Statement    REF    DEF    relevant
1      x=1;         {}     {x}    {}
2      y=x;         {x}    {y}    {x}
3      z=x+5;       {x}    {z}    {y}
4      x=y-4;       {y}    {x}    {y}
5      y=x+4;       {x}    {y}    {x}
6      z=x+2;       {x}    {z}    {y}
7      return y;    {y}    {}     {y}
FIGURE 3.3.3. Computing data flow on a CFG where the criterion is <7, y> (statements belong to a slice if the intersection of their DEF set with the relevant set of the following statement is not empty)

The first step computes the data-flow through the program and the second step uses the results of the data-flow computation to identify statements belonging to the slice. Data-flow is computed by iterating through the CFG, determining for each node the set of relevant variables (relevant(n)) that may (transitively) affect the computation of the variable at the slicing criterion. To give an idea of how the approach works, figure 3.3.2 contains a four-step algorithm for intra-procedural slicing without compensating for structured control flow (this is covered later), taken from Binkley and Gallagher’s survey [BG96]. Figure 3.3.3 applies the algorithm in figure 3.3.2 to a simple program without control structures. Arrows between the nodes represent the flow of control. Nodes are structured as follows: line number, program statement, REF(n), DEF(n) and relevant(n). To illustrate the algorithm: relevant(6) = (relevant(7) − DEF(6)) ∪ (REF(6) if relevant(7) ∩ DEF(6) ≠ ∅). So, relevant(6) = ({y} − {z}) ∪ ({x} if {y} ∩ {z} ≠ ∅) = {y}.
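To make the propagation in figure 3.3.2 concrete, the following sketch (a hypothetical illustration written for this discussion, not the tool developed later in this thesis) applies the same relevant-set rule to a straight-line program such as the one in figure 3.3.3. The Statement and RelevantSets names are invented, and the DEF and REF sets are assumed to be precomputed.

    import java.util.*;

    class Statement {
        final Set<String> def;
        final Set<String> ref;
        Statement(Set<String> def, Set<String> ref) { this.def = def; this.ref = ref; }
    }

    public class RelevantSets {
        // Returns the (0-based) indices of statements in the backward slice for
        // the criterion <criterionIndex, criterionVars>, for a program without control flow.
        static Set<Integer> slice(List<Statement> prog, int criterionIndex, Set<String> criterionVars) {
            Set<Integer> inSlice = new HashSet<>();
            inSlice.add(criterionIndex);
            Set<String> relevant = new HashSet<>(criterionVars);   // relevant set at the criterion node
            for (int m = criterionIndex - 1; m >= 0; m--) {         // walk backwards over predecessors
                Statement s = prog.get(m);
                boolean defines = !Collections.disjoint(relevant, s.def);
                if (defines) inSlice.add(m);                        // DEF(m) intersects relevant(successor)
                Set<String> next = new HashSet<>(relevant);
                next.removeAll(s.def);                              // relevant(n) - DEF(m)
                if (defines) next.addAll(s.ref);                    // ... ∪ REF(m) when the intersection is non-empty
                relevant = next;
            }
            return inSlice;
        }

        public static void main(String[] args) {
            // The program of figure 3.3.3: 1:x=1; 2:y=x; 3:z=x+5; 4:x=y-4; 5:y=x+4; 6:z=x+2; 7:return y;
            List<Statement> prog = List.of(
                new Statement(Set.of("x"), Set.of()),
                new Statement(Set.of("y"), Set.of("x")),
                new Statement(Set.of("z"), Set.of("x")),
                new Statement(Set.of("x"), Set.of("y")),
                new Statement(Set.of("y"), Set.of("x")),
                new Statement(Set.of("z"), Set.of("x")),
                new Statement(Set.of(), Set.of("y")));
            System.out.println(slice(prog, 6, Set.of("y")));        // statements 1, 2, 4, 5 and 7 (as 0-based indices)
        }
    }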
Line   Statement      REF    DEF    control   relevant
1      x=1;           {}     {x}    {}        {}
2      y=x;           {x}    {y}    {}        {x}
3      z=x+5;         {x}    {z}    {}        {y}
4      x=y-4;         {y}    {x}    {}        {y}
5      y=x+4;         {x}    {y}    {}        {x}
6      if(x>4){       {x}    {}     {}        {y}
7      z=x+2;         {x}    {z}    {6}       {y}
8      y=y+1;}        {y}    {y}    {6}       {y}
9      return y;      {y}    {}     {}        {y}

FIGURE 3.3.4. Computing data and control flow on a structured CFG where the criterion is <9, y>
FIGURE 3.3.7. Example of a PDG corresponding to the code from figure 3.3.4; thick edges are control and dashed edges are data dependences

The steps involved in constructing a PDG are listed in figure 3.3.6. Ferrante et al. [FOW87] provide a more detailed account, including algorithms for computing control and data dependences from the CFG. Cytron et al. [CFRW91] have since produced an algorithm that is particularly efficient at computing control dependences. Figure 3.3.7 shows an example of a PDG representing the program from figure 3.3.4. Thick and dotted edges represent control and data dependences respectively. Slicing the PDG is simple. Given that the slicing criterion is a vertex on the PDG, a slice is obtained by traversing all of the incoming control and data dependencies back to the entry node. Any nodes that belong to the traversal belong to the slice. To illustrate the slicing process, let the slicing criterion be the same as in figure 3.3.7 (line 9, variable y). The corresponding node in the PDG is marked with a thick boundary. By simply tracing backwards along all incoming dependencies, we end up with the same slice as we do when we use the data-flow based approach (lines 1, 2, 4, 5, 6, 8 and 9).

3.3.3.2. The System Dependence Graph. The PDG can only represent single procedures. To enable inter-procedural slicing, Horwitz et al. [HRB90] present an augmented PDG called the System Dependence Graph (SDG). This is used to model a system consisting of a single main procedure that calls multiple auxiliary procedures. Each procedure is modelled by an extended PDG. An example is shown in figure 3.3.8. The PDG is extended as follows:
• An additional vertex type is added to mark call-sites (invocations of procedures or functions).
• Call-site vertices are connected to corresponding entry-vertices via call edges.
• Parameter passing is modelled as follows (this model assumes pass-by-value [ASU86]):
  – Values being passed from the caller are represented as actual-in vertices connected to the call-site
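Returning to the intra-procedural case described above (before the SDG extension), the backward traversal that yields a PDG slice is essentially a reverse reachability computation. The following sketch is a hypothetical illustration, assuming the PDG is stored as a map from each vertex to the sources of its incoming control and data dependence edges; none of the names below come from the tool described later in this thesis.

    import java.util.*;

    public class PdgSlicer {
        // deps.get(v) holds the sources of v's incoming control and data dependence edges.
        static Set<Integer> backwardSlice(Map<Integer, Set<Integer>> deps, int criterion) {
            Set<Integer> slice = new HashSet<>();
            Deque<Integer> worklist = new ArrayDeque<>();
            worklist.push(criterion);
            while (!worklist.isEmpty()) {
                int v = worklist.pop();
                if (slice.add(v)) {                               // visit each vertex once
                    for (int src : deps.getOrDefault(v, Set.of())) {
                        worklist.push(src);                       // follow incoming edges backwards
                    }
                }
            }
            return slice;                                         // every vertex reached belongs to the slice
        }
    }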
[Figure 3.3.8: an example system consisting of a Main procedure that calls procedures A, Add and Inc; the listing is not fully recoverable here.]

[...] , 8 > (figure (b)) are highlighted. Using this approach to construct a DDG, the size of the graph is relative to the length of the execution history, which is potentially infinite (as is the case in figure 3.4.1 (a)). Although the execution trace itself may be infinite, there can only be a finite number of dynamic slices.
[Figure 3.4.1: the example program used in this section (1: procedure Main, 2: read(x), 3: read(y), 4: i=0, 5: while(i< ...); the remainder of the listing is not recoverable here.]

A conditioned slicing criterion is of the form C = (V'in, F, p, V), where Vin is the set of input variables of a program P, V'in ⊆ Vin, F is a first-order logic formula on the variables in V'in, p is a statement in P and V is a subset of the variables in P.
To illustrate this approach, we use the source code in figure 3.4.1. Below are two conditioned slice criteria:
(1) C = (<x, y>, F, 8, {x}) where F = (∃i • 20 ≤ i ≤ 30, x < i, y ≥ 2i)
(2) C = (<y>, F, 8, {x}) where F = (y = 0)
Criterion (1) will produce the slice that is highlighted on the left in figure 3.4.1 and criterion (2) will produce the slice that is highlighted on the right. Fox et al. propose an extension to conventional conditioned slicing called backward conditioning [FHHD01]. Canfora et al.’s (forward) conditioning approach answers the question: “What happens if the program starts in a state satisfying condition c?”. The backward conditioning approach answers the question: “What parts of the program could potentially lead to the program arriving in a state satisfying condition c?”. The difference is analogous to the difference between forward and backward slicing. Fox et al. [FDHH04] have developed a conditioned slicer by the name of CONSIT that can compute forward-conditioned slices. The slicer works on a subset of the C language. It accepts as input a program to be sliced, which is annotated with an assert statement at the start of the program representing the path conditions and a further statement representing the slice criterion. Fox et al. also draw an important distinction between two forms of conditioned program that can form the basis for a conditioned slice: the ‘Control Dependence Sensitive’ and the ‘Control Dependence Insensitive’ conditioned program. The following definitions and the example in figure 3.5.4 are based on those provided in [FDHH04].

DEFINITION 3.5.5. A Control Dependence Sensitive (CDS) conditioned program is constructed with respect to a condition, π. A statement may be removed from the original program P to form a conditioned program S(P) iff it cannot be executed when the initial state satisfies π.

DEFINITION 3.5.6. A Control Dependence Insensitive (CDI) conditioned program is constructed with respect to a condition, π. A statement may be removed from the original program P to form a conditioned program S(P) iff, when S(P) and P are executed in an initial state satisfying π, the values of all variables at all points in S(P) agree with their values at corresponding points in P.
Original program:   if(a>b) x=1; else x=2;
CDS w.r.t. a>b:     if(a>b) x=1;
CDI w.r.t. a>b:     x=1;
FIGURE 3.5.4. Difference between CDS and CDI conditioning

Field et al. [FRT95] independently propose parametric slicing as a means of computing constrained slices, which are very similar to conditioned slices. As is the case with quasi-static slicing and conditioned slicing, some of the inputs to the program may be fixed and some may vary. A fully constrained slice (where all inputs are given a value) is equivalent to a dynamic slice, whereas a completely unconstrained slice is equivalent to a static slice. It should be noted that Field et al. also provide an argument against using partial evaluation to compute quasi-static slices, as was proposed by De Lucia et al. [LFM96]. Their reason is that some of the optimisations involved in partial evaluation can remove certain statements (such as predicates that evaluate to false) that should, according to Field et al., remain in a slice. The difference between Field et al.’s notion of a correct slice and the one computed by partial evaluation is reminiscent of the difference between CDS and CDI programs illustrated in figure 3.5.4.
3.6. Feature Identification and Analysis There exist a number of code extraction approaches that do not involve code slicing. The approaches that are covered in this section do not use slicing, but still use program analysis as a means for identifying source code that is relevant to a particular feature. As with slicing, these can be categorised according to their use of either static or dynamic program information.

3.6.1. Dynamic Approaches. Wilde and Scully [WS95] introduced an approach called software reconnaissance, which is designed to identify the source code that is essential for the execution of a given feature (a feature is a realised functional requirement). They identify a set of test cases that exercise the feature of interest and another set of test cases that will definitely not invoke the feature. The program is executed twice, using both sets of test cases. By comparing the traces from both sets of test cases it is possible to locate the source code that is responsible for the feature of interest. Bojic and Velasevic [BV00] propose an approach for using run-time information to reverse-engineer use cases. Their approach relates a set of test cases (that collectively correspond to a use case) to a set of system code entities. This information can be used to construct a formal context, which can in turn be used as a base for formal concept analysis. This produces
a “concept lattice”, which indicates subsumption relationships between code entities. If a code entity is ranked higher in the lattice, it is shown to represent a broader element of functionality that makes use of smaller, more focussed concepts that are ranked lower in the hierarchy. Eisenbarth et al. [EKS03] propose a similar approach to that of Bojic et al. for procedural programs. Their approach uses dynamic analysis and formal concept analysis to represent the relationships between features and components (a computational unit of a system that consists of an interface and a collection of subprograms). Again, their approach uses a selection of test cases to exercise the features that are of interest and then uses formal concept analysis to relate these features to the components that implement them. Greevy and Ducasse [GD05, GDG05] propose an approach that maps features to classes in the system. This provides information about the capacity in which individual classes contribute to the execution of a feature. They exercise features by collecting multiple traces, as is done by Bojic et al. and Eisenbarth et al. A set of these feature traces is referred to as a feature model. For a given feature model, classes belonging to the traces can be categorised as follows:
• Not Covered: A class does not participate in any of the feature traces of the feature model.
• Single-Feature: A class participates in only one feature trace.
• Group-Feature: A class participates in less than half of the feature traces.
• Infrastructural: A class participates in over half of the feature traces.

3.6.2. Static Approaches. Chen and Rajlich [CR00] use the system dependence graph (see section 3.3.3.2) to identify components that contribute to concepts (which are referred to as features by Eisenbarth et al.). The rationale for this approach is that the dependence graph points out dependencies that would perhaps not be obvious during an examination of the source code. The downside, as with most other static approaches, is that the search space can be too large because of its inherent conservatism. Di Lucca et al. [LFC00] propose an approach that is based on the premise that a scenario starts with a system-level input and ends with a system-level output. They represent the message sequences in the form of a Method-Message Graph (MMG), which was devised by Jorgensen and Erickson [JE94]. ‘Threads’ of message invocations are extracted from the graph and collated to form use cases.
Qin et al. [QZZ+ 03] propose an approach based on constructing a call graph-based abstract representation of the subject program called the Branch-Reserving Call Graph (BRCG). This represents calls between methods and retains control dependence information, so that predicate statements that control the execution of a given procedure call are integrated. Because no prior use case information is used and the approach is static, it returns all possible execution scenarios of the system. This can be alleviated by pruning nodes using a graph-based ‘importance metric’ proposed by the authors. Tonella and Potrich [TP03] provide a reverse-engineering approach for interaction diagrams from C++ code. Acknowledging that a purely static approach is over-conservative, they use two mechanisms called partial analysis and focussing to ensure that the average size of a graph is small enough to be of use. They validate their approach by applying it to a substantial real-world project.

3.7. Conclusions This chapter provides an overview of slicing and other source code extraction techniques that are relevant to the following chapters. It shows that there exists a large number of techniques for code extraction. The challenge lies in choosing the correct technique (or combination of techniques) to suit the application. The two most popular techniques for slice computation are by data-flow analysis and by traversal of a graph that represents control and data dependencies between program elements. The choice between the two approaches is a question of the level of efficiency and accuracy that is required. Dependence graphs are more expensive to construct, but are more efficient to slice. If the slice is inter-procedural, they also allow for context-sensitive slices (slices that take into account the calling context of a function that is being sliced), which are more accurate than conventional data-flow slices. As well as the underlying program analysis used for its computation, the precision of a slice is also dependent on the amount of information available about the runtime state of the program. Although there is an inevitable compromise in terms of soundness, the precision of a slice can be substantially increased if the slicer is supplied with dynamic program information. Whether or not this information is used depends on its availability (e.g. the program must be executable) and whether the lack of soundness is tolerable for the application. Once the representation has been chosen, an appropriate slicing approach must be implemented. This depends on the slicing application. If the program can be executed and the
user only wants to consider a slice with respect to one test case, dynamic slicing is the obvious choice. If it cannot be executed, but the user still wants to consider a single execution, there exists a selection of techniques that allows for the inclusion of (speculative) dynamic information. A major challenge lies in choosing the correct slicing approach to suit the problem domain. The question that needs to be answered is “what information will this slicing approach return, and is it appropriate for my problem?”. Once a slicing approach is chosen, the problem is re-cast so that the solution merely requires the correct slicing criteria.
CHAPTER 4
Using Landmark Methods to Partition Object-Oriented Source Code

4.1. Introduction
Chapter two introduced the problems that arise when inspecting object-oriented code. Delocalisation and the unpredictability of run-time interactions make it virtually impossible to understand and inspect an object-oriented system because it is very difficult for the inspector to predict how elements in the system will interact at run-time. It was concluded that because of the inherent complexity of object-oriented source code, code inspections will need to rely on tool-support if they are to become an efficient fault detection option in the software engineering industry. Slicing has been identified as a tool-supported means of addressing the problem of delocalisation. Slices can be used to establish control and data relationships between statements. Various slicing approaches exist, each of which presents a different output depending on the information provided in the slicing criterion. An overview of these approaches is provided in chapter three. As mentioned in the conclusions of chapter two, inspections are restricted to a static view of the source code. Although static slices can be used to illustrate the delocalised statements that are related to each other, they tend to be too large to be practical for inspections. Slices on object-oriented source code are likely to be particularly large because of the inherent unpredictability of object-oriented systems. Slicing alone does not serve as the basis for a practical software inspection tool. A technique is required that provides information that is more focussed than conventional slicing, allowing the inspector to be more selective about the range of program executions to be taken into consideration. This chapter presents the main contribution of this thesis: an approach for partitioning object-oriented source code into segments that can be inspected individually, using slicing to connect delocalised code elements and introducing the notion of landmark methods to address the
problem of unpredictability. The result is a call graph that corresponds to an element of user-defined functionality, as defined by the landmark methods. This can be used to guide the inspector along calls that may be invoked when the feature executes.
4.2. Truncating the Call Graph with Landmark Methods To determine whether the system will behave correctly when it executes, the inspector has to mentally execute the source code from a static perspective. This involves establishing the methods that call each other when a particular use-case is executed. The inspector has to simultaneously chunk (understand and mentally record sections of code) and trace (scan through the code in order to identify further relevant chunks) [CJHS95]. When there are multiple execution paths that can be taken at runtime (this is especially the case in object-oriented systems), the inspector is burdened with the task of determining which paths warrant inspection and which ones (if any) do not. The reasons that make this task so challenging in object-oriented systems are introduced in section 2.4. Call graphs illustrate the calls that may occur between methods when the system is executed. Nodes in the graph represent methods and edges between the nodes represent calls. Because object-oriented systems are to a large extent non-deterministic, their call graphs tend to have a large edge-to-node ratio, because there are usually multiple possible destinations for a given call. The definition of a call graph that will be used for the rest of this chapter is presented below (it is based on Grove et al.’s context-insensitive call graph definition [GDDC97]).

DEFINITION 4.2.1. A context-insensitive1 object-oriented call graph C = <N, E, n0> is a flow graph (definition 3.2.3) that represents the call relationships between methods. The node set N represents the set of methods (every method is represented by a single node). Each node n ∈ N has an indexed set of call sites, where a call site is the source of zero or more (call) edges e ∈ E to other nodes. A call site simply represents the statement belonging to the method that makes a call. The node n0 is the entry node to the call graph. This method is initially executed and is the source of all paths of execution through the system.

Faced with the task of determining how delocalised artifacts interact in a complex object-oriented system, the call graph is a useful tool. It makes explicit the set of methods that may

1A context-insensitive call graph is used because it is cheaper to construct and store than a context-sensitive one. A
context-insensitive call graph contains only a single node per method. Context-sensitive call graphs contain multiple nodes to represent each method, where each node represents a different calling context [GDDC97]. For a non-trivial program this results in a very significant overhead in terms of the amount of time and space required for its computation [WL04].
be the target of a given call. By isolating the edges that are particularly relevant to the use-case under inspection, the mental overhead involved in tracing through the source code is minimised, allowing the inspector to devote more effort to the task of understanding the source code. Analysing the call graph from a static perspective is difficult because of the large edge-to-node ratio; every call that may be executed at run-time is included. There are usually many such edges because of the inherent unpredictability of object-oriented programs (see section 2.4). The challenge is to select only those edges that are relevant to the inspector, i.e. those edges that may be executed as part of the use-case under inspection.
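As a minimal sketch of the structure in definition 4.2.1 (the class and method names below are invented for illustration and are not those of the tool described in section 4.3), each method node can carry an indexed list of call sites, and each call site a set of possible target methods:

    import java.util.*;

    public class CallGraph {
        // A call site is a statement in a method that may invoke one of several targets
        // (several, because of polymorphism and dynamic binding).
        record CallSite(int statementIndex, Set<String> possibleTargets) {}

        record MethodNode(String signature, List<CallSite> callSites) {}

        private final Map<String, MethodNode> nodes = new HashMap<>();
        private final String entryMethod;                     // n0: the source of all execution paths

        CallGraph(String entryMethod) { this.entryMethod = entryMethod; }

        void addMethod(MethodNode node) { nodes.put(node.signature(), node); }

        MethodNode entry() { return nodes.get(entryMethod); }

        // The (context-insensitive) call edges leaving a method: one per call site per possible target.
        Set<String> callees(String signature) {
            Set<String> out = new HashSet<>();
            for (CallSite site : nodes.get(signature).callSites()) {
                out.addAll(site.possibleTargets());
            }
            return out;
        }
    }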
4.2.1. The Rationale for Landmark Methods. As presented in chapter three, various slicing techniques have been developed with the aim of reducing the size of conventional (static) slices by taking into account information that is collected while the program executes (dynamic information). This can substantially reduce the size of a slice because the analysis is no longer forced to be conservative (Binkley and Harman’s slicing survey [BH03] provides an overview of empirical results to support this). There are however three properties of dynamic analysis that make it unsuitable for use in code inspections:
(1) Analysis is restricted to what is executed: The program must be executed in such a way that it accurately represents the subject use-case. This is particularly challenging if the use-case has multiple variations, because each one must be captured with the correct input to the program. A set of executions that do not accurately represent the use-case can result in producing an incomplete analysis.
(2) Requires a large time and space overhead: The process of collecting run-time data as a program is executed requires a lot of storage space. There is also a substantial temporal overhead that is incurred by logging every run-time transaction (e.g. logging the statements that are executed or method calls).
(3) Requires an (at least partially) executable program: Dynamic information can only be collected from a program if it is executable. Frameworks are a good example of code that is not executable without an application (that may not be available at the time of an inspection).
Although dynamic information can increase the precision of an analysis, the executions that are used to collect it must be representative. The process of collecting and processing dynamic information is expensive and the program must be executable in the first place. A technique
is required that can address the above points whilst still providing information about a set of program executions (corresponding to a use case) that will reduce the conservatism of a purely static approach. From the overview of slicing techniques in chapter 3, it becomes apparent that there are two key approaches that are used to focus the contents of a slice (thus reducing their size): (a) computing the slice with respect to dynamic information (dynamic slicing, call-mark slicing, conditioned slicing etc.) and (b) allowing for more expressive slicing criteria (chopping, barrier slicing etc.). All of these approaches however still require that this information is specified at a low level; ultimately slicing criteria need to contain some combination of statements, relevant variables and (depending on the type of slice) dynamic traces or breakpoints. The analysis technique presented in this chapter aims to provide a slice-based code extraction approach that leverages the two benefits of incorporating information about program execution (without requiring a full trace) and allowing for more expressive criteria than simply a program point and set of variables. It aims to achieve this at a sufficiently high level of abstraction so as to make it practical for software inspections. Ideally the inspector should be able to specify run-time information about the program with respect to a use-case, without having to actually execute the program and without having to reason about the use-case that is being inspected at statement level. Dynamic information is supplied in terms of (multiple) landmark methods, which are methods that are necessary for the execution of a particular use-case. Because the program does not need to be executed there is a greater degree of flexibility. If the inspector wants to consider multiple scenarios (variants) of a use-case, landmark methods that are specific only to a single use-case variation can be omitted. Conversely, if the inspector is interested in a single variant of the use-case, methods that are specific to its execution can be included to restrict the results (this is illustrated as part of the evaluation in the NanoXML case study in section 5.3.2).

4.2.2. The Technique. An overview of the technique is illustrated in figure 4.2.1. The first step consists of identifying methods that contribute to the use-case that is under inspection. These methods can be identified from the specification, from other system documentation or from developer knowledge of the system. The nodes that correspond to these methods are highlighted in the call graph. Edges that belong to direct paths between landmark methods are highlighted (this process is elaborated in the following section). Finally, the code that is identified in the direct paths is sliced to identify further call graph edges that may be relevant.
(1) Trace landmark methods from specification to call graph
(2) Identify direct paths between traced methods:
    (a) Mark methods traced from the specification on the call graph
    (b) Induce hammock graphs on the call graph between every pair of traced methods
(3) Identify paths that can influence and be influenced by the paths in the hammock graphs:
    (a) Identify call statements for every edge in the hammock graphs
    (b) Generate intra-procedural slices, using call statements as slicing criteria
    (c) Mark all calls belonging to the slices
    (d) Follow all paths in the call graph originating from the marked call sites
FIGURE 4.2.1. Process of extracting code relevant to a particular aspect of system functionality

4.2.3. Obtaining Direct Paths in the Call Graph Between Landmark Methods. Direct paths are obtained between landmark methods by extracting a subgraph that takes the form of a special kind of flow graph (definition 3.2.3), known as a hammock graph. Hammock graphs were introduced (in the context of program analysis) by Kasyanov [Kas75, KE00] as a means of analysing and manipulating control flow graphs (definition 3.2.4). In this thesis Kasyanov’s notion of a hammock graph remains the same, but hammock graphs are referred to in the context of call graphs as opposed to control flow graphs.

DEFINITION 4.2.2. A hammock graph is a flow graph H = <N, E, n0, ne>, where n0 ∈ N is the entry node, ne ∈ N is the exit node and Succ(n0, H) ≡ Pre(ne, H) (see definitions 3.2.1 and 3.2.3 for definitions of a flow graph, Succ and Pre).

DEFINITION 4.2.3. A vertex-induced subgraph G′ = <N′, E′> of a graph G = <N, E> is a graph where N′ ⊆ N and E′ ⊆ E, and E′ contains all of the edges in E that connect the vertices in N′.

As stated above, direct paths between two landmark methods in the call graph are obtained by inducing a subgraph from the call graph. This subgraph takes the form of a hammock graph where the landmark methods are the entry and exit nodes. The formal definition is as follows:

DEFINITION 4.2.4. A graph containing the direct paths in a call graph C between two landmark methods a and b is a hammock graph D(C, a, b) = <N′, E′, a, b>, which is a vertex-induced subgraph of C where N′ = Succ(a, C) ∩ Pre(b, C). Note that D(C, a, b) ≠ D(C, b, a) because edges are directed.

An example of how to induce a hammock graph on a call graph is provided in figure 4.2.2. Here the nodes {a, ..., s} = N represent the vertices of the entire call graph, a = n0 is the
entry method. Before the hammock graphs are computed, the entry method to the call graph is added to the set of landmark methods, because it is always executed (all methods in the call graph are preceded by the entry point). In figure 4.2.2 the methods {a, q, i} = L represent the landmark methods. Figure (a) shows the entire call graph, with nodes a and q highlighted. Figure (b) shows the call graph obtained by inducing the hammock graph D(C, a, q). Figure (c) shows the hammock graphs that are obtained when we divide the graph from (b) by using node i as an additional landmark method. Because i succeeds a and precedes q on the call graph, we can split the graph D(C, a, q) into D(C, a, i) and D(C, i, q). Assuming we do not know what happens after the execution of method q, we have to add all of q’s successors to the list of calls to be inspected. This is shown in (d). This is what is used as the basis for computing the path dependencies (figure (e)), as described in the next section. It is possible to automate the computation of all of the possible hammock graphs between a set of landmark methods by simply inducing a subgraph between every combination of two landmark methods (i.e. in a pairwise manner). As an example, to automate the computation of the hammock graphs for the three landmark methods {a, q, i} in figure 4.2.2, the set of graphs to be computed is: D(C, a, q), D(C, a, i), D(C, q, a), D(C, q, i), D(C, i, a), D(C, i, q). Graphs D(C, q, a), D(C, q, i) and D(C, i, a) have empty edge sets and can be discarded (e.g. D(C, q, a) is empty because there are no directed edges that succeed q and precede a). D(C, a, q) can be discarded because it contains a landmark method and can be broken down further. This leaves the two graphs that have not been discarded and are presented in figure 4.2.2 (c), namely D(C, a, i) and D(C, i, q).

FIGURE 4.2.2. Trimming the call graph by landmarking, hammocking and slicing
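A minimal sketch of this induction, assuming a call graph stored as forward and reverse adjacency maps from each method to the methods it calls and is called by (the names below are invented for illustration and are not the tool’s actual classes):

    import java.util.*;

    public class HammockGraphs {
        // Nodes reachable from start by following the given adjacency relation (start included).
        static Set<String> reachable(String start, Map<String, Set<String>> adj) {
            Set<String> seen = new HashSet<>();
            Deque<String> work = new ArrayDeque<>(List.of(start));
            while (!work.isEmpty()) {
                String n = work.pop();
                if (seen.add(n)) work.addAll(adj.getOrDefault(n, Set.of()));
            }
            return seen;
        }

        // Node set of the hammock graph D(C, a, b): methods that succeed a and precede b
        // (definition 4.2.4). The edges are those of the call graph restricted to this set,
        // and the entry a and exit b are themselves included whenever a path from a to b exists.
        static Set<String> hammockNodes(String a, String b,
                                        Map<String, Set<String>> callees,   // forward edges
                                        Map<String, Set<String>> callers) { // reverse edges
            Set<String> nodes = new HashSet<>(reachable(a, callees));
            nodes.retainAll(reachable(b, callers));
            return nodes;
        }
    }

The pairwise enumeration over all ordered pairs of landmark methods then simply calls hammockNodes for each pair, discarding graphs whose node set induces no edges and graphs that contain a further landmark method and can therefore be broken down further.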
FIGURE 4.2.3. Call graph for the entire Hotel System (landmark methods are marked)
4.2.4. Computing Dependencies of Direct Paths. The edges in the hammock graphs identify the calls in the call graph that directly link landmark methods. Inspecting paths that only directly link landmark methods is not sufficient for a thorough inspection. Some calls that branch from direct paths may still be executed and affect the behaviour of methods belonging to the direct paths. The challenge lies in isolating only those branches that are relevant.
The problem is illustrated with a small example. The Hotel System is a small Java application that has been used in an undergraduate programming course at the University of Strathclyde. The system consists of 13 classes with a total of 102 methods. A class diagram of the system is provided in appendix B. It operates by providing the user with a selection of common hotel management tasks, such as checking a customer in, booking a room etc. The go method in the HotelUI class calls the appropriate interface method (also in the HotelUI class), which sets off the chain of calls through the system that corresponds to the user’s choice. To provide an idea of the scale of the system in terms of the number of methods and connecting calls, the complete call graph is shown in figure 4.2.3.
[Figure 4.2.4: the call graph for this use case, showing the Hotel System methods involved (HotelSystem.main, HotelUI.go, HotelUI.checkinReserved, Hotel.checkinReserved, Room.occupy, Room.isReserved, Room.isSingle, Room.cancelReserve, HotelUI.getHotelDate, HotelDate.daysBetween, HotelDate.equals, FunctionDate.equals and the constructors of Hotel, HotelUI, Room, Reservation, Bill and HotelDate). The legend distinguishes landmark methods, edges identified in the hammock graphs and edges that are necessary for use case execution.]
FIGURE 4.2.4. Call graph for the use case that checks a guest in to a reserved room

Figure 4.2.4 shows the calls that are involved in the use-case that checks a guest into a reserved room. Assuming that Hotel.checkinReserved and Room.occupy are chosen as landmark methods (remembering that the entry method to the call graph, in this case the main method, is a landmark method by default), the hammock graphs that are induced on the call graph using the technique described in the previous section are D(C, HotelSystem.main, Hotel.checkinReserved) and D(C, Hotel.checkinReserved, Room.occupy), where C is the call graph of the entire system in figure 4.2.3. Edges that would be identified as a result of computing hammock graphs (steps (a) to (d) in figure 4.2.2) are marked bold in figure 4.2.4. It becomes apparent that considering hammock graphs alone misses out some potentially important calls (marked with dotted lines), such as the constructors for the HotelUI and Hotel objects, which are referenced by several other methods in the use case. This section shows how these calls are identified. The method calls that belong to the hammock graphs only belong to direct paths between landmark methods. Calls that branch away from the direct paths may return values, which may in turn be used by subsequent methods belonging to the direct paths, affecting their behaviour. These branching calls can be identified by tactically slicing the methods that have been identified in the hammock graphs. Slicing was introduced in detail in chapter 3. A backward static slice (the slicing technique used in this approach) on a slicing criterion (definition 3.2.2) returns the set of statements that can influence the value of the variable(s) in the criterion. Branch calls that may influence the behaviour of the methods that have been identified in a hammock graph can be identified by slicing these methods, using the call sites that spawn calls in the hammock graph as slicing criteria, where call site arguments are the criterion variables. Figure 4.2.5 shows the two hammock graphs from the example in figure 4.2.4. The source code for these methods is provided in figures 4.2.6 and 4.2.7.
D(C, HotelSystem.main, Hotel.checkinReserved):
    HotelSystem.main(String[]) → HotelUI.go() → HotelUI.checkinReserved() → Hotel.checkinReserved(String,int,int,HotelDate,HotelDate,boolean)

D(C, Hotel.checkinReserved, Room.occupy):
    Hotel.checkinReserved(String,int,int,HotelDate,HotelDate,boolean) → Room.occupy(String,HotelDate,HotelDate)
FIGURE 4.2.5. Hammock Graphs from figure 4.2.4

The calls belonging to the hammock graphs in figure 4.2.5 are used as a basis for detecting the branch points for the other relevant calls (identified by the dotted lines in figure 4.2.4). For each edge in the hammock graphs, the corresponding call statements (often referred to as call sites) are identified. These are used as slicing criteria, where call arguments are used as criterion variables. The variable representing the object containing the called method is also added to the slicing criterion2. In figures 4.2.6 and 4.2.7 the slicing criteria that are generated by the hammock graphs are underlined. Statements belonging to the slices are highlighted in colour. Call sites that belong to the slices and do not already belong to the hammock graphs are marked (shown in red). These calls are significant because (a) they may be executed at runtime and (b) if they are executed, they can influence the execution of methods belonging to the hammock graphs. All call graph edges that can be transitively reached from these marked call sites are added to the final graph that is to be inspected. Calls to library methods are not traversed because it is assumed that their source code is not of interest to the inspector. Once these marked call sites have been traversed to return all transitively reachable calls, the final graph is as shown in figure 4.2.4. The highlighted source code and call graph edges are useful for source code inspections, because they indicate relevant source code elements to the inspector. Statements belonging to a slice within a method indicate those statements that can affect the behaviour of the landmark methods selected by the inspector. In the go() method in figure 4.2.6, for example, the call site to checkinReserved() (which belongs to the hammock graph) is executed as part of a switch statement, which belongs to a while-loop.

2With respect to inter-procedural dependence graphs [HRB90] discussed in chapter 3, a slicing criterion for a call site contains all the variables that are represented by actual-in nodes.
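As a rough sketch of this step (the types below are invented for illustration; the actual tool operates on Soot’s intermediate representation, described in section 4.3), slicing criteria can be generated from a hammock edge by locating the call sites in the calling method whose possible targets include the called method, and taking the receiver together with the call arguments as the criterion variables:

    import java.util.*;

    public class CriterionGenerator {
        // Simplified view of a call site: the statement's index in its method,
        // the variables used as receiver and arguments, and the possible target methods.
        record CallSite(int statement, String receiver, List<String> arguments, Set<String> targets) {}

        record Criterion(int statement, Set<String> variables) {}

        // One criterion per call site in 'caller' that may invoke 'callee' (i.e. per hammock edge).
        static List<Criterion> criteriaForEdge(List<CallSite> caller, String callee) {
            List<Criterion> criteria = new ArrayList<>();
            for (CallSite site : caller) {
                if (site.targets().contains(callee)) {
                    Set<String> vars = new HashSet<>(site.arguments());
                    vars.add(site.receiver());            // the receiver object is part of the criterion
                    criteria.add(new Criterion(site.statement(), vars));
                }
            }
            return criteria;
        }
    }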
public static void main(String[] args) throws Exception {
    Hotel baytes = null;
    HotelUI baytesUI = null;
    int noSingles;
    int noDoubles;
    BufferedReader din = new BufferedReader(new InputStreamReader(System.in));
    System.out.println();
    System.out.println("Starting up ...");
    System.out.print("How many single rooms: ");
    noSingles = Integer.parseInt(din.readLine());
    System.out.print("How many double rooms: ");
    noDoubles = Integer.parseInt(din.readLine());
    System.out.println();
    baytes = new Hotel(noSingles, noDoubles);
    baytesUI = new HotelUI(baytes);
    baytesUI.go();
}

public void go() throws Exception {
    boolean go = true;
    int choice;
    BufferedReader din = new BufferedReader(new InputStreamReader(System.in));
    System.out.println("Welcome to Marco Baytes' Hotel");
    while (go) {
        System.out.println();
        System.out.println("Input number corresponding to choice: ");
        //...display choices 1 and 2
        System.out.println("3. Check in a guest (reservation)");
        //... display choices 3 - 10
        System.out.println("0. Exit");
        System.out.print("Choice: ");
        choice = Integer.parseInt(din.readLine());
        switch (choice) {
            //...
            case 3: checkinReserved(); break;
            //...
            case 0: go = false; break;
        }
        this.displayHotel();
    }
}

public void occupy(String name, HotelDate start, HotelDate stop) {
    int noDays = start.daysBetween(stop);
    rOccupiers.add(name);
    if (rType == SINGLE) {
        rBill = new Bill(SINGLECHG * noDays);
    } else {
        rBill = new Bill(DOUBLECHG * noDays);
    }
    rStart = start;
    rStop = stop;
    occupied = true;
}
FIGURE 4.2.6. Methods HotelSystem.main, HotelUI.go and Room.occupy
The slice is particularly useful in this instance because it picks out these predicate statements (i.e. while(go) and switch(choice)) and also identifies the statements that influence their outcome (i.e. the values of go and choice). For call sites belonging to the hammock graph, all possible destination methods are made explicit, saving the inspector from having to identify them by hand. In the case of the call to checkinReserved() there is only one possible destination. However, in the presence of polymorphic calls (where there are multiple possible destinations) the ability to automatically identify these destinations is particularly useful.
public void checkinReserved() throws Exception {
    String name = null;
    int noSingle = 0;
    int noDouble = 0;
    HotelDate start = null;
    HotelDate stop = null;
    boolean card = false;
    String s;
    BufferedReader din = new BufferedReader(new InputStreamReader(System.in));
    System.out.println("Check in with a reservation");
    System.out.print("Input name: ");
    name = din.readLine();
    System.out.print("Number of single rooms required: ");
    noSingle = Integer.parseInt(din.readLine());
    System.out.print("Number of double rooms required: ");
    noDouble = Integer.parseInt(din.readLine());
    start = new HotelDate();
    System.out.println("Input stop date for booking");
    stop = this.getHotelDate();
    System.out.print("Discount card [y/n]? ");
    s = din.readLine();
    if (s.equalsIgnoreCase("y")) { card = true; } else { card = false; }
    if (theHotel.checkinReserved(name, noSingle, noDouble, start, stop, card)) {
        System.out.println("Checked in against previous reservation");
    } else {
        System.out.print("Sorry - doesn't match any existing reservation");
    }
    System.out.println();
    System.out.println("Hit return to continue");
    din.readLine();
}

public boolean checkinReserved(String name, int noSingle, int noDouble, HotelDate start, HotelDate stop, boolean card) throws Exception {
    Vector guestRooms = new Vector();
    Enumeration e = hRooms.elements();
    Room r;
    Guest hGuest;
    int singleCount = 0;
    int doubleCount = 0;
    while (e.hasMoreElements()) {
        r = (Room) e.nextElement();
        if (r.isReserved(name, start, stop)) {
            if (r.isSingle()) { singleCount++; } else { doubleCount++; }
            guestRooms.add(r);
        }
    }
    if (guestRooms.isEmpty() | (singleCount != noSingle) | (doubleCount != noDouble)) {
        return false;
    } else {
        hGuest = new Guest(name, guestRooms, card, start, stop);
        hGuests.add(hGuest);
        e = guestRooms.elements();
        while (e.hasMoreElements()) {
            r = (Room) e.nextElement();
            r.occupy(name, start, stop);
            if (!r.cancelReserve(name, start, stop)) {
                throw new Exception("Error in occupy against reservation");
            }
        }
        return true;
    }
}
FIGURE 4.2.7. Methods HotelUI.checkinReserved and Hotel.checkinReserved
4.3. Implementation
A tool has been developed as a proof of concept for the technique described in the previous sections. It analyses programs that are written in Java. The user can navigate the call graph from a graphical or textual perspective and trim it by specifying landmark methods.
[Figure 4.3.1 — Controller: file loader, landmark method selection, call graph browser. Model: Soot intermediate representation, method dependence graphs, hammock graph / slice computation. View: source code representation of call graph, graphical representation of call graph.]
FIGURE 4.3.1. Model View Controller implementation (arcs represent flow of data)

This section provides an overview of the architecture of the tool and provides insight into some of the static analysis techniques that were chosen. The system adheres to the Model View Controller (MVC) architectural pattern [SG96]. In this pattern the system modules are classed either as modelling the data that is manipulated by the application, visualising the model, or controlling the manipulation of the model. The observer design pattern [GHJV99] is used to notify the view modules of any changes to the model. An overview of the model is shown in figure 4.3.1.

4.3.1. Model. The Soot byte code analysis framework [VCG+ 99] is used as a basis for analysing the system. It provides a collection of intermediate representations that can readily be used as a basis for constructing Java analysis tools. It also contains a selection of analyses that are already implemented. The computation of the call graph and the generation of method dependence graphs is extremely time consuming and can take several hours for a large system. The computation of hammock graphs and slices can be carried out interactively, requiring a relatively small amount of time.

4.3.1.1. Call Graph Computation. A particularly useful feature of Soot is that it implements three popular call graph construction algorithms (Class Hierarchy Analysis (CHA), Rapid Type Analysis (RTA) and Variable Type Analysis (VTA)). These algorithms can be changed without
affecting the functionality of the rest of the tool3. A comprehensive overview of the algorithms is provided by Sundaresan et al. [SHR+ 00]; a summary is presented below:
• Class Hierarchy Analysis (CHA): A standard method used to construct call graphs by using the class hierarchy to estimate the run-time types of objects that receive message calls. This produces the most conservative call graph but also takes the least time to compute and is used as a basis for the following two approaches.
• Rapid Type Analysis (RTA): Proposed by Bacon and Sweeney [BS96], RTA is a post-processing approach for a call graph that has been constructed using CHA. It is based on the observation that the receiver object of a message has to have been instantiated at some point (via a call to the object’s constructor). By collecting the set of types that have been instantiated in the program and analysing them with respect to the class hierarchy, many of the false positives produced by CHA can be removed.
• Variable Type Analysis (VTA): Proposed by Sundaresan et al. [SHR+ 00], VTA is a refinement of RTA. In addition to assuming that the receiver object of a message has to be instantiated, Sundaresan et al. also impose the restriction that there must be some chain of assignments (computed using inter-procedural control flow analysis) between the call to the object’s constructor and any calls to its methods.
Sundaresan et al. [SHR+ 00] present an empirical study that compares their VTA call graph construction approach to RTA and CHA. CHA produces the basic call graph, whereas RTA and VTA are post-processing techniques. In their results both RTA and VTA substantially reduce the number of nodes and edges in the call graph. In Soot both RTA and VTA analyses tend to take a substantial amount of time to compute (depending on the size of the program). Although the potential is there to plug different call graph construction approaches into the tool, CHA was chosen as a base approach because it is much less time consuming than the other two, facilitating experimentation with the tool. The call graph algorithms are however interchangeable and, besides the substantial time overhead, changing them will have no impact on the functionality of the rest of the program. Although post-processing techniques such as RTA and VTA substantially reduce the number of nodes in the call graph, there are still certain nodes and edges that cannot be eliminated. Although they eliminate many infeasible calls, certain (polymorphic) calls may be considered
3Any time and space overheads incurred by different call graph algorithms may affect its performance. 73
feasible, but also as irrelevant by the inspector. The landmarking technique aims to eliminate as many of these irrelevant paths as possible, regardless of the call graph construction algorithm.
4.3.1.2. Dependence Graph Construction. Once the call graph is constructed, it is used as a basis for constructing method dependence graphs. Every method belonging to the call graph is represented as a dependence graph (introduced in section 3.3.3). Soot provides a control flow graph and intra-procedural def-use relations [ASU86]. These are used as a basis for constructing control and data dependence edges in the dependence graph. It is important to note that the data flow is only intra-procedural. If, for example, a call r.occupy() (modifying r) is followed by a call to r.isOccupied() (referencing r), the def-use relationship between the two statements would not arise. A slice on the statement r.isOccupied() where the criterion variable is r would not highlight the statement r.occupy(), even though it modifies the state of r. This problem is addressed by using summary edges (see section 3.3.3). A summary edge between an actual-in parameter x and an actual-out parameter y would indicate that, in the called method, the value of y is affected by the value of x. It is the computation of these summary edges that is the most expensive part of Horwitz et al.'s slicing algorithm [HRB90, RHSR94]. A more efficient (but slightly less precise) approach that is used in constructing dependence graphs for this tool is to use two assumptions that tend to be true for object-oriented systems to provide a conservative estimate of summary edges. The two assumptions are as follows:
(1) If the called method is a mutator4, all of the arguments in the call site affect the change in state of the object receiving the call. In this case, summary edges should be added from each actual-in vertex representing arguments to the actual-out vertex representing the receiver object.
(2) If the called method returns a variable, all of the arguments in the call site affect the value of that variable. In this case, summary edges should be added from each actual-in vertex representing arguments to the actual-out vertex representing the returned variable.
This can result in a slice that is larger than it needs to be, because statements that affect the values of all arguments are included, even if not all of the arguments contribute to the mutation of the receiver object or influence the value of the variable returned by the called method. With respect to the landmark method approach, if a summary edge turns out to be spurious and results in a slice with a call site that is superfluous, it will present the inspector with a trail of calls that does not affect the behaviour of the rest of the use-case. This has however only been observed very infrequently.
4A method can be deemed to be a mutator if there exists a data dependence between a statement in the method body and the formal out vertex that represents a data member.
F IGURE 4.3.2. Graphical view of call graph
4.3.1.3. Hammock Graph / Slice Computation. The computation of hammock graphs and the slice computation closely follows the technique described in section 4.2. Given a selection of landmark methods, hammock graphs are induced on the call graph between the landmark methods by adopting the pairwise approach described in section 4.2.3. Slices are obtained by traversing the method dependence graphs, using call sites and actual-in vertices for edges belonging to the hammock graphs as slice criteria. Since these slices are not inter-procedural, statements belonging to a slice can be determined by a single backward traversal instead of the two-step inter-procedural approach proposed by Horwitz et al. [HRB90]. The call graph that is initially computed by Soot is not altered, but a temporary graph is used to present the results to the user (with respect to figure 4.3.1, the component that graphically represents the call graph is an observer of the model produced by the hammock graphing / slice computation component).
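To make the two summary-edge assumptions of section 4.3.1.2 concrete, consider the following self-contained Java sketch. The classes are hypothetical, loosely modelled on the Hotel System example used in the text; they are not taken from the analysed systems.

// Hypothetical illustration of the conservative summary-edge assumptions.
class Guest { }

class Room {
    private Guest occupant;
    void occupy(Guest g) { this.occupant = g; }        // mutator: writes a data member
    boolean isOccupied() { return occupant != null; }
}

class Bill {
    double total(double rate) { return rate * 2; }      // returns a value derived from its argument
}

public class SummaryEdgeExample {
    public static void main(String[] args) {
        Room r = new Room();
        Bill b = new Bill();
        r.occupy(new Guest());        // assumption (1): occupy() is a mutator, so a summary edge is
                                      // added from the actual-in vertex for the Guest argument to
                                      // the actual-out vertex representing the receiver r.
        double t = b.total(80.0);     // assumption (2): total() returns a value, so a summary edge is
                                      // added from the actual-in vertex for the rate argument to the
                                      // actual-out vertex representing the returned variable t.
        if (r.isOccupied()) {         // a slice on this statement with criterion variable r now reaches
            System.out.println(t);    // r.occupy(...) via the summary edge, compensating for the
        }                             // intra-procedural data-flow limitation described above.
    }
}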
F IGURE 4.3.3. Textual view of call graph
4.3.2. View. There are two views of the call graph. Figure 4.3.2 shows the graphical view and figure 4.3.3 shows the textual view. The graphical view is displayed using the JUNG framework [OFWB03]. Both views are observers of the temporary call graph that is computed by the hammock graph / slicing module and are dynamically updated as the model changes. Both views are browsable; the pane on the left contains the set of methods that can be reached from the method that is currently under inspection (this is also updated to reflect the model). If a call belongs to the graph that is induced by the selection of landmark methods, it is highlighted in yellow (this will be illustrated in section 4.4). Calls that do not belong to the reduced call graph are not removed from the list in the textual call graph, so that the inspector can add landmark methods if they do not already belong to the existing graph.
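The observer wiring between the temporary call graph and its two views can be sketched as follows. This is an illustrative sketch only; the type names are assumptions and do not correspond to the tool's actual classes, which are built on Swing and the JUNG framework.

import java.util.ArrayList;
import java.util.List;

// Hypothetical names used only to illustrate the observer relationship described above.
interface CallGraphObserver {
    void modelChanged();
}

class TemporaryCallGraph {
    private final List<CallGraphObserver> observers = new ArrayList<CallGraphObserver>();

    void addObserver(CallGraphObserver o) { observers.add(o); }

    // Invoked by the hammock graphing / slicing module whenever the landmark selection changes.
    void notifyObservers() {
        for (CallGraphObserver o : observers) {
            o.modelChanged();   // both the graphical and the textual view refresh themselves here
        }
    }
}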
4.3.3. Controller. There are three points at which the user interacts with the system: Selecting the files that are to be analysed, selecting the landmark methods and browsing the call 76
F IGURE 4.3.4. Selecting files to be analysed
F IGURE 4.4.1. Simple Sequence Diagram graph. Figure 4.3.4 shows the selection of files for analysis, which is carried out via the standard Java Swing file selection interface. Landmark methods can be selected via a dialogue-box that contains a list of all methods belonging to the application that is being analysed, or via the source code browser by selecting one of the methods in the list of successor methods and selecting the “add as landmark” button, shown in figure 4.3.3. The call graph can be browsed via either of the views, by selecting a method from the list of successor methods and pressing “forward”, or by pressing “back” to backtrack.
4.4. Using the Call Graph to Guide the Inspector: A Case Study This section provides an illustrated example of how to apply the technique, using the tool that was described in the previous section. The “check in to reserved room” use case from the Hotel System is used for the example. As stated in section 4.3.3 landmark methods can be specified in two modes: all at once or iteratively through the textual call graph browser. This section illustrates their selection through the browser. 77
F IGURE 4.4.2. Actual Sequence Diagram
This process does not rely on a complete and precise specification. Figure 4.4.1 shows a simple sequence diagram of the use case that may be presented to the inspector. Figure 4.4.2 shows the complete sequence diagram that contains all of the calls that are invoked for this use case. This section illustrates how calls corresponding to those in the sequence diagram in figure 4.4.2 can be obtained by using the simple diagram in figure 4.4.1. From the simple sequence diagram the inspector can surmise that the use case that is being inspected must involve the execution of the methods main, checkinReserved and occupy. Figure 4.4.3 shows the selection of these methods as landmarks in a stepwise manner. The call graph in (a) simply specifies that the
main
method must be executed (as this is the entry method
for the call graph it contains every possible path through the system). (b) shows the graph 78
after the
Room.occupy()
method is selected (the last method called in the simplified sequence
diagram). Any methods that belong to the resulting call graph are highlighted in yellow. (c) shows the call graph after
Hotel.checkinReserved
is added. This final graph corresponds to
the use case illustration in figure 4.2.4 and the sequence diagram in figure 4.4.2.
4.5. Conclusions
This chapter has introduced the technique used to focus the call graph on a selection of calls and methods that are of particular relevance to a use case. A number of issues arise when using dynamic analysis, particularly with software inspections, where the system may not even be executable. For this reason landmark methods are used as a substitute for a dynamic trace. Because they are more light-weight than a full dynamic trace, the inspector has more flexibility to focus on particular areas of execution. Essentially the approach allows the inspector to select (landmark) methods, which are nodes on the call graph, that they know must be executed as part of the use case. The relevant nodes and edges in the call graph are computed in two ‘sweeps’. In the first sweep direct paths connecting landmark methods in the call graph are identified (these subgraphs take the form of hammock graphs). In isolation these hammock graphs do not suffice for a thorough inspection of the use case, which may involve the execution of other paths that do not necessarily belong to direct paths. For this reason slices are computed on the call sites for calls belonging to the direct paths. These slices identify the statements that can influence the behaviour of methods belonging to the direct paths. Slices may include call sites for calls that do not already belong to hammock graphs. If this is the case, paths that start from these call sites belonging to the slices are added to the code to be inspected. This approach has been implemented as a proof of concept. A description is provided, detailing the various components and how they interact with each other. Details are provided on how the underlying model is constructed using the Soot framework and how the resulting graph is displayed graphically and textually. Finally a small case study is presented, illustrating how the tool might be used in practice. A more in-depth analysis of the tool's usability and some of the practical issues that can arise will be provided in the following chapter.
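The two sweeps can be summarised by the following sketch. This is illustrative Java pseudocode under the assumption of simple graph interfaces; it is not the tool's actual implementation, which works on Soot's representations and may differ in detail.

import java.util.HashSet;
import java.util.Set;

// Hypothetical interfaces standing in for the tool's Soot-based representations.
interface Method { }
interface Call { }

interface CallGraph {
    Set<Call> hammockCalls(Method from, Method to); // calls on direct paths between two landmarks
    Set<Call> pathsFrom(Call site);                 // calls reachable from a given call site
}

interface DependenceGraphs {
    Set<Call> backwardSliceCallSites(Call criterion); // call sites appearing in a backward slice
}

class LandmarkReducer {
    Set<Call> relevantCalls(CallGraph cg, DependenceGraphs dgs, Set<Method> landmarks) {
        Set<Call> result = new HashSet<Call>();
        // Sweep 1: hammock graphs induced pairwise between the selected landmark methods.
        for (Method a : landmarks) {
            for (Method b : landmarks) {
                if (a != b) {
                    result.addAll(cg.hammockCalls(a, b));
                }
            }
        }
        // Sweep 2: slices on the call sites of the direct paths; call sites that appear in a
        // slice but not in a hammock spawn further paths that are added to the inspection set.
        Set<Call> extra = new HashSet<Call>();
        for (Call c : result) {
            for (Call found : dgs.backwardSliceCallSites(c)) {
                if (!result.contains(found)) {
                    extra.add(found);
                    extra.addAll(cg.pathsFrom(found));
                }
            }
        }
        result.addAll(extra);
        return result;
    }
}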
F IGURE 4.4.3. Iterative selection of landmark methods: (a) landmark HotelSystem.main (102 methods, 178 calls); (b) landmarks HotelSystem.main and Room.occupy (30 methods, 48 calls); (c) landmarks HotelSystem.main, Room.occupy and Hotel.checkinReserved (19 methods, 19 calls)
CHAPTER 5
Evaluation 5.1. Introduction The technique introduced in the previous chapter attempts to estimate the set of method calls in the system that can be considered as relevant to a given use-case or scenario. Its success depends on a variety of factors: The type of information required by the inspector, the amount of irrelevant material it returns, the amount of relevant material it omits and its scalability. The information that is produced also has to be presented to the inspector in a practical manner. This chapter presents an evaluation of the technique and its implementation and is composed of four parts. The first part involves the automatic computation of landmark combinations, with the aim of establishing whether it is feasible to produce landmark method combinations that produce a high level of precision and recall [vR79]. It also seeks to identify the properties of those combinations that produce particularly favourable and unfavourable results. The second part is focussed on the manual identification of landmark methods. Its aim is to determine whether it is possible to manually identify landmark method combinations that produce good results and to glean some insight into how the technique could be improved. The third part discusses the usefulness and readability of the code that is returned. The final part evaluates the tool implementation against a set of established comprehension tool criteria.
5.2. Study of Landmark Methods This study aims to examine the properties of landmark method combinations that produce a set of call graph edges that is complete and relevant to the inspector. Results are based on four use-cases for two different systems. For each use-case a set of all possible landmark method combinations (consisting of one, two and three methods) was automatically generated. The set of call graph edges that were produced by each combination was (automatically) compared against a set of edges that was compiled manually using expert knowledge of the systems. This resulted in a set of precision-recall plots (detailed in section 5.2.3), which could be used to identify the combinations of methods that were particularly effective or ineffective. 81
                                        Total Occurrences   Mean   Std. Deviation   Max.
Hotel System
  Number of direct children per class                   2   0.14             0.36      1
  Depth of inheritance tree                                  1.45             0.93      4
JHotDraw
  Number of direct children per class                  95    0.7             1.95     12
  Depth of inheritance tree                                   2.57            1.25      6
F IGURE 5.2.1. Inheritance metrics for Hotel System and JHotDraw
5.2.1. Generating Landmark Method Combinations. For each system sequence diagrams were constructed that are representative of the key system use-cases. These are based on expert and developer knowledge. Each sequence diagram was traced to the statically reverse-engineered call graph. Whenever the sequence diagram was not detailed enough, the source code was manually scrutinised to identify the set of relevant calls. This set of relevant edges, along with domain knowledge of the system, was used to identify combinations of candidate landmark methods. To aid with the evaluation, the tool (see section 4.3) has been equipped with a component that automatically produces landmark combinations and evaluates the results with respect to a set of relevant call graph edges. Given a call graph for a system, the tool allows the user to select a subset of the call graph edges that are relevant. For this set of edges it automatically produces every possible combination of n methods, where n is also specified by the user. Depending on the size of n and the number of methods in the set of relevant edges this can result in a large number of combinations. The number of combinations of n objects selected from a total of m objects is calculated as $C^m_n = \frac{m!}{n!\,(m-n)!}$.
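As a worked instance of this formula (an illustrative calculation added here, not part of the original study data), selecting n = 3 landmark methods from m = 32 candidates gives the figure quoted in the next sentence:

$C^{32}_{3} = \frac{32!}{3!\,(32-3)!} = \frac{32 \cdot 31 \cdot 30}{3 \cdot 2 \cdot 1} = 4960$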
If for example a use-case contains just 32 relevant
methods, there are 4960 possible combinations of three methods. 5.2.2. Software Systems. The study is based on two Java applications. JHotDraw1 is a framework for the construction of drawing editors (JHotDraw use cases are based on the Pert drawing tool, a sample JHotDraw application). Hotel System is a small application developed as part of an undergraduate programming course work. There are two reasons for choosing these systems. The main reason is that expert knowledge on both systems is readily available via colleagues who have studied or programmed the systems. The second reason is that they are both very different. The Hotel System is a small self-contained application, consisting of 13 classes, 280 methods and 1213 lines of code. JHotDraw is much larger and more complex to read from a static perspective. It consists of 136 classes, 1079 methods and 4558 lines of code2. The most significant difference between the two 1http://www.jhotdraw.org
2These metrics were obtained by using the Eclipse Metrics plug-in (http://metrics.sourceforge.net). 82
systems is the use of inheritance, which plays a central rôle in JHotDraw. Inheritance metrics for the two systems are shown in figure 5.2.1 (the depth of inheritance includes Java library types). 5.2.3. Measuring Precision and Recall. Precision and recall [vR79] are commonly used to measure the performance of information retrieval techniques. Precision measures the proportion of retrieved material that is relevant and recall measures the proportion of relevant material that is retrieved. Given a set of documents that are considered relevant (in this case the call graph edges that were manually identified for each use case) and a set of retrieved documents (the edges returned), they are calculated as follows:
$\mathrm{Precision} = \frac{\text{Number of relevant edges retrieved}}{\text{Total number of edges retrieved}}$

$\mathrm{Recall} = \frac{\text{Number of relevant edges retrieved}}{\text{Total number of relevant edges}}$
To enable the calculation of precision and recall for the landmark method technique the tool includes a component that enables the user to specify which call graph edges are considered relevant (this set is denoted as REL). The set of edges returned by a single query (in the form of a landmark combination) is denoted as RET . To compute precision and recall in terms of the above definitions, the number of relevant calls retrieved can be defined as |REL ∩ RET |. This means that, in terms of sets, precision and recall can be computed as follows:
$\mathrm{Precision} = \frac{|REL \cap RET|}{|RET|}$

$\mathrm{Recall} = \frac{|REL \cap RET|}{|REL|}$
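In code, the two measures amount to a simple set intersection. The following is a minimal sketch added for illustration; representing call graph edges as strings is an assumption made for brevity and does not reflect the tool's actual data structures.

import java.util.HashSet;
import java.util.Set;

public class PrecisionRecall {

    // rel: edges judged relevant to the use case; ret: edges returned by a landmark combination.
    static double precision(Set<String> rel, Set<String> ret) {
        Set<String> hit = new HashSet<String>(rel);
        hit.retainAll(ret);                              // REL intersected with RET
        return ret.isEmpty() ? 0.0 : (double) hit.size() / ret.size();
    }

    static double recall(Set<String> rel, Set<String> ret) {
        Set<String> hit = new HashSet<String>(rel);
        hit.retainAll(ret);
        return rel.isEmpty() ? 0.0 : (double) hit.size() / rel.size();
    }
}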
Precision and recall results can be visualised as a chart where the x-axis denotes the recall value and the y-axis denotes precision. Because multiple landmark method combinations can produce the same coordinates, conventional scatter plots do not suffice. For this reason bubble charts were used, where the size of the bubble varies depending on the number of combinations that correspond to a given coordinate. As an example, Figure 5.2.2 shows the bubble charts for two Hotel System use cases and figure 5.2.3 shows the JHotDraw use case bubble charts. The charts graphically display the distribution of the values of precision and recall, where a cluster of small bubbles or a single large
bubble indicates a combination of precision and recall that was produced by multiple landmark method selections. As an example, in the “add charge” bubble plots in figure 5.2.2 a large number of combinations produce full recall and precision, resulting in a large bubble at the top-right of the graph.
F IGURE 5.2.2. Landmark combinations for “display room” use-case (left) and “add charge” use-case (right) in Hotel System (bubble charts of precision against recall for combinations of one, two and three landmark methods)
5.2.4. Results. This section provides a high-level summary of the results. The precision and recall results for each landmark method combination are displayed as bubble charts (see previous section). These are shown in figures 5.2.2 and 5.2.3. The independent distributions of precision and recall are displayed as box plots in figure 5.2.4. Each box plot summarises a bubble chart. As an example, the distribution of precision values for the bubble chart showing combinations of three landmark methods for the “display room” use case in figure 5.2.2 are shown in plot ‘P3’ in the “display room” box plots in the left column. Figure 5.2.2 shows precision and recall results for the landmark combinations for the two Hotel System use cases. For the “display room” use case, results for a single landmark method are clustered around the 60% precision, 40% recall area, with outlying results producing either high precision and high recall, low precision and high recall, or low recall and high precision.
The concentration around 60% precision and 40% recall is however what is reflected in the ‘P1’ precision box plot and the ‘R1’ recall box plot in figure 5.2.4. As more landmark methods are added the number of outlying precision and recall results at the extremities of the chart decreases, and a larger number of the combinations result in a lower precision and higher recall. This is again reflected in the box plots, where the reduction in precision can be observed in plots P2 and P3, and the rise in recall can be observed in plots R2 and R3. The high recall suggests that increasing the number of landmark methods leads to the inclusion of a higher number of call graph edges, but the lower precision suggests that many of these edges are irrelevant. The removal of precision and recall results at the extremities of the chart suggests that, as more landmark methods are selected, fewer combinations result in: (a) a very small number of relevant edges (this leads to high precision and low recall) and (b) the entire call graph (resulting in high recall and very low precision). The bubble charts for the “add charge” use case (on the right in figure 5.2.2) contrast with the “display room” charts. There are fewer combinations (456 combinations of three landmark methods as opposed to 3655 combinations of three landmark methods for the “display room” use case). Many combinations produce 100% precision and recall. Recall is high in the vast majority of cases, but there is a small number of combinations that produce a low value of precision. From the box plots for “add charge”, it becomes obvious that the recall again improves as landmark methods are added. Although the distance between the upper and lower quartiles is consistently large for precision, the median drops sharply as soon as further landmark methods are added, indicating a drop in precision. Figure 5.2.3 shows precision and recall bubble charts for the two JHotDraw use cases (“select tool” on the left and “select figure” on the right). The distributions of precision and recall are again summarised in box plots in figure 5.2.4. The “select tool” results indicate a consistently high recall (as reflected in the box plots), but also indicate a relatively low precision. As further landmark methods are added, there is however a slight increase in precision. The “select figure” results contrast those from the “select tool” use case. There are 3655 combinations of three landmark methods as opposed to just 561 combinations for “select tool”. It is apparent both from the bubble chart and the box plot that the precision results are consistently low. What is less obvious on the bubble chart but is clarified in the box plot is that most of the results actually produce a high recall. This may not be apparent due to the fact that there are several large bubbles (denoting multiple results that produced the same combination
of precision and recall) which are covered up by multiple smaller bubbles and are hence not visible.
F IGURE 5.2.3. Landmark combinations for “select tool” use case (left) and “select figure” use-case (right) in JHotDraw (bubble charts of precision against recall for combinations of one, two and three landmark methods)
5.2.5. Interpretation. The bubble charts suggest that, although the selection of landmark methods has a significant effect on the precision and recall of the results, the results also vary according to the nature of the call graph that is being analysed. In figure 5.2.2 for example two different use cases of the same system produce completely different results, primarily because their call graphs are completely different (both in terms of size and topology). This is elaborated further in the discussion of the next experiment (section 5.3.3). The purpose of this study is to establish the properties of particularly successful landmark method combinations (where successful combinations produce both a high recall and precision). Having identified which landmark method combinations are successful and unsuccessful (these are at the upper-right and lower-left extremes of the bubble charts respectively) this section presents the following three observations, which should aid the inspector in the identification of successful landmark method combinations:
F IGURE 5.2.4. Box plots of precision and recall for all four use cases (“add charge”, “display room”, “select tool” and “select figure”). For each use case, plots P1 to P3 show the distribution of precision and plots R1 to R3 the distribution of recall for combinations of one, two and three landmark methods respectively.
(1) Increasing the number of landmark methods tends to produce a higher recall but can compromise precision. (2) Landmark methods can be used to reduce the number of candidate destinations for polymorphic calls. (3) The context in which landmark methods are executed needs to be taken into consideration when they are selected. These observations are elaborated in sections 5.2.5.2, 5.2.5.3 and 5.2.5.4. First of all the following section contains two cautionary remarks that must be taken into account when interpreting the precision and recall results. 5.2.5.1. Interpreting the precision and recall results. It is important that precision and recall are not solely considered independently of each other (as is the case in the box plots). The box plots indicate that precision tends to decrease as further landmark methods are added. This is however potentially misleading as the bubble plots in figures 5.2.2 and 5.2.3 indicate that, despite an increase in the number of combinations that produce lower precision, there is also a distinct increase in the number of combinations that produce both high precision and recall. This is not evident in the box plots because the increase in good combinations is outweighed by the overall decrease in precision. Precision and recall values do not accurately convey the extent to which the size of the program is reduced. Looking at the combinations of three landmark methods for the “select figure” use case in figure 5.2.3, the precision is relatively low. The total number of call graph edges is 1340 and 101 of these edges were deemed relevant. One of the combinations produced 100% recall and only 16% precision. If it is taken into consideration that the number of edges returned by the combination is 621, the number of edges is still reduced by 54%. Even though the graph suggests that the user is left with a large number of superfluous method calls, there is still a considerable reduction occurring. 5.2.5.2. Increasing the number of landmark methods tends to produce a higher recall but can compromise precision. Recall results for all use cases improved as the number of landmark methods per combination increased. This is particularly evident in the box plots in figure 5.2.4, where median, lower quartile and upper quartile values all increased as further landmark methods are added. This is more explicit in the (top two) Hotel System use cases than in JHotDraw. Unfortunately this increase in recall tends to result in a corresponding drop in precision. As an example, if we look at the clear increase in recall that is evident in bar charts ‘R1’ and ‘R2’ for the ”display room” use case, there is a corresponding drop in bar charts ‘P1’ and ‘P2’. 88
This suggests that, in general, adding extra landmark methods results in the inclusion of more call graph edges. Although many of these additional edges may be relevant (resulting in an increase in recall), many are also irrelevant, causing the drop in precision. The “select tool” use case is the only anomaly to this trend, where precision improves as the number of landmark methods increases. An analysis of the call graph characteristics (carried out as part of the next study, see section 5.3.3) suggests that this increase in precision is due to the fact that the connectivity of the call graph is relatively low. This means that those parts of the landmark method algorithm that trace along all of the paths in the call graph from a particular point do not end up including as many irrelevant edges (as is the case with highly connected call graphs).
F IGURE 5.2.5. Polymorphic methods can form a hammock by calling the implementation they are overriding. [The figure shows a call site x.mouseDown() inside a mousePressed() method, with a relevant edge to CreationTool.mouseDown() and superfluous edges to the overriding TextTool.mouseDown() and ConnectedTextTool.mouseDown(), each of which calls super.mouseDown().]
5.2.5.3. Landmark methods can be used to reduce the number of candidate destinations for polymorphic calls. One of the main reasons for low precision is that the technique relies on static analysis, which is inherently conservative. The static object-oriented call graph can contain multiple call edges for a single call site (this is discussed in section 4.3.1.1). The destination of polymorphic calls can only be determined at run-time when the type of the class containing the destination method is known. The hypothesis was that landmark methods could be used to address this problem by manually determining which method would be executed, eliminating redundant polymorphic calls. We investigated this by analysing the results for the two JHotDraw use-cases, which rely heavily on polymorphism. Whilst the number of polymorphic calls can be substantially reduced, there are certain instances that cannot be removed despite their irrelevance to the use-case. The destinations of polymorphic calls that cannot be eliminated are those that contain calls to the methods they are overriding (by calling super.x(), where x is the overridden method). An example is shown in figure 5.2.5. Dotted lines show edges in the call graph. If CreationTool.mouseDown()
is marked as a landmark method, the hammock graph is produced by including the intersection of all call graph edges that succeed mousePressed() and calls that reach CreationTool.mouseDown(). This will include the two methods that override CreationTool.mouseDown() (TextTool.mouseDown() and ConnectedTextTool.mouseDown()) because they too call CreationTool.mouseDown() via a call to super.mouseDown(), where super refers to the super class (the base class for the method that is currently being executed). Tracing the additional call sites that belong to slices for these two additional calls may result in the inclusion of multiple superfluous call edges. To investigate the effect of polymorphism on the accuracy of the landmark method results, the calls belonging to one JHotDraw use case were analysed, to determine how many of them are a result of polymorphism. For one JHotDraw use-case there were eight polymorphic calls, with a combined total of 65 candidate destinations. Ideally the total number of candidate destinations would be reduced to eight (one per call). For each polymorphic call the ‘correct’ destination was selected as a landmark method. The resulting call graph reduced the total number of polymorphic candidates to 29. For every polymorphic call the number of candidates was reduced by an average of 56%. Superfluous polymorphic calls were always included as a result of the process explained above. Call sites that spawn polymorphic calls are easy to identify in call graphs. In Soot [VCG+ 99] for example, a call site can be deemed polymorphic if it has more than one outgoing call graph edge. This information could easily be used as guidance for the inspector, where polymorphic calls are flagged up so that the inspector knows where landmark methods are particularly necessary.
5.2.5.4. The context in which landmark methods are executed needs to be taken into consideration when they are selected.
A large proportion of combinations produced a low recall. This is
particularly unsuitable if the approach is to be used for code inspections, because it can lead to the omission of potentially important methods and method calls. Combinations that resulted in a low recall were scrutinised to determine under what circumstances this occurred. In object-oriented systems it is often the case that a method is executed multiple times in different execution contexts. Depending on the context in which it is executed it may behave differently. An example is shown in figure 5.2.6, which contains the source code for the AbstractCommand.viewSelectionChanged method in JHotDraw. For the sake of illustration, the inspector has already identified StandardDrawingView.addFigureSelectionListener as a landmark method. This method is called at line 6 in the viewSelectionChanged method. If viewSelectionChanged is only called once with a null oldView object and a non-null newView object, then no necessary method calls will be omitted. If however viewSelectionChanged is called multiple times, and oldView is not always null, the call to removeFigureSelectionListener that should be invoked when oldView is not null will only be included if removeFigureSelectionListener is itself also a landmark method (as it is not included in the backward slice from line 6).

1  public void viewSelectionChanged(DrawingView oldView, DrawingView newView) {
2      if (oldView != null) {
3          oldView.removeFigureSelectionListener(this);
4      }
5      if (newView != null) {
6          newView.addFigureSelectionListener(this);
7      }
8      ...
9  }

F IGURE 5.2.6. AbstractCommand.viewSelectionChanged method from JHotDraw
By selecting a landmark method the user is stating that this method is definitely executed. The slicing step is designed to identify only those method calls that are relevant only to the call to the landmark method. Other calls that are not relevant to that specific call (but may still be relevant to the use case as a whole) may be omitted. It is situations like this that make it very difficult to apply this technique in a single step (i.e. selecting all of the relevant landmark methods at once). To ensure that these circumstances are identified, it could be better to adopt a stepwise approach, where each step refines the selection of landmark methods. With respect to figure 5.2.6 for example, the inspector may initially only select addFigureSelectionListener
as a landmark method and, upon inspecting the resulting call
graph realise that removeFigureSelectionListener is in fact also a landmark method.
A high level of recall is also guaranteed if every relevant independent thread of execution that stems from a given logic branch is marked by a single landmark method. As an example a method could contain a large switch clause, where each branch contains a method call that executes an independent system function (see the
go()
method in appendix B). By selecting
the destinations of the relevant method calls in the switch clause as landmark methods, a high level of recall is guaranteed (however with the potential for low levels of precision because it ends up tracing all calls that are spawned by every branch).
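The following fragment illustrates the shape of such a method. It is a hypothetical sketch for illustration only; it is not the actual go() method reproduced in appendix B.

public class Dispatcher {
    // Each branch starts an independent thread of execution. Selecting the destination of each
    // relevant branch (e.g. displayRoom(), addCharge()) as a landmark method guarantees that the
    // corresponding paths are retained, at the likely cost of lower precision.
    void go(int command) {
        switch (command) {
            case 0: displayRoom(); break;
            case 1: addCharge();   break;
            case 2: checkout();    break;
            default: help();
        }
    }

    void displayRoom() { System.out.println("display room"); }
    void addCharge()   { System.out.println("add charge"); }
    void checkout()    { System.out.println("checkout"); }
    void help()        { System.out.println("help"); }
}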
5.3. Selection of Landmark Methods
The previous section established that, although it is possible to produce a useful3 subgraph of the call graph, the success of the technique rests on the strategy that is used for selecting combinations of landmark methods. Suitable and unsuitable combinations were identified by computing every possible combination of a given number of landmark methods for two systems. It identified some of the properties that contribute to the selection of successful landmark method combinations. In practice an inspector will only be able to rely on knowledge of the system that is provided by its documentation or that can be inferred from its structure (i.e. abstract classes, interfaces and polymorphic call sites). This section evaluates the results produced by the focussed selection of landmark methods that are identified in this manner, with the aim of using them to devise a more concrete set of guidelines for the identification of potential landmarks. This part of the evaluation consists of two case studies of open-source systems (FreeMind and NanoXML). As was the case in the previous section, for each system two use cases were analysed and the relevant calls for each use case were identified. However this time landmark methods were chosen manually, using intuition and expertise, instead of using an automated approach. A method was considered to be a potential landmark if (a) its name suggested that it plays a key rôle in the use-case under inspection or (b) it is declared by an interface or an abstract class. If a method is declared by an abstract class4 or an interface it must be implemented by any concrete class that either derives from the abstract class or implements the interface. This suggests that these methods must be in some way instrumental to the behaviour that is specified by abstract classes and interfaces. These methods are therefore suitable candidates for landmark methods. It is interesting to note that, although the system documentation and class interfaces played an important part in identifying the key landmark methods, it is clear that they are not sufficient to use as the sole basis for the selection of landmark methods. In both case studies the initially identified landmark methods were used to trim down the call graph. Several others were only identified after an analysis of the trimmed call graph, where the process of reading through the source code and understanding its behaviour resulted in the identification of further landmark methods. Once a suitable method was identified, the class hierarchy was examined to find any implementations by sub-classes that are likely to be instantiated during the execution of the specific use case. If it was likely that a subclass could be instantiated at runtime, the implementation belonging to the subclass was also used as a landmark method (this is elaborated in section 5.2.5.3). Navigating a system in this manner is straightforward in the “Java Browsing” perspective in Eclipse (shown in figure 5.3.1). To determine which call graph edges are relevant to a particular use-case or scenario the systems were profiled extensively (this step had been carried out manually in the previous experiment). The profiler used in this exercise was the Eclipse Test & Performance Tools Platform (TPTP)5, which has an integrated profiler. To obtain something approaching complete coverage, an extensive range of inputs was used for the profiling (this was reasonably straightforward because both case studies are relatively uncomplicated).
3A graph is considered useful if it produces a high precision and recall. The actual usefulness of the code is explored later.
4A class is abstract if it contains any abstract methods (methods that have no method bodies). Abstract classes cannot be instantiated.
5http://www.eclipse.org/tptp
F IGURE 5.3.1. “Java Browsing” perspective in Eclipse
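The reasoning behind using interface and abstract-class declarations as a starting point can be illustrated with a small hypothetical example; the names below are invented and do not come from FreeMind or NanoXML.

// If a method is declared by an interface or an abstract class, every concrete subtype must
// provide it, so the method is tied to the behaviour that the abstraction promises. Its
// implementations are therefore natural landmark candidates for the feature that exercises it.
interface Exporter {
    void export();                      // declared behaviour: every exporter must implement this
}

abstract class AbstractExporter implements Exporter {
    public abstract void export();
}

class PdfExporter extends AbstractExporter {
    public void export() {
        // the implementation that would be marked as a landmark when inspecting an
        // "export document" feature, if this subtype can be instantiated at runtime
    }
}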
F IGURE 5.3.2. FreeMind 5.3.1. FreeMind Case Study. FreeMind6 is a mind mapping tool that can be used to graphically represent the links between particular notions and ideas (referred to as a “mind map”), as shown in figure 5.3.2. It was chosen for analysis because it is popular, under active development (according to Sourceforge development statistics7) and is relatively mature (established in 2000). It is a particularly interesting system with respect to this evaluation because it has very little in the way of system documentation, forcing the inspector to rely on domain and structural knowledge to identify landmark methods. The tool can operate in three modes (the system has been designed to be extensible, facilitating the addition of more modes at a later stage). The three current modes that are available are File mode (the file system is represented as a mind map), Mindmap mode (an editable mind map) and Browse mode (the mind map can be browsed but not edited). The FreeMind documentation8 states that it implements the Model View Controller (MVC) architecture [SG96] (see section 4.3). The responsibility of a class (whether it contributes to the model, view or controller) can be derived from its name. Controller classes end with “controller”, view classes belong to 6http://freemind.sourceforge.net
7This is at the time of writing (08/08/05), the statistics used to compute the ranking can be found at:
http://sourceforge.net/docman/display_doc.php?docid=14040&group_id=1 8http://freemind.sourceforge.net/javadoc/index.html 94
the freemind.view package and other classes (belonging to the freemind.modes.* packages) are presumed to contribute to the models that correspond to the various modes. The main FreeMind use cases become evident from a glance at the instructions9: “Create a new map by opening the File menu and clicking New. FreeMind will create a new screen, with an oval in the center labeled ’New mindmap’. This is the root node. You will build your map by adding nodes to the root.”. This process can be divided into two use cases: creating a new mind map and inserting a new node. 5.3.1.1. “New mind map” use case. A new mind map is generated by selecting ‘New’ from the file menu. This primarily involves setting up a new model of the mind map and inserting the default root node. The subgraph of the call graph that would be considered relevant to an inspection (the set of calls that were executed during profiling) contains 166 edges and 130 nodes. Landmark methods were again determined by browsing the interfaces and abstract classes of the system. The landmark method choices are presented below: •
• freemind.modes.ControllerAdapter$NewMapAction.actionPerformed(ActionEvent): Identifying the entry point for the call graph is straightforward, because there is an internal class in the ControllerAdapter class called NewMapAction. Its name intuitively suggests that it is responsible for handling the user input that constitutes the generation of a new map. The call graph that is generated as a result of using this method as the entry point contains 692 edges and 343 nodes.
(1) freemind.modes.ControllerAdapter.newMap(MindMap): The name of the method suggests that it is used for the generation of a new mind map. None of the subclasses of ControllerAdapter provide an overriding implementation, so it makes sense to assume that this method is executed by any controller when the use case is executed.
(2) freemind.modes.mindmapmode.MindMapController.newModel(): The source code comments state: “You _must_ implement this if you use one of the following actions: OpenAction, NewMapAction”. As this use case uses the NewMapAction class, it follows that this method must be a landmark method.
(3) freemind.modes.MapAdapter.MapAdapter(FreeMindMain): The MapAdapter class implements the MindMap interface, which specifies the functionality of the internal model representing the mind map. When a new mind map is generated, it presumably involves the instantiation of a new mind map representation. This is achieved by calling the MapAdapter constructor.
9http://freemind.gardennanny.com/
Combination   Methods          Precision   Recall   Edges   Nodes   % reduction of edges   % reduction of nodes
(a)           1                0.20        0.81     624     316     9.9                    7.9
(b)           2                0.47        0.14     48      40      93.1                   88.4
(c)           3                0.48        0.14     34      27      95.1                   92.1
(d)           4                0.20        0.39     307     167     55.7                   51.4
(e)           5                0.47        0.14     48      40      93.1                   88.4
(f)           6                0.26        0.16     98      61      85.9                   82.2
(g)           7                0.17        0.6      562     272     18.8                   79.3
(h)           1,2,3,4,5,6,7    0.18        0.33     284     152     41                     44.3
(i)           1,2              0.23        0.93     656     333     5.3                    3
(j)           3,4,5            0.20        0.39     307     167     55.7                   51.4
(k)           6,7              0.17        0.61     565     274     18.4                   20.2
(l)           1,2,3,4,5        0.20        0.39     310     168     55.3                   51.1
(m)           3,4,5,6,7        0.20        0.39     281     151     59.4                   56
(n)           1,2,6,7          0.17        0.61     568     275     18                     19.9
F IGURE 5.3.3. Summary of results for “new mind map” use case (numbers correspond to the enumeration of candidate landmark methods)
(4)
freemind.modes.mindmapmode.MindMapNodeModel.MindMapNodeModel(FreeMindMain): By
default a new mind map model is generated with the root node already in place. The addition of the root node must involve the instantiation of an object representing a node. This is achieved by calling the MindMapNodeModel constructor. (5)
freemind.modes.mindmapmode.MindMapMapModel.MindMapMapModel(MindMapNodeModel, FreeMindMain): MapAdapter is
an abstract class for providing the default functional-
ity for all classes that represent the mind map model in the specific FreeMind modes. The class that is responsible for representing the model in the “mind map” mode is the
MindMapMapModel
class. When a new mind map representation is generated, its
constructor will be called before it calls the constructor of its super class, which is MapAdapter.MapAdapter (landmark method
(3)).
(6)
freemind.view.MapModule.MapModule(MindMap, MapView, Mode):
According to the JavaDocs, the MapModule class is “the key to one Model/View bundle which represents one map”. It is assumed that the generation of a new mind map involves creating a new instance of this class to represent the corresponding new model and view.
(7)
freemind.view.MapView.MapView(MindMap, Controller): The MapView
class represents
the view of a mind map and is instantiated along with a new mind map model.
Figure 5.3.3 shows the results that were produced by individual landmark methods and combinations for the “new mind map” use case. The charts in figure 5.3.4 represent the precision-recall results and the reductions in terms of edges and nodes for the various combinations. Combinations (a) - (g) show the results obtained by using the candidate landmark methods by themselves. This is to establish whether or not the rest of the combinations (consisting of multiple landmark methods) always improve on the results obtained by a single landmark method. Combination (h) is the result obtained by using all of the landmark methods. The rest of the combinations are based on the methods’ respective rôles in the MVC architecture. Combination (i) consists of the methods that are responsible for the controller component, (j) combines the model methods and (k) combines the view methods. Combinations (l), (m) and (n) consist of the controller and model methods, controller and view methods and the model and view methods respectively. The results are discussed in section 5.3.1.3.
F IGURE 5.3.4. Precision-recall and size reduction for landmark method combinations in FreeMind “New mind map” use case
5.3.1.2. “Add mind map node” use case. This use-case consists of adding a child node to the root node of a mind map. It consists of the user right-clicking on the canvas, selecting “add child node”, and entering a name for the node. The subgraph of the call graph that would be considered relevant to an inspection (the set of calls that were executed during profiling) contains 149 edges and 85 nodes. By browsing the interfaces, abstract classes and the classes that implement them, seven candidate landmark methods were selected. The reasons for the landmark method selections are provided below:
• freemind.modes.ControllerAdapter$NewChildAction.actionPerformed(ActionEvent): The entry point to the call graph is a landmark method by default. The ControllerAdapter abstract class specifies the core functionality to be supplied by any controller class. This involves responding to events and invoking methods in the model or view as appropriate. ControllerAdapter contains a number of internal classes that represent the user actions for which it is responsible. One of these is the NewChildAction class, which implements the ActionEvent interface, indicating that it is responsible for responding to the user action that constitutes the creation of a new child node. The call graph that is generated as a result of using this method as the entry point contains 827 edges and 360 nodes.
(1) freemind.modes.mindmapmode.MindMapController.newNode(): The ControllerAdapter class, according to its source code comments, should be used as a basis for controllers that are specific to the various FreeMind modes. MindMapController is a subclass of ControllerAdapter that is specifically responsible for controlling the mind map model. The MindMapController.newNode() method, which is also declared in the ControllerAdapter class, is a sensible choice as a landmark method because the use case involves the insertion of a new node, and tying the implementation to the MindMapController class reduces the possibility of superfluously including the other newNode implementations (FileController.newNode and BrowserController.newNode), as discussed in section 5.2.5.3.
(2) freemind.modes.ControllerAdapter.addNew(NodeView, int, KeyEvent): The ControllerAdapter class implements the ModeController interface. One of the methods specified by ModeController is the addNew(NodeView, int, KeyEvent) method. This controls the insertion of a new MindMapNode (obtained by calling the above method) into the mind map. It is a suitable candidate for a landmark method because it is declared by an interface and intuitively provides an important element of functionality with respect to adding a new node.
(3) freemind.modes.MapAdapter.insertNodeInto(MutableTreeNode, MutableTreeNode, int): MapAdapter is the only class that implements the MindMap interface, so it presumably plays an important rôle in governing the behaviour of the mind map model. It contains the insertNodeInto method that is responsible for inserting a node into the model of the mind map. There is however another insertNodeInto method with different parameters (MapAdapter.insertNodeInto(MindMapNode, parent)), which would seem an equally suitable candidate. Upon analysis of the source code, MapAdapter.insertNodeInto(MutableTreeNode, MutableTreeNode, int) was chosen because it is prefixed with the following comment: “//use this method to add children because it will cause the appropriate event”.
(4) freemind.modes.MapAdapter.nodesWereInserted(TreeNode, int[]): This method, also declared by MapAdapter, was also chosen because its name suggests that it will be called once a node has been inserted into the mind map.
Methods
Precision
Recall
Edges
Nodes
% reduction of edges
% reduction of nodes
(a)
1
0.15
0.47
477
255
42.4
29.2
(b)
2
0.17
0.77
826
359
0.2
0.3
(c)
3
0.18
1
679
318
17.9
11.7
(d)
4
0.18
1
677
318
18.1
11.7
(e)
5
0.18
1
525
269
36.6
25.3
(f)
6
0.17
0.77
418
218
49.5
39.5
(g)
7
0.17
0.77
683
324
17.5
10
(h)
1,2,3,4,5,6,7
0.18
0.79
659
315
20.4
12.5
(i)
1,2
0.15
0.47
477
255
42.4
29.2
(j)
3,4,5
0.19
0.86
677
318
18.4
11.7
(k)
6,7
0.17
0.80
689
324
16.7
10
(l)
1,2,3,4,5
0.19
0.85
677
318
18.1
11.7
(m)
3,4,5,6,7
0.18
0.79
659
315
20.4
12.5
(n)
1,2,6,7
0.17
0.79
689
324
16.7
10
F IGURE 5.3.5. Summary of results for “add child node” use case (numbers correspond to the enumeration of candidate landmark methods)
(5)
freemind.modes.mindmapmode.MindMapNodeModel.( Object, FreeMindMain): Because
the use case involves the addition of a new node to the mind map, it must involve the instantiation of an object that represents the node being inserted. The MindMapNode interface defines a mind map node. This is implemented by the (abstract) NodeAdapter class and extended by the MindMapNodeModel class, which is specifically used to represent nodes when the system is in the ‘mindmap’ mode. Because this is the mode that is active when the use case is executed, the MindMapNodeModel constructor is a sensible choice for a landmark method. (6)
freemind.view.mindmapview.MindMapLayout.layout() : According
to the source code
comments and the name, this method is responsible for the layout of the mind map once a new node is inserted into the mind map. It also belongs to the view component, ensuring a distribution of landmark methods across the model, view and controller components. (7)
freemind.view.mindmapview.RootNodeView.insert(MindM apNode): NodeView is an abstract
class that determines how nodes are rendered. RootNodeView is a subclass that, besides being responsible for the root node in the mind map, also initiates the rendering of child nodes. This is carried out via the RootNodeView.insert(MindMapNode) method.
Figure 5.3.5 shows the results that were produced by individual landmark methods and combinations that were deemed suitable, based on their rôle in the MVC architecture. Figure 5.3.6 contains two charts that graphically represent the precision - recall results and the reductions 99
Precision Recall
Precision and Recall
% edge reduction % node reduction
Size reductions
1
100
0.9
90
0.8
80 70
% reduction
0.7 0.6 0.5 0.4
60 50 40
0.2
30 20
0.1
10
0.3
0
0 (a) (b) (c) (d) (e) (f) (g) (h) (i)
(a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m) (n)
(j) (k) (l) (m) (n)
1.0 0.8 0.6 0.4 0.2 0.0
0.0
0.2
0.4
0.6
0.8
1.0
F IGURE 5.3.6. Precision-recall and size reduction for landmark method combinations in Freemind “Add child node” use case
1
2
1
80 60 40 20 0
0
20
40
60
80
100
Recall
100
Precision
2
1
2
1
Reduction in the number of edges
2
Reduction in the number of nodes
F IGURE 5.3.7. Box plots comparing precision and recall from both use cases in terms of edges and nodes for the various combinations. The rationale for choosing particular landmark combinations is largely similar to the previous use case. Combinations were chosen for the same reasons as in the previous use-case (individually, cumulatively and based on their rôles in the MVC architecture). The results are discussed in the following section. 100
5.3.1.3. Results for the FreeMind use cases. The box plots in figure 5.3.7 summarise the results obtained for both use cases. Plots corresponding to the first use case are on the left and those for the second are on the right. In most cases there is a substantial difference between the precision and recall values. The recall was predominantly higher than precision, particularly in the second use case, where precision was consistently poor. In terms of both precision and recall, the first use case has a wider range of results than the second one. The landmark combinations are much more successful at reducing the size of the call graph for the first use case, both in terms of the number of edges and nodes. This correlates with the precision and recall results. Lower recall for many combinations in the first use case suggests that many of the combinations were too restrictive, leading to the omission of too many relevant method calls. Looking at the precision and recall results for individual combinations in figures 5.3.4 and 5.3.6 it becomes apparent that there exists a compromise between precision and recall. Combinations tend to either produce a lower precision and high recall or vice versa (as illustrated by combinations (e) and (i) in figure 5.3.4). Many combinations that produced a high recall also produced a substantial reduction in the size of the call graph. This is particularly evident in figure 5.3.6, where all of the results apart from combinations (a) and (i) provide a relatively high recall and some, particularly (a), (f) and (i), also produce a substantial reduction in the size of the call graph. In the “Add child node” use case (figures 5.3.5 and 5.3.6) the precision values remain almost constant (at around 17%). None of the landmark combinations for this use case returned a particularly precise result. Most of the landmark combinations consistently produce a high recall and only a small reduction in the size of the call graph. Upon further analysis it was established that the hammock graphs between these methods tend to be relatively small. This suggests that the inclusion of paths that do not belong to direct paths between these methods (i.e. paths that do not belong to hammock graphs) can end up including substantial parts of the call graph that would be considered as irrelevant by the inspector. This point is discussed further in section 5.3.3. In figure 5.3.3 method 1 produces low precision (0.20) and high recall (0.81) and method 2 produces a reasonable precision (0.47) and low recall (0.14). Ideally a combination of the two would result in an improvement in terms of their individual precision and recall results. However when they are combined in (i) this improvement does not occur. With a precision of 0.23 and a recall of 0.93, the recall is substantially improved from the two individual methods but
the precision remains low (lower than the precision produced when using method 2 by itself). Upon closer scrutiny, the two methods are adjacent on the call graph (the hammock graph that connects them consists of a single edge). Method 2 has no successors in the call graph, which means that all of the edges (including those that are superfluous and cause a low precision) are added by following paths from call sites belonging to slices. In table 5.3.5 methods 3, 4, and 5 individually score 1 for recall and 0.18 for precision. Ideally, by combining them, the recall should remain unaffected, and the precision should increase. In reality there is a significant decrease in recall (from 1 to 0.86) and only a negligible increase in precision (from 0.18 to 0.19). Upon closer analysis, the reason for this reduction in recall lies in the relative locations of the landmark methods in the call graph. Most of the relevant edges that are eliminated (causing the drop in recall) are left out during the computation of the hammock graphs. They are edges that belong to some path of execution that does not affect a landmark method and do not belong to the direct path between these methods. Hence they are not picked up by the hammocking or slicing process.
5.3.2. NanoXML Case Study. NanoXML (http://nanoxml.cyberelf.be) is a small XML parser for Java. It is particularly popular because it is relatively small (only 6kb), and is therefore widely used in embedded systems. It was primarily chosen because it represents a completely different kind of system to FreeMind, which is much larger and GUI-driven. NanoXML is a component that provides a simple collection of functions for inputting, traversing and writing XML files. It is
provided with a relatively comprehensive manual and JavaDocs (http://nanoxml.cyberelf.be/documentation.html). An overview of its structure (provided in the manual) is shown in figure 5.3.8.

FIGURE 5.3.8. NanoXML structure (taken from the NanoXML manual)

The manual provides a substantial amount of sample code, which provides hints for identifying potential landmark methods. In addition to the manual, the “Java Browsing” perspective in Eclipse was again used to browse the interfaces and abstract classes to identify further potential landmarks. By its nature a parser has a very limited set of distinct use cases. The basic functionality lies in reading an XML data source (into an internal data tree model), and printing it to the screen. There are however two different approaches that can be used to achieve each of these. For the reading functionality the first approach is to process the XML source in a single step and to parse it afterwards. The second is to stream the source, parsing it as the file is read in. For the printing part, XML elements can either be printed as they are encountered, using the standard Java System.out.println method, or they can be pretty-printed using the XMLWriter class supplied with NanoXML. These two variations have been used to produce two scenarios of a single use case (parse and print XML). The first scenario prints XML using the conventional System.out.println method as it is parsed and the second scenario pretty-prints XML after it has been parsed into an object tree. Because there is an overlap between the two scenarios (they both read and parse an XML source in the same manner), there will be a corresponding overlap in the landmark methods that are responsible for identifying this functionality. These landmark methods are presented below (a brief usage sketch follows the list):
(1) net.n3.nanoxml.XMLParserFactory.createDefaultXMLParser(): This method is responsible for generating the parser object, which provides the core functionality of NanoXML. Because both scenarios use the parser, it can be assumed that this factory class is responsible for its instantiation in both cases. There are two createDefaultXMLParser methods, one of which allows an object tree builder to be specified as an argument. If the builder is not passed as an argument, the default builder (StdXMLBuilder) is instantiated. A custom builder can then be set via a mutator method at a later stage (this is done in the first scenario).
(2) net.n3.nanoxml.StdXMLReader.fileReader(): Because both scenarios obtain data from an XML file, they both need an object representing the input stream. This is returned by the fileReader method.
(3) net.n3.nanoxml.StdXMLParser.setReader(IXMLReader): This method is responsible for setting the reader responsible for obtaining the XML data.
(4) net.n3.nanoxml.StdXMLParser.parse(): This method is responsible for parsing the data.
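For orientation, the following minimal sketch shows how these four landmark methods are typically exercised together, based on the usage described above and in the NanoXML manual. It is not the appendix C driver code: the file name, the exception handling and the use of the IXMLParser interface type are illustrative assumptions.

import net.n3.nanoxml.IXMLParser;
import net.n3.nanoxml.IXMLReader;
import net.n3.nanoxml.StdXMLReader;
import net.n3.nanoxml.XMLParserFactory;

public class ParseSketch {
    public static void main(String[] args) throws Exception {
        // Landmark 1: the factory instantiates the parser (and, by default, a StdXMLBuilder).
        IXMLParser parser = XMLParserFactory.createDefaultXMLParser();
        // Landmark 2: obtain a reader for the XML input file.
        IXMLReader reader = StdXMLReader.fileReader("example.xml");
        // Landmark 3: attach the reader to the parser.
        parser.setReader(reader);
        // Landmark 4: parse the data; the result is whatever the builder constructs.
        Object result = parser.parse();
        System.out.println(result);
    }
}

Every method named in the list above must be invoked for either scenario to execute, which is precisely what makes them candidate landmark methods.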
5.3.2.1. “Print XML data as it is parsed” scenario. This scenario prints XML data using the conventional System.out.println method as it is read from the input. This involves the generation of a custom object tree builder, which must implement the IXMLBuilder interface (in this scenario the builder is called MyBuilder). This class provides a set of methods that are responsible for handling various types of data obtained from the parser. Data is printed out as it is read in by simply inserting a System.out.println statement in each MyBuilder method that is responsible for handling data from the parser. The default object-tree builder carries out a more complex set of tasks (see the next scenario). Taking into account that this scenario uses the customised builder class, the MyBuilder methods that can be considered as landmarks (based on the JavaDoc documentation for the IXMLBuilder interface) are shown below.
• StreamXML.main(String[]): The entry point for the call graph does not belong to NanoXML itself. Because NanoXML is not a stand-alone application, it is invoked by an external, customised method (see appendix C for the source code). The call graph that is constructed by using this method as an entry point contains 161 nodes and 331 edges.
(5) MyBuilder.MyBuilder(): This instantiates a customised XML data tree builder, which implements the IXMLBuilder interface.
(6) MyBuilder.startBuilding(String,int): This method is called before the parser starts processing its input.
(7) MyBuilder.startElement(String,String,String,String,int): This method is called every time a new XML element is encountered.
(8) MyBuilder.addAttribute(String,String,String,String,String): This method is called every time an attribute of an XML element is encountered.
Figure 5.3.9 summarises the results for this scenario and figure 5.3.10 illustrates the precision, recall and the corresponding reduction in the size of the graph. Combinations (a) to (h) show the results produced by the individual landmark methods. Combination (i) represents the selection of all landmark methods and (j) is the result obtained by combining the methods that are applicable to both scenarios.
Combination   Methods            Precision   Recall   Edges   Nodes   % reduction of edges   % reduction of nodes
(a)           1                  0.27        0.16     41      32      87.7                    80.2
(b)           2                  0.90        0.13     10      10      97                      93.8
(c)           3                  0.40        0.30     52      42      84.3                    74
(d)           4                  0.19        0.88     322     157     2.8                     2.5
(e)           5                  0.67        0.12     12      10      96.4                    93.8
(f)           6                  0.35        0.23     46      36      86.1                    77.7
(g)           7                  0.43        0.42     67      50      79.8                    69
(h)           8                  0.44        0.39     62      48      81.3                    70.2
(i)           1,2,3,4,5,6,7,8    0.51        0.58     79      61      76.1                    62.1
(j)           1,2,3,4            0.20        0.99     330     161     0.4                     0
(k)           5,6,7,8            0.45        0.45     69      51      79.2                    31.7
(l)           7,8                0.44        0.43     68      51      79.5                    31.7
(m)           4,5,6,7,8          0.45        0.45     69      51      79.2                    31.7

FIGURE 5.3.9. Summary of results for NanoXML “Print XML data as it is parsed” scenario
FIGURE 5.3.10. Precision-recall and size reduction for landmark method combinations in the NanoXML “Print XML data as it is parsed” scenario

Combination (k) is the result obtained by the methods that are specific to this scenario, and (l) contains the two methods that are invoked when an XML element and attribute are encountered. Method 4 is key to invoking the methods that are responsible for parsing the data and methods 5-8 are responsible for interpreting and acting on that data. Combination (m) illustrates the result of aggregating them. The results for this scenario are discussed in section 5.3.2.3.
5.3.2.2. “Construct a data object tree as data is parsed before pretty-printing it” scenario. In this scenario a complete object tree is constructed before the data is pretty-printed to the screen, using the customised NanoXML writer. This differs from the previous scenario in two ways: it does not stream the data directly to the printer and it uses a customised XML writer (XMLWriter) as opposed to the standard System.out.println method. The previous scenario did not require the construction of an object tree, so the tree builder was customised to stream the data to output instead of using it to construct the data object tree. In this scenario the standard tree builder (StdXMLBuilder) is used. The two classes XMLWriter and StdXMLBuilder provide a set of methods, as defined in their interfaces and documented in the JavaDocs, that are particularly important to the execution of this scenario and therefore provide a useful set of landmark methods:
• DumpXML.main(String[]): As was the case with the previous scenario, this method also does not belong to NanoXML itself. Its source code is also available in appendix C.
(9) StdXMLBuilder.StdXMLBuilder(): This instantiates an object tree builder that is the standard implementation of the IXMLBuilder interface. Unlike the MyBuilder class used in the previous scenario, this class actually constructs a tree from the XML data as opposed to simply printing it out.
(10) StdXMLBuilder.startBuilding(String,int): This method is called before the parser starts processing its input.
(11) StdXMLBuilder.startElement(String,String,String,String,int): This method is called every time a new XML element is encountered.
(12) StdXMLBuilder.addAttribute(String,String,String,String,String): This method is called every time an attribute of an XML element is encountered.
(13) XMLWriter.XMLWriter(OutputStream): This method instantiates an XMLWriter, which is responsible for pretty-printing XML data. There are two possible constructors, one of which takes an argument of type Writer, and the other of which takes an input of type OutputStream. This scenario assumes that data is still written to the standard System.out output stream.
(14) XMLWriter.writeElement(IXMLElement): This method is responsible for pretty-printing an XML element. Multiple write methods overload this one, providing options for inserting additional spaces and indents etc. This scenario uses the standard output.
The reasons for choosing particular combinations are largely similar to the previous scenario. Combinations (a) - (j) represent individual landmark methods. Combination (k) contains all of them, (l) contains those that are general enough to apply to both scenarios, (m) contains those that are specific to this scenario, (n) combines only those responsible for building the object tree and (o) combines those that are responsible for pretty-printing the output.
5.3.2.3. Results for NanoXML scenarios. The box plots in figure 5.3.13 summarise the results obtained for both scenarios. Plots corresponding to the first scenario are on the left and those for the second are on the right.
Combination   Methods                     Precision   Recall   Edges   Nodes   % reduction of edges   % reduction of nodes
(a)           1                           0.24        0.10     41      32      88.3                    81.7
(b)           2                           0.90        0.09     10      10      97.2                    94.3
(c)           3                           0.38        0.21     52      42      85.1                    75.9
(d)           4                           0.25        0.86     329     161     5.8                     7.5
(e)           9                           0.45        0.05     11      9       96.9                    94.9
(f)           10                          0.43        0.25     56      46      84                      73.6
(g)           11                          0.47        0.40     83      65      76.2                    81.4
(h)           12                          0.47        0.44     91      74      74                      57.5
(i)           13                          1           0.02     2       3       99.4                    98.3
(j)           14                          0.27        0.96     320     174     8.3                     0
(k)           1,2,3,4,9,10,11,12,13,14    0.27        0.96     320     174     8.3                     0
(l)           1,2,3,4                     0.25        0.98     329     161     5.7                     7.5
(m)           9,10,11,12,13,14            0.22        1        349     174     0                       0
(n)           9,10,11,12                  0.47        0.49     103     82      70.5                    52.9
(o)           13,14                       0.22        1        349     174     0                       0

FIGURE 5.3.11. Summary of results for NanoXML “Construct a data object tree as data is parsed before pretty-printing it” scenario
FIGURE 5.3.12. Precision-recall and size reduction for landmark method combinations in the NanoXML “Construct a data object tree as data is parsed before pretty-printing it” scenario
Twenty of the landmark combinations result in a substantial reduction in the size of the call graph. The other eight result in an almost negligible reduction (less than a 9% reduction of edges and nodes). As expected this minimal reduction usually results in a high recall and low precision. In several cases, the same methods caused high recall and low precision in both scenarios. The parse method by itself is a particularly ineffective landmark method, leading to a less than 8% edge reduction in both scenarios. This was initially surprising, because it is the method that contains the key functionality of the system as a whole (parsing the XML). In hindsight this could be precisely the reason why it is also unsuitable as a landmark method for either of the two scenarios; it is not specific to the functionality of either of them. In terms of its position in the call graph, it has relatively few predecessors (meaning that the hammock graph will eliminate very few nodes), but is succeeded by a large percentage of the call graph edges, which are all included if it is not succeeded by any further landmark methods.

FIGURE 5.3.13. Box plots comparing precision and recall from both scenarios
In the first scenario however (see figure 5.3.10) combinations (g), (h) and (i) are noteworthy because, besides substantially reducing the size of the call graph, they produce reasonable precision and recall values. All reduce the number of edges in the call graph by over 75%, yet all produce a recall of between 39% and 58%. Their precision ranges between 43% and 51%. Combinations (g) and (h) include methods 7 and 8 respectively, which are solely responsible for some aspect of the parsing tasks, but have nothing to do with the rest of the scenario (i.e. printing out the results to the screen). It is to be expected that adding further landmark methods should significantly improve the recall and perhaps sufficiently limit the resulting call graph to improve on the precision as well. Combination (i) does include all of the landmark methods but only produces a marginal improvement on combinations (g) and (h).
For most landmark combinations (in NanoXML as well as FreeMind) the deviation between the number of edges and the number of nodes removed from the call graph was minimal (10% or less). Combinations (k), (l) and (m) were a notable exception to this, where all three combinations resulted in a 79% reduction in the number of edges, but only a 31.7% reduction in the number of nodes. This is mainly due to the fact that these combinations all involve the selection of landmark methods that belong to an object within a class hierarchy. They are designed to focus on executions belonging to objects of type MyBuilder (see appendix C). MyBuilder, which inherits from the StdXMLBuilder class, overrides several methods belonging to StdXMLBuilder. Because of this, for every call to a method in StdXMLBuilder in the call graph, there is another call to the overriding method in MyBuilder. These superfluous (polymorphic) edges are removed when the inspector specifies that the methods of interest are located in MyBuilder, which is the case for method combinations (k), (l) and (m). The use of landmark methods to reduce the number of polymorphic edges is also discussed in section 5.2.5.3.
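To make the polymorphic-edge point concrete, the following fragment sketches the relationship described above. It is not the appendix C source: the parameter names, method bodies and println payloads are assumptions based on how MyBuilder is described in section 5.3.2.1, assuming (as stated here) that MyBuilder extends StdXMLBuilder.

import net.n3.nanoxml.StdXMLBuilder;

// MyBuilder specialises the standard builder: every overridden method becomes a
// potential polymorphic call target alongside its StdXMLBuilder counterpart.
public class MyBuilder extends StdXMLBuilder {

    @Override
    public void startElement(String name, String nsPrefix, String nsURI,
                             String systemID, int lineNr) {
        // Landmark 7: print the element as it is parsed (scenario 1 behaviour).
        System.out.println("element: " + name);
    }

    @Override
    public void addAttribute(String key, String nsPrefix, String nsURI,
                             String value, String type) {
        // Landmark 8: print each attribute as it is encountered.
        System.out.println("  attribute: " + key + "=" + value);
    }
}

With both StdXMLBuilder and MyBuilder on the class path, a conservative call graph must include an edge to each implementation at every polymorphic call site; selecting MyBuilder's methods as landmarks is what allows the StdXMLBuilder edges to be discarded.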
The trade-off between precision and recall that was discussed in the results of the FreeMind case study can again be identified in some of the NanoXML results. Combinations (b) and (j) in figure 5.3.10 and combinations (d) and (i) in figure 5.3.12 are prominent examples. This trade-off is however not as pronounced as it was in the results for FreeMind and there are several cases where there is no trade-off between precision and recall. Combinations (g), (h), (k), (l) and (m) in the first scenario and (g), (h) and (n) in the second scenario are notable examples. In the first scenario all of the highlighted combinations involve either landmark method 7, 8, or both. Similarly in the second scenario, combinations (g), (h) and (n) involve either landmark method 11, 12, or both. All of these combinations in both scenarios result in a substantial reduction of the call graph (by 70% of its edges or more). This indicates that there may be a selection of key methods (such as 7 and 8 in the first scenario and 11 and 12 in the second scenario) that can substantially reduce the size of the call graph without compromising too much on the recall (usually a substantial reduction in the size of the call graph is reflected by a significant drop in recall). Methods 7 and 11 both implement the IXMLBuilder.startElement method (7 overrides 11). Methods 8 and 12 both implement the IXMLBuilder.addAttribute method (8 overrides 12). With only eight method combinations from a single system it is difficult to develop any generalisable observations. It is however apparent that selecting the interface methods of an object as landmark methods effectively pins down the runtime type of that object. In this case it led to a substantial reduction in the size of the call graph.
FIGURE 5.3.14. Log-scale graphs showing the number of edges added by individual phases in the algorithm: (a) FreeMind “add child node” use case; (b) NanoXML “Print XML data as it is parsed” scenario

5.3.3. Discussion and Conclusions. This subsection provides an overview of the results of the second study. The main points are:
(1) There is an apparent compromise between precision and recall.
(2) The accuracy of the results is largely dependent on the topology of the call graph.
(3) The results are largely in agreement with the observations from the previous study.
5.3.3.1. Compromise between Precision and Recall. The first observation that can be made in both case studies is that there is often a compromise between precision and recall. Call graphs produced for FreeMind tend to have a higher recall with lower precision, whereas those produced for NanoXML have a higher precision with lower recall. To investigate the reasons for this apparent polarisation in precision and recall, the call graphs produced by the landmark methods were analysed. Figure 5.3.14 compares the composition of one of the FreeMind use cases (chart (a)) that produced high recall and low precision against a NanoXML scenario (chart (b)) that produced a relatively high precision and lower recall. It shows the percentage of edges in the call graphs that are contributed by the three individual phases in the algorithm (computation of the hammock graphs, tracing paths from calls identified by slices and tracing paths from unconstrained landmark methods). The charts show that the number of edges that belong to the hammock graphs is consistently small (often under 1% of the number of edges in the final call graph). Lack of precision (and correspondingly high recall) tends to be a result of adding edges belonging to paths from the call sites marked by slices, or from landmark methods that are not constrained by further landmark methods; these then fan out to include the rest of the system.
5.3.3.2. Effect of Call Graph Topology. Section 5.3.3.1 observes that one of the reasons for the lack of precision in several of the results is that too many superfluous edges are added by tracing paths from calls identified by slices and by tracing paths from unconstrained landmark methods.
FIGURE 5.3.15. Tracing all paths from node c in (a) a tree-shaped and (b) a non tree-shaped graph
The consequences for precision were more pronounced in FreeMind than in NanoXML. The main reason for this is that the number of edges added by this process of tracing all paths from a given call depends entirely on the topology of the call graph. If the graph contains many methods that are used in multiple contexts (i.e. it is highly connected), there is a higher chance of including edges that are not relevant. Every time a method is encountered that is called in multiple contexts, instead of following only the calls that are relevant to the context that is of interest, our approach traces along all of the calls. Figure 5.3.15 illustrates how the topology of a call graph can affect the precision of the result. In the graphs node c represents a point from which all paths are to be included. Edges in bold would be included by our algorithm. Graph (a) is a simple tree, whereas graph (b) has an additional edge (from g to b). In graph (a) the inclusion of the edges belonging to all paths out of node c results only in the inclusion of a single branch of the tree. If the additional edge in graph (b) were irrelevant, it would have a significant effect on the accuracy of the final graph, because the algorithm would include it and all its successors (in this case resulting in the inclusion of the entire graph apart from two edges). This is the key reason why, no matter what landmark combination is selected, the use of landmark methods within highly connected call graphs tends to result in the inclusion of many superfluous edges and a correspondingly low precision.
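The path-tracing step that causes this behaviour is, in essence, a forward reachability traversal over the call graph. The following sketch is an illustrative reimplementation, not the tool's actual code (the string-keyed adjacency-list representation is an assumption); it shows why the number of edges collected is governed entirely by how strongly connected the graph is.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PathTracer {
    /**
     * Collects every edge reachable by following calls forward from 'start'.
     * In a tree-shaped graph this returns a single branch; one extra edge back
     * into a well-connected region can pull in most of the graph.
     */
    static Set<String> traceAllPaths(Map<String, List<String>> callGraph, String start) {
        Set<String> includedEdges = new HashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(start);
        while (!worklist.isEmpty()) {
            String caller = worklist.pop();
            if (!visited.add(caller)) {
                continue; // already expanded
            }
            for (String callee : callGraph.getOrDefault(caller, List.of())) {
                includedEdges.add(caller + " -> " + callee);
                worklist.push(callee);
            }
        }
        return includedEdges;
    }
}

Applied to figure 5.3.15, the traversal from c includes one branch of graph (a), but in graph (b) the extra g → b edge makes almost the whole graph reachable, which is exactly the kind of imprecision observed in the FreeMind results.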
With respect to the two scenarios analysed in section 5.3.3.1, the connectivity of the FreeMind graph is higher than that of the NanoXML one. The call graph for the NanoXML scenario has a more tree-like structure than the call graph for the FreeMind use case. The average percentage of call graph edges that can be reached from a given method in FreeMind is 5.68% (so on average, following out all of the edges from a given node in FreeMind will return 47 out of 827 edges). The average percentage of call graph edges that can be reached from a given method in NanoXML is 4.49% (so on average, following out all of the edges from a given node in NanoXML will return 15 out of 334 edges). This difference in topology between the two graphs accounts to a large extent for the superfluous edges that are included in the FreeMind case study. Because of this difference in call graph topology, a larger proportion of irrelevant edges and nodes are removed from the NanoXML call graph. In NanoXML the landmark combinations resulted in an average of 62% edge and 52% node reduction, whereas in FreeMind the landmark combinations resulted in an average of 37% edge and 34% node reduction. This overall reduction in call graph size is reflected by a corresponding increase in precision. In fact, the average level of precision is 43% for the NanoXML scenarios and only 22% for the FreeMind results.
5.3.3.3. Validating the observations from section 5.2. Several results obtained from the two case studies support the three observations that were the outcome of the study in the previous section (see section 5.2.4). This subsection takes each observation in turn and examines it with respect to the FreeMind and NanoXML case studies. The first observation was that combinations consisting of multiple landmark methods tend to result in a higher recall. With the exception of the FreeMind “Add child node” use case, combinations involving multiple landmark methods in all three other use cases substantially increased the resulting recall value. In the “New mind map” FreeMind use case the recall value increased by an average of 18%, in the first NanoXML scenario the average increase was 25%, and in the second NanoXML scenario the average increase was 55%. The second observation suggested that landmark methods can be used to reduce the number of candidate destinations for polymorphic calls. This was observed in the first NanoXML scenario, where the landmark methods belonged to the MyBuilder object. These prevented the inclusion of edges to StdXMLBuilder, which is extended by MyBuilder. The call graph size reductions for combinations (k), (l) and (m) in figure 5.3.10 clearly show a 31%
drop in the number of nodes and a 79% drop in the number of edges. Initially, without landmark methods, the ratio of edges to nodes is 2.06:1. Many (superfluous) edges are included to methods in StdXMLBuilder. After inserting the landmark methods the ratio of edges to nodes is about 1.35:1, thanks to the removal of many of these edges to StdXMLBuilder. The third observation suggests that the context in which landmark methods are executed needs to be taken into consideration when they are selected. This observation was obtained as a result of looking for the reasons for low recall in the first study and was used as guidance for the selection of landmark methods in this study. In practice it is very difficult to determine by source code analysis alone whether the selection of one method as a landmark may cause the elimination of other relevant methods. Examining the context in which it may be called turned out to be tedious, although visual inspection of the call graph (e.g. examining where landmark methods are located with respect to each other) helped a lot in this respect. Although no concrete evidence was found to support this observation, it should still be taken into account during landmark method selection.
5.3.3.4. Conclusions. An interesting conclusion that can be derived from the results in figure 5.3.14 is that, in order to increase the precision of the technique, a more sophisticated approach is needed to trace paths from calls that are highlighted by slices and from unconstrained landmark methods. Simply tracing out all possible paths from these call sites can end up including largely irrelevant sections of the call graph. One potential avenue of future research would be to use some form of program specialisation technique to ensure that only paths that are relevant to the scenario under inspection are traced from these call sites. One potential specialisation technique is program conditioning [CCL98], where statements that cannot be executed with respect to a given condition on the inputs are removed from the program. This is used as a basis for conditioned slicing (see section 3.5.2.4). For certain call graphs it is difficult to obtain a combination of landmark methods that will return precise results while restricting the size of the call graph. The “Add child node” use case (figure 5.3.6) is an example. This was also observed in the “select figure” JHotDraw use case in the study that examined the automated landmark method combinations (see figure 5.2.3). This suggests that landmark methods by themselves may not be sufficient. One possible approach to address this problem is to introduce a different type of ‘barrier’ method that ensures that all paths through that method are not included in the final result. This idea is based on Krinke’s barrier slicing (see section 3.3.9) and is elaborated in the next chapter.
This study has presumed that landmark methods are selected in a single step. It was originally suggested that this approach be used in an iterative manner, where the results are refined with each step (see figure 4.4.3). In this sense the results do not convey the extent to which precision and recall improve once the inspector has read the code and had the opportunity to refine the landmark method combinations.
5.4. Using the Reduced Call Graph to Read and Understand the Source Code
In order to be considered useful, the technique must be precise and produce a high recall. This was evaluated in the previous two sections. Its output must however also be readable; it must be possible to traverse the reduced call graph produced by the landmarks and systematically comprehend how the use case under inspection functions. Producing a comprehensive reading strategy to accompany the landmark placement technique is beyond the scope of this work. It is however important to consider some of the problems that need to be addressed in terms of reading and comprehension for this technique to be adopted in practice. To investigate how useful the source code corresponding to the reduced call graph is for comprehension and inspection, a sample use case was extracted from the hotel system. For each class that contributed to the use case, a hard copy was produced that contained a summary of the class' data members and methods at the top, followed by the source code for only those methods contained in the reduced call graph. Each method was assigned a number. At every call site corresponding to a call in the reduced call graph, a reference was inserted that contained the number of the destination method. The final document is provided in appendix D. The document was scrutinised by two individuals who are familiar with the landmark technique and the Hotel System. Based on their experience in trying to understand the source code as presented in the document, the following issues were raised:
(1) Understanding code without further tool support is problematic because it is difficult to mentally keep track of the calling context in which a method is being read. It is difficult to remember the caller of a method, so when mentally executing a chain of calls, it is difficult to ‘ascend’ back up the call chain.
(2) Although the reduced code set appears to be useful, a clear reading strategy is needed to mark out the important paths through the code.
(3) It would be useful to clearly distinguish between method calls to other methods within the same class and calls to methods belonging to other classes. This would
help the inspector to establish on a more abstract level how classes (and objects) interact with each other as the use case executes. (4) There may be multiple instances (objects) of a single class that interact with the system at run time. A reading approach is required that helps the inspector understand how these objects interact from a static perspective.
The prototype tool that was introduced in section 4.3 would be helpful with respect to issue 1 because it displays the call stack that has led to a given method and provides a ‘backwards’ button to skip back to the previous method (see figure 4.3.3). It would be useful to augment the tool with a mechanism that enables the inspector to summarise the functionality of a particular method call in a given execution context. This would significantly ease the cognitive burden of tracing and understanding the method calls as the inspector revisits the method. Issue 2 has been highlighted in previous research by Dunsmore et al. [DRW03]. Although the work presented in this thesis addresses the problem he referred to as “chunking”, which is hampered by delocalisation and unpredictability (see section 2.4), the problem of devising a suitable reading technique is an important avenue for future research. Issue 3 suggests that more information is required to show how the extracted source code governs the interactions between different objects. The necessity of this information is highlighted by Pacione et al. [PRW04], who use the following questions (amongst others) as a basis for the evaluation of their model for software visualisation (for the sake of comprehension):
• What are the collaborations between the objects involved in an interaction? • What is the control structure in an interaction? • How does the state of an object change during an interaction?
Distinguishing between calls to methods within the same class and calls to methods in other classes could easily be added to the current tool, but providing more detailed information about the nature of the interaction is less trivial and relies not only on the analysis capabilities of this tool, but also on external visualisation and comprehension strategies, as proposed by Pacione. Issue 4 needs to be taken into account by any static object-oriented comprehension approach. Different objects of the same type may have different states. This is another challenge that has to be addressed by future research into object-oriented inspections.
5.5. Tool Evaluation
Storey et al. [SFM99] propose a list of fourteen cognitive design elements that should be supported by a software comprehension tool if it is to be effective. These points are outlined below, with respect to the implementation as detailed in section 4.3. Points in which the tool performs favourably are highlighted in blue and those that are not met are highlighted in red.
Improve program comprehension
Enhance bottom-up comprehension
(1) Indicate syntactic and semantic relations between software objects: Edges contained in the reduced call graph produced by the approach link methods that may be relevant to a given execution.
(2) Reduce the effect of delocalised plans: The reduced call graph pulls together elements of code from across the system that may affect or be affected by a particular method.
(3) Provide abstraction mechanisms: Bottom-up comprehension relies on the construction of mental hierarchical abstractions from the low-level software entities. Storey et al. state that tools designed to aid software comprehension must enable the user to create and label their own abstractions as they proceed with the comprehension process. This facility is not provided by the tool. The ability to summarise multiple calls and methods as a single abstraction would certainly help during the inspection of a use case.
Enhance top-down comprehension
(4) Support goal-directed, hypothesis-driven comprehension: The hypothesis can be expressed in terms of the landmark methods that should be executed if a scenario executes.
(5) Provide an adequate overview of the system architecture at various levels of abstraction: Storey suggests that, to enable top-down comprehension, the architecture of the system should be provided at multiple levels of abstraction. Systems such as SHriMP views [SM95] use nested graphs to represent these different levels of abstraction. The aim of representing a system in terms of its call graph is to facilitate the inspection of a system in terms of its behaviour as opposed to its structure. This is why the representation of system architecture (which is primarily structural) is beyond the scope of this project.
Integrate bottom-up and top-down approaches
(6) Support the construction of multiple mental models: According to Storey a complete comprehension tool should enable the maintainer to construct multiple ‘views’ of the program, where each view shows the program from a different abstraction. This tool only uses one model (or abstraction), which is the call graph, but does enable the inspector to view it from two ‘views’: graphically and textually.
(7) Cross-reference mental models: Given that multiple abstractions of the system exist, they all need to cross-reference each other to facilitate navigation for the user. As stated for the previous point, this tool does not provide multiple abstractions and focusses entirely on the call graph.
Reduce maintainer's cognitive overhead
Facilitate navigation
(8) Provide directional navigation: ‘Directional navigation’ describes mechanisms for reading source code or documentation in the correct sequential order. As shown in figures 4.3.2 and 4.3.3, source code and call graph navigation, before or after the graph has been trimmed with landmark methods, is simply a matter of selecting methods from the list and using the ‘forward’ and ‘back’ buttons.
(9) Support arbitrary navigation: As well as providing directional navigation, a general purpose comprehension tool should also allow the user to (arbitrarily) jump from one point in a program to another. This can be achieved with the tool by opening another call graph traversal window, which provides the user with a dialogue that lets them select their starting point in the system.
Provide orientation clues
(10) Indicate the maintainer's current focus: The current focus is the source code for the method in the pane.
(11) Display the path that led to the current focus: This is represented by the call stack, shown in the lower window in figure 4.3.3.
(12) Indicate options for further exploration: The list of methods to the left of the pane indicates methods called by the current method, which can be selected for further exploration.
Reduce disorientation
(13) Reduce additional effort for user-interface adjustment: The interface provided by this tool is minimal and intuitive, only requiring that the maintainer choose the destination method from a list and navigate forwards or backwards.
(14) Provide effective presentation styles: This refers to the need for effective visual feedback from the tool. It presents the call graph and updates it every time an additional landmark method is added (see figure 4.4.3). This is particularly effective because it immediately presents the user with a visualisation of the impact that different landmark methods have on the final call graph.
The tool seems to be particularly successful with respect to the design elements aimed at reducing the maintainer's cognitive overhead, where it satisfies all seven points. The tool focusses on the targeted extraction of a particular segment of the system. Several of the points listed by Storey et al. are aimed at tools with a more general purpose. In a later paper [Sto05] she divides program comprehension tools into three distinct categories: extraction, analysis and presentation. It is presumed that the points presented above are for tools that aim to fulfill some element of functionality within all of these categories. This tool would be assigned to the “extraction” category, where the activity of program comprehension itself can be aided by some other specialised tool.
5.6. Conclusions
This chapter has presented an evaluation of the landmark method call graph reduction technique that is detailed in chapter 4. The prototype tool (see section 4.3) was used as a basis for the evaluation. Section 5.2 contains a study that is based on the Hotel and JHotDraw systems. In this study the aim was to determine the properties of useful landmark method combinations by analysing the accuracy of every possible combination of up to three landmark methods for the call graphs of two use cases per system. Section 5.3 contains a further study that aimed to establish whether, by using knowledge of the system alone (gathered from sources such as source code documentation and manuals), it is possible to select useful landmark method combinations. Section 5.4 discusses the challenge of reading through the code contained within a reduced call graph (as produced by the landmark method technique). Finally, section 5.5 evaluated the prototype tool itself with respect to a set of established generic comprehension tool guidelines.
In the first study landmark method combinations were (automatically) selected without any regard for the contents of the methods, their position in the call graph, or their role in the use case. The results were computed in terms of precision and recall by comparing the resulting call graph against graphs that had been constructed manually for each use case. By plotting the precision and recall values as coordinates on a chart it was possible to identify clusters of combinations that fared particularly well, and particularly poorly. These were analysed manually in an attempt to identify the traits that made them so (un-)suitable. This analysis resulted in three key observations that need to be taken into account in the manual selection of landmark methods. These were:
(1) Increasing the number of landmark methods tends to produce a higher recall but can compromise precision.
(2) Landmark methods can be used to reduce the number of candidate destinations for polymorphic calls.
(3) The context in which landmark methods are executed needs to be taken into consideration when they are selected.
The next study served not only to establish the feasibility of manually selecting useful landmark method combinations, but also to attempt to support the above observations. The first two observations were indeed also observed in the second study. The third observation was taken into account when selecting landmark methods, but its impact on the final recall values was too difficult to establish. Nonetheless it still serves as useful advice for the selection of landmark methods. The second study noted that it is difficult to achieve a balance between precision and recall. It established that this can depend on the topology of the call graph that is being analysed. If the graph is highly connected, the stages in the algorithm that involve tracing out all paths from a given point can end up including largely irrelevant swathes of the call graph. Importantly, the actual hammock graphs that are produced by determining direct paths between landmark methods tend to be very small and focussed. This led to the conclusion that, to become widely applicable, future work should address these latter stages and apply more suitable techniques to follow only relevant paths in the call graph. The final part consisted of evaluating the usability of the landmark method technique, both in terms of the source code it produces and the prototype tool itself. The source code was evaluated as a hard copy (see appendix D) by two participants who are familiar with software inspections
and have a broad knowledge of the software system under inspection (the Hotel System). They raised four issues that need to be taken into account when reading the source code:
(1) Understanding the code as a hard copy, without tool support, is difficult because the reader is required to mentally keep track of the calling context in which a method is read.
(2) A clear reading strategy is required to emphasise the most important paths through the code.
(3) It would be useful to distinguish between calls to methods in the same class and calls to methods belonging to other classes.
(4) A reading technique needs to be able to handle multiple instances of different objects from a static perspective.
Issues 1 and 3 can be easily addressed by making a slightly augmented version of the current implementation available. Issues 2 and 4 are however less trivial. Chapter 2 introduced some of the problems that are involved in reading through object-oriented source code. This tool certainly makes a valuable contribution, because it emphasises the parts of the call graph that are important, providing an element of guidance. The actual process involved in reading through object-oriented source code however remains an important problem. The tool evaluation was carried out with respect to a set of comprehension tool criteria that were established by Storey et al. [SFM99]. Of the fourteen points listed, this implementation satisfies ten. The four points that remain unsatisfied are those that require multiple different models of the same system, and require different levels of abstraction for the sake of comprehension. This tool operates only in terms of a single model of the system, which is the call graph. The call graph is also only represented at a single level of abstraction. It is however certainly possible that the information produced by this tool could be integrated into more comprehensive visualisation suites, such as SHriMP [SM95].
CHAPTER 6
Conclusions and Future Work
6.1. Conclusions
Software inspections are widely regarded as an effective fault detection mechanism. They are particularly useful because the inspector can make qualitative judgements about the software that cannot be achieved automatically. An effective inspection strategy, paired with a suitable testing approach, provides comprehensive fault detection and elimination, which in turn leads to safer and more trustworthy software. The now dominant object-oriented paradigm presents several challenges that have yet to be addressed by inspection research. Inter-operating mechanisms such as polymorphism, inheritance and dynamic binding make software documents, particularly the source code, difficult to read and understand from a static viewpoint. This thesis provides a tool-supported technique that aims to relieve the inspector of the cognitive burden involved in tracing through and understanding object-oriented source code by extracting those paths through the system that are relevant to the use case that is under inspection. There are several challenges that need to be addressed if the inspection of object-oriented source code is to become effective in practice. Object-oriented decomposition encourages the delocalisation of functionally related code across the system. It is however almost impossible to determine how these delocalised elements of source code interact from a static perspective. Because static analysis is sound, it is necessarily conservative, so every possible dynamic call has to be represented. There are usually too many to be feasibly taken into consideration by the inspector. This thesis presents a technique that uses only a limited amount of dynamic information, in the form of ‘landmark methods’, to determine the set of methods (and interconnecting calls) that are relevant for an inspection. The approach consists of two stages. The first stage determines the direct paths between landmark methods, by inducing hammock graphs on the call graph. Having identified these direct paths, the second stage uses code slicing to determine the other method calls that may influence the methods that are connected by the edges belonging to the direct paths.
The results of applying the technique were gathered by computing its precision and recall. This evaluation approach has been vital for obtaining feedback about the technique's performance in the two studies. Instead of simply measuring the size of the extracted segment, it crucially also indicated what was superfluous and what was missing. This facilitated the process of determining which landmark method combinations were particularly suitable and which ones were not, by plotting them on a chart (e.g. figure 5.3.6). Precision and recall data was collected from four software systems, three of which are openly available on the SourceForge software repository. From each system, two use cases were identified. NanoXML (one of the systems in the evaluation) only had a very limited set of use cases. For that reason it was decided to divide its main use case into two scenarios, and each scenario was used as a basis for evaluation instead of a separate use case. Precision and recall results were evaluated in two separate studies. The first study produced precision and recall results for every possible combination of landmark methods. By presenting the results in this way it was possible to identify those combinations that fared particularly well. These were used to identify some preliminary observations that could be of use when manually selecting landmark methods. In the second study, landmark method combinations were selected manually. Methods were selected by using only knowledge of the system architecture and documentation. Out of the two systems that featured in the evaluation the technique performed favourably on the smaller system (NanoXML), producing slightly higher precision and recall values, as well as more substantial reductions in terms of the size of the call graph. In several of the results it was observed that there seems to be a compromise between precision and recall. A given landmark method combination will often perform particularly well in terms of either precision or recall, but rarely provide satisfactory results for both. Upon closer analysis, the high recall values tend to be produced by the stages of the algorithm that involve following all of the paths in the call graph; these stages tend to return too many edges. This in turn depends on the topology of the call graph; if the call graph is highly connected this problem is exacerbated. An encouraging point that emerged from this analysis, however, is the fact that the hammock graphs themselves tend to be consistently small. It seems that the hammock graph edges provide the ‘core’ set of relevant edges. An important part of future work lies in developing a more sophisticated code extraction technique that can be used instead of simply tracing out all successor paths from the points highlighted.
6.2. Summary of Contributions
This thesis applies source code analysis techniques to facilitate the task of manually inspecting object-oriented source code. In doing so it has made the following contributions:
(1) An investigation of the issues that hamper the manual inspection of object-oriented source code, which concludes that tool support is necessary if object-oriented code inspections are to remain practical.
(2) A flexible source code analysis technique that accepts lightweight run-time information in the form of landmark methods and produces a reduced call graph for inspection.
(3) A prototype implementation of the approach that provides a basic architecture for the general approach and also demonstrates its feasibility.
(4) A novel evaluation approach that is based on open-source software systems and can be used for the evaluation of future code extraction techniques and implementations.
6.2.1. An Investigation into the Issues that Hamper the Manual Inspection of Object-Oriented Source Code. Chapter 2 introduced the problems that arise in code inspections, and in object-oriented code inspections in particular. This work is grounded in the initial research that was carried out by Dunsmore [Dun02] for his thesis on object-oriented code inspections. He identified the problem of ‘delocalisation’ that arises when functionally related source code is spread across a multitude of objects. This thesis argues that the problem of delocalisation is exacerbated by the dynamic nature of object-oriented systems. Delocalisation is accompanied by the problem of unpredictability; it is almost impossible to determine from a static perspective how delocalised elements of functionality will interact at runtime. Dunsmore's thesis argued that object-oriented systems require a different reading strategy to address these problems. This thesis augments Dunsmore's argument, stating that the extent of delocalisation and unpredictability in object-oriented systems requires tool support to complement a new reading technique.
6.2.2. A Flexible Source Code Analysis Technique. Chapter 4 introduces a new analysis technique for object-oriented systems that is designed to address the problems that are outlined in the previous subsection. Manual inspections involve the comprehension of sections of source code, and the identification of further relevant sections (Cant [CHSJ94, CJHS95] refers to this as a combined process of ‘chunking’ and ‘tracing’). The analysis technique that is introduced in chapter 4 partially automates this process. The inspector selects certain methods
that are known to be of relevance to the element of functionality under inspection (these are referred to as ‘landmark methods’), and the technique highlights paths in the call graph that are particularly relevant to these methods.
6.2.3. Architecture and Implementation. Besides introducing the analysis technique, chapter 4 also introduces an architecture that can be used for its implementation. It is based on the Model View Controller (MVC) architectural pattern [SG96] (an illustrative sketch of this decomposition follows the list):
• The model consists of the system's call graph and dependence graphs for the methods. When landmark methods are selected by the user, changes to the call graph are computed via the model.
• The view is responsible for representing the model to the user. For code inspections the two views that are considered to be most useful are a graphical representation of the call graph itself, along with a textual representation of individual methods in the call graph. The graphical view provides feedback on the effect individual landmark methods have on the size of the call graph, whereas the textual view can be used for the actual inspection.
• The controller is responsible for enabling the inspector to add or remove landmark methods and to navigate the model. To add or remove landmark methods the inspector is presented with a dialogue box containing methods that belong to the call graph. To navigate the call graph, a window is presented with the current method, along with successor methods (methods that are called by the current method). The inspector can either choose a successor method, or backtrack to the previous method.
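The following interfaces are a purely illustrative sketch of how such an MVC decomposition might be expressed; the names and signatures are assumptions introduced for exposition and are not taken from the prototype's actual source.

import java.util.Set;

// Model: holds the call graph and recomputes the reduced graph
// whenever the set of landmark methods changes.
interface CallGraphModel {
    void addLandmark(String methodSignature);
    void removeLandmark(String methodSignature);
    Set<String> reducedGraphEdges();   // edges retained for inspection
    Set<String> successorsOf(String methodSignature);
}

// View: renders the model, e.g. as a graph panel or a source-code pane.
interface CallGraphView {
    void modelChanged(CallGraphModel model);
}

// Controller: translates inspector actions (selecting landmarks,
// navigating forward/back) into model updates and view refreshes.
interface InspectionController {
    void landmarkSelected(String methodSignature);
    void navigateTo(String methodSignature);
    void navigateBack();
}

Keeping the reduction logic behind the model interface is what allows the graphical and textual views described above to stay in step as landmark methods are added or removed.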
6.2.4. Evaluation. The evaluation consists of two studies, which aim to establish the properties of useful landmark method combinations, as well as an evaluation of the prototype implementation of the tool itself. The results of the studies are primarily based on the precision and recall of landmark combinations. Precision and recall have only rarely been used to evaluate code extraction techniques. A possible reason for this is that it is extremely labour intensive, requiring the manual identification of the ‘relevant’ material. This evaluation demonstrates that it is (a) feasible and (b) helps to gain insights into the results. The first study computed the results for every possible combination of up to three landmark methods for the Hotel System and JHotDraw. By plotting the precision and recall results on a chart, it was possible to identify those combinations that fared particularly well and those that
produced poor results. These combinations were used as a basis for a set of observations aimed at guiding the inspector in selecting suitable method combinations. An analysis of the results from the first study resulted in the following three observations:
(1) Increasing the number of landmark methods tends to produce a higher recall but can compromise precision.
(2) Landmark methods can be used to reduce the number of candidate destinations for polymorphic calls.
(3) The context in which landmark methods are executed needs to be taken into consideration when they are selected.
The following study was carried out on the FreeMind and NanoXML systems and investigated the manual selection of landmark methods, bearing in mind the observations that had been made in the previous study. It produced the following conclusions:
(1) There exists a compromise between precision and recall.
(2) The precision of the technique is partially dependent on the topology of the call graph.
(3) Imprecision is mainly due to the stages in the algorithm that involve tracing along all possible paths from a particular point in the call graph (as a result of identifying further call sites by slicing, or identifying successors in the call graph if a landmark method is not succeeded by a further landmark).
(4) The technique can be used to remove superfluous polymorphic call graph edges.
6.3. Future Work
6.3.1. Landmine Methods. One of the conclusions of the evaluation was that, although the hammock graphs themselves are small, the paths that are traced from the final landmark methods and from call sites highlighted by the slices tend to return too many superfluous call edges. This often results in low values of precision. One potential approach to constraining the number of superfluous edges is to introduce an additional form of input that can be used by the inspector. A landmine method could be used as the antithesis of the landmark method; where a landmark method is expected to be crucial to the execution of a feature, a landmine method is known never to be executed with respect to that feature. The idea for a landmine method is based primarily on a similar technique proposed by Krinke [Kri04] to limit the size of static slices. He suggests the introduction of ‘barriers’, by tagging
nodes or edges in the dependence graph that are not to be traversed when the slice is computed. Landmine methods would implement a similar idea on the call graph, but could be used as a basis for making more useful inferences about what is (ir-)relevant in the call graph (i.e. if we know that method x is definitely not executed, we can conclude that method y becomes unreachable etc.).
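A minimal sketch of how landmine methods might constrain the path-tracing phase is shown below. Since this is speculative future work, the code is purely illustrative: it simply extends the kind of forward traversal discussed in section 5.3.3.2 so that edges into a landmine method, and anything reachable only through it, are never included.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LandmineAwareTracer {
    /**
     * Forward traversal from 'start' that refuses to cross landmine methods:
     * methods the inspector asserts are never executed for the feature.
     */
    static Set<String> trace(Map<String, List<String>> callGraph,
                             String start, Set<String> landmines) {
        Set<String> includedEdges = new HashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(start);
        while (!worklist.isEmpty()) {
            String caller = worklist.pop();
            if (landmines.contains(caller) || !visited.add(caller)) {
                continue; // never expand a landmine, or a method seen before
            }
            for (String callee : callGraph.getOrDefault(caller, List.of())) {
                if (landmines.contains(callee)) {
                    continue; // drop the edge entirely: the callee cannot execute
                }
                includedEdges.add(caller + " -> " + callee);
                worklist.push(callee);
            }
        }
        return includedEdges;
    }
}

More useful inferences of the kind suggested above (e.g. concluding that a method becomes unreachable once a landmine is placed on its only caller) would require propagating the landmine information backwards through the graph as well.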
6.3.2. Using Visualisation to Guide the Iterative Selection of Landmark (and Landmine) Methods. During the second evaluation experiment it became evident that visualising the call graph as landmark methods are selected can provide the inspector with useful feedback. As the graph is updated, the inspector can judge whether particular landmark methods effectively reduce the size of the call graph. One useful extension would be to add visualisations for elements such as polymorphic call sites, to guide landmark method selection. It would also be useful to annotate methods in the call graph with other relevant information (e.g. if a node has a large number of successor edges in the call graph, the inspector should be notified that further landmark methods will be required to produce a manageable result).
6.3.3. Understanding Extracted Code. Although this thesis presents an approach for the isolation of source code that is relevant to a particular feature, it does not provide guidance on how to read the extracted code systematically. Several generic reading techniques already exist for object-oriented software that could potentially be applied to the source code identified by the landmark method approach. Dunsmore et al. [DRW03] propose a systematic reading strategy for object-oriented code, which aims to facilitate the manual resolution of dependencies between delocalised source code elements. Since the landmark method approach provides an automated means of determining these dependencies, it seems reasonable to suggest that the two could be used in tandem. This can, however, only be confirmed by further research.
6.3.4. Evaluating Application-Specific Code Extraction Techniques. The literature review in chapter 3 shows that a large number of code extraction techniques exist. It is often the case, particularly with slicing techniques, that they are evaluated in terms of the amount of information they return (e.g. as a percentage of the lines of code of the program) and their computational efficiency [BH03]. Although sufficient for investigating the performance of code extraction techniques in isolation, such measurements provide little insight into their performance with respect to a specific application.
This thesis has presented an approach to measuring the relevance of the returned code with respect to a particular application. By manually determining what code should be returned by the technique, precision and recall can be used as metrics to determine whether the technique is accurate and returns useful results. In light of the fact that (especially static) slicing approaches return a potentially large number of irrelevant code statements, it would make sense to use precision and recall to provide qualitative information about the approach, to complement the quantitative information provided by the size of the returned code segment(s). 6.4. Concluding Remarks The emergence of object-oriented programming presents a challenge for software inspectors. Object-oriented decomposition encourages the distribution of functionally related source code across the system. Paradigm features such as polymorphism and dynamic binding make it virtually impossible to determine how this delocalised source code operates when the system is executed. This thesis provides a novel approach to isolating source code for inspections. By extracting a subgraph of the call graph, it not only shows which methods are relevant, but also how each method is invoked. This makes it easier to read methods in the context of their execution, which is crucial to understanding object-oriented code from the behavioural perspective. The contributions of this thesis are timely. A large number of recent projects focus on the development of so-called 'recommender' systems, which aim to focus the developer's attention on the parts of the system that are relevant to the task at hand (these were covered in section 2.5.3). Besides producing another recommender system, one specifically for code inspections, a novel evaluation approach has been proposed which provides qualitative as well as quantitative feedback. It is apparent that, in order to make object-oriented code inspections as efficient as possible, further research into tool-supported strategies is required. This thesis provides a useful basis for the development of new techniques and identifies a number of potential future research avenues. With the continuous development of new code analysis and extraction techniques, as well as increasingly powerful code analysis frameworks, the future looks promising for tool-supported code inspections.
APPENDIX A
The Java System Dependence Graph A.1. Introduction Analysing and representing program code in terms of its internal dependencies is useful for a variety of software engineering applications, including operations such as slicing and the computation of program metrics. The program dependence graph makes these dependencies explicit: vertices represent program elements and edges represent the dependencies between them [OO84]. There have been several approaches to building such graphs for different programming paradigms and languages. This appendix provides a brief overview of the Java System Dependence Graph (JSysDG), which draws together the benefits of previous object-oriented dependence graph representations. Construction of the graph is presented from a practical perspective and an example is provided which demonstrates that the approach is viable. Although dependence analysis is an established area, the JSysDG enables static analysis to be carried out on a graph that produces more accurate results than other static Java dependence graphs, because it can represent abstract classes as well as interfaces and it can distinguish individual data members in parameter objects. The following sections provide details on each type of node in the graph, providing examples where appropriate. An example of a graph for a full program is provided in section A.5. A.2. The Java System Dependence Graph A dependence graph represents statements as vertices and dependencies between them as edges. Ottenstein and Ottenstein first suggested that dependence graphs could be used for software engineering operations in 1984 [OO84]. They proposed a graph capable of representing a program consisting of a single block of sequentially executed code. To enable the application of these operations to multi-procedural programs, Horwitz et al. introduced the System Dependence Graph [HRB90], which represents every procedure as an individual dependence graph. The procedure dependence graphs are linked to a central dependence graph, which represents the main program.
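To make the graph terminology concrete, the following sketch shows one possible, deliberately simplified, representation of a dependence graph together with a backward slice computed as a reverse reachability traversal. It is an illustration only: a real system dependence graph distinguishes several edge kinds (control, data, call, parameter and summary edges) and applies the two-phase, context-sensitive traversal of Horwitz et al. [HRB90], none of which is modelled here.

    import java.util.*;

    public class DependenceGraph {

        // Maps each vertex (a statement or predicate) to the vertices it depends on.
        private final Map<String, Set<String>> dependsOn = new HashMap<>();

        void addDependence(String from, String to) {
            dependsOn.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        }

        /** Naive backward slice: every vertex the criterion transitively depends on. */
        Set<String> backwardSlice(String criterion) {
            Set<String> slice = new HashSet<>();
            Deque<String> worklist = new ArrayDeque<>();
            worklist.push(criterion);
            while (!worklist.isEmpty()) {
                String v = worklist.pop();
                if (slice.add(v)) {
                    worklist.addAll(dependsOn.getOrDefault(v, Collections.emptySet()));
                }
            }
            return slice;
        }
    }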
There have been several proposed modifications to the system dependence graph that attempt to enable the representation of object-oriented programs. Such approaches must be able to cope with properties such as polymorphism, dynamic binding and inheritance. Larsen and Harrold proposed a graph capable of representing these features for C++ programs [LH96]. This was modified by Kovács et al. and Zhao¹ to enable the representation of Java-specific features such as interfaces, packages and single inheritance [KMG96, Zha98]. Liang and Harrold suggest a more accurate means of representing calls to polymorphic objects and also provide a representation for polymorphic parameter objects which distinguishes individual data members [LH98]. The JSysDG integrates the Java-specific dependence graph features presented by Kovács et al. and Zhao with the polymorphic object representations proposed by Liang and Harrold. For the construction of the JSysDG, statements are categorised according to whether they contribute to the structure of a program (i.e. they are headers representing methods, classes, interfaces and packages) or to the program's behaviour (i.e. they belong to a method body). Each category is represented differently on the graph. When these different representations are combined, they provide a graph-based program representation. Because the various types of dependencies between code elements are made explicit, analysis of the program can be simplified to a graph traversal algorithm or a slice. A conventional slice, as defined by Horwitz et al., consists of all statements and predicates of the program that might affect the value of a variable x at a program point p [HRB90]. The object-oriented representations proposed by Larsen and Harrold [LH96] and Liang and Harrold [LH98] generate the dependence graph from C++. Several of the differences between C++ and Java require different edges or nodes in the graph. As an example, Kovács et al. note that single inheritance in Java means that a subclass does not need to be explicitly linked to the methods it inherits, because they can be efficiently computed by traversing the inheritance tree (inheritance in single-inheritance tree structures can be computed in linear time) [KMG96]. Construction of the JSysDG relies on the fact that it is possible to perform some preliminary control, data and call flow analysis on a given Java program, in order to build a skeletal version of the graph. Once this framework is established, other nodes relating to the program structure (e.g. method and class vertices) are added. The accuracy of any traversal algorithm which operates on the JSysDG (e.g. a slicing algorithm) depends on the accuracy of the flow analysis performed in this preprocessing stage. ¹ The abbreviation 'JSysDG' was chosen in order to avert confusion between this dependence graph construction
approach and Zhao's JSDG.
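As an illustration of the point made by Kovács et al., the following sketch resolves an inherited method by walking up a single-inheritance chain; because each class has at most one superclass, the walk is linear in the depth of the hierarchy, so no explicit edges from a subclass to its inherited methods are needed. The class model used here is hypothetical and far simpler than the vertices actually used in the JSysDG.

    import java.util.*;

    public class InheritanceLookup {

        /** Minimal class model: declared method names plus an optional superclass. */
        static class ClassInfo {
            final Set<String> declaredMethods;
            final ClassInfo superClass;          // null for java.lang.Object
            ClassInfo(Set<String> declaredMethods, ClassInfo superClass) {
                this.declaredMethods = declaredMethods;
                this.superClass = superClass;
            }
        }

        /** Finds the class that provides 'method' for 'start', or null if it is undeclared. */
        static ClassInfo resolve(ClassInfo start, String method) {
            for (ClassInfo c = start; c != null; c = c.superClass) {
                if (c.declaredMethods.contains(method)) {
                    return c;                    // the nearest declaration wins
                }
            }
            return null;
        }
    }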
[Figure A.3.1 summarises the graph as four layers: (4) the Java System Dependence Graph (JSysDG); (3) Package Dependence Graphs (PaDG); (2) Class, Interface and Method Dependence Graphs (ClDG, InDG, MDG); (1) statement and variable vertices, including formal-in, formal-out, actual-in and actual-out vertices.]
FIGURE A.3.1. Layered Architecture of Graph A.3. Representing the Graph as a Layered Architecture A layered architecture is organised hierarchically, where each layer is visible only to the layers that are directly adjacent [SG96]. There are two reasons for interpreting the graph as a layered architecture. The principal reason is that it is a very complex construct composed of a large number of different types of nodes and edges; layered architectures allow a complex system to be simplified over several levels of abstraction. For example, if we are only interested in the methods of a class, we concentrate on the Method Dependence Graph layer, which represents methods as single nodes. If we are interested in the relationships between its statements, we can drill down to the statement layer. This section introduces the contents of the four individual layers as they are presented in figure A.3.1. We ascend the hierarchy, starting with the statement layer. Statement Layer (1). This layer contains vertices and edges which contribute to the behaviour or structure of a method. A statement could be the simple assignment of a value to a variable or a call to another method. Statements are connected to each other via control and data dependencies². Figure A.3.2 illustrates a simple for loop. The statement result = result + d is control dependent on the for statement (control dependencies are shown in bold). It is data dependent on the statement initialising result. A further data dependence exists from result = result + d to return result. ² A control dependence A →c B exists if the execution of a statement B relies on the execution of a predicate statement A. A data dependence A →d B exists if the execution of a statement B references a variable which is defined or modified in a statement A.
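For reference, the statements of figure A.3.2 can be written out as the following fragment. The enclosing method and the loop bound are not fully legible in the figure, so the surrounding multiply method and the condition i < c (suggesting multiplication by repeated addition) are assumptions; the comments restate the dependencies discussed above.

    static int multiply(int c, int d) {
        int result = 0;                  // defines result
        for (int i = 0; i < c; i++) {    // loop predicate; the bound 'i < c' is an assumption
            result = result + d;         // control dependent on the for predicate;
                                         // data dependent on 'int result = 0' and on its own previous value
        }
        return result;                   // data dependent on 'result = result + d'
    }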
[Figure A.3.2: the simple for loop discussed above (int result = 0; a for loop whose body executes result = result + d; return result), with control dependencies shown in bold and data dependencies between the statements.]
[Figure: dependence graph of an example program comprising a Calculator interface and the classes SimpleCalc and AdvancedCalc, annotated with numbered statement (S), call (C) and entry (E) vertices.]