SOFTWARE REFACTORING APPLIED TO C PROGRAMMING LANGUAGE
BY ALEJANDRA GARRIDO Licenciada, Universidad Nacional de La Plata, 1997
THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2000
Urbana, Illinois
To Federico, my beloved husband and unconditional partner.
I would like to sincerely thank my advisor, Dr. Ralph Johnson, for his constant guidance and support. He has reviewed this thesis with significant patience and care. I thank Don Roberts for his valuable help with T-gen and with how to approach refactoring. I thank my friends, both in Argentina and here in Urbana, who constantly give me strength and enthusiasm. My parents deserve plenty of credit for my having reached this moment. Their love, trust and encouragement stay with me wherever I go. I especially thank my husband, Federico Balaguer, with whom I have shared everything from our deep love to the technical discussions that continuously help my research. He gives me the happiness and confidence that enable me to face any obstacle. Finally, I thank God for having granted me the skills and opportunities that made this possible.
1.1 Background
1.1.1 Preventive Maintenance
1.1.2 Emergence of Refactoring
1.1.3 Refactoring Techniques
1.2 Existing Refactoring Tools
1.3 Motivation
1.4 Contributions
1.5 Organization of this Thesis
2.1 Proposal for a List of C Refactorings
2.2 Requirements in a Tool for C Refactoring
2.2.1 User Interface Requirements
2.2.2 Functionality Requirements
2.3 Reuse of the Smalltalk Refactoring Browser
2.3.1 Reusing the Design of the Code-Browsing Framework
2.3.2 Reusing the Design of the Transformation Framework
2.4 Difference between Smalltalk and C Refactoring
3.1 The Architecture of the CR Tool
3.2 The User Interface
3.3 The Program Database
3.4 Preprocessing
3.5 Parsing
3.6 Type Analysis
3.7 Implemented refactorings
3.8 Implementation concerns
4.1 The Original Code
4.2 Program Representation
4.3 Refactorings Applied
4.3.1 Renaming structure fields
4.3.2 Renaming functions
4.3.3 Renaming Variables in "rectangle.c"
4.3.4 More Variable and Structure Field Renaming in "listrectangle.c"
4.3.5 Creating New Structure from Global Variables
5.1 Results
5.2 Future Work
Figure 3.1. Program Representation Module
Figure 3.2. The CR Browser
Figure 3.3. Interaction Diagram for CRBrowser and CRenameVariableRefactoring
Figure 3.4. Hierarchy of program database elements
Figure 3.5. Structure name space
Figure 3.6. Object interaction to obtain a restricted AST
Figure 3.7. Method 'acceptVisitor' in class CRAssignmentNode
Figure 3.8. Hierarchy of CRType
Figure 3.9. Type declaration and instance diagram showing type nesting
Figure 3.10. Hierarchy of CRefactoring
Figure 3.11. Method 'preconditionsInContext' in CRenameVariableRefactoring
Figure 3.12. Method 'replacer' in CRenameVariableRefactoring
Figure 3.13. Method 'preconditionsInContext' in CRenameFieldRefactoring
Figure 3.14. Method 'replacer' in CRenameFieldRefactoring
Figure 3.15. Method 'preconditionsInContext' in CRenameFunctionRefactoring
Figure 3.16. Method 'replacer' in CRenameFunctionRefactoring
Figure 3.17. Struct declaration in CCreateStructFromVariablesRefactoring
Figure 4.1. Original code of "rectangle.c"
Figure 4.2. Original code of "listrectangle.c"
Figure 4.3. Contents of CProgramDB
Figure 4.4. User selections in order in file "rectangle.c"
Figure 4.5. User selections in order in file "listrectangle.c"
Figure 4.6. Code of "rectangle.c" after renaming struct fields
Figure 4.7. Code of "listrectangle.c" after renaming struct fields
Figure 4.8. Code of "rectangle.c" after renaming function
Figure 4.9. Code of "rectangle.c" after renaming variables
Figure 4.10. Code of "listrectangle.c" after variable and struct fields renaming
Figure 4.11. Code of "listrectangle.c" after creating new structure
Chapter 1
Software refactoring is an important part of the software development process, especially in the maintenance phase. For many years, studies have shown that the cost of software maintenance is greater than the cost of development ([Boehm 75], [Lientz & Swanson 80]). Although the problem is recognized, there are not many tools to help maintainers and programmers. A precise, comprehensive, usable, efficient and complete tool for software refactoring is much harder to build than an optimizer, because the latter does not need to be usable, comprehensive or complete. It just needs to be precise and efficient in what it does.
Software refactoring is slowly emerging as a helping hand in the maintenance phase. Whether it is called "program transformation" or "software restructuring", the concept is roughly the same: modifying software to make it easier to understand and to change [Arnold 86], provided that the modifications do not alter the external behavior of the software or its semantics; refactoring only comprises syntactic manipulation.
Syntactic manipulation of code requires two main components: a good code analyzer and a fast search-and-replace engine that can be tuned to replace exactly what the analyzer specifies (for example, it has to respect scope boundaries). The other requirements of a good tool concentrate on user needs: graphical visualization, efficiency, fine-grained as well as coarse-grained refactorings, and undo capabilities. Such a tool may take many years to build and become useful. However, we can build parts of it and combine them with other tools. Different languages need different tools, since the analyzer depends on the grammar. It may also be true that, after some of these tools are constructed, a framework for refactoring tools can arise.
The first part of this chapter presents the issues of preventive maintenance, how refactoring emerged as a solution, and the different approaches to refactoring. Then some existing tools are presented. To conclude the introduction, the chapter offers the motivation and contributions of this work.
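The remark about scope boundaries can be made concrete with a small C sketch. The names used here (`count`, `bump`, `total_of`) are hypothetical and chosen only for illustration. Two different variables share the spelling `count`; a purely textual replacement of `count` would rewrite both, whereas a scope-aware rename of the global must leave the local one alone.

```c
/* Hypothetical example: two distinct variables share the name `count`. */
int count = 0;                 /* global: the intended rename target */

void bump(void)
{
    count++;                   /* refers to the global `count` */
}

int total_of(const int *values, int n)
{
    int count = 0;             /* local: shadows the global inside this function */
    for (int i = 0; i < n; i++)
        count += values[i];    /* refers to the local and must not be renamed */
    return count;
}
```

Renaming the global to, say, `bump_count` should change only its declaration and the use inside `bump`; the three occurrences inside `total_of` belong to a different variable.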
1.1 Background
1.1.1 Preventive Maintenance
In the context of a software engineering process, maintenance is the phase that applies changes to existing software as the environment and requirements evolve. Four types of change are applied during software maintenance: correction, to fix defects; adaptation, to meet changing requirements; enhancement, to add new functionality; and prevention, to improve the flexibility of the software in anticipation of the other kinds of change. The last type gives rise to what is known as preventive maintenance or software reengineering [Pressman 97]. What it prevents is the deterioration of software over time. Even after the software has been released, the environment changes constantly and demands many modifications. Over the years, different software engineers modify the same system, and the modifications must be performed as fast as possible to maintain a good standing in a competitive market. As a result, at some point the system becomes unstable and any change can provoke serious side effects. However, if preventive maintenance is applied constantly, every change is followed by a revision and restructuring that leaves the software prepared and open to be corrected, adapted or enhanced more easily.
Preventive maintenance comprises several activities: inventory analysis, document restructuring, reverse engineering, code restructuring, data restructuring and forward engineering. A complete explanation of these activities can be found in [Pressman 97]. This dissertation focuses on the activities of code and data restructuring, which we call software refactoring. It is therefore targeted at the programmer's activity of modifying code in order to prevent the code from decaying.
Code restructuring comprises the detection of duplicate code, non-reusable code, illegible code, problems in data structures, and many other problems that a compiler cannot detect because the semantics of the program are still correct. It has a detection phase and a corresponding correction phase. The correction phase should not modify the behavior of the program; it should leave it more maintainable, reusable and understandable to a new developer.
1.1.2 Emergence of Refactoring
Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code, yet improves its internal structure [Fowler 99]. That is, while improving the code, it only implies syntactic changes. Another definition, presented by Roberts in [Roberts 99], treats refactoring as a noun, that is, not as a process but as a function: “Refactorings are behavior-preserving transformations”.
The idea of refactoring originated from restructuring object-oriented (OO) systems. The term was introduced by William Opdyke and Ralph Johnson in [Opdyke & Johnson 90].
Nowadays, the term refactoring is used for both the process and the techniques, and it is applied to languages other than OO ones. The emergence of refactoring in the OO field can be attributed to the complications that programmers usually find when trying to achieve reuse. The OO paradigm appears to be the key to reuse, but in fact a reusable OO application is only obtained after several iterations and after gaining proficient knowledge of the application domain.
Research on restructuring OO systems first focused on inheritance [Bergstein 91], [Casais 91]. However, Opdyke seems to be the first to present a wider catalog of refactorings that involved not only moving things up or down in an inheritance hierarchy but also renaming entities, splitting components, removing conditional statements, etc. [Opdyke 92]. Opdyke describes low-level refactorings and then composes them to form complex refactorings. The low-level or basic refactorings in his catalog are:
- Creating a program entity (where an entity is a class, a variable or a method)
- Deleting a program entity
- Changing a program entity
- Moving a member variable
The combination of the above refactorings leads to the following composite refactorings:
- Refactoring to generalize: creating an abstract superclass.
- Refactoring to specialize: subclassing and simplifying conditionals.
- Refactoring to capture aggregations and components.
William Griswold is another pioneer in the area of software restructuring. He has approached restructuring from a different perspective: helping the programmer to recognize and plan for restructuring. His research group is working on the Star Diagram, a graphical view of a data structure and its uses in the code [Bowdidge & Griswold 94]. The Star Diagram has been implemented for Ada, C, C++ and Java.
1.1.3 Refactoring Techniques
Opdyke was concerned with automating refactoring so as to prevent the introduction of errors during the process. He proposed and described each of the primitive refactorings and proved the preconditions that must be met to ensure that the transformation preserves program behavior. However, he did not construct a tool implementing his ideas.
Recently, Fowler has published a book on refactoring [Fowler 99]. He is concerned with the process of refactoring object-oriented systems when there may be no tools available. The book presents a catalog of refactorings, with examples in Java.
A similar approach has been under development at the University of Edinburgh, UK. In this case, Stevens and Pooley are concerned with the process of reengineering in organizations, and with a problem that poses a significant challenge to software engineers: reengineering legacy systems [Stevens & Pooley 98]. They consider those legacy systems that “are too valuable to the business to be discarded, but are too hard to change to be enhanced without restructuring.” Stevens and Pooley state in their paper what they believe is the most important problem: the difficulty of transferring software engineering expertise to those who need it. System Reengineering Patterns are proposed as descriptions of expert solutions to a common systems reengineering problem, including context, advantages and disadvantages. There are four candidate reengineering patterns in [Stevens & Pooley 98], named “Divide and Modernize”, “Externalizing an internal representation”, “Portability through backend abstraction” and “Deprecation”. As their names imply, these are very abstract patterns, and they are not concerned with automation or with implementation steps for each pattern.
Manual program manipulation can be highly error prone. Errors arise even from changes as basic as renaming a variable: although a smart search-and-replace engine such as the one in Emacs [GNU] accepts regular expressions as input, we can never write a regular expression that takes the scope of the variable into account (by the definition of regular expressions). Furthermore, finding duplicate code and converting it into calls to a newly extracted function can be really cumbersome.
A complete refactoring tool would comprise powerful visualization mechanisms for program elements and their uses, detection of duplicated code or other targets of refactoring, and automated transformations that ensure behavior preservation. Such a complete tool would be so complex that no one has attempted to build all of it. Different research projects have each implemented one of these parts, for a specific programming language. There are still no integrated tools, nor tools that can be applied to different programming languages, if such a tool is even feasible.
All transformation tools start by creating an abstract syntax tree (AST) from the program. The AST is easily and quickly constructed during parsing but can take substantial space. For this reason, only the required part of the AST is constructed. Many of the analysis functions required by the transformation tool can be generated from the AST. In some cases more complex program representations are used, such as program dependency graphs (PDGs). In [Bowdidge & Griswold 98], the authors employ PDGs to find the uses of a variable. However, PDGs are complex and require considerable time and space to build. In [Roberts 99] the author points out that ASTs contain sufficient information to implement powerful and fast refactorings, so PDGs should be avoided.
Another approach to software refactoring has been proposed at the University of Texas at Austin by Tokuda and Batory. It consists of manual manipulation of class diagrams for the C++ programming language [Tokuda & Batory 99]. Changes to the diagrams automatically trigger corresponding changes in the source code. They believe that interactive editing of class diagrams can be as revolutionary as graphical user interface editors. They classify the possible refactorings into "Schema Refactorings", "C++ Refactorings" and "Pattern Refactorings". Examples of "Schema Refactorings" are adding new instance variables and moving methods up or down in the class hierarchy. "C++ Refactorings" are those specific to upgrading structures and procedures to classes and methods, while "Pattern Refactorings" allow the application of some basic design patterns to transform the class diagram. They have a tool under development that presupposes the existence of class diagrams.
The next section outlines some of the available refactoring tools.
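As a rough illustration of the AST-based approach discussed in this section, the following C sketch shows a toy expression tree and a rename that walks it. The node layout and the function `rename_ident` are assumptions made for this example; they are not taken from any of the tools cited above, and the sketch deliberately ignores scopes. A real tool would apply such a walk only to the subtree of the scope in which the variable is declared.

```c
#include <string.h>

/* Hypothetical node kinds for a toy expression AST. */
typedef enum { NODE_IDENT, NODE_CONST, NODE_BINARY } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;          /* NODE_IDENT: identifier spelling */
    int value;                 /* NODE_CONST: literal value */
    char op;                   /* NODE_BINARY: operator character */
    struct Node *left, *right; /* NODE_BINARY: operands */
} Node;

/* Rename every identifier node spelled `old_name`.
   Returns the number of nodes that were changed. */
int rename_ident(Node *n, const char *old_name, const char *new_name)
{
    if (n == NULL)
        return 0;
    switch (n->kind) {
    case NODE_IDENT:
        if (strcmp(n->name, old_name) == 0) {
            n->name = new_name;
            return 1;
        }
        return 0;
    case NODE_BINARY:
        return rename_ident(n->left, old_name, new_name)
             + rename_ident(n->right, old_name, new_name);
    default:
        return 0;              /* constants carry no identifier */
    }
}
```

Because the rename operates on identifier nodes rather than on text, occurrences of the same letters inside string literals, comments or longer identifiers are never touched.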
1.2 Existing Refactoring Tools
This section presents some outstanding projects on refactoring. The first two, the Refactoring Browser and the Maintainer’s Assistant, are tools that can perform a set of refactorings automatically or semi-automatically (with some input from the user). The Star Diagram is a graphical visualization tool that helps to identify uses of a data structure that need refactoring; it also allows a few transformations. The next tool, Duploc, finds duplicate code and displays it in a matrix representation. The last tool presented is a commercial product that also recognizes duplications in the code.
♦ The Refactoring Browser
The Refactoring Browser is a tool implemented in VisualWorks and VisualAge for the Smalltalk language at the University of Illinois at Urbana-Champaign ([Roberts et al 97], [Roberts 99], http://st-www.cs.uiuc.edu/users/brant/Refactory/RefactoringBrowser.html). The success of the tool is mostly due to its complete integration with the Smalltalk environment and its development tools. The Refactoring Browser can be considered an extension of the Smalltalk development browser.
The Refactoring Browser operates by first parsing the code to be refactored and creating an abstract syntax tree (AST). The available transformations are encoded as templates in the form of ASTs, which may contain template variables. The transformation is accomplished by a parse tree rewriter that matches the concrete AST against a template AST and performs tree manipulation.
The Refactoring Browser implements the preconditions proposed by Opdyke, and it also uses postconditions, which were proposed by Roberts [Roberts 99]. Postconditions help to eliminate some of the analysis needed to prove preconditions inside composite refactorings. Preconditions are implemented as instances of class Condition that are created and evaluated before applying a transformation. When evaluated, a condition checks certain information from the Smalltalk environment. Another component of this framework is a change manager, which is responsible for recording which refactorings are performed. This allows for the implementation of undo and logging. Some of the available refactorings are:
- Extract Method
- Extract Expression to Temporary
- Inline Method / Parameter / Temporary
- Move Method
- Protect Instance Variable
- Pull Up / Push Down Variable
- Push Up / Push Down Method
- Remove Class / Variable / Method / Parameter
- Rename Class / Variable / Method / Temporary
- Temporary to Instance Variable
The seamless integration of the Refactoring Browser with the Smalltalk environment has only one drawback: reusing it for other environments or programming languages is complicated. This is not only a problem of the interface with the environment and the user, but also a problem of coupling between the transformation subsystem and the nodes of the AST that the tool generates. The next chapter describes the design of the Refactoring Browser in detail and how we can still apply the main ideas behind the design to construct a refactoring tool for C.
♦ The Maintainer’s Assistant
The Maintainer’s Assistant is a tool developed by Tim Bull at the University of Durham, UK [Bull 94] (see http://www.dur.ac.uk/CSM/projects/ma/). The tool is part of the BYLANDS project at the same university. The project concentrates on reverse engineering of existing code using formally proven, semantics-preserving program transformations on a Wide Spectrum Language (WSL), and on theoretical work on the analysis of real-time programs (http://www.dur.ac.uk/~dcslejy/Bylands).
WSL was originally designed to simplify proofs of program equivalence. It rests on a sound mathematical basis of set theory and first-order infinitary logic, and every transformation has been rigorously proved. The use of infinitary logic eliminates the need to determine loop invariants or fixed points of functionals when transforming loops.
Using the Maintainer’s Assistant, the program code is first translated into WSL. An automatic translator is provided for IBM 370 assembler. Once the code is in WSL, the user can interactively apply transformations to the code or to the WSL assertions. The transformations are written in MetaWSL, an extension of WSL that incorporates pattern matching, template-filling functions, statements for moving within the AST, etc. The user can select one of the available transformations or write his or her own using MetaWSL. The available transformations can be categorized as:
- Introducing, removing and manipulating assertions
- Reordering conditionals
- Removing conditionals that follow an assertion
- Merging/splitting assignments
- Inserting/eliminating assignments
- Removing unused local variables
- Expanding & factoring
- Unrolling & rolling loops
- Merging loops
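To give a feel for one of the categories listed above, "Merging loops", here is a minimal sketch written in C rather than WSL. It is an illustration only and is not taken from the Maintainer's Assistant; the function names are hypothetical. The merge is behavior-preserving under the stated assumptions that `n >= 1` and that `sum` and `max` point to distinct objects that do not overlap the array.

```c
/* Before: two loops over the same range with independent bodies. */
void stats_separate(const int *a, int n, int *sum, int *max)
{
    *sum = 0;
    for (int i = 0; i < n; i++)
        *sum += a[i];

    *max = a[0];
    for (int i = 0; i < n; i++)
        if (a[i] > *max)
            *max = a[i];
}

/* After merging the loops: a single pass that computes the same results. */
void stats_merged(const int *a, int n, int *sum, int *max)
{
    *sum = 0;
    *max = a[0];
    for (int i = 0; i < n; i++) {
        *sum += a[i];
        if (a[i] > *max)
            *max = a[i];
    }
}
```

The two loop bodies neither read nor write each other's results, which is the kind of condition a tool has to establish before merging is safe.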
Although the Maintainer's Assistant is founded on sound theoretical work, it does not appear to have practical application to the myriad of real-world legacy systems. The program code has to be in WSL or IBM 370 assembler. Automatic translators from other programming languages into WSL and back could possibly be constructed, but the approach would still have drawbacks. First, the user needs to learn WSL and learn to manipulate program elements (such as assertions) that might be very different from the ones in the original program. Second, it is not clear how the pointer arithmetic present in conventional languages gets translated and manipulated in WSL. Finally, the transformations that apply to a program in WSL do not necessarily make the code more readable and reusable when it is translated back into the original language.
♦ Star Diagram
The Star Diagram is a visualization tool developed as one of the projects of the Software Evolution Lab at the University of California at San Diego, under the direction of William Griswold (see http://www-cse.ucsd.edu/users/wgg/stardiagram.html). The Star Diagram shows, in a tree-like representation, all the computations that refer directly or indirectly to a variable or data structure [Griswold et al 96]. It implements what is known as program slicing [Weiser 84]. The tool is therefore very helpful for recognizing data coupling, the interrelation of code fragments through shared variables, and the impact of a change to a data structure on the application. The diagrams for a program are generated from the abstract syntax tree and program dependency graphs [Bowdidge & Griswold 98]. Although the tool allows the visualization of slices to be simplified, large program fragments can result in very complex diagrams.
The tool also allows restructuring of programs to support data encapsulation. The transformations appropriate for data encapsulation include "Extract Function", "Inline Function", "Extract Parameter", "Inline Parameter", "Rename Function", "Move into Interface" and "Remove from Interface". The visualization provided by the Star Diagram is targeted at supporting the specific task of data encapsulation. Therefore, other kinds of transformations would require the assistance of other views. The problem is that creating specialized graphical views requires so much effort that it calls their worth into question. The user is then required to fall back on restructuring manually via the text view [Bowdidge & Griswold 98].
♦ Duploc
Duploc is a visualization tool developed at the University of Berne, Switzerland [Rieger & Ducasse 98] (see http://www.iam.unibe.ch/~rieger/duploc/). Duploc reads in a C++, C or Java source code file and detects duplicated code in it by simple line-by-line string matching. It displays the duplicated code in a two-dimensional matrix, with a dot in the matrix for each line that matches. The user can select a dot and choose to see the actual matching code in a two-pane window, where the duplication appears in red. The tool does not apply refactorings, but it is helpful for visualizing sequences of duplicated code, which are signaled by different configurations of dots in the matrix. A disadvantage of the tool is that it only finds exact matches.
♦ Clone Doctor
CloneDR™ is a commercial product sold by Semantic Designs, Inc. The tool automatically locates exact and near-miss duplicate code in applications written in C, C++, Java or COBOL (see http://www.semesigns.com/Products/Clone/index.html). CloneDR differs from Duploc in its detection technique, because it uses parsing and ASTs instead of string matching. It therefore claims to find duplicates in spite of changed comments, white space, line breaks, or different variable names. The tool can also remove duplicates automatically or interactively. Automatic replacement uses preprocessor macros or subroutine calls generated from the detected duplicates. CloneDR is part of another product called The Design Maintenance System™, which supports the construction and maintenance of systems by capturing specifications and designs.
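The line-by-line matching described for Duploc can be sketched in a few lines of C. This is a simplified illustration under our own assumptions, not Duploc's actual implementation: the function name `match_matrix` and the row-major `dots` array are hypothetical, and the input is assumed to be already split into lines.

```c
#include <string.h>

/* Fill `dots` (rows x cols, row-major) with 1 where line i of `a`
   is exactly equal to line j of `b`, and 0 elsewhere.
   Returns the number of matching pairs. */
int match_matrix(const char *const a[], int rows,
                 const char *const b[], int cols,
                 unsigned char *dots)
{
    int matches = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            dots[i * cols + j] = (strcmp(a[i], b[j]) == 0);
            matches += dots[i * cols + j];
        }
    }
    return matches;
}
```

In such a matrix, consecutive matching lines show up as a diagonal run of dots. The sketch also makes the limitation noted above visible: `x = x + 1;` and `x = x+1;` differ as strings, so no dot is produced, whereas a comparison based on parsed ASTs would not be affected by white space.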
1.3 Motivation
The motivation for this work comes from a real project for a company that manufactured flight management controllers. Two years ago we were involved in refactoring the code of the Man-Machine Interface (MMI) subsystem of the Flight Management System for this company. The whole system was coded in C, with the exception of a few routines in assembler. This was of course a real-time system, where fast response was the priority, together with minimum memory usage. Many developers had added code over the years, patching and duplicating, and the code had become impossible to understand and to modify.
The company had two prospective clients whose demands could only be met with a competitively short development time. Among the new requirements, the MMI had to operate under different modes, some menus had to change and the width of the screen had to grow. The last two requirements seemed very simple. However, the code was so poorly modularized, inflexible, hard-coded and unreadable that just widening the screen required changing every file of the MMI. Constant values were hard-coded, variables were accessed globally, no data structures were defined, common functionality and utility routines were mixed with application-dependent statements, there were long sequences of "if" statements, etc. The code was working for current clients, but it was impossible to maintain.
At that point, there was no tool to help us refactor the code. After the few weeks that it took to understand the code, we wrote some Emacs scripts to modify it. The scripts could rename variables and replace constant values with constant identifiers, but we had to run the scripts interactively to make sure that a redefinition of a variable was not modified, or that a constant value really did represent the identifier we were replacing it with. The major refactoring we had to do by hand was to create data structures grouping global variables and to pass each data structure as a parameter to the functions that needed it. We also extracted many functions and replaced the duplicated code with calls to those functions.
The only tool targeted at C restructuring was the C-Star Diagram. However, it did not help: we knew that each variable was accessed globally all over the code, so a graph of the places where it was accessed was not only impossible to read, it was useless.
What we needed was a tool to perform simple refactorings such as renaming program elements, defining structures and adding parameters to functions. Finding duplicated code and extracting functions would also have been useful. In short, we needed a tool just like the Refactoring Browser, but for C code, and there was none. We needed fine-grained refactorings in an efficient and reliable tool, and we bet that many people are in the same situation. There is much legacy code written in C, on many different platforms, that needs maintenance.
That project motivated us to try to build such a tool: a tool with the same characteristics as the Refactoring Browser but outside the object-oriented arena, one that could run on different platforms and work for most C programs. Of course, just dealing with C was hard enough, given the complexity of its grammar and the side effects of pointers, so it is unrealistic to claim that the tool will work in all possible cases. The next section outlines what we could accomplish in this work.
1.4 Contributions
The contributions of this work are:
• A catalog of C refactorings. We propose a list of refactorings meaningful for C programs.
• Insights on the reuse of a refactoring tool for another language. We have implemented a tool that can perform some of the refactorings in the proposed list. The tool was implemented in VisualWorks™ reusing the design of the Smalltalk Refactoring Browser. In doing so, we gathered more experience on the components that could constitute a framework for refactoring tools.
• Implementation of a program database. Refactorings need to be fast. Repeatedly constructing abstract syntax trees for program analysis may degrade the speed of a tool. Conversely, keeping them in memory may take considerable space. We implemented a component that can gather information from the abstract syntax tree of the program and then discard the tree. In this way, the analysis of the preconditions of a refactoring can be performed faster and, when rewriting, only the abstract syntax tree for the scope involved needs to be rebuilt.
• Definition of requirements and complexities when dealing with preprocessor macros during refactoring.
1.5 Organization of this Thesis
The next chapter proposes a list of meaningful C refactorings. It then describes how performing refactorings on an object-oriented untyped language like Smalltalk differs from performing them on a procedural typed language like C, and shows what could be reused from the Smalltalk Refactoring Browser. Chapter 3 describes the implementation of the refactoring tool for C. Chapter 4 provides some examples. Chapter 5 summarizes the conclusions of this work and outlines future work.
Chapter 2
This chapter first proposes a catalog of C refactorings. Each refactoring is described at different levels of detail depending on its complexity. The second section presents the qualifying factors for a good refactoring tool for C. We found that the Refactoring Browser meets most of these qualifying factors but for Smalltalk. Then, the next section discusses how the main design ideas behind the Refactoring Browser can be reused and the consequences of doing so. These consequences are mapped in the following section to the intrinsic differences between the kind of transformations in Smalltalk and in C.
2.1 Proposal for a List of C Refactorings
The kinds of refactorings that are meaningful for C are very different from those for Smalltalk. Most of the literature about refactoring concentrates on object-oriented languages and on transformations in the inheritance hierarchy. The only non-object-oriented catalog that we found is for a formal logical language, WSL [Bull 94]. The kinds of refactorings for WSL are very different too: for example, in WSL the concept of 'variable' is different (as in any functional language), there is no concept of pointers, and the language contains logical assertions that can help to prove pre- and post-conditions of refactorings.
Before constructing a refactoring tool for C, we defined the kinds of refactorings that C programmers would consider valuable. We extracted from existing catalogs ([Opdyke 92], [Bull 94], [Fowler 99]) the ones that can apply to C and added some from our experience with C programming. For example, there is a category called "Changing a Program Entity" just like the one in [Opdyke 92]. However, the transformations inside that category apply only to the kind of program entities that a C program can have. This catalog does not intend to be complete and exact, but it can be a good starting point for future work and discussion. Moreover, it only includes transformations whose analysis can be performed in a reasonable amount of time. The following is a list of refactorings grouped in four categories related to their function and granularity:
1. Adding a Program Entity:
a) Add a variable
b) Add a parameter to a function
c) Add a typedef definition encapsulating an existing type
d) Add a field to a structure
e) Add a pointer to a variable
2. Deleting a Program Entity:
a) Delete unused variable
b) Delete unused parameter
c) Delete a function
3. Changing a Program Entity:
a) Rename variable
b) Rename constant
c) Rename user-defined type
d) Rename structure field
e) Rename function
f) Replace the type of a program entity
g) Contract variable scope
h) Extend variable scope
i) Replace value with constant
j) Replace expression with variable
k) Convert variable to pointer
l) Convert pointer to direct variable access
m) Convert global variable into parameter
n) Reorder function arguments
4. Complex refactorings:
a) Group a set of variables in a new structure
b) Extract function
c) Inline function
d) Consolidate conditional expression
e) For into while
f) While into for
g) While into do while
Each of the refactorings is explained next.
1. Adding a Program Entity: most of these are used as parts of complex refactorings.
a) Add a variable: add an unreferenced variable to a specific scope. Used by the refactoring "Group a set of variables in a new structure" to define the variable for the structure.
b) Add a parameter to a function: add a new argument to the definition of a function. This refactoring is rather complex because every call to the function has to be changed, adding the actual parameter or expression with the help of the user.
c) Add a typedef definition encapsulating an existing type: if a structure or enumeration is selected, create a typedef definition around it with a new name for the selection. Otherwise add a new typedef.
d) Add a field to a structure: add a field with a new name to an existing structure definition.
e) Add a pointer to a variable: add a declaration of a pointer variable whose pointed-to type is the type of the selected variable, and assign to the pointer the address of the selected variable.
2. Deleting a Program Entity: these are also used as parts of complex refactorings.
a) Delete unused variable: remove the declaration of a variable that is not referenced in its scope.
b) Delete unused parameter: remove a parameter from the definition of a function whose body does not use it. The actual parameter has to be removed from every call to the function.
c) Delete a function: remove a function only when it is not called.
3. Changing a Program Entity: these refactorings include renamings as the basic ones and conversions of type, scope, reference or statements as the complex ones.
a) Rename variable: change the name of a variable throughout, and only in, its scope. No other variable with the new name may exist in the same scope. The scope may be local to a function (including parameters) or compound statement, global to a function or global to a file. If the new name conflicts with the name of another variable defined in an outer scope, the user is asked whether to continue.
b) Rename constant: change the name of an identifier defined as const, in the same way as a variable is renamed.
c) Rename user-defined type: change the name of a structure, enumeration or typedef definition.
d) Rename structure field: change the name of a field in its definition and in every use of the field throughout the scope of the structure. Do not rename a field with the same identifier that belongs to another structure.
e) Rename function: change the name of a function in its definition and in every call to the function, both in the file where it is defined and in the files that include that one.
f) Replace the type of a program entity: the new type has to be previously defined and be compatible with the old type. This transformation is mostly used after a new typedef has been defined.
g) Contract variable scope: when a use of a variable is selected inside a specific scope, and only if the variable is not used anywhere else in the outer scope where it was defined, move the declaration of the variable to the inner scope.
h) Extend variable scope: move the definition of a variable to the immediately enclosing scope, only if it does not shadow an outer definition of the same variable.
i) Replace value with constant: replace every occurrence of the same value with a new or existing constant. The values reached by this refactoring are those in the same file as the one selected and in all files that include it.
j) Replace expression with variable: replace an expression with a new or existing variable. If a new variable is to be defined, its type is the type of the expression. Every occurrence of the same expression inside the same scope as the selected one is replaced.
k) Convert variable to pointer: replace the definition of a variable with a pointer to a variable of the same type. Also replace every use of the variable with a dereference of the pointer. The name of the variable does not change when it is upgraded to a pointer.
l) Convert pointer to direct variable access: replace the definition of a pointer with a variable of the pointed-to type. The name of the identifier does not change when it is converted to a directly accessed value. In the case of nested pointers, only the outermost pointer is removed.
m) Convert global variable into parameter: find all functions that access the given global variable. For each of the functions, add a formal parameter to the function definition, add the global variable as an actual parameter in each call to the function, and replace every access to the global variable inside the function with the parameter. The user may choose whether the parameter should be of pointer type.
n) Reorder function arguments: reorder the arguments in a function definition and in all calls to that function.
4. Complex refactorings: the refactorings that follow are composed of other refactorings. Their complexity is higher because they must satisfy more preconditions.
a) Group a set of variables into a new structure: legacy systems often overuse global variables, which makes a program non-reusable. Most changes, however minor, require global updates. However, programmers use global variables when they would otherwise have to pass many of them as parameters. Passing too many parameters to a function can increase the calling time. The remedy to this problem is to define data structures grouping isolated variables, and to pass a single reference to the structure as a parameter. This refactoring defines structures by grouping existing variables. The second part, converting global access into parameters passed by reference, is handled by the refactoring "Convert global variable into parameter".
b) Extract function: replace a complicated expression or statement list with a function call. As pointed out in [Fowler 99], this is a very common transformation. When a function gets too long or complicated, a good practice is to divide it into smaller fragments, turn each fragment into a function and replace the long function with a shorter one made of function calls. The extracted code is scanned and, for each reference to a local variable of the source function, a parameter is added to the extracted function. This refactoring is also commonly used to replace direct variable access with calls to accessor functions.
c) Inline function: replace the selected function call with the body of the called function. Every formal parameter and local declaration inside the called function is added as a declaration inside the calling function, changing names if necessary to avoid collisions. Then, an assignment statement is added to the beginning of the inlined code for each formal parameter, assigning it the actual parameter expression. If the called function returns a value, the code is inlined before the expression that calls the function, and every return statement in the body of the function is transformed into an assignment to a new variable. The expression that contained the call now uses the new variable.
d) Consolidate conditional expression: join adjacent cases in a switch.
e) For into while: transform a 'for' statement into a 'while' statement.
f) While into for: transform a 'while' statement into a 'for' statement.
g) While into do while: transform a 'while' statement into a 'do while' statement, reversing the condition of the while for the do while.
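As an illustration of how two catalog entries combine, the following minimal sketch shows the same computation before and after applying "Group a set of variables into a new structure" followed by "Convert global variable into parameter". All identifiers (`screen_width`, `screen_height`, `struct screen`, `cell_count_before`, `cell_count_after`) are hypothetical names chosen for this example; they are not taken from the project described in Section 1.3 or from the tool.

```c
/* Before: isolated global variables, accessed directly by the function.
 * (All names in this sketch are hypothetical.) */
int screen_width = 40;
int screen_height = 12;

int cell_count_before(void)
{
    return screen_width * screen_height;
}

/* After "Group a set of variables into a new structure":
 * the related variables become fields of one structure. */
struct screen {
    int width;
    int height;
};

/* After "Convert global variable into parameter": the function no longer
 * reads global state; it receives a single pointer to the structure. */
int cell_count_after(const struct screen *s)
{
    return s->width * s->height;
}
```

A caller that previously relied on the globals would now build one `struct screen` value and pass its address. Behavior is preserved as long as every former access to the globals goes through the same structure instance.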
2.2 Requirements in a Tool for C Refactoring
The requirements of a refactoring tool for C programs may be divided into two categories: user-interface requirements and functionality requirements. The following subsections list and explain the requirements in each category.
2.2.1 User Interface Requirements
a) File editor: the user should be able to load a file into an editor and edit the code as well as refactor it. In refactoring mode, the editor should provide a context-sensitive menu with the refactorings available for the selection.
b) Simultaneous loading of multiple files: almost all interesting C programs are divided into several files, at least two: the .h and the .c. The functions and user-defined types in one file are "imported" into other files with the #include directive. Also, variables declared in an included file can be used in the including files by declaring them with the specifier extern in front. All this implies that a modification to one file may need to be propagated to several files. It is also helpful to be able to view and modify different files at the same time. A tool cannot find all the files that include a given file unless the user specifies them. Moreover, a file usually includes library routines that should not be modified. Therefore, the user is responsible for specifying all the related files that he/she wants to see at the same time or that can be affected by a refactoring in one of them.
c) Interactive: as Roberts and Brant point out in their chapter in [Fowler 99], programmers tend to refactor their code when they can do it interactively while adding new features.
d) Search capabilities: before a refactoring is applied, programmers perform various searches to estimate the impact of a change.
e) Multiple views: users should be able to split the browser into several windows. When examining code, it is very useful to be able to look at different pieces of code at the same time.
2.2.2 Functionality Requirements
a) Behavior preservation: an intrinsic requirement of a refactoring tool is to preserve the semantics, or behavior, of the code after restructuring. A refactoring should not be able to proceed if it is unsafe with respect to behavior preservation. This requirement is essential to the usability of a refactoring tool: if refactorings introduce new bugs, programmers will not apply them. In his dissertation, Opdyke proposed preconditions based on certain program analyses to ensure that a refactoring preserves behavior. A refactoring tool should therefore be able to perform those program analyses to check the preconditions of refactorings.
b) Speed: another criterion for the usability of a refactoring tool is that it should not take too long, where "too long" depends upon the usual "waiting time" of a programmer of a particular language [Roberts 99]. C programmers may be used to a considerable compilation waiting time, but a refactoring should never take longer than performing it by hand. This criterion determines the types and complexity of the refactorings available.
c) Undo capability: this feature is very important for any exploratory system. If a refactoring cannot be undone, programmers are less willing to take the chance. Undo capabilities provide the freedom to transform the program, knowing that changes are not permanent; in essence, they provide freedom to fail. If undo is not provided, then a versioning system should be.
d) Integration in the development environment: an integrated development environment (IDE) joins the editor, compiler, linker, debugger and any other necessary tool for a specific programming language. Roberts and Brant claimed that when they integrated the Refactoring Browser directly into the Smalltalk browser (the editor), the tool became useful. The problem with this requirement is that current IDEs for C are not "open source", while the Smalltalk environment is completely open to extensions. The lack of open IDEs forces tool builders to create new environments.
2.3 Reuse of the Smalltalk Refactoring Browser
The Refactoring Browser for Smalltalk meets most of the requirements listed above that are not specific to C programs. Moreover, it is the only refactoring tool that is widely accepted and successful. Given that our objectives for a refactoring tool for C are so similar to those of the Refactoring Browser, we decided to reuse as much as possible of its design. This section outlines the main components of the Refactoring Browser as presented in [Roberts 99]. For each component, its reusability is evaluated in the context of a refactoring tool for a non-object-oriented language like C. The Refactoring Browser is divided into two subsystems: the Transformation Framework and the Code-Browsing Framework. A description of each subsystem follows.
2.3.1 Reusing the Design of the Code-Browsing Framework
The Code-Browsing Framework is closely tied to the standard look-and-feel of the Smalltalk browser. However, the main presentation ideas can be reused in other refactoring tools. The main components of this subsystem are: the refactoring browser, environments, navigators, code tools, code models and navigator states.
♦ RefactoringBrowser
This class plays the role of a Façade [Gamma et al 95] in the Refactoring Browser, holding onto the other components of the browsing framework. The look-and-feel of this browser closely follows that of the traditional Smalltalk browser: different list panes in the upper part show categories, classes in a category and methods in a class, while the lower text pane is an editor for the selected class or the selected method. C programs are not as hierarchically divided into component parts as Smalltalk programs, but they are divided into files. A C programmer is used to looking at code one file at a time, while a Smalltalk programmer is used to looking at code one method at a time. As a consequence, a C tool would have only one list pane in the upper part to select file names, and a text pane in the lower part for the code of a file.
♦ Environments
An environment, as defined by Roberts in his dissertation, is an arbitrary collection of classes and methods. Although still ignored in the Refactoring Browser, an environment is supposed to restrict the scope of a refactoring or arbitrary search. In the Smalltalk environment, the definition of restricted inner environments is necessary because everything is available at once, except for some dialects of Smalltalk that have recently introduced name spaces.
In the case of C programs, not all files are available at once. The user is commonly able to select the files that he/she needs to load, as mentioned above in the user-interface requirement of multiple files. That requirement explained that the user is responsible for specifying all the related files that he/she wants to load. By selecting the group of files to load, an environment would be implicitly created, and all searches and refactorings would span that specific environment.
♦ Navigator
A Navigator in the Refactoring Browser is a set of panes that allow the selection of categories, classes and methods for code browsing. Different subclasses allow changing the way the tool displays the elements of the environment. It is difficult or unnatural to find in C more than one way to display files. Maybe the tool could be extended to select functions inside files, and thus display the code of a specific function in the code pane. If that is the case, navigators could be used.
♦ CodeTools
A CodeTool is a view on the lower portion of the RefactoringBrowser that depends on the current selection in the upper Navigator. CodeTools can be textual or graphical, so a class hierarchy may be seen as a class diagram with the specific subclass of CodeTool.
In contrast, C files do not have more than one textual way to be seen. Graphical views could be valuable, for example to show the callers of a function, but the diagram can become so crowded that it would be better to show it in a separate window.
♦ CodeModel and NavigatorState
These two classes are used in the Refactoring Browser to determine the corresponding CodeTool for a particular selection. As CodeTools are not used for C refactoring, these classes are not used either.
2.3.2 Reusing the Design of the Transformation Framework
The design of the Transformation Framework can be easily reused by other refactoring tools. That is, a tool would have the same components: refactorings, conditions, parser, tree rewriter, formatter, and change objects. The implementation of these components, however, is not reusable, as explained below. A tool for C refactoring also needs another component that the Refactoring Browser can take for granted: the Program Database. It corresponds to the Smalltalk program database.
♦ Refactorings
Every available Smalltalk refactoring is implemented by a subclass of Refactoring. Each subclass must implement two methods: preconditions, which returns the preconditions that must be met for the refactoring to be safe, and transform, which carries out the refactoring. A refactoring tool for C should also have a class for each different refactoring, implementing the same two methods. However, the same superclass Refactoring cannot be used. The reason is that Refactoring implements many utilities and support methods that are specific to Smalltalk, or at least to an object-oriented language.
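The two-method protocol described above (check the preconditions, then transform) can be sketched in C with a structure of function pointers. This is only an illustration written in C for readability; the tool described in this thesis is implemented in Smalltalk. The names `struct scope`, `struct refactoring`, `rename_variable` and `execute`, and the idea of modeling a scope as a flat list of names, are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_VARS 8
#define MAX_NAME 32

/* Hypothetical stand-in for a program scope: the names declared in it. */
struct scope {
    char names[MAX_VARS][MAX_NAME];
    int count;
};

/* Every refactoring supplies the same two operations. */
struct refactoring {
    bool (*preconditions)(const struct scope *s, const void *args);
    void (*transform)(struct scope *s, const void *args);
};

struct rename_args {
    const char *old_name;
    const char *new_name;
};

/* Return the index of `name` in the scope, or -1 if it is not declared. */
static int find_name(const struct scope *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

/* Safe only if the old name exists, the new name is free and it fits. */
static bool rename_preconditions(const struct scope *s, const void *args)
{
    const struct rename_args *a = args;
    return find_name(s, a->old_name) >= 0
        && find_name(s, a->new_name) < 0
        && strlen(a->new_name) < MAX_NAME;
}

/* Carry out the change; assumes the preconditions were checked. */
static void rename_transform(struct scope *s, const void *args)
{
    const struct rename_args *a = args;
    strcpy(s->names[find_name(s, a->old_name)], a->new_name);
}

const struct refactoring rename_variable = {
    rename_preconditions, rename_transform
};

/* Run a refactoring only when its preconditions hold. */
bool execute(const struct refactoring *r, struct scope *s, const void *args)
{
    if (!r->preconditions(s, args))
        return false;
    r->transform(s, args);
    return true;
}
```

Keeping the precondition check separate from the transformation means an unsafe request leaves the program untouched, which is the behavior-preservation requirement of Section 2.2.2.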
♦ Conditions
There is a single Condition class. Different instances represent predicates that, when evaluated with the corresponding argument, return a boolean value that determines if the refactoring is safe. The predicates correspond to analysis functions. Most of the analysis is resolved by the program database, like whether a class defines an instance variable of a given name. They can also turn into complicated blocks. As instances of Condition are created by different class methods, and they hold onto the block that gets evaluated when the condition is checked, we could use this same class. Existing class methods could not be reused but new ones could be incorporated into a new category.
♦ The Parser
For the Refactoring Browser, its authors had to build their own Smalltalk parser, which they augmented with the ability to parse pattern variables and whose output is an abstract syntax tree. They did not use the standard Smalltalk parser for three main reasons: (1) they needed to extend the grammar with pattern variables, (2) they had to maintain information that usual parsers discard, like comments or formatting, and (3) the design of the standard parser made it very hard to extend. VisualWorks™ comes with a parcel called "DLLCC-CParsing" that includes a scanner, parser and preprocessor for a reduced syntax of C programs that can be filed in as primitives. For the same three reasons given by the authors of the Refactoring Browser, a new C parser has to be constructed that also returns the abstract syntax tree of the program.
♦ Tree Rewriter
The Tree Rewriter is the component that actually performs the refactoring, by manipulating the AST. It may be seen as the heart of the transformation framework. When a refactoring is executed, that is, when the corresponding message is sent to an instance of a Refactoring subclass, the source code involved is parsed, obtaining an AST, an instance of ParseTreeRewriter is created and it is asked to rewrite the AST. The ParseTreeRewriter is a visitor (from the Visitor pattern in [Gamma et al 95]) of parse tree nodes. It has a method specialized for each class of node. When it visits a node, it first checks if the node is a match for what it was supposed to search. Matching is evaluated by comparing two subtrees: the subtree representing the code that has to be changed and the subtree being visited. If it matches, the rewriter replaces the subtree being visited with a target subtree where the transformation has been applied. If the node does not match the source subtree, the rewriter visits its parts and then reassigns them to the parent node.
Another possible approach to performing the transformations is to first obtain all the places that use or refer to the program element being changed, and then alter only those places. However, in most cases it is faster to visit the entire AST than to calculate data flow information and apply it. Furthermore, the Tree Rewriter can be used in all cases of refactoring, even for the transformation of statements, methods or blocks.
The implementations of ParseTreeSearcher and ParseTreeRewriter cannot be used for a language other than Smalltalk because they are implemented as visitors of specific classes of node, which depend on the grammar. A different language requires new classes as searcher and rewriter.
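The match-and-replace traversal described above can be sketched in C as follows. This is a simplified illustration, not the implementation of the Refactoring Browser or of the tool presented in this thesis: it uses a single recursive function over a hypothetical three-kind node type instead of a visitor with one method per node class, and it has no pattern variables. All names (`struct node`, `make_node`, `same_tree`, `rewrite`) are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum kind { NODE_IDENT, NODE_NUMBER, NODE_BINARY };

/* Hypothetical AST node: leaves carry a name or literal in `text`,
 * binary nodes carry the operator in `text` and two children. */
struct node {
    enum kind kind;
    char text[16];
    struct node *left;
    struct node *right;
};

struct node *make_node(enum kind kind, const char *text,
                       struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    snprintf(n->text, sizeof n->text, "%s", text);
    n->left = left;
    n->right = right;
    return n;
}

struct node *copy_tree(const struct node *n)
{
    if (n == NULL)
        return NULL;
    return make_node(n->kind, n->text,
                     copy_tree(n->left), copy_tree(n->right));
}

void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* Structural comparison of two subtrees. */
bool same_tree(const struct node *a, const struct node *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return a->kind == b->kind
        && strcmp(a->text, b->text) == 0
        && same_tree(a->left, b->left)
        && same_tree(a->right, b->right);
}

/* If the visited subtree matches `source`, replace it with a copy of
 * `target`; otherwise rewrite the children and reassign them to the
 * parent. Returns the (possibly new) root of the visited subtree. */
struct node *rewrite(struct node *n, const struct node *source,
                     const struct node *target)
{
    if (n == NULL)
        return NULL;
    if (same_tree(n, source)) {
        free_tree(n);
        return copy_tree(target);
    }
    n->left = rewrite(n->left, source, target);
    n->right = rewrite(n->right, source, target);
    return n;
}
```

For instance, rewriting the tree for `(a + 1) * b` with source `a + 1` and target `t` yields the tree for `t * b`, which is the core step of "Replace expression with variable".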
♦ Formatter
The Formatter is another visitor of the abstract syntax tree that unparses it after refactoring, that is, it gets the source code back from the tree. A different Formatter is needed for a language other than Smalltalk.
♦ Changes
The class RefactoryChange is an example of the Command pattern [Gamma et al 95]: its instances commit a change into the environment, and can roll the change back, or undo the change. Committing a change means that the program database gets updated. There is one subclass of RefactoryChange for each primitive refactoring, and one representing composite refactorings that is composed of more primitive ones. Changes are logged into a RefactoringManager, which is a Singleton [Gamma et al 95] maintaining the list of all refactorings performed on an environment or in a session. As RefactoryChange is targeted to Smalltalk refactorings or transformations, a new hierarchy is needed for a different programming language.
♦ Program Database
The Smalltalk environment constantly maintains the various program entities and their relationships. The information is not maintained in a traditional database, but it is a searchable repository [Roberts & Brant 99]. Programmers can search for cross-references to any program element, mainly because any change to a Smalltalk class is immediately compiled into bytecodes and the database is updated. This program database speeds up the analysis of the preconditions of refactorings since the analysis information does not need to be recomputed every time.
When the language does not come with dynamic compilation, the program database has to be maintained explicitly. Our approach will be presented in the next chapter.
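The change objects described earlier in this section, which commit a change and can roll it back, together with a log of performed changes, can be sketched in C as follows. This is a minimal illustration under stated assumptions: it handles only a rename of a name held in a caller-supplied buffer of at least `NAME_LEN` bytes, and the names `struct rename_change`, `struct change_log`, `commit_rename` and `undo_last` are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define NAME_LEN 32
#define LOG_CAPACITY 16

/* One logged change: where the name lives and what it was before. */
struct rename_change {
    char *slot;                 /* buffer of at least NAME_LEN bytes */
    char old_name[NAME_LEN];
    char new_name[NAME_LEN];
};

/* The log of committed changes, most recent last. */
struct change_log {
    struct rename_change entries[LOG_CAPACITY];
    size_t count;
};

/* Apply a rename and record it so that it can be undone.
 * Returns 0 on success, -1 if the log is full or a name is too long. */
int commit_rename(struct change_log *log, char *slot, const char *new_name)
{
    if (log->count == LOG_CAPACITY
        || strlen(new_name) >= NAME_LEN
        || strlen(slot) >= NAME_LEN)
        return -1;
    struct rename_change *c = &log->entries[log->count++];
    c->slot = slot;
    strcpy(c->old_name, slot);
    strcpy(c->new_name, new_name);
    strcpy(slot, new_name);
    return 0;
}

/* Roll back the most recent change. Returns -1 if the log is empty. */
int undo_last(struct change_log *log)
{
    if (log->count == 0)
        return -1;
    struct rename_change *c = &log->entries[--log->count];
    strcpy(c->slot, c->old_name);
    return 0;
}
```

In a real tool, committing and undoing a change would also have to update the explicitly maintained program database, so that later precondition checks see the current state of the program.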
2.4 Difference between Smalltalk and C Refactoring
This section presents the characteristics that make Smalltalk refactoring different from C refactoring. These characteristics had to be considered to implement the tool for C while trying to reuse as much as possible from the tool for Smalltalk.
♦ Program database
As mentioned previously, the program database is implicit and omnipresent in the Smalltalk environment. The first difference with C is that we have to create this program database from the C files and update it explicitly as we transform the files. The program database should be able to respond to queries about program entities in an efficient way, that is, it should be accurate and fast, where fast usually implies small. Creating and updating this repository efficiently restricts the types of analysis, and therefore of refactorings, allowed.
♦ Scopes and includes
In Smalltalk, the scope of a variable can be: the whole environment, a set of classes, a class, all instances of a class, a method or a block. The scope is determined very easily from where the variable was declared: either as global, as class variable, as class-instance variable, as instance variable, as a temporary in the first line of a method or as a temporary at the head of a block. An important point is that the only variables that can be shadowed, i.e., whose declaration can be overlapped, are temporary variables.
In C, the scope of a variable can be: a set of files, a file, a function or a compound statement. Determining the scope of a global program element is not straightforward. For example, a variable v declared outside any function in a file f1 is visible from its declaration to the end of the file. However, if another file f2 includes f1, and declares v using the keyword extern in front, the scope of the variable is extended to f2. Moreover, in C any variable can be shadowed.
When performing a refactoring on a variable, the first thing to determine is its scope. The browser has to interact with the program database to obtain the scope. Then, only that scope has to be parsed, and the transformed code has to be reinserted where appropriate. Also, if the variable is redefined in an inner scope, the refactoring should not touch the inner scope. Only the outer definition has to change. As in Smalltalk only temporary variables can be redefined, and that is not common, it is not very important to enforce this rule, and so the Refactoring Browser in that case does change the inner definition too. In the case of C, scope rules need to be strictly enforced to ensure behavior preservation. Moreover, additional refactorings such as "Contract variable scope" or "Extend variable scope" are appropriate for C.
Another situation that may occur in C is that a variable declared in an inner scope is renamed to the name of a variable in an outer scope. The outer variable would be shadowed if the refactoring is performed, but it may not be illegal to do so. A possible strategy is to ask the user whether he/she really intends to shadow the outer declaration.
♦ Typing
An important difference between both languages is that Smalltalk is untyped, while C is statically typed. Therefore, many refactorings in C concern types, such as "Rename user-defined type", "Replace the type of a program entity" or "Rename structure field". This means that a Type Checker is another necessary component of a C refactoring tool. A related consideration is that in Smalltalk all variables are pointers to objects. In C, however, that is not the case. Thus, additional refactorings for a C program include upgrading a variable to a pointer or, vice versa, converting a pointer to a direct variable access.
♦ Defines & Macros
Smalltalk does not have preprocessor directives as C does. Preprocessor directives are used to include files, to define constant values or macros, or for conditional compilation. File inclusion affects the scope of program elements, as explained above in the scope considerations. Define directives are, however, a complicated issue in refactoring. A C compiler calls a preprocessor as the first step in the compilation process. As a result, the preprocessor outputs new source code where all preprocessor directives have been stripped out and "defines" have been replaced in the code. We decided that a refactoring tool should not strip out preprocessor directives when the files are loaded. Doing so would mean showing the programmer different source code from the one he/she would edit, causing disorientation. However, the directives should be evaluated. A major concern is whether a programmer should be allowed to refactor macros, as that can become really difficult. None of the existing tools for C or C++ that we know of deals with macros. More considerations about preprocessing are discussed in the next chapter.
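Two of the difficulties discussed in this section, shadowing and macros, can be seen in a few lines of C. The sketch below is only an illustration; the identifiers `count`, `shadow_demo`, `NEXT_COUNT` and `macro_demo` are hypothetical.

```c
/* A global variable, visible from here to the end of the file. */
int count = 10;

int shadow_demo(void)
{
    int result = count;     /* refers to the global `count` */
    {
        int count = 3;      /* inner declaration shadows the global */
        result += count;    /* refers to the inner `count` */
    }
    return result;
}

/* The macro body mentions the global by name. This reference exists
 * only in the unpreprocessed source text. */
#define NEXT_COUNT() (count + 1)

int macro_demo(void)
{
    return NEXT_COUNT();    /* expands to (count + 1) */
}
```

Renaming the global `count` must change its declaration, the first use in `shadow_demo` and the body of `NEXT_COUNT`, but must leave the inner declaration and its use untouched. A purely textual replacement would change too much, while a tool that only analyzes preprocessed code would never see the macro definition that also needs to change.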
Chapter 3
Previous chapters described refactoring, how to apply it to C, existing refactoring tools and how to reuse previous work to build a tool for C refactoring. This chapter describes a first version of such a tool. It took a long time to implement the basic components of the transformation system and only a few refactorings could be implemented. However, we believe this first version will be easily extended to other refactorings, now that we have set up the underlying architecture. Moreover, the tool will be helpful for practice and testing of its usability while it is improved. The first section of this chapter gives the picture of the whole architecture of the C Refactoring tool, or CR tool. Subsequent sections describe the implementation of its main components.
3.1 The Architecture of the CR Tool
The CR tool is composed of the following three modules:
- Code Browsing
- Program Representation
- Program Transformation
Code Browsing constitutes the user interface, from which the user selects files, edits them, and selects refactorings to apply to those files. In response to a user selection, the Code Browsing module instantiates refactoring objects, which are part of the Program Transformation module. Other parts of the Program Transformation module are the rewriter that executes the transformations and the refactoring log that allows undo. To check their preconditions and to execute, the refactorings need program analysis information, which they obtain from the Program Representation module. This last module provides scope information, abstract syntax trees for scopes, nesting of modules, type analysis and other information necessary for refactorings. Each module is further described below.
♦ Code Browsing
This module is composed of only one class, CRBrowser, which constitutes the user interface. This class is a simplified version of the Smalltalk Refactoring Browser classes RefactoringBrowser, Navigator and CodeTool. The functionality of the class CRBrowser and its interaction with other modules is described in Section 3.2.
♦ Program Representation
The program database, represented by an instance of the class CRProgramDatabase, contains all the information about the program code. This object maintains representations of program entities whose classes belong to the hierarchy of CRProgramDBElement. The program database and its contents are detailed in Section 3.3. This module can also produce abstract syntax trees from the code string. Abstract syntax trees are composed of nodes from the hierarchy of CRProgramNode. Before abstract syntax trees are built, preprocessing needs to be applied. This is explained in Section 3.4. The nodes of the trees and the process that creates them are described in Section 3.5. Other information that this module provides is type analysis. The results of type analysis are instances of subclasses of CRType. Section 3.6 explains type analysis. Figure 3.1 shows the abstract view of the Program Representation module.
[Figure 3.1: class diagram in which CRProgramDatabase is associated, with multiplicity *, with CRType, CRProgramDBElement and CRProgramNode]
Figure 3.1. Program Representation Module
♦ Program Transformation
This module is composed of the classes that represent the refactorings and carry out the transformations. Each refactoring is modeled by a subclass of
CRefactoring. Section 3.7 describes the refactorings that have been implemented in the CR tool. Subclasses of CRefactoring must implement the methods preconditions and performRefactoring. The first method returns a condition object. The second carries out the transformation.
To carry out the transformation a refactoring uses a CRTreeRewriter, a subclass of CRTreeSearcher. The CRTreeRewriter visits the nodes in the AST while replacing them. Search & replace rules are modeled in the hierarchy of CRParseTreeRule. All these classes function in the same way as their counterparts in the Refactoring Browser. The only difference is that CRTreeSearcher and CRTreeRewriter can be tuned to skip certain subtrees of the AST to avoid redefinitions of a variable.
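The idea of skipping subtrees to avoid redefinitions can be illustrated with a small sketch in C. It is not the CR tool's rewriter, which is written in Smalltalk; the node layout and the names node, block_declares and rename_in_block are assumptions made for this example. As a simplification, a nested block that redeclares the name is skipped as a whole, although in C the inner declaration only takes effect from the point where it appears.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified AST: blocks, declarations and uses. */
enum node_kind { NODE_BLOCK, NODE_DECL, NODE_USE };

struct node {
    enum node_kind kind;
    const char *name;       /* identifier, for NODE_DECL and NODE_USE */
    struct node *children;  /* first child, for NODE_BLOCK */
    struct node *next;      /* next sibling */
};

/* Returns nonzero if the block directly declares `name`. */
static int block_declares(const struct node *block, const char *name)
{
    for (const struct node *c = block->children; c != NULL; c = c->next)
        if (c->kind == NODE_DECL && strcmp(c->name, name) == 0)
            return 1;
    return 0;
}

/* Renames the declaration and uses of old_name inside `block`.
   Nested blocks that redeclare old_name are skipped, because the name
   refers to a different variable there. Returns the number of nodes
   that were renamed. */
int rename_in_block(struct node *block, const char *old_name,
                    const char *new_name)
{
    int renamed = 0;
    for (struct node *c = block->children; c != NULL; c = c->next) {
        if (c->kind == NODE_BLOCK) {
            if (!block_declares(c, old_name))
                renamed += rename_in_block(c, old_name, new_name);
        } else if (strcmp(c->name, old_name) == 0) {
            c->name = new_name;
            renamed++;
        }
    }
    return renamed;
}
```

Called on a block that declares x and contains one nested block that only uses x and another that redeclares x, the function renames the occurrences in the first nested block and leaves the second untouched.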
Another component that functions as a visitor of AST nodes is the class CRFormatter. It transforms an AST back into its source code representation.
Every change to the program is logged as an instance of a subclass of the tool's change class, and the sequence of changes is maintained by a change manager object. These two classes are very similar to their counterparts in Smalltalk, which were explained in the previous chapter.
3.2 The User Interface
The class that represents the user interface is called CRBrowser. A snapshot of the CRBrowser appears in Figure 3.2. It is not sophisticated, but it is simple and useful for trying out the refactorings that we implemented.
As the figure shows, the CRBrowser allows loading a set of C files for refactoring. This set of files delimits the scope of transformations, that is, it defines an environment as explained in the previous chapter. For that reason, the set of files involved in a refactoring has to be loaded as the first step in a session with the browser. The menu option "File → Load" opens the dialog for loading files.
After the required files are specified, the program database loads the files by parsing them and extracting information from the ASTs that it uses to populate itself. This process is further explained in Section 3.3. After loading, files are ready for refactoring. The list of filenames appears in the upper list pane. When the user selects a filename, its source code is shown in the lower text pane. From there on, the user can select parts of the code and choose a refactoring from the menu bar or the right button menu. Depending on the refactoring, the user may be asked for further
input by a dialog window. For example, in a renaming refactoring, the user is asked for the new name.
Figure 3.2. The CR Browser
When a refactoring is chosen, the CRBrowser creates an instance of the corresponding subclass of CRefactoring (see Figure 3.3). It initializes the refactoring object with the
selected filename, the selected string and the position of the selected string. Depending on the refactoring it can initialize other information, such as the new name in a rename. After that, the browser asks the refactoring object to execute. While it executes, the object will send update messages to the program database and the browser will ask the database for the new
code of the file. This interaction is shown in Figure 3.3 with an example of CRenameVariableRefactoring.
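The execution protocol described above (check the preconditions first and transform only if they hold) can be sketched in C with function pointers. This is an illustration under stated assumptions rather than the CR tool's Smalltalk code: the names refactoring, refactoring_execute, rename_state, rename_check and rename_perform are hypothetical, and the flag new_name_in_use stands in for a query to the program database.

```c
#include <assert.h>

/* Hypothetical C analogue of a refactoring object. */
struct refactoring {
    int (*check_preconditions)(void *state);  /* nonzero when it is safe */
    void (*perform)(void *state);             /* carries out the change  */
    void *state;
};

/* Runs the transformation only when the preconditions hold.
   Returns 1 if the transformation was carried out, 0 if it was refused. */
int refactoring_execute(const struct refactoring *r)
{
    if (!r->check_preconditions(r->state))
        return 0;
    r->perform(r->state);
    return 1;
}

/* Stand-in state for a rename: a real tool would ask the program
   database whether the new name is already used in the same scope. */
struct rename_state {
    int new_name_in_use;
    int applied;
};

int rename_check(void *state)
{
    return !((struct rename_state *)state)->new_name_in_use;
}

void rename_perform(void *state)
{
    ((struct rename_state *)state)->applied = 1;
}
```

Keeping the precondition check separate from the transformation means a refused refactoring leaves the program untouched, which is the behavior the browser relies on when it asks the refactoring object to execute.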
[Figure 3.3: interaction diagram. Participants: a CRBrowser, a CRenameVariableRefactoring, the CRProgramDB. Messages, in order of appearance: rename Variable; new; renameVariable:selection to:newName from:selectedFilename inInterval:selectedInterval; performRefactoring; execute; checkPreconditions; performRefactoring; changeSourceCodeFrom: anAST; updateContents; sourceCodeForFile: selectedFilename]
Figure 3.3. Interaction Diagram for CRBrowser and CRenameVariableRefactoring
Changes are not stored permanently until the user selects "File → Save". Only then the
changes are saved to disk. The user has to select exactly the program element to change, because the browser does no further processing on the selection and the refactoring object assumes that it receives a complete piece of code. By "complete" we mean the entire name to be changed or a set of entire statements. If the selection is incorrect, the refactoring will not locate the element in the program database and will notify the user. This can be considered a disadvantage, but adding more sophisticated functionality to the browser, e.g., completing a selection, would require advanced editing technology such as that in the Synthesizer Generator [Reps & Teitelbaum 89].
In the case of renaming refactorings, an advantage of the browser is that the user does not need to select the old name at the definition or declaration of the program element to rename; the user can select it at any reference to the program element. For example, for the rename variable refactoring, the user can select the name of the variable in its declaration or in any expression that uses the variable.
In Section 2.2.1, as one of the user-interface requirements for a refactoring tool, we mentioned that the user should be able to load a file into an editor and edit the code as well as refactor it. Unfortunately, allowing the browser to edit files raises a few difficulties. It is not possible to perform a refactoring on code that has not been saved and parsed. This leads to two modes of operation: editing mode and refactoring mode. That is, a user can edit the code, but has to save the edits before any refactoring can be applied. The browser keeps track of dirty files, so refactoring functions are not available in the menu until the user saves them. The Smalltalk Refactoring Browser operates in exactly the same way. The browser does not yet support multiple views, that is, the ability to be split into several windows.
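The notion of a "complete" selection of a name, discussed earlier in this section, can be made precise with a small check. The sketch below is an assumption of how such a test could look in C; the CR browser itself performs no such processing and instead relies on the refactoring failing to locate the element. The function name selects_whole_identifier and the half-open interval convention are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters that may appear inside a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns 1 when source[start..end) is exactly one whole identifier:
   it is non-empty, does not start with a digit, consists only of
   identifier characters, and is not part of a longer identifier. */
int selects_whole_identifier(const char *source, size_t start, size_t end)
{
    size_t len = strlen(source);

    if (start >= end || end > len)
        return 0;
    if (isdigit((unsigned char)source[start]))
        return 0;
    for (size_t i = start; i < end; i++)
        if (!is_ident_char(source[i]))
            return 0;
    if (start > 0 && is_ident_char(source[start - 1]))
        return 0;
    if (end < len && is_ident_char(source[end]))
        return 0;
    return 1;
}
```

For the line "int count = total;", the interval covering "count" is accepted, while an interval covering only "coun" or one that includes the preceding space is rejected.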
3.3 The Program Database
Chapter 2 expressed the need to maintain a program database as a repository for the various program entities, the containment relationships among them that determine scope, and other properties. All this information helps speed up the analysis of preconditions. The program database is modeled by the class CRProgramDatabase. It is implemented as a Singleton [Gamma et al 95] held in a global variable called CRProgramDB. The CRProgramDB is reset at the beginning of every session with the browser, that is, every time the user loads a set of files. When loading, the program database calls the preprocessor and then the parser to parse the source code of every file in the order of "includes", e.g., if f2 "includes" f1, then f1 gets parsed before f2. When the parser returns the abstract syntax tree of each file, the program database visits that AST (following the Visitor pattern [Gamma et al 95]) and populates the repository with instances of the subclasses of CRProgramDBElement. Figure 3.4 shows the hierarchy of classes for database elements. These elements are:
- program units, instances of CRFileDBElement, that contain: name of file, source code, "define" declarations, functions, variables and types global to the file;
- functions, instances of CRFunctionDBElement, that contain: name, variables and types declared inside the function, blocks or compound statements inside it, position interval with respect to the source code of the file, file container;
- blocks, instances of CRBlockDBElement, that contain: variables declared inside the block, nested blocks, position interval with respect to the source code of the file, container;
- variables, instances of CRVariableDBElement, that contain: name, locations, kind (extern, local, global, structure field or parameter) and container;
- structures, instances of CRStructDBElement, that contain: accessors, fields, locations and container;
- file sets, instances of CRFileSetDBElement, grouping files that include one another and constitute the scope of an entity. Instances of CRFileSetDBElement are not maintained in the program database but created for a refactoring.
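The loading order described above, in which an included file is parsed before the file that includes it, amounts to a depth-first traversal of the include graph. The following C sketch shows one way to compute such an order. It is an assumption for illustration only; the names source_file, visit and parse_order and the fixed-size arrays are hypothetical, and the CR tool's own loader is written in Smalltalk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILES 16

/* Hypothetical record for one loaded file. */
struct source_file {
    const char *name;
    int includes[MAX_FILES];  /* indices of directly included files */
    int include_count;
    int visited;
};

/* Depth-first visit: emit everything a file includes before the file. */
static void visit(struct source_file *files, int index,
                  int *order, int *count)
{
    if (files[index].visited)
        return;
    files[index].visited = 1;  /* marked first so include cycles terminate */
    for (int i = 0; i < files[index].include_count; i++)
        visit(files, files[index].includes[i], order, count);
    order[(*count)++] = index;
}

/* Fills `order` with file indices such that every file appears after
   the files it includes. Returns the number of indices written. */
int parse_order(struct source_file *files, int file_count, int *order)
{
    int count = 0;
    for (int i = 0; i < file_count; i++)
        files[i].visited = 0;
    for (int i = 0; i < file_count; i++)
        visit(files, i, order, &count);
    return count;
}
```

With three files where f2 includes f1 and f3 includes f2, the computed order is f1, f2, f3 regardless of the order in which the user listed them.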
[Figure 3.4: class hierarchy rooted at CRProgramDBElement, comprising CRVariableDBElement, CRFileSetDBElement, CRFileDBElement, CRModuleDBElement, CRStructDBElement, CRFunctionDBElement and CRBlockDBElement]
Figure 3.4. Hierarchy of program database elements
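The containment relationships kept in the database elements (a block knows its container, a function knows its file) are what allow a precondition to find the declaration a name refers to. A minimal C sketch of that lookup follows. The structure scope and the function find_declaring_scope are assumptions made for this example and collapse the file, function and block elements into a single type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8

/* Hypothetical scope record standing in for file, function and block
   database elements. */
struct scope {
    const char *label;              /* e.g. "file", "function", "block" */
    const char *vars[MAX_VARS];     /* names declared directly here */
    int var_count;
    const struct scope *container;  /* enclosing scope, NULL for a file */
};

/* Walks outward from `from` and returns the innermost scope that
   declares `name`, or NULL when no enclosing scope declares it. */
const struct scope *find_declaring_scope(const struct scope *from,
                                         const char *name)
{
    for (const struct scope *s = from; s != NULL; s = s->container)
        for (int i = 0; i < s->var_count; i++)
            if (strcmp(s->vars[i], name) == 0)
                return s;
    return NULL;
}
```

A rename precondition could use such a lookup to detect that a proposed new name is already declared in the same or an enclosing scope.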
It is necessary to clarify what the "accessors" in a CRStructDBElement are and how they are used. A CRStructDBElement does not represent the "struct" construct per se but the name space generated by the construct. This idea comes from [Kernighan & Ritchie 88]: "… each structure or union creates a separate name space for its members, so that the same name may appear in several different structures." The accessors for a CRStructDBElement are those type names such that a variable declared of any of these types can access the name space generated by the structure. Consider as an example the two typedef definitions in Figure 3.5. The accessor names for the name space in Figure 3.5 are: tnode, Treenode, Treeptr and Treearray. The function treeprint in the same figure accesses the structure and its fields.
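The accessor set can be pictured as plain data attached to the name space. The sketch below is a hypothetical C rendering, not the CR tool's representation: the type struct_namespace and the functions contains_name and field_belongs are assumptions. It shows the membership test that a refactoring such as "rename struct field" needs before it may touch a field reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the name space created by one struct. */
struct struct_namespace {
    const char *const *accessors;  /* type names that reach the name space */
    size_t accessor_count;
    const char *const *fields;     /* member names inside the name space */
    size_t field_count;
};

/* Returns 1 when `name` occurs in the given list of names. */
static int contains_name(const char *const *names, size_t count,
                         const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 1 when `field`, reached through a variable whose type name is
   `type_name`, belongs to this name space. */
int field_belongs(const struct struct_namespace *ns,
                  const char *type_name, const char *field)
{
    return contains_name(ns->accessors, ns->accessor_count, type_name)
        && contains_name(ns->fields, ns->field_count, field);
}
```

For the structure of Figure 3.5, a reference to left through a variable of type Treeptr passes the test, while a field called left reached through an unrelated type does not and must be left unchanged.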
The set of accessor names is used to perform a refactoring like "rename struct field". When the refactoring finds a field whose identifier matches the one being changed, it checks if the type name of the field's structure is in the set of accessor names. In the example, suppose we want to rename the field 'left'. The rewriter finds the node for the identifier
'left' inside the expression 'ptr->left'. The type name of ptr is Treeptr, which is one of the accessors of the name space being modified.

    #include <stdio.h>

    typedef struct tnode *Treeptr;
    typedef struct tnode {
        char *word;
        Treeptr left;
        Treeptr right;
    } Treenode, Treearray[10];

    void treeprint(Treeptr ptr)
    {
        Treenode node;
        Treearray array;

        if (ptr != NULL) {
            node = *ptr;
            treeprint(ptr->left);
            printf("%s", node.word);
            array[1] = node;
            treeprint(array[1].right);
        }
    }

[Figure annotations: the struct body { char *word; Treeptr left; Treeptr right; } is labelled "Struct Name space"; the legend distinguishes "name space access" from "field access inside the name space"]
Figure 3.5. Structure name space
Once all the information has been obtained from an AST, the tree is discarded. When a refactoring later checks its preconditions, it can query the CRProgramDB for information such as whether a given scope contains a particular program entity. These classes must
;