FOOD: An Intermediate Model for Automated Refactoring


SOMET 2006 - 5th International Conference on Software Methodologies, Tools and Techniques, pp 452 - 461.


Nicolas Juillerat and Béat Hirsbrunner
University of Fribourg, Department of Computer Sciences, 1700 Fribourg, Switzerland

Abstract. As large software projects grow, there is an increasing need to clean up or restructure the existing code. This problem can be addressed with refactorings, which are small semantics-preserving code transformations. Many refactorings have been automated in existing development environments to help the developer in this process. Most implementations are currently based on the abstract syntax tree. Unfortunately, this model, which was first designed for the compilation process, does not provide all the abstractions that are required for complex refactorings such as extracting a method. In this paper, the FOOD model is introduced and described. Based entirely on graphs, this model targets the implementation of complex code transformations by providing the necessary abstractions. The “extract method” refactoring is applied to the FOOD model as a concrete example. The model is then compared with existing models such as abstract syntax trees.

Keywords. Refactoring, FOOD Model, Extract Method, Programming Language, Graph

1. Introduction

Refactorings [6,10,15] are semantics-preserving transformations of source code. The idea behind transforming code without changing its semantics is to improve the structure of existing code, especially when new features have to be added, or when the code needs to be adapted to fit a new specification.

It is becoming standard for modern development environments to provide at least a minimal set of simple refactorings, such as renaming fields or methods, or moving them. But even though some tools can perform more complex transformations, many refactorings have still not been automated [5,6,7].

The first problem of automated refactoring is common to all generative and transformational techniques: the complexity of existing programming languages such as Java, C#, or C++. While these languages seem easy to use, they prove to be extremely complex as soon as we need to generate or transform a piece of code. For this reason, simpler languages or models are often chosen for such tasks. In compilation, for example, it is usually too difficult to check the semantics of a program directly from the source code. Most compilers therefore first transform the source code into an intermediate model that is easier to manipulate, typically an abstract syntax tree (AST).

The second problem, specific to automated refactoring, is the absence of good models for it. Most existing tools still base their implementations of refactoring methods on ASTs. Because an AST does not directly exhibit the properties that are required, it usually has to be augmented in various ways in order to be practicable: this can include complex flow analyses, a lot of additional data, or various scripting tools and languages [2].

Just as the AST has been a common approach to compilation, the FOOD model presented in this paper provides a new approach to refactoring. Based entirely on graphs, this model clearly separates the complex analyses that are required before the refactoring operation from the refactoring operation itself. The aim of the FOOD model is to make the former reusable for all code refactorings, and the latter as simple as possible. Note that the FOOD model is not a model to express refactorings, but a model on which to apply refactorings.

This paper is structured as follows: in section 2, we introduce the FOOD¹ model in detail. In section 3, we apply a concrete refactoring to this model, and in section 4 we compare it with the AST.

2. The FOOD Model

The FOOD model consists of two notions: the class graph and the dataflow. These two notions are described in detail in the following subsections.

2.1. The Class Graph

2.1.1. Basics

In FOOD, classes and their relations are represented by a graph. A dataunit² is a node of that graph. More precisely, a dataunit is either a class or a method signature. It can also simply be thought of as a class if we consider, as in Smalltalk, that a method is also an object, and hence that the signature of a method is a class. A dataunit therefore models the class of an object or the signature of a method.

Most object-oriented programming languages provide various relations between classes [8]. A class can refer to another class; this reference relation is expressed by fields. A class can extend another class; this is the inheritance relation. Finally, a class can contain another class, and a class contains methods; this is the container relation. In FOOD, relations between methods and classes are modelled by the edges of the graph.

¹ First-class Object Oriented Dataflow.
² This word is the concatenation of data and unit, two terms coming from dataflow theory.

2.1.2. The “has” and “is” Modifiers

Existing models make use of several kinds of edges [4] to model the different kinds of relations between classes and methods. The drawback of this approach is that the result no longer matches a graph in the usual sense: because we do not simply have edges, but multiple kinds of edges, we have a sort of “super” graph.

In order to keep a single kind of edge, FOOD provides only a single kind of relation between dataunits: the reference relation. Other kinds of relations (extension and container) are expressed by the use of two modifiers: has and is. These modifiers are extensions of the edges and are used only when relevant. A standard graph transformation, when required, will simply ignore them.

[Figure 1: a class graph with the nodes Class1, Class2, Class3 and Method1, connected by edges that carry the modifiers has and is.]
Figure 1. Classes, methods and their relations are represented by a graph with modifiers on the edges.

The first modifier, has, has a meaning similar to the Eiffel expanded modifier, and can turn a reference into a container relation: the target of the reference must exist, and cannot change, during the whole life of the referring object.

The second modifier, is, allows a reference to be turned into an inheritance relation by affecting the definition of assignment compatibility. Without this modifier, a source class and a target class are only assignment compatible if they are the same. With the is modifier, a source class is also assignment compatible with a target class if the source class refers to the target class through one or more edges that all have the is modifier.

Inheritance can be expressed by using both modifiers: the fact that a class is assignment compatible with all its superclasses is expressed by the is modifier, and the fact that the superclass of a class cannot be changed at runtime is expressed by the has modifier.

To express the relation between a method or an inner class and its containing class, only the has modifier is used. There is no compatibility between a class and the methods or inner classes it contains, so the is modifier is not present. The has modifier expresses the fact that the methods and the inner classes cannot be changed at runtime.

FOOD allows the is modifier to be used without the has modifier. In such a case, it means that the parent of an object can be replaced at runtime by another, assignment compatible object. Most programming languages have no equivalent relation. Dataunits representing methods, referenced without the has modifier, are also possible and mean that these methods can be replaced at runtime by other, compatible methods. Compatible methods are methods with the same signature.
Note that this replacement can only be done at runtime, but doing it only once, at the initialisation of the class, makes it possible to express usual, static overriding.

2.1.3. Implications

Where existing programming languages explicitly introduce the notions of inheritance, classes, methods, interfaces, overriding, etc. in order to be object-oriented, FOOD does so simply by introducing two modifiers. As shown above, these modifiers also make it easy to modify methods and class extensions dynamically at runtime, a possibility that is otherwise only found in programming languages that introduce a complex metaobject protocol (like CLOS [3]) or the notion of aspects [19].
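To make the role of the two modifiers concrete, the class graph of section 2.1 can be sketched in Java as follows. This is only an illustration under stated assumptions: the names ClassGraphSketch, Dataunit, Reference, refer and assignmentCompatible are hypothetical and are not taken from the authors' implementation. The sketch keeps a single kind of node and a single kind of edge, and derives assignment compatibility from edges carrying the is modifier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ClassGraphSketch {

    /** The single kind of edge: a reference with two independent binary modifiers. */
    static final class Reference {
        final Dataunit target;
        final boolean has; // the target exists and cannot change during the referrer's life
        final boolean is;  // the edge contributes to assignment compatibility

        Reference(Dataunit target, boolean has, boolean is) {
            this.target = target;
            this.has = has;
            this.is = is;
        }
    }

    /** The single kind of node: a class or a method signature. */
    static final class Dataunit {
        final String name;
        final List<Reference> references = new ArrayList<>();

        Dataunit(String name) {
            this.name = name;
        }

        void refer(Dataunit target, boolean has, boolean is) {
            references.add(new Reference(target, has, is));
        }
    }

    /**
     * A source is assignment compatible with a target if both are the same
     * dataunit, or if the target is reachable through "is" edges only.
     */
    static boolean assignmentCompatible(Dataunit source, Dataunit target) {
        Set<Dataunit> visited = new HashSet<>();
        Deque<Dataunit> pending = new ArrayDeque<>();
        pending.push(source);
        while (!pending.isEmpty()) {
            Dataunit current = pending.pop();
            if (current == target) {
                return true;
            }
            if (visited.add(current)) {
                for (Reference r : current.references) {
                    if (r.is) {
                        pending.push(r.target);
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Dataunit shape = new Dataunit("Shape");
        Dataunit circle = new Dataunit("Circle");
        Dataunit area = new Dataunit("area()");
        circle.refer(shape, true, true);  // inheritance: has and is
        circle.refer(area, true, false);  // contained method: has only
        System.out.println(assignmentCompatible(circle, shape)); // true
        System.out.println(assignmentCompatible(circle, area));  // false
    }
}
```

An algorithm for which the kind of relation is irrelevant can iterate over references and ignore both flags, which is the uniformity argued for in this section.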

[Figure 2: a dataflow containing the nodes method1, method2, static method, this.method3 and this.method4, with an input port (method arguments), an output port (method results) and a current port that supplies the method’s object, this.]

Figure 2. A sample dataflow.

Therefore, in FOOD, classes and their relations are represented by a graph with only one kind of node (the dataunit) and one kind of directed edge. Edges are only differentiated by means of independent binary properties: the modifiers.

The use of modifiers does not make algorithms simpler when these algorithms have to treat different kinds of edges differently. But it can simplify other algorithms for which the kind of edge is not relevant: in such cases, the algorithm can ignore all or part of the modifiers, and treat the class graph in a more uniform way. As a concrete example, both the “rename method” and the “rename class” refactorings have been implemented by the same code using the FOOD model. Looking at the source code of the Eclipse Java compiler reveals that these two refactorings are implemented there by two different classes (although they naturally share a common interface).

2.2. The Dataflow

FOOD uses a dataflow to represent code, that is, the implementation of a method. Again, a dataflow [9,14,16,17,18] is simply a kind of graph. Compared to traditional dataflow models, the dataflow model used in FOOD has a few extensions toward object orientation.

Figure 2 shows an example of a dataflow in FOOD. The input and output ports of the dataflow are used to receive arguments and to return results. Nodes are invocations of other methods; they also have input and output ports to gather the arguments and to forward the results. The actual implementation of the method underlying a node is given by another dataflow. Edges (also called connections), which are directed, carry data from output ports to input ports.

In FOOD, nodes and dataflows have an additional port (shown on the left) to supply the current object, like the implicit “this” in Java. This is the main difference between the dataflow model used in FOOD and traditional dataflow models. Another difference is that in this model, data consist of references to dataunit instances, which are not just simple values or structures, but objects and methods passed by reference.
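The structure described in this section (nodes with input and output ports, an extra port for the current object, and directed connections) might be represented along the following lines. This is a minimal sketch and not the authors' data model: all names are hypothetical, and modelling the ports of the dataflow itself through two pseudo-nodes (entry and exit) is an assumption made here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class DataflowSketch {

    enum PortKind { INPUT, OUTPUT, CURRENT }

    /** A port of a node; the CURRENT port receives the object the method is invoked on. */
    static final class Port {
        final Node owner;
        final PortKind kind;
        final int index;

        Port(Node owner, PortKind kind, int index) {
            this.owner = owner;
            this.kind = kind;
            this.index = index;
        }
    }

    /** An invocation of a method; the method body would be given by another dataflow. */
    static final class Node {
        final String method;
        final List<Port> inputs = new ArrayList<>();
        final List<Port> outputs = new ArrayList<>();
        final Port current = new Port(this, PortKind.CURRENT, 0);

        Node(String method, int inputCount, int outputCount) {
            this.method = method;
            for (int i = 0; i < inputCount; i++) {
                inputs.add(new Port(this, PortKind.INPUT, i));
            }
            for (int i = 0; i < outputCount; i++) {
                outputs.add(new Port(this, PortKind.OUTPUT, i));
            }
        }
    }

    /** A directed edge carrying a reference from an output port to an input or current port. */
    static final class Connection {
        final Port from;
        final Port to;

        Connection(Port from, Port to) {
            if (from.kind != PortKind.OUTPUT || to.kind == PortKind.OUTPUT) {
                throw new IllegalArgumentException(
                        "a connection runs from an output port to an input or current port");
            }
            this.from = from;
            this.to = to;
        }
    }

    /** A method body. Assumption: its own ports are modelled by two pseudo-nodes. */
    static final class Dataflow {
        final Node entry; // outputs 0..n-1 supply the arguments, output n supplies "this"
        final Node exit;  // inputs collect the results
        final List<Node> nodes = new ArrayList<>();
        final List<Connection> connections = new ArrayList<>();

        Dataflow(int argumentCount, int resultCount) {
            entry = new Node("<entry>", 0, argumentCount + 1);
            exit = new Node("<exit>", resultCount, 0);
        }

        Port thisPort() {
            return entry.outputs.get(entry.outputs.size() - 1);
        }

        Node invoke(String method, int inputCount, int outputCount) {
            Node node = new Node(method, inputCount, outputCount);
            nodes.add(node);
            return node;
        }

        void connect(Port from, Port to) {
            connections.add(new Connection(from, to));
        }
    }

    public static void main(String[] args) {
        Dataflow body = new Dataflow(1, 1);
        Node method1 = body.invoke("method1", 1, 1);
        Node method3 = body.invoke("this.method3", 1, 1);
        body.connect(body.entry.outputs.get(0), method1.inputs.get(0));
        body.connect(body.thisPort(), method3.current);
        body.connect(method1.outputs.get(0), method3.inputs.get(0));
        body.connect(method3.outputs.get(0), body.exit.inputs.get(0));
        System.out.println(body.nodes.size() + " nodes, " + body.connections.size() + " connections");
    }
}
```

The wiring built in main is invented for the example; it is not a transcription of figure 2.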

[Figure 3: two dataflow diagrams. The recoverable labels are the constants 1 and 10, the nodes Fby, >=, If, + and test, and the outputs Exit and Loop.]

Figure 3. Left: Dataflow of an empty loop counting from 1 to 10. Right: the generic skeleton of a loop.

Two particular aspects of the dataflow model used in FOOD require some attention: loops and sequence edges.

A loop in FOOD is simply implemented by a cycle in the dataflow graph, as in circuit engineering [9]³. The cycle must contain at least one test that allows the program to either continue or escape the loop. Such a test is represented in the dataflow model by a node with two output ports, which sends data to only one of them. One of the output ports is connected to the cycle and the other one leaves it. The node corresponding to the if instruction in particular has two output ports, one for the then case and one for the else case.

Most of the time, a loop also requires an initialization and an iterative step, as in a Java for loop. This is done in dataflow models using a special node, fby⁴ [18]: its implementation returns the first argument on the first evaluation (or iteration), and the second argument on all subsequent evaluations. Figure 3 illustrates the dataflow equivalent to the following piece of Java code:

    for (int i = 1; i < 10; i += 1) {}

Sequence edges are directed edges that carry a trigger signal. They link two nodes, but are attached to no particular port. The source node of a sequence edge automatically sends a trigger signal along it whenever it has been executed. The target node of a sequence edge cannot start its execution before the trigger signal has been received. Sequence edges are only used to specify the evaluation order of nodes. In the example of figure 2, method3 and method4 can be executed in any order. If method4 relies on side effects of method3, it is necessary to add a sequence edge from method3 to method4 to ensure that the latter is executed after the former. Sequence edges are also used to control the execution of a node with no input and output port.

In the next section, we will show some advantages of the dataflow model for transformations of code such as refactoring.

³ Some other dataflow models do not use cycles, but instead the notion of repeating blocks [16], which is closer to textual languages.
⁴ fby stands for “first, followed by”. It is proven in [18] that this node is sufficient to express any kind of loop.
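The behaviour of the fby node and of the cycle it closes can be simulated with a few lines of Java. The sketch below is one plausible reading of the loop described above, not the authors' interpreter: the class and method names are hypothetical, the test node is reduced to an if statement, and the step is assumed to be positive so that the loop terminates.

```java
final class FbySketch {

    /** "First, followed by": yields its first argument once, then its second argument. */
    static final class Fby {
        private boolean firstEvaluation = true;

        int evaluate(int first, int next) {
            if (firstEvaluation) {
                firstEvaluation = false;
                return first;
            }
            return next;
        }
    }

    /**
     * Simulates the cycle: fby feeds i to the test; the "loop" output goes
     * through "+" and back to fby, while the "exit" output leaves the cycle.
     * Returns how many times the data went around the cycle.
     * Assumes step > 0, otherwise the loop may never exit.
     */
    static int runLoop(int start, int limit, int step) {
        Fby fby = new Fby();
        int fedBack = 0; // value on the edge closing the cycle, ignored on the first evaluation
        int rounds = 0;
        while (true) {
            int i = fby.evaluate(start, fedBack);
            if (i >= limit) {
                return rounds;  // the test sends the data to its exit port
            }
            rounds++;           // the test sends the data to its loop port
            fedBack = i + step; // the "+" node computes the next value
        }
    }

    public static void main(String[] args) {
        int javaRounds = 0;
        for (int i = 1; i < 10; i += 1) {
            javaRounds++;
        }
        // Both counts agree for the loop given in the text.
        System.out.println(runLoop(1, 10, 1) == javaRounds); // true
    }
}
```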


3. Using FOOD for Code Refactoring

The use of a class graph for refactoring has already proven effective in various cases [4], especially when the transformation mainly concerns the classes, methods and their relations. We show here how the dataflow can help in the implementation of transformations that mainly change the code. We take the most typical code-based refactoring as an example: extracting a method.

3.1. The Process

Suppose we want to extract method2 and static method from the dataflow of figure 2. Let E be the subset of nodes consisting of these two nodes. The following steps are required for the extraction:

1. Identify the subset of input edges I, consisting of the edges entering E, that is, edges linking a node that is not part of E to a node that is part of E.
2. Identify the subset of output edges O, consisting of the edges leaving E, that is, edges linking a node that is part of E to a node that is not part of E.
3. Identify the subset of internal edges W, consisting of the edges linking two nodes that are both part of E.
4. Create the extracted dataflow using E ∪ I ∪ W ∪ O.
5. For each edge in I, add an input port of the same type to the extracted dataflow, and link it to the source of the edge.
6. For each edge in O, add an output port of the same type to the extracted dataflow, and link it to the end of the edge. At this point, the extracted dataflow is finished. It is shown in figure 4, on the left.
7. Remove the items of E and W from the initial dataflow. The edges of I and O are still present in the initial dataflow, but are connected at only one endpoint after this step.
8. Add to the initial dataflow a new node n corresponding to the extracted dataflow, that is, with the same number and types of input and output ports.
9. Connect the ends of the edges in I to the input ports of n, and the sources of the edges in O to the output ports of n.

The result of this refactoring is illustrated in figure 4. The extracted dataflow is shown on the left; the original one, with the extracted part replaced by a new node, is shown on the right⁵.

As we can see, a large number of steps are necessary in order to extract a method. But each step is:

• A basic graph operation. In fact, every step can run in constant or linear time with a proper implementation [11].
• A simple operation with no recursion and no complex lookup.

The only point that requires care is the use of an order-preserving data structure for the sets I and O, to ensure that no connections are swapped during steps 5, 6, 8 and 9. Therefore, a complex refactoring such as extracting a method can be implemented using only trivial tasks on the FOOD model.

⁵ This refactoring example has been implemented and tested, using an interpreter of the FOOD model written in Java by the authors.
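The nine steps of section 3.1 can be sketched in Java as shown below. This is an illustration under simplifying assumptions, not the authors' implementation (which is not reproduced in the paper): all names are hypothetical, ports are reduced to the nodes that own them, the ports of the extracted dataflow are modelled as pseudo-nodes, and the edges of I and O are copied into the extracted dataflow so that the two dataflows do not share edge objects. Insertion-ordered collections play the role of the order-preserving data structure mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ExtractMethodSketch {

    /** A method invocation, or a pseudo-node standing for a port of a dataflow. */
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    /** A typed, directed connection; ports are simplified to their owning nodes. */
    static final class Edge {
        Node source;
        Node target;
        final String type;
        Edge(Node source, Node target, String type) {
            this.source = source;
            this.target = target;
            this.type = type;
        }
    }

    static final class Dataflow {
        final Set<Node> nodes = new LinkedHashSet<>();
        final Set<Edge> edges = new LinkedHashSet<>();
        final List<Node> inputPorts = new ArrayList<>();
        final List<Node> outputPorts = new ArrayList<>();

        Node node(String name) {
            Node n = new Node(name);
            nodes.add(n);
            return n;
        }

        Edge connect(Node source, Node target, String type) {
            Edge e = new Edge(source, target, type);
            edges.add(e);
            return e;
        }
    }

    /** Extracts the selected nodes (E) from initial and returns the extracted dataflow. */
    static Dataflow extract(Dataflow initial, Set<Node> selection, String name) {
        // Steps 1 to 3: classify the edges into I, O and W, preserving their order.
        List<Edge> in = new ArrayList<>();
        List<Edge> out = new ArrayList<>();
        List<Edge> internal = new ArrayList<>();
        for (Edge edge : initial.edges) {
            boolean fromE = selection.contains(edge.source);
            boolean toE = selection.contains(edge.target);
            if (fromE && toE) internal.add(edge);
            else if (toE) in.add(edge);
            else if (fromE) out.add(edge);
        }
        // Step 4: the extracted dataflow receives E and W (I and O are copied below).
        Dataflow extracted = new Dataflow();
        extracted.nodes.addAll(selection);
        extracted.edges.addAll(internal);
        // Step 5: one input port per edge of I, taking the place of the edge's source.
        for (Edge edge : in) {
            Node port = new Node("in" + extracted.inputPorts.size());
            extracted.inputPorts.add(port);
            extracted.connect(port, edge.target, edge.type);
        }
        // Step 6: one output port per edge of O, taking the place of the edge's end.
        for (Edge edge : out) {
            Node port = new Node("out" + extracted.outputPorts.size());
            extracted.outputPorts.add(port);
            extracted.connect(edge.source, port, edge.type);
        }
        // Step 7: remove E and W from the initial dataflow.
        initial.nodes.removeAll(selection);
        initial.edges.removeAll(internal);
        // Step 8: add the node n that invokes the extracted dataflow.
        Node call = initial.node(name);
        // Step 9: reconnect the dangling edges of I and O to n, in order.
        for (Edge edge : in) edge.target = call;
        for (Edge edge : out) edge.source = call;
        return extracted;
    }

    public static void main(String[] args) {
        Dataflow flow = new Dataflow();
        Node method1 = flow.node("method1");
        Node method2 = flow.node("method2");
        Node staticMethod = flow.node("static method");
        Node method3 = flow.node("this.method3");
        flow.connect(method1, method2, "Object");      // wiring invented for the example
        flow.connect(method2, staticMethod, "Object");
        flow.connect(staticMethod, method3, "Object");
        Set<Node> selection = new LinkedHashSet<>();
        selection.add(method2);
        selection.add(staticMethod);
        Dataflow extracted = extract(flow, selection, "Extracted method");
        System.out.println(extracted.inputPorts.size() + " input port(s), "
                + extracted.outputPorts.size() + " output port(s)");
    }
}
```

Each step is a single pass over a collection, in line with the remark above that no recursion or complex lookup is needed.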

[Figure 4: two dataflow diagrams built from the nodes method1, method2, static method, this.method3, this.method4 and Extracted method.]

Figure 4. The result of extracting a method in a dataflow. Left: the extracted dataflow. Right: the initial dataflow after the extraction.

3.2. Additional Comments

These steps perform the actual extraction in the dataflow part of the FOOD model, that is, the part that represents the code. Other steps are required on the other graph, which represents classes and their relations; but these steps are performed in a way that is similar to existing techniques [4] and are therefore not covered here.

Other dataflow models already exist [12], but most of them do not directly express object-oriented notions, or are too specific to a domain unrelated to refactoring. While some dataflow models have proven very effective for compiler optimizations and parallel computing [14], they usually only provide a view of a particular aspect of the code. Hence, they cannot be used alone for code transformations.

4. Comparison with AST

4.1. A Theoretical Comparison

Abstract syntax trees (ASTs) have been used successfully for a long time in the compilation process [14]. Although most modern compilers also perform various code transformations (mainly optimisations), ASTs suffer from some limitations compared to the FOOD model. Consider for example that we not only want to extract a method, but also to replace all occurrences of the same code fragment by a call to the extracted method. Using the example of figure 4, the following limitations of ASTs show up:

• In the example of section 3, the identification of the inputs (arguments) and outputs (results) of the method to extract was performed by trivial graph operations. With an AST, a deep search for variable references would be required.
• In the dataflow of the extracted method, method3 and method4 do not rely on each other’s side effects, because there is no sequence edge between them: they can be executed in any order. With a textual language, two different ASTs would be produced, depending on which method is invoked first. Only a complex analysis can figure out that the two ASTs are in the end semantically equivalent.
• There is only one, low-level loop construct in FOOD; therefore, two equivalent loops result in the same dataflow. With textual languages, each loop construct (for, while, etc.) might result in a different AST if not properly transformed.

All the complex tasks that are required when using ASTs seem to be absent when using the FOOD model. But while they do not show up in this model directly, they are in fact still present: they are now part of the larger process of converting code from an existing language to the FOOD model. At first glance, it therefore seems that the use of this model has just shifted the problem from one place to another. But this is precisely the gain of the FOOD model: why should one bother to do all these deep searches for references and other flow analyses as part of the refactoring implementation, when they can be isolated, and thus done only once for all code refactoring implementations?

The main problem of ASTs is that they were initially targeted at compilation only. As a result, some refactorings require many tedious tasks, just because the required information is not expressed directly by ASTs. The previous section has shown that, contrary to what is usually stated, code refactorings such as extracting a method are simple problems. The difficult part is the conversion to a suitable model. But we believe that this part should not belong to the refactoring implementation itself.

4.2. Practical Results

It is difficult to compare two different implementations of a transformation. Speed is not really an important factor: most transformations have linear complexity and are performed within a few seconds or less, which is acceptable for the user.
There is also no way of comparing the quality of the result, because there is usually only one correct result.

Although this kind of comparison is subjective, we have analysed the code of the “extract method” refactoring in the source code of the Eclipse Java compiler [1], and we have compared it with the source code of our implementation based on the FOOD model. In both cases, the method to extract is already abstracted (in the form of an AST in Eclipse, and in the FOOD model in our implementation) when the extracting code is invoked.

The Eclipse version contains about a thousand lines of code in the class named ExtractMethodRefactoring. This class uses various other classes for the checking of preconditions, for common transformations of the AST and for the flow analyses, meaning that the class captures only the essential steps that are specific to this transformation.

Our implementation on the FOOD model has fewer than 200 lines of code, and it only delegates the basic graph operations (such as getting the edges leaving a given node, or adding and removing edges and nodes) to other classes. As in the Eclipse version, no precondition checking and no flow analysis is performed directly by these lines of code. This comparison reveals the main advantage of the FOOD model: its simplicity for performing advanced transformations.

The other part of the process to take into account is the conversion of source code to the FOOD model. This part, which is not implemented yet, is expected to be more difficult than the conversion of source code to an AST, but not necessarily more difficult than the conversion to an AST followed by the flow analyses required to extract a method using an AST. A more detailed comparison is left for future work.

In all cases, the key idea is that the conversion of source code to the FOOD model is implemented only once, for all refactorings. Therefore, the effort of converting source code to the FOOD model is largely compensated by the simplicity of implementing all the individual transformations, especially if many complex transformations have to be implemented, as tends to be the case in the most recent development environments.

5. Conclusion

This paper has presented the FOOD model. It has shown how this model can express both classes and their relations, and the code of a program, using only graphs. It has also shown the potential of the FOOD model for the implementation of complex code refactorings. This has been demonstrated by applying the “extract method” refactoring to a sample program.

Besides the FOOD model itself, we have shown that code refactoring is a trivial task once the required information has been abstracted in a suitable representation. A short comparison showed that the AST does not exhibit the information required by some code refactorings. As a result, although the entire refactoring process is not more complex with an AST, the refactoring implementation has to perform various additional tasks, such as flow analysis, which are not part of the initial construction of the AST. This erroneously makes the refactoring implementation look at least as complex as the initial construction of the AST.

Although converting existing code to FOOD is more complex than a conversion to an AST, this conversion can then be reused to simplify all kinds of refactorings. Because of the simplicity of refactoring in this model, FOOD opens the door to much more complex transformations, such as forming a template method, or converting conditionals to polymorphism.

References

[1] Leif Frenzel: The Language Toolkit: An API for Automated Refactorings in Eclipse-based IDEs, Eclipse Magazin, vol. 5, 2006
[2] Mathieu Verbaere, Ran Ettinger and Oege de Moor: JunGL: a Scripting Language for Refactoring, Proceedings of the 28th International Conference on Software Engineering, 2006
[3] Ralf Lämmel, João Saraiva and Joost Visser: Generative and Transformational Techniques in Software Engineering, Pre-proceedings of GTTSE, 2005
[4] Tom Mens: On the Use of Graph Transformations for Model Refactoring, Pre-proceedings of GTTSE, pp. 67–98, 2005
[5] Joshua Kerievsky: Refactoring to Patterns, Addison-Wesley, 2004
[6] Tom Mens, Tom Tourwé: A Survey of Software Refactoring, IEEE Transactions on Software Engineering, vol. 30, no. 2, pp. 126–139, 2004
[7] Martin Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 2002
[8] B. Meyer: Object-Oriented Software Construction, Prentice Hall, 2nd edition, 2000
[9] R. Mark Meyer, T. Masterson: Towards a better Programming Language: Critiquing Prograph’s Control Structures, The Journal of Computing in Small Colleges, vol. 15, no. 5, pp. 181–193, 2000
[10] Donald Bradley Roberts: Practical Analysis for Refactoring, PhD thesis, University of Illinois, 1999
[11] Thomas H. Cormen: Introduction to Algorithms, The MIT Press, 1998
[12] M. Boshernitsan, M. Downes: Visual Programming Languages: A Survey, CiteSeer Scientific Literature Digital Library, 1997
[13] M. Burnett, A. Goldberg, T. Lewis: Visual Object-Oriented Programming, Manning Publications, 1995
[14] G. Gao, L. Bic, J.-L. Gaudiot: Advanced Topics in Dataflow Computing and Multithreading, Wiley-IEEE Computer Society Press, 1995
[15] Bill Opdyke: Refactoring Object-Oriented Frameworks, PhD thesis, University of Illinois, 1992
[16] D. Ingalls, S. Wallace, Y. Chow, F. Ludolph, K. Doyle: Fabrik, A Visual Programming Environment, ACM SIGPLAN OOPSLA ’88 Conference Proceedings, 23, pp. 176–190, 1988
[17] K. Pingali and Arvind: Efficient Demand-Driven Evaluation, ACM Transactions on Programming Languages and Systems, vol. 7, no. 2, pp. 311–333, 1985
[18] W. Wadge, E. Ashcroft: Lucid, the Dataflow Programming Language, Academic Press, 1985
[19] The AOSD Steering Committee: Aspect-oriented Software Development, http://aosd.net/ (last visited in July 2006)
