Refactoring-aware Software Merging and Configuration Management
Danny Dig, Kashif Manzoor ([dig,manzoor2]@uiuc.edu)
Tien N. Nguyen
Ralph Johnson
FSE'06, Portland, Oregon, USA

ABSTRACT
Refactoring tools allow programmers to change their source code more quickly than before. However, the complexity of these changes causes versioning tools that operate at the file level to lose the history of entities and to be unable to merge refactored entities. This problem can be solved by a semantics-based, operation-based SCM with persistent IDs. Our prototype, MolhadoRef, successfully merges edit and refactoring operations that were performed on different development branches, preserves program history better, and makes program evolution easier to understand.

1. INTRODUCTION
Refactorings [4] are program transformations that change the structure of a program without changing its external behavior. In recent years, tools like Eclipse [3] and IntelliJ IDEA [6] have made refactoring tools standard for most Java programmers. The widespread use of a new kind of software tool often forces other tools to adjust to it, and refactoring tools make particular demands on software configuration management (SCM) tools.

A refactoring tool allows a programmer to quickly make changes that potentially affect all parts of a system. Some refactorings, such as extracting a function, are local in scope. However, changing the name or interface of an API function can have global scope, since every part of the system that uses the function has to change. Changes that seem simple from a refactoring point of view can be complex from an SCM point of view unless the SCM tools treat refactorings intelligently.

Most SCM systems are based on files, not on program entities. They model changes in terms of the lines of a file that changed, rather than the program entities that changed. In contrast, a semantics-based SCM is tailored to a particular programming language and so can represent individual program entities such as classes and functions. Further, most SCM systems identify entities by name, whether a file name or the name of a program entity, so renaming or moving an entity causes the system to lose track of it. Molhado [9], by contrast, is an SCM infrastructure that creates unique, persistent IDs for each entity and treats the name as an attribute that can change. Thus, it can track the history of an entity even after the entity is refactored.

An SCM system needs to deal with branches: versions derived from a common base but not from each other. Making branches is easy, but merging them can be hard. File-based SCM systems can merge changes automatically if they are to different parts of a file, but if two branches change the same part of a file, the merge fails and must be done manually. If one branch renames a function and another branch adds an argument to the same function, a conventional SCM system cannot automatically merge the changes to the function declaration and the function calls. This poster shows that such changes can be merged by a semantics-based SCM that gives program entities permanent IDs. Such an SCM eliminates many conflicts when merging refactored versions in multi-user environments, helps track the history of refactored, fine-grained program entities, and represents the history at a higher level.
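To see why a line-based merge fails in the rename-plus-new-argument scenario above, here is a minimal Python sketch. It uses a hypothetical Java snippet and a deliberately simplified three-way merge that assumes the three versions have the same number of lines (real tools such as diff3 first align lines with a diff); it is an illustration, not the algorithm CVS actually implements.

```python
def three_way_merge_lines(base, ours, theirs):
    """Merge three equal-length versions line by line (simplifying assumption)."""
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:          # both branches agree (or neither changed the line)
            merged.append(o)
        elif o == b:        # only the second branch changed this line
            merged.append(t)
        elif t == b:        # only the first branch changed this line
            merged.append(o)
        else:               # both branches changed the same line differently
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


# Hypothetical Java lines: a declaration, its closing brace, and one call site.
base = ["void m1(int a) {", "}", "m1(1);"]
renamed = ["void m2(int a) {", "}", "m2(1);"]            # branch 1: rename m1 => m2
new_arg = ["void m1(int a, int b) {", "}", "m1(1, 2);"]  # branch 2: add an argument

merged, conflicts = three_way_merge_lines(base, renamed, new_arg)
print(conflicts)  # [0, 2]: the declaration and the call site both conflict
```

Both the declaration and the call site end up as same-line conflicts, even though the intended result (void m2(int a, int b) and m2(1, 2)) is obvious once the rename is recognized as a refactoring.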
2. OUR APPROACH
2.1 Semantics-based, Operation-based SCM
We developed MolhadoRef, a semantics-based SCM system that versions Java programs and maintains persistent identifiers for all program entities. It uses the operation-based SCM approach [7] to represent and record refactoring operations as first-class entities in the repository. In the operation-based approach, an SCM tool records the operations that were performed to transform one version into another and replays them when updating to that version. Recent extensions to refactoring engines [5, 3] make it possible to record and replay refactoring operations. MolhadoRef [2] is built on Molhado [9], an object-oriented SCM infrastructure developed for creating SCM tools, and specializes it to represent Java programs. MolhadoRef associates a unique identifier with each Java program entity and uses these identifiers to track the entities' histories, especially when the entities are refactored. MolhadoRef captures the semantics of a program with a Molhado component named CompilationUnit, which has a tree-based structure representing the program's Abstract Syntax Tree (AST). Properties of the AST nodes (e.g., declaration names, method arguments) are stored as properties in the Molhado repository and can change as a result of refactoring operations. Nevertheless, the identity of program entities remains intact after refactoring operations.
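The core idea of ID-based tracking can be illustrated with a small Python sketch. The Entity and Repository classes and the rename method below are hypothetical stand-ins, not Molhado's actual API: each entity receives a persistent ID when it is created, and its name is only a property that a rename refactoring may change.

```python
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)  # source of persistent, never-reused IDs


@dataclass
class Entity:
    """Hypothetical program entity: identity is the ID, the name is a property."""
    kind: str
    name: str
    uid: int = field(default_factory=lambda: next(_ids))
    history: list = field(default_factory=list)

    def rename(self, new_name, version):
        # A rename refactoring updates a property; the ID stays the same.
        self.history.append((version, "rename", self.name, new_name))
        self.name = new_name


class Repository:
    """Hypothetical store that keys entities by persistent ID."""

    def __init__(self):
        self.by_id = {}

    def add(self, entity):
        self.by_id[entity.uid] = entity
        return entity.uid

    def find_by_name(self, name):
        # What a name-keyed SCM would do: look the entity up by its current name.
        return [e for e in self.by_id.values() if e.name == name]


repo = Repository()
uid = repo.add(Entity("method", "computeTotal"))
repo.by_id[uid].rename("total", version="v2")

print(repo.find_by_name("computeTotal"))  # []: a name-based lookup loses the entity
print(repo.by_id[uid].history)            # [('v2', 'rename', 'computeTotal', 'total')]
```

Looking the method up by its old name finds nothing, while the ID still leads to the same entity together with its rename history.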
2.2 Merging Algorithm
Our goal is to provide merging at the API level; that is, our merging algorithm aims for correct usage of all the APIs. MolhadoRef treats a version as composed of three kinds of operations: API refactorings, API edits (e.g., adding or removing class and member declarations), and code edits (all remaining edits). Code edits do not have well-defined semantics, which makes them difficult to merge correctly. API edits have better-defined semantics, and refactorings have the best-defined semantics of all, so they benefit the most from semantic merging. Therefore, MolhadoRef merges code edits textually, but, because it is aware of the semantics of refactorings and API edits, it merges those semantically.

The merging algorithm [1] takes as input three versions of the software: V0 is the base version, and V1 and V2 are derived from V0. The algorithm also takes as input the refactorings that were performed in V1 and in V2; these refactoring logs are recorded by Eclipse's refactoring engine.

INPUT = {V_2, V_1, V_0, refLogs}
Operations op = 3-wayComparison(V_2, V_1, V_0)                      #1
Operations refs = detectRefactorings(refLogs)
Operations edits = detectEdits(op, refs)
repeat {                                                            #2
    {edits, refs} = userSolvesConflicts({edits, refs})
    Graph refsDAG = createRefDependenceGraph(refs)
    {refs, refsDAG} = userEliminatesCircularDependences(refs, refsDAG)
} until noConflictsOrCircDependences(edits, refsDAG)
Version V_1_minusRef = invertRefactorings(V_1, refs)                #3
Version V_2_minusRef = invertRefactorings(V_2, refs)
Version V_merged_minusRef =                                         #4
    3-wayTextualMerge(V_2_minusRef, V_1_minusRef, V_0)
orderedRefs = topologicalSort(refsDAG)                              #5
Version V_merged = replayRefactorings(V_merged_minusRef, orderedRefs)
OUTPUT = {V_merged}
Figure 1: Overview of the merging algorithm

Step #1 detects the API edits through three-way differencing between V1, V2, and V0. Step #2 searches the API edits and refactorings for compile-time conflicts (e.g., two methods with the same name and signature in the same class) and run-time conflicts (e.g., accidental method overriding). The algorithm also searches for possible circular dependences between refactorings; if any are found, the user deletes one of the refactorings involved in the cycle.
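Step #2's circular-dependence check, and the ordering later used in Step #5, amount to building a directed graph over the refactorings and sorting it topologically. The sketch below uses Python's standard graphlib module (Python 3.9+). The dependence rule it applies (a refactoring depends on another if it operates on the name the other one produces) is an illustrative assumption, since the poster does not spell out MolhadoRef's dependence relations; Refactoring, build_dependence_graph, and order_refactorings are hypothetical names.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class Refactoring:
    label: str   # human-readable description, assumed unique
    source: str  # name of the entity the refactoring reads
    target: str  # name the entity has after the refactoring


def build_dependence_graph(refs):
    """Map each refactoring label to the labels it depends on (its predecessors)."""
    graph = {r.label: set() for r in refs}
    for first in refs:
        for second in refs:
            # Assumed rule: `second` must run after `first` if it uses `first`'s output.
            if first is not second and second.source == first.target:
                graph[second.label].add(first.label)
    return graph


def order_refactorings(refs):
    """Return labels in a replayable order; raises CycleError on circular dependences."""
    return list(TopologicalSorter(build_dependence_graph(refs)).static_order())


refs = [
    Refactoring("move A.m2 => B.m2", "A.m2", "B.m2"),
    Refactoring("rename A.m1 => A.m2", "A.m1", "A.m2"),
]
print(order_refactorings(refs))  # ['rename A.m1 => A.m2', 'move A.m2 => B.m2']

try:
    order_refactorings([Refactoring("rename x => y", "x", "y"),
                        Refactoring("rename y => x", "y", "x")])
except CycleError:
    print("circular dependence: the user must delete one of these refactorings")
```

graphlib raises CycleError before yielding any node, which corresponds to the point at which the algorithm asks the user to drop one of the refactorings in the cycle.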
This process of detecting and resolving conflicts continues until no conflicts or circular dependences remain. Step #3 inverts each refactoring in V1 and V2 by applying its inverse refactoring. For instance, it inverts the renaming of method m1 => m2 by applying the renaming m2 => m1. Inverting the refactorings changes all the edits that reference the refactored program entities so that they refer to the old version of those entities. This step produces two software components, V1−Ref and V2−Ref, which contain all the changes in V1 and V2, respectively, except the refactorings. Step #4 textually merges (using a modified version of three-way merging [8]) all the API and code edits from V1−Ref and V2−Ref. Since the refactorings were inverted beforehand, all the same-line conflicts that refactorings would have caused are eliminated, so textual merging of code edits can proceed smoothly. This step produces a software component called Vmerged−Ref. Step #5 replays on Vmerged−Ref the refactorings that happened in V1 and V2. Before replaying, the algorithm orders the refactorings topologically according to their dependence relations. Replaying the refactorings incorporates their changes into Vmerged−Ref, which already contains all the edits. For instance, replaying a method renaming updates all the call sites of that method that were introduced as edits.

First, we evaluated the effectiveness of MolhadoRef's merging against CVS using the parallel development of MolhadoRef's own source code during its last three weeks of development. Because refactorings affected a few major API classes, CVS signaled 36 same-line conflicts (requiring human intervention), and after "successful" merging it produced 41 compile-time errors and 7 run-time errors. MolhadoRef correctly signaled 1 conflict (two methods with the same name added in one class) and produced no compile-time or run-time merge errors. Second, MolhadoRef helps program understanding by raising the granularity of changes from textual changes to structural changes. During the last 12 weeks of development, 67 refactorings corresponded to 1267 changed lines in MolhadoRef and its accompanying JUnit test suite. Reading and understanding 67 changes is easier than reading 1267 changed lines (a reduction of about 1:19). Third, as an ID-based SCM, MolhadoRef can always retrieve the history of refactored program entities. Over the last three weeks of MolhadoRef development, CVS lost the history of two core files containing 73 API methods.
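Steps #3 to #5 of the merging algorithm can be traced on a toy example. The Python sketch below models a program as a list of lines and a refactoring as an (old, new) rename applied by whole-word substitution. All helper names are hypothetical, the textual merge assumes equal-length versions, branch 2's new parameter is treated as a plain edit, and refactorings are replayed in log order instead of a topological order. It only shows why inverting refactorings before a textual merge removes the same-line conflicts.

```python
import re


def rename(lines, old, new):
    """Toy rename refactoring: replace whole-word occurrences of an identifier."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return [pattern.sub(new, line) for line in lines]


def invert(lines, refs):
    """Step #3: undo renames in reverse order by applying new => old."""
    for old, new in reversed(refs):
        lines = rename(lines, new, old)
    return lines


def merge3(base, ours, theirs):
    """Step #4: line-by-line three-way merge (assumes equal-length versions)."""
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:   # both agree, or only `ours` changed the line
            merged.append(o)
        elif o == b:           # only `theirs` changed the line
            merged.append(t)
        else:
            raise ValueError(f"textual conflict: {o!r} vs {t!r}")
    return merged


def refactoring_aware_merge(v0, v1, v2, refs1, refs2):
    v1_minus_ref = invert(v1, refs1)
    v2_minus_ref = invert(v2, refs2)
    merged = merge3(v0, v1_minus_ref, v2_minus_ref)
    for old, new in refs1 + refs2:  # Step #5: replay (log order, no dependence graph)
        merged = rename(merged, old, new)
    return merged


v0 = ["void m1(int a) { }", "m1(1);", ""]
v1 = ["void m2(int a) { }", "m2(1);", ""]                       # rename m1 => m2
v2 = ["void m1(int a, int b) { }", "m1(1, 0);", "m1(2, 3);"]   # new argument, new call

print(refactoring_aware_merge(v0, v1, v2, refs1=[("m1", "m2")], refs2=[]))
# ['void m2(int a, int b) { }', 'm2(1, 0);', 'm2(2, 3);']
```

Calling merge3(v0, v1, v2) directly raises a conflict on the declaration line, whereas the refactoring-aware path merges cleanly, and the replayed rename also updates the call site that was added only on branch 2.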
3. CONCLUSIONS
Refactoring changes create problems for current SCM tools that operate at the file level; as a result, the history of refactored entities is lost. We propose a novel SCM system, MolhadoRef, that is aware of program entities and of the refactoring operations that change them. Because MolhadoRef uses a unique identifier for each program element, it can track the history of refactored program elements. In addition, we present a merging algorithm that can intelligently merge refactoring operations with manual edits. We believe that the availability of such semantics-aware, refactoring-tolerant SCM tools will encourage programmers to be even bolder when refactoring. More screenshots and a download of MolhadoRef are available at: netfiles.uiuc.edu/dig/MolhadoRef
4. REFERENCES
[1] D. Dig, K. Manzoor, R. Johnson, and T. Nguyen. Refactoring-aware configuration management system for object-oriented programs. Technical Report UIUCDCS-R-2006-2770, UIUC, September 2006.
[2] D. Dig, T. Nguyen, and R. Johnson. Refactoring-aware software configuration management. Technical Report UIUCDCS-R-2006-2710, UIUC, April 2006.
[3] Eclipse Foundation. http://eclipse.org.
[4] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
[5] J. Henkel and A. Diwan. CatchUp!: capturing and replaying refactorings to support API evolution. In ICSE'05, pp. 274–283, 2005.
[6] JetBrains Corp. http://www.jetbrains.com/idea.
[7] E. Lippe and N. van Oosterom. Operation-based merging. In SDE5, pp. 78–87, 1992.
[8] W. Miller and E. W. Myers. A file comparison program. Softw., Pract. Exper., 15(11):1025–1040, 1985.
[9] T. N. Nguyen, E. V. Munson, J. T. Boyland, and C. Thao. An infrastructure for development of object-oriented, multi-level configuration management services. In ICSE'05, pp. 215–224, 2005.