MolhadoRef: A Refactoring-aware Software Configuration Management Tool

Danny Dig
University of Illinois at Urbana-Champaign
[email protected]

Tien N. Nguyen
Iowa State University
[email protected]

Kashif Manzoor, Ralph Johnson
University of Illinois at Urbana-Champaign
{manzoor2, johnson}@cs.uiuc.edu
Abstract
Refactoring tools allow programmers to change their source code more quickly than before. However, the complexity of these changes causes versioning tools that operate at the file level to lose the history of entities and to be unable to merge refactored entities. This problem can be solved by semantic, operation-based SCM with persistent IDs. We propose that versioning tools be aware of program entities and refactoring operations. We present MolhadoRef, our prototype, which uses these techniques to ensure that it never loses history. MolhadoRef can successfully merge edit and refactoring operations that were performed on different development branches.

Categories and Subject Descriptors
D.2.9 [Software Engineering]: Configuration Management

General Terms
Management

Keywords
Refactoring, Configuration Management, Version Control

1. Introduction
Refactorings [4] are program transformations that change the structure of a program without changing its external behavior. In recent years, tools like Eclipse [3] and IntelliJ IDEA [6] have made refactoring tools a standard for most Java programmers. The widespread use of a new kind of software tool often forces other tools to adjust to it. Refactoring tools make particular demands on software configuration management (SCM) tools.

A refactoring tool allows a programmer to quickly make changes that potentially affect all parts of a system. Some refactoring operations are local in scope, such as extracting a method. However, changing the name or interface of a public method can have global scope, since every part of the system that uses the method will have to change. Changes that seem simple from a refactoring point of view can be complex from an SCM point of view unless the SCM tools can treat refactorings intelligently.

Most SCM systems are based on files, not on program entities. They model changes in terms of the lines of a file that have changed, instead of the program entities that changed. In contrast, a semantics-based SCM is tailored to a particular programming language and so can represent individual program entities such as classes and methods. Further, most SCM systems identify entities by name, whether it is a file name or the name of a program entity. Renaming or moving an entity will cause the system to lose track of it. In contrast, Molhado [8] is an SCM infrastructure that creates unique, persistent IDs for each entity and treats the name as an attribute that can change. Thus, it can track the history of an entity in spite of it being refactored.

An SCM system needs to deal with branches: versions derived from a common base but not from each other. Making branches is easy, but merging them can be hard. File-based SCM systems can merge changes automatically if they are to different parts of a file, but if two branches change the same part of a file then the merge fails and must be done manually. If one branch renames a method and another branch adds an argument to the same method, then a conventional SCM system cannot automatically merge the changes to the method declaration and method calls. This demo shows that those changes can be merged by a semantics-based SCM that gives program entities permanent IDs. MolhadoRef tracks the history of refactored, fine-grained program entities, eliminates many conflicts when merging refactored versions in multi-user environments, and represents the history at a higher level.
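The idea of a persistent ID with the name kept as a mutable attribute can be illustrated with a small sketch. It is not Molhado's data model: the `Entity` and `Repository` classes and their methods are hypothetical names chosen for illustration. It only shows why a rename from one branch and an added argument from another can both be applied when each operation addresses the entity by ID rather than by name.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A program entity; all names here are hypothetical, not Molhado's API."""
    entity_id: int                                   # persistent, never changes
    name: str                                        # an attribute that may change
    parameters: list = field(default_factory=list)
    history: list = field(default_factory=list)


class Repository:
    """Toy store that looks entities up by ID, never by name."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._entities = {}

    def add(self, name, parameters=()):
        entity = Entity(next(self._next_id), name, list(parameters))
        entity.history.append(f"created as {name}")
        self._entities[entity.entity_id] = entity
        return entity.entity_id

    def get(self, entity_id):
        return self._entities[entity_id]

    def rename(self, entity_id, new_name):
        entity = self._entities[entity_id]
        entity.history.append(f"renamed {entity.name} -> {new_name}")
        entity.name = new_name

    def add_parameter(self, entity_id, parameter):
        entity = self._entities[entity_id]
        entity.parameters.append(parameter)
        entity.history.append(f"added parameter {parameter}")


repo = Repository()
method_id = repo.add("m1")
repo.rename(method_id, "m2")              # operation recorded on one branch
repo.add_parameter(method_id, "int x")    # operation recorded on the other branch
method = repo.get(method_id)
print(method.name, method.parameters)
print(method.history)
```

Because the second operation refers to `method_id` and not to the old name, it still finds its target after the rename, and the entity's history survives both changes.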
2. Our approach
2.1 Semantics-based, Operation-based SCM
We developed MolhadoRef, a semantics-based SCM system that is able to capture and version the underlying semantics of Java programs. It also maintains persistent identifiers for all program entities in its repository. It uses the operation-based SCM approach [7] to represent and record refactoring operations as first-class entities in the repository. In the operation-based approach, an SCM tool records the operations that were performed to transform one version into another and replays them when updating to that version. The operation-based approach gives a precise way to integrate changes caused by editing operations from different lines of parallel development [7]. As for refactoring, recent extensions to refactoring engines [3, 5] make it possible to record and replay refactorings.

MolhadoRef [2] is based on the Molhado object-oriented SCM infrastructure [8], which was developed for creating SCM tools. Unlike the file-based SCM approach, Molhado allows an SCM system to model and capture the structure of logical entities within a file and the operations on them. Molhado has a flexible data model that allows it to represent programs in any language. MolhadoRef specializes it to represent Java programs. MolhadoRef associates a unique identifier with each Java program entity; these identifiers are used to track the entities’ histories, especially when the entities are refactored. MolhadoRef captures the semantics of a program with a Molhado component named CompilationUnit, which has a tree-based structure representing the program’s Abstract Syntax Tree (AST). Different properties of the AST nodes (e.g., declaration names, method arguments, etc.) are stored as properties in the Molhado repository and can change as a result of refactorings. Nevertheless, the identity of program entities remains intact even after refactoring operations.

Copyright is held by the author/owner(s). OOPSLA’06 October 22–26, 2006, Portland, Oregon, USA. ACM 1-59593-491-X/06/0010.

Refactorings can also be inferred using RefactoringCrawler [1]. The order of refactorings is recorded using timestamps. Program entities and refactoring operations are directly accessible in the repository. Because MolhadoRef is aware of refactorings, the user can browse the history of program elements that were heavily refactored (e.g., methods that were renamed or moved to different classes). The interaction with the Eclipse front end is triggered when a user wants to check in the code. The first time an Eclipse project is checked in, MolhadoRef creates a Molhado AST node for each program entity. After code is checked in for the first time, subsequent check-ins need to store only the changes since the last check-in. MolhadoRef uses the Eclipse compare engine to learn the individual deltas (e.g., changes within a method body or addition/removal of classes, methods, and fields). The refactoring operations can be retrieved and replayed by Eclipse’s refactoring engine during an update operation. When the user invokes a checkout operation, MolhadoRef reconstructs (from its internal representation) the Java compilation units and packages and invokes the Eclipse code formatter on the files. After MolhadoRef brings the classes and packages into a project in the current Eclipse workspace, the user can resume their programming session using the Eclipse environment.

2.2 Merging Refactorings and Edit Operations
Two users, Alice and Bob, start working from the same base version, V0. If Alice renames class A to B and Bob renames method A.m1() to A.m2(), a purely textual merge will fail, because it appears as if Alice deleted file A while Bob changed it. A much better approach, called operation-based merging, is to merge the changes by replaying both refactorings on the base version V0. However, after replaying the renaming of class A to B, the refactoring engine cannot replay the renaming of method A.m1() because class A no longer exists at that point. This dependency between refactoring operations appears because current refactoring engines are based purely on the names of program elements. If the refactoring engine used the IDs of the program elements instead of their names, the above scenario would never pose a problem. Thus, persistent IDs can automatically resolve several types of conflicts in multi-user environments that are unsolvable within the name-based paradigm [2].

To make a name-based refactoring engine behave like an ID-based one, our merging algorithm uses two approaches. The first is reordering the operations (e.g., replaying the renaming of method A.m1() before the renaming of class A). The second is modifying the refactoring engine so that, in addition to changing source code, it also changes subsequent refactorings (e.g., during the replay of renaming class A to B, the refactoring engine changes the representation of the refactoring A.m1()->A.m2() to B.m1()->B.m2()).

During code development, refactoring operations are intermixed with edit operations. We designed a merging algorithm that handles this reality of code evolution. The algorithm first identifies all the operations (both edits and refactorings) that were performed on the versions to be merged. Next, the algorithm searches for syntactic, semantic, and dependency conflicts. Syntactic and semantic conflicts are resolved semi-automatically by involving the user, while dependency conflicts are resolved automatically by a topological sort algorithm. The sorting algorithm finds an order in which to replay refactorings such that all refactorings can proceed. Next, the algorithm undoes all the refactorings that were performed by Alice and Bob and then textually merges all the edit operations. Lastly, on the textually merged version, the algorithm replays all the refactorings in the order determined previously. Replaying the refactorings last has the advantage of incorporating the refactoring semantics into all the textual edits that Alice and Bob performed. For instance, replaying the renaming of A.m1() to m2() updates all the call sites of m1() that were introduced as editing operations by Alice or Bob.
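The reordering step can be sketched for the Alice and Bob scenario as follows. This is a simplified illustration under stated assumptions, not MolhadoRef's algorithm: the `Rename` class, the `must_precede` rule (a member rename goes before the rename of its enclosing entity), and the helper functions are hypothetical, and a program is modeled as nothing more than a set of qualified names. The ordering itself uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Rename:
    """A rename refactoring identified purely by qualified names (hypothetical)."""
    old: str
    new: str


def must_precede(first, second):
    """Assumed dependency rule: `first` addresses a member through a name
    that `second` would rename away, so `first` has to be replayed earlier."""
    return first.old.startswith(second.old + ".")


def replay_order(refactorings):
    """Topologically sort refactorings so that every one of them can be replayed."""
    sorter = TopologicalSorter()
    for r in refactorings:
        sorter.add(r)
    for first in refactorings:
        for second in refactorings:
            if first != second and must_precede(first, second):
                sorter.add(second, first)  # `first` is a predecessor of `second`
    return list(sorter.static_order())


def apply_rename(name, r):
    """Rename the entity itself and any member addressed through it."""
    if name == r.old:
        return r.new
    if name.startswith(r.old + "."):
        return r.new + name[len(r.old):]
    return name


def replay(names, ordered):
    """Name-based replay: fails when a refactoring's target no longer exists."""
    names = set(names)
    for r in ordered:
        if r.old not in names:
            raise KeyError(f"cannot replay {r}: {r.old} no longer exists")
        names = {apply_rename(n, r) for n in names}
    return names


alice = Rename("A", "B")          # Alice renames class A to B
bob = Rename("A.m1", "A.m2")      # Bob renames method A.m1() to A.m2()
order = replay_order([alice, bob])
merged = replay({"A", "A.m1"}, order)
print(order)
print(sorted(merged))
```

Replaying in the unsorted order `[alice, bob]` raises `KeyError` in this sketch, which mirrors the name-based failure described above; a cyclic set of dependencies would make `static_order()` raise `graphlib.CycleError` and would need a different resolution strategy.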
2.3 Tool Implementation
We implemented MolhadoRef, a semantic, operation-based SCM system, as an Eclipse plugin. MolhadoRef uses the Eclipse Java editor as the front end and Molhado as the SCM back end. MolhadoRef connects two systems that work in different paradigms. Eclipse editors operate at file-level granularity. The Molhado framework models source code entities at a finer level of granularity than file-based systems. Eclipse offers a name-based refactoring engine whereas MolhadoRef emulates an ID-based refactoring engine. In general, developers modify their program entities and apply refactoring operations. When they check in, all the changes to program entities as well as the refactoring operations applied to them are recorded. The refactorings are captured using the new record-replay refactoring engine in Eclipse 3.2 or can be inferred using RefactoringCrawler [1].

3. Conclusions
Refactoring changes create problems for the current SCM tools that operate at the file level. As a result, the history of refactored entities is lost. We propose a novel SCM system, MolhadoRef, that is aware of program entities and the refactoring operations that change them. Because MolhadoRef uses a unique identifier for each program element, it can track the history of refactored program elements. In addition, MolhadoRef can intelligently merge refactoring operations with manual edits. We believe that the availability of such semantics-aware, refactoring-tolerant SCM tools is going to encourage programmers to be even bolder when refactoring. The reader can find screenshots and download MolhadoRef at: netfiles.uiuc.edu/dig/MolhadoRef.

4. Acknowledgments
The first author is very grateful to IBM for an Eclipse Innovation Grant, Agile Alliance for a travel grant, and UIUC CS department for an Outstanding Mentoring fellowship.

References
[1] D. Dig, C. Comertoglu, D. Marinov, and R. Johnson. Automatic detection of refactorings in evolving components. In ECOOP’06: European Conference on OO Programming, pages 404–428, 2006.
[2] D. Dig, T. N. Nguyen, and R. Johnson. Refactoring-aware software configuration management. Technical Report UIUCDCS-R-2006-2710, UIUC, April 2006.
[3] Eclipse Foundation. http://eclipse.org.
[4] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
[5] J. Henkel and A. Diwan. Catchup!: capturing and replaying refactorings to support API evolution. In ICSE’05: Proceedings of International Conference on Software Engineering, pages 274–283, 2005.
[6] JetBrains Corp. http://www.jetbrains.com/idea.
[7] E. Lippe and N. van Oosterom. Operation-based merging. In SDE5: Proceedings of Symposium on Software Development Environments, pages 78–87. ACM Press, 1992.
[8] T. N. Nguyen, E. V. Munson, J. T. Boyland, and C. Thao. An infrastructure for development of object-oriented, multi-level configuration management services. In ICSE’05: Proceedings of International Conference on Software Engineering, pages 215–224. ACM Press, 2005.