Adaptable Model Versioning based on Model Transformation By Demonstration
DISSERTATION submitted in partial fulfillment of the requirements for the degree of
Doktor der technischen Wissenschaften
by
Philip Langer
Registration Number 0325934
to the Faculty of Informatics at the Vienna University of Technology
Advisor: o.Univ.-Prof. Dipl.-Ing. Mag. Dr. Gerti Kappel
The dissertation has been reviewed by:
(o.Univ.-Prof. Dipl.-Ing. Mag. Dr. Gerti Kappel)
(Prof. Dr. Jeff Gray)
Wien, 15.11.2011 (Philip Langer)
Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.ac.at
Declaration of Authorship
Philip Langer, Kulmgasse 32/9, 1170 Wien
I hereby declare that I have written this thesis independently, that I have completely listed all sources and aids used, and that I have clearly marked as borrowings, in every case citing the source, all parts of this thesis, including tables, maps, and figures, that were taken verbatim or in substance from other works or from the Internet.
(Place, Date)
(Signature of Author)
Acknowledgements
This thesis would not have been possible without the valuable contributions of many people. I owe my gratitude to all those who supported me during the time I worked on this thesis and who made this time a precious experience for me. I am very grateful to Dr. Gerti Kappel, who provided me with every kind of support and the freedom to explore. She constantly encouraged me to aim high and to keep pushing forward, while also giving me the guidance to get back on track when I struggled. I am indebted to Dr. Jeff Gray, who supported me with valuable feedback and always kindly encouraged me to succeed with my thesis, despite the geographical distance and time difference. My deepest gratitude goes to Dr. Martina Seidl and Dr. Manuel Wimmer, who have been both great mentors and friends throughout my time working on this thesis. Martina and Manuel have always been there to listen and give advice. I am deeply grateful for their insightful comments and constructive criticism. I am also thankful to the entire AMOR team, Petra Brosch, Dr. Gerti Kappel, Dr. Werner Retschitzegger, Dr. Wieland Schwinger, Dr. Martina Seidl, Konrad Wieland, and Dr. Manuel Wimmer, who worked closely with me to build the basis of this thesis. In particular, I wish to thank Konrad Wieland, with whom I studied for five years and shared an office for three years. He has always been a great colleague and friend. I would not have enjoyed my time in academia so much without him. I would also like to acknowledge all the other outstanding researchers I had the honor to collaborate with during the last three years. My thanks go to Dr. Jordi Cabot, Dr. Claudia Ermel, Dr. Markus Herrmannsdoerfer, Dr. Birgit Hofreiter, Dr. Christian Huemer, Dr. Horst Kargl, Dr. Maximilian Koegel, Christian Pichler, Dr. Yu Sun, Dr. Gabriele Taentzer, and Dr. Jules White. This thesis would not have been possible without the support of my family, Regine and Hans, Julia and Peter, and Benjamin and Sara.
My family has always been an essential element of my life and is a constant source of advice, concern, and dependability. Last but not least, I am deeply thankful to Andrea Rucker, who has always been there to share both my successes and my failings unconditionally. She encouraged me in difficult times and was very patient when I spent numerous nights and weekends working on this thesis instead of spending time with her. She always provided me with her help, advice, and understanding and constantly gave me the strength and confidence to move on.
Abstract
Model-driven engineering (MDE) is increasingly adopted in academia and industry as a new paradigm that helps software developers cope with the ever-increasing complexity of the software systems they develop. In MDE, software models constitute the central artifacts of the software engineering process; going beyond their traditional use as blueprints, they act as the single source of information for automatically generating executable software. Although MDE is a promising approach to master the complexity of software systems, it so far lacks proper concepts for dealing with the ever-growing size of software systems in practice. Developing a large software system requires a large number of collaborating developers. Unfortunately, collaborative development of models is currently not sufficiently supported. Traditional versioning systems for code fail for models, because they treat models as plain text files and, as a consequence, neglect the graph-based nature of models. A few dedicated model versioning approaches have been proposed, which operate directly on the models rather than on their textual representation. However, these approaches suffer from four major deficiencies. First, they either support only one modeling language or, if they are generic, they do not consider important specifics of a modeling language. Second, they do not allow the specification of composite operations, such as refactorings, and thus, third, they neglect the importance of respecting the original intention behind composite operations when detecting conflicts and constructing a merged model. Fourth, the types of conflicts detectable among concurrently applied operations are insufficient and cannot be extended by users. To address these deficiencies, we present four major contributions in this thesis.
First, we introduce an adaptable model versioning framework that aims at combining the advantages of two worlds: the proposed framework is generic and offers out-of-the-box support for all modeling languages conforming to a common meta-metamodel, but it can also be adapted to enhance the versioning support for specific modeling languages. Second, we propose a novel technique, called model transformation by demonstration, for easily specifying composite operations. Besides being executable, these composite operation specifications also constitute the adaptation artifacts for enhancing the proposed versioning system. With our third contribution, we present a novel approach for detecting applications of specified composite operations without imposing any dependencies on the employed modeling environment. Fourth, we present a novel approach for detecting additional types of conflicts caused by concurrently applied composite operations. Furthermore, we contribute additional techniques for revealing potentially obfuscated or unfavorable merge results. Besides introducing the contributions from a conceptual point of view, we provide an open source implementation of these concepts and present empirical case studies and experiments evaluating their usefulness and ease of use.
Summary (Kurzfassung)
Model-driven engineering (MDE) is increasingly applied as a new software development paradigm, both in academia and in industry. In MDE, models are regarded as the central artifacts of software development; they serve not merely as sketches or drafts, but constitute the sole and complete specification from which executable software is generated. Although MDE is a promising approach that helps developers master the increasing complexity of software systems, means of coping with the growing size of the software systems to be developed are currently lacking. Developing large software systems requires the collaboration of many developers. However, collaborative development of models is currently supported only insufficiently by MDE tools. Conventional versioning systems, one of the most important tools for software code, are unsuitable for models, because these systems consider only the textual representation of models and disregard the graph-like structure of models. To solve this problem, a few versioning systems tailored specifically to models have been proposed, which operate directly on models rather than on their textual representation. Current systems, however, exhibit several shortcomings. First, current systems either support only one specific modeling language, or they are generic and consequently disregard the specifics of modeling languages entirely. Second, existing model versioning systems disregard the importance of composite operations, such as refactorings. Third, these systems fail to detect some important types of conflicts and cannot be extended by users.
To remedy the shortcomings of current systems, we present an adaptable model versioning system that combines the advantages of generic and language-specific versioning systems: on the one hand, it is generic; on the other hand, users can extend it with respect to the specifics of modeling languages. For this purpose, we present a new technique called Model Transformation By Demonstration, which allows composite operations to be specified easily. These specifications are not only automatically applicable, but also serve to extend our versioning system. On the one hand, they enable the detection of applications of the specified operations. On the other hand, they enable the detection of specific conflicts that arise from the concurrent application of composite operations. Moreover, this thesis also addresses the detection of further potentially undesired effects of concurrent changes. The concepts presented in this thesis have been published as an open source implementation and evaluated with empirical case studies and experiments.
Contents
1 Introduction
1.1 Motivation
1.2 Model Versioning in its Infancy
1.3 Contributions
1.4 Thesis Outline
2 State of the Art
2.1 Versioning
2.2 Software Adaptation
2.3 Model Transformation
3 Adaptable Model Versioning
3.1 Motivating Examples
3.2 Categorization of Conflicts
3.3 Design Principles of AMOR
3.4 Technical Infrastructure of AMOR
3.5 Adaptable Merge Process of AMOR
4 Model Transformation By Demonstration
4.1 Endogenous Model Transformation By Demonstration
4.2 Exogenous Model Transformation By Demonstration
4.3 Limitations and Future Work
5 Operation Detection
5.1 Model Matching
5.2 Atomic Operation Detection
5.3 Composite Operation Detection
6 Conflict Detection
6.1 Atomic Operation Conflict Detection
6.2 Composite Operation Conflict Detection
6.3 Signifier Warning Detection
6.4 Inconsistency Detection
6.5 Limitations and Future Work
7 Evaluation
7.1 Model Transformation By Demonstration
7.2 Composite Operation Detection
7.3 Conflict Detection
8 Conclusion
8.1 Contributions of This Thesis
8.2 Limitations and Future Work
8.3 Lessons Learned
A Open Source Implementation
List of Figures
Bibliography
CHAPTER 1
Introduction
1.1 Motivation
Software engineering [NRB69, PI82], the systematic discipline of building high-quality software systems, has a long history going back to the late 1960s. Since then, researchers and practitioners have been struggling to cope with the ever-growing complexity and size of the systems being developed. One way of coping with the complexity of a system has been to raise the level of abstraction of the languages used to specify it. As stated by Smith and Stotts, “the history of programming is an exercise in hierarchical abstraction. In each generation, language designers produce explicit constructs for conceptual lessons learned in the previous generation, . . . ” [SS02]. Besides the complexity of software systems under development, managing their size constitutes another major challenge. As stated by Ghezzi et al., “software engineering deals with the building of software systems that are so large or so complex that they are built by teams of engineers” [GJM02]. Orthogonal to the challenges entailed by the complexity and size of software systems, the demand to constantly evolve a system in order to meet ever-changing and growing requirements constitutes an additional major challenge. In summary, Parnas defines software engineering as the “multi-person construction of multi-version software” [Par75]. More recently, model-driven engineering (MDE) has been proposed as a new paradigm for raising the level of abstraction once again [Béz05, GS03, Sch06]. In MDE, models, being abstractions of the real world, are considered the central artifacts of the software engineering process, going beyond their traditional use as sketches and blueprints. Models constitute the basis and the single source of information for specifying and automatically generating an executable system. Thereby, developers may build models that are less bound to an underlying implementation technology and much closer to the problem domain [Sel03].
Consequently, developers can focus on modeling the problem domain instead of worrying about the implementation details of a solution domain. Ultimately, MDE promises to decouple the developed solution from implementation-specific platforms, to raise the efficiency and ease of developing software, and, by implication, to achieve software of higher quality. In the context of MDE,
the Object Management Group¹ (OMG) has set the prerequisites for the adoption of MDE in practice by standardizing, in the course of the Model Driven Architecture (MDA) initiative [KWB03, Mel04, OMG05b], first the Unified Modeling Language (UML) [OMG03] and later the Meta-Object Facility (MOF) [OMG04], a language for defining modeling languages, as well as the common model exchange format XML Metadata Interchange (XMI) [OMG07]. Although MDE is a promising approach to cope with the ever-growing complexity of systems, it so far lacks proper concepts for dealing with the ever-growing size of the systems built in practice. This, however, is crucial for MDE to succeed as a new paradigm in software engineering [FR07]. Developing a large system requires a large number of developers who collaborate with each other. Unfortunately, collaborative development of models is not yet sufficiently supported by current modeling tools [ABK+09]. As in traditional code-centric software engineering, versioning systems [CW98, Men02] are required that allow several developers to modify the same model concurrently and that are capable of merging the operations applied by all developers into one consolidated version of the model.
1.2 Model Versioning in its Infancy
In traditional code-centric software engineering, text-based versioning systems, such as Git², Subversion³, and CVS⁴, have been successfully deployed to enable the collaborative development of large software systems. Their success is probably due to the fact that they can be used independently of the programming language and the integrated development environment (IDE) in use. Optimistic versioning systems, in particular, have gained remarkable popularity because they enable several developers to work concurrently on the same artifacts instead of pessimistically locking each artifact while one developer changes it. The price to pay for working in parallel with optimistic versioning systems is that, after all developers have finished their work, their operations have to be merged again. Merging can be a tedious task because spatially overlapping modifications raise conflicts, which have to be resolved manually. To enable collaborative modeling among several team members, optimistic text-based versioning systems have been reused for models. Unfortunately, it quickly turned out that text-based comparison and conflict detection are inadequate for models and lead to unsatisfactory results [ABK+09]. This is because such versioning systems consider only the text lines of a textual representation of a model, such as its XMI serialization. As a result, the information stemming from the model’s graph-based structure is destroyed and the associated syntactic information is lost. Furthermore, the textual differences obtained between the serializations of two versions of a model differ strongly from the operations that developers actually performed in their modeling environments. In other words, a single operation applied to a model in a modeling editor often causes several scattered changes to multiple lines of the model’s textual representation. Consequently, when text-based versioning systems are applied to models, the actual model operations are hard to identify correctly, and the reported differences are hardly comprehensible. However, correctly obtaining and understanding the actual model operations is crucial for detecting the relevant conflicts and for creating a correctly merged model that unifies all original operations. Nguyen et al. used the term impedance mismatch [NMB04] to refer to this unfavorable mismatch between the representation of an artifact put under version control and the representation users usually work with. This mismatch is the root of the aforementioned drawbacks.
¹ http://www.omg.org
² http://git-scm.com
³ http://subversion.tigris.org
⁴ http://cvs.nongnu.org
To overcome these drawbacks of applying text-based versioning systems to models, dedicated model versioning approaches have recently been proposed (cf. Section 2.1.2 for a survey). Comparable to syntactic merge approaches [Men02], these approaches do not operate on the textual representation. Instead, they operate directly on the model’s graph-based structure to obtain applied operations, detect conflicts, and eventually create a merged version. However, a careful survey of these approaches revealed the following major deficiencies, which hamper their use in practice.
Deficiency 1: Dependency on the modeling editor versus imprecise versioning. The first task when merging two concurrently modified versions is to obtain the operations that the developers have applied in parallel. There are two approaches for obtaining these operations. On the one hand, they may be identified using model differencing algorithms⁵, which take two versions of a model as well as their common base version as input and compute the model differences by comparing these three states. On the other hand, the operations between two versions of a model may be recorded⁶ directly in the modeling environment as the user performs them.
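The state-based alternative can be illustrated with a minimal Python sketch. It assumes, hypothetically, that model elements carry persistent unique IDs and are stored as flat dictionaries of feature values; real models are typed graphs, and approaches without persistent IDs require a dedicated model matching step. The function `diff` compares one revised version against the common base version and derives add, delete, and update operations, and `three_way_diff` applies it to both developers’ versions. All names are illustrative and do not stem from a particular tool.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified model representation: element ID -> {feature: value}.
# Real models are typed graphs; flat dictionaries are used only for illustration.
Model = Dict[str, Dict[str, Any]]


@dataclass(frozen=True)
class Operation:
    kind: str          # "add", "delete", or "update"
    element: str       # ID of the affected model element
    feature: str = ""  # changed feature (updates only)
    old: Any = None    # value in the base version (updates only)
    new: Any = None    # value in the revised version (updates only)


def diff(base: Model, revised: Model) -> List[Operation]:
    """State-based differencing: derive atomic operations from two model states.

    Elements are matched naively via shared IDs (an assumption); approaches
    without persistent IDs need a dedicated model matching step instead."""
    ops: List[Operation] = []
    for eid in sorted(revised.keys() - base.keys()):
        ops.append(Operation("add", eid))
    for eid in sorted(base.keys() - revised.keys()):
        ops.append(Operation("delete", eid))
    for eid in sorted(base.keys() & revised.keys()):
        before, after = base[eid], revised[eid]
        for feature in sorted(before.keys() | after.keys()):
            if before.get(feature) != after.get(feature):
                ops.append(Operation("update", eid, feature,
                                     before.get(feature), after.get(feature)))
    return ops


def three_way_diff(base: Model, v1: Model,
                   v2: Model) -> Tuple[List[Operation], List[Operation]]:
    """Compute each developer's operations relative to the common base version."""
    return diff(base, v1), diff(base, v2)


if __name__ == "__main__":
    base = {"c1": {"name": "Person"}, "op1": {"name": "getAge", "type": "int"}}
    v1 = {"c1": {"name": "Person"}, "op1": {"name": "getAge", "type": "long"}}
    v2 = {"c1": {"name": "Customer"}, "op1": {"name": "getAge", "type": "int"},
          "a1": {"name": "id"}}
    ops1, ops2 = three_way_diff(base, v1, v2)
    print(ops1)  # one update of op1.type
    print(ops2)  # the addition of a1 and an update of c1.name
```

The sketch also hints at why state-based differencing is considered less precise than recording: its output contains only the net atomic changes, so the order in which operations were performed, and whether several of them jointly form a composite operation such as a refactoring, is not visible.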
In general, operation recording is more precise than model differencing. However, operation recording approaches inherently restrict the choice of modeling editors, because the editor used for modifying the model has to be capable of recording operations and of representing them in a commonly processable format. Yet, among versioning systems for code, such as Subversion and Git, only those that are independent of the editor in use have gained significant adoption in practice. Thus, we conclude that a versioning system with an inherent dependency on the editor in use is unlikely to find broad adoption in practice. In particular, as long as no standardized format for representing operations is available and implemented by the available modeling environments, basing collaborative modeling on operation recording severely contradicts the inherently vendor-neutral approach followed by the MDA initiative. Model differencing, in contrast, is in tune with the goal of vendor independence, but the operations it computes lack precision. The availability of precise operations, however, is crucial for a high-quality merge result. Furthermore, existing approaches are inflexible with respect to the trade-off between generic (i.e., language-independent) and language-specific versioning. Generic versioning systems are applicable to all modeling languages conforming to a common meta-metamodel. However, such systems offer only limited versioning support because they neglect language-specific operations and conflicts. In contrast, language-specific versioning systems are tightly bound to a modeling language and, therefore, usually provide better versioning quality for that specific language. However, this inflexibility poses a major drawback given the rapidly growing number of domain-specific modeling languages (DSMLs) [GTK+07, KT08]. Moreover, it is very likely that several modeling languages are used concurrently within a single project. Using language-specific versioning systems would then entail using several versioning systems in one single project, one for each set of supported modeling languages, which is usually infeasible. The issue of generic versus language-specific merging was already raised by Westfechtel [Wes91]: “On the one hand, the merge tool should be general, i.e., it should be applicable to arbitrary software documents. [. . . ] On the other hand, the merge tool should be intelligent, i.e., it should be based on a high-level concept of change in order to produce a result, which makes sense.” However, current model versioning approaches still offer no adequate solution to this issue. To summarize, the challenge is to achieve high-quality operation and conflict detection without imposing dependencies regarding the modeling editor in use and the supported modeling languages. How may the imprecision of state-based model differencing be overcome? How may the quality of the conflict detection in generic versioning systems be increased to match the quality offered by versioning systems that incorporate language-specific knowledge? Which language-specific knowledge is necessary for that? How may this knowledge be represented and plugged into a generic model versioning system?
⁵ Also referred to as state-based versioning [BP08, CW98, Men02].
⁶ Also referred to as change- or operation-based versioning [CW98, KHWH10, LvO92, Men02].
Deficiency 2: Tedious specification of composite operations. As in traditional code-centric software development, models are also often subjected to composite operations [SPLTJ01].
A composite operation is a set of cohesive atomic operations that are applied within one transaction to achieve one common goal. The most prominent class of such composite operations are refactorings, as introduced by Opdyke [Opd92] and further elaborated by Fowler et al. [FBB+99]. Refactorings have well-defined preconditions, which specify whether a refactoring may be applied to the current state of an artifact, and comprise a set of actions describing how to modify, or “refactor”, the current state to obtain an improved structure. As stated by Dig et al. [DMJN08], knowing and considering applied refactorings in the versioning process significantly improves the quality of the merge, because the intention behind the operations constituting a refactoring can then be taken into account while merging. The importance of considering refactorings in the context of the parallel evolution of software has also been stressed by Mens et al. [MTR05]; knowledge about applied refactorings enables the detection of so-called “structural refactoring conflicts”. Furthermore, information about applied composite operations helps other developers to better understand the evolution of a software artifact [KHvW+10]. Current model versioning approaches, however, largely neglect the importance of considering composite operations in model versioning. To enable model versioning systems to consider composite operations in the merge process, composite operations first have to be specified clearly. Such a specification must include the operation’s precise preconditions as well as its mechanics (i.e., the comprised atomic operations). Composite operations are inherently specific to a certain modeling language. Supporting many modeling languages therefore requires clearly specifying all relevant composite operations for each language of interest. Considering domain-specific modeling languages in particular, a vast number of composite
operations would have to be developed manually. As it seems impossible to pre-specify all conceivable composite operations, developers themselves should ideally be enabled to specify composite operations on their own. Developers, however, are usually not trained to develop composite operations, or model transformations in general, with explicit preconditions using existing model transformation techniques [SW08, Var06]. Therefore, the challenge is to develop an approach that eases the burden of creating well-defined specifications of composite operations. How may developers who are not trained in model transformation techniques be enabled to develop model transformations on their own? Deficiency 3: Absence of information on applied composite operations. To take composite operations into account, their applications have to be explicitly available in the list of operations obtained between two versions of a model. One way to make applications of composite operations explicit is to record the operations directly in the modeling editor. However, such recording approaches strongly depend on the modeling editor (cf. Deficiency 1). Moreover, a set of manually applied atomic operations that together have the intent of a composite operation (which indeed happens frequently in practice [MHPB09]) cannot be identified by operation recording approaches, because no explicit composite action has been executed in the modeling editor. When refraining from recording operations directly, state-based model differencing approaches have to be used. However, current model differencing approaches are not capable of detecting applications of composite operations, because the a posteriori detection of such applications is still an open issue.
As a result, information about applied composite operations is unavailable, although it is the crucial prerequisite for considering them in the merge process. Hence, the challenge is to build a model differencing approach that is capable of detecting applications of composite operations a posteriori. Enabling the detection of composite operation applications, which are inherently language-specific, is even more challenging when a generic model differencing algorithm is to be applied for the sake of language independence. How may applications of composite operations be identified by a generic algorithm that solely analyzes two subsequent versions of a model? How may developers easily extend the set of detectable composite operations? Deficiency 4: Insufficient conflict detection. Existing model versioning systems fail to correctly detect all relevant conflicts [ABK+09]. Admittedly, no full consensus has yet been established in the model versioning research community concerning which conflicts are indeed relevant. Whether a scenario should be classified as a conflict often depends on how a modeling language is used, the goal of the modeling project, the phase of a project, or even personal preferences. Hence, adaptability of the conflict detection component is all the more important, because it enables developers to decide, depending on their use case, for which scenarios a conflict should be reported. However, current model versioning systems mostly provide no means for adaptation. Only very few systems support some basic configuration, such as the unit of comparison, but they do not allow users to perform more sophisticated customizations with respect to language-specific knowledge. Consider, for instance, an operation in a UML class diagram, which is primarily signified by its name, its return type, and its parameters. If developer 1
modifies an operation’s return type and developer 2 concurrently changes the name of the same operation, naively merging both modifications is very likely to lead to an unfavorable result, because both developers modified the primary meaning of the same operation without being aware of each other’s modification. Worse, the two modifications do not overlap spatially, which is why current model versioning systems would not raise a conflict. As mentioned earlier, composite operations often have specific preconditions restricting the scenarios in which they may be applied. If developer 1 performs operations that violate the preconditions of a composite operation application performed by developer 2, a conflict should be raised; otherwise, the composite operation cannot be applied correctly in the merge process, which might lead to an erroneously merged model. Moreover, knowledge about composite operations gives a set of atomic operations a higher-level meaning, which more precisely reflects the original intention of the modeler who performed those operations. Being aware of this higher-level meaning, a model versioning system should respect this intention while merging and, for instance, also include in the composite operation’s application those model elements that have been concurrently added by the other developer. However, current model versioning systems fail to raise conflicts with respect to the preconditions of composite operations and neglect the original intention behind applied composite operations. Addressing this deficiency poses several challenges, especially when aiming to use a generic conflict detection component for the sake of language independence. What are the specifics of a modeling language that should be considered by a model versioning system in order to increase the quality of the conflict detection? How can these specifics be configured by users?
How can a generic conflict detection component be designed to take those configured specifics into account? In the context of composite operations, it is currently unclear when a conflict should be raised. Which types of conflicts may occur in scenarios that involve composite operations? How may such conflicts be detected by a generic conflict detection component for a user-extensible set of custom composite operations? How may a generic model versioning system also incorporate the developer’s original intention behind composite operations in the merge process?
1.3
Contributions
The overall goal of this thesis is to provide precise operation and conflict detection in the context of model versioning without imposing dependencies on the modeling editor or modeling language used. Nevertheless, language-specific composite operations should be considered, and merge conflicts resulting from them should be detected. Before we discuss each contribution in detail, we briefly outline the applied versioning process (cf. Figure 1.1), as this process constitutes the context of the contributions presented in this thesis. In the course of this thesis, we apply a versioning process that is referred to as the check-out/check-in protocol in the literature [ELH+ 05]. According to this process, developers may concurrently check out the latest version Vo of a model from a common repository at time t0 (cf. Figure 1.1). Thereby, a local working copy of Vo is created. Both developers may independently modify their working copies in parallel. As soon as one developer completes her work (assume this is developer 1), she performs a check-in at t1. Because no other developer performed a check-in in the meantime, her working copy can be saved directly as a new revised version Vr1 in the repository. When developer 2 completes her task and performs the check-in, the versioning system recognizes that a new version has been created since her check-out. Therefore, the merge process is triggered at t2 in order to merge the new version Vr1 in the repository with the version Vr2 of developer 2. Once the merge is carried out, the resulting merged version, which incorporates the operations of developer 1 as well as those performed by developer 2, is saved in the repository.
Figure 1.1: Versioning Process (developers 1 and 2 check out Vo at t0; developer 1 checks in Vr1 at t1; the check-in of Vr2 by developer 2 triggers the merge process, comprising operation detection, operation-based conflict detection, conflict-tolerant merge, state-based conflict detection, and resolution; time points t0 to t3)
Within the versioning process, the merge process comprises the most sophisticated steps, with the goal of unifying all concurrently performed operations of the involved developers and obtaining a consolidated merged model version. Ideally, this merged model version reflects the intentions of all developers without introducing any errors. The first step of the merge is operation detection, which aims to identify the operations that the developers have applied to their working copies. In the next step, namely operation-based conflict detection, all concurrent operations of both developers that interfere with each other are revealed. Besides operation-based conflicts, we also aim to detect state-based conflicts in a model to which all operations of both developers have been applied. By state-based conflicts, we refer to violations of the modeling language’s validation rules, which stem from the metamodel and from additional context conditions expressed in the Object Constraint Language (OCL) [OMG10]. To reveal such conflicts, we first have to compute a merged model version to be checked against the validation rules. As operation-based conflicts might have been identified previously, we apply a conflict-tolerant merge, which is capable of tolerating operation-based conflicts and creates a merged version in every case. The resulting merged model is the input for the state-based conflict detection, which determines whether the merged model is well-formed and valid in terms of the modeling language’s rules. If conflicts have been identified, the involved developers first have to specify a resolution for the raised conflicts in order to obtain a consolidated model, which is finally saved as the new version in the repository.
This general merge process has been jointly elaborated by all project participants⁷ in the course of the research project AMOR⁸. This thesis contributes solutions for creating the adaptation artifacts and for realizing the steps in the merge process marked by C1 to C4 in Figure 1.2, where the numbers 1 to 4 denote the respective contribution numbers introduced below. The remaining adaptation artifacts, namely the approaches for specifying match rules and validation rules, are adapted and integrated from existing work; the other steps in the merge process, namely conflict-tolerant merge and resolution, are not the particular focus of this thesis. For more information on these two steps, we kindly refer to the Ph.D. theses by Brosch [Bro11] and Wieland [Wie11], which have also been elaborated in the course of the project AMOR. In the following, we outline the contributions of this thesis in more detail.
Figure 1.2: Contributions of this Thesis (adaptation artifacts: Match Rules, Operation Specifications, Conflict Specifications, Signifier Specifications, Validation Rules; editors: Operation Specification Editor, Conflict Specification Editor, Adaptation By Demonstration; merge steps: Operation Detection, Operation-based Conflict Detection, Conflict-tolerant Merge, State-based Conflict Detection, Resolution; legend: CX = contribution no. X of this thesis, existing work is used, «produces» and «adapts» relationships)
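To make the check-in logic of the check-out/check-in protocol concrete, the following Python sketch shows a toy repository. It stores a working copy directly when no other check-in has happened since the check-out. Otherwise, it triggers a merge of the common origin, the current head, and the working copy. This is a minimal illustration, not the AMOR implementation. The `Repository` class, its methods, the dictionary-based model representation, and the `naive_merge` placeholder (which ignores deletions and conflicts) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Model = Dict[str, str]  # hypothetical model representation: element id -> element value

def naive_merge(origin: Model, left: Model, right: Model) -> Model:
    """Placeholder merge: takes every changed or added element; ignores deletions and conflicts."""
    merged = dict(origin)
    for version in (left, right):
        for eid, value in version.items():
            if origin.get(eid) != value:
                merged[eid] = value
    return merged

@dataclass
class Repository:
    """Toy repository following the check-out/check-in protocol (hypothetical API)."""
    merge: Callable[[Model, Model, Model], Model]
    versions: List[Model] = field(default_factory=list)

    def check_out(self) -> Tuple[int, Model]:
        """Return the number of the latest version and a local working copy of it."""
        head = len(self.versions) - 1
        return head, dict(self.versions[head])

    def check_in(self, base: int, working_copy: Model) -> int:
        """Store the working copy, merging first if someone checked in since `base`."""
        head = len(self.versions) - 1
        if base == head:
            # No concurrent check-in: the working copy becomes the new version directly.
            self.versions.append(dict(working_copy))
        else:
            # A newer version exists: merge origin, repository head, and working copy.
            merged = self.merge(self.versions[base], self.versions[head], working_copy)
            self.versions.append(merged)
        return len(self.versions) - 1

# Usage: two developers concurrently check out Vo.
repo = Repository(merge=naive_merge, versions=[{"c1": "Customer"}])
base1, copy1 = repo.check_out()
base2, copy2 = repo.check_out()
copy1["c1"] = "Client"               # developer 1 renames a class
copy2["c2"] = "Order"                # developer 2 adds a class
print(repo.check_in(base1, copy1))   # 1: stored directly as Vr1
print(repo.check_in(base2, copy2))   # 2: merged version
print(repo.versions[2])              # {'c1': 'Client', 'c2': 'Order'}
```

Developer 1 checks in first, so her working copy becomes Vr1 without any merge. The later check-in of developer 2 finds a newer head and is therefore routed through the merge function.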
7 In alphabetical order: Petra Brosch, Gerti Kappel, Philip Langer, Werner Retschitzegger, Wieland Schwinger, Martina Seidl, Konrad Wieland, and Manuel Wimmer
8 AMOR (http://www.modelversioning.org), a research project funded by the Austrian Federal Ministry of Transport, Innovation, and Technology and the Austrian Research Promotion Agency under grant FIT-IT819584
Contribution 1: Adaptable Model Versioning. In line with the main principle of AMOR, the overall contribution of this thesis is an adaptable versioning framework that provides proper versioning support while ensuring generic applicability to various DSMLs. The generic framework offers out-of-the-box support for all modeling languages conforming to a common meta-metamodel. Additionally, it enables users to improve the quality of the versioning capabilities by adapting the framework to specific modeling languages through well-defined adaptation points. Thereby, developers are empowered to flexibly balance the adaptation effort against the required level of versioning support. The adaptation artifacts that can be created and plugged into the system in order to improve the versioning support for specific modeling languages are depicted in Figure 1.2.
Contribution 2: Composite Operation Specifications. Predefined composite operations are helpful for efficient modeling, in particular for automatically executing recurrent refactorings, applying model completions, and introducing patterns into existing models. Moreover, as previously stated, the availability of explicit specifications of composite operations is the prerequisite for considering applications of such operations in the merge process. Composite operations are inherently specific to a certain modeling language. However, as it is infeasible to predefine all relevant operation specifications for all modeling languages in use, developers should be enabled to specify such operations on their own and to adapt the versioning system to take these composite operation specifications into account (cf. Figure 1.2). Composite operations are, in more general terms, endogenous model transformations [MG06], that is, model transformations that incrementally transform an existing model. Consequently, the source and the target metamodel of an endogenous model transformation are the same.
However, specifying new model transformations requires programming skills in dedicated model transformation languages and, by implication, deep knowledge of the respective metamodel [SW08, Var06]. Usually, developers do not have such skills. Therefore, in this thesis, we introduce a method for specifying endogenous model transformations within the user’s modeling language and environment of choice, which enables developers to easily create Operation Specifications (cf. Figure 1.2). This ease of creation is achieved by introducing model transformation by demonstration. Developers apply, or “demonstrate”, the transformation once on an example model. From this demonstration and the provided example model, the generic transformation (i.e., the Operation Specification) is then semi-automatically derived, including its explicit preconditions, the operations to be applied, and its postconditions. For model versioning purposes, endogenous transformations are of major importance, which is why we focus on specifying endogenous transformations in this thesis. However, we also show how this approach can be extended to enable the specification of exogenous transformations [MG06], that is, transformations generating a new target model from an existing source model, where the source and target models may conform to different metamodels. Using this extension, transformations that translate models from one modeling language to another can be specified by demonstration.
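As a rough illustration of how a single demonstration can be generalized, the Python sketch below compares the example model before and after the demonstration. It turns every changed element into a template variable, records the element's metaclass as a precondition, and lists the demonstrated changes as operations. This is a heavily simplified sketch under our own assumptions: dictionary-based models, type-only preconditions, no postconditions, and no user refinement of the derived conditions. It does not reproduce the derivation developed in this thesis, and `derive_operation_spec` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Element = Dict[str, str]      # feature name -> value; "type" holds the metaclass
Model = Dict[str, Element]    # element id -> element

def derive_operation_spec(initial: Model, revised: Model) -> Tuple[List[tuple], List[tuple]]:
    """Generalize one demonstration into (preconditions, operations); simplified sketch."""
    # Every element changed during the demonstration becomes a template variable.
    touched = sorted(eid for eid in initial.keys() | revised.keys()
                     if initial.get(eid) != revised.get(eid))
    var = {eid: f"t{i}" for i, eid in enumerate(touched)}
    preconditions: List[tuple] = []
    operations: List[tuple] = []
    for eid in touched:
        before, after = initial.get(eid), revised.get(eid)
        if before is None:                      # element added during the demonstration
            operations.append(("add", var[eid], after["type"]))
            continue
        # Precondition: an element of the same metaclass must be bound to this variable.
        preconditions.append((var[eid], "type", before["type"]))
        if after is None:                       # element deleted during the demonstration
            operations.append(("delete", var[eid]))
            continue
        for feature in sorted(before.keys() | after.keys()):
            if before.get(feature) != after.get(feature):
                operations.append(("update", var[eid], feature, after.get(feature)))
    return preconditions, operations

# Usage: demonstrate renaming a class and adding a property.
initial = {"c1": {"type": "Class", "name": "Customer"}}
revised = {"c1": {"type": "Class", "name": "Client"},
           "p1": {"type": "Property", "name": "id"}}
print(derive_operation_spec(initial, revised))
```

Unchanged context elements (for instance, a containing package) do not yield preconditions in this sketch. A realistic derivation would also have to capture such context and let users generalize concrete values such as the new name.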
Contribution 3: Operation Detection. The first step of the merge process (cf. Figure 1.2) is to explicitly identify the operations (including atomic as well as composite operations) that have been applied between two versions of a model (e.g., Vo and Vr1 in Figure 1.2). As previously stated, operations applied between two versions of a model can be obtained either by recording them directly in the modeling editor or by applying model differencing. To avoid restricting the choice of editor, we refrain from recording operations and apply model differencing in a two-phase process. First, a match is computed, which describes the correspondences between two versions of a model. Second, differences are obtained by a fine-grained comparison of all corresponding model elements based on the previously computed match. Consequently, the quality of the obtained operations heavily depends on the quality of the computed match. To achieve a high-quality match, we assign universally unique IDs (UUIDs) to each model element and exploit these UUIDs for precisely matching model elements again after they have been modified. However, model elements that have been removed and re-added (e.g., by cut and paste), as well as similar model elements that have been added concurrently, have different UUIDs although they are equal. Hence, they cannot be matched, because the content and characteristics of a model element are not considered in UUID-based matching. Moreover, which characteristics of a model element should be used to decide whether two elements constitute a characteristic-based match is specific to the modeling language. Therefore, we allow developers to specify language-specific match rules, which adapt the behaviour of the match algorithm for elements that could not be matched based on UUIDs. The language for expressing these match rules as well as the framework for evaluating them have been reused from existing work [Kol09].
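The two-phase differencing described above can be sketched in Python as follows. In the first phase, elements are matched by UUID, and language-specific match rules are applied only to the elements left unmatched. In the second phase, corresponding elements are compared in detail. The function names, the dictionary-based model representation, and the example rule `same_type_and_name` are hypothetical. In particular, the sketch does not use the match rule language of [Kol09], and it only reports feature updates, without deriving additions or deletions.

```python
from typing import Callable, Dict, List, Tuple

Element = Dict[str, str]                     # feature name -> value; "type" is the metaclass
Model = Dict[str, Element]                   # UUID -> element
MatchRule = Callable[[Element, Element], bool]

def same_type_and_name(a: Element, b: Element) -> bool:
    """Hypothetical language-specific match rule: same metaclass and same name."""
    return a.get("type") == b.get("type") and a.get("name") == b.get("name")

def match(origin: Model, revised: Model, rules: List[MatchRule]) -> List[Tuple[str, str]]:
    """Phase 1: match by UUID, then apply match rules to the elements left unmatched."""
    pairs = [(uid, uid) for uid in origin if uid in revised]
    unmatched_revised = [uid for uid in revised if uid not in origin]
    for uid_o in (uid for uid in origin if uid not in revised):
        for uid_r in unmatched_revised:
            if any(rule(origin[uid_o], revised[uid_r]) for rule in rules):
                pairs.append((uid_o, uid_r))
                unmatched_revised.remove(uid_r)  # each element is matched at most once
                break
    return pairs

def diff(origin: Model, revised: Model, pairs: List[Tuple[str, str]]) -> List[tuple]:
    """Phase 2: fine-grained comparison of corresponding elements (feature updates only)."""
    updates = []
    for uid_o, uid_r in pairs:
        before, after = origin[uid_o], revised[uid_r]
        for feature in sorted(before.keys() | after.keys()):
            if before.get(feature) != after.get(feature):
                updates.append((uid_o, feature, before.get(feature), after.get(feature)))
    return updates

# Usage: class B was cut and pasted and therefore received the new UUID u9.
origin = {"u1": {"type": "Class", "name": "A"}, "u2": {"type": "Class", "name": "B"}}
revised = {"u1": {"type": "Class", "name": "A2"}, "u9": {"type": "Class", "name": "B"}}
pairs = match(origin, revised, [same_type_and_name])
print(pairs)                          # [('u1', 'u1'), ('u2', 'u9')]
print(diff(origin, revised, pairs))   # [('u1', 'name', 'A', 'A2')]
```

Without the match rule, u2 and u9 remain unmatched and would be reported as a deletion followed by an addition rather than as an unchanged element.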
Based on this improved match, atomic operations may be obtained precisely. However, as motivated above, composite operations, and not only atomic operations, are a valuable source of information for versioning, as they allow for considering the developer’s actual intention behind a set of atomic operations. Therefore, in this thesis, we contribute an a posteriori composite operation detection method by which occurrences of composite operations applied between two versions of a model can be identified. The user-created specifications of composite operations (cf. Contribution 2) are used both for automatically executing them in the modeling environment and for detecting applications of these composite operations. Hence, users may easily extend the set of detectable and executable composite operations by using the aforementioned model transformation by demonstration approach.
Contribution 4: Conflict Detection. Having obtained all atomic and composite operations that have been applied concurrently by two developers, we then have to search for conflicts. We distinguish between two types of conflicts, namely operation-based conflicts and state-based conflicts. Operation-based conflicts denote two concurrently applied operations that interfere with each other. Such conflicts occur if, for instance, one developer deletes a model element and another developer modifies the same model element. Obviously, both operations cannot be applied without losing the effect of one of them. For detecting such conflicts, we introduce dedicated conflict detection patterns in this thesis. Besides operation-based conflicts between atomic operations, we also have to regard operation-based conflicts arising from the application
of composite operations. Such a conflict arises, for instance, if a composite operation applied by one developer can no longer be applied after a concurrent atomic operation performed by the other developer, because the atomic operation modifies the model in a way that causes the preconditions of the composite operation to fail. Therefore, we present an algorithm to identify situations in which applications of composite operations are interfered with by concurrent operations of the opposite developer. Using the contributed conflict detection algorithms, which respect atomic operations as well as composite operation specifications, we are able to detect a wide range of important conflicts. However, for certain modeling languages, developers might want to adapt the versioning system to raise additional warnings with respect to language-specific knowledge. Consider again the example mentioned above, in which two developers concurrently modify the same operation in a UML class diagram: developer 1 changes the operation’s name, while developer 2 modifies its return type. A generic model versioning system is not aware that an operation’s return type, in combination with its name and its parameters, conveys the superior meaning of an operation. As a result, no warning is raised for these parallel modifications, because they indeed do not overlap spatially, although they concurrently modify the superior meaning of the same operation, potentially leading to unrecognized contradictions. To address this deficiency, we introduce an adaptation point allowing users to specify so-called signifiers of the model element types of their modeling languages (cf. Signifier Specifications in Figure 1.2). By signifier, we refer to a combination of specific features of a model element type that convey the superior meaning of its instances (e.g., the name, the return type, and the parameters of a UML operation).
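The two kinds of operation-based conflicts described in this contribution can be illustrated with a short Python sketch. It checks a delete-update pattern between two lists of atomic operations. It then re-evaluates the preconditions of one developer's composite operation applications on the model obtained by applying the other developer's atomic operations. The sketch is simplified and rests on our own assumptions; it is not the detection algorithm of this thesis. The operation tuples, `CompositeApplication`, and the example precondition `classes_exist` (imagined here for an "extract superclass" refactoring) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Model = Dict[str, Dict[str, str]]   # element id -> features (incl. "type")
Op = Tuple[str, ...]                # ("delete", id) or ("update", id, feature, value)

def apply_atomic(model: Model, ops: List[Op]) -> Model:
    """Apply atomic delete/update operations to a copy of the model."""
    result = {eid: dict(features) for eid, features in model.items()}
    for op in ops:
        if op[0] == "delete":
            result.pop(op[1], None)
        elif op[0] == "update" and op[1] in result:
            result[op[1]][op[2]] = op[3]
    return result

def delete_update_conflicts(ops1: List[Op], ops2: List[Op]) -> List[str]:
    """Ids of elements deleted by one developer and updated by the other."""
    def ids(ops: List[Op], kind: str) -> Set[str]:
        return {op[1] for op in ops if op[0] == kind}
    return sorted((ids(ops1, "delete") & ids(ops2, "update"))
                  | (ids(ops2, "delete") & ids(ops1, "update")))

@dataclass
class CompositeApplication:
    """A detected application of a composite operation (hypothetical structure)."""
    name: str
    bindings: Dict[str, str]                               # template variable -> element id
    precondition: Callable[[Model, Dict[str, str]], bool]  # evaluated on a model state

def violated_composites(origin: Model, applied: List[CompositeApplication],
                        opponent_ops: List[Op]) -> List[str]:
    """Names of composite applications whose preconditions fail on the opponent's version."""
    opponent_version = apply_atomic(origin, opponent_ops)
    return [c.name for c in applied if not c.precondition(opponent_version, c.bindings)]

def classes_exist(model: Model, bindings: Dict[str, str]) -> bool:
    """Hypothetical precondition: every bound element still exists and is a class."""
    return all(model.get(eid, {}).get("type") == "Class" for eid in bindings.values())
```

For example, if developer 1 extracts a superclass from the classes c1 and c2 while developer 2 deletes c2, `violated_composites` reports the extraction, because its precondition no longer holds on the version of developer 2.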
To detect such concurrent modifications of signifiers, we present a dedicated detection algorithm, which analyses the concurrent modifications of a model element’s signifier based on the language-specific signifier specifications provided by the user. State-based conflicts denote violations of the validation rules of a modeling language in the merged model. Such violations are also referred to as inconsistencies in the literature. Validation rules for checking consistency are inherently specific to a modeling language and may, therefore, be plugged into the system. Once they are plugged in, the versioning system validates each merged model using the specified validation rules and raises additional conflicts if a rule is violated. Well-formedness and validation rules are part of the modeling language definition. Therefore, we reuse those definitions and apply existing validation frameworks to reveal state-based conflicts.
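The signifier check and the state-based validation described above can be sketched in Python as follows. The signifier specification for UML operations mirrors the example in the text (name, return type, and parameters). The feature names, the dictionary-based model, and the validation rule `unique_class_names` are hypothetical. Real validation rules would come from the metamodel and from OCL constraints evaluated by an existing validation framework. The sketch also reports elements on which both developers changed the very same feature, although a real system would rather report those as update-update conflicts.

```python
from typing import Callable, Dict, List, Set, Tuple

Model = Dict[str, Dict[str, str]]   # element id -> features (incl. "type")
Update = Tuple[str, str]            # (element id, modified feature)

# Hypothetical signifier specification: metaclass -> features conveying its instances' meaning.
SIGNIFIERS: Dict[str, Set[str]] = {"Operation": {"name", "returnType", "parameters"}}

def signifier_warnings(origin: Model, updates1: List[Update], updates2: List[Update],
                       signifiers: Dict[str, Set[str]] = SIGNIFIERS) -> List[str]:
    """Ids of elements whose signifier has been modified by both developers."""
    def touched(updates: List[Update]) -> Set[str]:
        return {eid for eid, feature in updates
                if eid in origin
                and feature in signifiers.get(origin[eid].get("type", ""), set())}
    return sorted(touched(updates1) & touched(updates2))

Rule = Callable[[Model], List[str]]  # a validation rule returns the ids of violating elements

def unique_class_names(model: Model) -> List[str]:
    """Hypothetical validation rule: class names must be unique."""
    seen: Set[str] = set()
    violations: List[str] = []
    for eid, element in sorted(model.items()):
        if element.get("type") == "Class":
            if element.get("name") in seen:
                violations.append(eid)
            else:
                seen.add(element.get("name"))
    return violations

def state_based_conflicts(merged: Model, rules: List[Rule]) -> List[Tuple[str, str]]:
    """Pairs (rule name, element id) for every rule violation in the merged model."""
    return [(rule.__name__, eid) for rule in rules for eid in rule(merged)]
```

With the specification above, renaming an operation in one version and changing its return type in the other yields a warning, whereas a concurrent change of a feature outside the signifier (e.g., a hypothetical `visibility` feature) does not.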
Open Source Implementation. Besides introducing all elaborated approaches from a conceptual point of view, we provide a prototypical implementation of the approaches presented in this thesis. The contributed implementations are based on the Eclipse Modeling Framework⁹ [SBPM08] and are available under the terms of the Eclipse Public License¹⁰ (EPL 1.0). For more information on the contributed implementations, we kindly refer to Appendix A.
9 http://www.eclipse.org/modeling/emf
10 http://www.eclipse.org/legal/epl-v10.html
1.4
Thesis Outline
This thesis is structured according to the previously introduced merge process. Parts of this thesis have been published in peer-reviewed journals, conferences, and workshops. Some initial ideas originate from my Master’s thesis [Lan09] (in German) and have been extended in this Ph.D. thesis. In the following, we give a short overview of the remaining chapters of this thesis and refer to our publications that partially overlap with the content of the respective chapter.
Chapter 2: State of the Art. In the next chapter, we introduce the fundamental concepts of the involved research domains and survey existing approaches in the areas of versioning, software adaptation, and model transformation. Parts of this chapter have also been published in [BKL+ 11a, KLR+ 11].
Chapter 3: Adaptable Model Versioning. In this chapter, we present the big picture of the proposed adaptable model versioning system. In particular, we introduce some motivating examples posing challenges to be solved in this thesis and present the generic AMOR merge process, which has been jointly elaborated by all project participants. Next, we show how this process is extended to be adaptable by users in order to incorporate language-specific knowledge. Parts of this chapter have also been published in [BKS+ 10].
Chapter 4: Model Transformation By Demonstration. The specification of composite operations is the prerequisite for respecting applications of composite operations in the merge process. Therefore, in Chapter 4, we introduce our editor- and language-independent approach for specifying model transformations by demonstration, covering endogenous as well as exogenous transformations. Parts of this chapter have also been published in [BLS+ 09, LWB10, LWK10].
Chapter 5: Operation Detection. In this chapter, we show how operations applied between two successive versions of a model are obtained by only analyzing their states.
In particular, we provide insights into the match function applied for finding corresponding model elements across model versions, into the identification of atomic operations applied between these model versions, and, finally, into how applications of composite operations are detected a posteriori. Parts of this chapter have also been published in [LWB10, TELW10, TELW11].
Chapter 6: Conflict Detection. Having obtained all applied operations, this chapter presents the conflict patterns used to detect operation-based conflicts between atomic operations. Subsequently, we introduce our approach to detecting conflicts involving composite operations. Moreover, we describe the specification as well as the detection of custom language-specific conflicts and, finally, we show how state-based conflicts are revealed. Parts of this chapter have also been published in [LWB10, TELW10, TELW11].
Chapter 7: Evaluation. In this chapter, we provide a detailed evaluation of each contribution presented in this thesis. This involves case studies and empirical user studies, as well as precision/recall analyses and performance tests of the contributed implementations. In addition to the evaluation of our own approach, we also present comparisons with state-of-the-art approaches in the respective fields.
Chapter 8: Conclusion. Finally, the contributions of the thesis are summarized and critically discussed. In this chapter, we point out current limitations and interesting research directions to be addressed in the future.
CHAPTER 2
State of the Art
In this chapter, we introduce the scientific foundations and survey the state of the art in the research areas related to the topics of this thesis. As this thesis is concerned with the versioning of software models, we introduce in Section 2.1 the scientific background of versioning in software engineering, the predecessor of model versioning, and survey existing model versioning systems. Subsequently, we introduce the research area of software adaptation in Section 2.2, because the proposed model versioning system is designed to be adaptable to specific modeling languages. One major adaptation point of the model versioning system concerns composite operations applied to models. Composite operations are, in more general terms, model transformations. Thus, we survey existing model transformation approaches in Section 2.3.
2.1
Versioning
The history of versioning in software engineering goes back to the early 1970s. Since then, software versioning has constantly been an active research topic. As stated by Estublier et al. [ELH+ 05], the goal of software versioning systems is twofold. First, such systems are concerned with maintaining a historical archive of a set of artifacts as they undergo a series of operations. As such, they form the fundamental building block of the entire field of Software Configuration Management (SCM), which deals with controlling change in large and complex software systems. Second, versioning systems aim at managing the evolution of software artifacts performed by a distributed team of developers. Over this long history of research on software versioning, diverse formalisms and technologies have emerged. To categorize this variety of approaches, Conradi and Westfechtel [CW98] proposed version models describing the diverse characteristics of existing versioning approaches. A version model specifies the objects to be versioned, version identification and organization, as well as operations for retrieving existing versions and constructing new versions. Conradi and Westfechtel distinguish between the product space and the version space within version models.
The product space describes the structure of a software product and its artifacts without taking versions into account. In contrast, the version space is agnostic of the artifacts’ structure and copes with the artifacts’ evolution by introducing versions and relationships between versions of an artifact, such as their differences (deltas). Further, Conradi and Westfechtel distinguish between extensional and intensional versioning. Extensional versioning deals with the reconstruction of previously created versions and, therefore, concerns version identification, immutability, and efficient storage. All versions are explicit and have been checked in once before. Intensional versioning deals with the flexible automatic construction of consistent versions from a version space. In other words, intensional versioning allows for annotating versions with properties and querying the version space for these properties in order to derive a new product consisting of a specific combination of different versions. In this thesis, we only consider extensional versioning in terms of having explicit versions, because this kind of versioning is predominantly applied in practice nowadays. Furthermore, we focus on the merge phase in the optimistic versioning process (cf. Figure 1.1). In this section, we first outline the fundamental design dimensions of versioning systems. Subsequently, we present some representatives of versioning systems using different designs. Finally, we use an example to elaborate on how different design choices affect the quality of the merged version.
2.1.1
Fundamental Design Dimensions for Versioning Systems
Current approaches to merging two versions of one software artifact (software models or source code) can be categorized according to two basic dimensions (cf. Figure 2.1). The first dimension concerns the product space, in particular, the artifact representation. This dimension denotes the representation of a software artifact on which the merge approach operates. The used representation may either be text-based or graph-based. Some merge approaches operate on a tree-based representation; however, we consider a tree a special kind of graph in this categorization. The second dimension is orthogonal to the first one and concerns how deltas are identified, represented, and merged in order to create a consolidated version. Existing merge approaches operate either on the states, that is, the versions of an artifact, or on identified operations that have been applied between a common origin model (cf. Version 0 in Figure 1.1) and the two successors (cf. Version 1 and 2 in Figure 1.1). When merging two concurrently modified versions of a software artifact, conflicts might inevitably occur. The most basic types of conflicts are update-update and delete-update conflicts. Update-update conflicts occur if the same element has been updated in both versions, whereas delete-update conflicts are raised if an element has been updated in one version and deleted in the other. A detailed discussion of more complex types of conflicts is given in Chapter 3. For more information on software merging in general, the interested reader is referred to [Men02]. Text-based merge approaches operate solely on the textual representation of a software artifact in terms of text files. Within a text file, the atomic unit of versioning may be a paragraph, a line, a word, or even an arbitrary set of characters. The major advantage of such approaches is their independence of the programming languages used in the versioned artifacts.
Since a solely text-based approach does not require language-specific knowledge, it may be adopted for all flat text files. This advantage is probably, besides simplicity and efficiency, the reason for the widespread adoption of pure text-based approaches in practice. However, when flat files are merged without regard to the syntax and semantics of a programming language, both compile-time and run-time errors might be introduced during the merge. Therefore, graph-based approaches emerged, which take syntax and semantics into account.
Graph-based merge approaches operate on a graph-based representation of a software artifact to achieve more precise conflict detection and merging. Such approaches deserialize or translate the versioned software artifact into a specific structure before merging. Mens [Men02] categorized these approaches into syntactic and semantic merge approaches. Syntactic merge approaches consider the syntax of a programming language by, for instance, translating the text file into its abstract syntax tree and, subsequently, performing the merge in a syntax-aware manner. Consequently, unimportant textual conflicts, which are, for instance, caused by reformatting the text file, may be avoided. Furthermore, such approaches may also avoid syntactically erroneous merge results. However, the textual formatting intended by the developers might be lost by syntactic merging, because only a graph-based representation of the syntax is merged, which eventually has to be translated back to text. Westfechtel was among the first to propose a merging algorithm that operates on the abstract syntax tree of a software artifact [Wes91]. Semantic merge approaches go one step further and also consider the static and/or dynamic semantics of a programming language. Therefore, these approaches may also detect issues such as undeclared variables or even infinite loops by using complex formalisms like program dependency graphs and program slicing. Naturally, these advantages over flat textual merging come at the cost of inherent language dependence (cf. [Men02]) and increased computational complexity.
Figure 2.1: Categorization of Versioning Systems (dimensions: artifact representation, text-based vs. graph-based, and delta identification and representation, state-based vs. operation-based; systems shown: SVN, Git, CVS, bazaar, JDiff, EMF Compare, EMF Store, MolhadoRef, and Lippe & Oosterom (1992))
Furthermore, it is not always trivial to point developers to the modifications that caused a conflict. If such a trace back to the causing modifications is missing or inaccurate, it might be difficult for developers to understand and resolve the raised conflicts, since they are reported based on a different representation of the artifact, i.e., the graph, and not on the textual representation the developers are familiar with. The second dimension in Figure 2.1 is orthogonal to the first one and considers how deltas are identified and merged in order to create a consolidated version. This dimension is agnostic of the unit of versioning. Therefore, a versioned element might be a line in a flat text file, a node
in a graph, or whatever constitutes the representation used for merging. State-based merging compares the states, i.e., versions, of a software artifact to identify the differences (deltas) between these versions and merges all differences that do not contradict each other. Such approaches may be applied either to two states (Version 1 and Version 2 in Figure 1.1), called two-way merging, or to three states (including their common ancestor Version 0 in Figure 1.1), called three-way merging. Two-way merging cannot identify deletions, since the common original state is unknown: an element present in only one version may have been either added in that version or deleted in the other. A state-based comparison requires a match function, which determines whether two elements of the compared artifact correspond to each other. The easiest way to match two elements is to search for completely equivalent elements. However, the quality of the match function is crucial for the overall quality of the merge approach. Therefore, graph-based merge approaches in particular often use more sophisticated matching techniques based on identifiers and heuristics (cf. [KN06] for an overview of matching techniques). Model matching is closely related to graph isomorphism problems, which are computationally hard (cf. [KR96]); therefore, matching can be very computation-intensive. If the match function is also capable of matching partially different elements, a difference function is additionally required to determine the fine-grained differences between two corresponding elements. Having these two functions, two states of the same artifact may be merged with the algorithm shown in Algorithm 2.1. Note that this algorithm only serves to conceptually clarify basic state-based merging. It is applicable to both text-based and graph-based merging, where n_X denotes the atomic element n within the product space of Version X; that is, n_o denotes an element in the common origin version, and n_1 or n_2 denote an element in the two revised versions, respectively.
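The conceptual three-way state-based merge of Algorithm 2.1, as walked through in the next paragraph, can be sketched in Python as follows. For simplicity, the sketch assumes that the match function is equality of element identifiers and that the difference function is inequality of element contents. The function name `three_way_merge` and the dictionary representation are hypothetical, and the sketch is not a line-for-line transcription of Algorithm 2.1. When a conflict is raised, the original element is kept in the merged version.

```python
from typing import Dict, List, Tuple

Version = Dict[str, str]  # element id -> element content (ids act as the match function)

def three_way_merge(vo: Version, vr1: Version, vr2: Version) -> Tuple[Version, List[Tuple[str, str]]]:
    """Conceptual three-way state-based merge; returns the merged version and conflicts."""
    vm: Version = dict(vo)  # the merged version starts as a copy of Vo
    conflicts: List[Tuple[str, str]] = []
    for n in vo:
        in1, in2 = n in vr1, n in vr2
        if in1 and in2:
            mod1, mod2 = vr1[n] != vo[n], vr2[n] != vo[n]
            if mod1 and mod2:
                conflicts.append(("update-update", n))  # keep the original element
            elif mod1:
                vm[n] = vr1[n]
            elif mod2:
                vm[n] = vr2[n]
            # not modified at all: the original element stays in vm
        elif in1 or in2:
            # deleted in exactly one version: conflict if modified in the other one
            survivor = vr1 if in1 else vr2
            if survivor[n] != vo[n]:
                conflicts.append(("delete-update", n))
            else:
                del vm[n]
        else:
            del vm[n]  # deleted in both versions
    # elements without a match in Vo have been added in Vr1 or Vr2
    for vr in (vr1, vr2):
        for n, content in vr.items():
            if n not in vo:
                vm[n] = content
    return vm, conflicts

# Usage
vo = {"a": "1", "b": "1", "c": "1", "d": "1", "e": "1"}
vr1 = {"a": "2", "b": "2", "c": "1", "e": "2", "x": "new"}   # d deleted, x added
vr2 = {"a": "1", "b": "3", "d": "1"}                         # c and e deleted
print(three_way_merge(vo, vr1, vr2))
```

In this example, a is taken from Vr1, and c and d are removed because they were deleted on one side and left untouched on the other. Element x is added. Element b yields an update-update conflict, and e yields a delete-update conflict.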
In line 1 of Algorithm 2.1, the merged version V_m is initialized with a copy of V_o. Then, the algorithm iterates through each element n_o in the common origin version V_o of a software artifact. In lines 3 and 4, the elements matching n_o are retrieved from the two modified versions V_r1 and V_r2. However, n_o might have no match in V_r1 or V_r2 because it might have been removed. If n_o has a match in both versions V_r1 and V_r2 (cf. line 5), the algorithm checks whether n_o has been modified in V_r1 and V_r2. If the matching element, either n_1 or n_2, differs from the original element n_o (i.e., it has been modified) in exactly one of the two versions, the modified element is used for creating the merged version (cf. line 7 or line 10). If, however, the matching element differs in both versions, the algorithm raises an update-update conflict (cf. line 13). If the matching element has not been modified at all, the original element n_o is left as it is in the merged version (cf. line 16). Next, the algorithm checks whether n_o lacks a match in one of the two modified versions (i.e., it has been removed there). If so, the algorithm determines whether it has been concurrently modified in the other version and, in this case, raises a delete-update conflict (cf. line 20 and line 24). If the element has not been concurrently modified, it is removed from the merged version (cf. line 21 and line 25). The element n_o is also removed if it has a match in neither modified version; that is, it has been deleted in both versions (cf. line 28). Finally, the algorithm adds all elements from V_r1 and V_r2 that have no match in the original version V_o and, consequently, have been added in V_r1 or V_r2 (cf. line 32 and line 35). Operation-based merging does not operate on the states of an artifact. Instead, the operation sequences that have been concurrently applied to the original version are recorded and analyzed.
Since the operations are directly recorded by the applied editor, operation-based
input: Common origin model V_o, two revised models V_r1 and V_r2
output: The merged model version V_m
1   V_m ← V_o  // Initialize V_m with the contents of V_o
2   foreach n_o ∈ V_o do
3       n_1 ← match(n_o in V_r1)
4       n_2 ← match(n_o in V_r2)
5       if hasMatch(n_o in V_r1) ∧ hasMatch(n_o in V_r2) then
6           if diff(n_o, n_1) ∧ ¬diff(n_o, n_2) then
7               Replace n_o with n_1 in V_m
8           end
9           if ¬diff(n_o, n_1) ∧ diff(n_o, n_2) then
10              Replace n_o with n_2 in V_m
11          end
12          if diff(n_o, n_1) ∧ diff(n_o, n_2) then
13              Raise update-update conflict
14          end
15          if ¬diff(n_o, n_1) ∧ ¬diff(n_o, n_2) then
16              Leave n_o as it is in V_m
17          end
18      end
19      if hasMatch(n_o in V_r1) ∧ ¬hasMatch(n_o in V_r2) then
20          if diff(n_o, n_1) then Raise delete-update conflict
21          else Remove n_o in V_m
22      end
23      if ¬hasMatch(n_o in V_r1) ∧ hasMatch(n_o in V_r2) then
24          if diff(n_o, n_2) then Raise delete-update conflict
25          else Remove n_o in V_m
26      end
27      if ¬hasMatch(n_o in V_r1) ∧ ¬hasMatch(n_o in V_r2) then
28          Remove n_o in V_m
29      end
30  end
31  foreach n_1 ∈ V_r1 do
32      if ¬hasMatch(n_1 in V_o) then Add n_1 to V_m
33  end
34  foreach n_2 ∈ V_r2 do
35      if ¬hasMatch(n_2 in V_o) then Add n_2 to V_m
36  end
Algorithm 2.1: State-based Merge Algorithm
approaches may record not only atomic operations but also composite operations, such as refactorings (e.g., [KHWH10]). Knowledge of applied refactorings may significantly increase the quality of the merge, as stated by Dig et al. [DMJN08]. The downside of operation recording is the strong dependency on the applied editor, since the editor has to record each performed operation and provide this operation sequence in a format that the merge approach is able to process. The directly recorded operation sequence might include obsolete operations, such as updates to an element that is removed later on. Therefore, many operation-based approaches apply a cleansing algorithm to the recorded operation sequence for more efficient merging. The operations within an operation sequence might be interdependent because some operations cannot be applied until other operations have been applied. As soon as the operation sequences are available, operation-based approaches check the parallel operation sequences (Version 0 to Version 1 and Version 0 to Version 2) for commutativity to reveal conflicts (cf. [LvO92]). Consequently, a decision procedure for commutativity is required. Such decision procedures are not necessarily trivial. In the simplest yet least efficient form, each pair of operations within the cross product of all atomic operations in both sequences is applied to the artifact in both possible orders, and both results are checked for equality. If they are not equivalent, the operations are not commutative. After checking for commutativity, operation-based merge approaches apply all non-conflicting (commutative) operations of both sides to the common ancestor in order to obtain a merged model. In comparison to state-based differencing, the recorded operation sequences are, in general, more precise and potentially allow for gathering more information (e.g., change order and refactorings).
In particular, operation-based approaches do not rely on a precise matching technique. Moreover, state-based comparison approaches are, due to their complex comparison algorithms, very expensive regarding their run-time in contrast to operation-based change recording. However, these advantages come at the price of strong editor dependence. Furthermore, part of the computational effort saved compared to state-based matching and differencing is lost again due to operation sequence cleansing and non-trivial commutativity checking. Nevertheless, operation-based approaches scale to large models from a conceptual point of view because their computational effort mainly depends on the length of the operation sequences and, in contrast to state-based approaches, not on the size of the models [KHWH10]. That said, the border between state-based and operation-based merging is sometimes blurry. Although we can clearly distinguish whether the operations are recorded or whether differences are derived from the states, some state-based approaches derive the applied operations from the states and use operation-based conflict detection techniques. However, this is only reasonable if a reliable match function is available, for instance, one using unique identifiers. Conversely, some operation-based approaches derive the states from their operation sequences to check for potentially inconsistent states after merging. Such an inconsistent state might, for instance, be a violation of the syntactic rules of a language. Detecting such conflicts is often not possible by solely analyzing the operation sequences. Ultimately, the conflict detection strategies of state-based and operation-based approaches are very similar from a conceptual point of view. Both check for direct or indirect concurrent modifications to the same element and try to identify illegal states after merging, whether the modifications are explicitly given in terms of operations
or whether they are implicitly derived from a match between two states.
Selected Representatives
In Figure 2.1, we show some representatives for each combination of the two dimensions in the domains of source code versioning and model versioning. In the following, we briefly introduce and compare these representatives. For a more detailed description of existing model versioning approaches, we refer to Section 2.1.2. The combination of text-based and state-based merging is probably the most widely adopted one in practice. For instance, traditional central version control systems, such as CVS1 and SVN2, use state-based three-way merging of flat text files. The smallest indivisible unit of merging in these systems is usually a line within a text file, as is the case for the Unix diff utility [HM76]. Lines are matched across different versions by computing the Longest Common Subsequence (LCS). For efficiency, usually only completely equal lines are matched and, therefore, no dedicated difference function for deriving the actual difference between two lines is required: a line is either matched and, therefore, equal, or unmatched and, therefore, considered to be added or removed at a certain position in the text file. Consequently, parallel modifications can be merged without user intervention as long as they affect different lines. As soon as the same line is modified in both versions (Version 1 and Version 2), or modified in one and concurrently deleted in the other, a conflict is annotated in the merged file. As stated earlier, because these systems are unaware of syntax and semantics, the merge might introduce compile-time and run-time errors. The same applies to the distributed version control systems (DVCS) git3 and bazaar4, since they are also state-based and line-based. Their major difference to SVN and CVS is their distributed nature. DVCS dispense with a single central repository and take a peer-to-peer approach instead.
Developers commit their changes to a local repository, i.e., a peer, and push them to other remote peers as they wish. Besides several other organizational advantages, this enables a higher commit frequency since a commit does not immediately affect other developers. Changes might, therefore, be grouped into atomic commits and pushed to other peers more easily, which is a step towards operation-based merging. MolhadoRef [DMJN08], a representative of text- and operation-based approaches, aims at improving the merge result by considering refactorings applied to object-oriented (Java) programs. Applications of refactorings are recorded in the development environment. When two versions are merged, all recorded refactorings are undone in both modified versions. Then the versions, excluding the refactoring applications, are merged in a traditional text-based manner, and, finally, all refactorings are re-applied to this merged version. This significantly improves the merge result and avoids unnecessary conflicts in many scenarios. However, as already mentioned, this entails a strong dependency on the applied editor because the editor has to provide operation logs. Furthermore, handling refactorings requires language-specific knowledge encoded in the merge component.
1 http://www.cvshome.org
2 http://subversion.tigris.org
3 http://git-scm.com
4 http://bazaar.canonical.com
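The line-based matching of CVS, SVN, and the Unix diff utility described above can be illustrated with a Longest Common Subsequence computed by textbook dynamic programming; unmatched lines are reported as removed or added. This is a simplified sketch assuming exact line equality as the only match criterion; production diff tools use more efficient algorithms, and the function names lcs_pairs and line_diff are hypothetical.

```python
from typing import List, Tuple


def lcs_pairs(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of lines matched by a longest common subsequence.

    Only completely equal lines can match, mirroring line-based version control.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table to recover one optimal set of matched line pairs.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def line_diff(a: List[str], b: List[str]) -> Tuple[List[str], List[str]]:
    """Unmatched lines of a count as removed, unmatched lines of b as added."""
    pairs = lcs_pairs(a, b)
    matched_a = {i for i, _ in pairs}
    matched_b = {j for _, j in pairs}
    removed = [line for i, line in enumerate(a) if i not in matched_a]
    added = [line for j, line in enumerate(b) if j not in matched_b]
    return removed, added
```

Applied to the textual Version 0 and Version 2 of the running example in Figure 2.2, the sketch reports "class Vehicle {" and "integer[0..1] carNo" as removed and "class Car {" and "integer[0..1] regId" as added: a rename is invisible to purely line-based matching.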
Several state-based approaches exist that operate on a graph-based representation of the versioned software artifact. In Figure 2.1, we cite two representatives of graph-based and state-based approaches: one for source code, namely JDiff [AOH07], and one for software models, namely EMF Compare5 [BP08]. JDiff is a graph-based differencing approach for Java source code. Corresponding classes, interfaces, and methods are matched by their qualified name or signature. Because unique identifiers are absent, this matching can also involve the user in order to improve the match of renamed but still corresponding elements. For matching and differencing the method bodies, the approach builds enhanced control-flow graphs representing the statements in the bodies and compares them. Thereby, JDiff can provide information that accurately reflects the effects of code changes on the program at the statement level. EMF Compare is a model comparison framework for EMF-based models. It applies heuristics for matching model elements and can detect differences between matched elements on a fine-grained level (i.e., for the metamodel features of each model element). Matching and differencing are applied to the generic model-based representation of the elements. There are several purely operation-based approaches that record operations directly and apply merging on a graph-based representation. The first paper introducing operation-based merging was published by Lippe and van Oosterom [LvO92]. They propose to record all operations applied to an object-oriented database system. Once the precise change sets are available through recording, they are merged by re-applying all their operations to the common ancestor version. In general, a pair of operations is conflicting if the operations are not commutative. EMFStore [KHWH10] is an operation- and graph-based versioning system for software models.
Since EMF Compare and EMFStore are representatives of model versioning systems, they are further elaborated on in Section 2.1.2.
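The commutativity check underlying operation-based merging, in the simplest yet least efficient form described above, can be sketched as follows: each pair from the cross product of both operation sequences is applied to the common ancestor in both orders, the results are compared, and only operations not involved in any conflict are re-applied. The toy model (element identifiers mapped to feature dicts) and the operation classes SetFeature and Delete are assumptions for illustration; they do not reflect the data structures of EMFStore or of Lippe and van Oosterom's system. Pairs are checked against the ancestor in isolation, so interdependencies within one sequence are ignored.

```python
import copy
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple, Union

# Assumed toy model: element identifier -> {feature name: value}.
Model = Dict[str, Dict[str, object]]


@dataclass(frozen=True)
class SetFeature:
    """Atomic update of one feature of an existing element."""
    element: str
    feature: str
    value: object

    def apply(self, model: Model) -> Optional[Model]:
        if self.element not in model:
            return None  # not applicable: the element does not exist
        result = copy.deepcopy(model)
        result[self.element][self.feature] = self.value
        return result


@dataclass(frozen=True)
class Delete:
    """Atomic deletion of an existing element."""
    element: str

    def apply(self, model: Model) -> Optional[Model]:
        if self.element not in model:
            return None  # not applicable: the element does not exist
        result = copy.deepcopy(model)
        del result[self.element]
        return result


Operation = Union[SetFeature, Delete]


def apply_all(model: Optional[Model], ops: Sequence[Operation]) -> Optional[Model]:
    """Apply operations in order; None signals that one was not applicable."""
    for op in ops:
        if model is None:
            return None
        model = op.apply(model)
    return model


def commute(base: Model, a: Operation, b: Operation) -> bool:
    """Two operations commute if both orders are applicable and agree."""
    ab = apply_all(base, [a, b])
    return ab is not None and ab == apply_all(base, [b, a])


def operation_based_merge(
    base: Model, ops1: Sequence[Operation], ops2: Sequence[Operation]
) -> Tuple[Optional[Model], List[Tuple[int, int]]]:
    """Check the cross product for commutativity, then re-apply the rest."""
    conflicts = [
        (i, j)
        for (i, a), (j, b) in product(enumerate(ops1), enumerate(ops2))
        if not commute(base, a, b)
    ]
    bad1 = {i for i, _ in conflicts}
    bad2 = {j for _, j in conflicts}
    safe = [op for i, op in enumerate(ops1) if i not in bad1]
    safe += [op for j, op in enumerate(ops2) if j not in bad2]
    return apply_all(base, safe), conflicts
```

For instance, an update of a feature of element e2 and a concurrent deletion of e2 do not commute (the update is not applicable after the deletion), so the pair is reported as a conflict and both operations are withheld from the merged model.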
Consequences of Design Decisions
To highlight the benefits and drawbacks of the four possible combinations of versioning approaches in Figure 2.1, we present a small versioning example depicted in Figure 2.2 and conceptually apply each approach to analyze its quality in terms of the detected conflicts and the derived merged version. Consider a small language for specifying classes, their properties, and references linking two classes. The textual representation of this language is depicted in the upper left area of Figure 2.2 and defined by the EBNF-like Xtext6 grammar specified in the box labeled Grammar. The same language and the same examples are depicted in terms of graphs in the lower part of Figure 2.2. In the initial version (Version 0) of the example, there are two classes, namely Human and Vehicle. The class Human contains a property name, and the class Vehicle contains a property named carNo. Now, two users concurrently modify Version 0 and create Version 1 and Version 2, respectively. All operations in Version 1 and Version 2 are highlighted with bold fonts or edges in Figure 2.2. The first user changes the name of the class Human to Person, sets the lower bound of the property carNo to 1 (because every car must have exactly one number), and adds an
5 http://www.eclipse.org/emft/projects/compare
6 http://www.eclipse.org/Xtext
Text-based Representation
Version 0
1: class Human {
2:   string[1..1] name
3: }
4: class Vehicle {
5:   integer[0..1] carNo
6: }
Version 1
1: class Person {
2:   string[1..1] name
3:   Vehicle[0..*] owns
4: }
5: class Vehicle {
6:   integer[1..1] carNo
7: }
Version 2
1: class Human {
2:   string[1..1] name
3: }
4: class Car {
5:   integer[0..1] regId
6: }
Grammar
Class:= "class" name=ID "{" (properties+=Property)* (references+=Reference)* "}";
Reference:= target=[Class] "[" lower=BOUND ".." upper=BOUND "]" name=ID;
Property:= type=ID "[" lower=BOUND ".." upper=BOUND "]" name=ID;
terminal ID:= ('a'..'z'|'A'..'Z'|'_')+;
terminal BOUND:= (('0'..'9')+)|('*');
Graph-based Representation
Version 0: Human : Class contains name : Property (type = string, lower = 1, upper = 1); Vehicle : Class contains carNo : Property (type = integer, lower = 0, upper = 1).
Version 1: Person : Class contains name : Property (type = string, lower = 1, upper = 1) and owns : Reference (lower = 0, upper = *) referring to Vehicle : Class, which contains carNo : Property (type = integer, lower = 1, upper = 1).
Version 2: Human : Class contains name : Property (type = string, lower = 1, upper = 1); Car : Class contains regId : Property (type = integer, lower = 0, upper = 1).
Legend: Containment Edge, Edge.
Figure 2.2: Versioning Example
explicit reference owns to Person. Concurrently, the second user renames the property carNo to regId and the class Vehicle to Car. Text-based versioning. When merging this example with text- and state-based approaches (cf. Figure 2.3a for the result), where the unit of versioning is a single line and the match function only matches completely equal lines (as with SVN, CVS, Git, and bazaar), the first line is correctly merged since it has only been modified in Version 1 and remained untouched in Version 2 (cf. Algorithm 2.1). The same is true for the added reference in line 3 of Version 1 and the renamed class Car in line 4 of Version 2. However, the property carNo shown in line 5 of Version 0 has been changed in both Version 1 (line 6) and Version 2 (line 5). Although different features of this property have been modified (lower and name), these modifications result in a
Figure 2.3: Text-based Versioning Example. (a) State-based Versioning. (b) Operation-based Versioning.
Figure 2.4: Graph-based Versioning Example. (a) State-based Versioning. (b) Operation-based Versioning.
concurrent change of the same line and, hence, a conflict is raised. Furthermore, the reference added in Version 1 refers to the class Vehicle, which no longer exists in the merged version since it has been renamed in Version 2. In summary, text- and state-based merging approaches provide reasonable support for versioning software artifacts. They are easy to apply and work for every kind of flat text file irrespective of the language used. However, erroneous merge results may occur and several “unnecessary” conflicts might be raised. The overall quality strongly depends on the textual syntax. Merging textual languages with a strict syntactic structure (such as XML) might be more appropriate than merging languages that mix several properties of potentially independent concepts into one line. The latter might cause tedious manual conflict and error resolution. One major problem in the merged example resulting from text- and state-based approaches is the wrong reference target (line 3 in Version 1) caused by the concurrent rename of Vehicle. Operation-based approaches (such as MolhadoRef) solve such an issue by incorporating knowledge of applied refactorings into the merge. Since a rename is a refactoring, MolhadoRef would be aware of the rename and resolve the issue by re-applying the rename after a traditional merge is done. The result of this merge is shown in Figure 2.3b. Graph-based versioning. Applying the merge on top of the graph-based representation depicted in Figure 2.2 may also significantly improve the merge result because the unit of versioning is a node in a graph, which represents the versioned software artifact more precisely. However, as already mentioned, this advantage comes at the price of language dependence because merging operates either on the language-specific graph-based representation or a
translation of a language to a generic graph-based structure must be available. Graph- and state-based approaches additionally require a match function for finding corresponding nodes and a difference function for explicating the differences between matched nodes. The precision of the match function significantly influences the quality of the overall merge. Assume matching is based on name and structure heuristics for the example in Figure 2.2. Given this assumption, the class Human may be matched since it contains an unchanged property name. Therefore, the rename of the class Human to Person can be merged without user intervention. However, heuristically matching the class Vehicle might be more challenging because both the class and its contained property have been renamed. If the match does not identify the correspondence between Vehicle and Car, Vehicle and its contained property carNo are considered to be removed and Car is assumed to be added in Version 2. Consequently, a delete-update conflict is reported for the change of the lower bound of the property carNo in Version 1. Also, the added reference owns refers to a removed class, which might be reported as a conflict. This type of conflict is referred to as a delete-use or delete-reference conflict in the literature [TELW10, Wes10]. If, in contrast, the match relies on unique identifiers, the nodes can be matched soundly. Based on this precise match, the state-based merge component can resolve this issue, and the added reference owns correctly refers to the renamed class Car in the merged version. However, the concurrent modification of the property carNo (name and lower) might still be a problem because purely state-based approaches usually take the entire element from either the left or the right version to construct the merged version. Some state-based approaches solve this issue by applying a more fine-grained difference function to identify the detailed differences between two elements.
If these differences do not overlap, as in our example, they can both be applied to the merged element. The result of a graph-based and state-based merge without taking identifiers into account is visualized in Figure 2.4a. Purely graph- and operation-based approaches are capable of automatically merging the presented example (cf. Figure 2.4b). Between Version 0 and Version 1, three operations have been recorded, namely the rename of Human, the addition of the reference owns, and the update of the lower bound of carNo. To get from Version 0 to Version 2, the class Vehicle and the property carNo have been renamed. None of these atomic operations interfere, i.e., they are commutative, and therefore they can all be re-applied to Version 0 in order to obtain a correctly merged version. To sum up, extensive research during the last decades in the domain of traditional source code versioning has led to significant results. Approaches for merging software models draw a lot of inspiration from previous work in the area of source code merging. Especially graph-based approaches for source code merging form the foundation for model versioning. However, one major challenge remains an open problem: the same trade-off as in traditional source code merging has to be made between editor and language independence on the one hand and precision and completeness on the other. Model matching, comparison, and merging, as discussed above, can be significantly improved by incorporating knowledge of the modeling language used, as well as of language-specific composite operations, such as refactorings. On the other hand, model versioning approaches are also forced to support several languages at the same time, because even small MDE projects usually combine several modeling languages. Thus, a generic infrastructure that can be adapted to several modeling languages is valuable, but it is
challenging to design such an infrastructure.
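The fine-grained difference function discussed above for the property carNo can be illustrated by merging one matched element feature by feature. The sketch below assumes that an element is a dict of feature values and that matching has already been done (e.g., via unique identifiers); non-overlapping feature changes from both sides are combined, and only a feature changed differently on both sides is reported as an update-update conflict. The function name merge_features is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation of one matched model element: feature name -> value.
Features = Dict[str, object]


def merge_features(original: Features, left: Features, right: Features) -> Tuple[Features, List[str]]:
    """Three-way merge of one matched element at the level of its features."""
    merged: Features = {}
    conflicts: List[str] = []
    for feature in original.keys() | left.keys() | right.keys():
        o, l, r = original.get(feature), left.get(feature), right.get(feature)
        if l == r:
            value = l                    # unchanged, or changed identically on both sides
        elif l == o:
            value = r                    # changed only on the right side
        elif r == o:
            value = l                    # changed only on the left side
        else:
            value = o                    # overlapping change: keep original, report conflict
            conflicts.append(feature)
        if value is not None:            # None means the feature is absent
            merged[feature] = value
    return merged, sorted(conflicts)
```

Merging carNo from Version 0 (lower = 0) with Version 1 (lower = 1) and Version 2 (name = regId) yields name = regId and lower = 1 without a conflict. Unlike the element-level comparison in Algorithm 2.1, this sketch also accepts identical changes on both sides without a conflict, which is a design choice rather than a requirement.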
2.1.2 State of the Art in Model Versioning
In the previous section, general versioning concepts have been introduced without putting special emphasis on model versioning. These general concepts, which result from extensive research efforts over the past thirty years, constitute the basis for dedicated graph-based model versioning systems, which emerged more recently. In this section, we focus on the state of the art in model versioning and survey existing approaches in this area.
Features of Model Versioning Approaches
In the following, we survey the techniques applied for detecting the operations performed between two versions of a model, as well as the techniques used for detecting conflicts among those operations. Furthermore, we reveal whether these approaches are specifically tailored to a certain modeling language or whether they are generic, in the sense that they are applicable to all modeling languages defined in terms of a common meta-metamodel. If they are generic, we further investigate their adaptability to language-specific aspects. In particular, we consider the following features. Flexibility concerning the modeling language. This feature indicates whether a model versioning system is tailored to a specific modeling language and, therefore, only usable for one modeling language, or whether it is generic and, therefore, supports all modeling languages defined by a common meta-metamodel. Flexibility concerning the modeling editor. Model versioning systems may be designed to work only in combination with a specific editor or modeling environment. This usually applies to approaches using operation recording. In contrast, model versioning systems may avoid such a dependency by operating only on the evolved models put under version control. Operation recording versus model differencing.
As already introduced in Section 2.1.1, we may distinguish between approaches that obtain the operations performed between two versions of a model by operation recording and those that obtain them by model differencing. If an approach applies model differencing, which is, in general, more flexible concerning the adopted modeling editors, it is essential to consider the techniques used in the match function for identifying corresponding model elements, because the quality of the match is crucial for an accurate subsequent operation detection. We may distinguish between match functions that rely on universally unique identifiers (UUIDs) and those applying heuristics based on the model elements' content (i.e., feature values and contained child elements). Relying on UUIDs, even intensively modified model elements can still be matched very efficiently. However, when relying on UUIDs only, model elements that have been concurrently added by two developers obviously do not have comparable UUIDs, although they potentially should be identified as corresponding when considering their contents. The same applies to a model element that has been deleted and then re-added with the same content, since the re-added element receives a new UUID. When the contents of model elements are completely neglected, which is the case when only UUIDs are used, important matches might be missed. Thus, it would be beneficial to combine UUID-based and content-based matching. To summarize, we distinguish between operation recording and model differencing and, in case model differencing is applied, whether UUIDs, content-based heuristics, or both are used for detecting corresponding model elements. Composite operation detection. Knowledge of applied composite operations is a prerequisite for considering them in the merge process. Therefore, we examine whether an operation detection component is also capable of detecting applications of composite operations besides identifying atomic operations. It is worth noting that, in case of model differencing, the state-based a posteriori detection of composite operation applications is highly challenging, as stated in Section 6 of [DCMJ06]. Adaptability of the operation detection. Generic operation detection approaches are, in general, more flexible than language-specific approaches because several modeling languages are very likely to be applied concurrently even within one project and, therefore, should be supported by one model versioning system. However, neglecting language-specific aspects in the operation detection phase might lead to a lower quality of the detected set of applied operations. Therefore, we investigate whether generic operation detection approaches are adaptable to language-specific aspects. In particular, we consider the adaptability concerning language-specific match rules, as well as the possibility to specify language-specific composite operations to be detected. Detection of conflicts between atomic operations.
One key feature of model versioning systems is, of course, their ability to detect conflicts arising from contradictory operations applied by two developers in parallel. Consequently, we first investigate whether the approaches under consideration are capable of detecting conflicts between contradictory atomic operations. Such conflicts occur between two atomic operations, for instance, if one developer updates a feature value of a model element whereas the other developer concurrently deletes the same model element. This type of conflict is often referred to as a delete-update conflict in the literature [BKL+11a, TELW10, Wes10]. Other types of conflicts between atomic operations have also been introduced in the literature, such as update-update conflicts and delete-use conflicts. In this survey, we do not examine precisely which types of conflicts are supported; rather, we investigate whether conflicts arising from contradictory atomic operations are considered at all. Detection of conflicts caused by composite operations. Besides conflicts caused by contradictory atomic operations, conflicts might also occur if a composite operation applied by one developer is no longer applicable after the concurrent operations of another developer have been performed. Such a conflict occurs if a concurrent operation causes the preconditions of an applied composite operation to fail. Therefore, we investigate whether the surveyed model versioning approaches adequately consider composite operations in their conflict detection phase.
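A conflict caused by a composite operation, as described above, can be detected by re-checking the composite operation's preconditions on the model produced by the other developer's concurrent operations. The sketch below assumes a hypothetical composite operation that moves a property from one class to another (a simplified "move property" refactoring) whose preconditions are that both classes exist, that the source class owns the property, and that the target class has no property of the same name; the toy model and all names are illustrative assumptions, not the precondition language of any surveyed system.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Assumed toy model: class name -> names of the properties it owns.
Model = Dict[str, Set[str]]


@dataclass(frozen=True)
class MoveProperty:
    """Hypothetical composite operation: move a property between classes."""
    prop: str
    source: str
    target: str

    def violated_preconditions(self, model: Model) -> List[str]:
        """Return a description of every precondition that fails on the model."""
        violations: List[str] = []
        if self.source not in model:
            violations.append(f"source class {self.source} does not exist")
        elif self.prop not in model[self.source]:
            violations.append(f"{self.source} does not own {self.prop}")
        if self.target not in model:
            violations.append(f"target class {self.target} does not exist")
        elif self.prop in model[self.target]:
            violations.append(f"{self.target} already owns a property {self.prop}")
        return violations


def composite_conflicts(op: MoveProperty, concurrent_result: Model) -> List[str]:
    """A composite operation conflicts if its preconditions fail on the model
    obtained by applying the other developer's concurrent operations."""
    return op.violated_preconditions(concurrent_result)
```

An empty result means the composite operation can still be applied after the concurrent changes; each returned message corresponds to one failed precondition and hence to one reason for a conflict.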
Detection of state-based conflicts. Besides conflicts caused by operations (atomic and composite operations), a conflict might also occur if the merged model contains errors in terms of the modeling language's well-formedness and validation rules. Consequently, we examine whether the model versioning approaches under consideration validate the resulting merged model. Adaptability of the conflict detection. Analogously to the evaluation of the adaptability of the operation detection in generic model versioning systems, we also review the adaptability of the conflict detection to language-specific aspects. This involves techniques to configure language-specific conflict types that cannot be covered by a solely generic analysis of the obtained operations.
Evaluation Results
In this section, we introduce current state-of-the-art model versioning systems and evaluate them on the basis of the features discussed above. The considered systems and the findings of this survey are summarized in Table 2.1 and discussed in the following. Please note that the systems are introduced in alphabetical order, which has no further meaning. ADAMS. The “Advanced Artifact Management System” (ADAMS) offers process management functionality, supports cooperation among multiple developers, and provides artifact versioning [DLFOT06]. ADAMS can be integrated via specific plug-ins into modeling environments to realize versioning support for models. In [DLFST09], De Lucia et al. present an ADAMS plug-in for ArgoEclipse7 to enable versioning support for ArgoUML models. Because artifacts are stored in a proprietary ADAMS-specific format handled by the central repository, models have to be converted into that format before they are sent to the server and translated back to the original format whenever the model is checked out again. ADAMS applies state-based model differencing based on UUIDs.
Added model elements, which consequently have no comparable UUIDs, are matched using simple heuristics based on the element names to find corresponding elements concurrently added by another developer. The differences are computed at the client and sent to the ADAMS server, which finally performs the merge. The ADAMS plug-in for models is specific to ArgoUML models; a dedicated translation has to be provided for each supported model type to allow ADAMS to process these models. Interestingly, ADAMS can be customized to a certain extent. For instance, it is possible to customize the unit of comparison; that is, the smallest unit for which a conflict is raised if it is concurrently modified. In [DLFST09], it is also mentioned that the conflict detection algorithm may be customized for specific model types with user-defined correlation rules, which specify when two operations should be considered as conflicting. However, it remains unclear how exactly these rules are specified and how they influence the conflict detection. The implementation presented in this publication is not available for further review of this interesting customization feature. Composite operations and state-based conflicts are not supported.
7 http://argoeclipse.tigris.org
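The matching strategy described for ADAMS can be sketched as follows: UUIDs are used for elements that already existed in the common ancestor, and a name heuristic is used for concurrently added elements. The model representation (a dictionary from UUID to a metaclass/name pair) and the rule of accepting only unambiguous name matches are assumptions made for illustration. [DLFST09] does not specify the heuristic at this level of detail.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model representation: UUID -> (metaclass, name)
Model = Dict[str, Tuple[str, str]]

def match(base: Model, left: Model, right: Model) -> List[Tuple[str, str]]:
    """Return pairs (left UUID, right UUID) of corresponding elements."""
    # 1. UUID-based matching: identical UUIDs denote the same element.
    pairs = [(u, u) for u in left if u in right]

    # 2. Elements added concurrently on both sides carry different UUIDs,
    #    so they are matched by equal metaclass and name instead.
    added_right: Dict[Tuple[str, str], List[str]] = {}
    for u, key in right.items():
        if u not in base and u not in left:
            added_right.setdefault(key, []).append(u)

    used: Set[str] = set()
    for u, key in left.items():
        if u in base or u in right:
            continue
        candidates = added_right.get(key, [])
        # Accept only unambiguous, not yet used name matches.
        if len(candidates) == 1 and candidates[0] not in used:
            used.add(candidates[0])
            pairs.append((u, candidates[0]))
    return pairs

base = {"1": ("Class", "Person")}
left = {"1": ("Class", "Person"), "a": ("Class", "Address")}
right = {"1": ("Class", "Person"), "b": ("Class", "Address"), "c": ("Class", "Order")}
print(match(base, left, right))  # [('1', '1'), ('a', 'b')]
```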
Table 2.1: Evaluation of State-of-the-art Model Versioning Systems
[Table 2.1 rates ADAMS; Alanen and Porres; Cicchetti et al.; CoObRA; DSMDiff; EMF Compare; EMFStore; Gerth et al.; Mehra et al.; Oda and Saeki; Odyssey-VCS 2; Ohst, Welle, Kelter; RSA; SMOVER; and Westfechtel with respect to the following criteria:
- operation detection: operation recording; model differencing with UUID-based, content-based, and combined matching; adaptability of the match; detection and adaptability of composite operations;
- conflict detection: operation-based conflicts between atomic and between composite operations; state-based conflicts; adaptability;
- flexibility regarding the modeling language and the modeling editor.]
Legend: ✓ Feature applies. ~ Feature partially applies. - Feature does not apply. n/a Not applicable or unknown.
Approach by Alanen and Porres. One of the earliest works on versioning UML models was published by Alanen and Porres [AP03], who presented metamodel-independent algorithms for difference calculation, model merging, and conflict resolution. They identified seven elementary operation types a developer may perform to modify a model. To calculate the differences between the original version and the modified version, a match between model elements is first computed based on UUIDs. Based on this match, created, deleted, and changed elements are identified. Alanen and Porres provide an algorithm to compute the union of two sets of operations, which also considers merging the values of ordered features. The proposed algorithms are specific to UML models and do not allow for any customization or treatment of composite operations. Still, their algorithms serve as a fundamental and influential work for many other researchers in the area of model versioning.
Approach by Cicchetti, Di Ruscio, and Pierantonio. Cicchetti et al. [CDRP08] present an approach to specify and detect language-specific conflicts arising from parallel modifications. Their work does not address the issue of obtaining differences, but proposes a model-based way of representing them. Regardless of how the differences are computed, they are represented by instantiating an automatically generated language-specific difference metamodel. Conflicts are specified by manually created conflict patterns. These conflict patterns are represented as models of difference elements, which are reported as a conflict whenever they are found in the combination of two difference models. To this end, a hand-crafted set of language-specific conflict patterns, represented as forbidden difference patterns, can be established to create a dedicated conflict detection system. Thereby, a customizable conflict detection component can be realized. The authors also allow specifying reconciliation strategies for specific conflict patterns. Although the authors do not discuss how differences and applications of composite operations are obtained, their approach also supports conflicts caused by composite operations. Establishing a complete set of conflict patterns for a specific language seems to require a great deal of work; in the end, however, a highly customized conflict detection can be achieved.
CoObRA. The Concurrent Object Replication framework CoObRA developed by Schneider et al. [SZN04] realizes optimistic versioning for the UML CASE tool Fujaba8. CoObRA records the operations performed on the model elements and stores the recorded operations in a central repository. Whenever other developers update their local models, these operations are fetched from this repository and replayed locally.
To identify equal model elements, unique identifiers are used. Conflicting operations are not applied (the corresponding local change is undone as well) and are finally presented to the user, who has to resolve these conflicts manually. In [SZ07], the authors also briefly discuss state-based conflicts in terms of inconsistencies. CoObRA is capable of detecting a small subset of such conflicts, namely when the underlying modeling framework rejects the execution of a certain operation. For example, a class cannot be instantiated anymore if the respective class has been concurrently deleted. However, concurrent additions of equally named classes, for instance, are not reported as a conflict. The authors also briefly mention composite operations in terms of sets of atomic operations grouped into commands. The operation recording component seems to be capable of grouping atomic operations into commands to allow for a more comprehensible undo mechanism. In particular, one command in the modeling editor might cause several atomic operations in the log; if the user aims to undo the last change, the complete command is undone and not only the latest atomic change. In their papers, however, no special treatment of these commands in the merge process is mentioned.
8 http://www.fujaba.de
DSMDiff. In [LGJ07], the authors point out the urgent need for language-independent model differencing when domain-specific modeling languages are adopted. Therefore, they propose a metamodel-independent differencing tool, named DSMDiff, which makes no assumptions about the editors used for modifying the models. To also allow for comparing models that are not subsequent versions, the proposed algorithm refrains from relying on UUIDs. Instead, correspondences between model elements are obtained from signature and structural matching. Having obtained the corresponding model elements, the differences are computed by traversing the model and comparing the corresponding elements with each other. DSMDiff supports only two-way comparison and is, consequently, not directly designed to detect merge conflicts. DSMDiff is tailored to be completely generic: its heuristics work for all domain-specific languages, but it cannot be adapted with language-specific match rules. Furthermore, it does not support detecting applications of composite operations.
EMF Compare. The open-source model comparison framework EMF Compare [BP08], which is part of the Eclipse Modeling Framework Technology (EMFT) project9, supports generic model comparison and model merging. EMF Compare provides two-way and three-way model comparison algorithms for EMF-based models. As with DSMDiff, EMF Compare’s model comparison algorithm consists of two phases, a matching phase and a differencing phase. The matching phase aims at establishing one-to-one correspondences between model elements in the original model and the revised models. For this, EMF Compare supports either UUID-based matching or content-based matching, which applies a combination of four heuristics: type, name, value, and relationship similarity. However, a combination of UUIDs and heuristics is not directly supported. Based on the established match, the differencing phase computes the differences between all corresponding model elements. The model element correspondences and differences are represented by a match model and a difference model, respectively.
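Content-based matching of the kind used by DSMDiff and EMF Compare can be illustrated with a much simplified similarity function. The sketch below considers only two of the four heuristics named above, type and name similarity. It uses Python's standard `difflib.SequenceMatcher` for the name comparison and pairs elements greedily above an assumed threshold of 0.7. Neither the weighting nor the threshold is taken from EMF Compare or DSMDiff, and the element dictionaries are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set, Tuple

Element = Dict[str, str]  # hypothetical: {"id": ..., "type": ..., "name": ...}

def similarity(a: Element, b: Element) -> float:
    """Type similarity as a hard filter, name similarity as the score."""
    if a["type"] != b["type"]:
        return 0.0  # only elements of the same metaclass are candidates
    return SequenceMatcher(None, a["name"], b["name"]).ratio()

def content_match(original: List[Element], revised: List[Element],
                  threshold: float = 0.7) -> List[Tuple[str, str]]:
    """Greedy one-to-one matching, best-scoring pairs first."""
    candidates = sorted(
        ((similarity(a, b), a["id"], b["id"]) for a in original for b in revised),
        reverse=True,
    )
    used_orig: Set[str] = set()
    used_rev: Set[str] = set()
    result = []
    for score, id_orig, id_rev in candidates:
        if score < threshold:
            break  # remaining candidates are even less similar
        if id_orig in used_orig or id_rev in used_rev:
            continue  # keep correspondences one-to-one
        used_orig.add(id_orig)
        used_rev.add(id_rev)
        result.append((id_orig, id_rev))
    return result

original = [{"id": "o1", "type": "Class", "name": "Customer"},
            {"id": "o2", "type": "Attribute", "name": "name"}]
revised = [{"id": "r1", "type": "Class", "name": "Customers"},
           {"id": "r2", "type": "Class", "name": "name"}]
print(content_match(original, revised))  # [('o1', 'r1')]
```

The renamed class is still matched, while the attribute `name` is not paired with the equally named class because their types differ.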
Additionally, EMF Compare provides a merge service, which is capable of applying the difference elements in a difference model to merge models. It also offers basic conflict detection capabilities and user interfaces for displaying match and difference models. All these features of EMF Compare are generic; consequently, they can be applied to any EMF-based model irrespective of the modeling language it conforms to. However, EMF Compare can be extended programmatically for language-specific matching and differencing. Thus, it is not adaptable in the sense that it can be easily configured for a specific language, but it constitutes a programmatically extensible framework for all tasks related to model comparison.
EMFStore. The model repository EMFStore, presented by Koegel et al. [KHWH10], was initially developed as part of the Unicase10 project and provides a dedicated framework for versioning EMF models. After a copy of a model is checked out, all operations applied to this copy are tracked by the modeling environment. Once all modifications are done, the recorded operations are committed to a central repository. For recording the operations, a framework called Operation Recorder [HK10] is used. This framework exploits the EContentAdapter and the EMF Command Framework for listening to and saving all applied operations. Thereby, modifications performed in every EMF-based editor can be recorded. Transactions (i.e., series of dependent operations) can also be tracked and grouped accordingly. Given two lists of recorded operations, namely the list of uncommitted local operations and the list of new operations on the server since the last update, relationships among those operations are established, in particular, the requires relationship and the conflicts relationship. The former expresses dependencies between operations; the latter indicates contradicting modifications. As the exact calculation of these relationships requires expensive computations, heuristics are applied to approximate them. The conflict detection component classifies two operations as conflicting if the same attribute or the same reference is modified. Furthermore, the authors introduce levels of severity to classify conflicts. They distinguish between hard conflicts and soft conflicts, referring to the amount of user support necessary for their resolution. Whereas hard conflicts do not allow including both conflicting operations in the merged model, this is possible for soft conflicts (with the danger of obtaining an inconsistent model). In summary, EMFStore is completely operation-based; that is, the actual model states are never considered for detecting conflicts. This also entails that a removed and subsequently re-added model element is treated as a new model element, so that all concurrent operations on the previously removed element are reported as conflicts. Composite operations can be recorded and saved accordingly. In the conflict detection, however, composite operations are not specifically treated. If an atomic change within a composite operation conflicts with another change, the complete transaction is indeed marked as conflicting. However, neither the intentions behind composite operations nor their potentially violated preconditions are specifically considered.
9 http://www.eclipse.org/modeling/emft
10 http://www.unicase.org
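The conflict rule reported for EMFStore is simple enough to state in a few lines: two operations conflict if they modify the same attribute or reference. In the sketch below, the `FeatureOp` record and the severity mapping are hypothetical choices made for illustration. The mapping assigns soft conflicts to multi-valued features and hard conflicts to single-valued features. [KHWH10] characterizes severity by the user support needed for resolution, not by this particular rule.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FeatureOp:
    element: str   # ID of the modified model element
    feature: str   # name of the modified attribute or reference
    many: bool     # hypothetical flag: True if the feature is multi-valued
    value: object  # new value (or added value for multi-valued features)

def classify(local: FeatureOp, remote: FeatureOp) -> Optional[str]:
    """Return None (no conflict), "soft", or "hard"."""
    # Operations on different elements or different features do not conflict.
    if (local.element, local.feature) != (remote.element, remote.feature):
        return None
    # Assumption: concurrent changes to a multi-valued feature can both be
    # kept in the merged model (soft); a single-valued feature cannot hold
    # both values at once (hard).
    return "soft" if local.many else "hard"

print(classify(FeatureOp("c1", "name", False, "Customer"),
               FeatureOp("c1", "name", False, "Client")))  # hard
```

As in the description above, the decision is purely operation-based; the model state is never consulted.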
Approach by Gerth et al. Gerth et al. [GKLE10] propose a conflict detection approach specifically tailored to the business process modeling language BPMN [OMG09]. To identify the differences between two process models (cf. [KGFE08]), a mapping between corresponding elements across two versions of a process model is first computed based on UUIDs attached to each element. In the next step, an operation representing the addition or deletion is created for each element that has no corresponding counterpart in the opposite version. The resulting operations are specific to the type of the added or deleted element (e.g., InsertAction or DeleteFragment). Finally, this list of operations is hierarchically structured according to the fragment hierarchy of the process model in order to group the atomic operations into so-called compound operations. These compound operations thus group several atomic operations into composite additions or deletions. Having identified all differences between two process models in terms of operations, syntactic as well as semantic conflicts among the concurrent operations can be identified using a term formalization of process models. According to their definitions, a syntactic conflict occurs if an operation is not applicable after another operation has been performed. A semantic conflict is at hand whenever two operations modify the same elements so that the resulting process models are not “trace equivalent”; that is, the sets of all possible traces of the process models are not equal. Obviously, rich knowledge of the operational semantics of process models has to be encoded in the conflict detection to be able to reveal semantic conflicts. Although the authors present an efficient way of detecting such conflicts, no possibility to adapt the operation detection and conflict detection mechanisms to other languages is foreseen.
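The grouping of atomic insert and delete operations into compound operations along the fragment hierarchy, as described for Gerth et al., can be sketched as follows. The input representation (a map from element IDs to operation names plus a parent map for the fragment hierarchy) is an assumption. `InsertAction` follows the naming in the text, while `InsertFragment`, `DeleteAction`, and the element IDs are hypothetical.

```python
from typing import Dict, List, Optional

def group_compound(ops: Dict[str, str],
                   parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group atomic operations into compound operations.

    ops:    element ID -> atomic operation on it (e.g. "InsertAction")
    parent: element ID -> ID of the enclosing fragment (None for the root)
    Returns a map from the outermost added/deleted element to its operations.
    """
    def outermost(element: str) -> str:
        # Walk up the fragment hierarchy and remember the highest ancestor
        # that is itself inserted or deleted.
        top = element
        p = parent.get(element)
        while p is not None:
            if p in ops:
                top = p
            p = parent.get(p)
        return top

    groups: Dict[str, List[str]] = {}
    for element, op in ops.items():
        groups.setdefault(outermost(element), []).append(op)
    return groups

ops = {"F1": "InsertFragment", "A1": "InsertAction",
       "A2": "InsertAction", "A3": "DeleteAction"}
parent = {"F1": "root", "A1": "F1", "A2": "F1", "A3": "root", "root": None}
print(group_compound(ops, parent))
# {'F1': ['InsertFragment', 'InsertAction', 'InsertAction'], 'A3': ['DeleteAction']}
```

The inserted fragment and the two actions it contains form one compound addition. The deletion of `A3`, whose enclosing fragment is unchanged, stays a group of its own.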
Approach by Mehra, Grundy, and Hosking. The publication by Mehra et al. [MGH05] mainly focuses on the graphical visualization of differences between versions of a diagram. To this end, they provide a plug-in for the meta-CASE tool Pounamu, a tool for the specification and generation of multi-view design editors. The diagrams created with this tool are serialized in XMI and converted into an object graph for comparison. In their comparison algorithm, the differences are obtained by applying a state-based model differencing algorithm, which uses UUIDs to map corresponding model elements. The obtained differences are translated into Pounamu editing events, which correspond to the actions performed by users within the modeling environment. Differences cover not only modifications performed on the model, but also modifications performed on its graphical visualization. The differences between versions are visualized in the concrete syntax, so that developers may directly accept or reject modifications on top of the graphical representation they are familiar with. Conflict detection facilities are also briefly mentioned in their work. However, this aspect seems not to be the primary focus of the approach and, consequently, is not elaborated in more detail. Composite operations are not considered at all.
Approach by Oda and Saeki. Oda and Saeki [OS05] propose generating versioning features along with the modeling editor that is generated from a specified metamodel, as known from metamodeling tools. The generated versioning-aware modeling editors are capable of recording all operations applied by the users. In particular, the generated tool records operations on the logical model (i.e., the abstract syntax tree of a model), as well as on the diagram’s layout information (i.e., the concrete syntax). Besides recording, the generated modeling tool includes check-in, check-out, and update operations to interact with a central model repository.
It is worth noting that only the change sequences, and not the complete model states, are sent to the repository. In case a model has been concurrently modified and, therefore, needs to be merged, conflicts are identified by re-applying all recorded operations to the common ancestor version. Before each change is performed in the course of merging, its precondition is checked; in particular, the precondition of each change is that the modified model element must exist. Thereby, delete-update conflicts can be identified. Update-update conflicts, however, remain unrevealed; consequently, the values in the resulting merged model might depend on the order in which the recorded updates are applied, because one update might overwrite a previous one. Composite operations and their specific preconditions are not particularly regarded while merging. The approach also does not allow specifying additional language-specific conflicts. Although metamodel violations can, in general, be checked in their tool, they are not particularly considered in the merge process. Because the versioning tool is generated from a specific metamodel, the generated tool is language-dependent; the approach in general, however, is independent of the modeling language. Nevertheless, the approach obviously forces users to use the generated modeling editor in order to use the versioning system.
Odyssey-VCS 2. The version control system Odyssey-VCS by Oliveira et al. [OMW05] is dedicated to versioning UML models. Operations between two versions of a model are identified by applying state-based model differencing relying on UUIDs for finding corresponding model elements. Language-specific heuristics may not be used for the match function, and language-specific composite operations are neglected as well. Interestingly, however, so-called behavior descriptors may be specified for each project, which define how each model element should be treated during the versioning process. Consequently, the conflict detection component of Odyssey-VCS is adaptable; in particular, it may be specified which model elements should be considered atomic. If an atomic element is changed in two different ways at the same time, a conflict is raised. These behavior descriptors (i.e., adaptations) are expressed in XML configuration files. Thus, Odyssey-VCS is customizable for different projects concerning the unit of comparison, as well as concerning whether to apply pessimistic or optimistic versioning. Conflicts arising from language-specific operations, as well as additional language-specific conflicts, however, may not be configured. Odyssey-VCS may be used either with a standalone client or with arbitrary modeling tools. More recently, Odyssey-VCS 2 [MCPW08] has been published, which is capable of processing any EMF-based model and not only UML models. A validation of the resulting merged model is not considered.
Approach by Ohst, Welle, and Kelter. Ohst et al. [OWK03] also put special emphasis on the visualization of the differences within their proposed merge algorithm. Therefore, differences in the model as well as in the layout of the diagram are computed by applying a state-based model differencing algorithm relying on UUIDs. Conflict detection, however, is not discussed in detail; only update-update and delete-update conflicts are briefly considered. After obtaining the differences, a preview is provided to the user, which visualizes all modifications, even if they are conflicting. The preview diagram can also be modified and, therefore, allows users to easily resolve conflicts using the concrete syntax of a diagram.
To indicate the modifications, the different model versions are shown in a unified document containing the common parts, the automatically merged parts, as well as the conflicts. Coloring techniques are used to distinguish the different model versions. In the case of delete-update conflicts, the deleted model element is crossed out and decorated with a warning symbol to indicate the modification.
IBM Rational Software Architect (RSA). The Eclipse-based modeling environment RSA11 is a UML modeling environment built upon the Eclipse Modeling Framework. Under the surface, it uses an adapted version of EMF Compare for UML models, which offers more sophisticated views on the match and difference models for merging. These views show the differences and conflicts in the graphical syntax of the models. The differencing and conflict detection capabilities are, however, equal to those offered by EMF Compare. The only addition is that RSA runs a model validation against the merged version and, in case a validation rule is violated, graphically indicates the invalid parts of the model.
11 http://www.ibm.com/developerworks/rational/library/05/712_comp/index.html
SMOVER. The semantically-enhanced model versioning system by Reiter et al. [RAB+ 07], called SMOVER, aims at reducing the number of falsely detected conflicts resulting from syntactic variations of semantically equal modeling concepts. Furthermore, additional conflicts shall be identified by incorporating knowledge of the modeling language’s semantics. This knowledge is encoded by means of model transformations, which rewrite a given model into so-called semantic views. These semantic views provide a canonical representation of the model, which makes certain aspects of the modeling language more explicit. Consequently, potential semantic conflicts might also be identified when the semantic view representations of two concurrently evolved versions are compared. It is worth noting that the system itself is independent of the modeling language, and language-specific semantic views can be configured to adapt the system to a specific modeling language. The differences are identified using a state-based model differencing algorithm based on UUIDs. Therefore, the system is independent of the used modeling editor. However, this differencing cannot be adapted to specific modeling languages and only works in a generic manner. Moreover, SMOVER only addresses conflicts regarding the semantics of a model and does not cover syntactic operation-based conflicts.
Approach by Westfechtel. Recently, Westfechtel [Wes10] presented a formal approach for merging EMF models. Although no implementation of his work is available, it provides well-defined conflict rules based on set-theoretical conflict definitions. In this paper, Westfechtel does not address the issue of identifying differences between model versions. Rather, he focuses on conflict detection only and assumes the presence of change-based differences that can be obtained, for instance, by EMF Compare. Westfechtel’s approach is directly tailored to EMF models and defines context-free merge rules and context-sensitive merge rules. Context-free merge rules determine “the set of objects that should be included into the merged versions and consider each feature of each object without taking the context [i.e., relationships to other objects] into account” [Wes10]. The presented algorithm also supports merging of ordered features and specifies when to raise update-update conflicts.
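For a single-valued feature, the context-free part of a three-way merge reduces to a familiar rule. A change made on only one side is taken over, identical changes on both sides are merged, and different changes on both sides raise an update-update conflict. The sketch below is a generic version of that rule, not Westfechtel's set-theoretical formalization, which additionally covers ordered and multi-valued features. The exception class name is hypothetical.

```python
from typing import Any

class UpdateUpdateConflict(Exception):
    """Hypothetical exception raised when both sides set different values."""

def merge_feature(base: Any, left: Any, right: Any) -> Any:
    """Three-way merge of one single-valued feature of one object."""
    if left == right:
        return left   # unchanged on both sides or changed identically
    if left == base:
        return right  # only the right version changed the value
    if right == base:
        return left   # only the left version changed the value
    raise UpdateUpdateConflict((base, left, right))

print(merge_feature("Person", "Person", "Customer"))  # Customer
```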
In contrast to context-free merge rules, context-sensitive merge rules also consider containment conflicts, delete conflicts, and reference conflicts. Containment conflicts occur, in particular, if an object in the merged model has no unique container, if the merged model comprises cyclic containment structures, or if a dangling object (i.e., an object other than the root object that has no container) exists. Delete conflicts occur if an object has been deleted and concurrently modified, concurrently added as a reference value in another object, or concurrently moved. Finally, reference conflicts concern inconsistent operations on bi-directional references. Besides these conflicts, Westfechtel also addresses state-based conflicts arising from the well-formedness rules of EMF. However, no techniques that enable further language-specific constraints are discussed. Moreover, he only addresses conflicts among atomic operations, and the approach is not adaptable to language-specific knowledge.
Summary
After surveying existing model versioning approaches, we may conclude that the predominant strategy is to apply state-based model differencing and generic model versioning. The majority of model differencing approaches rely on UUIDs for matching; only ADAMS combines UUIDs and (very simple) content-based heuristics. The detection of applications of composite operations is only supported by approaches applying operation recording. The only approach capable of detecting composite operations using state-based model comparison is the approach by Gerth et al. However, it is specifically tailored to process models, and the supported composite operations are limited to compound additions and deletions.
Consequently, none of the surveyed generic approaches is capable of detecting applications of more complex composite operations having well-defined pre- and postconditions without directly recording their application in the editor. Furthermore, none of the approaches is adaptable in terms of additional match rules or composite operation specifications. EMF Compare and EMFStore at least provide an interface to be implemented in order to extend the set of detectable applications of composite operations. In EMF Compare, however, the detection algorithm has to be provided as a custom implementation. In EMFStore, additional commands may be plugged into the modeling editor programmatically to enable EMFStore to record them. Obviously, all model versioning approaches provide detection capabilities for conflicts caused by two concurrent atomic operations. Unfortunately, most of them lack a detailed definition or at least a publicly available implementation. Therefore, we could not evaluate which types of conflicts can actually be detected by the respective approaches. In this regard, we may highlight Alanen and Porres, EMF Compare, EMFStore, Gerth et al., and Westfechtel. These approaches either clearly specify their conflict detection rules in their publications or make their detection capabilities available in a public implementation. Only Cicchetti et al. and Gerth et al. truly consider composite operations in their conflict detection components. However, in the case of Cicchetti et al., all potentially occurring conflict patterns in the context of composite operations have to be specified manually. It is not possible to automatically derive the conflict detection capabilities regarding composite operations from the specifications of such operations. The approach by Gerth et al. is, as already mentioned, tailored to a specific modeling language and only supports rather simple composite operations. EMFStore partially respects composite operations.
More precisely, if a conflict between two atomic operations is revealed and one atomic operation is part of a composite operation, the complete composite operation is reverted. However, additional preconditions of composite operations are not considered. None of the surveyed approaches aims at respecting the original intention behind the composite operation; that is, incorporating concurrently changed or added elements in the re-application of the composite operation when creating the merged version. State-based conflicts have not yet gained much attention in the model versioning community. CoObRA is capable of detecting at least a subset of all potentially occurring violations of the modeling language’s rules. Westfechtel only addresses the basic well-formedness rules of EMF, such as the spanning containment tree. Only Gerth et al., Oda and Saeki, and RSA perform a full validation after merging. Most of the proposed conflict detection approaches are not adaptable by the user. ADAMS and Odyssey-VCS provide some basic configuration possibilities, such as changing the unit of comparison. EMF Compare can be programmatically extended to attach additional conflict detection implementations. Only Cicchetti et al. and SMOVER allow plugging in language-specific artifacts to reveal additional conflicts. However, in the approach by Cicchetti et al., the conflict patterns have to be created manually in terms of object models, which seems to be a great deal of work requiring a deep understanding of the underlying metamodel. Due to the lack of a public implementation, it is hard to evaluate the ease of use and the scalability of this approach. SMOVER allows providing a mapping of a model to a semantic view in order to enable the detection of semantically equivalent or contradicting parts of a model. The comparison and conflict detection algorithm applied to the semantic views, however, is not adaptable.
Consequently, SMOVER only aims to detect a very specific subset of conflicts and can be seen as orthogonal to existing model versioning systems.
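The kind of adaptability that most surveyed systems lack can be outlined with a small registry: language-specific conflict rules are plugged into an otherwise generic conflict detection. In this sketch, the difference tuples and the single generic delete-update check are illustrative assumptions. So is the example rule for concurrently added, equally named UML classes, the case that CoObRA does not report. The rule stands in for artifacts such as the conflict patterns of Cicchetti et al. or the semantic views of SMOVER, which are considerably richer. `ConflictDetector` and its methods are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical difference representation: (kind, metaclass, name),
# e.g. ("add", "Class", "Order") or ("delete", "Class", "Customer").
Diff = Tuple[str, str, str]
Rule = Callable[[List[Diff], List[Diff]], List[str]]

def generic_conflicts(left: List[Diff], right: List[Diff]) -> List[str]:
    """Language-independent check: an element deleted on one side and
    updated on the other side yields a delete-update conflict."""
    conflicts = []
    for a, b in ((left, right), (right, left)):
        deleted = {(t, n) for k, t, n in a if k == "delete"}
        updated = {(t, n) for k, t, n in b if k == "update"}
        conflicts += [f"delete-update: {n}" for _, n in sorted(deleted & updated)]
    return conflicts

class ConflictDetector:
    """Generic conflict detection with per-language extension points."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {}

    def register(self, language: str, rule: Rule) -> None:
        self._rules.setdefault(language, []).append(rule)

    def detect(self, language: str, left: List[Diff], right: List[Diff]) -> List[str]:
        conflicts = generic_conflicts(left, right)
        for rule in self._rules.get(language, []):
            conflicts.extend(rule(left, right))
        return conflicts

def duplicate_class_names(left: List[Diff], right: List[Diff]) -> List[str]:
    """Example language-specific rule: equally named classes added concurrently."""
    names_left = {n for k, t, n in left if k == "add" and t == "Class"}
    names_right = {n for k, t, n in right if k == "add" and t == "Class"}
    return [f"duplicate class name: {n}" for n in sorted(names_left & names_right)]

detector = ConflictDetector()
detector.register("UML", duplicate_class_names)
left = [("add", "Class", "Order"), ("delete", "Class", "Customer")]
right = [("add", "Class", "Order"), ("update", "Class", "Customer")]
print(detector.detect("UML", left, right))
# ['delete-update: Customer', 'duplicate class name: Order']
print(detector.detect("Ecore", left, right))
# ['delete-update: Customer']
```

Without a registered rule, the same differences only yield the generic conflict. This mirrors the distinction between purely generic and adaptable conflict detection drawn in the summary.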
2.2
Software Adaptation
This thesis proposes an adaptable model versioning system that provides extension points for adapting the system’s behaviour to specific modeling languages, as well as to recurrently applied composite operations that are considered in the merge process. Therefore, we consider existing work in the domain of software adaptation. Software adaptation, however, is a large research domain of its own; thus, we provide only a brief overview of the terminology and basic concepts in this section. Extensive research in the domain of requirements engineering has led to well-defined systematic processes for efficiently and precisely determining the needs of potential users. Nevertheless, it is impossible to fully anticipate the requirements of all future users and to foresee every potential change in the environment in which the software operates. As a consequence, approaches are needed to adapt the behaviour of software systems as efficiently as possible. The term adaptation is defined by the Merriam-Webster dictionary12 as the adjustment to environmental conditions or a modification of an organism that improves its fitness under the conditions of its environment. Correspondingly, in the domain of software systems, adaptation refers to the modification of a system to satisfy new requirements and changing circumstances [TMD09]. We also refer to [And05] for a detailed discussion of the meanings of the terms “adaptability”, “adaptation”, and “flexibility”. The reasons why a system has to be adapted are manifold. Adapting software may serve to realize corrective changes, such as bug fixes; modifications to the functional requirements, such as adding new or changing existing features; changes to the non-functional properties of a system; or improvements concerning changed operating environments [TMD09]. According to Oppermann et al. [Opp05], we may distinguish between adaptive and adaptable systems, which are complementary to each other.
Adaptivity refers to the ability of an adaptive system to adapt itself automatically and autonomously to changing conditions, which is also referred to as self-adaptation [CdLG+ 09]. In contrast, adaptability refers to the ability of an adaptable system to be actively changed by its stakeholders in order to improve its functioning for specific use cases or environments. Oppermann et al. [OR97] describe the whole spectrum from adaptive to adaptable systems, as depicted in Figure 2.5. At one end are adaptive systems, in which the stakeholder has no control over performed adaptations. In between are systems in which stakeholders may choose from a set of suggested adaptations. At the other end are adaptable systems, in which a stakeholder has to actively initiate the adaptation on her own.
2.2.1
Adaptive Systems
Adaptive systems are capable of adjusting their behaviour in response to their perception of the environment and the system itself [CdLG+ 09]. The concept of adaptivity has been applied in many research domains, such as adaptive user interfaces, autonomic computing, embedded systems, autonomous robots, and service-oriented architectures.
12 http://www.merriam-webster.com/dictionary/adaptation
Figure 2.5: Spectrum of Adaptation by Oppermann [OR97]. The spectrum ranges from adaptive to adaptable: system-initiated adaptivity (no user control); system-initiated adaptivity with pre-information to the user about the changes; user selection of adaptation from system-suggested features; user-desired adaptability supported by tools (and performed by the system); user-initiated adaptability (no system initiation).
As stated by Cheng et al.
in [CdLG+ 09], there is a lack of consensus among researchers and practitioners on the points of variation among adaptive systems. Therefore, Cheng et al. identified four variation points referred to as modeling dimensions, which are shortly described in the following. Adaptive systems may vary, firstly, in terms of goals that they aim to achieve. Goals can either refer to self-adaptability aspects, or to the middleware, or the infrastructure that supports the adaptive system. Secondly, systems may vary regarding their cause of adaptation. These causes may, for instance, be the actors interacting with the system, the environment in which the system operates, or properties of the system itself. Thirdly, adaptive systems may differ in the mechanism used to react; that is, the adaptation process itself. Finally, Cheng et al. also introduces the variation point regarding the effects or impact of the adaptation upon the system. In the context of this thesis, especially model-based self-adaption is an interesting research field. The term models at runtime refer to software models that are used to reason about the operating environment and runtime behaviour of a software system [BBFJ11, Nie11]. These models aim to represent abstractions of runtime phenomena, such as resource efficiency, context dependency, as well as personalization of systems. By taking advantage of these abstractions of runtime information, runtime decisions can be facilitated and better automated. Thus, runtime models may play an integral role in the management of self-adaptive systems. For more information on this research topic, we kindly refer to the yearly workshop
[email protected] . Self-adaptation and adaptivity being model-based or not, however, is beyond the scope of this thesis. We rather aim at providing a set of extension points that can be utilized by stakeholders to adapt the behaviour of the model versioning system according to their needs. Therefore, the approaches proposed in this thesis may rather be ascribed to adaptable systems according to the classification by [OR97] depicted in Figure 2.5.
2.2.2
Adaptable Systems
Adaptable systems enable their stakeholders to actively modify the system’s behaviour. Thus, after recognizing the need for a modification of an existing system, its stakeholders initiate the adaptation to improve or extend the system for addressing their specific requirements. Adaptable systems may vary in terms of adaptation time, adaptation transparency, and adaptation technique.
13 http://www.comp.lancs.ac.uk/~bencomo/WorkshopMRT.html
Adaptation time. Taylor et al. [TMD09] distinguish between offline and online adaptations. The former concerns systems that are taken offline before they can be changed and that are eventually restarted or re-installed after they have been adapted. Obviously, there are many scenarios in which offline adaptation is infeasible; for instance, non-stop systems, such as web services that have to run 24/7, or systems that, when restarted, lose (mental) context that cannot be saved and recreated during maintenance. Another scenario in which offline adaptations are infeasible concerns systems that are difficult to reinstall, such as software in automobiles. Therefore, these systems have to be adaptable at run time, that is, while they are online. Gschwind [Gsc02] further distinguishes between design-time, compile-time, and run-time adaptations.
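To make online adaptation more concrete, the following Python sketch shows one simple way in which a running system can change its behaviour without a restart: clients look up the current implementation in a registry on every request, so registering a new implementation takes effect immediately. The class `ServiceRegistry`, the service name `"formatter"`, and the function `handle_request` are hypothetical illustrations, not part of any system discussed in this thesis.

```python
from typing import Callable, Dict

Formatter = Callable[[str], str]


class ServiceRegistry:
    """Hypothetical minimal registry mapping service names to implementations."""

    def __init__(self) -> None:
        self._services: Dict[str, Formatter] = {}

    def register(self, name: str, implementation: Formatter) -> None:
        # Re-registering a name replaces the previous implementation.
        self._services[name] = implementation

    def lookup(self, name: str) -> Formatter:
        return self._services[name]


def handle_request(registry: ServiceRegistry, payload: str) -> str:
    # The implementation is looked up on every request, so a newly
    # registered one is used immediately, without stopping the system.
    return registry.lookup("formatter")(payload)


registry = ServiceRegistry()
registry.register("formatter", str.upper)
print(handle_request(registry, "order 42"))  # ORDER 42
registry.register("formatter", str.title)    # online adaptation
print(handle_request(registry, "order 42"))  # Order 42
```

An offline adaptation of the same behaviour would instead require stopping the process, changing the code, and starting it again.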
Adaptation transparency. The transparency of an adaptation classifies adaptable systems according to how much an adaptation has to “know” about the system being adapted. In [Gsc02], Gschwind distinguishes between black-box, gray-box, and white-box adaptations. Black-box adaptations are not aware of the actual implementation of the system being adapted; they interact only with interfaces or abstract definitions of the system. In the case of gray-box adaptations, the user who performs the adaptation does not have to understand the implementation of the system. However, the toolkits and compilers that actually perform the adaptations must be able to access the implementation of the system being adapted, either to modify the implementation directly or to use knowledge of the implementation for further optimizations. White-box adaptations refer to cases in which the user undertaking the adaptation has to know the implementation in order to be able to adapt it.
Adaptation techniques. Besides the adaptation time and transparency, we may also categorize adaptable systems according to the applied adaptation technique; that is, how an adaptation is specified and deployed. These techniques differ regarding the level of abstraction and the degree of automation. In the following, we discuss some techniques for adapting software systems. Please note that this list is not intended to be complete; we rather aim to provide a brief overview of different adaptation approaches.
Object-oriented design [Boo90, Mey88] follows the principle of design for change [Dij82, Par79], which enables developers to structure their software in a way that minimizes the impact of future changes. Object-oriented design offers, among others, three fundamental concepts to ease future adaptations: information hiding, inheritance, and composition. Information hiding protects values that are intended to be used by one class only and, in combination with an interface concept, decouples two dependent implementations. As a result, changing one class mostly has no impact on the classes using it. Inheritance is a way of reusing, extending, or, in combination with polymorphism, altering the implementation invoked by other classes. Finally, composition allows more complex objects to be composed from several other objects, thereby composing their behaviour. In this context, design patterns [GHJV95], such as the Abstract Factory Pattern or the Strategy Pattern, provide solutions for changing the behaviour of a system more flexibly. Although object-oriented design allows software to be structured so that it can be changed more easily in the future, the implementation must be accessible and known to the developer aiming to adapt the system. Thus, object-oriented design per se enables white-box adaptation at design time.
Architectural styles, which are the more coarse-grained counterpart to design patterns, may also foster the adaptability of a system’s behaviour. Taylor et al. [TMO09] proposed a conceptual framework called BASE for evaluating, comparing, and combining techniques for run-time adaptation of a software system based on architectural styles. This framework differentiates techniques based upon the following four aspects of adaptation: (i) the behaviour aspect, which specifies how the behavioural specification is changed, (ii) the asynchrony aspect, which indicates whether a system can continue to run while the behaviour is changed, (iii) the state aspect, which specifies how the current state of a system is changed, and finally, (iv) the execution context aspect, which concerns the influence of the adaptation on the execution runtime (e.g., the virtual machine). Popular architectural styles that have been described using the aforementioned framework are, among others, pipes and filters [SG96], the event notification architecture [Rei90], and the service-oriented architecture (SOA) [Pap03]. Most of these architectural styles allow for run-time adaptation (e.g., SOA) and offer black-box adaptation; that is, they do not force users to know the internal implementation in order to adapt the system.
Frameworks [FS97] offer generic functionality in the context of a specific application that can be selectively extended by client code. Frameworks usually provide an application programming interface (API) with which client code communicates. In this regard, frameworks are one very common way of realizing adaptable software. Unlike software libraries, frameworks dictate the control flow and call client code [Rie00], not vice versa. This paradigm is referred to as Inversion of Control [FS97].
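The interplay of inheritance, composition, and Inversion of Control described above can be sketched in a few lines of Python. `ReportFramework` and `NumberedReport` are hypothetical names chosen for illustration: the framework fixes the control flow in `run`, offers a default for the hook method `format_line`, and accepts an ordering strategy from the outside.

```python
from typing import Callable, List, Optional


class ReportFramework:
    """Hypothetical framework that owns the control flow (Inversion of Control).

    Client code adapts it either by overriding the hook method `format_line`
    (inheritance) or by passing a strategy for ordering items (composition).
    """

    def __init__(self, sort_key: Optional[Callable[[str], int]] = None) -> None:
        # Strategy Pattern: the ordering policy is supplied by client code.
        self._sort_key = sort_key

    def format_line(self, index: int, item: str) -> str:
        # Default behaviour, used when client code does not override the hook.
        return "- " + item

    def run(self, items: List[str]) -> str:
        # The framework decides when client code is called, not vice versa.
        if self._sort_key is not None:
            ordered = sorted(items, key=self._sort_key)
        else:
            ordered = list(items)
        return "\n".join(
            self.format_line(index, item) for index, item in enumerate(ordered, start=1)
        )


class NumberedReport(ReportFramework):
    """Client extension that overrides only the hook it needs."""

    def format_line(self, index: int, item: str) -> str:
        return f"{index}. {item}"


print(ReportFramework().run(["beta", "alpha"]))
print(NumberedReport(sort_key=len).run(["beta", "alpha", "pi"]))
```

Overriding the hook requires knowing which method the framework calls and when, whereas the strategy only has to satisfy a small interface (a function from an item to a sort key).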
Clients may choose which functionality they want to extend by instantiating the respective part of the framework. Many frameworks offer a default behaviour for parts that have not been overridden by client code. Usually, frameworks offer black-box adaptation because only the API has to be known to the users instantiating the framework. However, frameworks traditionally allow for design-time adaptations only.
Component-based Software Engineering (CBSE) [HC01, KB98] enables software adaptation on a more coarse-grained level than object-oriented design. The goal of CBSE is to glue prefabricated components together to construct a new software system. Every component has well-defined interfaces through which it interacts with other components, while its actual implementation is completely hidden from them. Consequently, a component may easily be replaced by another component whose interfaces are compatible with those of the component being removed. This enables the adaptation of a software system by exchanging components. One popular framework realizing the component-based architecture is the Open Services Gateway Initiative Framework (OSGi) [OSG03]. In OSGi, a component, called bundle, can register its services in a central service registry. Bundles may be deployed and exchanged at run time to allow for online adaptation. As bundles only interact with the interfaces of other bundles, CBSE can be considered a black-box adaptation technique.
Aspect-Oriented Software Development (AOSD) is a fairly young but rapidly advancing research field that adopts the idea of separating concerns, which was originally raised by [Par72]. More precisely, AOSD aims at separating crosscutting concerns from traditional units of decomposition, such as class hierarchies. Crosscutting concerns are concerns that are distributed over several parts of an application. In particular, AOSD represents the convergence of several approaches, such as adaptive programming [Lie96], composition filters [ABV92], subject-oriented programming [HO93], multi-dimensional separation of concerns [TOHS99], and aspect-oriented programming [KLM+ 97]. Although the primary goal of all of them is to allow for separating crosscutting concerns, these approaches may also be used to adapt existing software systems. For instance, if a specific behaviour of an existing system shall be adapted, developers may specify a pointcut, which selects the join points at which the behaviour to be adapted is located. Whenever the system execution reaches such a join point, the additional code associated with the pointcut, called advice, is executed. In many AOSD frameworks, a specific compiler is needed that weaves the advice into the join points; hence, only design-time adaptation is possible. However, using other techniques, such as proxy objects, run-time adaptation can be realized as well. In any case, when adapting software systems using aspects, the original code of the system being adapted must be available and known to the developer; consequently, AOSD only enables white-box adaptation.
Configuration of a software system is heavily used in practice to influence a system’s behaviour. Configuration, however, is a very broad term ranging from specifying simple initial settings to extending an existing software with custom behaviour specified using sophisticated scripting languages [Ous98]. Depending on the configuration language used, configuration can be very powerful while still requiring no deep knowledge of the implementation of the system being adapted (i.e., black-box adaptation). However, developers specifying the adaptation must know the configuration language. Such languages often use a generic textual syntax, such as XML [W3C08] or YAML [BKnIN09], and are usually specific to the system being adapted; only the system to be adapted is able to interpret the adaptations.
Thus, these languages are comparable to domain-specific modeling languages (DSMLs) to a certain extent, especially if the language is defined by a dedicated schema language, such as XML Schema [W3C09], which would correspond, in terms of DSMLs, to a metamodel.
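As a small illustration of black-box adaptation through configuration, the following Python sketch reads an XML configuration and adapts the behaviour of a hypothetical exporter accordingly. The element and attribute names (`config`, `setting`, `name`, `value`) and the two settings are assumptions made for this example; as noted above, such a format is only meaningful to the system that interprets it.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical configuration; only the exporter below defines what it means.
CONFIG_XML = """
<config>
  <setting name="separator" value="; "/>
  <setting name="uppercase" value="true"/>
</config>
"""

DEFAULTS: Dict[str, str] = {"separator": "\n", "uppercase": "false"}


def load_config(xml_text: str) -> Dict[str, str]:
    """Overlay the settings found in the XML document on the defaults."""
    root = ET.fromstring(xml_text.strip())
    settings = dict(DEFAULTS)
    for setting in root.findall("setting"):
        settings[setting.get("name")] = setting.get("value")
    return settings


def export(items: List[str], settings: Dict[str, str]) -> str:
    # The exporter's code stays unchanged; only the configuration varies.
    if settings["uppercase"] == "true":
        items = [item.upper() for item in items]
    return settings["separator"].join(items)


print(export(["shop", "product"], load_config(CONFIG_XML)))  # SHOP; PRODUCT
```

A schema for the `config` element, for instance in XML Schema, would play the role that a metamodel plays for a DSML.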
2.3
Model Transformation
The approach proposed in this thesis aims at respecting the importance of composite operations in model versioning by considering their application during the merge process. Composite operations are, in more general terms, model transformations. Therefore, we discuss the state of the art of model transformation in this section and present existing model transformation languages. One very promising approach for easing the specification of model transformations is model transformation by example (MTBE). In this thesis, we introduce a novel technique called model transformation by demonstration, which can be seen as a special kind of MTBE. Therefore, we also review existing work in MTBE in this section.
2.3.1
Basics of Model Transformations
In general, a model transformation takes a model as input and generates a model as output14. Mens et al. [MG06] distinguish between two kinds of model transformations. First, there are exogenous transformations, which are also referred to as model-to-model transformations or out-place transformations. In such transformations, the source and target metamodels are distinct, as, for instance, in a transformation from UML class diagrams to ER models [Che76]. Second, there are endogenous transformations, also referred to as in-place transformations, which deal with scenarios in which the source and target metamodels are the same, as, for instance, in a refactoring of a UML class diagram. In the following, we elaborate on these two kinds in more detail.
14 Multiple input and output models may also be possible, but such settings are not considered in the scope of this thesis.
Exogenous transformations. Exogenous transformations are used both to exploit the constructive nature of models in terms of vertical transformations, thereby changing the level of abstraction and building the basis for code generation, and for horizontal transformations of models that are at the same level of abstraction [MG06]. Horizontal transformations are of specific interest to realize different integration scenarios, for instance, translating a UML class model into an Entity Relationship (ER) model [Che76]. In vertical and horizontal exogenous transformations, the complete output model has to be built from scratch.
Endogenous transformations. In contrast to exogenous transformations, endogenous transformations only rewrite the input model to produce the output model. In the first step of such transformations, the model elements to be rewritten are identified; in the second step, these elements are updated, added, or deleted. Endogenous transformations are applied for different tasks, such as model refactoring, optimization, evolution, and simulation, to name just a few.
Model transformation languages. Various model transformation approaches following different paradigms have been proposed in the past decade (cf. [CH06] for a survey).
However, most of them are based either on a mixture of declarative and imperative concepts, such as ATL [JABK08], ETL [KPP08], and RubyTL [CMT06], on graph transformations, such as AGG [Tae03] and Fujaba [NNZ00], or on relations, such as MTF15 and TGG [AKRS06]. Moreover, the Object Management Group (OMG) has published the model transformation standard QVT [OMG05a]. All approaches describe model transformations by transformation rules using metamodel elements, while the rules are executed on the model layer for transforming a source model into a target model. Rules comprise in-patterns and out-patterns. The in-pattern defines when a rule is actually applicable and retrieves the model elements necessary for computing the result of a rule by querying the input model. The out-pattern describes the effect of a rule, such as which elements are created, updated, and deleted. All mentioned approaches are based on the abstract syntax of modeling languages only; the concrete syntax (i.e., the notation) of the modeling language is completely neglected.
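The difference between exogenous and endogenous transformations, and the role of in-patterns and out-patterns, can be sketched in plain Python. The classes `UmlClass` and `EntityType` are simplified, hypothetical stand-ins for metamodel elements; dedicated transformation languages such as ATL express such rules declaratively rather than as explicit loops.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UmlClass:  # simplified source metamodel element
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class EntityType:  # simplified target metamodel element
    name: str
    attributes: List[str] = field(default_factory=list)


def class_to_entity(source: List[UmlClass]) -> List[EntityType]:
    """Exogenous (out-place) rule: the target model is built from scratch."""
    target = []
    for element in source:
        # In-pattern: the rule matches every UmlClass of the input model.
        if isinstance(element, UmlClass):
            # Out-pattern: create a corresponding EntityType.
            target.append(EntityType(element.name, list(element.attributes)))
    return target


def rename_class(model: List[UmlClass], old: str, new: str) -> List[UmlClass]:
    """Endogenous (in-place) rule: the input model itself is rewritten."""
    for element in model:
        # In-pattern: a UmlClass with the given name.
        if isinstance(element, UmlClass) and element.name == old:
            # Out-pattern: update the matched element.
            element.name = new
    return model


uml = [UmlClass("Professor", ["name"]), UmlClass("Student", ["name"])]
er = class_to_entity(uml)
rename_class(uml, "Student", "Pupil")
print([e.name for e in er], [c.name for c in uml])
# ['Professor', 'Student'] ['Professor', 'Pupil']
```

Because the exogenous rule creates new objects, the later in-place rename of the source model does not affect the already generated ER model.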
2.3.2
Model Transformation By Example
Specifying model transformations with existing model transformation languages requires users to know the respective transformation language. Moreover, users also have to be familiar with the metamodels (i.e., the abstract syntax) of the involved modeling languages. This is because current model transformation languages are specified using the abstract syntax of a model and not its concrete syntax, although the concrete syntax is the representation users are usually familiar with. Thus, creating such transformations based on the abstract syntax is often complicated and hard to accomplish [BW07, dLV02, SW08, Var06, WMA+ 07]. To address this problem, model transformation by example (MTBE) approaches have been proposed, which follow the same fundamental idea as query by example, developed for querying database systems by giving examples of query results [Zlo75], and programming by example, in which demonstrated actions are recorded as replayable macros [Lie01]. Thus, instead of specifying the transformation in terms of rules operating on metamodel concepts, MTBE allows users to define transformations using examples represented in the model’s concrete syntax. Consequently, the user’s knowledge of the concrete syntax (i.e., the notation) of a modeling language is sufficient for developing model transformations. During the last five years, various MTBE approaches [BV09, DHN09, GMnGSFF09, KSB08, Var06, WSKK07] have been proposed. In the following, we discuss the general process of specifying model transformations by example, and subsequently, we present an instantiation of this process for transforming UML Class Diagrams to ER Diagrams [Che76]. Finally, we conclude this section by elaborating on the peculiarities of current MTBE approaches.
15 http://www.alphaworks.ibm.com/tech/mtf
Figure 2.6: Process of Model Transformation by Example (Modeling phase, manual: create source model, create target model, create model correspondences; Configuration & Generation phase: imply metamodel correspondences (automatic), edit metamodel correspondences and conditions (manual), generate transformation (automatic))
MTBE Process. The main idea of MTBE is the semi-automatic generation of transformations from so-called correspondences between source and target model pairs. The underlying process for deriving model transformations from model pairs is depicted in Figure 2.6. This process, which is largely the same for all existing approaches, consists of five steps grouped in two phases. Phase 1: Modeling. In the first step, the user specifies semantically equivalent model pairs.
Each pair consists of a source model and a corresponding target model. The user may decide whether to specify a single model pair covering all important concepts of the modeling languages, or several model pairs, each focusing on one particular aspect. The requirements on the model pairs are twofold. First, the models must, of course, conform to their metamodels, and second, all available modeling concepts of the source modeling language should be covered by the examples, at least for the intersection of both modeling languages. In the second step, the user has to align the source model and the target model by defining correspondences between source model elements and corresponding target model elements. For defining these correspondences, a correspondence language has to be available. One important requirement is that the correspondences may be established using the concrete syntax of the modeling languages. Hence, the modeling environment must be capable of visualizing the source and target models and the correspondences in one diagram or at least in one dedicated view.
Phase 2: Configuration & Generation. After the mapping task is finished, a dedicated reasoning algorithm is applied to automatically derive metamodel correspondences from the model correspondences. How the reasoning is actually performed is explained in more detail by the example discussed below. The automatically derived metamodel correspondences might not always reflect the intended mappings. Thus, the user may revise some metamodel correspondences or add further constraints and value computations. Note that this step is not foreseen in all MTBE approaches, because it may be argued that it contradicts the general by-example idea of abstracting from the metamodels. Nevertheless, it seems more user-friendly to allow the modification of the metamodel correspondences than to require modifying the generated model transformation code at the end of the generation process. Finally, a code generator takes the metamodel correspondences as input and generates executable model transformation code.
MTBE Example. To exemplify the presented MTBE process, we apply it to specify the transformation of the core concepts of UML class diagrams into ER diagrams. As modeling domain, a simple university information system is used.
The user starts by creating the source model comprising the UML classes Professor and Student, as well as a one-to-many association between them, as depicted in the upper left area of Figure 2.7. Subsequently, the corresponding ER diagram, depicted in the upper right area of Figure 2.7, is created. In this figure, both models are represented in the concrete syntax as well as in the abstract syntax in terms of UML object diagrams. After both models are established, the correspondence model is created, which consists of simple one-to-one mappings. These mappings are depicted as dashed lines in Figure 2.7a and Figure 2.7b between the source and target model elements. In the next step, a reasoning algorithm analyzes the model elements and their properties (i.e., attribute and reference values) in the source and target models, as well as the correspondences between them, in order to derive metamodel correspondences. In the following, we discuss inferring metamodel correspondences between classes, attributes, and references.
Class correspondences. For detecting class correspondences, the reasoning algorithm first checks whether a certain object type in the source model is always mapped to the same object type in the target model. If this is the case, a full equivalence mapping between the respective classes in the source and target metamodel is generated. In our example, a full equivalence mapping between Class and EntityType is generated, since every Class object is mapped to an EntityType object.
Figure 2.7: Example for Exogenous Transformations: (a) Correspondences in Concrete Syntax, (b) Correspondences in Abstract Syntax, and (c) Metamodels (source: UML classes Professor and Student with attribute name and association examines with roles examiner and examinee; target: ER entity types Professor and Student with attribute name and relationship examines with roles examiner and examinee; metamodels: Class, Property, and Association for UML class diagrams, and EntityType, Attribute, Relationship, Role, and Cardinality for ER diagrams)
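The first reasoning step described above, detecting class correspondences, can be sketched in Python as follows. Each model correspondence is reduced to a pair of `(object id, metamodel class)` tuples; the correspondences listed are assumed from the object names in Figure 2.7, and the function name and data layout are hypothetical. The sketch covers only the full-equivalence case: a source class mapped to several target classes, such as Property (mapped to both Attribute and Role), is merely reported as unresolved, since it would need further reasoning.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A model correspondence links a source object to a target object; each object
# is given as (object id, name of its metamodel class).
Obj = Tuple[str, str]
Correspondence = Tuple[Obj, Obj]


def infer_class_correspondences(
    correspondences: List[Correspondence],
) -> Tuple[Dict[str, str], List[str]]:
    """Derive full equivalence mappings between metamodel classes.

    A source class gets a full equivalence mapping if all of its objects are
    mapped to objects of one and the same target class."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for (_, source_class), (_, target_class) in correspondences:
        targets[source_class].add(target_class)

    full_equivalences = {s: next(iter(t)) for s, t in targets.items() if len(t) == 1}
    unresolved = sorted(s for s, t in targets.items() if len(t) > 1)
    return full_equivalences, unresolved


# Assumed correspondences, following the object names used in Figure 2.7.
example = [
    (("c1", "Class"), ("e1", "EntityType")),
    (("c2", "Class"), ("e2", "EntityType")),
    (("p1", "Property"), ("a1", "Attribute")),
    (("p2", "Property"), ("a2", "Attribute")),
    (("p3", "Property"), ("ro1", "Role")),
    (("p4", "Property"), ("ro2", "Role")),
    (("a1", "Association"), ("r1", "Relationship")),
]
print(infer_class_correspondences(example))
# ({'Class': 'EntityType', 'Association': 'Relationship'}, ['Property'])
```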
Existing Approaches. We compare existing approaches by highlighting their commonalities and differences. Almost all approaches define the input for deriving exogenous transformations as a triple comprising an input model, a semantically equivalent output model, and the correspondences between these two models. These models have to be built by the user, preferably using the concrete syntax as, for instance, supported by [WSKK07], but most approaches do not provide dedicated support for defining the correspondences in graphical modeling editors. Subsequently, reasoning techniques, such as specific rules again implemented as model transformations [GMnGSFF09, Var06, WSKK07], inductive logic [BV09], and relational concept analysis [DHN09], are used to derive model transformation code. Current approaches support the generation of graph transformation rules [BV09, Var06] or ATL code [GMnGSFF09, WSKK07]. All approaches aim for semi-automated transformation generation, meaning that the generated transformations are intended to be further refined by the user. This is especially required for transformations involving global model queries and attribute calculations, such as aggregation functions, which have to be added manually. Furthermore, it is recommended to develop the transformations iteratively; that is, after generating the transformations from initial examples, the examples must be adjusted or the transformation rules must be adapted in case the generated output model is not fully equivalent to the expected output model. However, in many cases it is not obvious whether to adapt the aligned examples or the generated transformations. Moreover, adjusting the examples might be a tedious process requiring a large number of transformation examples to ensure the quality of the inferred rules. In this context, self-tuning transformations have been introduced [KWSK09, KSB08].
Self-tuning transformations exploit the examples as training instances in an iterative process for further improving the quality of the transformation. The goal is to minimize the differences between the actual output model produced by the transformation and the expected output model given by the user, by using these differences to adapt the transformation over several iterations. Of course, adapting the transformation is a computationally intensive problem leading to very large search spaces. Whereas [KWSK09] uses domain-specific search space pruning tailored to EMF-based models, [KSB08] uses a generic meta-heuristic-based approach to avoid an exhaustive search.
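The core idea of self-tuning transformations, using the difference between actual and expected output to adapt the transformation, can be illustrated with a deliberately small Python sketch. It is not the algorithm of [KWSK09] or [KSB08]: the "transformation" is just a hypothetical table mapping source features to target features, models are sets of `(object, feature, value)` triples, and a plain hill-climbing loop stands in for the search strategies used in those approaches.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]       # (object id, feature, value)
Model = Dict[str, Dict[str, str]]   # object id -> {feature: value}


def transform(source: Model, mapping: Dict[str, str]) -> Set[Triple]:
    """Hypothetical transformation: copy each source feature to the target
    feature selected by `mapping`; unmapped features are dropped."""
    return {
        (obj, mapping[feature], value)
        for obj, features in source.items()
        for feature, value in features.items()
        if feature in mapping
    }


def difference(actual: Set[Triple], expected: Set[Triple]) -> int:
    # Number of triples that are missing from or superfluous in the actual output.
    return len(actual ^ expected)


def self_tune(
    source: Model,
    expected: Set[Triple],
    candidates: Dict[str, List[str]],
    mapping: Dict[str, str],
) -> Tuple[Dict[str, str], int]:
    """Hill climbing: keep any single change to the mapping that reduces the
    difference, until no change helps or the output matches exactly."""
    mapping = dict(mapping)  # do not modify the caller's mapping
    best = difference(transform(source, mapping), expected)
    improved = True
    while improved and best > 0:
        improved = False
        for feature, options in candidates.items():
            for option in options:
                trial = {**mapping, feature: option}
                score = difference(transform(source, trial), expected)
                if score < best:
                    mapping, best, improved = trial, score, True
    return mapping, best


source = {"o1": {"name": "Professor"}, "o2": {"name": "Student"}}
expected = {("o1", "name", "Professor"), ("o2", "name", "Student")}
initial = {"name": "label"}  # a wrongly inferred mapping
print(self_tune(source, expected, {"name": ["label", "name"]}, initial))
# ({'name': 'name'}, 0)
```

With realistic transformations, each candidate change affects whole rules rather than a single table entry, which is why the search space grows quickly and pruning or meta-heuristics become necessary.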
2.3.3
Summary
Model transformations have gained enormous attention from the MDE community in research and practice. As a result, several mature, dedicated model transformation languages have emerged. In this domain, MTBE, as an approach to ease the challenging task of manually specifying model transformations in terms of metamodel-based transformation rules, seems to be a very promising research direction. A variety of research papers on MTBE have been published within the last years. However, all of the papers mentioned above focus on deriving exogenous model transformations from user-specified correspondences between a source and a target model. Interestingly, MTBE dedicated to endogenous model transformations has not gained much attention yet. At the time when we started to work in this field, we were the first to propose an MTBE approach dedicated to endogenous model transformations [Lan09, BLSW09, BLS+ 09], which takes advantage of the demonstration of an endogenous model transformation performed by a user instead of exploiting user-specified model correspondences, as is the case for existing MTBE approaches. At the same time, however, a very similar approach by Sun et al. [SWG09] emerged. Sun et al. introduced the particularly fitting term model transformation by demonstration for this approach; in the remainder of this thesis, we adopt this term for our approach. As their approach was published at the same time as ours, we refrain from discussing it as state of the art; instead, we present a detailed comparison between their approach and the one presented in this thesis in Chapter 4.
CHAPTER 3
Adaptable Model Versioning
In this chapter, we present the big picture of the proposed adaptable model versioning system AMOR [BKS+ 10]. This system is the result of the research project of the same name1, which was carried out from 2009 to 2011. Please note that the basic idea behind AMOR (i.e., building an adaptable model versioning system) and the AMOR merge process have been elaborated jointly by all project participants2. In the following, we present some motivating examples posing the challenges that are solved in this thesis. Next, we introduce a categorization of conflicts that might occur when merging the parallel work of two developers on the same model. The goal of this categorization is to set up the terminology of conflicts used in the remainder of this thesis. Subsequently, we discuss the basic design principles of AMOR and disclose our design rationale. We also provide a brief introduction to AMOR’s technical infrastructure in this chapter. Finally, we present an overview of the generic merge process and subsequently show how this generic process is extended in order to be adaptable with respect to language-specific knowledge.
3.1
Motivating Examples
In this section, we introduce small model versioning scenarios in which two developers, referred to as developer 1 and developer 2 in the following, concurrently modify a common original model denoted by Vo. The issues occurring in these scenarios go beyond spatially overlapping, and hence conflicting, operations. Instead, these scenarios illustrate merge issues that require language-specific knowledge to be handled correctly. Hence, current generic approaches would largely fail either to report the correct conflict or to produce an optimally merged version.
1 AMOR (http://www.modelversioning.org), a research project funded by the Austrian Federal Ministry of Transport, Innovation, and Technology and the Austrian Research Promotion Agency under grant FIT-IT819584.
2 In alphabetical order: Petra Brosch, Gerti Kappel, Philip Langer, Werner Retschitzegger, Wieland Schwinger, Martina Seidl, Konrad Wieland, and Manuel Wimmer.
3.1.1
Additions of Equal Model Elements
Although entirely equal operations might be treated correctly to a certain extent by generic model versioning systems, in several scenarios additional language-specific knowledge is necessary to enable the correct identification of operations having an equal effect, especially if additions of model elements are involved. Otherwise, the equality of model elements may be hard to determine. An example of such a scenario is depicted in Figure 3.1a. In this scenario, we illustrate the need for language-specific knowledge by means of UML class diagrams [OMG03]. The original model Vo contains two classes, which are shown in the concrete syntax on the left side, as well as in the abstract syntax in terms of an object diagram on the right side. Please note that we omitted some details in the abstract syntax for the sake of readability. The original model is now concurrently modified by two developers, leading to the revised versions Vr1 and Vr2. Both developers concurrently add a generalization relationship between the classes Employer and Person, specifying Employer to be a subclass of Person. Although not directly visible in the concrete syntax, generalizations are realized in UML using a dedicated object of type Generalization. Thus, corresponding Generalization objects, g1 and g2, exist in the object diagrams of Vr1 and Vr2, respectively. Using a generic merge approach, all modifications applied by both developers may be merged without any conflicts because no spatially overlapping concurrent operations have been performed. However, when naively merging the modifications of both developers, we end up with two Generalization objects expressing exactly the same semantics in the merged model Vm: because two distinct objects, g1 and g2, have been added, both are included in the merged version. This redundancy in the merged model is obviously unfavourable and might even, in the worst case, cause the editor to fail when trying to open the merged model.
A completely generic merge approach is not aware that these two objects, g1 and g2, are entirely redundant and, as a consequence, is not able to detect and report such a scenario. An ideal model versioning system would recognize that g1 and g2 are indeed distinct objects that, however, express the same semantics. Being aware of this, the ideal model versioning system would be able to omit either the addition by developer 1 or the one by developer 2 in order to obtain the merged model depicted in Figure 3.1b.
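The following Python sketch illustrates the idea under simplifying assumptions. A Generalization is reduced to a hypothetical record consisting of an object identifier and the names of its specific and general classes. Two additions are treated as equal if everything but the identifier coincides. This is an illustration only and not the comparison algorithm of any particular versioning system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Generalization:
    oid: str       # object identity (e.g., an XMI id); ignored when comparing effects
    specific: str  # name of the subclass, e.g. "Employer"
    general: str   # name of the superclass, e.g. "Person"


def effect(g: Generalization) -> Tuple[str, str]:
    """What the generalization expresses, independent of object identity."""
    return (g.specific, g.general)


def merge_additions(added_r1: List[Generalization],
                    added_r2: List[Generalization]):
    """Union of both developers' additions, omitting every addition of
    developer 2 whose effect equals an addition that has already been taken over."""
    merged = list(added_r1)
    seen = {effect(g) for g in added_r1}
    omitted = []
    for g in added_r2:
        if effect(g) in seen:
            omitted.append(g)  # equal effect: the object of developer 1 is kept
        else:
            merged.append(g)
            seen.add(effect(g))
    return merged, omitted


g1 = Generalization("g1", specific="Employer", general="Person")
g2 = Generalization("g2", specific="Employer", general="Person")
merged, omitted = merge_additions([g1], [g2])
print([g.oid for g in merged], [g.oid for g in omitted])  # ['g1'] ['g2']
```

Preferring developer 1's object is an arbitrary choice. Dropping g1 instead would yield an equally valid merge.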
3.1.2
Additions of Similar Model Elements
A comparable yet different scenario is presented using Ecore models [SBPM08] in Figure 3.2a. The common original model Vo contains two Ecore classes, Shop and Product. This model is now concurrently modified. In particular, developer 1 adds the reference sells to Shop, which refers to Product. This reference’s cardinality has a lower bound of 1 and an unbounded upper bound (cf. Vr1 in Figure 3.2a). Concurrently, developer 2 adds a reference to Shop that is also named sells and that also refers to Product. However, its lower bound is specified to be 0 (cf. Vr2 in Figure 3.2a). Thus, the added model elements are largely similar, but not completely equal, because of the different lower bounds 0 and 1. Applying a generic merge to this scenario, we obtain the model Vm depicted in Figure 3.2a. As expected, this merged model contains both the reference sells added by developer 1 and the reference sells added by developer 2. Ultimately, we end up with two redundant, equally named references in the merged model. Moreover, the two redundant references are not completely equal. Hence, the developers have to decide which lower bound should finally be applied in the merged model. However, a generic merge approach is unaware that a reference’s name and its target class are its meaning-carrying properties, or its signifier. Such an approach would therefore neither detect the redundancy nor indicate the need for such a decision. The term signifier is discussed in more detail in Section 3.2. Ideally, a model versioning system that involves language-specific knowledge would be able to detect the correspondence between both added references, because it would compare the references’ names as well as their target classes. Being aware of this correspondence, an ideal system would incorporate only one of the two additions, so that only one reference is included in the merged model. Additionally, the system should detect that the added objects are not entirely equal and, therefore, indicate the need for a decision on how to resolve this contradiction.

Figure 3.1: Addition of an Equal Model Element ((a) Scenario with Generic Merge; (b) Optimal Merge)

Figure 3.2: Addition of a Similar Model Element ((a) Scenario with Generic Merge; (b) Optimal Merge)
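A minimal Python sketch of such a signifier-based comparison follows. The Reference record and its fields are hypothetical simplifications of Ecore's EReference. Following the text, the signifier consists of the name and the target class. The owning class is added as an assumption so that only references of the same class are compared.

```python
from dataclasses import dataclass

UNBOUNDED = -1  # stands for '*'


@dataclass(frozen=True)
class Reference:
    oid: str     # object identity, ignored when matching
    owner: str   # containing class, e.g. "Shop"
    name: str    # e.g. "sells"
    target: str  # referenced class, e.g. "Product"
    lower: int
    upper: int


def signifier(r: Reference):
    # Name and target class are the meaning-carrying properties (per the text);
    # the owner is included so that only references of the same class are compared.
    return (r.owner, r.name, r.target)


def properties(r: Reference):
    # All properties except object identity.
    return (r.owner, r.name, r.target, r.lower, r.upper)


def compare_additions(added_r1, added_r2):
    """Pair up additions with equal signifiers. Entirely equal pairs are
    redundant (one can be dropped); the others need a developer decision."""
    by_signifier = {signifier(r): r for r in added_r1}
    redundant, decisions = [], []
    for r2 in added_r2:
        r1 = by_signifier.get(signifier(r2))
        if r1 is None:
            continue  # no corresponding addition; merge as usual
        if properties(r1) == properties(r2):
            redundant.append((r1, r2))
        else:
            decisions.append((r1, r2))
    return redundant, decisions


sells_1 = Reference("r1", "Shop", "sells", "Product", lower=1, upper=UNBOUNDED)
sells_2 = Reference("r2", "Shop", "sells", "Product", lower=0, upper=UNBOUNDED)
redundant, decisions = compare_additions([sells_1], [sells_2])
print(len(redundant), len(decisions))  # 0 1
```

For the scenario in Figure 3.2a, the sketch reports one pair that requires a decision (lower bound 0 versus 1) and no purely redundant pair.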
3.1.3
Concurrent Change of a Model Element’s Signifier
Another type of conflict, which may be hard to detect by solely generic approaches, occurs if two concurrent, yet not spatially overlapping, operations modify properties of the same model element. Ultimately, the resulting model may obfuscate the intentions of both developers.
Figure 3.3: Concurrent Change of a Model Element’s Signifier ((a) Scenario with Generic Merge; (b) Optimal Merge)
Consider the scenario for Ecore models depicted in Figure 3.3a. The model versions are shown in the concrete syntax on the left side and in the abstract syntax, in terms of object diagrams, on the right side. The original model Vo contains three classes, namely Person, Employer, and Employee. Additionally, there is a reference from Employer to Person. Now, developer 1 changes the target of the reference from Person to Employee (cf. Vr1 in Figure 3.3a). As a result, the object r1 representing the reference is retained, but its link is changed from Person to Employee. Concurrently, developer 2 changes the source of the reference from Employer to Employee (cf. Vr2 in Figure 3.3a). In the abstract syntax, this change is realized by moving the reference r1 from its original container Employer to Employee. Thus, developer 1 intended the reference to go from Employer to Employee, whereas developer 2 wanted it to go from Employee to Person. As in the previous scenario, a generic model versioning system would not report a conflict, because no spatially overlapping operations have been applied to the original model. Therefore, all modifications are merged to obtain an integrated model: the target of the reference r1 is changed from Person to Employee (as performed by developer 1), and the reference is moved from its original container Employer to Employee (as performed by developer 2). However, merging both operations leads to a model that contradicts the intentions of both developers. In the merged model Vm (cf. Figure 3.3a), the reference is contained by Employee and also refers to Employee. In other words, the reference has accidentally been turned into a reflexive reference, although neither developer intended this. One potential merged model that better reflects the intentions of both developers is depicted in Figure 3.3b.
In this merged model Vm, the original reference r1 has been duplicated: one reference reflects the operations of developer 1 and the other one reflects the operations of developer 2. Admittedly, this is only one possible way of resolving this issue. In any case, the developers should be warned so that they become aware of their indirectly contradicting operations on the meaning-carrying properties, or signifier, of a model element. Unfortunately, generic approaches are not able to identify such a concurrent change of a model element’s signifier (cf. Section 3.2 for a more detailed discussion of the term signifier).
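The following Python sketch reproduces the scenario. It assumes that a reference's signifier consists of its source and target class, and it uses a hypothetical RefState record. The feature-wise generic merge yields the reflexive reference. The additional check compares signifiers across the three versions and raises a warning even though no single feature was changed by both developers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefState:
    source: str  # containing EClass
    target: str  # referenced EClass


def signifier(r: RefState):
    # Assumption: a reference's signifier consists of its source and target class.
    return (r.source, r.target)


def generic_merge(base: RefState, r1: RefState, r2: RefState) -> RefState:
    """Feature-wise three-way merge: take the value of the side that changed it.
    (If both sides changed the same feature, this sketch simply prefers r1;
    a real system would report an update-update conflict.)"""
    source = r1.source if r1.source != base.source else r2.source
    target = r1.target if r1.target != base.target else r2.target
    return RefState(source, target)


def concurrent_signifier_change(base, r1, r2) -> bool:
    """Warn if both developers changed the signifier, with different results."""
    s0, s1, s2 = signifier(base), signifier(r1), signifier(r2)
    return s1 != s0 and s2 != s0 and s1 != s2


vo = RefState(source="Employer", target="Person")
vr1 = RefState(source="Employer", target="Employee")  # developer 1: new target
vr2 = RefState(source="Employee", target="Person")    # developer 2: new source
print(generic_merge(vo, vr1, vr2))  # RefState(source='Employee', target='Employee')
print(concurrent_signifier_change(vo, vr1, vr2))  # True
```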
3.1.4
Intentions Behind Composite Operations
As already stressed in Section 2.1.2, current model versioning systems neglect composite operations during conflict detection and merging. Of course, solely generic approaches cannot take the intention behind composite operations into account. In the following, we illustrate the drawbacks of neglecting composite operations by discussing the model versioning scenario depicted in Figure 3.4a. Consider the common original UML state machine [Har87] Vo in Figure 3.4a, which represents the states of a phone. Starting in the state Idle, the phone changes its state to DialTone when the event lift (the handset) is issued. In this state, users may either hang up, which returns the phone to the state Idle, or dial, which changes the state of the phone to Dialing. In the state Dialing, a user may keep on dialing until she hangs up the handset again, which switches the phone back to the state Idle. Please note that, for the sake of readability, this state machine does not cover all possible states of a real-world phone.
Figure 3.4: Intention Behind a Composite Operation ((a) Scenario with Generic Merge; (b) Optimal Merge)
Again, this original model is changed by two developers in parallel. Developer 1 identifies the need for applying the refactoring Introduce Composite State [SPLTJ01] to this state machine. Therefore, a new composite state called Active is introduced. Next, the states DialTone and Dialing are moved into the newly created composite state. Then, the target of the transition lift, which originally was DialTone, is changed to the new composite state. To preserve the semantics of the state machine, a new initial state with a transition to DialTone has to be created within the composite state. Finally, the two transitions named hangup, which lead from the states DialTone and Dialing back to Idle, can be folded: one of these transitions is deleted and the other one is moved to the composite state. The refactored state machine is depicted as Vr1 in Figure 3.4a. In parallel, developer 2 works towards completing this state machine and adds a new state named Connecting. This state has one incoming transition, namely end dialing, and one outgoing transition named hangup leading back to the state Idle (cf. Vr2 in Figure 3.4a). When these two revised state machines are merged using a generic merge algorithm, all atomic operations performed by both developers can be merged without any issues. The resulting merged state machine Vm is depicted in Figure 3.4a. It contains the composite state as well as the state Connecting, which resides outside of the composite state and has the outgoing transition hangup. However, recall that the intention behind the refactoring applied by developer 1 is to collect all states sharing the common transition hangup in the composite state Active. This is obviously not the case in the naively merged state machine Vm in Figure 3.4a. A merged state machine that better reflects the intentions of both developers is illustrated in Figure 3.4b.
In this state machine, the new state Connecting resides within the composite state Active, just as developer 1 intended. Consequently, the transition hangup that originally left Connecting is removed, because this transition is already present at the containing composite state Active.
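The following Python sketch shows how such a composite operation match warning might be computed after the merge. It assumes that the selection criterion of this application of Introduce Composite State is "states with an outgoing hangup transition to Idle". It also assumes a hypothetical encoding of the merged model Vm from Figure 3.4a, consisting of a parent map and a list of (source, event, target) triples with initial states omitted. The sketch re-evaluates the selection on the merged model and reports states that match but were not collected into the composite state.

```python
from typing import Dict, List, Optional, Set, Tuple

Transition = Tuple[str, str, str]  # (source state, event, target state)


def selection(transitions: List[Transition], event: str, target: str) -> Set[str]:
    """States with an outgoing `event` transition to `target` (assumed
    selection criterion of the Introduce Composite State application)."""
    return {src for (src, ev, tgt) in transitions if ev == event and tgt == target}


def match_warnings(parent: Dict[str, Optional[str]], transitions: List[Transition],
                   composite: str, event: str, target: str) -> Set[str]:
    """States in the merged model that satisfy the selection criterion but
    were not collected into `composite`."""
    return {s for s in selection(transitions, event, target)
            if s != composite and parent.get(s) != composite}


# Assumed encoding of the merged model Vm in Figure 3.4a (initial states omitted)
parent = {"Idle": None, "Active": None, "DialTone": "Active",
          "Dialing": "Active", "Connecting": None}
transitions = [("Idle", "lift", "Active"), ("DialTone", "dial", "Dialing"),
               ("Dialing", "dial", "Dialing"), ("Active", "hangup", "Idle"),
               ("Dialing", "end dialing", "Connecting"),
               ("Connecting", "hangup", "Idle")]
print(match_warnings(parent, transitions, "Active", "hangup", "Idle"))  # {'Connecting'}
```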
3.1.5
Violated Preconditions of Composite Operations
Another scenario for merging state machines is depicted in Figure 3.5. The original model Vo is equal to the original model in Section 3.1.4. Again, developer 1 performs the refactoring Introduce Composite State in order to collect all states having an outgoing transition named hangup. However, in contrast to the previous scenario in Section 3.1.4, developer 2 now does not introduce an additional state. Instead, developer 2 renames the transition connecting DialTone and Idle from hangup to abort (cf. Vr2 in Figure 3.5). Again, the modifications of both developers can be merged by fine-grained generic model versioning systems without raising any conflicts. In the resulting model Vm in Figure 3.5, the transition going from the composite state Active to Idle is, according to the change of developer 2, now named abort. As a result, the merge inadvertently changes the semantics of the model. The state Dialing, which originally had an outgoing transition named hangup, now implicitly has an outgoing transition abort from its containing composite state Active. The reason for this unintended change of the semantics is that the preconditions ensuring the semantic preservation of the state machine refactoring have not been considered during the merge. Ideally, a model versioning system would check whether the preconditions of the applied composite operations still hold after the modifications of the opposite developer have been performed. In the scenario at hand, the precondition of the Introduce Composite State refactoring that requires all folded transitions to be named equally is violated in Vr2 depicted in Figure 3.5 (DialTone.abort ≠ Dialing.hangup). Consequently, a corresponding conflict indicating this violation of the composite operation’s precondition should be raised prior to constructing the merged model.

Figure 3.5: Violated Precondition of a Composite Operation
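As an illustration, the following Python sketch evaluates a precondition on the opposite developer's revision. The precondition is an assumption for this sketch: each folded state has exactly one outgoing transition to the target state, and all these transitions share the same name. The state machines are encoded as hypothetical lists of (source, event, target) triples. If the precondition fails on Vr2, a conflict would be raised.

```python
from typing import List, Sequence, Tuple

Transition = Tuple[str, str, str]  # (source state, event, target state)


def folding_precondition(transitions: List[Transition],
                         folded: Sequence[str], target: str) -> bool:
    """Assumed precondition for folding: each folded state has exactly one
    outgoing transition to `target`, and all these transitions share one name."""
    names = set()
    for state in folded:
        outgoing = [ev for (src, ev, tgt) in transitions
                    if src == state and tgt == target]
        if len(outgoing) != 1:
            return False
        names.add(outgoing[0])
    return len(names) == 1


vo = [("Idle", "lift", "DialTone"), ("DialTone", "hangup", "Idle"),
      ("DialTone", "dial", "Dialing"), ("Dialing", "dial", "Dialing"),
      ("Dialing", "hangup", "Idle")]
# Vr2: developer 2 renamed the transition from DialTone to Idle to abort
vr2 = [("Idle", "lift", "DialTone"), ("DialTone", "abort", "Idle"),
       ("DialTone", "dial", "Dialing"), ("Dialing", "dial", "Dialing"),
       ("Dialing", "hangup", "Idle")]

print(folding_precondition(vo, ["DialTone", "Dialing"], "Idle"))   # True
print(folding_precondition(vr2, ["DialTone", "Dialing"], "Idle"))  # False: raise a conflict
```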
3.1.6
Inconsistent Merge Results
The model versioning scenario depicted in Figure 3.6 deals with operations that lead to an inconsistent merge result in terms of language-specific validation rules. The original model Vo in Figure 3.6 is a UML model comprising a class diagram and a dependent sequence diagram. More specifically, the class diagram contains two classes, namely Client and Logger. The class Logger can be instantiated using the public constructor Logger() and may receive calls of the operation print(String). The interaction between these two classes is specified in the sequence diagram next to the class diagram in Figure 3.6. In particular, the client instantiates the logger using the public constructor in order to be able to call the operation print(). This UML model is now concurrently modified by two developers. Developer 1 decides to turn the class Logger into a Singleton [GHJV95]. More precisely, developer 1 changes the visibility of the constructor to private and introduces a new operation named getInstance for obtaining the single instance of the class. Accordingly, developer 1 also adapts the sequence diagram: instead of creating the instance of Logger by calling its constructor, the new operation getInstance is used (cf. Vr1 in Figure 3.6). In parallel, developer 2 introduces new instances of the classes Client and Logger in the sequence diagram. Unaware of the operations performed by developer 1, developer 2 adds a call to the constructor for instantiating the class Logger (cf. Vr2 in Figure 3.6). When the modifications of both developers are merged generically, no conflict is raised. Instead, we obtain the merged model Vm depicted in Figure 3.6. In this merged model, the class Logger is, according to the operations of developer 1, a singleton containing the now private constructor as well as the operation getInstance. Also, the part of the sequence diagram that already existed in the original model Vo has been adapted accordingly, because the operations of developer 1 are incorporated into the merged model. However, the merged model contains an inconsistent call of the private constructor of the class Logger. This call resides in the part of the sequence diagram introduced by developer 2, who was not aware of the modifications performed by developer 1.

Figure 3.6: Inconsistent Merge Results
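Such an inconsistency can only be found by validating the merged state, not by comparing the operations. The following Python sketch uses a hypothetical validation rule that would typically be expressed as an OCL invariant: a message must not invoke a private operation of a class other than the sender's own. The Operation and Message records are simplified, assumed encodings of the UML elements and not the UML metamodel itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Operation:
    owner: str       # class owning the operation
    name: str
    visibility: str  # "public" or "private"


@dataclass(frozen=True)
class Message:
    sender: str    # class of the sending lifeline
    receiver: str  # class of the receiving lifeline
    operation: str


def private_call_violations(operations: List[Operation],
                            messages: List[Message]) -> List[Message]:
    """Hypothetical validation rule: a message may invoke a private operation
    only if sender and receiver are instances of the same class."""
    ops = {(o.owner, o.name): o for o in operations}
    violations = []
    for m in messages:
        op = ops.get((m.receiver, m.operation))
        if op is not None and op.visibility == "private" and m.sender != m.receiver:
            violations.append(m)
    return violations


# Merged model Vm of Figure 3.6 (simplified)
operations = [Operation("Logger", "Logger", "private"),
              Operation("Logger", "getInstance", "public"),
              Operation("Logger", "print", "public")]
messages = [Message("Client", "Logger", "getInstance"),
            Message("Client", "Logger", "print"),
            Message("Client", "Logger", "Logger")]  # constructor call added by developer 2
print(private_call_violations(operations, messages))
# [Message(sender='Client', receiver='Logger', operation='Logger')]
```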
3.2
Categorization of Conflicts
Having presented some exemplary conflict scenarios, we now take a more systematic view on conflicts by grouping conflict types into categories. For this purpose, we first discuss the meaning of the term conflict in related research areas and survey existing categorizations of conflict types. Based on this survey, we derive the terminology of conflict types used in the remainder of this thesis.
3.2.1
Existing Conflict Categorizations
The term conflict has been used in the area of versioning to refer to interfering operations in the parallel evolution of software artifacts. However, the term conflict is heavily overloaded and carries different connotations. Besides conflict, the terms interference and inconsistency have also been used synonymously in the literature, for instance, in [Fea89, TP05] and [Men02], respectively. The term conflict usually refers to directly contradicting operations, that is, two operations that do not commute [LvO92]. Nevertheless, a multitude of further problems might occur, especially when the syntax and semantics of the versioned artifact’s language are taken into account. To capture this broader notion of conflict, different categories have been proposed for grouping specific merge issues, which we survey in the following. In the field of software merging, Mens [Men02] introduces textual, syntactic, semantic, and structural conflicts. Textual conflicts concern contradicting operations applied to text lines, as detected by a line-based comparison of a program’s source code. Syntactic conflicts, in contrast, denote contradicting modifications of the parse tree or the abstract syntax graph. Thus, syntactic merging takes the programming language’s syntax into account and may also report operations that cause parse errors when merged (cf. line-based versus graph-based versioning in Section 2.1). Semantic merging goes one step further and also considers the semantic annotation of the parse tree, as done in the semantic analysis phase of a compiler. In this context, static semantic conflicts denote issues in the merged artifact such as undeclared variables or incompatible types. Besides static semantic conflicts, Mens also introduces the notion of behavioural conflicts, which denote unexpected behaviour in the merged result. Such conflicts can only be detected by even more sophisticated semantic merge techniques that rely on the runtime semantics [Men02]. Finally, Mens introduces the notion of structural conflicts. These arise when one of the operations to be merged is a “restructuring” (i.e., a refactoring) and the merge algorithm cannot uniquely decide in which way the merged result should be restructured [Men02]. Mens stresses that detecting structural conflicts is a challenging topic for future research [Men02]; notably, detecting structural conflicts among composite modeling operations is a key topic of this thesis (cf. Chapter 6).
The notion of conflict in graph transformation theory also serves as a valuable source of knowledge in this matter. As defined by Heckel et al. [HKT02], two direct graph transformations are in conflict if they are not parallel independent. Two direct graph transformations are parallel independent if each of them preserves all elements that are in the match of the other transformation; otherwise, we encounter a delete-use conflict. A special manifestation of such a case is a delete-delete conflict. Although both transformations delete the same element anyway, this is still considered a conflict, because each transformation deletes an element that is in the match of the other transformation. If the graph transformations additionally comprise negative application conditions, they also must not create elements that are prohibited by the negative application conditions of the other transformation; otherwise, an add-forbid conflict occurs. To summarize, two direct graph transformations are in conflict if one of them disables the other.
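The conflict notions from graph transformation theory can be illustrated with a minimal Python sketch. It relies on a strong simplification: a direct transformation step is abstracted to hypothetical sets of host-graph elements (its match, the elements it deletes and creates, and concrete elements whose existence its negative application conditions forbid). Real NACs are patterns rather than fixed elements, so this is only an approximation of the formal definition.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Step:
    """Hypothetical abstraction of a direct graph transformation step."""
    match: FrozenSet[str]                    # host-graph elements matched by the left-hand side
    deletes: FrozenSet[str] = frozenset()    # subset of `match` deleted by the step
    creates: FrozenSet[str] = frozenset()    # elements added by the step
    forbidden: FrozenSet[str] = frozenset()  # elements whose existence a NAC forbids


def disables(t1: Step, t2: Step) -> Set[str]:
    """Reasons why applying t1 first prevents t2 from being applied."""
    reasons = set()
    for element in t1.deletes & t2.match:
        reasons.add("delete-delete" if element in t2.deletes else "delete-use")
    if t1.creates & t2.forbidden:
        reasons.add("add-forbid")
    return reasons


def parallel_independent(t1: Step, t2: Step) -> bool:
    return not disables(t1, t2) and not disables(t2, t1)


delete_n = Step(match=frozenset({"n"}), deletes=frozenset({"n"}))
use_n = Step(match=frozenset({"n", "m"}))
create_x = Step(match=frozenset(), creates=frozenset({"x"}))
forbid_x = Step(match=frozenset({"m"}), forbidden=frozenset({"x"}))
print(disables(delete_n, use_n))     # {'delete-use'}
print(disables(delete_n, delete_n))  # {'delete-delete'}
print(disables(create_x, forbid_x))  # {'add-forbid'}
```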
Furthermore, as shown in [Ehr79] based on the local Church-Rosser theorem [CR36], two parallel independent direct transformations can be executed in any order with the same final result. In the domain of model versioning, no dedicated, widely accepted categorization of merge conflict types has been established yet. Nevertheless, Westfechtel provides a detailed definition of conflicts between two atomic operations in [Wes10]. More precisely, he distinguishes between context-free conflicts and context-sensitive conflicts. Context-free conflicts denote contradicting changes to the same feature value of the same model element (also known as update-update conflicts); thus, the context of the model element is not taken into account. In contrast, context-sensitive conflicts also concern the context of a concurrently modified model element, such as its container and referenced model elements. Context-sensitive conflicts are further classified into three kinds. (i) Containment conflicts occur, for instance, if both developers move the same model element to different containers, so that no unique container can be chosen automatically. (ii) Delete conflicts comprise delete-update, delete-use, and delete-move conflicts. (iii) Reference conflicts concern contradicting changes to bi-directional references. This categorization is tailored to EMF models and is clearly defined using set-theoretical rules. However, Westfechtel considers only generic conflicts among atomic operations.
3.2.2
Conflict Categorization Applied in this Thesis
Having surveyed existing conflict categorizations and terminologies, we now introduce the categories of conflicts and the terminology used in the remainder of this thesis. The following categorization is partly based on the conflict categorization we presented in [BLS+ 10a, BKL+ 11a]. However, we adapt and refine certain parts in order to better integrate it into the context of this thesis. Furthermore, we introduce the notion of merge warnings. Merge warnings represent merge issues that neither directly interfere with the merge process nor destroy the consistency of the model, but that should still be brought to the attention of the involved developers. An overview of the terminology of merge issues is depicted in Figure 3.8.

Figure 3.7: Properties of Two Operations ((a) Parallel Dependence; (b) Non-commutativity; (c) Inconsistent Result)

Two concurrent operations m1 and m2 applied to the same version Vo of an artifact may have three different properties that indicate a conflict, as depicted in Figure 3.7. First, similar to the concept of parallel independence from graph transformation theory, two operations m1 and m2 may be parallel dependent (cf. Figure 3.7a). That is, the operation m2 cannot be applied after the operation m1 has been applied; in other words, the preconditions of m2 are no longer fulfilled after m1 has been applied. Second, following Lippe and van Oosterom [LvO92], we may encounter the case that the operations m1 and m2 do not commute (cf. Figure 3.7b), such that $m_1(m_2(V_o)) \neq m_2(m_1(V_o))$. Thus, no unique merged version can be found. Third, even if the operations m1 and m2 are parallel independent and commutative, the result Vm may be inconsistent with a specification of the artifact’s language, as described by Mens [Men02] (cf. Figure 3.7c).

Overlapping Operations

We use the term overlapping operations, or operation-based conflict, to denote two operations that are either parallel dependent or not commutative. Thus, both operations cannot be applied together without nullifying one of them; in other words, overlapping operations interfere with the merge unless at least one of them is omitted. Atomic operations as well as composite operations may be involved in such a conflict. In the following, we discuss conflicts arising from parallel dependence and non-commutativity in more detail. An overview is depicted in Figure 3.8.

Parallel Dependence.
As in graph transformation theory, the preconditions of operations are crucial for determining parallel independence. The precondition of atomic operations is that the affected model element still exists. For instance, updating an attribute value of a model element requires that the model element has not been deleted by a concurrent operation; otherwise, we encounter parallel-dependent operations or, more precisely, a delete-update conflict. Additionally, atomic operations such as adding a link between model elements require both the source and the target model element to exist. In case the target has been deleted concurrently, we adopt the terminology of graph transformations and denote such a scenario as a delete-use conflict. For composite operations, the preconditions may be more complex. They may also check for the non-existence of model elements in terms of negative application conditions, or require certain attribute or reference values in a model. Consequently, with composite operations we may additionally face add-forbid or update-forbid conflicts. An example of an update-forbid conflict is illustrated in Section 3.1.5. According to graph transformation theory, there are also delete-delete conflicts among direct graph transformations, because both transformations require the same element to exist in order to be able to delete it. However, such a conflict is not relevant in the context of model versioning, because both developers intended to delete the element anyway. Hence, we may delete it in the merged model and thereby reflect the intention of both developers.

Figure 3.8: Terminology of Merge Issues

Non-commutativity. Besides parallel dependence, operations may also overlap if they do not commute. For example, if two operations update the same attribute of a model element to different values, the order in which they are applied to the common original model determines the respective attribute value in the merged model. Thus, such operations do not commute and are referred to as update-update conflicts. Please note that whenever two operations update the same attribute of a model element to the same value, the operations commute, so that no conflict is at hand.
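The first two properties can be checked mechanically by applying both operations in both orders to the common original version, as in Figure 3.7a and 3.7b. The following Python sketch assumes a toy model representation (a dictionary from element identifiers to feature maps) and two hypothetical atomic operations, delete and update. Each operation raises an exception if its precondition, the existence of the affected element, does not hold.

```python
from typing import Callable, Dict

Model = Dict[str, Dict[str, object]]  # element id -> {feature: value}
Op = Callable[[Model], Model]


class NotApplicable(Exception):
    """Raised when an operation's precondition does not hold."""


def _copy(m: Model) -> Model:
    return {eid: dict(features) for eid, features in m.items()}


def delete(eid: str) -> Op:
    def apply(m: Model) -> Model:
        if eid not in m:
            raise NotApplicable(f"{eid} does not exist")
        new = _copy(m)
        del new[eid]
        return new
    return apply


def update(eid: str, feature: str, value: object) -> Op:
    def apply(m: Model) -> Model:
        if eid not in m:
            raise NotApplicable(f"{eid} does not exist")
        new = _copy(m)
        new[eid][feature] = value
        return new
    return apply


def overlap(vo: Model, m1: Op, m2: Op) -> str:
    """Apply m1 and m2 in both orders to the common original version vo."""
    try:
        m1_then_m2 = m2(m1(vo))
        m2_then_m1 = m1(m2(vo))
    except NotApplicable:
        return "parallel dependent"
    return "commutative" if m1_then_m2 == m2_then_m1 else "non-commutative"


vo = {"c": {"name": "Person"}}
print(overlap(vo, delete("c"), update("c", "name", "Human")))          # parallel dependent
print(overlap(vo, update("c", "name", "A"), update("c", "name", "B")))  # non-commutative
print(overlap(vo, update("c", "name", "A"), update("c", "name", "A")))  # commutative
```

Like graph transformation theory, this sketch would also report two deletions of the same element as parallel dependent. As argued above, such a case need not be treated as a conflict in model versioning. Inconsistent results, the third property, are not covered here, since they require validating the merged state.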
Inconsistent State

Even if concurrent operations are parallel independent and commutative (i.e., they are not overlapping), they may still cause an inconsistent state when both are applied to a merged model. Although such an inconsistent state is caused by operations, the inconsistency itself concerns the state. It can therefore only be detected by analysing the resulting state rather than the operations. Hence, we also refer to such conflicts as state-based conflicts. According to the categorization of Mens [Men02], we may further distinguish between syntactic inconsistencies and semantic inconsistencies (cf. Figure 3.8). This differentiation is based on the type of specification with which the state is inconsistent.

Syntactic Inconsistency. The merged model may be inconsistent with the abstract syntax specification of a modeling language. In our context, the abstract syntax is specified by the metamodel and additional validation rules (e.g., OCL invariants). The metamodel may be seen as the context-free syntax specification and the validation rules as additional context conditions. In the UML specification [OMG03] as well as in other literature, the term static semantics is used to refer to such context conditions. However, as stressed by Harel and Rumpe [HR04], context conditions (even if they are sometimes called static semantics) do not specify a language’s semantics; they simply further restrict the abstract syntax. Thus, violations of the static semantics are still syntactic conflicts. An example of a syntactic inconsistency is illustrated in Section 3.1.6.

Semantic Inconsistency. The merged model may also be inconsistent with a specification of the semantics of a modeling language. A language’s semantics must specify the meaning of all concepts using a well-defined and well-understood semantic domain (e.g., denotational semantics [Win93]). At the moment, however, there is “no simple and obvious way to define this complex semantic domain precisely, clearly, and readably.” [HR04]. Consequently, the semantics of modeling languages is often specified only informally. In this thesis, we do not consider semantic inconsistencies, but list them here for the sake of completeness.

Merge Warnings

Conflicts eventually have to be resolved in order to obtain a consolidated and consistent model.
However, in many merge scenarios, the involved developers should for now be only informed that there are merge issues, which indeed do not directly interfere the merge process or destroy the consistency of the model, but which should be still carefully reviewed by the developers. Therefore, picking up on the idea of Koegel et al. [KHWH10], we introduce merge warnings and discuss the specific types of warnings, which we considered in this thesis, in the following (cf. Figure 3.8). Composite Operation Match. A composite operation is more than its set of contained atomic operations. The atomic operations are applied to fulfill a common goal reflecting the intention of the developer who applied it. The intention of the developer is fulfilled when the composite operation has been applied successfully to all selected and matching model elements. However, if another developer concurrently changes or adds model elements, the effect of the composite operation might be mitigated because the concurrent operations have not been considered in the original application of the composite operation. As already mentioned, composite operation specifications comprise detailed preconditions and the application of a composite operation affects model elements that fulfill or match the preconditions. If concurrent operations applied by another developer modify the model so that this match is influenced, we may either face an 63
operation-based conflict, or we may encounter valid preconditions and an increase of the match size. In the latter case, the composite operation application is still valid; however, more model elements match the preconditions after the concurrent operations have been applied than before. Therefore, developers should be notified by a warning that these additionally matching model elements might also be incorporated in the composite operation application. An example that illustrates such a scenario is presented in Section 3.1.4. Signifiers. Adopting the notion of signs and signifiers in linguistics [DS16], we introduce the term signifier to refer to one or more intrinsic or extrinsic properties of a model element that convey the principal meaning of the respective model element. For instance, the meaning of a UML operation is mainly conveyed by its name, its return type, and the types of its parameters; thus, the signifier of a UML operation is the combination of its name, its return type, and the types of its parameters. These properties, constituting the signifier of a model element, may overlap with the natural identifier of the model element, such as its name. However, a natural identifier is usually a single intrinsic property. A signifier, in contrast, may incorporate multiple properties, which may also come from its context, such as its child model elements, its container, or cross-referenced model elements. As these properties are particularly important for the meaning of a model element, we argue that they should be treated specifically in the merge process. Therefore, we introduce two types of warnings related to signifiers in the following. Unexpected Signifier Match. An unexpected signifier match indicates scenarios in which two model elements, which have either been added or modified, eventually have the same signifier; that is, they share the same meaningful properties.
If these two model elements are completely equal, as in the scenario in Section 3.1.1, we may safely remove one of them to avoid redundancies in the merged model. If, however, the model elements have the same signifier but are not entirely equal, a decision of the developer is needed on whether both model elements should be retained or how they should be joined (cf. Section 3.1.2 for an example). Such scenarios are referred to as unexpected signifier matches. Concurrent Signifier Change. Besides new signifier matches, we may also face the opposite case: one model element is modified concurrently so that its signifier is affected contradictorily in both revised models; that is, after the concurrent modifications, the corresponding model elements have different signifiers. In such scenarios, which are referred to as concurrent signifier changes, it is likely that the model element’s meaning is obfuscated; therefore, developers should be warned and should review the merged model. An example of such a scenario is given in Section 3.1.3.
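To make the three merge warnings more tangible, the following Python sketch shows how they could be computed. It is not AMOR’s implementation: the `Operation` record, the function names, and the representation of preconditions as plain predicates are hypothetical, and the signifier (name, return type, parameter types) is taken from the UML operation example above.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Operation:
    """Hypothetical, simplified UML operation; uuid identifies the element."""
    uuid: str
    name: str
    return_type: str
    param_types: Tuple[str, ...]
    visibility: str = "public"  # deliberately NOT part of the signifier


def signifier(op: Operation) -> tuple:
    # Signifier of a UML operation: name, return type, and parameter types.
    return (op.name, op.return_type, op.param_types)


def composite_match_warning(precondition: Callable[[Operation], bool],
                            before: Iterable[Operation],
                            after: Iterable[Operation]) -> Set[str]:
    """UUIDs that match the precondition only after the concurrent operations."""
    matched_before = {e.uuid for e in before if precondition(e)}
    matched_after = {e.uuid for e in after if precondition(e)}
    return matched_after - matched_before  # non-empty set => raise a warning


def unexpected_signifier_matches(changed1: Iterable[Operation],
                                 changed2: Iterable[Operation]
                                 ) -> List[Tuple[Operation, Operation, bool]]:
    """Pairs of added/modified elements of both revisions with equal signifiers.

    The flag is True if both elements are equal apart from their UUIDs, so one
    of them may be removed safely; otherwise the developer has to decide.
    """
    changed2 = list(changed2)
    result = []
    for a in changed1:
        for b in changed2:
            if a.uuid != b.uuid and signifier(a) == signifier(b):
                fully_equal = replace(a, uuid="") == replace(b, uuid="")
                result.append((a, b, fully_equal))
    return result


def concurrent_signifier_changes(original: Dict[str, Operation],
                                 rev1: Dict[str, Operation],
                                 rev2: Dict[str, Operation]) -> List[str]:
    """UUIDs of elements whose signifier was changed differently in both revisions."""
    result = []
    for uuid, orig in original.items():
        if uuid in rev1 and uuid in rev2:
            s0, s1, s2 = signifier(orig), signifier(rev1[uuid]), signifier(rev2[uuid])
            if s1 != s0 and s2 != s0 and s1 != s2:
                result.append(uuid)
    return result
```

Because `visibility` is excluded from the signifier, two concurrently added operations that differ only in visibility still produce an unexpected signifier match, but one flagged as not fully equal, which corresponds to the case that requires a decision of the developer.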
3.3
Design Principles of AMOR
In this section, we discuss the basic design principles of the adaptable model versioning system AMOR. Subsequently, we discuss several fundamental techniques with regard to these principles and document the reasons behind the design decisions made when developing AMOR.
Flexibility concerning modeling language and editor. In traditional, code-centric versioning, mainly language-independent systems that do not impose any restrictions on the used editor have gained significant adoption in practice. Thus, we may conclude that a versioning system that supports only a restricted set of languages and that inherently depends on the used editor might not find broad adoption in practice. Moreover, considering that domain-specific modeling languages are becoming more and more popular, language-specific systems seem to be an unfavorable choice. Therefore, AMOR is designed to provide generic versioning support irrespective of the used modeling languages and modeling editors. Generic model versioning can be achieved in one of two ways. The first alternative is to use an internal representation of the models that are put under version control. This internal representation must be capable of expressing every piece of information that is also available in the original model. The components of the versioning system may then be designed to work with models conforming to the internal representation and are, consequently, independent of the original modeling language. However, this requires a bi-directional transformation between models conforming to a specific modeling language and models conforming to the internal representation. Specifying these transformations might be a tedious task. Therefore, we use the second alternative for realizing a language-independent system, which is also used by several other generic model versioning systems. Instead of translating every model into an internal representation, we use the reflective interfaces of the Eclipse Modeling Framework [SBPM08] (EMF).
Thereby, all modeling languages that are supported by the chosen metamodeling framework (i.e., whose metamodel is specified in terms of the metamodeling framework’s meta-metamodeling language) can be handled immediately. By choosing a popular metamodeling framework, a plethora of modeling languages can be handled at one stroke. Of course, this only allows us to deal with modeling languages for which a metamodel (conforming to the supported meta-metamodel) is available. Nevertheless, it is always possible to develop a transformation from models defined in a “foreign” metamodeling framework into a corresponding new or existing metamodel conforming to the supported metamodeling framework. A model versioning system that is also independent of the used modeling editor must not make any assumptions on how a model is manipulated by users and must not rely on specific features on the editor side. Therefore, we may not apply editor-specific operation recording to obtain the applied operations. Instead, AMOR works only with the states of a model before and after it has been changed and derives the applied operations using state-based model differencing. Easy adaptation by users. Generic versioning systems are very flexible, but they lack precision in comparison to language-specific versioning systems because no language-specific knowledge is considered (cf. Section 3.1 for examples in which language-specific knowledge is required). Therefore, a generic versioning system should be adaptable with language-specific knowledge whenever this is needed. Some existing model versioning approaches are adaptable in terms of programming interfaces. Hence, users may implement specific behavior to adapt the system to their needs (i.e., white-box adaptation as discussed in Section 2.2). Especially with domain-specific modeling languages, a plethora of different modeling
languages exists, which often are not even publicly available. Bearing that in mind, it is hardly possible for versioning system vendors to pre-specify the adaptations required to incorporate language-specific knowledge for all existing modeling languages. Thus, users of the versioning system should be enabled to create and maintain those adaptation artifacts by themselves. This, however, entails that these adaptation artifacts must not require programming skills or deep knowledge of the implementation of the versioning system. In other words, black-box adaptations should be preferred over white-box adaptations, and the adaptation artifacts should be created in a descriptive language that is easy to use (cf. Section 2.2). Therefore, AMOR is designed to be adapted by providing descriptive adaptation artifacts and uses, as far as possible, well-known languages to specify the required language-specific knowledge. No programming effort is necessary to enhance AMOR’s versioning capabilities with respect to language-specific aspects. Besides aiming at the highest possible adaptability, ease of adaptation is one major goal of AMOR. Thus, for one of the most complicated adaptation points (i.e., the specification of composite operations), we introduce a novel technique named model transformation by demonstration (cf. Chapter 4) to achieve this goal. Don’t Repeat Yourself. Adapting a software system to one’s specific needs is often a great deal of work. Besides gathering the requirements and identifying the right adaptation points for realizing them, specifying the correct adaptation artifacts might also be a time-consuming task. This effort can be reduced by simplifying the approach and providing appropriate tool support for creating the adaptation artifacts, but also by not forcing developers to specify the same piece of knowledge over and over again.
This principle is known as Don’t Repeat Yourself (DRY) and has been introduced by Hunt and Thomas [HT00]. In particular, they state that “every piece of knowledge must have a single, unambiguous, authoritative representation within a system”. In AMOR, we adopt this principle to reduce the adaptation effort. More precisely, we designed AMOR to exploit user-specified match rules not only for improving the model matching, but also for enabling the system to detect unexpected signifier matches and concurrent signifier changes (cf. Section 3.2). Furthermore, a user-specified composite operation specification allows not only for the automatic execution of the composite operation, but also for the a posteriori detection of its applications and for detecting composite operation conflicts and composite operation match warnings (cf. Section 3.2). Finally, we reuse the constraints of a modeling language’s abstract syntax specification to reveal inconsistent states after merging.
3.4
Technical Infrastructure of AMOR
In this section, we give a brief overview of the technical infrastructure of this thesis. The concepts presented in this thesis can be ported to any platform and metamodeling framework. However, they have been implemented as Eclipse plug-ins [BG03, CR04, Hol04] and are elaborated in the context of EMF [SBPM08]. When describing models in the following, we refer, in particular, to EMF-based models. Furthermore, for realizing the contributions presented in this thesis, we reuse and extend the model comparison framework EMF Compare [BP08]
and integrate the Epsilon Comparison Language (ECL) [Kol09]. Therefore, we provide a short overview of these technologies in the following.
3.4.1
Eclipse and Eclipse Plug-ins
Eclipse³ [BG03, CR04, Hol04] is an open-source⁴, Java-based software development environment with the goal of providing a generic platform for bundling integrated development environments (IDEs). The most popular bundle is the Eclipse IDE for Java. Besides this bundle, there are numerous other bundles for several programming languages and other application domains. These bundles range from environments for report development to IDEs for an extensive set of diverse programming languages such as C++, Ruby, PHP, and many others. The development of Eclipse is organized by an independent consortium consisting of many companies and organizations. Its implementation is performed by thousands of professional and independent developers spread all over the world. This diversity of stakeholders and developers has led to a very powerful, flexible, and extensible platform offering a variety of features. A project of such size and complexity can hardly be organized by a single organizational unit. Therefore, the development of Eclipse is divided into three main projects having distinct responsibilities and a specific focus. These three projects are (i) the Eclipse Project, (ii) the Tools Project, and (iii) the Technology Project. Each of these main projects consists of a range of subprojects such as the Java Development Tools Project (JDT), the C/C++ Development Tools Project (CDT), and many others (cf. [SBPM08]). It is worth noting that the main goal of Eclipse is not to provide an IDE for a specific set of programming languages. Rather, it aims at offering a platform that enables the development of any kind of IDE for any kind of language. To achieve this goal, the core of Eclipse is designed to be a runtime system that manages and loads plug-ins. This runtime system is a component-based system called Equinox, which is an implementation of the Open Services Gateway initiative (OSGi) framework specification [OSG03].
Every plug-in contributes a set of features by providing its own implementations or by bundling and composing features of other plug-ins. A plug-in consists of all artifacts required to realize this set of features. These comprise the compiled source code, interface definitions, image resources, dependencies on other plug-ins, etc. The central declaration of a plug-in is the so-called plugin.xml, which captures the following information (as stated in [SBPM08]).
• Requires: Dependencies on external libraries and on libraries provided by other plug-ins.
• Exports: Visibility of its own public classes, which can be called by other plug-ins.
• Extension Points: Public declaration of interfaces that can be used by other plug-ins to extend the behaviour of this plug-in.
• Extensions: Public declaration of the implementations that are contributed by this plug-in to other plug-ins (i.e., extensions extending foreign extension points).
³ http://www.eclipse.org
⁴ Eclipse Public License (EPL): http://www.eclipse.org/legal/epl-v10.html
3.4.2
(Meta-)Modeling with EMF and Ecore
When describing models in this thesis, we refer to models that are based on EMF. EMF is a mature Eclipse-based framework providing powerful metamodeling support within the Eclipse ecosystem. EMF has gained significant recognition among researchers and practitioners, which is also why we chose EMF as the underlying modeling technology. Besides the meta-metamodeling language Ecore (introduced below), EMF offers facilities for code generation, generation of modeling editors, and reflective APIs to access and manipulate models generically. Many powerful technologies have been built on top of EMF, which allow, for instance, persisting models in relational databases, transforming models, and much more. In the following, however, we focus on introducing the metamodeling language Ecore and discuss its relationship to the well-known metamodeling stack [Küh06]. The heart of EMF is its metamodeling language Ecore, a Java-based implementation of the Essential Meta Object Facility (EMOF) [OMG04] standardized by the Object Management Group (OMG). Using Ecore, developers may specify a metamodel to define the abstract syntax of a new modeling language. This metamodel may then be used to generate modeling editors for creating models, that is, instances of the developed metamodel. The relationship among meta-metamodels, metamodels, and models may best be described in terms of the metamodeling stack [Küh06]. The metamodeling stack consists of three layers called M3, M2, and M1, where a model in M2 conforms to a model in M3 and a model in M1 conforms to a model in M2. M3: Meta-metamodel. In the uppermost layer of the metamodeling stack, namely M3, the meta-metamodeling language is located (cf. Figure 3.9). In the context of EMF, this meta-metamodeling language is Ecore. The core language elements of Ecore are depicted in the upper area of Figure 3.9 in terms of a UML class diagram. Please note that we do not present all language elements and features in this figure.
Instead, we concentrate on those classes and features that are of paramount importance in the current context. Ecore allows modeling EClasses, which may contain an arbitrary number of structural features. For structural features, upper and lower multiplicities have to be defined. Additionally, structural features having an upper multiplicity greater than 1 may be defined as ordered. Structural features are divided into two distinct subsets, namely EReferences and EAttributes. Attributes as well as references must have a type. For attributes, primitive data types such as String, Boolean, and Integer are allowed. References refer to classes for defining their types and may additionally be defined as containments. This means that referenced elements are nested inside the container element and, therefore, the deletion of a container element results in the cascaded deletion of all directly and indirectly contained elements. It is worth noting that Ecore is recursively specified by Ecore. This means that, for example, EReference is itself an instance of EClass having the name “EReference”. This class contains, among other features, the structural feature “containment”, which is an instance of EAttribute. M2: Metamodel. The meta-metamodeling language may now be used to create metamodels. A metamodel specifies the abstract syntax of a modeling language and is an instance of Ecore, which resides in M3; therefore, a metamodel resides in M2. In Figure 3.9, we provide a small example of such a metamodel in terms of an object diagram. In particular, this metamodel is
Figure 3.9: Metamodeling with Ecore (layers M3: Metametamodel, M2: Metamodel, M1: Model)
a simplified excerpt of the state machine metamodel. A state machine consists of States and Transitions. Therefore, we have two instances of Ecore’s EClass, one for states and one for transitions. Both classes contain an attribute (i.e., an instance of Ecore’s EAttribute): a state has a name and a transition has an event. Transitions further refer to their source state and their target state. Therefore, the metamodel for state machines contains two instances of EReference, namely source and target. M1: Model. The metamodel in M2 may now be instantiated to specify arbitrarily many state machines in M1. In Figure 3.9, we illustrate a small state machine comprising two states and two
Figure 3.10: EMF Compare Architecture [EMC]
transitions between those states. More precisely, the states are instances of the class State and the transitions are instances of the class Transition in the metamodel residing in M2. In the upper area of M1 in Figure 3.9, the small state machine model is depicted in terms of an object diagram; in the lower area of M1, the same model is illustrated in the commonly used concrete syntax of state machines for the sake of readability.
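The three layers of Figure 3.9 can be mimicked in a few lines of Python to show what “conforms to” means operationally. This is a hypothetical sketch, not the EMF API: the classes below are a drastically reduced stand-in for Ecore (only names, types, multiplicities, and the containment flag), the `violations` function is an illustrative conformance check, and we assume that the transition lift leads from Idle to DialTone and hangup leads back.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


# M3: a hypothetical, heavily reduced stand-in for Ecore (not the EMF API).
@dataclass(eq=False)
class EAttribute:
    name: str
    type: type              # a Python type plays the role of an EDataType
    lower_bound: int = 0
    upper_bound: int = 1    # -1 means unbounded, as in Ecore


@dataclass(eq=False)
class EReference:
    name: str
    type: "EClass"
    lower_bound: int = 0
    upper_bound: int = 1
    containment: bool = False


@dataclass(eq=False)
class EClass:
    name: str
    features: List[Union[EAttribute, EReference]] = field(default_factory=list)


# M1: model elements are instances of M2 classes.
@dataclass(eq=False)
class EObject:
    eclass: EClass
    values: Dict[str, object] = field(default_factory=dict)


def violations(obj: EObject) -> List[str]:
    """Check that obj conforms to its metaclass; an empty list means it conforms."""
    known = {f.name for f in obj.eclass.features}
    errors = [f"unknown feature '{k}'" for k in obj.values if k not in known]
    for f in obj.eclass.features:
        value = obj.values.get(f.name)
        items = [] if value is None else (value if isinstance(value, list) else [value])
        if len(items) < f.lower_bound or (f.upper_bound != -1 and len(items) > f.upper_bound):
            errors.append(f"multiplicity of '{f.name}' violated")
        for item in items:
            if isinstance(f, EAttribute) and not isinstance(item, f.type):
                errors.append(f"'{f.name}' expects {f.type.__name__}")
            elif isinstance(f, EReference) and not (
                    isinstance(item, EObject) and item.eclass is f.type):
                errors.append(f"'{f.name}' expects a {f.type.name}")
    return errors


# M2: the state machine excerpt of Figure 3.9, expressed with the M3 classes.
State = EClass("State", [EAttribute("name", str, 1, 1)])
Transition = EClass("Transition", [
    EAttribute("event", str, 1, 1),
    EReference("source", State, 1, 1),
    EReference("target", State, 1, 1),
])

# M1: two states and two transitions (directions assumed, see above).
idle = EObject(State, {"name": "Idle"})
dial_tone = EObject(State, {"name": "DialTone"})
lift = EObject(Transition, {"event": "lift", "source": idle, "target": dial_tone})
hangup = EObject(Transition, {"event": "hangup", "source": dial_tone, "target": idle})
```

A transition whose source is another transition and whose target is missing would yield two violations: a type error for source and a multiplicity error for target.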
3.4.3
EMF Compare
EMF Compare⁵ [BP08] is a subproject of the Eclipse Modeling Framework Technology project (EMFT) and provides an extensible tool and framework for model comparison and merging. For this reason, EMF Compare was also considered in the discussion of existing work related to the topics of this thesis in Section 2.1.2. EMF Compare supports two-way and three-way model comparison. The model comparison is divided into two phases: the match phase and the differencing phase (cf. Figure 3.10). In the match phase, the so-called generic match engine aims to identify corresponding model elements among two or three versions of a model using either UUID-based or heuristics-based matching. The obtained correspondences are saved into a match model. Based on this match model, the so-called diff builder compares each set of corresponding elements and computes the fine-grained differences at the feature level. The computed differences are saved into a diff model. The resulting diff model may optionally be “refactored” by user-specified implementations of the diff extension interface. The goal of these diff extensions is to improve the structure of the abstract diff model according to language-specific rewriting rules. In some modeling languages, one change from the user’s perspective results in several differences from a generic perspective, for instance, when one element in the concrete syntax is represented by several elements in the abstract syntax. Language-specific diff extensions improve the comprehensibility of the diff model by searching for specific difference elements and grouping them into one difference element accordingly. Additionally, EMF Compare offers user interfaces for visualizing match and difference models, provides extension points for exporting difference models into reports, and allows merging models by applying difference elements from a diff model to the input models. To summarize, EMF Compare is a very flexible and extensible framework that can be used for any task related to model comparison. Hence, AMOR makes heavy use of the extensions offered by EMF Compare. In particular, AMOR replaces the match engine provided by EMF Compare with our own implementation and uses only EMF Compare’s diff builder. For incorporating applications of composite operations in the diff model, AMOR exploits the diff extension interface (cf. Chapter 5). Furthermore, EMF Compare’s merger is the basis for AMOR’s model transformation engine, which is used to execute composite operations (cf. Chapter 4), and for merging models in AMOR.
⁵ http://www.eclipse.org/emf/compare
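The idea of a diff extension that groups several generic differences into one user-level difference can be illustrated with the following Python sketch. It does not use EMF Compare’s actual interfaces; the `Diff` and `GroupedDiff` records, the difference kinds, and the grouping rule (an element addition plus the addition of its diagram shape in a hypothetical language whose concrete syntax element is represented by two abstract syntax elements) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Diff:
    """A fine-grained difference as produced by a generic diff builder."""
    kind: str    # e.g., "add_element", "add_shape", "update"
    target: str  # UUID of the semantic model element concerned


@dataclass
class GroupedDiff:
    """One user-level change composed of several generic differences."""
    label: str
    parts: List[Diff] = field(default_factory=list)


def group_element_with_shape(diffs: List[Diff]) -> List[Union[Diff, GroupedDiff]]:
    """Hypothetical diff extension: fold the addition of an element and the
    addition of its diagram shape into a single difference element."""
    added = {d.target for d in diffs if d.kind == "add_element"}
    shapes = {d.target: d for d in diffs if d.kind == "add_shape" and d.target in added}
    result: List[Union[Diff, GroupedDiff]] = []
    for d in diffs:
        if d.kind == "add_shape" and shapes.get(d.target) is d:
            continue  # folded into the grouped difference of its element
        if d.kind == "add_element" and d.target in shapes:
            result.append(GroupedDiff(f"add {d.target}", [d, shapes[d.target]]))
        else:
            result.append(d)
    return result
```

The rewriting is a pure function from one diff model to another, so the generic diff builder stays untouched and language-specific knowledge is confined to the extension.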
3.4.4
Epsilon Comparison Language
The Epsilon Comparison Language⁶ (ECL) [Kol09, KRP11] is a domain-specific language for developing model comparison rules. ECL is part of the Epsilon project, a family of interoperable task-specific languages for working with EMF models. In particular, the Epsilon project provides languages for code generation, model-to-model transformation, model validation, comparison, migration, merging, and refactoring. The aim of ECL is to enable the specification of language-specific comparison algorithms in a rule-based manner. Thereby, ECL can be used to identify pairs of matching elements between two models conforming to the same or even different metamodels. ECL supports inheritance among match rules, recursive calls of rules using the function matches, rule guards, which restrict the execution of a rule to certain scenarios, as well as lazy rules, which are only executed when invoked explicitly. Furthermore, ECL allows specifying custom operations, which can be called from several rules. Another distinguishing feature for a domain-specific language is that existing external libraries may be called from ECL rules. Thereby, fuzzy string matching frameworks or dictionaries such as WordNet⁷ can be integrated easily into an ECL rule system. The concrete syntax of ECL match rules is provided in Listing 3.1, and an example of a match rule that uses an external library for string matching is given in Listing 3.2. In this example, the rule FuzzyTree2Tree matches two instances of the metaclass Tree if their labels are similar to a certain degree in terms of the Levenshtein [Lev66] distance, as specified in the operation fuzzyMatch, and if their parents and their children match. To verify whether the parents match, the generic function matches can be called recursively from any rule. In the course of this thesis, we show how ECL is integrated into our versioning framework to allow users to plug in language-specific match rules that improve the generic UUID-based matching (cf. Chapter 5). Furthermore, ECL rules are involved in the conflict detection approach presented in this thesis to reveal merge issues concerning similar model elements, as shown in the motivating scenario in Section 3.1.2, as well as concurrent operations that contradictorily modify the signifier of model elements, as shown in the motivating scenario in Section 3.1.3. For more information on how ECL is integrated to improve conflict detection, we refer the reader to Chapter 6.
⁶ http://www.eclipse.org/gmt/epsilon/doc/ecl
⁷ http://wordnet.princeton.edu

Listing 3.1: Concrete Syntax of a Match Rule [KRP11]

(@lazy)?
(@greedy)?
(@abstract)?
rule <name>
  match <leftParameterName>:<leftParameterType>
  with <rightParameterName>:<rightParameterType>
  (extends (<ruleName>,)*<ruleName>)? {
  (guard (:expression)|({statementBlock}))?
  compare (:expression)|({statementBlock})
  (do {statementBlock})?
}

Listing 3.2: Example of a Match Rule using Fuzzy String Matching [KRP11]

pre {
  var simmetrics = new Native("org.epsilon.ecl.tools.textcomparison.simmetrics.SimMetricsTool");
}

rule FuzzyTree2Tree
  match l : T1!Tree
  with r : T2!Tree {

  compare : l.label.fuzzyMatch(r.label)
    and l.parent.matches(r.parent)
    and l.children.matches(r.children)
}

operation String fuzzyMatch(other : String) : Boolean {
  return simmetrics.similarity(self, other, "Levenshtein") > 0.5;
}
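To clarify what the rule in Listing 3.2 computes, the following Python sketch re-implements its logic outside of ECL. It is an illustration under assumptions, not ECL or SimMetrics code: we assume that the similarity score is the Levenshtein distance normalized by the length of the longer string (1 − distance / max length), and the `Tree` record and function names are hypothetical. The children comparison of Listing 3.2 is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Tree:
    label: str
    parent: Optional["Tree"] = None


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    # Assumed normalization to [0, 1]: 1 - distance / length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def fuzzy_match(a: str, b: str, threshold: float = 0.5) -> bool:
    return similarity(a, b) > threshold


def matches(left: Optional[Tree], right: Optional[Tree]) -> bool:
    """Counterpart of FuzzyTree2Tree: similar labels and recursively matching parents."""
    if left is None or right is None:
        return left is None and right is None  # two roots match; root vs. non-root do not
    return fuzzy_match(left.label, right.label) and matches(left.parent, right.parent)
```

For example, “Cars” under “Vehicles” matches “Car” under “Vehicle”, whereas “Cars” and “Boats” (similarity 0.4) do not. Unlike this sketch, ECL records match results in a trace, so repeated recursive calls need not recompute them.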
3.5
Adaptable Merge Process of AMOR
In this section, we first introduce the generic merge process of AMOR and, subsequently, show how this process is extended in this thesis so that it may incorporate language-specific knowledge. In particular, we discuss the adaptable components of this extended merge process and their adaptation points, which may be used for enhancing the quality of the operation and conflict detection.
3.5.1
Generic Merging in AMOR
With the generic merge process, AMOR offers generic versioning support for every EMF-based model without requiring users to perform any kind of adaptation (i.e., out of the box). All components in this generic process are designed to be metamodel-agnostic and operate only on the reflective API provided by EMF. The generic merge process is depicted in Figure 3.11. This figure presents a more fine-grained view of the merge process that was introduced in Figure 1.2. Furthermore, we now explicitly illustrate the artifacts that are exchanged between the steps of this process. The input to this merge process consists of three models: the common original model Vo and two concurrently changed models, Vr1 and Vr2. Thus, Vr1 is the result of the first modification m1 performed by developer 1, and Vr2 is the result of the second modification m2 performed by developer 2. UUID-based Matching. The first step of the merge process in Figure 3.11 is the UUID-based matching. The goal of this step is to identify the corresponding model elements between Vo and Vr1, as well as between Vo and Vr2. As the merge process aims to be generic, no language-specific correspondence rules are used. Instead, this match algorithm assumes that immutable UUIDs are attached to each model element, which are used to unambiguously map each model element in Vo to its respective counterpart in Vr1 and Vr2. The obtained correspondences are saved into two distinct match models. The first match model MVo,Vr1 represents the correspondences between Vo and Vr1, and the second match model MVo,Vr2 contains the mappings between Vo and Vr2. Atomic Operation Detection. The goal of the next step is to identify the atomic operations that have been applied to the common original model Vo in order to obtain the revised models, Vr1 and Vr2. Therefore, in this step, the two model elements of each pair of corresponding model elements in a match model are compared to each other.
In particular, the values of each feature of two corresponding model elements are checked for equality. If they are not equal, a corresponding operation is derived and saved into a so-called diff model. Additionally, for each model element in
Figure 3.11: Generic Merge Process of AMOR (steps: UUID-based Matching, Atomic Operation Detection, Atomic Operation Conflict Detection, Conflict-tolerant Merge, Conflict Annotation, Conflict Resolution)
the revised model that has no corresponding model element in the original model, an operation element representing the addition is saved to the diff model. Accordingly, an operation element representing a deletion is saved for the opposite case. This step is performed for both match models, MVo,Vr1 and MVo,Vr2, in order to create the two diff models, DVo,Vr1 and DVo,Vr2. Ultimately, these diff models contain all operations that have been performed in the course of the modifications m1 and m2, respectively. This concludes the two-phased detection of atomic operations in the merge process. These two phases are elaborated in more detail in Chapter 5. Having identified all atomic operations that have been performed concurrently, we may now proceed with identifying conflicts among these operations. Atomic Operation Conflict Detection. The input of the atomic operation conflict detection consists of two diff models, one for each revised model. These two diff models are now analysed to detect overlapping atomic operations (cf. Section 3.2). To reveal such cases, it is checked for each operation contained in one diff model whether an operation exists in the opposite diff model that is parallel dependent or non-commutative. Finally, each detected conflict is saved to a conflict model (cf. Cm1,m2 in Figure 3.11). A detailed discussion of the atomic operation conflict detection is provided in Section 6.1. Conflict-tolerant Merge. As argued in [BLS+10b] and further elaborated in [Wie11], resolving conflicts directly in a preliminarily merged model is easier and more natural than resolving conflicts by choosing the operations to be applied from a list of conflicting operations. Therefore, the conflict-tolerant merge produces a model, called Vm\C, to which all operations of both developers that are not in conflict with another operation are applied. Conflict Annotation.
In the next step, the preliminarily merged model Vm\C is annotated with all conflicts in Cm1,m2 that need to be resolved. For annotating models independently of their metamodel, we introduced a novel mechanism called EMF Profiles in [LWWC11], which ports the lightweight language extension mechanism known from UML Profiles [FFVM04] to domain-specific models in EMF. For annotating conflicts, we developed a dedicated conflict profile [BKL+11b], which is used to indicate merge conflicts directly in the merged model Vm\C. The annotated merged model, referred to as Vm\C Cm1,m2 in Figure 3.11, is handed over to the next step. Conflict Resolution. Having annotated all previously detected conflicts in the merged model, the user may resolve the conflicts directly in the model. In [BKS+10] and [BSW+09], we presented more automated or supportive ways for users to resolve conflicts by introducing conflict resolution recommendations and collaborative, synchronous modeling tools for conflict resolution, respectively. The topic of conflict resolution is further elaborated in [Bro11]. After all conflicts have been resolved, the new merged model Vm is saved in the common repository.
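The generic steps described above can be condensed into a small Python sketch to show how the artifacts flow from one step to the next. This is a minimal sketch under strong simplifying assumptions, not AMOR’s implementation: a model is a dictionary from UUIDs to single-valued feature maps, only two conflict types (update-update and delete-update) are checked, and the annotation step simply maps UUIDs to conflict kinds instead of applying a conflict profile. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Model = Dict[str, Dict[str, object]]          # UUID -> {feature: value}
Op = Tuple[str, str, Optional[str], object]   # (kind, UUID, feature, new value)


def uuid_match(vo: Model, vr: Model) -> List[Tuple[str, str]]:
    """UUID-based matching: elements correspond iff they carry the same UUID."""
    return [(u, u) for u in vo if u in vr]


def detect_operations(vo: Model, vr: Model) -> List[Op]:
    """Derive atomic operations by comparing the feature values of correspondences."""
    ops: List[Op] = []
    for uo, ur in uuid_match(vo, vr):
        for feature in sorted(set(vo[uo]) | set(vr[ur])):
            if vo[uo].get(feature) != vr[ur].get(feature):
                ops.append(("update", ur, feature, vr[ur].get(feature)))
    ops += [("add", u, None, dict(vr[u])) for u in vr if u not in vo]
    ops += [("delete", u, None, None) for u in vo if u not in vr]
    return ops


def detect_conflicts(d1: List[Op], d2: List[Op]) -> List[Tuple[str, Op, Op]]:
    """Pairwise check of both diff models for overlapping operations."""
    conflicts = []
    for a in d1:
        for b in d2:
            if a[0] == b[0] == "update" and a[1:3] == b[1:3] and a[3] != b[3]:
                conflicts.append(("update-update", a, b))
            elif a[1] == b[1] and {a[0], b[0]} == {"update", "delete"}:
                conflicts.append(("delete-update", a, b))
    return conflicts


def conflict_tolerant_merge(vo: Model, d1: List[Op], d2: List[Op],
                            conflicts: List[Tuple[str, Op, Op]]) -> Model:
    """Apply every operation of both developers that is not part of a conflict."""
    in_conflict = {id(op) for _, a, b in conflicts for op in (a, b)}
    merged = {u: dict(features) for u, features in vo.items()}
    for op in d1 + d2:
        if id(op) in in_conflict:
            continue
        kind, uuid, feature, value = op
        if kind == "add":
            merged[uuid] = dict(value)
        elif kind == "delete":
            merged.pop(uuid, None)
        elif uuid in merged:
            merged[uuid][feature] = value
    return merged


def annotate(conflicts: List[Tuple[str, Op, Op]]) -> Dict[str, List[str]]:
    """Attach conflict kinds to the UUIDs of the affected elements."""
    marks: Dict[str, List[str]] = {}
    for kind, a, _ in conflicts:
        marks.setdefault(a[1], []).append(kind)
    return marks
```

For instance, if developer 1 renames state Idle to Waiting while developer 2 renames it to Ready, renames DialTone to Tone, and adds a new state, the merged model keeps Idle unchanged (annotated with an update-update conflict) and contains all non-conflicting changes of developer 2.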
The presented merge process provides reasonable versioning support that is comparable in quality to the state of the art, such as [KHWH10]. However, the generic process cannot correctly handle the model versioning scenarios presented in Section 3.1.
3.5.2 Adaptation Points of the Merge Process
Having introduced the generic merge process, we now show how this process is extended to allow its adaptation with respect to language-specific knowledge. The extended, adaptable merge process depicted in Figure 3.12 aims at correctly handling the challenging model versioning examples presented in Section 3.1. In the following, we discuss the reasons behind the new steps in the merge process and provide a brief overview of the functionality of the introduced adaptable steps.

Accuracy of the atomic operation detection. The accuracy of the atomic operation detection is crucial for all succeeding tasks in the merge process. In this context, accuracy can be specified in terms of precision and recall as defined by Olson and Delen [OD08]. These two measures originally stem from the area of information retrieval: precision denotes the fraction of reported results that are correct, whereas recall denotes the fraction of correct results that are actually reported, i.e., the completeness of a pattern recognition algorithm. If the operation detection lacks precision, a succeeding conflict detection phase might raise incorrect conflicts. When state-based model differencing is used, a lack of precision in the operation detection is mainly caused by a lack of precision in the model matching phase. Consider, for instance, that developer 1 modifies the name of a model element and developer 2 adds a new containment to the same model element. If the model matching component is not capable of matching the model element in the original model with the corresponding model element in the revised model of developer 1 because of the changed name, a deletion of that model element is reported as well as an addition of another (actually the same) model element having the new name. Consequently, a delete-update conflict is reported, because developer 2 added a new containment to the model element that has incorrectly been considered as removed.
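The rename scenario above can be quantified with a small Python sketch. It assumes a toy model that maps element IDs to names and compares a deliberately naive name-based matcher with ID-based matching; `detect_by_name`, `detect_by_id`, and `precision_recall` are hypothetical helpers, not part of AMOR.

```python
from typing import Dict, Set, Tuple

Op = Tuple[str, str]    # (operation kind, element id)
Model = Dict[str, str]  # hypothetical: element id -> element name


def precision_recall(detected: Set[Op], applied: Set[Op]) -> Tuple[float, float]:
    """Precision: share of detected ops that were applied; recall: share of applied ops detected."""
    hits = len(detected & applied)
    precision = hits / len(detected) if detected else 1.0  # convention for empty sets
    recall = hits / len(applied) if applied else 1.0
    return precision, recall


def detect_by_name(original: Model, revised: Model) -> Set[Op]:
    """Naive content-based matching: elements correspond only if their names are equal."""
    deleted = {i for i, name in original.items() if name not in revised.values()}
    added = {i for i, name in revised.items() if name not in original.values()}
    return {("delete", i) for i in deleted} | {("add", i) for i in added}


def detect_by_id(original: Model, revised: Model) -> Set[Op]:
    """UUID-based matching: elements correspond if they share the same ID."""
    ops = {("delete", i) for i in original.keys() - revised.keys()}
    ops |= {("add", i) for i in revised.keys() - original.keys()}
    ops |= {("update", i) for i in original.keys() & revised.keys() if original[i] != revised[i]}
    return ops


# Developer 1 renames element e1; the only applied operation is an update.
vo, vr1 = {"e1": "Customer"}, {"e1": "Client"}
applied = {("update", "e1")}
print(precision_recall(detect_by_name(vo, vr1), applied))  # (0.0, 0.0)
print(precision_recall(detect_by_id(vo, vr1), applied))    # (1.0, 1.0)
```

The name-based matcher reports a spurious deletion and addition and misses the actual update, which is exactly the situation that leads to the incorrect delete-update conflict described above.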
If the operation detection provides a low recall (i.e., some applied operations have not been detected), the succeeding conflict detection might also miss some important conflicts. To summarize, high precision and recall of the model matching and the operation detection are an essential prerequisite for high-quality conflict detection.

Figure 3.12: Adaptable Merge Process of AMOR

Perhaps the most accurate way of obtaining the applied operations among model versions with model differencing algorithms is to use UUIDs. UUID-based matching, however, completely neglects the content of a model element (i.e., its properties, references, and containments). Yet, in some scenarios, the content is an important source of information for obtaining the precise operations. For instance, if a model element has been deleted and a new model element having properties similar to the deleted one has been added (e.g., when elements are cut and pasted), UUID-based approaches are not able to establish correct correspondences. The same is true for equal or at least similar model elements that have been added concurrently by different developers, as is the case in the model versioning scenarios presented in Sections 3.1.1 and 3.1.2. Therefore, we introduce a new matching step after the UUID-based matching, named rule-based matching, with the goal of improving the match models (MVo,Vr1 [UUID] and MVo,Vr2 [UUID]) obtained from UUID-based matching. This improvement is achieved by
using content-based heuristics to find corresponding model elements that could not be matched using UUIDs. Consequently, we aim at combining the advantages of UUID-based and content-based matching methods. Which properties of a model element should be used for matching two model elements, however, depends on the modeling language. Therefore, we introduce an adaptation point that allows users to specify match rules for a certain modeling language, which are interpreted by the rule-based matching component to improve the match models. The improved matches are incorporated into the match models (MVo,Vr1 [improved] and MVo,Vr2 [improved] in Figure 3.12), which are handed over to the next step. For more information on the adaptable rule-based matching, we kindly refer to Chapter 5.

Composite operations. The next extension to the generic merge process concerns composite operations. As illustrated in the versioning scenarios in Sections 3.1.4 and 3.1.5, knowledge about applications of composite operations between two versions of a model significantly helps, in many scenarios, to better respect the original intention of a developer, as well as to reveal additional issues when merging two concurrent modifications. The prerequisite for considering applications of composite operations is to detect them. When using state-based model differencing algorithms, this is a challenging task because only two successive versions of a model are available, not the sequence of operations that led from one to the other. To address this challenge, we introduce a new step, composite operation detection, immediately after the step for detecting atomic operations in the extended merge process, as depicted in Figure 3.12. This step takes as input the two diff models DVo,Vr1 [atomic] and DVo,Vr2 [atomic], which contain the applied atomic operations. These two diff models are analyzed to find occurrences of specific diff patterns within them.
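To make the idea of a diff pattern concrete, the following Python sketch searches a list of atomic operations for a hypothetical class-diagram composite operation, Pull Up Attribute: an attribute is added to a superclass and deleted from exactly its direct subclasses. The operation format `(kind, owning class, attribute name)`, the class `ClassInfo`, and the function `detect_pull_up_attribute` are assumptions for illustration only; only one precondition is checked and postconditions are omitted. AMOR's actual detection algorithm is the one presented in Section 5.3.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ClassInfo:
    superclass: Optional[str]
    attributes: Set[str] = field(default_factory=set)


# Hypothetical atomic operation format: (kind, owning class, attribute name)
AtomicOp = Tuple[str, str, str]


def detect_pull_up_attribute(original: Dict[str, ClassInfo],
                             diff: List[AtomicOp]) -> List[Tuple[str, str]]:
    """Find applications of a hypothetical 'Pull Up Attribute' composite operation."""
    deleted_from: Dict[str, Set[str]] = defaultdict(set)  # attribute -> classes
    for kind, owner, attr in diff:
        if kind == "delete_attr":
            deleted_from[attr].add(owner)

    detected = []
    for kind, target, attr in diff:
        if kind != "add_attr":
            continue
        subclasses = {name for name, info in original.items() if info.superclass == target}
        # Diff pattern: the attribute added to `target` was deleted from exactly
        # the direct subclasses of `target`.
        if not subclasses or deleted_from.get(attr, set()) != subclasses:
            continue
        # Precondition, evaluated on the original model: the superclass did not own it yet.
        if attr not in original[target].attributes:
            detected.append((target, attr))
    return detected


original = {
    "Person": ClassInfo(None),
    "Student": ClassInfo("Person", {"name", "studentId"}),
    "Teacher": ClassInfo("Person", {"name"}),
}
diff = [("delete_attr", "Student", "name"), ("delete_attr", "Teacher", "name"),
        ("add_attr", "Person", "name")]
print(detect_pull_up_attribute(original, diff))  # [('Person', 'name')]
```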
Having found such a diff pattern, the pre- and postconditions of the respective composite operation are evaluated; if these conditions hold for a certain pattern, an application of the respective composite operation is detected and saved to the input diff models. The detection of user-specified composite operations among the atomic operations in a diff model is presented in Section 5.3. The diff models, enriched with the obtained information on applied composite operations (DVo,Vr1 [composite] and DVo,Vr2 [composite] in Figure 3.12), are handed over to the next step. For detecting conflicts caused by violated preconditions or issues concerning the original intention behind a composite operation, we introduce a new step called composite operation conflict detection, which follows the step for detecting atomic operation conflicts in the merge process. This step is based on the previously detected applications of composite operations and checks for each application whether concurrent operations affect the validity of its preconditions and whether more model elements match the preconditions after the concurrent operations than before. The former check detects composite operation conflicts, and the latter detects composite operation match warnings (cf. Section 3.2). If such a merge issue is detected, a corresponding conflict or warning description is added to the input conflict model Cm1,m2 [atomic]. Composite operations are inherently specific to a certain modeling language. Therefore, the composite operation detection and the composite operation conflict detection are designed to be adaptable to new modeling languages by allowing users to add new operation specifications. For creating such operation specifications, we introduce a novel approach called model transformation by demonstration in Chapter 4. Following our principle don't repeat yourself, such specifications contain both the information necessary to detect their applications, which is presented in Chapter 5, and the information needed to detect composite operation conflicts and composite operation match warnings. The detection of such conflicts and warnings is presented in Section 6.2.

Signifiers. The importance of considering signifiers of model elements is illustrated in the versioning scenarios presented in Sections 3.1.2 and 3.1.3. For addressing such issues, we introduce the step called signifier warning detection in the merge process depicted in Figure 3.12. This component searches for added or changed model elements in both modifications, m1 and m2, that unexpectedly have matching signifiers, as well as for concurrent operations that both change the signifier of the same model element in a contradicting manner; in other words, this component aims to detect unexpected signifier matches and concurrent signifier changes (cf. Section 3.2). If such issues are detected, the input conflict model Cm1,m2 [composite] is extended by additional warning descriptions. The resulting conflict model Cm1,m2 [signifier warnings] is handed over to the next step. Thereby, scenarios like those presented in Sections 3.1.2 and 3.1.3 can be detected to avoid unfavorable redundancies and unintended obfuscations of existing model elements. Which properties of a model element's metaclass have to be combined in order to obtain its signifier cannot be derived generically from a metamodel. Therefore, this component is adaptable, allowing users to specify signifiers according to their own needs. For this specification, we reuse the technology used for adapting the model matching phase, that is, language-specific match rules. For more information on detecting merge issues in the context of signifiers, we kindly refer to Section 6.3.

Inconsistencies. Finally, we introduce a new step in the adaptable merge process depicted in Figure 3.12 that addresses the consistency of the resulting merged model. This step, called inconsistency detection, is situated after the conflict-tolerant merge and validates the preliminarily merged version Vm\C against language-specific validation rules to reveal inconsistencies that are inadvertently introduced by the merge. An example of such an inconsistency is shown in the versioning scenario in Section 3.1.6. Such consistency rules are specific to the modeling language and are usually specified alongside the metamodel by the language designer. Thus, we reuse the consistency rules coming from the language definition and apply the EMF Validation framework (http://www.eclipse.org/modeling/emf/?project=validation) for validating the merged model. This framework supports validation rules specified in the Object Constraint Language (OCL) [OMG10] as well as rules programmed in Java. If inconsistencies are found in the merged model Vm\C, they are added to the input conflict model Cm1,m2 and passed on to the next step in the process. For more information on detecting model inconsistencies, we kindly refer to Section 6.4.

The remaining steps of the adaptable merge process are the same as in the generic merge process: the preliminarily merged model Vm\C is annotated with all conflicts detected in Cm1,m2, and the annotated model is passed to the user in order to resolve the raised conflicts and review the annotated warnings. Eventually, the resulting model Vm is saved to the repository.
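The adaptation points that consume language-specific rules can be pictured as pluggable functions. The following Python sketch is a simplification under stated assumptions: a match rule is reduced to a function returning an element's signifier (exact equality instead of similarity heuristics), validation rules are Python callables instead of OCL or Java constraints run by EMF Validation, only concurrently added elements are checked for signifier matches (changed elements and concurrent signifier changes are omitted), and all names such as `LanguageAdaptation` and `unique_class_names` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Element = Dict[str, object]   # feature name -> value
Model = Dict[str, Element]    # element UUID -> element


@dataclass
class LanguageAdaptation:
    """Hypothetical bundle of language-specific knowledge for one modeling language."""
    signifier: Callable[[Element], Tuple]  # match rule: features identifying an element
    validation_rules: List[Callable[[Model], List[str]]] = field(default_factory=list)


def rule_based_matching(unmatched_original: Model, unmatched_revised: Model,
                        lang: LanguageAdaptation) -> Dict[str, str]:
    """Match elements left over by UUID-based matching if their signifiers are equal."""
    candidates = {lang.signifier(e): uid for uid, e in unmatched_original.items()}
    matches = {}
    for uid, element in unmatched_revised.items():
        original_uid = candidates.pop(lang.signifier(element), None)  # keep it one-to-one
        if original_uid is not None:
            matches[original_uid] = uid
    return matches


def signifier_warnings(added_1: Model, added_2: Model,
                       lang: LanguageAdaptation) -> List[Tuple[str, str]]:
    """Report elements added concurrently by both developers with equal signifiers."""
    by_signifier: Dict[Tuple, List[str]] = {}
    for uid, element in added_2.items():
        by_signifier.setdefault(lang.signifier(element), []).append(uid)
    return [(uid_1, uid_2) for uid_1, element in added_1.items()
            for uid_2 in by_signifier.get(lang.signifier(element), [])]


def detect_inconsistencies(merged: Model, lang: LanguageAdaptation) -> List[str]:
    """Run all language-specific validation rules on the preliminarily merged model."""
    return [message for rule in lang.validation_rules for message in rule(merged)]


def unique_class_names(model: Model) -> List[str]:
    """Example validation rule: class names must be unique."""
    seen, messages = set(), []
    for uid in sorted(model):
        name = model[uid].get("name")
        if name in seen:
            messages.append(f"duplicate class name '{name}' ({uid})")
        seen.add(name)
    return messages


uml = LanguageAdaptation(signifier=lambda e: (e["type"], e["name"]),
                         validation_rules=[unique_class_names])
added_1 = {"x1": {"type": "Class", "name": "Invoice"}}
added_2 = {"y1": {"type": "Class", "name": "Invoice"}, "y2": {"type": "Class", "name": "Payment"}}
print(signifier_warnings(added_1, added_2, uml))            # [('x1', 'y1')]
print(detect_inconsistencies({**added_1, **added_2}, uml))  # ["duplicate class name 'Invoice' (y1)"]
```

Reusing the same `signifier` function for matching and for signifier warnings mirrors the reuse of match rules described above.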
CHAPTER 4

Model Transformation By Demonstration

Predefined composite operations are helpful for efficient modeling, in particular, for automatically executing recurring refactorings, applying model completions, and introducing patterns to existing models. Moreover, the availability of explicit specifications of composite operations (comprising pre- and postconditions as well as the atomic operations to be applied) is the prerequisite for adequately considering applications of such operations in the merge process. Composite operations are tailored specifically to a certain modeling language. As domain-specific modeling is becoming more important, a plethora of different modeling languages exists. Consequently, it is infeasible to predefine all relevant composite operations for all modeling languages used in practice. Therefore, users of a certain modeling language should be enabled to specify such composite operations on their own, so that these specifications can be used not only for automatically executing the specified composite operations but also for adapting a model versioning system as outlined in Chapter 3. In more general terms, composite operations are endogenous model transformations [MG06] (cf. Section 2.3). Thus, an approach is needed that allows users to easily develop such endogenous model transformations to represent composite operations. For specifying model transformations, several dedicated languages (cf. [CH06] for an overview) have been developed in the last decade. Most of them are based on the abstract syntax as defined in the metamodel, which makes it difficult for common users of modeling languages to specify model transformations: these users are usually unfamiliar with the abstract syntax, as they mainly work with the concrete syntax of the modeling languages (i.e., their notation) rather than with their metamodels [SW08, Var06]. This is aggravated by the fact that metamodels may become very large. For instance, the UML 2 metamodel [OMG03] has about 260 metamodel classes [MSZJ04]. Moreover, some language concepts that have a particular representation in the concrete syntax are not even explicitly represented in the metamodel. Instead, these concepts are hidden in the metamodel and may only be derived from specific combinations of attribute values and links among model elements [KKK+07].
To address this problem, we introduce a novel approach for specifying endogenous model transformations more easily using the concrete syntax. The increased ease of use is achieved by applying an approach called model transformation by demonstration (MTBD). In MTBD, users apply or “demonstrate” the transformation on an example model once and, from this demonstration as well as from the provided example model, the generic model transformation is derived semi-automatically. Please note that at the time when we published our approach for specifying composite operations by demonstration in [Lan09, BLSW09, BLS+09], a very similar approach by Sun et al. [SWG09] emerged (we provide a detailed comparison of our approach and the approach by Sun et al. in Section 4.1.7). Sun et al. introduced the notably suitable term model transformation by demonstration for such demonstration-based specification approaches; thus, we adopt this term in the remainder of this thesis. For model versioning purposes, endogenous model transformations are of major importance. Therefore, we focus on specifying endogenous model transformations in Section 4.1. However, in Section 4.2, we also show how the idea behind MTBD can be extended to enable the specification of exogenous model transformations. Finally, in Section 4.3, we discuss current limitations of our MTBD approach for endogenous as well as exogenous model transformations and highlight some potential research directions to be addressed in the future.

4.1 Endogenous Model Transformation By Demonstration

Our MTBD approach for specifying endogenous model transformations, called Eclipse Modeling Operations (EMO, http://www.modelversioning.org/emf-modeling-operations), is designed according to the principles of AMOR (cf. Section 3.3). More precisely, EMO aims at enabling users who are neither trained in model transformation languages nor familiar with the modeling language's metamodel to specify endogenous model transformations, hereafter called composite operations, without posing any restrictions regarding the modeling language and modeling editor. In the following, we first introduce an exemplary composite operation in Section 4.1.1, which serves as a running example for the remainder of this section. Subsequently, we give an overview of the basic idea behind EMO in Section 4.1.2 and present the specification process in more detail in Section 4.1.3 by means of the running example. In Section 4.1.4, we examine the concept of templates and their bindings to model elements, and in Section 4.1.5, we show how developed composite operations are applied to arbitrary models. In Section 4.1.6, we introduce advanced features of our approach for addressing more complex composite operations. Finally, we discuss related work in the area of MTBD in Section 4.1.7 and point to possible directions for future work on endogenous as well as exogenous model transformations by demonstration in Section 4.3. An evaluation of our MTBD approach for endogenous model transformations, which assesses its usefulness and ease of use by means of an empirical case study with 57 users, is presented in Section 7.1.
Figure 4.1: Metamodel for State Machine
Figure 4.2: Refactoring Introduce Composite State [SPLTJ01]: (a) Initial Phone State Machine, (b) Refactored Phone State Machine
4.1.1 Running Example
For illustrating the functionality of EMO, we make use of a refactoring for UML state machines. Therefore, we first introduce the metamodel of a simplified state machine modeling language in Figure 4.1. This metamodel contains the class StateMachine, acting as a container for arbitrarily many instances of SingleState through the containment reference states. Instances of SingleState may further contain instances of Transition through the reference transitions. A transition refers to its connected states through the references source and target: source is the opposite reference of transitions, and target has the opposite reference incoming. Thereby, states “know” their incoming transitions. Besides single states, the metamodel also contains the class PseudoState for expressing initial and end states, as well as the class CompositeState for grouping arbitrarily many other states. The refactoring serving as a running example is called Introduce Composite State. We illustrate this refactoring by applying it to a concrete example that represents the states and transitions of a phone. The initial state machine and the refactored state machine are depicted in Figure 4.2. Please note that both this refactoring and the example are taken from Sunyé et al. [SPLTJ01]. The initial phone state machine shown in Figure 4.2a contains several states such as Idle, DialTone, and Dialing. Please note that whenever a hangup event occurs, the phone switches
back to the state Idle. The multitude of similar transitions, which point to the state Idle and are triggered by the same event, suggests the application of the refactoring Introduce Composite State. This refactoring introduces a composite state and folds all hangup transitions into one single transition as depicted in Figure 4.2b. More precisely, the refactoring consists of the following atomic operations:
1. A composite state named Active is created.
2. All states having the outgoing transition hangup are moved into the new composite state Active.
3. The outgoing hangup transitions of these states are folded into one single transition, which is outgoing from the composite state Active.
4. The target of the transition lift is changed to the state Active.
5. A new initial state having a transition to DialTone is created in Active.
Although the specification of such a refactoring is possible using general-purpose programming languages, this task would require programming skills and deep knowledge of the underlying modeling framework and the modeling language's metamodel. When developing the Introduce Composite State refactoring in Java, the solution comprises nearly 100 lines of code for implementing only the pure refactoring logic, not counting an implementation of the refactoring's preconditions and the code necessary for realizing a user interface for applying it. Another alternative for specifying such composite operations is to use dedicated model transformation languages. These enable the development of composite operations, for instance, in terms of declarative transformation rules, which are more concise than an implementation in a general-purpose programming language.
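To give an impression of the imperative alternative, the following Python sketch implements the five steps above on a further simplified version of the metamodel in Figure 4.1: states own their outgoing transitions, the opposite reference incoming is dropped, and the StateMachine container is replaced by a plain list. It is not the Java implementation mentioned above, and all class and function names are hypothetical. Even without preconditions and user interface, the code has to navigate metamodel references explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class State:
    name: str
    outgoing: List["Transition"] = field(default_factory=list, repr=False)
    parent: Optional["CompositeState"] = field(default=None, repr=False)


@dataclass(eq=False)
class PseudoState(State):
    """Used here only for the initial state."""


@dataclass(eq=False)
class CompositeState(State):
    states: List[State] = field(default_factory=list)


@dataclass(eq=False)
class Transition:
    name: str
    source: State
    target: State


def connect(source: State, target: State, name: str) -> Transition:
    transition = Transition(name, source, target)
    source.outgoing.append(transition)
    return transition


def introduce_composite_state(machine: List[State], event: str, exit_target: State,
                              entry: State, name: str) -> CompositeState:
    """Steps 1-5 of Introduce Composite State; preconditions are not checked."""
    moved = [s for s in machine if s is not exit_target
             and any(t.name == event and t.target is exit_target for t in s.outgoing)]
    composite = CompositeState(name)                     # 1. create the composite state
    machine.append(composite)
    for state in moved:                                  # 2. move the states into it
        machine.remove(state)
        composite.states.append(state)
        state.parent = composite
        state.outgoing = [t for t in state.outgoing      # 3a. drop the individual transitions
                          if not (t.name == event and t.target is exit_target)]
    connect(composite, exit_target, event)               # 3b. ... and add one folded transition
    for state in machine:                                # 4. redirect transitions into entry
        for t in state.outgoing:
            if t.target is entry:
                t.target = composite
    initial = PseudoState("initial", parent=composite)   # 5. new initial state
    composite.states.append(initial)
    connect(initial, entry, "")
    return composite


idle, dial_tone, dialing = State("Idle"), State("DialTone"), State("Dialing")
machine = [idle, dial_tone, dialing]
connect(idle, dial_tone, "lift")
connect(dial_tone, dialing, "dial")
connect(dial_tone, idle, "hangup")
connect(dialing, idle, "hangup")
active = introduce_composite_state(machine, "hangup", idle, dial_tone, "Active")
print([s.name for s in machine])        # ['Idle', 'Active']
print([s.name for s in active.states])  # ['DialTone', 'Dialing', 'initial']
```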
However, as already stressed, besides requiring experience with such model transformation technologies, current approaches force users to specify the transformation rules using the abstract syntax of the modeling language, which might quickly become challenging and complex for untrained users. Furthermore, model transformation approaches are rarely integrated into current modeling environments. Thus, tool adapters are required to enable calling the transformation from within the modeling environment, which again requires dedicated knowledge for implementing such adapters. Modelers, as the potential users of our approach, are familiar with the notation, semantics, and pragmatics of the modeling languages they use in their daily activities. They are, however, not experts in programming languages, transformation techniques, or APIs. Therefore, a novel approach is required that enables the specification of composite operations without posing these prerequisites.
4.1.2 EMO at a Glance

Figure 4.3: Process of Endogenous Model Transformation By Demonstration (Phase 1, Modeling: 1 create initial model, 2 copy initial model, 3 perform updates; Phase 2, Configuration & Generation: 4 state-based comparison, 5 imply conditions, 6 edit conditions, 7 generate operation specification)

Composite operations may be described by a set of atomic operations, namely create, update, delete, and move, which are applied to a model that adheres to certain preconditions [ZLG05]. A straightforward way to obtain these atomic operations from a user demonstration is to record each user interaction within the modeling environment, as proposed for programming languages in [RL08]. However, this would demand an intervention in the modeling environment, and due to the multitude of modeling environments, we refrain from this possibility according to the
design principles of AMOR (cf. Section 3.3). Instead, we apply a state-based model comparison to determine the demonstrated atomic operations. This allows the use of any editor without depending on editor-specific operation recording. To overcome the imprecision of heuristic state-based model comparison approaches, a unique ID is automatically assigned to each model element before the user demonstrates the atomic operations. Moreover, EMO is designed to be independent of any specific modeling language, as long as the language is based on Ecore or its metamodel can be mapped to a corresponding metamodel expressed in Ecore. To this end, we propose the two-phase specification process shown in Figure 4.3, which we discuss step by step in the following.

Phase 1: Modeling. In the first step, the user creates the initial model in a familiar modeling environment. This initial model contains all model elements that are required in order to apply the composite operation. In the second step, each element of the initial model is automatically annotated with an ID, and a so-called working model (i.e., a copy of the initial model for demonstrating the composite operation by applying its atomic operations) is created. In the third step, the user performs the complete composite operation on the working model, again in a familiar modeling environment, by applying all necessary atomic operations. The output of this step is the revised model, which, together with the initial model, is the input for the second phase of the operation specification process.

Phase 2: Configuration & Generation. Due to the unique IDs, which preserve the relationship between model elements in the initial model and their corresponding model elements in the revised model, the atomic operations of the composite operation may be obtained precisely in step 4 by using a state-based model comparison. The results are saved in the diff model. Subsequently, an initial version of the pre- and postconditions of the composite operation is inferred in step 5 by analyzing the initial model and the revised model, respectively. The conditions automatically generated from the example might not always entirely express the intended pre- and postconditions of the composite operation. They only act as a basis for accelerating the operation specification process and may be refined by the user in step 6. In particular, parts of the conditions may be activated, deactivated, or modified within a dedicated environment. If needed, additional conditions may be added. After the configuration of the conditions, the operation specification is generated in step 7. It is a model-based representation of the composite operation consisting of the diff model and the revised pre- and postconditions, as well as the initial and revised example models. Thus, this model contains all information necessary for its further usage, such as applying the operation to arbitrary models (cf. Section 4.1.5).
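Steps 2 and 4 can be sketched in a few lines of Python to show why the IDs make the comparison precise. This is only an illustration under assumptions: a model is a list of dictionaries, containment is encoded as a hypothetical `container` feature holding the container's ID (so a changed container is reported as a move), elements created during the demonstration simply receive fresh keys, and the function names are hypothetical. EMO itself relies on an extension of EMF Compare.

```python
import copy
import uuid
from typing import Dict, List, Tuple

Element = Dict[str, object]


def annotate_with_ids(elements: List[Element]) -> Dict[str, Element]:
    """Step 2 (first half): give every element of the initial model a unique ID."""
    return {str(uuid.uuid4()): element for element in elements}


def create_working_model(initial: Dict[str, Element]) -> Dict[str, Element]:
    """Step 2 (second half): the copy keeps the IDs, so correspondences survive editing."""
    return copy.deepcopy(initial)


def compare(initial: Dict[str, Element], revised: Dict[str, Element]) -> List[Tuple]:
    """Step 4: ID-based state comparison yielding the demonstrated atomic operations."""
    ops: List[Tuple] = []
    for eid, before in initial.items():
        if eid not in revised:
            ops.append(("delete", eid))
            continue
        after = revised[eid]
        for feature in sorted(before.keys() | after.keys()):
            old, new = before.get(feature), after.get(feature)
            if old != new:
                kind = "move" if feature == "container" else "update"
                ops.append((kind, eid, feature, old, new))
    ops += [("create", eid) for eid in revised if eid not in initial]
    return ops


initial = annotate_with_ids([{"type": "SingleState", "name": "Dialing", "container": None}])
working = create_working_model(initial)
(dialing_id,) = working
# Demonstration: create a composite state and move Dialing into it.
working["new-1"] = {"type": "CompositeState", "name": "Active", "container": None}
working[dialing_id]["container"] = "new-1"
print(compare(initial, working))
```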
4.1.3 EMO in Action

In the previous section, we illustrated the operation specification process from a generic point of view. In the following, we show how the refactoring Introduce Composite State from Section 4.1.1 is specified using EMO from the user's point of view. Users are supported during the specification process by EMO's user interface, some extracts of which are depicted in Figure 4.5. For the complete user interface for developing composite operations, we kindly refer to the EMO project website (http://www.modelversioning.org/emf-modeling-operations), which contains several screencasts and further information regarding the implementation.

Figure 4.4: Example Models for Specifying Introduce Composite State: (a) Initial Model, (b) Revised Model

Figure 4.5: Screenshots of the User Interface of EMO: (a) Differences, (b) Derived Preconditions, (c) Edit Preconditions

Step 1: Create initial model. The user starts by modeling the initial example model. For this task, the user may use any editor, such as GMF-based graphical editors (http://www.eclipse.org/modeling/gmf), EMF's tree-based editor, or even a text editor for directly modifying the model's XMI serialization, as EMO is independent of editor-specific operation tracking and solely relies on state-based model comparison. In this step, every model element has to be introduced that is necessary and essential
to demonstrate the composite operation. It is not necessary to create every state of the diagram shown in Figure 4.2a. Only those states are created in the initial model that are essentially required for the refactoring and that will be modified differently later. Ultimately, the initial model consists of three states (cf. Figure 4.4a). First, it contains the state Idle, which will remain outside the composite state introduced in the course of the refactoring. Second, it comprises the state DialTone, which will be moved into the newly added composite state and act as its first state. Finally, it contains the state Dialing, which will only be moved into the composite state, losing its transition to Idle. There is no need to model, for instance, the state Connecting shown in Figure 4.2a, as it is modified in the same way as Dialing. For such equally handled states, EMO provides techniques to define iterations in the configuration phase, which we discuss later.

Step 2: Copy initial model. When the user confirms the initial model, the automatic copy process is initiated, which first adds a unique ID to every model element of the initial model and then creates the working copy.

Step 3: Perform updates. After the ID-annotated working copy has been created, it is opened in the user-selected editor, ready to be modified for demonstrating the composite operation. The user applies each operation of the composite operation to this copy. In our example, the user has to add a composite state named Active, move the single states DialTone and Dialing into it, introduce a new initial state in Active, connect it with DialTone, and change or remove the other transitions. The final revised model is depicted in Figure 4.4b.

Step 4: State-based comparison. In this step, the state-based model comparison between the initial model and the revised model is executed to automatically identify the previously demonstrated atomic operations.
Internally, the comparison is realized by an extension of EMF Compare. In fact, this task uses the same model comparison component that is applied for obtaining atomic operations for model versioning purposes, as presented in Chapter 5. When the comparison is completed, the detected atomic operations are saved in terms of a diff model, which is depicted in Figure 4.5a. For a precise specification of the composite operation, it is important that the user performs only those operations that directly represent the composite operation.

Step 5: Imply conditions. Next, EMO automatically derives the preconditions from the initial model and the postconditions from the revised model. The generation process works similarly for the pre- and the postconditions: for each model element in the respective model, a so-called template is created. A template describes the role a model element plays in the specific composite operation. For each template, conditions are generated, which describe the characteristics a model element must have to be a valid match for the template. Thereby, a corresponding condition is generated for each feature value of the model element in the example model. For our example refactoring, this generation process creates the preconditions depicted in Figure 4.5b. In particular, this figure shows the template StateMachine_0 representing the root container of the initial model. Furthermore, it contains templates representing the three states Idle, DialTone, and Dialing, together with their respective preconditions. These templates have a symbolic name (e.g., SingleState_1) and are arranged in a tree hierarchy that indicates their containment relationships, which reflects the containment hierarchy of the corresponding example model. The condition bodies are expressed in OCL. However, we extended OCL with a dedicated syntax that allows a condition body to refer to other templates, in order to generically express references to other model elements or their feature values. For instance, the expression incoming->includes(#{Transition_1}) in the template SingleState_0 indicates that its feature incoming must include a model element that fulfills the conditions of the template Transition_1. The scope of a template is either the initial model or the revised model. Nevertheless, conditions may still access templates of the opposite model by using the prefixes initial: and revised: in template names. We discuss templates, conditions, and how they are evaluated in more detail in Section 4.1.4.

Step 6: Edit conditions. The automatically generated conditions might not always perfectly reflect the intended pre- and postconditions of the composite operation. They only act as a seed for accelerating the operation specification process and may be refined manually in this step. EMO allows users to adapt the generated conditions in three different ways. First, the user may relax or enforce conditions. This is simply done by activating or deactivating the check boxes beside the respective templates or conditions. If a template is relaxed, all its contained conditions are deactivated. By default, conditions constraining String or Boolean features and null values are deactivated (cf. Figure 4.5b) because, in our experience, they are not relevant in most cases. Due to this default configuration, we do not have to relax any further conditions in order to reflect the true conditions of the refactoring. However, we have to enforce and modify one condition, as discussed next. Second, the user may modify conditions by directly editing them.
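Steps 5 and 6 can be pictured with a small Python sketch: one template per example element, named after its metaclass and a running index, and one condition per feature value, with String, Boolean, and null conditions deactivated by default. This is a simplification, not EMO's implementation: conditions are plain equality checks rather than OCL expressions, references between templates are left out, and the names `Condition`, `Template`, and `imply_templates` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Condition:
    feature: str
    expected: object
    active: bool = True

    def holds(self, element: dict) -> bool:
        return element.get(self.feature) == self.expected


@dataclass
class Template:
    name: str        # symbolic name such as "SingleState_1"
    metaclass: str
    conditions: List[Condition] = field(default_factory=list)

    def matches(self, element: dict) -> bool:
        return element.get("type") == self.metaclass and all(
            c.holds(element) for c in self.conditions if c.active)


def imply_templates(example_model: List[dict]) -> List[Template]:
    """Step 5: one template per element, one condition per feature value."""
    counters: Dict[str, int] = {}
    templates = []
    for element in example_model:
        metaclass = element["type"]
        index = counters.get(metaclass, 0)
        counters[metaclass] = index + 1
        template = Template(f"{metaclass}_{index}", metaclass)
        for feature, value in element.items():
            if feature == "type":
                continue
            # Default configuration: String, Boolean, and null conditions start deactivated.
            active = not (value is None or isinstance(value, (str, bool)))
            template.conditions.append(Condition(feature, value, active))
        templates.append(template)
    return templates


templates = imply_templates([{"type": "SingleState", "name": "Idle"},
                             {"type": "SingleState", "name": "Dialing"}])
idle_template = templates[0]
print(idle_template.name, idle_template.matches({"type": "SingleState", "name": "Busy"}))
# SingleState_0 True   (the name condition is deactivated by default)
idle_template.conditions[0].active = True  # step 6: the user enforces the condition
print(idle_template.matches({"type": "SingleState", "name": "Busy"}))  # False
```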
For our example, it is necessary to specify that a state that is moved into the composite state must contain an outgoing transition with the same name as the transition to be folded (in our example, hangup). The condition ensuring that every state moved to the composite state has such a transition has already been generated: SingleState_2.outgoing->includes(#{Transition_3}). However, the condition requiring the transitions' names to be equal has to be reactivated and modified. In particular, we have to change a condition in the template Transition_2, which is contained by the template SingleState_2 (representing the state Dialing). For this transition template, we modify the condition constraining the name feature as depicted in Figure 4.5c. This condition ensures that the transition has the same name as the transition represented by template Transition_1, which will be moved to the composite state and therefore acts as the outgoing transition for all states (represented by the template SingleState_2) that are moved into the composite state. As depicted in the screenshot in Figure 4.5c, the user is assisted when modifying conditions: the edited condition is immediately checked against the initial model or, if it is a postcondition, against the revised model. Thereby, the user gets immediate feedback not only on whether the condition is syntactically correct, but also on whether at least the example model fulfills the modified condition. If the condition is not fulfilled by the example model, the user very likely made a mistake and specified a semantically incorrect condition. Furthermore, users are assisted while editing conditions by context-sensitive code completion.
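The kind of immediate feedback described above can be illustrated with a small sketch. It assumes that conditions are written as Python expressions instead of OCL, and that self is bound to the template's representative element in the example model. The function check_condition and its status messages are hypothetical and do not reflect EMO's implementation. Note that eval with empty builtins is used here only for illustration and is not a sandbox.

```python
from types import SimpleNamespace


def check_condition(expression, is_postcondition, initial_element, revised_element):
    """Hypothetical feedback check for an edited condition.

    Preconditions are checked against the element from the initial example model,
    postconditions against the element from the revised example model.
    """
    element = revised_element if is_postcondition else initial_element
    try:
        code = compile(expression, "<condition>", "eval")  # syntactic check
    except SyntaxError:
        return "syntax error"
    try:
        # `self` is bound to the example model element, mirroring the OCL conditions
        fulfilled = bool(eval(code, {"__builtins__": {}}, {"self": element}))
    except Exception as error:  # e.g. a misspelled feature name
        return f"evaluation error: {type(error).__name__}"
    return "fulfilled by example" if fulfilled else "not fulfilled by example"


# Stand-in for the transition hangup in the example model
transition = SimpleNamespace(name="hangup")
for expression in ["self.name == 'hangup'", "self.name == 'lift'",
                   "self.name ==", "self.nmae == 'hangup'"]:
    print(expression, "->", check_condition(expression, False, transition, None))
```

Distinguishing syntax errors from conditions that the example does not fulfill mirrors the two kinds of feedback mentioned in the text. An unfulfilled condition hints at a semantic mistake rather than a typo.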
Finally, users may adapt the composite operation specification by augmentation. In this way, users may introduce custom conditions, define iterations, and annotate the user input needed for setting parameters of the composite operation. In our example, the user has to introduce one iteration for the template SingleState_2. This iteration specifies that all atomic operations applied to the model element represented by this template have to be repeated for all of its matching model elements when the operation is applied to an arbitrary model. We attach iterations to templates rather than directly to the operations to be repeated because this is more in tune with the general idea of the by-example concept: users are more familiar with the example models they provide than with the automatically derived atomic operations. The impact of iterations on the execution of composite operations is elaborated in more detail in Section 4.1.5. Besides the iteration, the user also has to introduce a user input annotation for the name feature of the template CompositeState_0 to indicate a value that has to be set by the user of the refactoring. Obviously, iterations may only be specified for templates from the initial model, and user inputs only for features of templates from the revised model. In the course of applying a composite operation, certain values in the revised model often have to be computed from specific values in the initial model. Therefore, users may modify or add postconditions. Thus, postconditions not only ensure the correctness of the revised model, but may also yield value computations. Although this is not necessary for our running example, consider, for instance, the composite operation Encapsulate Field [FBB+ 99] for UML class diagrams. This composite operation generates one method for getting and one method for setting the value of a public attribute and, finally, sets the visibility of the attribute to private.
For this composite operation, the method names, the return type of the getter method, and the parameter name and type of the setter method have to be computed from the source model. For example, the postcondition self.name = ’get’ + #{Attribute}.name.firstToUpper() can be used to compute the correct name of the getter method for an attribute.
Step 7: Generate Operation Specification. After the user has finished editing the conditions and augmenting the operation specification, the Operation Specification is generated. This model-based representation of the composite operation contains all information necessary for its further usage, such as applying the operation to arbitrary models (cf. Section 4.1.5), detecting applications of the operation a posteriori (cf. Chapter 5), and revealing conflicts arising from the preconditions of the composite operation (cf. Chapter 6). Operation specifications conform to the metamodel depicted in Figure 4.6. The class CompositeOperationSpecification contains general information, such as the composite operation's name and the modeling language for which it can be used, as well as the initial and revised model, the pre- and postconditions, the iterations, and the DiffModel comprising the atomic operations. In particular, the initial and the revised model are kept in the attributes initialModel and revisedModel of the class CompositeOperationSpecification. The pre- and postconditions are each saved as instances of ConditionModel via the references preconditions and postconditions, respectively. Each condition model contains one root Template representing the root model element of the initial or revised model. This root template contains a number of sub-templates, which may in turn have sub-templates, corresponding to the containment hierarchy of the model elements in the initial or revised model. The specific model element that is represented by the respective template is referenced through the reference representative. Furthermore, instances of Template are specified by a list of custom conditions and feature conditions. Instances of FeatureCondition constrain the value of a specific feature and are generated automatically in step 5. We discuss the concepts behind templates and how they are bound to model elements in more detail in Section 4.1.4. Figure 4.7 illustrates an excerpt of the object diagram representing the operation specification for the previously described example refactoring. In particular, this figure highlights some objects, such as the introduced iteration, the template hierarchy and its references to the concrete model elements, and an instance of FeatureCondition for the feature name of template SingleState_2. All of these components have a counterpart in the user interface, so that the user can easily modify them.
Figure 4.6: Operation Specification Metamodel (classes CompositeOperationSpecification with the attributes name, titleTemplate, version, modelingLanguage, refactoring, initialModel, and revisedModel; ConditionModel; Template with name, title, and the derived attribute /hasIteration; Condition with active, oclExpression, and local, specialized by FeatureCondition and CustomCondition; Iteration; UserInput; and DiffModel and DiffElement from EMF Compare)
Figure 4.7: Excerpt of the Operation Specification for the Running Example (the CompositeOperationSpecification "Introduce Composite State" with modelingLanguage "http://statemachine/" and refactoring = true; the template SingleState_2, titled "Dialing : SingleState", with hasIteration = true, its Iteration, its representative Dialing, and its parentFeature states; and a FeatureCondition on the feature name with oclExpression " = ‘Dialing’ ", active = false, and local = true)
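To make the structure of Figure 4.6 more tangible, the following sketch mirrors part of the operation specification metamodel as Python dataclasses and instantiates a fragment of Figure 4.7. It is not EMO's implementation. The Python names are our own, the DiffModel is reduced to a plain list, and the owners chosen for the iterations and inputs references are assumptions based on the figure. The helper implicit_conditions anticipates the implicit containment conditions discussed in Section 4.1.4.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Condition:
    ocl_expression: str
    active: bool = True
    local: bool = True  # False if the expression refers to other templates via #{...}


@dataclass
class FeatureCondition(Condition):
    feature: str = ""  # name of the constrained structural feature


@dataclass
class CustomCondition(Condition):
    pass


@dataclass
class UserInput:
    name: str
    feature: str  # feature whose value is provided by the user of the operation


@dataclass
class Template:
    name: str
    title: str
    representative: Any = None            # example-model element it was generated from
    parent_feature: Optional[str] = None  # containing feature; None for the root template
    conditions: List[Condition] = field(default_factory=list)
    sub_templates: List["Template"] = field(default_factory=list)
    inputs: List[UserInput] = field(default_factory=list)

    def implicit_conditions(self) -> List[str]:
        """Containment conditions contributed by the sub-templates."""
        return [f"self.{sub.parent_feature}->includes(#{{{sub.name}}})"
                for sub in self.sub_templates]


@dataclass
class ConditionModel:
    root: Template


@dataclass
class Iteration:
    template: Template


@dataclass
class CompositeOperationSpecification:
    name: str
    modeling_language: str
    refactoring: bool
    initial_model: Any
    revised_model: Any
    preconditions: ConditionModel
    postconditions: ConditionModel
    iterations: List[Iteration] = field(default_factory=list)
    diff_elements: List[Any] = field(default_factory=list)  # stand-in for the DiffModel

    def has_iteration(self, template: Template) -> bool:
        """Counterpart of the derived attribute /hasIteration."""
        return any(it.template is template for it in self.iterations)


# Fragment of Figure 4.7 (example models and the remaining templates omitted)
single_state_2 = Template(
    name="SingleState_2", title="Dialing : SingleState", parent_feature="states",
    conditions=[FeatureCondition(" = 'Dialing' ", active=False, local=True, feature="name")])
root = Template(name="StateMachine_0", title="StateMachine", sub_templates=[single_state_2])
spec = CompositeOperationSpecification(
    name="Introduce Composite State", modeling_language="http://statemachine/",
    refactoring=True, initial_model=None, revised_model=None,
    preconditions=ConditionModel(root),
    postconditions=ConditionModel(Template("StateMachine_0", "StateMachine")),  # placeholder
    iterations=[Iteration(single_state_2)])

print(root.implicit_conditions())
print(spec.has_iteration(single_state_2))
```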
4.1.4 Condition Models, Templates, and Template Bindings
Before we show how composite operation specifications can be applied to arbitrary models in the next section, we first discuss condition models and templates, and how they are matched with models to obtain valid template bindings.
Condition Models. As depicted in the metamodel of operation specifications in Figure 4.6, a CompositeOperationSpecification holds one ConditionModel for the operation's preconditions and one for the operation's postconditions. Such a condition model contains a set of templates by which it generically describes the characteristics a model should satisfy. A condition model as a whole matches a model part if each of its templates has a matching model element within that model.
Templates. As already mentioned, templates describe the characteristics a model element must have in order to be a valid match, as well as the relationships to other model elements that must exist. These characteristics and relationships are defined in terms of the conditions contained by the respective template. As each template is generated from a model element of the example model in step 5 of the operation specification process, each template preserves the relationship to the model element it has been generated from through the reference representative (cf. Figure 4.6). Following the containment hierarchy of the example model, templates are organized in a tree structure with one root template (representing the example model's root model element), which has sub-templates (representing the root element's children), which may in turn have sub-templates. To make this containment structure explicit, each template except for the root template refers to the structural feature of the modeling language's metamodel through which the represented model element is contained.
For example, instances of the metaclass SingleState are contained by instances of StateMachine through the structural feature states (cf. state machine metamodel in Figure 4.1). Consequently, the template SingleState_2, which represents an instance of SingleState, refers to this containment reference through the reference parentFeature (cf. object diagram in Figure 4.7). Thus, templates that contain sub-templates have implicit containment conditions in addition to their explicitly contained conditions. For instance, the template StateMachine_0 has, among others, the implicit containment condition self.states->includes(#{SingleState_2}), which stems from its contained template SingleState_2.
Conditions. Templates are further defined by a set of explicitly contained conditions that must be fulfilled by a matching model element. As already mentioned, conditions are expressed as OCL expressions, which are stored in the condition's attribute oclExpression (cf. Figure 4.6).
Thus, the full expressive power of OCL may be used for constraining matching model elements. Condition models may contain two types of conditions: instances of FeatureCondition, which constrain the value of a certain feature indicated by the reference feature in the metamodel in Figure 4.6, and instances of CustomCondition, which are not explicitly tied to a specific feature. Only instances of FeatureCondition are created automatically in step 5 of the operation specification process, namely one dedicated condition for each feature of a template's represented model element. The explicit link to the feature constrained by an instance of FeatureCondition allows for easier processing and reasoning. For instance, a FeatureCondition for the feature name with the oclExpression = “Dialing” is rewritten to the OCL expression self.name = “Dialing”, where self is bound to the model element to be evaluated. If this condition is not fulfilled by a model element, we may easily conclude, without analyzing the OCL expression in detail, that the model element's value of the feature name causes the condition to fail. This is not as easily possible for the equivalent CustomCondition, which would have the OCL expression name = “Dialing”, because we would have to interpret this OCL expression in detail to find out which feature value causes the condition to be invalid. As mentioned above, we extended OCL to allow for referring to other templates and their values. For this purpose, we introduced a dedicated syntax: by enclosing a template name in #{ and }, users may refer to the model element that is bound to the referenced template. For evaluating OCL expressions that contain such template references, occurrences of these references are replaced with expressions navigating to the model element that is currently bound to the referenced template.
For instance, the OCL expression = #{Transition_1}.name in a FeatureCondition constraining the name feature is replaced with the following OCL expression⁵, where 1 is the index of the state and 0 is the index of the transition that is currently bound to the template Transition_1:
self.name = self.eContainer().eContainer().states.select(1).transitions.select(0).name
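The rewriting illustrated by the expression above can be sketched as follows, under simplifying assumptions. Model elements are plain Python objects (the Element class is hypothetical and is not the EMF API), casts are omitted as in the expression above, and the generated navigation reproduces that expression's select(i) index notation. A FeatureCondition body is first prefixed with self and its feature. Each #{...} reference is then replaced by a navigation that ascends to the closest common container of the two bound elements and descends to the target, which is the strategy described in the following paragraph.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity semantics: equally named elements stay distinct
class Element:
    name: str
    parent: Optional["Element"] = None
    parent_feature: Optional[str] = None
    children: Dict[str, List["Element"]] = field(default_factory=dict)

    def add(self, feature: str, child: "Element") -> "Element":
        child.parent, child.parent_feature = self, feature
        self.children.setdefault(feature, []).append(child)
        return child


def containers(element: Optional[Element]) -> List[Element]:
    """The element itself followed by all of its transitive containers."""
    chain = []
    while element is not None:
        chain.append(element)
        element = element.parent
    return chain


def navigation(source: Element, target: Element) -> str:
    """Navigate from `source` (self) to `target` via their closest common container."""
    up, down = containers(source), containers(target)
    common = next(e for e in down if e in up)  # closest common container
    expression = "self" + ".eContainer()" * up.index(common)
    for step in reversed(down[:down.index(common)]):  # descend towards the target
        index = step.parent.children[step.parent_feature].index(step)
        expression += f".{step.parent_feature}.select({index})"
    return expression


def feature_condition_to_ocl(feature: str, body: str) -> str:
    """Prefix a FeatureCondition body such as '= #{T}.name' with self and its feature."""
    return f"self.{feature} {body.strip()}"


def resolve_template_refs(expression: str, self_element: Element,
                          binding: Dict[str, Element]) -> str:
    """Replace each #{Template} with a navigation to the element bound to it."""
    return re.sub(r"#\{(\w+)\}",
                  lambda m: navigation(self_element, binding[m.group(1)]),
                  expression)


machine = Element("PhoneStateMachine")
idle = machine.add("states", Element("Idle"))
dial_tone = machine.add("states", Element("DialTone"))
dialing = machine.add("states", Element("Dialing"))
hangup_dial_tone = dial_tone.add("transitions", Element("hangup"))
dial_tone.add("transitions", Element("dial"))
hangup_dialing = dialing.add("transitions", Element("hangup"))

ocl = feature_condition_to_ocl("name", "= #{Transition_1}.name")
ocl = resolve_template_refs(ocl, hangup_dialing, {"Transition_1": hangup_dial_tone})
print(ocl)
```

For the transition hangup of Dialing bound to Transition_2, this reproduces the expression shown above.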
Such replacements are computed by first finding the closest common container of both currently bound model elements and then deriving the direct navigation from the source model element to the target model element. To enable more efficient processing and reasoning, conditions additionally record whether they refer to other templates or whether they are local, that is, whether they involve no references to other templates (cf. attribute local in the metamodel depicted in Figure 4.6).
⁵ Please note that we omitted required type castings (oclAsType()) and collection castings (asSequence()) in the OCL expression for the sake of readability.
Template Bindings. When model elements are matched with condition models, the mappings between templates and their matching model elements are described by so-called bindings. These bindings are realized by a weaving model conforming to the metamodel depicted in Figure 4.8. As a condition model contains arbitrarily many templates, a ConditionModelBinding contains exactly one TemplateBinding for each template of the condition model. Such a TemplateBinding connects one template with the one model element that is bound to it. In other words, one instance of a ConditionModelBinding constitutes an intrinsically valid set of distinct one-to-one relationships between templates and model elements, where each template of the condition model is bound to exactly one model element and each model element is bound at most once. Because a condition model may match a model multiple times, or because iterations are attached to templates, one template may, however, also be bound to multiple model elements. This is realized by a ConditionModelBindingCollection, which may contain multiple intrinsically valid and unique ConditionModelBindings. If there are multiple ConditionModelBindings, they may overlap in a subset of their template bindings.
Figure 4.8: Template Binding Metamodel (a ConditionModelBindingCollection contains ConditionModelBindings, which carry the derived attribute /ambigous and contain TemplateBindings, each referencing one Template from the operation specification metamodel and one EObject from Ecore)
Figure 4.9: Example for Condition Model Bindings (Condition Model Binding 1 and Condition Model Binding 2 between the precondition model with the templates singleState0, transition0, singleState1, transition1, singleState2, and transition2 and an excerpt of the state machine with the states Idle, DialTone, Dialing, and Connecting and the transitions lift and hangup)
Consider, for instance, the example depicted in Figure 4.9. In this example, the condition model on the left expresses the preconditions of our example refactoring, and on the right there is an excerpt of our running example's state machine. Please note that the upper condition model and state machine are the same as the lower ones; we split them graphically for the sake of readability. Between the condition model and the state machine, there are two condition model bindings, Condition Model Binding 1 and Condition Model Binding 2. Both are intrinsically valid and unique; however, Condition Model Binding 2 binds the template objects singleState2 and transition2 to different model elements in the state machine than Condition Model Binding 1 does. The remaining bindings overlap. If both condition model bindings are combined within one ConditionModelBindingCollection, we obtain the combined template binding depicted in Figure 4.10. In this combined binding, the template objects singleState2 and transition2 are multiply bound. As already mentioned, binding multiple model elements to one template is only allowed if an iteration is attached to the respective template; it is, however, also valid if the multiple binding is only due to an iteration attached to its direct or indirect parent template. Consequently, the ConditionModelBindingCollection in Figure 4.10 is valid if the template object singleState2 has an iteration attached to it. Note that no iteration is needed at transition2, although multiple transitions are bound to it. This is because there is an iteration at its container template singleState2 and, in the context of each state bound to singleState2, only one single transition has been bound; more than one bound transition within the context of one state bound to singleState2 would be disallowed. If this were intended, another iteration would have to be attached to the template transition2. If, on the other hand, a ConditionModelBindingCollection comprises multiple bindings to a template that does not directly or indirectly have an iteration attached to it, the ConditionModelBindingCollection is ambiguous. Because the atomic operations demonstrated in the specification process are not to be applied repeatedly, it is not clear which of the bound model elements should be transformed. Therefore, the user has to remove one of the ambiguous bindings before the composite operation may be executed.
Figure 4.10: Condition Model Bindings Combined Within One Binding Collection (the same templates and state machine excerpt, with singleState2 and transition2 each bound to two model elements)
Assuming that no iterations are configured in the example depicted in Figure 4.9, the user would have to remove either the binding between singleState2 and the state Dialing or the binding between singleState2 and the state Connecting. By doing so, the entire condition model binding (either Condition Model Binding 1 or Condition Model Binding 2) is discarded from the ConditionModelBindingCollection that originally held both bindings. Thereby, not only the ambiguous binding of the state but also the ambiguous binding of its contained transition is removed. Consequently, as long as at least one binding is left in the collection, the user always ends up with at least one valid and complete binding.
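The validity rule for combined bindings can be captured in a few lines. The following sketch implements one possible reading of that rule and is not the implementation behind Figure 4.8. A ConditionModelBinding is modeled as a plain dictionary from template names to element identifiers, and the identifiers such as "hangup@Dialing" are illustrative. A template with its own iteration may be bound to several elements. Any other template may be bound to only one element per element bound to its nearest iterated parent template, or to only one element overall if no parent template has an iteration. As described above, removing a binding discards the whole ConditionModelBinding.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Set

Binding = Dict[str, Hashable]  # one ConditionModelBinding: template -> bound element


def nearest_iterated_ancestor(template: str, parent: Dict[str, str],
                              iterated: Set[str]) -> Optional[str]:
    """Closest direct or indirect parent template that carries an iteration."""
    current = parent.get(template)
    while current is not None:
        if current in iterated:
            return current
        current = parent.get(current)
    return None


def is_ambiguous(bindings: List[Binding], parent: Dict[str, str], iterated: Set[str]) -> bool:
    """True if a template without a direct or inherited iteration is multiply bound."""
    for template in {t for b in bindings for t in b}:
        if template in iterated:
            continue  # multiple elements are explicitly allowed
        context = nearest_iterated_ancestor(template, parent, iterated)
        bound: Dict[Hashable, set] = {}  # context element -> elements bound to template
        for b in bindings:
            key = b[context] if context else None
            bound.setdefault(key, set()).add(b[template])
        if any(len(elements) > 1 for elements in bound.values()):
            return True
    return False


def remove_binding(bindings: List[Binding], template: str, element: Hashable) -> List[Binding]:
    """Discard every ConditionModelBinding that binds `template` to `element`."""
    return [b for b in bindings if b.get(template) != element]


# The two bindings of Figure 4.9 (element identifiers are illustrative)
parent = {"transition0": "singleState0", "transition1": "singleState1",
          "transition2": "singleState2"}
binding1 = {"singleState0": "Idle", "transition0": "lift", "singleState1": "DialTone",
            "transition1": "hangup@DialTone", "singleState2": "Dialing",
            "transition2": "hangup@Dialing"}
binding2 = dict(binding1, singleState2="Connecting", transition2="hangup@Connecting")
collection = [binding1, binding2]

print(is_ambiguous(collection, parent, iterated=set()))             # True
print(is_ambiguous(collection, parent, iterated={"singleState2"}))  # False
print(is_ambiguous(remove_binding(collection, "singleState2", "Connecting"), parent, set()))  # False
```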
Finding valid template bindings. Finding valid bindings in a model for a given condition model is basically graph-based pattern matching [Gal05] or, more precisely, the problem of finding a subgraph isomorphism [Ull76], where the condition model corresponds to the pattern graph (or graph query) and the model corresponds to the data graph. According to the categorization of graph-based pattern matching problems by Gallagher [Gal05], finding valid bindings for a condition model requires exact matching that finds all optimal solutions, because inexact matching, which only yields approximate solutions, is insufficient for our use case. Exact matching for obtaining all optimal solutions is an NP-complete problem [Gal05, Ull76]. One of the earliest approaches to exact pattern matching is the subgraph isomorphism algorithm proposed by Ullmann [Ull76], which uses a depth-first tree-search algorithm. In this approach, a search tree is built in which each level corresponds to a node of the pattern graph and the tree nodes are constituted by nodes of the data graph. The algorithm traverses this tree depth-first and checks whether all conditions along the way down to the leaves are fulfilled. If the algorithm finds an invalid node or transition, it discards the whole remaining branch of the tree and continues. Ultimately, the remaining tree contains all exact matches; each match is a path in the tree from the root to a leaf. We use a similar approach to find valid bindings. However, we do not enumerate all potential binding combinations in a tree in advance. Instead, we employ a recursive backtracking algorithm, which dynamically selects the next model elements to be evaluated. The input for the matching algorithm is the model to be matched and the condition model. Additionally, the user has to provide an initial binding of at least one template to one model element.
Basically, the algorithm iterates depth-first through the condition model, template by template. For each template, the algorithm checks whether a binding for that template already exists. If the template is not bound yet, the algorithm selects all heretofore unbound model elements that are potential matches and evaluates them against the current template's conditions. At this point, only local conditions or conditions that refer to already bound templates can be evaluated immediately. For the remaining conditions, which refer to currently unbound templates, the algorithm first selects candidate model elements for all unbound templates these conditions refer to. As a result, the algorithm obtains a set of base candidates for the current base template and, for each of these base candidates, a set of referenced candidates for each referenced template. To explore all potential branches, the algorithm builds all permutations of unique element-to-template binding combinations. Each base candidate is then evaluated against the remaining conditions with each permutation of referenced candidate bindings. If more than one valid combination of base candidate and referenced candidate bindings remains, the algorithm proceeds with only one of these combinations and starts a recursion for each remaining combination. Thus, each potential branch is evaluated by one recursion. If a recursion reaches a point at which no valid model element can be found for the next template, or the model element bound to the current template is invalid, the branch is discarded. Otherwise, the recursion stops as soon as a complete condition model binding has been built. Thus, one recursion builds at most one unique and intrinsically valid ConditionModelBinding, which only contains one-to-one bindings.
Ultimately, all valid ConditionModelBindings that have been detected are put into a ConditionModelBindingCollection, which represents all valid matches, whether ambiguous or not.
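The following recursive backtracking matcher is a simplified sketch. Model elements and templates are hypothetical Python classes, and the conditions are Python lambdas that stand in for the OCL conditions of the precondition model. Unlike the procedure described above, the sketch does not eagerly select referenced candidates or enumerate their permutations. Instead, it binds templates in depth-first order and defers each condition until all templates it mentions are bound. It also does not use parentFeature to narrow the candidates. Both strategies search exhaustively over one-to-one bindings, so for the excerpt used in Figure 4.11 the sketch should yield the same two complete bindings (A6 and B6).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List


@dataclass(eq=False)  # identity semantics: equally named elements stay distinct
class State:
    name: str
    transitions: List["Transition"] = field(default_factory=list)


@dataclass(eq=False)
class Transition:
    name: str
    source: State
    target: State


@dataclass
class Condition:
    refs: FrozenSet[str]                        # every template the condition mentions
    check: Callable[[Dict[str, object]], bool]  # evaluated on a (partial) binding


@dataclass
class Template:
    name: str
    kind: type
    conditions: List[Condition] = field(default_factory=list)


def cond(check, *refs):
    return Condition(frozenset(refs), check)


def match(templates, elements, initial):
    """Return all complete one-to-one bindings that extend the initial binding."""
    conditions = [c for t in templates for c in t.conditions]
    results = []

    def consistent(binding, newly_bound):
        # Evaluate only the conditions that just became evaluable.
        return all(c.check(binding) for c in conditions
                   if newly_bound in c.refs and c.refs.issubset(binding))

    def recurse(i, binding):
        if i == len(templates):  # complete ConditionModelBinding
            results.append(dict(binding))
            return
        template = templates[i]
        if template.name in binding:  # e.g. bound by the initial binding
            recurse(i + 1, binding)
            return
        for element in elements:
            if isinstance(element, template.kind) and element not in binding.values():
                binding[template.name] = element
                if consistent(binding, template.name):
                    recurse(i + 1, binding)  # explore this branch
                del binding[template.name]   # backtrack

    start = dict(initial)
    if all(c.check(start) for c in conditions if c.refs.issubset(start)):
        recurse(0, start)
    return results


# Excerpt of the phone state machine; transitions are contained by their source state.
idle, tone, dialing, connecting = (State(n) for n in ("Idle", "DialTone", "Dialing", "Connecting"))


def connect(name, source, target):
    transition = Transition(name, source, target)
    source.transitions.append(transition)
    return transition


lift = connect("lift", idle, tone)
connect("hangup", tone, idle)
connect("dial", tone, dialing)
connect("hangup", dialing, idle)
connect("hangup", connecting, idle)
states = [idle, tone, dialing, connecting]
elements = states + [t for s in states for t in s.transitions]

S0, T0, S1, T1, S2, T2 = ("SingleState_0", "Transition_0", "SingleState_1",
                          "Transition_1", "SingleState_2", "Transition_2")
templates = [  # depth-first order of the precondition model
    Template(S0, State, [cond(lambda b: b[T0] in b[S0].transitions, S0, T0),  # implicit containment
                         cond(lambda b: b[T1].target is b[S0], S0, T1),       # incoming->includes(#{Transition_1})
                         cond(lambda b: b[T2].target is b[S0], S0, T2)]),     # incoming->includes(#{Transition_2})
    Template(T0, Transition, [cond(lambda b: b[T0].source is b[S0], T0, S0),
                              cond(lambda b: b[T0].target is b[S1], T0, S1)]),
    Template(S1, State, [cond(lambda b: b[T1] in b[S1].transitions, S1, T1)]),
    Template(T1, Transition, [cond(lambda b: b[T1].source is b[S1], T1, S1),
                              cond(lambda b: b[T1].target is b[S0], T1, S0)]),
    Template(S2, State, [cond(lambda b: b[T2] in b[S2].transitions, S2, T2)]),
    Template(T2, Transition, [cond(lambda b: b[T2].name == b[T1].name, T2, T1),
                              cond(lambda b: b[T2].source is b[S2], T2, S2),
                              cond(lambda b: b[T2].target is b[S0], T2, S0)]),
]

bindings = match(templates, elements, {S0: idle})
print([b[S2].name for b in bindings])  # ['Dialing', 'Connecting']
```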
Figure 4.11: Example for the Template Matching Algorithm (panels A1, A2, A4, A6, B2, B4, B6, C2, and C4 show the template bindings of branches A, B, and C on the states Idle, DialTone, Dialing, and Connecting and the transitions lift, dial, and hangup)
Example. To make our algorithm for finding valid condition model bindings clearer, we walk through a small example, which is depicted in Figure 4.11. In particular, we show how the precondition model of our running example (cf. Figure 4.5b) is matched with an excerpt of the phone state machine model (cf. Figure 4.11). Assume the user specified an initial binding that maps the state Idle to the template SingleState_0. The algorithm traverses the condition model depth-first. Thus, the first template to be considered is SingleState_0. As this template has already been bound by the initial binding, we may directly proceed with checking its conditions. First, we consider the implicit condition transitions->includes(#{Transition_0}), which stems from SingleState_0's contained template Transition_0. This template is not yet bound to a model element. Consequently, we first have to obtain candidates for Transition_0 before we can evaluate the implicit condition. As already mentioned, templates explicitly refer to the feature through which they are contained by their parent element (cf. reference parentFeature of templates in the operation specification metamodel in Figure 4.6). For the template Transition_0, this is the reference transitions of single states. Therefore, all model elements in Idle.transitions are retrieved to obtain the referenced candidates for Transition_0; in our example, this is just the transition lift. Additionally, our current base template SingleState_0 contains two explicit conditions referring to other templates, namely incoming->includes(#{Transition_1}) and incoming->includes(#{Transition_2}), and the referenced templates Transition_1 and Transition_2 have not been bound yet. Thus, we first have to select candidates for these two templates before we can evaluate these two conditions of SingleState_0. As we have no explicit hints on suitable candidates for Transition_1 and Transition_2, we have to consider all heretofore unbound model elements of type Transition in the state machine as candidates for both referenced templates. Therefore, all transitions in the state machine except for lift are now evaluated against the conditions of template SingleState_0. According to these conditions, only those transitions remain relevant that are incoming to the state Idle (i.e., all transitions named hangup). All other transitions are discarded as candidates for Transition_1 and Transition_2 (cf. A1 in Figure 4.11).
To explore all potential branches arising from these candidates for the two templates, we now have to proceed with all k-permutations of n, where k is the number of templates (i.e., Transition_1 and Transition_2) and n is the number of model elements (i.e., the three transitions named hangup). This leads to 3!/(3-2)! = 6 combinations. The first combination (cf. A2 in Figure 4.11) is further considered in the current branch, named A, and a new recursion is started for each of the remaining combinations (branches B to F). For the sake of readability, only three branches (A, B, and C) are depicted in Figure 4.11. The next template to consider is Transition_0, which has already been bound in all branches. Therefore, we directly proceed with evaluating its conditions. One of its conditions, namely source = #{SingleState_0}, refers to a template that is already bound and can therefore immediately be verified in all branches. The other condition, target = #{SingleState_1}, refers to the heretofore unbound template SingleState_1. Thus, we first have to find suitable candidates for the referenced template before we can evaluate this condition. The only state that fulfills this condition is DialTone, so we proceed with this candidate for SingleState_1 in each branch. The next template to be evaluated is the previously referenced template SingleState_1. This template contains one condition, namely incoming = #{Transition_0}. As the referenced template has already been bound in all branches, we may evaluate this condition directly, which leads to accepting DialTone for SingleState_1 in all branches. The next template to be considered is Transition_1, which has also been bound already. This template contains the conditions source = #{SingleState_1} and target = #{SingleState_0}. As both referenced templates are bound as well, we may evaluate these conditions right away.
However, in branch C, the condition concerning the source is not fulfilled, because the source of the transition is actually Dialing, whereas the model element bound to SingleState_1 is DialTone (cf. C4 in Figure 4.11). Consequently, branch C is
discarded. The bindings in branches A and B, however, fulfill both conditions. Thus, we proceed with these branches by evaluating the next template, SingleState_2. This heretofore unbound template comprises only the implicit condition regarding its contained template Transition_2, which has already been bound in both remaining branches. Therefore, we first have to select all remaining states as candidates and check the implicit condition for each candidate. In branch A, only the state Dialing fulfills this condition, as it contains the transition bound to Transition_2. In branch B, in contrast, the only state that fulfills this condition is Connecting because, in this branch, a different transition is bound to Transition_2. Thus, a valid state is found in both branches, and we may proceed with evaluating the last template, Transition_2. As we already have bindings in all branches, we may directly evaluate all three conditions of this template. The first condition concerns the name of the transition (cf. Figure 4.5c). As the transitions bound in both branches have the same name as the transition bound to Transition_1, this condition is fulfilled in both branches. The same is true for the remaining two conditions concerning the transition's source and target. Consequently, we end up with two valid ConditionModelBindings, which are depicted in A6 and B6 of Figure 4.11.
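The k-permutations used in step A2 of the walkthrough can be enumerated directly with Python's standard library. In this sketch, the candidate transitions are identified by hypothetical labels that name their source state. With this candidate order, the first three combinations coincide with branches A, B, and C as described above. The assignment of the remaining combinations to branches D to F is arbitrary.

```python
from itertools import permutations
from math import perm

# The three candidate transitions named hangup (cf. A1 in Figure 4.11),
# labeled by their source state for readability.
candidates = ["hangup@DialTone", "hangup@Dialing", "hangup@Connecting"]
referenced_templates = ["Transition_1", "Transition_2"]

combinations = [dict(zip(referenced_templates, combo))
                for combo in permutations(candidates, len(referenced_templates))]
assert len(combinations) == perm(len(candidates), len(referenced_templates)) == 6

# The first combination stays in the current branch; each other one starts a recursion.
branches = dict(zip("ABCDEF", combinations))
for label, combination in branches.items():
    print(label, combination)
```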
4.1.5 Execution of Operation Specifications
In this section, we show how operation specifications are applied to arbitrary models that fulfill the operation's precondition. When executing an operation specification, we first have to obtain a precondition model binding based on an initial binding specified by the user (cf. Section 4.1.4). Having obtained a complete precondition model binding, we aim to apply to the bound model elements the same operations that the user demonstrated when specifying the operation.
Diff elements in operation specifications. The atomic operations that have been demonstrated during the specification of a composite operation are saved in the operation specification in terms of a diff model. To obtain such a diff model from the user-provided example models, we employ an extension of the state-based model comparison realized by EMF Compare. For more information on obtaining atomic operations, we refer the reader to Chapter 5. In the context of executing operation specifications, it is sufficient to know that the obtained diff model contains diff elements that precisely describe the applied atomic operations. In particular, such diff elements indicate the operation type (e.g., addition, deletion, update) and refer to the modified model elements and, where required, to the updated feature. Thus, diff elements contain enough information to apply the described operations.
EMF Compare merge API. Besides its model comparison features, EMF Compare also provides a merge API, which is capable of applying detected diff elements to the compared models. For instance, if a model comparison detected the diff element "feature f of model element e has been updated from value v1 to value v2", we may apply this difference to the opposite model version (i.e., the concurrently modified version) so that, in the model element corresponding to e, the value v1 of feature f is updated to v2. In this way, EMF Compare allows two models to be merged by applying to one model all changes that have been applied to the opposite model. We exploit this merge API to realize the execution of operation specifications.
Figure 4.12: Example for Rewriting and Executing Diff Elements (panels: Initial Model, Revised Model, Diff Model, Preconditions Model with templates ta, tb, and tt, Diff Rewriting, Model A, Diff Model’, Model A’)
Rewriting diff elements. EMF Compare, however, allows diff elements to be applied only to the compared models themselves and not to other models. Therefore, we first clone the operation specification's diff model and rewrite this copy so that the references that originally point to the model elements of the operation specification's initial model ultimately point to the model elements to which the composite operation shall be applied. This rewriting mechanism is illustrated by a small example in Figure 4.12. To keep the diff models small and clear, a new exemplary composite operation is used, which reverses the direction of an existing transition. Accordingly, the initial model comprises two states, A and B, and one transition t. The precondition model contains three templates, one for each model element. In the revised model, the container state of transition t is changed from its original container A to the state B, and the transition's target is changed to state A. Consequently, the diff model contains three diff elements, namely a DeleteFeatureValue for detaching the transition from its original container state, an InsertFeatureValue for adding the detached transition to the new container state, and a FeatureUpdate for changing the target of the transition. Please note that, in this thesis, we use our own terminology and our own metamodel for representing diff elements, as we feel that EMF Compare's diff model might not be as concise and clear to readers. For a detailed discussion of diff models, we refer the reader to Chapter 5.
This diff model refers to the affected model elements in the initial and revised example models through the references changedObject and value (cf. Figure 4.12).
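To make the information carried by diff elements concrete, the following Python sketch models the three diff elements of the Figure 4.12 example and applies them to a plain in-memory model. It is a toy stand-in for a merge step, not EMF Compare's merge API: the classes, the dictionary-based model, and the function `apply_diff` are hypothetical, and we assume that t initially targets B (the example only states that the direction of t is reversed).

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# A model is a mapping: element id -> {feature name -> value}.
# Multi-valued features hold lists; single-valued features hold one id.
Model = Dict[str, Dict[str, Union[str, List[str]]]]


@dataclass(frozen=True)
class InsertFeatureValue:
    changed_object: str
    feature: str
    value: str


@dataclass(frozen=True)
class DeleteFeatureValue:
    changed_object: str
    feature: str
    value: str


@dataclass(frozen=True)
class FeatureUpdate:
    changed_object: str
    feature: str
    value: str


DiffElement = Union[InsertFeatureValue, DeleteFeatureValue, FeatureUpdate]


def apply_diff(model: Model, diff: DiffElement) -> None:
    """Apply one diff element in place; each element names the changed
    object, the affected feature, and the value involved."""
    features = model[diff.changed_object]
    if isinstance(diff, InsertFeatureValue):
        features.setdefault(diff.feature, []).append(diff.value)
    elif isinstance(diff, DeleteFeatureValue):
        features[diff.feature].remove(diff.value)
    elif isinstance(diff, FeatureUpdate):
        features[diff.feature] = diff.value
    else:
        raise TypeError(f"unsupported diff element: {diff!r}")


# Initial model of Figure 4.12 (target of t assumed to be B).
model: Model = {"A": {"transitions": ["t"]}, "B": {"transitions": []},
                "t": {"target": "B"}}
diff_model = [
    DeleteFeatureValue("A", "transitions", "t"),
    InsertFeatureValue("B", "transitions", "t"),
    FeatureUpdate("t", "target", "A"),
]
for d in diff_model:
    apply_diff(model, d)
```

Applying the same list to any other model would fail, because the diff elements name elements of the example model; this is why the diff model has to be rewritten before it can be executed elsewhere.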
To apply this operation specification to an arbitrary model, named Model A in Figure 4.12, the respective model elements first have to be bound to the three templates in terms of a precondition model binding. The next step is to create a copy of the original diff model, called Diff Model’, and rewrite it accordingly. In particular, the references to the initial model are changed so that they refer to the corresponding model elements in Model A. For instance, DeleteFeatureValue originally refers to state A through the reference changedObject. We know that state A is represented by the template ta, which is bound to state Y in Model A. Therefore, we may rewrite the target of the reference changedObject from A in the operation specification's Initial Model to Y in Model A. The same mechanism is applied to all remaining references going from diff elements to model elements in the operation specification's Initial Model. For rewriting references that point to the operation specification's Revised Model, we have to make use of the match model between the Initial Model and the Revised Model, which maps each model element in the Revised Model to its corresponding original model element in the Initial Model (for the sake of readability, this match model is not depicted in Figure 4.12). Thus, when rewriting, for instance, the reference value going from InsertFeatureValue to the Revised Model's transition t, we obtain the corresponding transition t in the Initial Model through the match model, retrieve its representing template tt, and, finally, look up the bound transition z in Model A. The same mechanism can be applied to the reference value of FeatureUpdate. Having rewritten all diff elements in Diff Model’, we may use EMF Compare's merge API to apply them to Model A. Although additions of model elements are not required in our example, they pose another challenge.
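The rewriting of Diff Model’ can be sketched as follows. This is a minimal illustration under simplifying assumptions, not EMO's implementation: references are plain (model, element) pairs, the match model is a dictionary from Revised Model elements to Initial Model elements, and the precondition model binding is a dictionary from templates to Model A elements. The text states that ta is bound to Y and tt to z; binding tb to X is our assumption based on Figure 4.12.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Ref:
    model: str    # "initial", "revised", or "target" (i.e., Model A)
    element: str


@dataclass(frozen=True)
class Diff:
    kind: str     # e.g., "DeleteFeatureValue", "InsertFeatureValue", "FeatureUpdate"
    feature: str
    changed_object: Ref
    value: Ref


def rewrite(diff_model: List[Diff], match: Dict[str, str],
            template_of: Dict[str, str], binding: Dict[str, str]) -> List[Diff]:
    """Return rewritten copies of the diff elements; the originals are kept."""

    def resolve(ref: Ref) -> Ref:
        element = ref.element
        if ref.model == "revised":
            # Revised Model element -> original element in the Initial Model.
            element = match[element]
        # Initial Model element -> template -> bound element in Model A.
        return Ref("target", binding[template_of[element]])

    return [replace(d, changed_object=resolve(d.changed_object),
                    value=resolve(d.value))
            for d in diff_model]


diff_model = [
    Diff("DeleteFeatureValue", "transitions", Ref("initial", "A"), Ref("initial", "t")),
    Diff("InsertFeatureValue", "transitions", Ref("revised", "B"), Ref("revised", "t")),
    Diff("FeatureUpdate", "target", Ref("initial", "t"), Ref("revised", "A")),
]
match = {"A": "A", "B": "B", "t": "t"}           # revised -> initial
template_of = {"A": "ta", "B": "tb", "t": "tt"}  # initial element -> template
binding = {"ta": "Y", "tb": "X", "tt": "z"}      # template -> Model A element

diff_model_prime = rewrite(diff_model, match, template_of, binding)
```

Because the original diff elements are left untouched, the same operation specification can be rewritten again for another binding. The sketch covers only references to pre-existing elements; added elements need the separate treatment described next.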
Added model elements in the operation specification's Revised Model naturally have no corresponding model element in the Initial Model. Thus, we cannot rewrite the references going from the diff element to the added model element in the Revised Model as easily. Therefore, we apply a two-phase diff rewriting and execution. First, we apply only additions by copying the added elements to the respective location in the model to which the composite operation is applied, and we record the relationship between each originally added model element and its created copy in an intermediate trace model. In doing so, we have to apply the additions from top-level elements down to bottom-level elements with respect to the containment hierarchy; otherwise, we could not add a child to a parent that has not been created yet. Next, we have to rewrite the reference values of these added elements, as they might still refer to model elements in the operation specification's Revised Model. Consider a scenario in which a new transition has been added: if we only copied it to the model to which we aim to apply the composite operation (again named Model A hereafter), the transition would still refer to the target state in the operation specification's Revised Model. Therefore, after copying all added model elements, we have to walk through all of their feature values, check whether these are model elements of the operation specification's Revised Model, and, if so, change the value to the corresponding model element in Model A. For that, we have to query either the match model or the intermediate trace model containing the correspondences between added model elements in the Revised Model and their copies in Model A. Subsequently, we may process all other diff elements.
Handling iterations in the execution.
To recall, iterations are attached to precondition templates to indicate that the diff elements affecting these templates (called iterative templates hereafter) shall be repeated for all model elements bound to such templates. Iterations thus have two consequences concerning the execution of composite operations: first, an iteration enables multiple model elements to be bound to the iterative template (cf. Section 4.1.4), and, second, all operations that have been applied to the corresponding model element during the demonstration are repeated for each bound model element. In the following, we discuss the latter consequence in more detail. An unambiguous ConditionModelBindingCollection that contains more than one ConditionModelBinding implies that at least one template within the bound condition model is iterative; otherwise, the binding collection would be ambiguous. As a consequence, each unique and intrinsically valid ConditionModelBinding within the collection describes the context within which one iteration of the operation specification's diff model shall be performed. Thus, iterations are realized by creating and executing a rewritten copy of the operation specification's diff model for each unique ConditionModelBinding within one collection. However, we may not naively repeat all diff elements in each iteration; otherwise, we might, for instance, inadvertently add more than one model element to a container model element that is not bound to an iterative template. Moreover, a model element or a feature value can only be deleted once and not repeatedly in each iteration. Therefore, we have to observe certain rules when copying the diff model for repeated execution. In particular, only those diff elements are copied that refer, either through the reference changedObject or through the reference value, to a model element that is represented by an iterative and multiply bound template.
FeatureUpdates that set the value of a single-valued feature, however, are an exception: such diff elements are only copied if the reference changedObject, and not merely the reference value, refers to a model element represented by an iterative and multiply bound template; otherwise, we would overwrite the same value over and over again in each iteration. Please note that the reference value might refer to a model element in the Revised Model, so we again have to use the match model to infer whether the Revised Model's element is represented by an iterative and multiply bound precondition template. Subsequently, each copy of these diff elements is rewritten for its respective ConditionModelBinding. This ensures that each diff element is tailored precisely to be executed within the correct context.
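The copying rule for iterations, including the exception for single-valued FeatureUpdates, can be expressed as a small filter. In this hedged sketch, the references of each diff element are assumed to have already been resolved to the templates representing the referenced elements (using the match model for Revised Model elements); the template names T_iter, T_a, T_b, and T_c and the flag `single_valued` are hypothetical. The selected diff elements are those that would be copied and rewritten once per ConditionModelBinding.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Diff:
    kind: str                  # e.g., "InsertFeatureValue" or "FeatureUpdate"
    changed_object: str        # template representing the changed object
    value: Optional[str]       # template representing the value, if any
    single_valued: bool = False


def diffs_to_repeat(diff_model: List[Diff], iterative: Set[str]) -> List[Diff]:
    """Select the diff elements to be copied for each ConditionModelBinding.

    `iterative` holds the templates that are iterative and multiply bound.
    """
    selected = []
    for d in diff_model:
        if d.kind == "FeatureUpdate" and d.single_valued:
            # Only repeat if the changed object itself varies per iteration;
            # otherwise the same single value would be overwritten repeatedly.
            if d.changed_object in iterative:
                selected.append(d)
        elif d.changed_object in iterative or d.value in iterative:
            selected.append(d)
    return selected


iterative = {"T_iter"}
diff_model = [
    Diff("InsertFeatureValue", "T_a", "T_iter"),
    Diff("FeatureUpdate", "T_b", "T_iter", single_valued=True),
    Diff("FeatureUpdate", "T_iter", "T_a", single_valued=True),
    Diff("DeleteFeatureValue", "T_b", "T_c"),
]
repeated = diffs_to_repeat(diff_model, iterative)
```

Here, the insertion and the update of the iterative element itself are selected, whereas the single-valued update of the non-iterative element T_b and the deletion are not repeated.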
4.1.6
Considering More Sophisticated Composite Operations
In this section, we discuss advanced features of EMO that address more sophisticated composite operations. Please note that these features mainly concern the expressive power of operation specifications rather than the automation of their specification.
Notation. Before we present the advanced features, we introduce the notation used for depicting operation specifications, using the example shown in Figure 4.13. The figure contains two areas, namely Initial Model and Revised Model, which illustrate the initial and the revised model in concrete syntax, respectively. In particular, for presenting the advanced features, we use Ecore models and depict them in the concrete syntax of UML class diagrams. Each model element is annotated with an Object ID in brackets (e.g., [1]) to indicate the mapping between the initial and revised models, as well as their corresponding templates in the condition models, which are depicted below in two dedicated boxes entitled Essential Preconditions and Essential Postconditions. In these condition models, template names are printed in bold. The condition models are organized as a hierarchical enumeration according to the containment hierarchy of the initial or revised model. If iterations are attached to precondition templates, this is indicated by an iteration icon next to the iterative template's name.
Figure 4.13: Notation for Illustrating Operation Specifications (panels: Initial Model, Revised Model, Essential Preconditions, Essential Postconditions; legend: Object ID, Iteration)
Introducing Copies
In the initial version of EMO, the diff element types that could be detected between the initial model and the revised model after the demonstration were additions, deletions, moves, and updates (of reference and attribute values). However, in many operations, a model element should be copied, including its contained elements, instead of simply being added. Therefore, we introduce the diff element type copy. EMF Compare, however, does not support detecting copies. Thus, in the configuration phase, users may convert a detected addition into a copy by selecting the diff element representing the addition in the list, whereas the copy source has to be selected manually from the initial model. Only model elements that have the same metamodel type and the same properties as the originally added model element may be selected as copy source. After the copy source has been chosen, there is an explicit reference from the copy-typed diff element to the copy source, which is a model element in the initial model; thus, iterations attached to the specified copy source can now also be supported. In particular, if the copy source element is represented by an iterative template, the execution engine repeats creating the copy for each bound model element. With usual diff elements representing plain additions, no such reference to the initial model's elements exists and, consequently, no iterations can be attached. For both additions and copies, the execution engine repeats applying the respective diff element for each target model element (i.e., the new container of the added or copied model element) if the target's template is iterative.
Figure 4.14: Push Down EOperation for Illustrating the Benefits of Copy (panels: Initial Model, Revised Model, Essential Preconditions, Essential Postconditions)
The benefits of supporting model element copies in composite operations are illustrated by the example depicted in Figure 4.14, which shows the specification of the refactoring Push Down EOperation. By applying this composite operation, an EOperation contained by a specific EClass is pushed down to all its subclasses. To allow several EOperations to be pushed down to two or more subclasses, this operation specification contains two iterations, one for the EOperation and one for the second subclass. The diff elements detected after the user's demonstration are a move of EOperation_0 from EClass_0 to EClass_1 as well as the addition of EOperation_1. In fact, however, this addition creates a copy of the model element represented by the template EOperation_0. Therefore, the user may select the diff element representing the detected addition, turn it into a copy, and specify EOperation_0 as the copy source. This diff element is marked with a dedicated symbol in the postconditions area in Figure 4.14⁶, whereas EOperation_0 indicates the copy source. Thereby, when applying this operation specification to an arbitrary model, instead of adding the operation() model
⁶ Please note that this syntax is only used here; in EMO, these annotations are attached using a more user-friendly user interface.
Extract Superclass (figure panels: Initial Model, Revised Model, Essential Preconditions)