Diff and Merge Support for Model Based Development

Lars Bendix
Department of Computer Science, Lund Institute of Technology, S-221 00 Lund, Sweden

Pär Emanuelsson
Ericsson AB, S-583 30 Linköping, Sweden


ABSTRACT
Feature oriented development using models has many advantages. However, there are serious obstacles when it comes to using this approach because of the lack of support for teamwork when using models. We believe that one solution is a better connection between research and problems – and we suggest some strategies to pursue to achieve better support for teamwork when using models.

Categories and Subject Descriptors
D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement.

General Terms
Management, Design, Experimentation.

Keywords
Models, parallel work, versioning, diff, merge.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CVSM’08, May 17, 2008, Leipzig, Germany. Copyright 2008 ACM 978-1-60558-045-6/08/05...$5.00.

1. INTRODUCTION
The use of models in software development is becoming more and more popular. Models have many advantages, but their widespread use also causes problems. The problems that we want to address in this paper are caused by the fact that models are used not just by a single person, but by a whole development team. Models that are in continuous use will inevitably be modified, and it is those modifications that are at the root of our problems.

One problem is that several people need to work with and make changes to the model in parallel. Usually when concurrent changes have to be carried out, two copies are made and the alternatives are then merged later. However, the tools we are using do not give good support for merging of models, with the result that developers revert to a sequential approach to changes. Even for an apparently sequential approach, it has been pointed out that there are potential problems. Most often what happens is “sequential parallelism” [18], where two or more change tasks are split up into smaller bits, which are interleaved with each other. This means that even though we are using locking for each modification, the final result might still not be consistent.

Another problem is that even when change tasks are actually strictly sequential and we do not need parallel alternatives to be merged, we are still interested in the history of our model. The development history tells us how the model has changed and can tell us the difference between two versions – or between one release and another. However, if there is no tool support for doing diff on models, we have to rely on other and less optimal methods.

Finally, there are problems because people expect things to be “business as usual”, as they know it from working with textual documents, and they do not know how to cope with the problems in this new domain of models.

In the following, we first give a brief overview of research in diff and merge of models from our point of view. Then we describe some of the problems that our developers struggle with in their day-to-day work with models. This is followed by an analysis of some of the roads that we see could lead to improvements for teams of developers working on models. Finally, we draw our conclusions.

2. ACADEMIC RESEARCH
Quite a lot of research has been done on versioning, differencing, comparing, merging, and union in the domain of models. In this section, we give a very brief overview of what is related to our particular context (which is described in the next section).

How to combine the parallel work of two developers into a single model is treated by Alanen et al. [1]. They describe three generic algorithms for diff, merge and union of models. Their diff algorithm calculates the difference between two models. They define their merge to be the application of a diff to a model, creating a new model. Finally, by applying to the original model the diffs between each of the two alternatives and their common original, they obtain the union of the two alternatives. The union algorithm effectively provides a three-way merge for models in much the same way as is done for text documents. Alanen et al. [1] rely heavily on universally unique identifiers for all model elements in their algorithms, a requirement that could in some cases be difficult to satisfy. However, their approach and results are general for any modelling language based on the Meta Object Facility (MOF) – which is the case for the UML language.
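To make the relation between diff, merge and union concrete, the following is a minimal sketch in Python. It is not the algorithm of Alanen et al. [1]; it only illustrates the idea under strong simplifying assumptions: a model is a flat dictionary from a unique element identifier to a dictionary of attributes, and the function names `diff`, `merge` and `union`, the operation tuples, and the "left side wins, but the conflict is reported" policy are all hypothetical choices made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Model = Dict[str, dict]                      # element id -> attributes (assumed flat)
Operation = Tuple[str, str, Optional[dict]]  # (kind, element id, new attributes)


def diff(base: Model, changed: Model) -> List[Operation]:
    """Compute the operations that turn `base` into `changed`, keyed by element id."""
    ops: List[Operation] = []
    for eid in base.keys() - changed.keys():
        ops.append(("delete", eid, None))
    for eid in changed.keys() - base.keys():
        ops.append(("add", eid, dict(changed[eid])))
    for eid in base.keys() & changed.keys():
        if base[eid] != changed[eid]:
            ops.append(("update", eid, dict(changed[eid])))
    return sorted(ops, key=lambda op: (op[1], op[0]))


def merge(model: Model, ops: List[Operation]) -> Model:
    """Apply a diff to a model, producing a new model."""
    result = {eid: dict(attrs) for eid, attrs in model.items()}
    for kind, eid, attrs in ops:
        if kind == "delete":
            result.pop(eid, None)
        else:  # "add" and "update" both set the element's attributes
            result[eid] = dict(attrs)
    return result


def union(base: Model, left: Model, right: Model) -> Tuple[Model, List[str]]:
    """Three-way merge: apply both diffs to the common original.

    Elements changed differently on both sides are reported as conflicts;
    in this sketch the left change is kept for them (an arbitrary policy).
    """
    left_ops = diff(base, left)
    right_ops = diff(base, right)
    left_by_id = {op[1]: op for op in left_ops}
    accepted = list(left_ops)
    conflicts: List[str] = []
    for op in right_ops:
        other = left_by_id.get(op[1])
        if other is None:
            accepted.append(op)       # only the right side touched this element
        elif other != op:
            conflicts.append(op[1])   # both sides changed it in different ways
    return merge(base, accepted), sorted(conflicts)
```

With a common original containing elements `c1` and `c2`, one developer renaming `c1` and the other adding `c3`, `union` returns a model containing both changes and an empty conflict list. The sketch also shows why the identifiers matter: every step looks elements up by id, so a tool that regenerates identifiers on save would make every element appear deleted and re-added.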

In general, comparison approaches that do not rely on universally unique identifiers have been very slow and impractical for real use, despite the advantage of allowing more flexibility. However, a new technique, presented by Treude et al. [20], has made it an order of magnitude faster to compare models without the use of unique identifiers. Documents that have to be compared are transformed from an XML-based file format into an internal representation. They correctly handle cycles in the graph of a model, and even though they use heuristics for finding similar elements, their tests show that in practice errors are very rare. Another advantage of their approach is that it can also be used in domains where similarity of elements does not depend on their compositional structure, but mainly on their neighbourhood – as is the case for Petri nets.

On a more practical level, Oliveira et al. [15] have actually implemented a version control system for UML to provide configuration management support for CASE tools that work with UML models. They identify the need for basic version control operations, such as diff, patch and merge, also for UML models, but in their paper they describe only the implementation of merge. Their approach is limited to UML models, but their version control is very flexible in the sense that it allows the user to define the Unit of Comparison and the Unit of Versioning to fit their specific needs.

MetaDiff [10] provides a framework for model comparison. The framework in itself does not provide algorithms for comparing and merging models, but is intended as a framework for experimenting with different algorithms for MOF-based modelling languages. Since it appears to us that the field of model comparison and merging is still not stable, it seems like a good idea to have such a framework.

In the Eclipse community, work is carried out on a modelling framework, and in this context Toulmé [19] presents work on providing a comparison utility for EMF models. This work focuses on model comparison and does not provide for merging of models (in contrast to the work of Oliveira et al. [15]). However, it is interesting that it does not rely on universally unique identifiers (as do Alanen et al. [1]), but on using the semantics when matching the model elements.
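The idea of matching model elements without unique identifiers can be illustrated with a small Python sketch. This is a naive illustration of heuristic similarity matching, not the technique of Treude et al. [20] nor the matching used in the EMF comparison utility [19]. The element representation (dictionaries with `kind`, `name` and `features`), the equal weighting of name and feature similarity, the threshold of 0.6 and the greedy pairing are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Element = Dict[str, object]  # assumed keys: "kind", "name", "features"


def similarity(a: Element, b: Element) -> float:
    """Score two elements in [0, 1]; elements of different kinds never match."""
    if a["kind"] != b["kind"]:
        return 0.0
    name_score = SequenceMatcher(None, a["name"], b["name"]).ratio()
    fa, fb = set(a["features"]), set(b["features"])
    # Jaccard overlap of the feature names; two empty feature sets count as equal.
    feature_score = len(fa & fb) / len(fa | fb) if (fa | fb) else 1.0
    return 0.5 * name_score + 0.5 * feature_score


def match(old: List[Element], new: List[Element],
          threshold: float = 0.6) -> List[Tuple[int, int]]:
    """Greedily pair elements of `old` and `new`, best-scoring pairs first."""
    candidates = sorted(
        ((similarity(o, n), i, j)
         for i, o in enumerate(old)
         for j, n in enumerate(new)),
        reverse=True,
    )
    used_old, used_new = set(), set()
    pairs: List[Tuple[int, int]] = []
    for score, i, j in candidates:
        if score < threshold:
            break  # all remaining candidates are even less similar
        if i in used_old or j in used_new:
            continue
        used_old.add(i)
        used_new.add(j)
        pairs.append((i, j))
    return sorted(pairs)
```

In this sketch a class renamed from `Account` to `BankAccount` is still matched as long as its features are unchanged, which an identifier-free approach must do to avoid reporting a delete plus an add. The quadratic number of candidate pairs also hints at why such approaches have tended to be slow on large models.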

Both syntactical and semantical aspects of models are taken into consideration for conflict detection when merging models in the work of Altmanninger [2]. This allows for more precise conflict detection and can also determine the reason for a conflict, thus providing much better support for the merge process even in the presence of conflicts.

Finally, Brunet et al. [8] state a manifesto for research on model merging, in order to be able to discuss and compare the many different approaches to model merging. They propose a set of useful model management operators (merge, match, diff, split, and slice) and specify the idealized algebraic properties of each operator. Using this framework, different proposals can be compared, exposing their real differences and suppressing irrelevant differences.

Along the same lines, Selonen [17] provides a review of five different approaches to model comparison. The selected approaches are compared based on a set of key characteristics. More importantly, however, Selonen also gives a list of desirable qualities for model comparison techniques, and a list of potential usage scenarios. Desirable qualities include identifier independence, reliability, usability, composability and non-intrusiveness, which are qualities that any practical model comparison technique or tool should possess. For the usage scenarios, we find software evolution analysis and model merging particularly interesting as they match our situation, whereas we find model comprehension less of a problem. Inconsistency detection – or rather the contrary, consistency checking – we find very important, as that is a general problem when working with models and needs to be solved in any case.

3. INDUSTRIAL PROBLEMS
Ericsson has developed and deployed several large systems containing millions of lines of code that have been generated from UML models. Hence Ericsson is dependent on having reliable and safe processes and tools for handling large models. Since a defect in a model will inevitably end up as a defect in the code, there is no room for tools and processes that work “most of the time”. The code is generated from the model without manual intervention, and generated code must not be changed. Typically the systems are required to be in operation 99.999% of the available time.

Development projects have found good solutions to most of the to-be-expected problems; for example, execution speed and code volume are under control. However, many of the projects have not found a good way to handle collaborative aspects of the development and evolution of UML models. Therefore much energy has been put into methods to avoid model merge as far as possible. This can be done, for example, by avoiding parallel development and keeping development as “one track” as possible: avoiding development branches, avoiding error corrections being worked on in parallel with development, and having only one or a few product versions that are supported and in use.

For our specific needs, we are only looking for a solution to a limited problem:
• All developers are in the same coordinated project, so everybody can work in the same way if needed.
• Everybody is using the same set of tools.
• Historic versions of the models are always available (so a three-way merge is possible).
• Graphical changes without semantic significance need not have an elegant solution, but can be essentially manual.

Some of the merge-avoiding strategies have good effects in general on the project, such as restricting the number of supported variants and versions of a system. However, many of the projects would like to do feature oriented development. In this kind of development, a multi-disciplinary team develops a single feature of a system from start to end through all development phases. This is quite different from having one team for requirements, one for design, one for test, etc. All the feature teams would work on the same model, and one should not have to do detailed synchronisation and planning between the teams. This kind of development would require efficient and reliable model merge.

For the projects that really want to do feature oriented development the technology choice is – do not use modelling – use programming instead! That is very unfortunate and something that we would like to change – modelling should be compatible with feature oriented development. We will not take the position “blame the tools”. Rather we will study how projects divide models into manageable pieces and study how the different merge-avoiding strategies interact. Tools will not solve all problems and we will have to find strategies that work with non-perfect tools.

4. ANALYSIS
From a tool-user perspective we cannot wait for new research results – or for research results to find their way into new tools that we can buy. We need more immediate support for the problems we experience. We must try to see how we can “get by” with what we have now and make that work.

Something that is under our “control” and can be changed immediately is “the processes” used for working with models. We should be looking for ways of working in parallel with models that make it less problematic for tools to show diffs and find merges later on – guidelines like “if you do this in parallel to the model, it is bound to cause serious problems later on in creating the resulting merge”. This line of research will have to look at the basic problems of team collaboration [5] and could draw inspiration from work like Continuous Integration [9]. Awareness of what others in a team are doing also seems to be important in trying to avoid conflicting changes and thus future (potential) merge problems. This indicates that we should not only focus our attention on the merging of models, but also on the capability of doing diff on models to allow us to see what other people are – or have been – changing.

Likewise, we should be looking for ways to physically structure our models to allow as much parallel work as possible with as little contention as possible (but keeping in mind that even contention-less work on models is not without problems [18]). We should look into different strategies for dividing up our work based on the physical structure of the model, so that a team of people can work as much in parallel as possible with as few problems as possible – like the split-combine strategy suggested by Magnusson et al. [13]. For this part, we should be able to profit from previous work from the programming language world on using the syntax to split up programs into smaller (more fine-grained) parts [11], [12]. This should also be able to provide the same flexibility in defining Unit of Comparison and Unit of Versioning as in Oliveira et al. [15].

When we compare with the situation of working with textual documents instead of models, it is important to pay attention to the subtleties that make merging of parallel work possible in that domain. In reality, automatic merge of textual documents is not as straightforward as many people tend to believe. Even if we do not get merge conflicts at the “syntax” level, there may still be semantic merge conflicts present that are not flagged by the merge tool. That is why a second step is necessary in textual merge. In this step, we check the semantical correctness of the merge result, and this is usually done using the compiler and possibly a regression test. It is therefore important that research on model merging also looks at both steps; not just a syntactically correct merge, but also a verification of the semantical consistency of the result. In this context it might be more important to look at definitions of “consistency” – what should be the result of a merge – and the suppression of “irrelevant” (difference) details in favour of a focus on the relevant changes/differences to present to the user. Part of the input for this could come from the theory of algebraic properties of models [8], but we also find it important that “real life” opinions are taken into consideration.

Also on a more general note, we could take inspiration from much previous work on diff and merge done in the textual domain. We should try to identify the similarities – and differences – between working with textual documents and models. Which strategies for working in parallel on textual documents can be used for models too, and which cannot – and why? Using the inspiration from the textual domain, we can find out where the “metaphor/comparison” breaks down – and why. Can we “fix” that by behaving differently – or “cope” with it through better (that is, the same as for the textual case) tool support?

In working with diff and merge problems in the textual domain, people have tried to take into consideration both syntactical [7] and semantical [6] aspects of programming languages. The major obstacle to the “survival” of these approaches seems to have been the lack of generality with respect to the large number of different programming languages. There has also been work on exploiting the hierarchical structure of documents for merging text [3] and on facilitating working with “consistent” configurations [4] in which differences between versions are easy to find. The former approach seems to have died out together with the syntax-directed editors; the latter approach still lives on in the way that version control tools like Subversion work.
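The two-step view of textual merge discussed in this section (a syntactic merge followed by a semantic check) can be illustrated with a small Python sketch. Everything in it is hypothetical and chosen for the illustration: the naive line-based `merge_lines` only handles in-place edits of line lists of equal length (it is not a real diff3), the three program versions are invented, and `regression_check` stands in for the compiler and regression test mentioned above.

```python
from typing import List, Tuple


def merge_lines(base: List[str], left: List[str],
                right: List[str]) -> Tuple[List[str], List[int]]:
    """Step 1: naive three-way merge of equally long line lists (in-place edits only)."""
    if not (len(base) == len(left) == len(right)):
        raise ValueError("this sketch only supports in-place line edits")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, l, r) in enumerate(zip(base, left, right)):
        if l == r:
            merged.append(l)      # unchanged, or changed identically on both sides
        elif l == b:
            merged.append(r)      # only the right side changed this line
        elif r == b:
            merged.append(l)      # only the left side changed this line
        else:
            conflicts.append(i)   # both sides changed the same line differently
            merged.append(b)
    return merged, conflicts


def regression_check(lines: List[str]) -> bool:
    """Step 2: run the merged program and check an expected result."""
    namespace: dict = {}
    exec("\n".join(lines), namespace)
    return namespace["total"](5) + namespace["fee"]() == 550


BASE = [
    "def rate():",
    "    return 100",
    "def total(n):",
    "    return n * rate()",
    "def fee():",
    "    return 50",
]
# Left developer: rate() now returns whole units, converted where it is used.
LEFT = list(BASE)
LEFT[1] = "    return 1"
LEFT[3] = "    return n * rate() * 100"
# Right developer: the fee is derived from the (old) meaning of rate().
RIGHT = list(BASE)
RIGHT[5] = "    return rate() // 2"

merged, conflicts = merge_lines(BASE, LEFT, RIGHT)
print("textual conflicts:", conflicts)
print("left ok:", regression_check(LEFT), "right ok:", regression_check(RIGHT))
print("merged ok:", regression_check(merged))
```

Each alternative passes the check on its own, and the merge step reports no conflicting lines, yet the merged program fails the check because one change silently altered an assumption the other relied on. For models the same two steps would apply, with a consistency check on the merged model taking the place of the compiler and regression test.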
Ideally a diff or merge tool for models should be independent of the other tools used, so that we can combine modelling solutions from several vendors. Ohst et al. [14] show what a tight integration between versioning and editor tool can provide. However, in our context we do not value the possibility to capture “the shift of a method” to the point that we would consider such an integration “worth the price”.

We believe that it should be possible to make diff and merge tools for models that give better and higher-level support than the corresponding text based tools:
• Models have more/higher level information that is more explicit.
• Some basic techniques such as unique identifiers, three-way merge, … must be used.
• Need for common models – open source.

5. CONCLUSIONS
From our very own narrow perspective, we would propose the following research strategies for diff and merge of models:
• short term: look at “the processes” for working with models. Can we improve collaboration techniques assuming that we use existing tools?
• medium term: get the tool vendors to use research results and define new requirements on tools.
• long term: new research (targeted at our problems).

We are currently investigating the short-term strategy – and would like to influence (participate in formulating) the medium- and long-term strategies.

6. REFERENCES
[1] Alanen, M. and Porres, I.: Difference and Union of Models, in Proceedings of The Sixth International Conference on the Unified Modeling Language – UML 2003, San Francisco, California, October 20-24, 2003.
[2] Altmanninger, K.: Models in Conflict – Towards a Semantically Enhanced Version Control System for Models, in Proceedings of the Doctoral Symposium at the 10th International Conference on Model Driven Engineering Languages and Systems, Nashville, Tennessee, September 30-October 5, 2007.
[3] Asklund, U.: Identifying Conflicts During Structural Merge, in Proceedings of Nordic Workshop on Programming Environment Research, Lund, Sweden, June 1-3, 1994.
[4] Asklund, U., Bendix, L., Christensen, H. B., and Magnusson, B.: The Unified Extensional Versioning Model, in Proceedings of the 9th International Symposium on System Configuration Management, Toulouse, France, September 5-7, 1999.
[5] Babich, W. A.: Software Configuration Management – Coordination for Team Productivity, Addison-Wesley, 1986.
[6] Berzins, V.: Software Merge: Semantics of Combining Changes to Programs, ACM Transactions on Programming Languages and Systems, Vol. 16, No. 6, November, 1994.
[7] Buffenbarger, J.: Syntactic Software Merging, in Proceedings of the 5th International Workshop on Software Configuration Management, Seattle, Washington, April 24-25, 1995.
[8] Brunet, G., Chechik, M., Easterbrook, S., Nejati, S., Niu, N., and Sabetzadeh, M.: A Manifesto for Model Merging, in Proceedings of the international workshop on Global integrated model management, Shanghai, China, May 22, 2006.
[9] Fowler, M.: Continuous Integration, Retrieved June 1st, 2006, from http://www.martinfowler.com/articles/continuousIntegration.html.
[10] Kofman, M., Perjons, E.: MetaDiff – a Model Comparison Framework, http://metadiff.sourceforge.net/docs/metadiff.pdf, undated.
[11] Kristensen, B. B., Madsen, O. L., Møller-Pedersen, B., and Nygaard, K.: Syntax Directed Program Modularization. In: Interactive Computing Systems (ed. P. Degano, E. Sandewall), North-Holland, 1983.
[12] Madsen, O. L., Møller-Pedersen, B., and Nygaard, K.: Object-Oriented Programming in the BETA Programming Language, Chapter 17: Modularization, Wiley, 1993.
[13] Magnusson, B., Asklund, U., and Minör, S.: Fine-Grained Revision Control for Collaborative Software Development, in Proceedings of the 1st ACM SIGSOFT symposium on Foundations of Software Engineering, Los Angeles, California, December 8-10, 1993.
[14] Ohst, D., Kelter, U.: A Fine-grained Version and Configuration Model in Analysis and Design, in Proceedings of the International Conference on Software Maintenance, Montreal, Canada, October 3-6, 2002.
[15] Oliveira, H., Murta, L., and Werner, C.: Odyssey-VCS: a Flexible Version Control System for UML Model Elements, in Proceedings of the 12th International workshop on Software Configuration Management, Lisbon, Portugal, September 5-6, 2005.
[16] Sabetzadeh, M., Nejati, S.: TReMer: A Tool for Relationship-Driven Model Merging, poster at Formal Methods 2006, Hamilton, Ontario, August 23-25, 2006.
[17] Selonen, P.: A Review of UML Model Comparison Approaches, in Proceedings of Nordic Workshop on Model Driven Engineering, Ronneby, Sweden, August 27-29, 2007.
[18] Thione, G. L., Perry, D. E.: Parallel Changes: Detecting Semantic Interferences, in Proceedings of COMPSAC 2005, Edinburgh, UK, July 26-28, 2005.
[19] Toulmé, A.: Presentation of EMF Compare Utility, position paper at Eclipse Summit, Esslingen, Germany, October 11-12, 2006.
[20] Treude, C., Berlik, S., Wenzel, S., Kelter, U.: Difference Computation of Large Models, in Proceedings of ESEC/FSE 2007, Dubrovnik, Croatia, September 3-7, 2007.
