Joiner: from Subcomponents to Design Patterns
Christian Tosi, Marco Zanoni, Francesca Arcelli, Claudia Raibulet
Università degli Studi di Milano-Bicocca, DISCo – Dipartimento di Informatica Sistemistica e Comunicazione
Via Bicocca degli Arcimboldi, 8, I-20126, Milan, Italy
{christian.tosi, marco.zanoni, arcelli, raibulet}@disco.unimib.it
Abstract. Design pattern detection is a challenging issue addressed in various works, which provide different approaches and techniques, and propose several prototypes and tools. Some approaches propose the decomposition of design patterns into simpler elements to manage the complexity of their detection process. In this context, we describe our approach towards design pattern detection based on design pattern subcomponents. The core of this paper describes the Joiner module, which aims to group the classes of the system under analysis into sets that may represent design pattern instances.

Keywords: design pattern detection, reverse engineering, design pattern subcomponents.
1 Introduction

An important objective of reverse engineering is to identify the fundamental components of an analyzed system and to recover its constituent structures. This information should significantly simplify restructuring and maintenance activities, because the system can be viewed as a set of simpler coordinated components rather than as a single monolithic block. In this context, design patterns play a relevant role, and detecting them can provide important hints about the system under analysis, both for program comprehension and for system re-documentation.

To detect design patterns for reverse engineering purposes, we found it particularly useful to consider the subcomponents of design patterns. Several works in the literature focus on subcomponents of design patterns, for example subpatterns in FUJABA [4], Elemental Design Patterns (EDPs) in SPQR [9, 10, 11], microarchitectures in PTIDEJ [3], and micro patterns [2]. Our research group started by detecting EDPs [9] in Java source code and then proposed a new kind of subcomponent called design pattern clues [7].

EDPs capture solutions to very common problems in software development, and more precisely in the daily practice of every programmer: creation of objects, abstraction of interfaces, delegation of implementation, and so on. Their purpose is the same as that of design patterns (in that they capture design intent), but their sphere of application is much more restricted. While design patterns offer solutions to problems of a certain size, which can involve a considerable number of classes, EDPs deal with much smaller problems, which generally involve no more than two or three interacting classes. Design pattern clues have been proposed as a complement to EDPs, in an attempt to achieve a more exhaustive identification of design patterns. In some cases, EDPs and clues seem to be enough to identify particular design patterns. A catalogue of EDPs can be found in [11], and a catalogue of design pattern clues is available in [7].

Our project, MARPLE (Metrics and Architecture Recognition Plug-in for Eclipse), is a reverse engineering tool whose main purpose is design pattern recognition [12, 13]. Figure 1 shows the overall architecture of MARPLE. The first module is the Information Detector Engine, whose core component is the Basic Element Detector (BED), which deals with the formalization and identification of design pattern subcomponents. In our project, subcomponents are EDPs and clues, which are called Basic Elements in the rest of the paper.
Figure 1: MARPLE's Architecture

The BED module interacts with the Joiner module, which gathers the information detected by the BED, stored in an XML file, and evaluates it to identify possible instances of design patterns in the source code. The main goal of this paper is to describe the Joiner component, which groups the information retrieved by the BED module into sets of classes that could represent design pattern instances. The Joiner is the key module for design pattern detection, because its output provides the set of classes used by the neural-network-based module, which assigns a matching probability to each detected design pattern. In addition to design pattern recognition, our project aims to provide other kinds of support for reverse engineering, such as visual metrics, all integrated with the design pattern module and within the Eclipse IDE.
2 Joiner

This section presents the main aspects of the Joiner module, focusing on its data formalization and on its working steps.
2.1 Representation of the Basic Elements

The Basic Elements (EDPs and clues) are represented as graph edges with the following characteristics:
• Source Class: the class where the basic element is found;
• Destination Class: the class to which the found basic element refers;
• Name: the unique name of the basic element.

In addition, a set of specific attributes can be assigned to each edge, depending on its kind of Basic Element (EDP or clue). Hence, we represent all the basic elements found in an analyzed system as a graph G = {V, E} where:
• V is the set of nodes, consisting of all the classes belonging to the analyzed system;
• E is the set of edges, consisting of all the basic elements found.

In this graph we look for design pattern instances, thereby identifying the roles of the nodes (and therefore of the classes) in a design pattern instance. The role of a class in a design pattern is the precise task the class performs in the overall structure of the design pattern, and it determines the class's functions. Moreover, within the same design pattern the same role can be assigned to more than one class, if the pattern definition allows it: in this case the role has a multiplicity greater than one. Among its functionalities, the Joiner module has the task of inferring the role of each class starting from the class functionalities, or more precisely from its basic elements.

2.2 Recognition Rules

The recognition rules of the Joiner define the roles, the relations between roles, and the multiplicity of roles. Given the basic elements and the graph structure, we divide the search into two steps:
• Strict rule: this first step finds the class roles in the graph, without taking role multiplicity into account;
• Results merging: this second step takes as input the roles found in the first step and merges them based on the multiplicity specified in the rule.
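The graph representation of Section 2.1 can be sketched as follows. This is only an illustrative sketch, not the MARPLE implementation: the names `BasicElement` and `BasicElementGraph`, the example class names, and the direction chosen for each edge are assumptions; only the edge fields (source class, destination class, name, optional attributes) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicElement:
    """One graph edge: a basic element (EDP or clue) found in `source`."""
    source: str             # class where the basic element is found
    destination: str        # class the basic element refers to
    name: str               # unique name of the basic element
    attributes: tuple = ()  # optional kind-specific (key, value) pairs


class BasicElementGraph:
    """G = {V, E}: V holds the classes, E holds the basic elements."""

    def __init__(self):
        self.classes = set()  # V
        self.edges = set()    # E

    def add(self, element):
        # Adding an edge also registers both of its endpoint classes.
        self.classes.add(element.source)
        self.classes.add(element.destination)
        self.edges.add(element)

    def outgoing(self, cls, name=None):
        # Basic elements found in `cls`, optionally filtered by name.
        return {e for e in self.edges
                if e.source == cls and (name is None or e.name == name)}


# Hypothetical classes, used only to exercise the structure.
graph = BasicElementGraph()
graph.add(BasicElement("MotifFactory", "WidgetFactory", "inheritance"))
graph.add(BasicElement("MotifFactory", "MotifButton", "create object"))
graph.add(BasicElement("MotifButton", "Button", "inheritance"))
```

Keeping the edges in a set of immutable records makes duplicate detections of the same basic element collapse into a single edge, which is one possible design choice among several.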
To explain the search process on the graph, we take the Abstract Factory design pattern as an example (see Figure 2) and introduce the corresponding "Joiner Model":
Figure 2: Generic Abstract Factory Representation
where the nodes and the edges stand for:
Nodes: AF: Abstract Factory; CL: Client; CF: Concrete Factory; P: Product; AP: Abstract Product
Edges: 1: delegate; 2: retrieve; 3: inheritance; 4: create object; 5: abstract interface
2.3 Strict Rules

Considering the problems related to graph matching using rule systems, the first step of our search process is a subgraph search in the original graph using rules (see Figure 3) with these characteristics:
• The nodes represent the class roles in the design pattern;
• The edges represent the basic elements linking the roles;
• The roles are unique: each role appears exactly once in the rule.
Figure 3: Strict Rule for Abstract Factory
During this phase we do not check whether a role can be fulfilled by more than one class; we only look for all the subgraphs isomorphic to the rule. As a result, for each rule we obtain a set of distinct subgraphs, which may have elements in common. Each of these subgraphs can have only one class for each role specified in the rule, so the main task of this first phase is to assign the design pattern roles to their corresponding classes (see Figure 4).
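The strict-rule phase can be sketched as a constrained backtracking search that assigns one class to each role. The sketch below rests on several assumptions: edges are plain (source, destination, name) triples, the function name `match_strict_rule` is hypothetical, and the rule is a simplified illustration rather than the actual rule of Figure 3. It also checks only that every rule edge is present in the graph, which is a weaker condition than full subgraph isomorphism.

```python
def match_strict_rule(graph_edges, rule_edges):
    """Return every role -> class assignment under which all rule edges exist."""
    roles = []
    for src, dst, _ in rule_edges:
        for role in (src, dst):
            if role not in roles:
                roles.append(role)
    classes = sorted({c for s, d, _ in graph_edges for c in (s, d)})
    edge_set = set(graph_edges)
    results = []

    def consistent(assignment):
        # Check only the rule edges whose two roles are already assigned.
        for rs, rd, name in rule_edges:
            if rs in assignment and rd in assignment:
                if (assignment[rs], assignment[rd], name) not in edge_set:
                    return False
        return True

    def extend(index, assignment):
        if index == len(roles):
            results.append(dict(assignment))
            return
        role = roles[index]
        for cls in classes:
            if cls in assignment.values():
                continue  # roles are unique: one distinct class per role
            assignment[role] = cls
            if consistent(assignment):  # prune as soon as a rule edge fails
                extend(index + 1, assignment)
            del assignment[role]

    extend(0, {})
    return results


# Simplified, assumed rule: CF inherits from AF, P inherits from AP, CF creates P.
rule = [("CF", "AF", "inheritance"),
        ("P", "AP", "inheritance"),
        ("CF", "P", "create object")]

# Hypothetical system with two concrete factories and two products.
edges = [("MotifFactory", "WidgetFactory", "inheritance"),
         ("MacFactory", "WidgetFactory", "inheritance"),
         ("MotifButton", "Button", "inheritance"),
         ("MacButton", "Button", "inheritance"),
         ("MotifFactory", "MotifButton", "create object"),
         ("MacFactory", "MacButton", "create object")]

matches = match_strict_rule(edges, rule)
```

On this example the search returns two assignments, one per concrete factory; both share the same classes for the AF and AP roles, which is what the merging phase later exploits.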
Figure 4: Strict Rule Application on Abstract Factory

Note that the rule application yields four distinct structures, which may have common nodes.

2.4 Result Merging

A design pattern role can be fulfilled by more than one class, so the first phase must be followed by a process that merges the subgraphs referring to the same instance of a design pattern. Analyzing the properties of the recognition rules, we found that the basic elements linking the roles are similar to the relationships between entities in an entity-relationship (ER) model for databases; in particular, each relation can be defined with a multiplicity of "1 to 1" or "1 to many" (see Figure 5).
Figure 5: Merge Rule for Abstract Factory

By representing this merge rule in terms of tables and attributes, we obtain tables having some roles as attributes, and other tables that reference the first group of tables and also have roles as attributes. With this structure we can identify several multiplicity levels within a design pattern: sets of roles in which all elements have the same multiplicity, organized hierarchically. At the first level of the hierarchy we have all the roles with multiplicity 1 in the design pattern, and as children other sets of roles, each set having a common multiplicity (see Figure 6).
Figure 6: Relational Grouping of Roles in Abstract Factory

By implementing a tree data structure that respects the level definition (similarly to an equivalent relational structure), and merging all the subgraphs found into this structure, it is possible to reconstruct the detected design patterns: each distinct design pattern can be obtained by visiting one of the trees, each of which is built from a distinct root at the first level of multiplicity (see Figure 7).
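The merging step can be sketched as a recursive grouping of the strict-rule results by multiplicity level. Everything in this sketch is an assumption made for illustration: the function name `merge_matches`, the dictionary layout of the tree nodes, the example classes, and in particular the chosen levels (AF and AP at the root, CF and P together as a child level), which do not necessarily match the grouping of Figure 6.

```python
def merge_matches(matches, level):
    """Group role -> class assignments into tree nodes for one multiplicity level."""
    groups = {}
    for match in matches:
        # Assignments with the same classes for this level's roles share a node.
        key = tuple(match[role] for role in level["roles"])
        groups.setdefault(key, []).append(match)
    nodes = []
    for key, members in groups.items():
        nodes.append({
            "classes": dict(zip(level["roles"], key)),
            # Each child level is merged using only this node's assignments.
            "children": [merge_matches(members, child)
                         for child in level.get("children", [])],
        })
    return nodes


# Assumed multiplicity levels: AF and AP occur once per pattern instance,
# while (CF, P) pairs may occur many times below them.
levels = {"roles": ("AF", "AP"),
          "children": [{"roles": ("CF", "P")}]}

# Hypothetical strict-rule results: one dictionary per matched subgraph.
strict_matches = [
    {"AF": "WidgetFactory", "AP": "Button", "CF": "MotifFactory", "P": "MotifButton"},
    {"AF": "WidgetFactory", "AP": "Button", "CF": "MacFactory", "P": "MacButton"},
    {"AF": "ShapeFactory", "AP": "Shape", "CF": "SvgFactory", "P": "SvgShape"},
]

trees = merge_matches(strict_matches, levels)
```

Each element of `trees` is a root at the first multiplicity level and stands for one candidate design pattern instance; in this example the first two assignments merge under a single root with two children, and the third forms its own tree.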
Figure 7: UML Tree Structure for Results Merging

Filling this tree has the following consequence: if different classes are found for the same role, and this role is in the same multiplicity level as another role, then two separate nodes are created in the tree to represent the identified classes, each one associated with the same class that was found for the other role at that level. These nodes represent all the variations that can occur at a given tree level. By the design pattern definition itself, each multiplicity level of a given type corresponds to different classes for each role, so only one of these nodes is the one we are looking for; all the others can be deleted. This deletion of wrong branches from the result must be done in a subsequent phase, either automatically, if further refining rules can be defined, or with the collaboration of the user, through an interface that allows wrong branches to be deleted.

2.5 Implementation

The algorithm was chosen after an analysis of the problem size and of some known solutions in this application domain. One of the major problems found in other existing tools for design pattern detection is computation time. Logic rule resolution approaches, such as those based on Prolog, are normally very slow, because solving the problem requires creating a huge number of logic expressions, all of which must be evaluated by the analyzer (e.g., Pat [6], PTIDEJ [3]); in other cases [3, 14], metrics have been added to these rules to make them more precise and to discard many useless cases. In [5], another very interesting algorithm is described, which reduces subgraph matching to bit-string matching, using techniques from bioinformatics derived from DNA pattern research; unfortunately, the implementation of these algorithms is quite complex and requires a long setup time, and its precision level is not always clear.

In our case we preferred a fairly simple algorithm, suggested by [8], based on constrained backtracking during a depth-first search of the graph. Our problem is characterized by a large number of elements in its search scope, but the search rules (the strict rules) use a small number of elements at a time (never more than ten). Starting from these considerations, we assume that the number of recursive calls is small, and hence that the algorithm complexity is low (approximately linear). We obtained our first results using a hard-coded algorithm specific to the Abstract Factory design pattern which, analyzing a program composed of 250 classes (JHotDraw [1]), produces a result in 4 seconds and finds 16 possible design pattern instances.
3 Conclusions and Further Work

The development of the Joiner module is still in progress, so the study of other implementations will continue in parallel with the evolution of our approach. In particular, we will verify whether the recognition algorithm currently in use is the most appropriate one or whether it can be further improved. If we keep the current algorithm, the next tasks are to complete the generalization of the algorithm and to create the rule catalogue for the creational design patterns; the catalogue will then be completed with all the other design patterns.

Another important step is to devise a way to take advantage of the additional attributes of the basic elements; this phase must be performed after the merge rule is applied, so that conditions that can be checked only after the merging process can eliminate as many false positives as possible.

The results obtained by the Joiner will be given as input to the neural network module (see Figure 1). Currently, this module has been trained with EDPs and clues for creational and behavioral design patterns.
References

1. E. Gamma and T. Eggenschwiler. JHotDraw, 2006.
2. J. (Y.) Gil and I. Maman. Micro patterns in Java code. In OOPSLA '05: Proceedings of the 20th annual ACM SIGPLAN conference on Object Oriented Programming, Systems, Languages, and Applications, pp. 97-116, New York, NY, USA, 2005. ACM Press.
3. H. Albin-Amiot and Y.-G. Guéhéneuc. Meta-modeling design patterns: application to pattern detection and code synthesis. In Proceedings of the 1st ECOOP 2001 Workshop on Automating Object-Oriented Software Development Methods, 2001, TR-CTIT-01-35.
4. J. P. Wadsack, J. Niere, and J. Wendehals. Design pattern recovery based on source code analysis with fuzzy logic. Technical Report TR-RI-01-222, University of Paderborn, Paderborn, Germany, March 2001.
5. O. Kaczor, Y.-G. Guéhéneuc, and S. Hamel. Efficient Identification of Design Patterns with Bit-vector Algorithm. In Proceedings of the 10th European Conference on Software Maintenance and Reengineering, pp. 175-184, 2006.
6. C. Kramer and L. Prechelt. Design recovery by automated search for structural design patterns in object-oriented software. In WCRE '96: Proceedings of the 3rd Working Conference on Reverse Engineering, pp. 208, Washington, DC, USA, 1996. IEEE Computer Society.
7. S. Maggioni. A New Approach for Design Pattern Recognition through Design Pattern Clues. MSc Thesis, Università degli Studi di Milano-Bicocca, Italy, 2005.
8. S. S. Skiena. The Algorithm Design Manual. Telos, Springer-Verlag, 1998.
9. J. McC. Smith and D. Stotts. Elemental Design Patterns: A Formal Semantics for Composition of OO Software Architecture. In Proceedings of the 27th Annual IEEE/NASA Software Engineering Laboratory Workshop, Greenbelt, MD, 2002, pp. 183-190.
10. J. McC. Smith and D. Stotts. Elemental Design Patterns: A Link Between Architecture and Object Semantics. Technical Report TR02-011, Computer Science Department, University of North Carolina at Chapel Hill, March 2002.
11. J. McC. Smith. An Elemental Design Pattern Catalog. Technical Report 02-040, Computer Science Department, University of North Carolina at Chapel Hill, December 2002.
12. C. Tosi. A New Approach for Design Pattern Detection based on subcomponents: EDPs and Clues. MSc Thesis, Università degli Studi di Milano-Bicocca, 2006.
13. M. Zanoni. Joiner: From Subcomponents to Design Patterns. MSc Thesis, Università degli Studi di Milano-Bicocca, Italy, 2006.
14. Y.-G. Guéhéneuc, H. Sahraoui, and F. Zaidi. Fingerprinting Design Patterns. In Proceedings of the 11th Working Conference on Reverse Engineering, 2004, pp. 172-181.