A Two Phase Approach to Design Pattern Recovery Andrea De Lucia, Vincenzo Deufemia, Carmine Gravino, Michele Risi Dipartimento di Matematica e Informatica Università di Salerno 84084 Fisciano(SA), Italy {adelucia, deufemia, gravino, mrisi}@unisa.it
Abstract In this paper we present a two phase approach to the recovery of structural design patterns. In the first phase, design pattern instances are identified at a coarse-grained level by considering the design structure only and using a visual language parsing technique. In the second phase, the identified candidate patterns are validated by a fine-grained source code analysis. The latter phase enhances a previous approach developed by the authors and aims to improve its precision and time performance. The retrieval effectiveness of the approach is assessed by applying the recovery technique to four software systems.
1. Introduction Design patterns abstract reusable object-oriented designs that solve common, non-trivial, recurring design problems in a particular context [14]. A design pattern describes the roles, responsibilities, and collaborations of participating classes and instances. Thus, extracting design pattern information from source code helps in comprehending the solution adopted for a software system. Recovered design pattern instances can also be used to highlight desirable properties of the design model, which can be reused whenever a similar problem is encountered, and to improve the documentation associated with the system. Indeed, they clearly provide information about class and object interactions and their underlying intent [14]. Moreover, design patterns usually reduce component coupling, which allows designers to perform certain types of program evolution with minimal changes. However, since pattern descriptions are abstract and informal, and patterns are not explicitly documented in source code, their recovery has to be performed manually in most cases.
In this paper we describe an effective recovery approach for structural design patterns, which is characterized by high precision values. The recovery process is organized in two phases. In the first phase, the class diagram is extracted from source code and design pattern instances are identified based on the design structure only, using a recovery technique based on visual language parsing [6]. In the second phase, the identified candidate patterns are validated by a source code analysis, which eliminates many false positives and consequently increases the precision of the approach [27].
The retrieval effectiveness of the proposed approach has been assessed by applying the recovery process to two versions of the JHotDraw library (5.1 and 6.0b1), QuickUML 2001, and Apache Ant 1.6.2. JHotDraw was originally developed to illustrate the good use of design patterns for designing systems [16]. As a consequence, the high density of design patterns in JHotDraw has allowed us to stress the new approach and compare it with the previous one [6]. In contrast, QuickUML 2001 and Apache Ant have allowed us to analyze how our recovery approach works on software systems with a lower density of design patterns. The results of the case studies, in terms of precision and time performance, are considerably better than those obtained with our previous approach [7]. In particular, we obtained precision values ranging from 0.80 to 0.97, which are very promising and suggest that the fine-grained source code analysis phase has effectively improved the recovery technique.
The paper is structured as follows. Section 2 reports related work in the area of design pattern recovery. The proposed design pattern recovery process is presented in Section 3, while Section 4 reports the results of the case studies. A discussion of the results is presented in Section 5. Conclusions and suggestions for future work end the paper.
11th European Conference on Software Maintenance and Reengineering (CSMR'07) 0-7695-2802-3/07 $20.00 © 2007
2. Related work In this section we discuss some works related to ours by considering the employed pattern identification strategy, the representation used for coding design patterns, and the kind of support they provide for recognition (i.e., manual, semi-automatic, or automatic pattern recovery).
SPOOL is a tool that retrieves design patterns from C++ code based on structural descriptions of design patterns [18][19]. The relevant C++ source code elements are represented in UML/CDIF format, whereas the patterns are represented as abstract design components and stored in a central repository. The tool supports manual, semi-automatic, and automatic recovery.
The automatic approach presented in [26] extends an algorithm, employed in [19], that searches for minimal key structures. Key structures define the minimal class and/or object structure that must be present to identify a pattern. The aim of the authors was to improve the recall and precision of the method in [19] and to cover all the GoF design patterns [14]. This is accomplished by representing patterns in terms of positive and negative search criteria, which are applied to the tree of classes representing the input source code.
The Pat system recovers instances of structural design patterns using information on the pattern class structure [21]. In particular, design patterns are represented as Prolog rules, whereas source code is expressed in terms of Prolog facts. Thus, instances of design patterns are retrieved by applying a Prolog query. Some information, such as the difference between concrete and abstract classes, is not extracted. The approach is automatic, but false positives have to be removed manually.
In [3] Balanyi and Ferenc presented an automatic method for discovering design patterns in C++ source code, which employs a pattern miner algorithm. In particular, the C++ code is represented as an Abstract Semantic Graph (ASG) obtained by using the Columbus system, the patterns to be retrieved are described in an XML-based language (DPML), and the recovery algorithm matches the XML DOM tree obtained from the DPML descriptions against the ASG. In order to differentiate true and false positive pattern instances, the recovery technique has been enhanced with machine learning methods [12]. In particular, true and false instances of patterns are distinguished by using a learning database created by manually tagging a large C++ system [12].
The semi-automatic approach presented in [22] is based on cliché recognition and graph transformation rules. The input source code is expressed in terms of an Abstract Syntax Graph (ASG), and design patterns are defined as graph transformation rules with respect to this graph. The recovery process is performed by a deductive analysis algorithm iteratively applied by users, who annotate the ASG with information concerning the identified pattern instances. In [23] the approach has been enhanced by combining graph transformation rules with fuzzy logic.
The automatic approach presented in [1] combines the use of a design pattern library and cliché matching [29] with OO software metrics and structural information to reduce the number of checks in a multistage recovery process. The source code is translated into an AOL (Abstract Object Language) sentence, which is then parsed to obtain an Abstract Syntax Tree (AST). Instances of design patterns are identified by using software metrics and verifying structural properties on the AST.
Another automatic approach based on metrics has been proposed in [20]. The method represents the source code in terms of metrics belonging to three possible categories: object-oriented, structural, and procedural. Each design pattern to be recognized is associated with a signature calculated from metrics on GoF patterns. The pattern searching algorithm compares the metrics of each class with the signatures of the patterns.
In [24], Olsson and Shi propose a recovery approach based on a reclassification of design patterns for reverse engineering purposes. The recovery approach is based on a static analysis which exploits inter-class relationships. The program behavior is efficiently recognized using a lightweight static program analysis. These analyses are performed on symbol tables and ASTs constructed by a compiler. The tool PINOT implements the proposed method for the automatic identification of design patterns.
Kaczor et al. [17] adapted a bit-vector algorithm inspired by bioinformatics for identifying design pattern instances. Patterns and the software systems to be analyzed are expressed as string representations formed by classes and the relationships between them (association, aggregation, composition, instantiation, inheritance, and dummy). Thus, the design pattern recovery problem is converted into a problem of approximate string matching using bit-vectors. In particular, the algorithm matches the string representations of design pattern motifs and of the software systems to be analyzed by performing operations on finite sets of bit-vectors.
In [6] the authors proposed a design pattern recovery approach based on a visual language parsing technique. The recovery technique makes use of a design pattern library, expressed in terms of visual grammars, and is based on an extension of LR-parsing. In particular, the method first constructs a UML class diagram, represented in SVG format, from the input source code, and then reduces the problem of recovering design patterns to the problem of recognizing subsentences in a class diagram, where each subsentence corresponds to a design pattern specified by a visual language grammar. In [7] the recovery technique has been enhanced by including negative search criteria, which are able to remove several false positives.
The two phase design pattern recovery approach presented in this paper extends the recovery technique proposed in [7] with a low-level analysis, which is able to identify many other false positive pattern instances. The source code of the programs to be analyzed is represented using a data structure obtained by Source Navigator, which consists of inter-class relationships. The visual language grammar representing the patterns to be recovered and the low-level checks are encoded into a sequence of queries. The recovery process consists of applying these queries to the data structure representing the input source code. This is similar to the detection process for structure-driven patterns proposed in PINOT [24]. However, our approach performs a coarse-grained analysis which reduces the number of pattern candidates on which low-level checks are applied. Moreover, PINOT considers a reclassification of the GoF patterns to facilitate the recognition of design patterns, as detailed in Section 5.
3. The design pattern recovery process In this section we present the two phase recovery approach for structural design patterns depicted in Figure 1. In the first phase, the candidate design patterns are identified at a coarse-grained level by analyzing the class diagram structure obtained during a preliminary analysis of the OO source code. The recovery technique applied in the first phase is the same one integrated in the visual environment DPRE proposed in [6], which is based on visual language parsing. The technique reduces the problem of identifying design patterns to the problem of recognizing subsentences in a class diagram, where each subsentence corresponds to a design pattern specified by a visual language grammar. However, since DPRE was a prototype automatically generated from a grammar specification by the visual language programming generator VLDesk [8], the time performance of the recovery process was affected by several inefficiencies in the graphics management of the visual editor. In particular, the loading of class diagrams in the DPRE visual editor was not scalable, preventing the application of the recovery approach to large-scale software systems. In order to overcome these inefficiencies we have developed an engineered version of the recovery tool (see footnote 1). This system can be easily integrated into general purpose tools able to visualize UML artifacts.
In the second phase a fine-grained source code analyzer validates the identified candidate patterns, i.e., it verifies whether they are real patterns. This is accomplished by analyzing the source code to verify the relationships between the classes involved in the candidate design patterns. In the following, each module of the process is described and analyzed.
3.1 Preliminary analysis The Preliminary Analysis module extracts the structural information needed to recover design patterns and stores it in a database, so that it can be accessed and navigated efficiently. In particular, the names of the classes, the names and types of the methods and variables, the names and types of the method parameters, and the inheritance and association relationships are stored in a suitable data structure. To accomplish this task we have used the code analysis tool Source Navigator [28], which is able to recover almost all the necessary information. Moreover, Source Navigator supports several programming languages, such as C++, Java, and Python, and provides APIs allowing programmers to construct a specific parser. However, Source Navigator is not able to recover some information that is relevant for design pattern identification. As an example, in the case of Java it does not extract the inheritance relationships between classes where the superclass is an interface. To overcome these deficiencies we have implemented a module which analyzes the source code and recovers the missing information. The obtained information can be easily analyzed to construct the corresponding UML class diagram, which can then be visualized by a graphical editor.
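To make the stored information concrete, the following is a minimal sketch of one possible in-memory layout for the extracted facts. It is not the actual Source Navigator database schema: the names ClassInfo, FactBase, and patch_interface_edges are hypothetical, and the `implements` mapping merely stands in for the output of the additional module that recovers the interface relationships missed by the tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Structural facts about one class (hypothetical layout)."""
    name: str
    is_interface: bool = False
    # method name -> (return type, [(parameter name, parameter type), ...])
    methods: dict = field(default_factory=dict)
    # variable name -> declared type
    variables: dict = field(default_factory=dict)


class FactBase:
    """Classes plus indexed inheritance and association relationships."""

    def __init__(self):
        self.classes = {}
        self.subclasses_of = defaultdict(set)      # superclass -> direct subclasses
        self.associations_from = defaultdict(set)  # class -> associated classes

    def add_class(self, info):
        self.classes[info.name] = info

    def add_inheritance(self, sub, sup):
        self.subclasses_of[sup].add(sub)

    def add_association(self, src, dst):
        self.associations_from[src].add(dst)


def patch_interface_edges(facts, implements):
    """Add class-to-interface inheritance edges that are missing.

    `implements` maps a class name to the interfaces it implements
    (assumed to come from a separate source code scan).
    Returns the number of edges that were actually added.
    """
    added = 0
    for cls, interfaces in implements.items():
        for iface in interfaces:
            if cls not in facts.subclasses_of[iface]:
                facts.add_inheritance(cls, iface)
                added += 1
    return added
```

Indexing the relationships by their source class is a design choice made here so that later queries can start from a relationship rather than scanning all classes.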
3.2 Structural analysis The Structural Analysis identifies candidate design patterns by navigating the class diagram stored in the data structure produced by the previous phase. This task is accomplished according to the visual language recovery technique proposed in [6]. In particular, the recognition process is an engineered version of the approach proposed in [6]: instead of being automatically generated from a design pattern grammar specification by the VLDesk tool [8], the design pattern recognition rules have been implemented by ad-hoc algorithms. This has allowed us to considerably improve the time performance of the recovery process and to eliminate several inefficiencies in the graphics management of the visual editor generated by VLDesk.
Footnote 1: www.sesa.dmi.unisa.it/dpr
Figure 1. The design pattern recovery process (diagram labels: OO code, Preliminary Analysis, Class Diagram, Structural Analysis, and Candidate Patterns in the 1st phase; Low-level Analysis and Recovered Patterns in the 2nd phase)
The new version of the pattern recovery algorithm has also been optimized with respect to the number of accesses to the input data structure. Indeed, the parsing algorithm of the previous approach performed retrieval operations on both classes and relationships of the class diagram. As an example, to recover an Adapter pattern the algorithm first looked for the class playing the role of Target, then the inheritance relationship, the Adapter class, the delegation relationship, and finally the Adaptee class. Thus, the algorithm launched the operation retrieving the next symbol to be parsed five times. In contrast, the new approach does not look for both classes and relationships: the identification of patterns is driven only by the association and inheritance relationships, which are indexed in the input data structure. As an example, to recover an Adapter pattern the algorithm looks for all the inheritance relations between a superclass Target and a subclass Adapter, and then looks for association relations between that Adapter and a class Adaptee (see Figure 2). In this case, the operation retrieving the next symbol to be parsed is launched two times.
Figure 2. Structural identification of participating classes in the Adapter Pattern
The recovery algorithm also includes the negative criteria checks involving the structure of the identified design patterns, as done in the previous approach [7][26]. Negative criteria can be considered as a set of negative relationships between software artifacts, which allow us to reduce the number of false positive pattern instances. As an example, for the Adapter pattern the algorithm checks that no inheritance relationship exists between the Adapter and Adaptee classes, and that no relationship exists between the Target and Adaptee classes.
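The relationship-driven search for Adapter candidates, together with the two structural negative criteria, can be illustrated with the following sketch. It is only an approximation under stated assumptions: relationships are given as plain sets of name pairs, the function name find_adapter_candidates is hypothetical, and skipping candidates in which the Adaptee coincides with the Target or the Adapter is an extra assumption not stated in the text.

```python
from collections import defaultdict


def find_adapter_candidates(inheritance, association):
    """Return sorted (target, adapter, adaptee) candidate triples.

    inheritance: set of (subclass, superclass) pairs
    association: set of (source, destination) pairs
    """
    subs_of = defaultdict(set)     # index: superclass -> direct subclasses
    assoc_from = defaultdict(set)  # index: class -> associated classes
    for sub, sup in inheritance:
        subs_of[sup].add(sub)
    for src, dst in association:
        assoc_from[src].add(dst)

    def related(a, b):
        # Any inheritance or association edge between a and b, in either direction.
        return ((a, b) in inheritance or (b, a) in inheritance
                or (a, b) in association or (b, a) in association)

    candidates = []
    # Step 1: follow inheritance relations (Target is the superclass of Adapter).
    for target, adapters in subs_of.items():
        for adapter in adapters:
            # Step 2: follow association relations from Adapter to Adaptee.
            for adaptee in assoc_from.get(adapter, ()):
                if adaptee in (target, adapter):
                    continue  # assumption: the three roles are distinct classes
                # Negative criterion: no inheritance between Adapter and Adaptee.
                if (adapter, adaptee) in inheritance or (adaptee, adapter) in inheritance:
                    continue
                # Negative criterion: no relationship between Target and Adaptee.
                if related(target, adaptee):
                    continue
                candidates.append((target, adapter, adaptee))
    return sorted(candidates)
```

Because both lookups go through an index keyed by the relationship's source, each candidate is reached with two retrievals, which mirrors the two steps described above.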
3.3 Low-level analysis The candidate design patterns identified by the previous module are recovered by taking into account only the structural information of the classes (i.e., the association and inheritance relationships between classes). The goal of the Low-level Analysis module is to verify that the candidates are real instances of structural design patterns. To this end, it analyzes the stereotypes associated with the classes, such as the interface stereotype, to validate the roles of the participating classes, and verifies the method declarations in the delegating classes. Furthermore, the same kinds of checks are used to verify negative criteria that require source code analysis. All these validation activities are carried out by exploiting the information extracted during the preliminary analysis. In particular, the checks performed during the low-level analysis are:
- Extend: the inheritance relationships are verified by analyzing the overridden methods;
- Extend abstract: the inheritance relationships are verified by analyzing the implemented methods defined in the interface;
- Delegation: the delegations are verified by analyzing the method invocations in the delegating classes.
As an example, for a candidate Adapter pattern (see Figure 3) the low-level analyzer performs:
- an Extend abstract check between the Target and the Adapter classes;
- a Delegation check between the Adapter and the Adaptee classes.
In the latter check, the delegation is verified against the information obtained in the first check. Indeed, specificRequest() is invoked by a method implemented in Adapter, which is also defined in Target. Finally, two negative criteria are verified: 1) no relationships exist between the Target and Adaptee classes; 2) Adapter classes are not subclasses of Adaptee classes.
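The interplay between the two Adapter checks can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: method declarations and invocations are modelled as plain dictionaries, the function names are hypothetical, and requiring every Target method to be implemented by the Adapter is one possible reading of the Extend abstract check.

```python
def extend_abstract(declared, target, adapter):
    """Return the Target methods implemented by Adapter.

    declared: class name -> set of method names declared in that class.
    Returns an empty set if some Target method is not implemented
    (assumption: all methods of the interface must be implemented).
    """
    required = declared[target]
    implemented = required & declared[adapter]
    return implemented if implemented == required else set()


def delegation(calls, adapter, adaptee, via_methods):
    """True if some Adapter method in via_methods invokes a method of Adaptee.

    calls: (class, method) -> set of (callee class, callee method) pairs.
    """
    return any(callee_cls == adaptee
               for method in via_methods
               for callee_cls, _ in calls.get((adapter, method), ()))


def validate_adapter(declared, calls, target, adapter, adaptee):
    """Delegation is checked only through methods confirmed by Extend abstract."""
    inherited = extend_abstract(declared, target, adapter)
    return bool(inherited) and delegation(calls, adapter, adaptee, inherited)
```

Restricting the Delegation check to the methods returned by the first check is what rejects a candidate whose delegating method is not declared in the Target.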
Figure 3. The Adapter Design Pattern
The low-level checks applied to a candidate Bridge pattern (see Figure 4) are:
- Extend Abstract between Abstraction and RefinedAbstraction classes, and between Implementor and ConcreteImplementor classes, since each method (e.g., OperationImp()) declared in the Implementor class has to be implemented by a method OperationImp() of a ConcreteImplementor class.
- Delegation between Abstraction and Implementor, since Operation() of Abstraction has to invoke a method OperationImp() of the Implementor class.
The negative criterion verified for a candidate Bridge pattern is that no relationships exist from an implementation class to an abstraction class.
Figure 4. The Bridge Design Pattern
Figure 5 shows the class diagram describing the Composite pattern. The low-level checks applied to a candidate Composite pattern are:
- Extend between Component and Composite classes, for checking which methods declared in Component are overridden by Composite (e.g., Operation()).
- Extend between Component and Leaf classes, for checking which methods declared in Component are overridden by Leaf (e.g., Operation()).
- Delegation between Composite and Component classes, since Operation() of Composite has to invoke a method of the Component class.
The negative criterion verified for a candidate Composite pattern is that no relationship exists from the Leaf class to the Composite class.
Figure 5. The Composite Design Pattern
Figure 6 shows the class diagram describing the Proxy pattern. The low-level checks applied to a candidate Proxy pattern are:
- Extend Abstract between Subject and RealSubject classes, and between Subject and Proxy classes, since each method (e.g., Request()) declared in the Subject class has to be implemented by a method Request() of a RealSubject or Proxy class.
- Delegation between Proxy and RealSubject, since Request() of Proxy has to invoke a method of the RealSubject class.
The negative criterion verified for a candidate Proxy pattern is that the Proxy class is not an ancestor of the RealSubject class.
Figure 6. The Proxy Design Pattern
Figure 7 shows the class diagram describing the Decorator pattern. The low-level checks applied to a candidate Decorator pattern are:
- Extend Abstract between Component and ConcreteComponent classes, and between Component and Decorator classes, since each method (e.g., Operation()) declared in the Component class has to be implemented by a method Operation() of a ConcreteComponent or Decorator class.
- Extend between Decorator and ConcreteDecorator, for checking which methods declared in the Decorator class are overridden by ConcreteDecorator (e.g., Operation()).
- Delegation between Decorator and Component, since Operation() of Decorator has to invoke a method of the Component class.
The negative criteria verified for a candidate Decorator pattern are: 1) no relationship exists from the ConcreteComponent class to the Decorator or ConcreteDecorator classes; 2) no direct relationship exists from the Component class to the ConcreteDecorator class.
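Since the per-pattern checks listed above all combine the same three primitives, they could be written down declaratively, as in the sketch below. The CHECKS table and the validate function are hypothetical names introduced for illustration; the check implementations are supplied by the caller, and negative criteria are deliberately left out of this sketch.

```python
# Each entry is (check kind, first role, second role).
# For "extend" and "extend_abstract" the first role is the superclass;
# for "delegation" the first role is the delegating class.
CHECKS = {
    "Bridge": [
        ("extend_abstract", "Abstraction", "RefinedAbstraction"),
        ("extend_abstract", "Implementor", "ConcreteImplementor"),
        ("delegation", "Abstraction", "Implementor"),
    ],
    "Composite": [
        ("extend", "Component", "Composite"),
        ("extend", "Component", "Leaf"),
        ("delegation", "Composite", "Component"),
    ],
    "Proxy": [
        ("extend_abstract", "Subject", "RealSubject"),
        ("extend_abstract", "Subject", "Proxy"),
        ("delegation", "Proxy", "RealSubject"),
    ],
    "Decorator": [
        ("extend_abstract", "Component", "ConcreteComponent"),
        ("extend_abstract", "Component", "Decorator"),
        ("extend", "Decorator", "ConcreteDecorator"),
        ("delegation", "Decorator", "Component"),
    ],
}


def validate(pattern, roles, checkers):
    """Run every check listed for `pattern` on a candidate instance.

    roles: role name -> concrete class name of the candidate
    checkers: check kind -> callable(class_a, class_b) returning bool
    Evaluation stops at the first failing check.
    """
    return all(checkers[kind](roles[a], roles[b])
               for kind, a, b in CHECKS[pattern])
```

Keeping the checks in a table separates what has to hold for each pattern from how a single check is computed on the source code.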
Figure 7. The Decorator Design Pattern
Due to space limits, we have not reported the description of the Façade design pattern and the low-level checks performed for it.
4. Experimental results In this section we describe the case studies performed to highlight the performance of the proposed recovery approach. To this end we examine two versions of JHotDraw (5.1 and 6.0b1), QuickUML 2001, and Apache Ant version 1.6.2.
JHotDraw is a Java framework for drawing two-dimensional graphics in structured drawing editors [15], and can be used for developing custom-made drawing editor applications. It was originally developed to illustrate the good use of design patterns for designing and documenting systems [16]. Thus, it is an ideal candidate to verify the effectiveness of a design pattern recovery approach.
QuickUML is a tool for designing software, which has been created to build and manipulate class diagrams based on a core set of the UML notation [9]. Apache Ant is a software tool for automating software build processes. It is similar to the make utility, but it is written in Java and is primarily intended for use with Java [2].
Table 1 contains some statistics on the software systems under analysis, while Table 2 summarizes the results obtained by applying the two phase recovery process.
Table 1. Some statistics of the four software systems considered in the case studies
             JHD 5.1   Quick UML   JHD 6.0b1   Apache Ant
#LOC         8300      8792        24222       99259
#Classes     155       217         544         1249
#DI          4612      3849        13079       36070
#CRDPI       98        81          197         546
#DI = number of delegations and inheritances
#CRDPI = number of classes involved in the recovered design pattern instances
Table 2. Design pattern instances recovered in the analyzed software systems
             JHD 5.1   Quick UML   JHD 6.0b1   Apache Ant
#Adapter     41        27          53          71
#Bridge      75        22          166         51
#Composite   0         0           5           5
#Façade      9         16          20          111
#Proxy       0         1           0           0
#Decorator   0         0           0           0
Table 3 reports the times needed to recover the design pattern instances for the analyzed software systems, obtained using a 3.2 GHz Pentium 4 with 1 GB of RAM. Notice that:
- SA indicates the structural analysis phase of the recovery process,
- LA indicates the low-level analysis phase of the recovery process,
- ED-checks denotes the checks performed during the LA for validating both the participating role of each class and the relationships in the candidate pattern instances (without applying negative criteria),
- NC-checks denotes the negative criteria checks performed during the LA, and
- 2-Ph indicates the entire two phase recovery technique.
The results highlight that the retrieval time for a software system depends not only on its size but also on the number of identified pattern instances.
Table 3. Times characterizing each stage of the proposed two phase recovery process
                 JHD 5.1   Quick UML   JHD 6.0b1   Apache Ant
SA               28.5      26.65       235.45      3666.79
LA: ED-checks    1.99      1.15        4.77        20.06
LA: NC-checks    1.7       1.71        23.69       421.66
LA: total        3.69      3.46        28.46       441.67
2-Ph             32.19     29.48       263.91      4108.46
The times are in seconds.
To show how the low-level analysis phase affects the design pattern recovery process, let us analyze Figures 8, 9, and 10, which depict the recovered design pattern instances at each stage of the recovery process.
Figure 8. Recovered instances of Adapter pattern at each stage (bar chart of the instances after SA, SA+ED-checks, and 2-Ph for JHotDraw 5.1, QuickUML 2001, JHotDraw 6.0b1, and Apache Ant)
Figure 9. Recovered instances of Bridge pattern at each stage (bar chart of the instances after SA, SA+ED-checks, and 2-Ph for the same four systems)
Figure 10. Recovered instances of Proxy pattern at each stage (bar chart of the instances after SA, SA+ED-checks, and 2-Ph for the same four systems)
We have reported only the instances for the Adapter, Bridge, and Proxy design patterns, since for the other structural patterns there are no significant differences between the number of instances recovered by SA and by LA. In particular, for the Composite and Façade design patterns we did not eliminate any false positives by applying ED-checks and NC-checks. Moreover, no instances of the Decorator pattern have been recovered for the considered software systems. Compared to the results obtained by other recovery approaches, the low number of recovered Proxy and Decorator instances is probably due to the use of a strict definition of these patterns.
Regarding the Adapter and Bridge patterns, we can observe that the number of false positives eliminated using ED-checks is greater than the number removed by applying NC-checks. This result is particularly interesting since the execution of ED-checks requires only a few seconds compared with the time needed to accomplish the first phase (see Table 3). In contrast, the application of NC-checks requires much more time even though fewer false positives are removed. Concerning the Proxy pattern, for JHotDraw 6.0b1 and Ant all the pattern instances recovered in SA are eliminated by using ED-checks, whereas no Proxy instance is recovered for JHotDraw 5.1 and no false positive is identified for QuickUML.
Figure 11 shows an instance of the Adapter pattern recognized by executing the first phase of the recovery process on JHotDraw 5.1. This instance has been recognized as a false positive by executing the ED-checks in the low-level phase. This is due to the fact that the delegation relationship involves the method write() of AbstractConnector, which is not declared in the Connector class.
Figure 11. A false positive Adapter pattern instance of JHotDraw 5.1 identified by using ED-checks
Figure 12 shows a false positive Adapter pattern instance of JHotDraw 5.1 identified by using NC-checks. In this example the violated negative criterion is “no relationships exist between Target and Adaptee classes”, since the method tool() of StandardDrawingView delegates to the method tool() of DrawingEditor.
The software systems considered in our case studies have been taken into account by other pattern recovery tools also. However, their recovery techniques use a different definition of patterns with respect to our [14], so the obtained results are not comparable. In particular, some differences between our definitions of structural design patterns and the ones presented in [24] are relative to the use of the multilevel inheritance relationships and the interpretation of the delegation relationships. As an example, for JHotDraw 6.0b1 PINOT identifies an instance of Adapter pattern having the class CreationTool as Target, the class TextAreaTool as Adapter, and the class FloatingTextArea as Adaptee. Our technique recognizes this occurrence as false positive since the delegation relationship involves the method beginEdit() of TextAreaTool, which is not declared in the CreationTool class.
Figure 12. A false positive Adapter pattern instance of JHotDraw 5.1 identified by using NC-checks
2-Ph
DPRE
1,20 1,00
0,93
0,97
0,97
0,82
0,80
5. Discussion
0,60
0,47
0,53 0,40
0,40
A limitation of the current version of the proposed recovery approach concerns the handling of inheritance relationships. Indeed, we have implemented a definition of patterns that does not take into account multi-level inheritance relationships. As an example, in the case of the Adapter pattern if there is a class between the Target and Adapter the pattern is not identified. In order to validate the results of the recovery process we have manually verified the discovered instances of design patterns. In particular, for the analyzed software systems we have obtained the precision values reported in Figure 13, which also includes the results obtained with the approach proposed in [7] (implemented in DPRE). Thus, the precision measure has been considerably improved by the fine-grained source code analysis phase. The previous case studies performed with DPRE were characterized by high precision values also [7]. This is motivated by the fact that the previously analyzed software systems had a low density of design patterns. Thus, the previous case studies did not highlight the main limitation of DPRE, i.e., the precision decreases with software systems characterized by high density of pattern instances. On the contrary, the software systems considered in this paper contains many pattern structures having shared classes and requiring source code analysis to be disambiguated.
0,21
0,20 0,00 JHotDraw 5.1
QuickUML 2001
JHotDraw 6.0b1
Apache Ant
Figure 13. Precision values obtained with 2-Ph and DPRE
The approach based on the bit-vector algorithm [17] differs from our technique in the approximations applied to identify the occurrences of design patterns. In particular, we use the same Association Relationships approximation (i.e., association, aggregation, and composition relationships are treated as equivalent in the recovery process), but we do not adopt the Entity Count approximation (some roles of design patterns can be missing in the recovered instances) or the Inheritance Relationship approximation (design pattern instances can include multi-level inheritance relationships). Moreover, we count pattern instances differently. As an example, an occurrence of the Bridge pattern with two ConcreteImplementor classes is counted as one instance in our approach and as two instances by Kaczor et al. [17]. The same difficulty of comparing approaches and tools that employ different definitions of design patterns has been documented by Fülöp et al. [13], who compared Columbus [11], Maisa [25], and CrocoPat [5] on four C++ open-source software systems. The authors concluded that “a more formal description of design pattern is very desirable” [13].
11th European Conference on Software Maintenance and Reengineering (CSMR'07) 0-7695-2802-3/07 $20.00 © 2007
In the following we report the precision results obtained by other approaches. The approach proposed by Antoniol et al. in [1] has been verified on two software systems comparable with those used in our case studies, i.e., LEDA version 3.4 with 208 classes and ET++ with 704 classes. Since the employed recovery technique applies a conservative approach to the identification of associations and aggregations, it achieves a recall value equal to 1, but the precision values are very low (0.3 on average). The case studies provided in [21] considered systems with 150 to 343 classes and are characterized by a recall value equal to 1, while the precision values are between 0.14 and 0.50. Similar precision values characterize the tests performed with the metric-based approach proposed in [20], which employed toy systems. The approach proposed in [3] has been tested on four open-source systems whose number of classes ranges from 329 to 6729; for many of the analyzed design patterns, 100% of the found instances are true pattern instances. However, some design patterns, such as Adapter, Builder, and Proxy, are characterized by very low percentages.
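The effect of the different counting conventions mentioned above can be sketched as follows. The role tuples, the class names (borrowed from the usual textbook Bridge example), and both function names are assumptions made for illustration; neither function reproduces the actual counting code of our tool or of the tool in [17].

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical role assignment: (Abstraction, Implementor, ConcreteImplementor).
Match = Tuple[str, str, str]

# One Bridge occurrence with two ConcreteImplementor classes.
matches: List[Match] = [
    ("Window", "WindowImpl", "XWindowImpl"),
    ("Window", "WindowImpl", "PMWindowImpl"),
]


def count_per_assignment(found: List[Match]) -> int:
    """Every distinct role assignment counts as one instance."""
    return len(set(found))


def count_grouped(found: List[Match]) -> int:
    """Assignments sharing Abstraction and Implementor count as one instance."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for abstraction, implementor, concrete in found:
        groups[(abstraction, implementor)].add(concrete)
    return len(groups)


print(count_per_assignment(matches))  # 2
print(count_grouped(matches))         # 1
```

Because the number of reported instances is the denominator of the precision measure, results computed under the two conventions are not directly comparable.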
6. Final remarks
In this paper we have described a two phase recovery approach for structural design patterns characterized by high precision values and acceptable time performance. In the first phase, the class diagram extracted from the source code is analyzed to identify design structures that are candidate pattern instances. This is accomplished using a recovery technique based on visual language parsing [6]. In the second phase, the code of the classes involved in the identified candidate patterns is examined to verify their compliance with the code-level constraints defined by the corresponding structural patterns. These checks allow us to eliminate many false positives and consequently to increase the precision of the approach. The recovery process has been assessed on four software systems of different sizes: two versions of the JHotDraw library (5.1 and 6.0b1), QuickUML 2001, and Apache Ant version 1.6.2. The obtained results show that the approach is characterized by precision values ranging from 0.82 to 0.97. These results are a considerable improvement over those obtained with our previous approach [7].
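The way the second phase raises precision can be sketched as a filter over the candidates produced by the first phase. This is a minimal illustration under stated assumptions: the candidate representation, the function names, and the toy data are hypothetical, and the predicate is only a stand-in for the actual source code analysis.

```python
from typing import Callable, Iterable, List, Set, Tuple

Candidate = Tuple[str, ...]  # hypothetical: class names playing the pattern roles


def validate_candidates(
    candidates: Iterable[Candidate],
    satisfies_code_constraints: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Second phase: keep only the candidates that pass the code-level check."""
    return [c for c in candidates if satisfies_code_constraints(c)]


def precision(retrieved: Iterable[Candidate], true_instances: Set[Candidate]) -> float:
    """Fraction of retrieved instances that are true pattern instances."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    hits = sum(1 for c in retrieved if c in true_instances)
    return hits / len(retrieved)


# Toy data, invented for illustration and not taken from the case studies.
phase1 = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
truth = {("A", "B"), ("C", "D")}

# Stand-in for source code analysis: it rejects one false positive.
phase2 = validate_candidates(phase1, lambda c: c != ("G", "H"))

print(precision(phase1, truth))  # 0.5
print(precision(phase2, truth))  # 2 of 3 retained candidates are true instances
```

Since the filter only removes candidates, it can raise precision but never recall; any true instance missed in the first phase stays missed.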
In the future we plan to investigate how to extend our approach to other categories of design patterns proposed in the literature [14]. Moreover, we will study how to extend the approach so that it also recovers instances containing multi-level inheritance relationships. The proposed technique is currently implemented as a stand-alone tool, where the output of the recovery process is stored in an HTML file. The previous recovery tool (DPRE) was integrated in the visual editor generated by using VLDesk, and the results were presented in the visualized class diagram. Taking this kind of visualization into account, we plan to implement the design pattern recovery technique as an Eclipse plug-in. In particular, the plug-in will interact with the standard Eclipse UML plug-in in order to visually highlight the recovered pattern instances. Finally, we intend to study the evolution of software systems at the design level by analyzing the evolution of design patterns. In particular, the information on how the recovered pattern instances evolve across different versions of software systems will be used to understand design and evolution decisions.
Acknowledgements
We want to thank Stefania Celenta for her contribution to the implementation of the design pattern recovery tool.
References
[1] G. Antoniol, G. Casazza, M. Di Penta, R. Fiutem, “Object-oriented Design Pattern Recovery”, The Journal of Systems and Software, 59(2), 2001, pp. 181-196.
[2] Apache Ant, http://ant.apache.org/
[3] Z. Balanyi, R. Ferenc, “Mining Design Patterns from C++ Source Code”, in Proceedings of International Conference on Software Maintenance (ICSM’03), Amsterdam, Netherlands, 2003, pp. 305-314, IEEE CS Press.
[4] I. D. Baxter, A. Yahin, L. Moura, M. Sant’Anna, L. Bier, “Clone Detection using Abstract Syntax Trees”, in Proceedings of International Conference on Software Maintenance (ICSM’98), Bethesda, MD, USA, 1998, pp. 368-377, IEEE CS Press.
[5] D. Beyer, C. Lewerentz, “CrocoPat: Efficient pattern Analysis in Object-Oriented Programs”, in Proceedings of the 11th IEEE International Workshop on Program Comprehension (IWPC 2003), 2003, pp. 294-295, IEEE CS Press.
[6] G. Costagliola, A. De Lucia, V. Deufemia, C. Gravino, M. Risi, “Design Pattern Recovery by Visual Language Parsing”, in Proceedings of 9th European Conference on Software Maintenance and Reengineering (CSMR’05), Manchester, UK, March 2005, pp. 102-111, IEEE CS Press.
[7] G. Costagliola, A. De Lucia, V. Deufemia, C. Gravino, M. Risi, “Case Studies of Visual Language Based Design Pattern Recovery”, in Proceedings of 10th European Conference on Software Maintenance and Reengineering (CSMR’06), Bari, Italy, March 2006, pp. 163-172, IEEE CS Press.
[8] G. Costagliola, V. Deufemia, G. Polese, “A Framework for Modeling and Implementing Visual Notations with Applications to Software Engineering”, ACM Transactions on Software Engineering and Methodology (TOSEM), 13(4), 2004, pp. 431-487.
[9] E. Crahen, C. Alphonce, P. Ventura, “QuickUML: a beginner's UML tool”, 17th annual ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA’92), 1992, pp. 62-63.
[10] T. R. Dean, J. R. Cordy, “A Syntactic Theory of Software Architecture”, IEEE Transactions on Software Engineering, 21(4), 1995, pp. 302-313.
[11] R. Ferenc, A. Beszedes, M. Tarkiainen, T. Gymothy, “Columbus – Reverse Engineering Tool and Schema for C++”, in Proceedings of IEEE International Conference on Software Maintenance (ICSM’02), Montréal, Canada, 2002, pp. 172-181.
[12] R. Ferenc, A. Beszedes, L. J. Fülöp, J. Lelle, “Design Pattern Mining Enhanced by Machine Learning”, in Proceedings of the 21st International Conference on Software Maintenance (ICSM’05), Budapest, Hungary, 2005, pp. 295-304, IEEE CS Press.
[13] L. J. Fülöp, T. Gyovai, R. Ferenc, “Evaluating C++ Design Pattern Miner Tools”, in Proceedings of the 6th International Workshop on Source Code Analysis and Manipulation (SCAM 2006), Philadelphia, PA, USA, 2006, pp. 127-138.
[14] E. Gamma, R. Helm, R. Johnson, J. Vlissides, “Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley, Menlo Park, CA, 1995.
[15] JHotDraw, http://www.jhotdraw.org
[16] R. Johnson, “Documenting Frameworks using Patterns”, in Proceedings of Conference on Object-Oriented Programming, Systems Languages and Applications (OOPSLA’92), 1992, pp. 63-72.
[17] O. Kaczor, Y. Guéhéneuc, S. Hamel, “Efficient Identification of Design Patterns with Bit-vector Algorithm”, in Proceedings of 10th IEEE European Conference on Software Maintenance and Reengineering (CSMR’06), Bari, Italy, 2006, pp. 163-172, IEEE CS Press.
[18] R.K. Keller, R. Schauer, “Pattern Visualization for Software Comprehension”, in Proceedings of International Workshop on Program Comprehension (IWPC’98), Ischia, Italy, 1998, pp. 4-12, IEEE CS Press.
[19] R.K. Keller, R. Schauer, S. Robitaille, P. Pagé, “Pattern-Based Reverse-Engineering of Design Components”, in Proceedings of International Conference on Software Engineering (ICSE’99), Los Angeles, 1999, pp. 226-235.
[20] H. Kim, C. Boldyreff, “A Method to Recover Design Patterns Using Software Product Metrics”, in Proceedings of International Conference on Software Reuse: Advances in Software Reusability (ICRE’00), Vienna, Austria, 2000, pp. 318-335.
[21] C. Kramer, L. Prechelt, “Design Recovery by Automated Search for Structural Design Patterns in Object Oriented Software”, in Proceedings of Working Conference on Reverse Engineering (WCRE’96), 1996, pp. 208-215, IEEE CS Press.
[22] J. Niere, W. Shafer, J. P. Wadsack, L. Wendehals, J. Walsh, “Towards Pattern-based Design Recovery”, in Proceedings of International Conference on Software Engineering (ICSE’02), Orlando FL, USA, 2002, pp. 338-348.
[23] J. Niere, L. Wendehals, A. Zündorf, “An Interactive and Scalable Approach to Design Pattern Recovery”, Technical Report, University of Paderborn, Paderborn, Germany, 2003.
[24] R. Olsson, N. Shi, “Reverse Engineering of Design Patterns from Java Source Code”, in Proceedings of International Conference on Automated Software Engineering (ASE'06), Tokyo, Japan, Sept. 2006, pp. 123-134, IEEE CS Press.
[25] J. Paakki, A. Karhinen, J. Gustafsson, L. Nenonen, A. Verkamo, “Software Metrics by Architectural Pattern Mining”, in Proceedings of the International Conference on Software: Theory and Practice, Beijing, China, August 2000, pp. 325-332.
[26] I. Philippow, D. Streitferdt, M. Riebish, S. Naumann, “An Approach for Reverse Engineering of Design Patterns”, Software System Model, 4(1), 2005, pp. 55-79.
[27] G. Salton, M. McGill, Introduction to Modern Information Retrieval, McGraw Hill, 1983.
[28] SourceNavigator, http://sourcenav.sourceforge.net/
[29] L. Wills, “Automated Program Recognition by Graph Parsing”, Ph.D Dissertation, MIT, 1992.