Case Studies of Visual Language Based Design Patterns Recovery
Gennaro Costagliola, Andrea De Lucia, Vincenzo Deufemia, Carmine Gravino, Michele Risi
Dipartimento di Matematica e Informatica, Università di Salerno, 84084 Fisciano (SA), Italy
{gcostagliola, adelucia, deufemia, gravino, mrisi}@unisa.it
Abstract
In this paper we present case studies of recovering structural design patterns from object-oriented source code. The proposed recovery technique is based on visual language parsing techniques and is supported by a visual environment automatically produced by a grammar-based visual environment generator. We have applied the recovery technique to public-domain programs and libraries, obtaining encouraging results. In particular, for the software considered, our recovery approach achieves higher recall and precision than other recovery techniques.
1. Introduction
Design patterns represent a useful technique in forward engineering for improving communication between designers, for reusing successful practices, and for sharing knowledge between software engineers. However, patterns can also be used during reverse engineering of software systems, since extracting design pattern information from design and code helps the comprehension of the solution adopted for a system [1], [2], [7], [20], [21], [23], [24], [26], [28], [29], [34]. Moreover, this information can be used to highlight desired properties of the design model, which can be reused whenever a similar problem is encountered. As a matter of fact, a design pattern can be seen as a set of classes, related through aggregation and delegation, which represents a partial solution to a common nontrivial design problem [15]. The extraction of design pattern information from design and code can improve the system documentation and can guide the restructuring of the system. Moreover, the use of design patterns can help
project managers in measuring the quality of the design models and code to be used in the development of object-oriented (OO) software systems. Indeed, when the artefacts are well documented and easily readable, modifying one part of a system does not require modifying the entire system. In this paper we present case studies of recovering design patterns from OO source code. The recovery approach is based on visual language parsing. In particular, the input OO source code is first reverse engineered into a UML class diagram represented in SVG format, and the problem of recovering design patterns is then reduced to the problem of recognizing subsentences in a class diagram, where each subsentence corresponds to a design pattern specified by a visual language grammar [11]. For this reason our approach can also be applied to artifacts at a higher abstraction level than source code, such as those produced with a UML visual editor during analysis or design. The recovery process is supported by the visual environment DPRE (Design Pattern Recovery Environment), which has been automatically generated from an eXtended Positional Grammar (XPG) specification by using VLDesk [9]. DPRE includes a visual editor used to present the (imported) class diagram and an LR-based parser to identify design patterns in it. In particular, the parser linearizes the input at run-time, since the symbols are scanned in the order specified by the grammar productions. The approach focuses on structural design patterns, whose information is explicit in their syntactic representation. The precision of the design pattern recovery process proposed in [11] is improved here by including checks of negative criteria in the grammar specification of design patterns [30]. The retrieval effectiveness of the recovery technique has been assessed by applying it to public-domain software. In
Proceedings of the Conference on Software Maintenance and Reengineering (CSMR’06) 0-7695-2536-9/06 $20.00 © 2006
IEEE
particular, we have considered the following programs and libraries: Galib++ [14], a C++ Genetic Algorithm Library used to solve optimization problems; Mec, a trace-and-replay program; Socket, a library for inter-process communication; and Libg++, which is part of the GNU Free Software Foundation C++ development environment [25]. The results are encouraging and show the effectiveness of the proposed technique. In particular, our recovery approach achieves higher precision and recall than other recovery techniques based on class structure, such as those presented in [1],[23]. The paper is organized as follows. Section 2 describes related work on design pattern recovery. Section 3 recalls the design pattern recovery process introduced in [11] and the DPRE environment. Section 4 presents the case studies, while Section 5 discusses the results and compares them with other approaches. Conclusions and future work are given in Section 6.
2. Related Work
Our approach is based on a design pattern library expressed in terms of visual grammars, and the recognition process is based on an extension of LR parsing to multidimensional languages, including graph languages. From this point of view, our approach has similarities with the work by Linda Wills [35], where a cliché library represented by flow-graph grammars is used to recognize a program’s design [32], and with the architectural recovery technique proposed in [12], which uses a graph-based representation. In recent years several methods have been proposed to address the problem of retrieving and identifying design patterns from source code and design documents. Most of the automatic or semi-automatic approaches presented in the literature rely on the availability of structural information about the patterns or, equivalently, on a pattern library represented in some way. DP++ retrieves design patterns from C++ code by searching for the minimal class and/or object structure that must be present to identify a pattern, namely the minimal key structure [4]. The author described search criteria for the Composite, Decorator, and Adapter patterns. The minimal key structure technique is also used in other approaches, such as KT, developed for Smalltalk [7], and SPOOL, proposed for C++ [20],[21]. However, KT does not carry out source code analysis, since it considers the properties of the Smalltalk meta level. Another approach is able to automatically recover design
patterns from Smalltalk code [37]. SPOOL can extract design patterns from reverse engineered code. In particular, it is able to visualize generic patterns or ad hoc design patterns through a manual and semi-automatic recovery process, storing information on the recovered patterns in a repository. An approach similar to those in [7],[4],[21] has been presented in [30]. The authors aimed to improve recall and precision and to cover all the design patterns defined in [15]. In particular, the proposed method tries to reduce the recovery of patterns that are not actually implemented, a problem that characterizes the techniques described in [7],[4],[21]. Some of the proposed search algorithms, like those for the Composite and Interpreter patterns, achieve recall and precision values equal to 1. As the authors acknowledge, the recovery technique requires further evaluation on larger software systems. In [23] the authors presented a system named Pat, which is able to recover instances of design patterns using information on the pattern class structure as described in [15]. A reverse engineering tool is used to retrieve artifact information, and artifacts and patterns are represented by Prolog facts and rules, respectively. Observe that the source code analyzer of Pat performs the structural analysis of the code based on C++ header files. Thus, some information, such as the difference between concrete and abstract classes, is not extracted. The approach proposed in [3] uses basic structural information on patterns together with information on call delegations, object creations and operation redefinitions. An internal representation, called Abstract Semantic Graph, is first obtained from the input C++ code by using the Columbus system, which is able to extract data according to the Columbus schema [13]. Patterns are then stored in an XML-based format, to allow users to easily modify or adapt design pattern descriptions.
A semi-automatic approach based on cliché recognition and graph transformation rules has been proposed in [28], [29]. The method recovers instances of design patterns from source code and annotates the code with information concerning the retrieved patterns. Patterns are defined on the basis of the Abstract Syntax Graph representation of the program, which is obtained by parsing the source code. The software engineer can select the most appropriate pattern catalogue and modify, add, or remove graph transformation rules. The inference process then repeatedly applies the rules in order to detect the most complete set of design patterns. Another approach based on information about the pattern class structure has been presented in
[1]. In particular, the method combines a library of design patterns and cliché matching [35] with OO software metrics and structural information to reduce the number of checks in a multi-stage recovery process. The source code is first translated into an intermediate representation, namely AOL (Abstract Object Language); an abstract syntax tree is then obtained by parsing the AOL sentence. Software metrics and structural properties are computed by visiting the constructed tree. The method has been tested on different C++ libraries (Galib++, Mec, Socket, etc.). A metric-based approach, which covers all the design patterns defined in [15], has been proposed in [22]. A tool computes a set of metrics belonging to three categories: OO, structural, and procedural metrics. A signature consisting of a set of metric values is associated with each design pattern to be recognized, and the search algorithm compares the metrics of each class of the system with those of the patterns. Clone detection methods have also been used to infer structural information directly from design or source code [5], [26]. In these cases, the recovery process does not use a design pattern library; rather, patterns are retrieved by considering repeated instances of design or code portions. Along the same lines, the approach proposed in [2] uses concept analysis to infer design patterns.
3. Visual Language based Design Pattern Recovery
In this section we recall the design pattern recovery process introduced in [11], the visual environment DPRE (Design Pattern Recovery Environment), and the LR-based parsing technique used to identify design patterns in UML class diagrams.
3.1 Design Pattern Recovery Process
The recovery process is organized in two phases, namely the UML class diagram extraction phase and the pattern recognition phase, as depicted in Fig. 3.1. In the first phase the input OO code is translated into a class diagram represented in SVG format. In particular, the input OO source code is processed by the Class Diagram Abstractor, which generates the corresponding graph structure. The SVG Translator then adds layout information to the graph and translates the corresponding UML class diagram into SVG format.
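As a rough illustration of the first phase, the Python sketch below turns a tiny class graph into SVG: a placeholder grid layout stands in for the layout step, each class becomes a labeled rectangle, and each relation becomes a line tagged with its kind. The SVG encoding (element choice, the `class` attribute used for relation kinds), the grid layout and the function names are assumptions made for illustration only; they are not the format actually produced by the SVG Translator or expected by DPRE.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def grid_layout(classes, cols=3, w=160, h=60, gap=40):
    """Placeholder layout step: put class boxes on a regular grid."""
    return {name: ((k % cols) * (w + gap), (k // cols) * (h + gap))
            for k, name in enumerate(classes)}


def to_svg(classes, relations, w=160, h=60):
    """Serialize a class graph as SVG text.

    classes:   list of class names
    relations: list of (kind, source_class, target_class) triples
    """
    pos = grid_layout(classes, w=w, h=h)
    svg = ET.Element("svg", {"xmlns": SVG_NS})
    for name, (x, y) in pos.items():
        # One box plus one label per class.
        ET.SubElement(svg, "rect", {"x": str(x), "y": str(y), "width": str(w),
                                    "height": str(h), "fill": "none",
                                    "stroke": "black"})
        label = ET.SubElement(svg, "text", {"x": str(x + 8), "y": str(y + 20)})
        label.text = name
    for kind, src, dst in relations:
        (x1, y1), (x2, y2) = pos[src], pos[dst]
        # Connect box centres; the relation kind is kept in the 'class' attribute.
        ET.SubElement(svg, "line", {"x1": str(x1 + w // 2), "y1": str(y1 + h // 2),
                                    "x2": str(x2 + w // 2), "y2": str(y2 + h // 2),
                                    "class": kind.lower(), "stroke": "black"})
    return ET.tostring(svg, encoding="unicode")


svg_text = to_svg(
    ["GAGeneticAlgorithm", "GASteadyStateGA", "GAPopulation"],
    [("INHERITANCE", "GASteadyStateGA", "GAGeneticAlgorithm"),
     ("ASSOCIATION", "GASteadyStateGA", "GAPopulation")],
)
print(svg_text[:80])
```

Keeping the relation kind as an explicit attribute is a deliberate choice in this sketch: a later recognition phase can then recover symbol types without interpreting arrowheads or line styles.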
In the second phase the UML class diagram in SVG format is imported into the DPRE visual environment. Design patterns are then identified by a parser integrated in the visual environment. The UML class diagram and the recovered patterns are shown to the software engineer through the DPRE visual editor. In particular, the visual environment shows a report containing statistics on the recovered design patterns, which also describes the classes and their roles in each recovered pattern.
Figure 3.1. The design pattern recovery process (OO code → Class Diagram Abstractor → SVG Translator → SVG → DPRE → design patterns, presented to the user)
Fig. 3.2 shows a screenshot of DPRE during the design pattern recovery process of Galib++, one of the software systems used as case studies in Section 4. In the main window DPRE presents the class diagram, where the recovered patterns can be highlighted. The output window (at the bottom of the figure) contains a selection list that allows the software engineer to select a particular pattern category (on the right) and the textual description of the recovered patterns (on the left). By selecting an entry from this list, the DPRE environment presents the instances of the selected pattern in textual format in the left part of the output window. Then, by selecting one of these instances, all the classes, aggregations, inheritance relations and associations participating in it are highlighted in the main window (see Fig. 3.2). Let us observe that DPRE can also support other reengineering activities, since it also includes a UML class diagram editor. Once the UML class diagram has been restructured, it can be parsed and the corresponding code can be automatically generated, provided that the class diagram abstracted by reverse engineering was annotated with the original source code. In the example, the parser integrated into DPRE has recovered six Adapter patterns, as indicated on the right side of the bottom window. The main window visualizes the class diagram, highlighting the Adapter pattern formed by the classes GAGeneticAlgorithm, GASteadyStateGA and GAPopulation.
Figure 3.2. The Design Pattern Recovery Environment (DPRE)
3.2. Grammar-based Design Pattern Specification and Identification
DPRE has been automatically generated by VLDesk [9], a system that inherits concepts and techniques of compiler generation tools like YACC [19] and extends them to the visual field. In fact, VLDesk is based on the formalism of eXtended Positional Grammars (XPGs), which are a direct extension of context-free string grammars in which relations more general than concatenation are allowed [9],[10]. The formalism was defined to overcome the inefficiency of visual language parsing algorithms by devising suitable extensions of the well-known LR technique. VLDesk assists the designer in defining the syntax and the semantics of the language, and automatically generates a visual environment from the supplied language specification. Such an
environment encompasses a visual editor and an LR-based compiler. The parsing methodology underlying DPRE effectively captures the syntax of most constructs of visual languages [9],[10]. Indeed, VLDesk has been used to develop visual environments for software engineering, database and multimedia design, etc. [8],[9],[10],[27]. The XPG formalism and the LR-based parsing methodology can be used to identify design patterns in UML class diagrams [11]. In particular, once the class diagram has been abstracted from the source code, the problem of design pattern recovery is reduced to the problem of recognizing subsentences in a class diagram, where each subsentence corresponds to a design pattern instance. The idea is to specify the class structure of the design patterns to be recovered in a grammar from which the corresponding parser can be obtained automatically. As an example, Fig. 3.3 shows a portion
of a class diagram representing the structure of an Adapter pattern.
Figure 3.3. The Adapter pattern (classes A, B and C)
The XPG formalism conceives the symbols of a visual language as graphical objects, each associated with a set of syntactic attributes [9] used to relate symbols to one another. For the design pattern grammar we use attaching regions as syntactic attributes; they allow class symbols to be connected through link symbols representing relations between classes. For example, in Fig. 3.3 each class symbol (i.e., A, B, and C) has one attaching region, corresponding to its perimeter, while the inheritance symbol and the association symbol have two attaching regions, corresponding to the two ends of the line. The XPG productions specify the syntax of a visual language by alternating symbols with relations. The symbols can be terminals or nonterminals, while the relations are defined on the syntactic attributes of the symbols. As an example, the following XPG production describes the Adapter pattern in Fig. 3.3:
Adapter → CLASS LINK_{1,2} INHERITANCE LINK_{1,1} CLASS’ LINK_{1,1} ASSOCIATION LINK_{2,1} CLASS’’
In this production CLASS, INHERITANCE and ASSOCIATION are terminal symbols, and LINK_{i,j} is a binary relation defined as follows: a symbol x is in relation LINK_{i,j} with a symbol y iff attaching region i of x is connected to attaching region j of y. Note that relations between symbols can also be specified in negative form; this allows us to extend the design pattern recovery process proposed in [11] by enabling checks of negative criteria during parsing [30]. Negative criteria allow us to avoid recognizing false design patterns, i.e., sets of classes, aggregations, inheritance relations and delegations whose structure resembles, but does not actually implement, a structural design pattern as defined by Gamma et al. [15]. As an example, the following production checks that there are no aggregation relationships from an implementation class to an abstraction class in the Bridge pattern shown in Fig. 3.4:
Bridge → CLASS LINK_{1,2} INHERITANCE LINK_{1,1} CLASS’ LINK^{3}_{1,2} AGGREGATION LINK_{1,1} CLASS’’ LINK_{1,2} INHERITANCE CLASS’’’ AGGREGATION’
Relation LINK^{4}_{1,1} specifies that the borderline of CLASS’ (representing the RefinedAbstraction class) cannot be related to the end point of an AGGREGATION which is related to CLASS’’’ (representing the ConcreteImplementor class) through relation LINK_{1,2}.
Figure 3.4. Bridge Design Pattern with a negative criterion (classes Abstraction, Implementor, RefinedAbstraction, ConcreteImplementorA)
Action rules can be associated with the XPG productions in order to perform semantic checks and translation tasks on the recognized visual sentences [9]. Starting from the XPG specification representing the design pattern library, VLDesk generates a parser, based on an extension of LR parsing, which is able to identify design patterns in the UML class diagram shown in the visual editor of DPRE. Let us observe that, to recover design patterns within a UML class diagram, an instance of the parser is invoked on each CLASS symbol of the input class diagram. This makes it possible to recover patterns whose classes also participate in other design patterns [11]. Moreover, the proposed parsing technique recognizes the patterns by following a predefined order specified in the grammar. In particular, starting from a class symbol the parser linearizes the diagram, since it retrieves the next symbols to be parsed by using the relations specified in the right-hand side of the productions. Since each such lookup takes constant time, the parsing technique has an overall linear time complexity [11].
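The following Python sketch illustrates, under simplifying assumptions, how the LINK_{i,j} relation and the Adapter production given earlier in this section can drive a search that is started once from every CLASS symbol and follows the relations in production order. It is not DPRE's parser: the diagram encoding (a dictionary of symbol kinds plus a list of region-to-region connections), the convention that region 2 of an inheritance line is its parent end, and all helper names are hypothetical. Candidate next symbols come from a single hash-table lookup, loosely mirroring the constant-time lookups mentioned above; the backtracking used to handle several candidates is an illustrative choice, not the paper's complexity argument. The toy diagram reuses the Galib++ classes highlighted in Fig. 3.2.

```python
from collections import defaultdict

# Adapter -> CLASS LINK_{1,2} INHERITANCE LINK_{1,1} CLASS' LINK_{1,1} ASSOCIATION LINK_{2,1} CLASS''
# Each step is (terminal kind, (i, j)), meaning: previous_symbol LINK_{i,j} this_symbol.
ADAPTER = [
    ("CLASS", None),
    ("INHERITANCE", (1, 2)),
    ("CLASS", (1, 1)),
    ("ASSOCIATION", (1, 1)),
    ("CLASS", (2, 1)),
]


def build_index(connections):
    """Map (symbol, region) -> list of connected (symbol, region) pairs.

    x LINK_{i,j} y holds iff (y, j) appears in index[(x, i)].
    """
    index = defaultdict(list)
    for a, b in connections:
        index[a].append(b)
        index[b].append(a)
    return index


def parse_from(start, production, kinds, index):
    """Match `production` starting at `start`, scanning symbols in production order."""
    matches = []

    def extend(path):
        if len(path) == len(production):
            matches.append(tuple(path))
            return
        kind, (i, j) = production[len(path)]
        # One hash lookup on (previous symbol, region i) yields the candidates.
        for name, region in index.get((path[-1], i), ()):
            if region == j and kinds[name] == kind and name not in path:
                extend(path + [name])

    if kinds[start] == production[0][0]:
        extend([start])
    return matches


def recover(production, kinds, connections):
    """Start one matcher per CLASS symbol, as described in the text above."""
    index = build_index(connections)
    found = []
    for name, kind in kinds.items():
        if kind == "CLASS":
            found.extend(parse_from(name, production, kinds, index))
    return found


kinds = {"GAGeneticAlgorithm": "CLASS", "GASteadyStateGA": "CLASS",
         "GAPopulation": "CLASS", "inh1": "INHERITANCE", "assoc1": "ASSOCIATION"}
connections = [
    (("GAGeneticAlgorithm", 1), ("inh1", 2)),  # assumed parent end of the inheritance line
    (("inh1", 1), ("GASteadyStateGA", 1)),     # child end
    (("GASteadyStateGA", 1), ("assoc1", 1)),
    (("assoc1", 2), ("GAPopulation", 1)),
]
print(recover(ADAPTER, kinds, connections))
# [('GAGeneticAlgorithm', 'inh1', 'GASteadyStateGA', 'assoc1', 'GAPopulation')]
```

Because every CLASS symbol gets its own matcher, a class that plays a role in one instance can still be the starting point of another, which is the property the per-symbol invocation is meant to provide.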
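The effect of a negative criterion can be illustrated, again only as a hedged sketch, at the level of class relationships rather than attaching regions. Bridge candidates are first collected from the positive structure (RefinedAbstraction inherits from Abstraction, Abstraction aggregates Implementor, ConcreteImplementor inherits from Implementor), and any candidate in which an aggregation connects RefinedAbstraction and ConcreteImplementor is then discarded. The set-of-pairs encoding, the function names and the class names Abs, Impl, RefAbs and ConcImpl are hypothetical, and checking the aggregation in either direction is a simplifying assumption.

```python
def bridge_candidates(inherits, aggregates):
    """Role assignments satisfying the positive Bridge structure.

    inherits:   set of (child, parent) pairs
    aggregates: set of (whole, part) pairs
    """
    found = []
    for abstraction, implementor in aggregates:
        for refined, parent in inherits:
            if parent != abstraction:
                continue
            for concrete, base in inherits:
                if base != implementor:
                    continue
                roles = {"Abstraction": abstraction, "Implementor": implementor,
                         "RefinedAbstraction": refined,
                         "ConcreteImplementor": concrete}
                if len(set(roles.values())) == 4:  # four distinct classes
                    found.append(roles)
    return found


def violates_negative_criterion(roles, aggregates):
    """True if an aggregation links RefinedAbstraction and ConcreteImplementor."""
    a, b = roles["RefinedAbstraction"], roles["ConcreteImplementor"]
    return (a, b) in aggregates or (b, a) in aggregates


def recover_bridges(inherits, aggregates):
    return [roles for roles in bridge_candidates(inherits, aggregates)
            if not violates_negative_criterion(roles, aggregates)]


inherits = {("RefAbs", "Abs"), ("ConcImpl", "Impl")}
aggregates = {("Abs", "Impl")}
print(recover_bridges(inherits, aggregates))  # one candidate is accepted
# An aggregation from the implementation class to the abstraction class rejects it:
print(recover_bridges(inherits, aggregates | {("ConcImpl", "RefAbs")}))  # []
```

The positive structure alone would still report the second diagram as a Bridge; only the negative check removes it, which is the kind of false positive the negative criteria are meant to filter out.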
4. Experimental Results
In this section we report the results obtained by applying the proposed design pattern recovery
process to public-domain software, namely the C++ Genetic Algorithm Library Galib++ (version 2.4) [14], the trace-and-replay program Mec (release 0.3), the library for inter-process communication Socket (release 1.10), and Libg++, which is part of the GNU Free Software Foundation C++ development environment. All four systems were also analyzed by Antoniol et al. in [1], while Mec is also used in [2]. To accomplish the first step of our design pattern recovery process we experimented with several CASE tools to abstract a class diagram from source code. The best results were obtained with the C++ Analyzer of Rational Rose [31]. Other tools, for example Borland Together Architect [6], were not able to recover some information, such as aggregations. The output obtained from Rational Rose was translated into SVG format and then imported into DPRE. Table 1 contains some statistics obtained with Rational Rose from the software under analysis.
Table 1. Some statistics obtained from the analysis of Galib++, Mec, Socket and Libg++
System    LOC      Classes   Aggregations   Associations   Inheritance
Galib++   20,507   55        2              22             34
Mec       21,006   32        15             11             6
Socket    3,078    30        2              6              19
Libg++    44,106   144       43             88             67
To perform the second step of the recovery process we used the current implementation of DPRE, which is able to recover the structural patterns Adapter, Bridge, Proxy, Composite and Decorator. Table 2 summarizes the results of the recovery process for the analyzed systems, whereas Table 3 reports precision, recall and recovery times.
Table 2. Results of the design pattern recovery process for Galib++, Mec, Socket and Libg++
System    Adapter   Bridge   Proxy   Composite   Decorator
Galib++   6         0        0       0           0
Mec       1         0        0       0           0
Socket    1         0        0       0           0
Libg++    0         1        0       0           12
To evaluate the retrieval effectiveness of the proposed recovery strategy we have considered the precision and recall metrics [33]. Both measures take values in the range [0,1]. Recall is the ratio of the number of relevant design patterns recovered to the number of relevant design patterns contained in the source code, while precision is the ratio of the number of actually implemented patterns recovered to the total number of recognized patterns. Thus, a recall value equal to 1 means that all the real patterns have been recovered, while a precision value equal to 1 means that all the recovered patterns are real patterns.
Table 3. Performance of the design pattern recovery for Galib++, Mec, Socket and Libg++
System    Recall   Precision   Time (sec.)
Galib++   1        0.833       25.264
Mec       1        1           5.067
Socket    1        1           3.345
Libg++    N.A.     1           109.08
As shown in Table 3, DPRE retrieved all the real patterns for Mec, Socket and Galib++ (recall is 1). For Libg++ we cannot specify the recall value because we manually checked only 30% of the source code. However, for this portion of the code the recall value is 1, since DPRE retrieved the unique Bridge pattern, shown in Fig. 4.1, and twelve Decorator patterns. Moreover, Mec, Socket and Libg++ have a precision value equal to 1, whereas for Galib++ a precision value of 0.833 was obtained: in this case DPRE identified six design patterns, while the real patterns are five (5/6 ≈ 0.833). The results of the recovery process are encouraging and confirm our intuition on the effectiveness of the LR-based parsing methodology for structural pattern recovery.
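The precision and recall definitions given above can be checked against the Galib++ figures with a few lines of Python. In this sketch the six recovered Adapter instances are labeled by their panels (a) to (f) in Fig. 5.1, with (f) being the one that is not a real Adapter; the labels and the function name are illustrative assumptions.

```python
def precision_recall(recovered, real):
    """precision = |recovered & real| / |recovered|, recall = |recovered & real| / |real|.

    A ratio whose denominator would be zero is returned as None.
    """
    recovered, real = set(recovered), set(real)
    hits = len(recovered & real)
    precision = hits / len(recovered) if recovered else None
    recall = hits / len(real) if real else None
    return precision, recall


# Galib++: six instances recovered, five of them real.
recovered = ["a", "b", "c", "d", "e", "f"]
real = ["a", "b", "c", "d", "e"]
p, r = precision_recall(recovered, real)
print(round(p, 3), r)  # 0.833 1.0
```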
5. Discussion and Comparison
In the following we provide an analytical discussion of the results obtained by our approach; we then compare them with those obtained by other approaches in terms of retrieval effectiveness and time performance. Finally, we report some limitations of our approach.
5.1 Analytical discussion of the results
We have replicated a subset of the case studies reported in [1]. Unfortunately, we cannot perform a detailed comparison with the results in [1] because the authors did not provide details about the patterns they recovered.
Figure 4.1. The Bridge pattern recovered from Libg++ (classes fstreambase, filebuf, procbuf, stdiobuf, fstream, ifstream and ofstream; fstreambase holds a filebuf member my_fb, and its open() delegates to rdbuf()->open())
For the Mec and Socket systems we identified one Adapter pattern each, as in [1]. By analyzing the source code we verified that the retrieved patterns are actually Adapters.
While Antoniol et al. discovered five Adapter patterns for Galib++ (the authors claimed that four of them were real patterns), we recovered six instances of this pattern, shown in Fig. 5.1. By analyzing the source code we found that only the pattern in Fig. 5.1(f) is not a real Adapter design pattern.
Figure 5.1. The Adapter patterns recovered for Galib++, panels (a) to (f) (classes involved include GAGeneticAlgorithm, GASteadyStateGA, GAIncrementalGA, GASimpleGA, GADemeGA, GAPopulation, GAStatistics, GAGenome, GA1DBinaryStringGenome, GABin2DecGenome and GABin2DecPhenotype)
Figure 5.2. A sketch of the class diagram for the Socket library (classes ftpbuf, sockaddr, sockinetaddr and sockunixaddr)
Let us observe that, unlike the approach of Antoniol et al. [1], our approach did not retrieve a Bridge pattern for the Socket system. By manually analyzing the Socket class diagram and the corresponding source code, we noted that the system contains a portion of code that could be implemented using a Bridge pattern. Indeed, as shown in Fig. 5.2, the class ftpbuf is related through an aggregation to the concrete implementation of sockaddr provided in sockinetaddr, whereas it should be related to the implementor sockaddr. This portion of the system can be reengineered by decoupling the interface of ftpbuf from the two implementations and introducing an aggregation between ftpbuf and sockaddr, thus obtaining a Bridge pattern. For the Libg++ library we retrieved twelve Decorator patterns and one Bridge pattern. For this library Antoniol et al. [1] identified the same number of Decorator patterns and three Bridge patterns. We manually analyzed 30% of the Libg++ source code and found twelve Decorator patterns but only one Bridge pattern.
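To make the proposed restructuring of the Socket library concrete, the Python sketch below shows the target Bridge structure for the classes of Fig. 5.2: ftpbuf depends only on the abstract sockaddr interface, and either concrete address class can be plugged in. The Socket library itself is written in C++; the class names are kept from the figure for traceability, while the family() and address_family() methods and their return values are hypothetical placeholders, not the library's API.

```python
from abc import ABC, abstractmethod


class sockaddr(ABC):
    """Implementor: abstract address interface (name mirrors Fig. 5.2)."""

    @abstractmethod
    def family(self) -> str:
        """Hypothetical operation standing in for the real address API."""


class sockinetaddr(sockaddr):
    def family(self) -> str:
        return "inet"


class sockunixaddr(sockaddr):
    def family(self) -> str:
        return "unix"


class ftpbuf:
    """Abstraction: aggregates the sockaddr interface instead of sockinetaddr."""

    def __init__(self, addr: sockaddr) -> None:
        self.addr = addr

    def address_family(self) -> str:
        # Delegate to whichever implementation was supplied.
        return self.addr.family()


print(ftpbuf(sockinetaddr()).address_family())
print(ftpbuf(sockunixaddr()).address_family())
```

After this change the aggregation points from ftpbuf to the implementor sockaddr, which is the structure a Bridge production expects, and ftpbuf no longer needs to change when a new address implementation is added.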
5.2 Performance of the proposed approach
Precision and recall are two metrics widely used for evaluating search results in Information Retrieval and Reverse Engineering [1],[3],[28],[30]. In our experiments the values obtained for these metrics are better than those obtained by other approaches on the same software systems [1],[2]. In general, the obtained precision and recall values are also better than those obtained by other approaches on software systems of comparable size. Indeed, the case studies presented in this paper have an average precision value of 0.95 and an average recall value of 1.
The approach proposed by Antoniol et al. in [1] achieves a recall value equal to 1, but an average precision value of only 0.3. The case studies reported in [23] have a recall value equal to 1, with precision values between 0.14 and 0.50. The approach proposed in [3] has been tested on four open-source systems, and for many of the analyzed patterns all the instances found are true pattern instances. However, for some patterns, like Adapter, Builder, and Proxy, this percentage is very low. The tests performed for the metric-based approach proposed in [22] showed a precision value of 0.435. The case studies for the concept analysis based approach proposed in [2] recover more patterns than those actually present. Other case studies do not provide information about recall and precision values [4],[7],[21],[37]. Finally, the times in Table 3 suggest that DPRE achieves acceptable computation times. Moreover, DPRE performs better than the approach proposed by Antoniol et al. [1] in terms of running time: it recovered design patterns for Mec, Galib++, Socket and Libg++ two to three times faster.
5.3 Limitations of the approach
It is worth pointing out that the design pattern recovery results obtained with DPRE strongly depend on the results of the UML class diagram extraction phase, accomplished with Rose. In particular, some classes, aggregations or associations may be missed by Rose. Indeed, by comparing the statistics in Table 1 with those reported in [1] we can observe that the reverse engineering phase extracted fewer classes, inheritance relations and associations for Libg++, while for Galib++, Mec and Socket it extracted the same number of classes but fewer aggregations, associations and inheritance relations. Furthermore, we noted that in some cases the analyzer does not correctly interpret the relations (for example, in the Mec system some associations were interpreted as aggregations). However, these limitations did not affect the recovery process for Mec, Socket and Galib++, since we retrieved all the real design patterns. In the case of Libg++ we checked only 30% of the source code.
6. Conclusion and Future Work
Software system maintenance requires a deep comprehension of the existing system in order to modify it and integrate it with new or changed requirements. Design patterns represent useful architectural information that can support a rapid understanding of software design and source code. In reverse engineering of OO software systems they allow capturing relevant information that helps the comprehension of the adopted solution [1], [2], [7], [20], [21], [23], [24], [26], [28], [29], [34]. In this paper we have extended the design pattern recovery process proposed in [11] by including negative criteria in the grammar specification of design patterns. Moreover, we have assessed the retrieval effectiveness of the approach by applying the recovery technique to several public-domain programs and libraries. The recovery process is supported by the visual environment DPRE, which has been automatically generated by using VLDesk [11]. In particular, design patterns are recovered through an efficient LR-based parser integrated in DPRE. This parsing technique supports both textual and visual representations, which allows us to easily extend the proposed recovery technique to other design pattern categories requiring code checking [15]. The results of the case studies have confirmed our intuition on the effectiveness of the proposed recovery technique. In particular, the recovery achieves high precision and recall values for the analyzed programs and libraries. Moreover, the times for retrieving design patterns from the imported class diagrams turned out to be better than those obtained by other approaches on the same software systems [1]. In the future we intend to integrate DPRE with existing source code extractors, class diagram abstractors, and SVG translators, in particular with those of the GXL community [16].
GXL (Graph eXchange Language) is an XML sublanguage widely used to describe graph structures and to ensure data interoperability between reengineering tools [18], [36]. As an example, GXL documents could be extracted from C++ or Java code by using Source Navigator, while the SVG format could be obtained by using GXL2SVG [16]. Furthermore, we plan to extend the recovery approach by including a source code analysis phase. Indeed, VLDesk allows us to integrate the parsing of visual sentences with the parsing of code [9]. The analysis of code would allow us to verify the correctness of the candidate patterns identified by DPRE and to extend our approach to other categories of design patterns proposed in the literature [15], [30]. Finally, as mentioned above, DPRE can also support forward engineering functionalities, which are particularly useful when portions of code have to be reengineered by exploiting design patterns (as in the case of Fig. 5.2).
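To make the GXL-based integration more concrete, the following is a minimal sketch of how a class diagram encoded in GXL could be read with Python's standard `xml.etree.ElementTree` module. The GXL elements used (`gxl`, `graph`, `node`, `edge` with `from`/`to`, `attr` with a typed `string` value, and `type` with an `xlink:href`) follow the GXL format, but the `name` attribute, the `Generalization` edge type, and the helper `read_class_graph` are hypothetical choices for illustration. They are not the schema produced by Source Navigator or consumed by DPRE.

```python
"""Reading a small class diagram encoded in GXL (illustrative sketch).

Assumptions: the node attribute "name" and the edge type "#Generalization"
are hypothetical; a real extractor would reference its own schema.
"""

import xml.etree.ElementTree as ET

# Namespaced attribute key as ElementTree reports it for xlink:href.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

GXL_DOC = """<gxl xmlns:xlink="http://www.w3.org/1999/xlink">
  <graph id="classDiagram" edgemode="directed">
    <node id="c1"><attr name="name"><string>Shape</string></attr></node>
    <node id="c2"><attr name="name"><string>Circle</string></attr></node>
    <edge from="c2" to="c1"><type xlink:href="#Generalization"/></edge>
  </graph>
</gxl>"""


def read_class_graph(text):
    """Return ({node id: class name}, [(source, edge kind, target), ...])."""
    graph = ET.fromstring(text).find("graph")
    names = {}
    for node in graph.findall("node"):
        # Fall back to the node id when no "name" attribute is present.
        value = node.find("attr[@name='name']/string")
        names[node.get("id")] = value.text if value is not None else node.get("id")
    edges = []
    for edge in graph.findall("edge"):
        kind_el = edge.find("type")
        kind = kind_el.get(XLINK_HREF, "").lstrip("#") if kind_el is not None else ""
        edges.append((names[edge.get("from")], kind, names[edge.get("to")]))
    return names, edges


if __name__ == "__main__":
    names, edges = read_class_graph(GXL_DOC)
    print(edges)  # [('Circle', 'Generalization', 'Shape')]
```

Because GXL is plain XML, any tool that can emit this structure could in principle feed a recovery environment, which is the interoperability argument made above.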
References
[1] G. Antoniol, G. Casazza, M. Di Penta, R. Fiutem, "Object-oriented Design Pattern Recovery", The Journal of Systems and Software, 59(2), 2001, pp. 181-196.
[2] G. Antoniol, P. Tonella, "Inference of Object-oriented Design Patterns", Journal of Software Maintenance and Evolution: Research and Practice, 13, 2001, pp. 309-330.
[3] Z. Balanyi, R. Ferenc, "Mining Design Patterns from C++ Source Code", in Proceedings of International Conference on Software Maintenance (ICSM'03), Amsterdam, 2003, pp. 305-314.
[4] J. Bansiya, "Automatic Design-Pattern Identification", Dr. Dobb's Journal. Available online at: http://www.ddj.com.
[5] I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna, L. Bier, "Clone Detection using Abstract Syntax Trees", in Proceedings of International Conference on Software Maintenance (ICSM'98), Bethesda, Maryland, USA, 1998, pp. 368-377.
[6] Borland Together Architect, http://www.borland.com/us/products/together.
[7] K. Brown, "Design Reverse-engineering and Automated Design Pattern Detection in Smalltalk", Master Thesis, North Carolina State University, Raleigh NC, 1996.
[8] S. K. Chang, G. Polese, M. Cibelli, R. Thomas, "Visual Authorization Modeling in e-Commerce Applications", IEEE Multimedia, 10(1), 2003, pp. 44-54.
[9] G. Costagliola, V. Deufemia, G. Polese, "A Framework for Modeling and Implementing Visual Notations with Applications to Software Engineering", ACM Transactions on Software Engineering and Methodology (TOSEM), 13(4), 2004, pp. 431-487.
[10] G. Costagliola, A. De Lucia, S. Orefice, G. Tortora, "A Parsing Methodology for the Implementation of Visual Systems", IEEE Transactions on Software Engineering, 23(12), 1997, pp. 777-799.
[11] G. Costagliola, A. De Lucia, V. Deufemia, C. Gravino, M. Risi, "Design Pattern Recovery by Visual Language Parsing", in Proceedings of 9th European Conference on Software Maintenance and Reengineering (CSMR'05), Manchester, March 21-23, 2005, pp. 102-111.
[12] T. R. Dean, J. R. Cordy, "A Syntactic Theory of Software Architecture", IEEE Transactions on Software Engineering, 21(4), 1995, pp. 302-313.
[13] R. Ferenc, Á. Beszédes, M. Tarkiainen, T. Gyimóthy, "Columbus - Reverse Engineering Tool and Schema for C++", in Proceedings of IEEE International Conference on Software Maintenance (ICSM'02), Montréal, Canada, 2002, pp. 172-181.
[14] Galib++, http://lancet.mit.edu/ga/
[15] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-oriented Software, Addison-Wesley, Menlo Park, CA, 1995.
[16] Graph eXchange Language (GXL) tools, http://www.gupro.de/GXL/tools/tools.html#conv.
[17] D. R. Harris, H. B. Reubenstein, A. S. Yeh, "Recognizers for Extracting Architectural Features from Source Code", in Proceedings of Working Conference on Reverse Engineering (WCRE'95), Toronto, Ontario, Canada, 1995, pp. 252-261.
[18] R. C. Holt, A. Winter, A. Schürr, "GXL: Toward a Standard Exchange Format", in Proceedings of 7th Working Conference on Reverse Engineering (WCRE'00), Brisbane, Queensland, Australia, 2000, pp. 162-171.
[19] S. C. Johnson, "YACC: Yet Another Compiler Compiler", Bell Laboratories, Murray Hill, NJ, 1978.
[20] R. Keller, R. Schauer, "Pattern Visualization for Software Comprehension", in Proceedings of International Workshop on Program Comprehension (IWPC'98), Ischia, Italy, 1998, pp. 4-12.
[21] R. K. Keller, R. Schauer, S. Robitaille, P. Pagé, "Pattern-Based Reverse-Engineering of Design Components", in Proceedings of International Conference on Software Engineering (ICSE'99), Los Angeles, 1999, pp. 226-235.
[22] H. Kim, C. Boldyreff, "A Method to Recover Design Patterns Using Software Product Metrics", in Proceedings of International Conference on Software Reuse: Advances in Software Reusability (ICSR'00), Vienna, Austria, 2000, pp. 318-335.
[23] C. Kramer, L. Prechelt, "Design Recovery by Automated Search for Structural Design Patterns in Object-oriented Software", in Proceedings of Working Conference on Reverse Engineering (WCRE'96), Monterey, CA, USA, 1996, pp. 208-215.
[24] W. Kozaczynski, J. Q. Ning, A. Engberts, "Program Concept Recognition and Transformation", IEEE Transactions on Software Engineering, 18(12), 1992, pp. 1065-1075.
[25] Libg++, ftp://ftp.gnu.org/gnu/libg++/
[26] J. Mayrand, C. Leblanc, E. Merlo, "Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics", in Proceedings of International Conference on Software Maintenance (ICSM'96), Monterey, CA, USA, 1996, pp. 244-253.
[27] A. Musto, G. Polese, G. Costagliola, G. Tortora, "Syntactic Modeling of UML Diagrams and their Automatic Transformation into RDBMS based Models", in Proceedings of WTUML: Workshop on Transformations in UML, Genova, Italy, 2001.
[28] J. Niere, W. Schäfer, J. P. Wadsack, L. Wendehals, J. Welsh, "Towards Pattern-Based Design Recovery", in Proceedings of International Conference on Software Engineering (ICSE'02), Orlando, Florida, USA, 2002, pp. 338-348.
[29] J. Niere, L. Wendehals, A. Zündorf, "An Interactive and Scalable Approach to Design Pattern Recovery", Technical Report tr-ri-03-236, University of Paderborn, Paderborn, Germany, 2003.
[30] I. Philippow, D. Streitferdt, M. Riebisch, S. Naumann, "An Approach for Reverse Engineering of Design Patterns", Journal of Software and System Modeling, 4(1), 2005, pp. 55-79.
[31] Rational Rose, http://www-306.ibm.com/software/rational/
[32] C. Rich, L. Wills, "Recognizing a Program's Design: A Graph-Parsing Approach", IEEE Software, 7(1), 1990, pp. 82-89.
[33] G. Salton, M. McGill, Introduction to Modern Information Retrieval, McGraw Hill, 1983.
[34] F. Shull, W. L. Melo, V. R. Basili, "An Inductive Method for Discovering Design Patterns from Object-oriented Software Systems", Technical Report, University of Maryland, Computer Science Department, College Park MD, 1996.
[35] L. Wills, "Automated Program Recognition by Graph Parsing", Ph.D. Dissertation, MIT, 1992.
[36] A. Winter, "Exchanging Graphs with GXL", in Proceedings of 9th International Symposium on Graph Drawing (GD'01), Vienna, 2001, Lecture Notes in Computer Science 2265, Springer, pp. 485-500.
[37] R. Wuyts, "Declarative Reasoning about the Structure of Object-oriented Systems", in Proceedings of TOOLS'98, IEEE Computer Society Press, 1998, pp. 112-124.