Design Patterns Detection Using a DSL-driven

0 downloads 0 Views 2MB Size Report
identification using a high performance bit-vector algorithm is proposed in [40]. ... algorithm that is based on a graph model that also takes delegation, object creation, ...... System grokking: a novel approach for software understanding,.
Missing:
JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE J. Softw. Maint. Evol.: Res. Pract. 2013; 00:1–37 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/smr

Design Patterns Detection Using a DSL-driven Graph Matching Approach Mario Luca Bernardi1 , Marta Cimitile2 , Giuseppe Di Lucca3 1

Department of Engineering3 , University of Sannio, Italy 2 Unitelma Sapienza University, Italy 1 [email protected], 2 [email protected],3 [email protected]

SUMMARY The knowledge about Design Pattern (DP) instances improves program comprehension and re-engineering of Object Oriented system. Effectively, it helps to discover developer design decisions and trade-offs that often are not documented. This work describes an approach to automatically detect DPs in existing Object Oriented systems tracing system’s source code components with the roles they play in the Patterns. In the proposed approach DPs are modelled basing on their high level structural properties (e.g., inheritance, dependency, invocation, delegation, type nesting, and membership relationships) that are checked, by source code parsing, against the system structure and components. Moreover, the approach can detect also Pattern variants, defined by overriding the Pattern properties. The paper presents a description of the approach, provides a brief description of the supporting tool, and discusses the results from the experiments carried out to validate it. The approach was validated on seven systems of an open benchmark containing systems of increasing sizes. For five additional systems, the results have been compared with the ones from a similar approach existing in literature. The obtained results, the identified DPs variants and the effectiveness of the approach are thoroughly presented and discussed. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Received . . .

KEY WORDS: Design Patterns Detection, Object Oriented systems,Graph-Matching, Domain Specific Languages, Model Driven Development

1. INTRODUCTION Design Patterns (DPs) were firstly introduced in [29] as general repeatable solutions to commonly occurring problems in software design. Several works [5, 12] show how software quality greatly improves by implementing DPs and documenting their adoption. In the last twenty years, as the number of pattern-based systems and frameworks increased, the topic of (semi-)automatic detection of pattern instances in Object Oriented (OO) software systems became more critical to improve program comprehension, maintenance and reuse [44]. When design documentation is not available (or updated), DPs detection can help program comprehension providing useful insights on software architecture, the underlying design choices and the role played by each code component in a DP [15, 23]. This is even more true in the case of bad or incomplete documentation. In fact, the lack of adequate documentation in a software system may make hard to understand which are the adopted design solutions and their code components. Finally, searching a software project for DPs can also be used to assess the quality of the source code [12, 15]. To address these issues, several methodologies, approaches and tools have been proposed in literature in the last twenty years [50]. Most of these approaches take into account a fixed and c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls [Version: 2010/05/10 v2.00]

2

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

limited set of properties to specify a pattern. In particular they do not consider behavioural properties which are crucial in the characterization of object DPs. Moreover, most of these approaches are very sensitive to structural differences of the searched patterns since their specifications are embedded in the detection algorithm. This paper proposes a detection approach that addresses these issues. The detection algorithm is based on a meta-model representing both the software system and the searched DPs through a wider set, with respect to existing approaches, of high level properties related to the source code elements, the static relationships among them, and their behavior. Each system is considered as an instance of this meta-model and is represented as a graph of elements and properties about them. DPs identification is performed by matching each pattern graph with the overall system graph and by annotating the elements of the type hierarchy with information on the roles they play in the pattern. An advantage of the proposed approach over most of existing ones is that it also allows to easily specify variant forms of the classic DPs (as coded in the literature). This is an important issue to address since it is well known that DPs are present in real world systems with many different variants [61, 57]. Our detection approach is driven by a set of pattern models written using a Domain Specific Language (DSL) defined to model the structure of both the software system and the design patterns. It organizes such DPs models as a hierarchy of declarative specifications. In particular a DP variant can be expressed as a set of changes to an existing specification by adding, removing or relaxing properties. Hence, it is possible to write a new pattern specification deriving it from an existing one (to detect a variant) or to write it from scratch (to detect new kind of pattern), with no impact on the mining algorithm. An eclipse-based tool, called Design Pattern Finder (DPF) has been developed to provide an automatic support to the approach (in the following DPF is used to concisely refer to the proposed approach, not only just the tool) . The approach has been assessed by applying it to seven systems of an open benchmark proposed in [32] and [9]. For five additional systems, we compared our results with the ones obtained using the tool Design Pattern Detection (DPD), proposed in [61]. In this case the results have been validated by experts in order to evaluate precision and recall considering the true positives of both tools as a gold standard. This paper improves and enhances our previous preliminary investigation reported in [13, 14]. The improvements and enhancements are mainly referred to: (i) a wider set of DSL specifications allowing the detection of new DPs not considered before; (ii) a new version of the DPF prototype tool provided with a user interface; (iii) more experiments and the related discussion of the results provided by the approach, using the tool DPF, on a larger set of systems, comparing them with results provided by both an open benchmark and the DPD approach. The paper is structured as follows. In Section 2 relevant related work is discussed. Section 3 describes the meta-model and DSL defined to represent the system and patterns structure and the detection approach. Section 4 concisely describes the catalog of the DPs specifications, while the DPF tool is briefly described in Section 5. Section 6 reports the experiment setup whereas the results are discussed in Section 7. Section 8 contains conclusive remarks and briefly discusses future work. The Appendix reports and describes some DSL specifications of the most relevant DPs’ and their variants.

2. RELATED WORK The problem of mining DPs in existing OO systems has been faced and discussed in several works, and different methods and techniques have been proposed to support it. Some reviews on current techniques and tools for discovering architecture and design patterns from OO systems are provided in [24] and [55]. In the last work, authors classified pattern recovery techniques basing on the used type of analysis and the adopted searching methodology. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

3

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

Reference

Detection Approach

Tool

Mined Code

Recovered DPs

Case studies

Precision

Keller et al. (1999) [41]

Minimum key structure

SPOOL

C++

Template method, factory method and bridge

2 industrial systems, ET++

-

Philippow et (2005) [52]

Mimimum key structure

-

C++

GoF

Student projects

100% for Singleton, Interpreter

Class structure

Pat

C++

Adapter, bridge, proxy, composite, decorator

NME, LEDA, zApp

14-50%

Beyer et al. (2003, 2005) [15]

Predicate Calculus

Crocopat

Java/C++ Composite, mediator

2 Mozilla, wxWindows

-

Dong et al. (2009) [25]

Matrix and weights

DPMiner

Java

Adapter/command, bridge, composite, strategy/state

Java AWT, JEdit, JHotDraw 6.0b1

91-100%

Balanyi and Ferenc (2003) [10]

Class structure

Columbus C++

Reclassified GoF

2 Jikes, Leda, StarOffice, StarOffice Writer

< 60%

Heuzeroth et al. (2003) [38]

Predicates on abstract syntax trees

-

Java

Observer, mediator, chain of responsibility, visitor, and decorator

Swing

-

Niere et al. (2002, 2003) [48]

Cliche’ recognition and graph transformation, with fuzzy logic

FUJABA

Java

GoF

AWT library

-

Antoniol (1998) [7]

Cliche’ matching with software metrics in the class structure

-

C++

Adapter, bridge, proxy, composite, decorator (1995)

LEDA, libg++, galib, mec, socket

30%

Kim and Boldyreff (2000) [42]

Metrics

-

C++

GoF

3 Systems (no info)

Avg 43%

Olsson and (2006) [56]

Shi

Class structure, exploiting inter-class relationships

PINOT

Java

Reclassified GoF

ANT, AWT, JHotDraw, Swing

-

Kaczor et (2006) [40]

al.

Bit-vector based on string representation

-

Java

Abstract factory and composite

JHotDrawm, QuickUML, Juzzle

-

Smith and Stotts (2003) [57]

Elemental design patterns and rhocalculus

SPQR

Java

Decorator

-

-

Tsantalis et (2006a) [61]

al.

Class structure expresses as matrices, exploiting Graph similarity algorithm

-

Java

Composite, adapter/command, decorator, observer, state/strategy, prototype, visitor

JHotDraw, JRefactory, JUnit

100%

De Lucia et al. (2009) [19]

XPG formalism and LR-based

DPRE

Java

Adapter, bridge, composite, proxy, and decorator

JHotdraw (5.1, 6.0b1), QuickUML, Apache Ant, Swing, and Eclipse JDT

62-97%

Bergenti [12]

Class Structure

IDEA

UML

Template, Proxy, Bridge, Composite, Decorator, Adapter

-

-

Arcelli (2011) [8]

Basic elements and metrics

MARPLE

Java

Abstract Factory, Composite, Visitor

JavaReports, Batik

10-80%

Vokac (2006) [62]

Semi-formals diagrams translated into queries

no name

C++

GoF

CRM commercial system

-

France et (2004) [27]

UML

no name

C++

Abstract Factory, Bridge, Decorator, Singleton, Observer, Composite, and Visitor

-

Stencel (2008) [59]

Static analysis techniques and SQL

D3

Java

GoF

AJP and JHotDraw

-

Gueheneuc (2008) [33]

UML-like multilayered approach

DeMIMA

Java

GoF

33 industrial systems, 5 open source systems

34 %

Proposed approach

DSL-Driven Matching

DPF

Java

GoF and their variants

12 open source systems

95%

Kramer Prechelt [43]

al.

and (1996)

et

al.

(2000)

al.

Graph-

JWAM,

Table I. An overview of Design Pattern Detection Approaches

c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

4

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

2.1. Type of analysis The type of analysis can be classified as: structural analysis, behavioral analysis, semantic analysis and formal specification/composition analysis. Structural analysis approaches consist in recovering the structural relationships from available source code artifacts. They focus on recovering structural DPs such as Adapter, Proxy, Decorator, etc. These approaches consider inter-class relationships to identify patterns structural properties [36]. An example is in [6], where a structural analysis of DP is proposed starting from C++ systems. Moreover, in [62] the source code is parsed using a third party commercial tool called Understand for C++. The tool extracts the entities and the references from C++ source code and stores them in a database. Queries are performed on the database to extract different properties of patterns. In their experimentation authors recovered Singleton, Factory, Template, Observer and Decorator patterns from a VCS (Version Control System). Behavioral analysis approaches adopt dynamic analysis, machine learning and static program analysis techniques for patterns behavioral aspects extraction. They can be used together with structural analysis when patterns are structurally identical or have a weak structure (e.g, State and Strategy patterns are structurally identical). The limit of these approaches is the high number of false positives at the increasing of the number of execution traces [25]. Semantic analysis approaches complete the structural and behavioral analysis reducing the false positive rate for recognition of different patterns. They use naming conventions and annotations which contain the role information about the classes and methods [25, 54]. This analysis can be used for recovery of patterns having similar static and behavior properties (e.g., Bridge and Strategy patterns). Different techniques are used for semantic analysis. In [25], three options are discussed and they conclude that naming conventions are most appropriate and feasible option. Finally, formal specification/composition analysis of DPs includes some approaches on formal patterns specifications. It is important to supplement different detection approaches by formally specifying different patterns [11, 27, 64, 58]. Moreover, DPs have different implementation variants and any formal specification of patterns can help to specify the possible variations in different patterns as well as overcome the challenges of capturing their semantics. These approaches use formal specification languages to specify DPs supported by tools validating the correctness and completeness of the specifications [27]. In [45, 46] an extension of the UML sequence diagram is proposed allowing designers to define and visualize the pattern roles and the different types of interaction groups for a design pattern. Other approaches [53, 30] propose definitions of pattern variants composed of reusable feature types. A feature type is detected in the source code with a search technique that is most fitting for its characteristics. Different technologies are used for detecting different parts of a pattern. In particular, in [30] four new variants of standard patterns (Abstract Factory, Decorator, Adapter and Proxy) are defined during analysis of different source applications. This approach is similar to ours because it explicitly represent the concept of variants. It presents a model based on a set of properties (called by authors feature types), that all together are capable to represent a design pattern. Our approach defines variants using domain specific language modeling a wide set of structural and behavioral properties (like delegation, object creation, calls and dependencies) using inheritance to reuse previous specifications. This makes easy to express and mine design pattern with complex behavioral relationships. 2.2. Searching Methods Searching methods can be classified as: Database queries, Constraint resolver, Metrics, XPG formalism and parsing, UML structures and matrices, miscellaneous approaches. Several pattern recovery techniques use Database queries [54, 65, 41, 43] for extracting patterns. They produce an intermediate representation of the source code (i.e. ASG, AST, XMI, meta-data and UML structures) and then use SQL queries to extract pattern related information. Performances in these cases depend on the underlying database and can be scaled very well, but the queries are limited to the information available in the intermediate representations. Existing SQL-based approaches usually store a limited sets of properties related to the source code elements. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

5

This is mainly due to the relational model that is not suitable to easily express and query complex graph-based structures. Moreover, they are used for structural and creational DPs and they only partially support behavioral DPs identification. This issues are duscussed in [43]. A notable exception is [54] that proposes a meta-model that takes into account a reasonably complete set of structural and behavioural properties. With respect to it, our meta-model: (i) includes the possibility to express fields, methods and constraints on them; (ii) takes into account delegation and dependency properties and (iii) allows using the defined DSL to build higher-level properties starting from the low-level ones in order to mine complex structure (requiring no changes to the search engine). Several tools and frameworks to identify idioms, macro-patterns, DPs and design defects use explanation-based constraints programming techniques. For example, in [35], authors recover patterns using a multilayered approach which focuses on ensuring an optimal recall rate, but precision and performance are low. Metric based techniques compute program related metrics (e.g., generalizations, aggregations, associations, interface hierarchies) from different source code representations and compare their values with source code DPs metrics. These techniques [63, 49, 7, 42] are computationally efficient because they reduce the search space through filtration [34]. The limit is that they have been experimented on a few number of patterns. Moreover, their precision and recall is low. XPG formalism and parsing techniques use SVG (Scalable Vector Graphics) format for the intermediate representation of the source code and represent DPs in a visual language by mapping the grammar of each pattern with the graph representation. They give a precise visualization but are limited only to structural DPs. Moreover, to the best of our knowledge, the existing experimentations are limited to few patterns [18] and do not show any recall rates. UML structures and matrices techniques [61, 25, 49, 33, 12, 56] allow to represent structural and behavioral information of software systems. They apply different techniques to match DPs template metrics with the matrices generated for the system. In [33] a semi-automatically approach to identify micro-architectures in source code is proposed. The approach is based on information organized in three layers: two layers are used to recover an abstract model of the source code (including binary class relationships) and a third layer is used to identify DPs in the abstract model. A DPs detection methodology based on similarity scoring between graph vertexes is proposed in [61]. The approach is able to also recognize patterns that are modified from their standard representation. It exploits the fact that patterns reside in one or more inheritance hierarchies (in order to reduce the size of the graphs which the algorithm is applied to). These approaches are computationally efficient and have good precision and recall rates. Their limit is that they miss to extract the implementation variants of similar DPs. Furthermore, they are limited to a few number of patterns. Finally, there are some well known techniques that cannot be classified in the above categories (e.g., fuzzy reasoning, bit vector compression, minimum key structure method, predicate and rho calculus, dynamic analysis using run-time execution traces, formal methods based on semantic, machine learning based approaches and concept analysis) but are good as a complement to improve the structural methods cited above [48, 38, 40]. For example, in [19], De Lucia et. al. present some case studies of recovering structural design patterns from OO source code and in [21] they propose a model checking approach to analyze behaviour of pattern instances both dynamically and statically. In [8] a tool for DPs detection and software architecture reconstruction is proposed. An approach mixing structural and metric techniques is used to detect pattern instances. More recently, in [4], DP recovering is obtained using ontology formalism. DPs restrictions are formalized and translated into rules that are executed on a knowledge-base that is populated with semantic descriptions of library code. Some studies have been also focused on the formalization of empirical evaluation criteria [25, 19, 60]. Each applied technique should be evaluated using well defined criteria and different authors have proposed taxonomies and related frameworks to perform such evaluations. The approach we propose is based on a system meta-model, and a DSL, that is able to represent elements down to statements and expressions. This allows to reason about structural and behavioral properties that can be used (i) to improve search space reduction and (ii) to distinguish between c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

6

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

patterns that have the same structure but behave (or are used) in different ways [61]. Thus the type of analysis includes structural and behavioral analysis whith a graph matching searching method. 2.3. Comparison with the proposed approach As discussed above, in the last years a high number of DP mining approaches and tools are implemented. According to this, there are several studies aiming to compare and describe existing approaches [26, 19, 52]. In Table I, starting from these works and from the analysis of literature, we synthesize the most relevant approaches to DP mining. For each approach, the table reports: the referring authors, a synthesis of the adopted strategy, the supporting tool (if any), the programming language of the mined source code, the searched DPs. Moreover, the list of the systems used during the tool experimentation and the obtained average precision are shown. The main advantage of the DPF approach over most of existing ones is that it allows to identify variant forms of the classic DPs (as known in the literature). This is a particularly important issue since DPs are present in real world systems with many different variants [61, 57]. With respect the existing approaches, usually based on a fixed subset of relationships among source code elements, our approach is based on a richer meta-model taking into account both structural relationships and behavioral ones (as delegation, calls, dependency and object creations). Moreover, the declarative DSL specification “override” capability, makes the approach flexible in detecting new patterns, domain specific patterns and architectural ones. As shown in the table, there are tools for discovering patterns from Java source code and tools for mining patterns from C++. Starting from C++ mining tools, in [41], authors use Datrix as intermediate format to express a wide set of source code properties in order to model DPs. DPF extends the set of such properties and introduces a DSL in order to express pattern specification in a declarative fashion. Moreover inheritance among specifications allows to mine variants without changes to the pattern detection algorithm. In [52], a Rational Rose C++ analyzer enables DP UML diagram extraction out of C++ source code. As the authors observed, the language used to write the detection cannot express several key behavioral properties and: (i) the approach is not capable to distinguish among pattern with same structure but different behaviour; (ii) it is difficult to extend the approach in order to consider new kind of properties that such model does not take into account. Another C++ code mining tool is Pat ([43]), where the fundamental idea for the automated search is to represent both patterns and designs in Prolog. Unfortunately, some information that would also be relevant for a precise search for pattern instances is not completely extracted by the structural analysis there proposed. DPF overcomes the limitations of [52] and [43] tools because it is independent from any modeling approach/tool, and its meta-model and DSL allow to model a very broad variety of structural and behavioral properties in a declarative way to easily extend the original properties without changing the mining algorithm. Another DP mining tool is Crocopat [16]. This tool works exclusively on RSF input format. To use Crocopat on Java projects, a tool is needed that parses Java source code and creates Crocopat input format from it. Crocopat, is compared to Columbus [10] in [28]. This tool is versatile and easily extensible (given the high number of available plug-in) but differently from our proposed approach it requires that the systems are annotated with Design Pattern Markup Language to describe DPs. For Java mining tool, FUJABA [48] and SPQR [57] were considered. These tools are based on a very similar decomposition method. DPF, with respect to FUJABA introduces an explicit DSL to express pattern specification declaratively. Moreover, in order to improve scalability reducing the search space, our approach allows variants to be defined using inheritance. An approach of DPs identification using a high performance bit-vector algorithm is proposed in [40]. The approach is more efficient in term of space and the compactness of representation but is based on a restrict set of properties with respect to our approach. Comparing DPF with DeMIMA [33] is quite complicate because DeMIMA is a very different multilayer approach. Basing on precision results shown in the table, we can suppose that DPF offers a higher precision because it can be configured using a set of variables. MARPLE tool ([8]) is based on interesting graph matching technique using an attributed relational graph in which types are nodes and microstructures are associated to edges. Compared to c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

7

Figure 1. The meta-model represented as a UML class diagram

the DPF approach, MARPLE is less capable to reduce the search space and, according literature, it has been validated just on few DPs. DPRE tool ([19]) is based upon the XPG grammar formalism to express patterns. Once the grammar is defined, the Visual Programming Environment Generator produces a visual editor and a XpLR parser from it. XPG grammars, allows to represent a set of properties using the terminals as building blocks (i.e. Class, Aggregation and Inheritance). Our approach uses a different matching algorithm that is based on a graph model that also takes delegation, object creation, dependency and containment into account. Moreover DPRE requires code generation in order to detect new patterns (or variants) since the grammar must be used to generate the pattern and the related visual editor. In contrast, our matching algorithm is applied on graphs that are generated by means of a run-time translation of the DSL into graphs and hence does not require any code generation or integration step. While this could increase detection times with respect to DPRE, it allows to reach better precision since the set of patterns and their variants can be effectively customized for any given context. An extension of DPRE approach has been proposed in [20, 22]. Finally, there are some additional approaches that are not reported in table for briefly. In [67, 66] a technique for DP recognizing is provided. It complements existing static analysis with a dynamic analysis. In contrast with our proposed approach (that uses a static analysis even to recover behavioral properties), a dynamic analysis is here introduced to transform DP behavioral aspects into finite automata identifying relevant methods calls. A dynamic analysis is also used in MoDeC ([47]) to describe behavioral and creational pattern as collaborations among objects in the form of scenario diagrams and in [39], where a test program is executed on the system to produce traces for the behavior parser. The dynamic analysis should provide better precision in recovering pattern behavior ( [47] ) but requires a complete system execution and the generation of all the pattern-relevant traces. Moreover, we have selected a wide set of properties that allow statically to infer pattern behavior. The results of our study are effective if compared with a dynamic based approach. Finally, a more recent tool is DPJF ([17]) which implements an approach similar to the proposed one. It improves precision performing a set of specific behavioural analyses on source-code (e.g., forward to a single object, field maintenance, state propagation ) to filter out false positives. Our approach, while not based on such specific analyses, allows by using DSL constructs to build most of them declaratively improving the overall flexibility of the mining process. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

8

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

3. THE DETECTION APPROACH The detection approach defines and exploits a meta-model and a Domain Specific Language (DSL) to model the structure of both the software system and the DPs to be detected. 3.1. The Meta-model and the Domain Specific Language The meta-model uniformly describes the DPs and systems in terms of relationships among code elements. Both the structural (e.g., inheritance, implementation, type nesting and visibility) and behavioral relationships (e.g., delegation, object creation) among the Types are traced down to the DPs Properties and Types components. The meta-model is shown in Figure 1, where a UML class diagram represents the structure of an OO system (i.e., its Types and the structural relationships among them), the structure of the DPs (i.e., the set of Properties modelling their structural elements) and the relationships among the DPs code elements and the Types. Starting by the upper part of the figure, we can observe that the structure of a system is modeled as a set of Types (i.e., Container, Value, Reference, and Compound Types ∗ ) along with their relationships. Reference Types are composed by Fields and Methods, and a Method can have zero or more Arguments. A ReferenceType can inherit from another ReferenceType as well as can contain another ReferenceType (e.g., an inner class). The bottom part of the diagram defines a DP as the aggregation of several Properties characterizing it: • Classifier: it allows to introduce a Type (Class or Interface) used in a pattern specification or to modify an already existing Type. Moreover, this property permits to model a role needed by the pattern with respect to its required internal structure and relationships with other Classifiers. Finally, it allows to define constraints, if any, on its super-type or its implemented interfaces. • Data : it permits to define a field in an existing Classifier or override an existing field. This property can specify an existing Classifier as the field’s type or a compound type of an existing Classifier (like an array for a generic Collection). • Behavioral: it allows to define a method in an existing Classifier or to override a method’s definition. Method definition includes the specification of its return type and its arguments along with their optionality flag (indicating whether the element is mandatory for the pattern specification or not). This property can be used to define one or more of the required (or optional) behaviours of the Classifiers introduced in a pattern specification. • Dependency: it describes the dependency between a pattern element (like a method) and another pattern element (as another method or a field). • Invocation: it models a call between methods of Classifiers defined in the pattern specification. • Delegation: it specifies a mapping between a set of methods of a Class and a set of methods of an existing Classifier in the pattern specification. This allows to take into account the delegation for the patterns that require it. • Object Creation: it models the creation constraints specifying the method or the field that needs the object creation and the Classifier of the created object. This happens for patterns expressing a mandatory object creation semantic as in the case of creational patterns but also for many patterns in the other categories.

This part of the meta-model is also used as the base to define the DSL representing structural and behavioral relevant properties of OO software systems. Our DSL language takes inspiration from existing pattern detection languages. Some example are the SDF, Crocopat and Grok [37, 15, 31]. In particular, for pattern mining, two requirements that strongly point towards using a domain-specific language are: (i) the need to express the structure of software system (and patterns), composing it by recursive rules; and (ii) the need (and difficulty) of ∗ Compound

types are treated as separated types since they must specify the base type of compound. In this class are also arrays and generic types. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

9

p a t t e r n ob s e r v e r { type AS( 1 ) { has method A , R ; has method N ; has container o of type AO; }

type AO( 1 ) { has method U ;

}

type CO( ∗ ) { i n h e r i t s −from AO;

}

type CS( ∗ ) { i n h e r i t s −from AS ; has constructor c { object −c r e a t i o n o ; }

overrides methods [ A , R] each { delegates to o ;

}

overrides method N each { delegates to o ; c a l l s U i n AO. U;

}

}

}

Figure 2. An Example of DSL instance: the Observer Pattern specification

succinctly representing effective constraints on such structure. Starting from the existing languages, the DSL was defined with the aim to express design pattern specifications with the following goals: • a specification should be writable by the analyst with reduced effort; • the DSL should allow to express constraints on source code structure and behavior to model complex DPs; • the DSL should support the definition of pattern variants (using inheritance among specifications) to foster reuse;

As an example how a pattern is modeled using the DSL, let us consider a classic Observer DP (supporting only a single kind of event for each notify method) as proposed in literature [29]. Figure 2 shows the DSL specification for such an Observer DP. Each specification is just a sequence of type blocks: each block specifies the set of properties that must hold for a role in the pattern (including the constraints on the allowed multiplicity, reported in the brackets just after the type name - if no brackets follow the type name the default multiplicity is 1). As shown in the Figure 2, the Observer specification requires: • • • •

a single AbstractObserver (AO) and several ConcreteObservers (CO); a single AbstractSubject (AS) and several ConcreteSubjects (CS); a container of AbstractObservers to be defined in the ConcreteSubject (the field “o”); the methods A and R (that play roles of add and remove) to be defined in the AbstractSubject and overridden in ConcreteSubjects; • a Delegation to be defined between A and R of ConcreteSubject and the add/remove methods Container type; • the notify method (called “N”); the method N must contain an invocation towards the update method U of the AbstractObserver classifier; • an object creation (to initialize the container field “o”) in the constructor of the ConcreteSubject type. Each specification can be translated into a graph in which elements are nodes and properties are labelled edges. This graph, as better explained in Section 3.2, is part of the input for a twopass graph-matching detection algorithm. Figure 4, on the top right side, shows an excerpt of the c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

10

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

graph representing the Observer as described in the specification of Figure 2. Such graph reports key elements specified in the DSL together with the relationships among them † . A variant of this Observer can be defined, easily, deriving it from the DSL specification of Figure 2. Structural elements that need to be changed can be overridden. For instance, Figure 7, on the right side, shows the graph of a common multi-event Observer. This variant redefines the elements “N” and “U” as sets of methods to take into account different kinds of events and notification handlers. 3.2. The Pattern Detection process The pattern detection process comprises the following main steps: • definition of the patterns specifications repository: each specification is written according to the proposed DSL and organized in a catalog stored into a repository. • pattern Models instantiation: the DSL specifications are parsed to generate the Design Pattern Graphs (DPGs) to be detected. • system source code analysis: the source code of the system under study is parsed and the complete ASTs of the system are produced. • generation of an instance of the system model: a traversal of the system ASTs is performed to generate an instance of the system model, also represented as a graph (called the System Graph - S). Rapid type analysis (RTA), class flattening and inlining of not public methods are exploited in order to build a system’s representation suitable for the matching algorithm‡ . • design patterns matching: S is traversed and a matching algorithm is performed to identify implemented patterns. During the detection each pattern instance is mapped to the corresponding matching design pattern graph.

In order to describe the detection algorithm, we provide the definitions of the notations and concepts used in it. A DPG can be considered as an attributed graph specifying a set of predicates on the attributes that must hold. It is the building block of the detection process and is used to identify sub-graphs of interest occurring in the system graph. Formally: Definition 1 DPG — A design pattern graph is a pair DP G = (P, AC), where P is an attributed graph defined by P = (V, E) where E and V are respectively its nodes and edges. AC is a set of predicates on the attributes that contains compound expressions made of conditions on nodes, edges and attributes of P. § We introduce at this point the definition of DPG matching which generalizes sub-graph isomorphism with evaluation of the predicates on the attributes. Definition 2 DPG Matching — A design pattern graph DPG(P, AC) is matched with a system graph S if there exists an injective mapping ϕ : V (P ) → V (S) such that: (i) ∀e(u, v) ∈ E(P ), (ϕ(u), ϕ(v)) is an edge in S, and (ii) predicate ACϕ (S) holds. If a DPG is matched to a system graph, the binding between them can be used to access the sub-graph on the system (either the sub-graph structure or attributes and properties on nodes and edges). We define a matched DPG to denote the binding between a DPG and the system graph as follow: † To

keep figure concise, each node/edge is labelled with the initial letter of the corresponding field or method in the DSL. Moreover, not all properties are represented, as for type CS which has several overrides and a constructor that are all omitted. ‡ Note that RTA is used to handle late binding and hence the computed call graph reports a super-set of the real calls that can be executed at run-time. This however only lowers the precision in very few cases. A discussion on the impact on the detection quality is however reported in threats to validity section. § Compound predicates can be broken down to simple predicates on individual (or set of) nodes or edges. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43

11

L i s t i n s t a n c e s = . . . ; void s t a r t ( ) begin f o r w a r d N e i g h b o r h o o d A n a l y s i s (DP) f o r i = 1 to k do Match ( i ) ; end end void f o r w a r d N e i g h b o r h o o d A n a l y s i s (DP) begin foreach node u i n DP do ϕ(u) = { v i n V ( S ) | ACu ( v ) = t r u e } ( 1 ) computation ϕ(u) ( 2 ) reduce ϕ(u1 ) . . . ϕ(uk ) u s i n g lookahead and p r o p e r t i e s c o n s t r a i n t end end void Match ( i ) begin foreach v i n ϕ(ui ) | v i s f r e e do i f not checkNeighborhoodBindings ( ui , v ) then continue ; ϕ(ui ) = v ; i f i < | V(DP ) | then Match ( i + 1 ) ; else i f ACϕ ( S ) then i n s t a n c e s . add ( ϕ() ) ; end end boolean checkNeighborhoodBindings ( ui , v ) begin foreach edge e ( ui , uj ) i n E(DP) , j < i do i f ( edge e1 (v, ϕ(uj ) ) not i n E ( S ) ) o r ( not ACe (e1 ) ) then return false ; end return true ; end

Figure 3. A sketch of the detection algorithm.

Definition 3 Matched DPG — Given an injective mapping ϕ between a DPG and a graph S, a matched graph is a triple ⟨ϕ, DP, S⟩ and is denoted by ϕDP (S). Figure 3 outlines the detection algorithm. The specification expressed as a design pattern graph DP is rewritten by means of a set of predicates on individual nodes ACu and edges ACe . For each node u in the pattern DP, there is a set of candidate matched nodes in S for which the constraints ACu hold. These nodes define a (partial) matched design pattern graph referred as candidate neighborhood of node u and denoted by ϕ(u): Definition 4 Candidate Neighborhood — The candidate neighborhood ϕ(u) of node u is the set of nodes in graph S that satisfies the predicate ACu : ϕ(u) = {v | v ∈ V (S), ACu (v) = true} Hence the search space of a DPG matching a pattern DP on the system S is defined by the candidate neighborhoods of all nodes belonging to the pattern specifications as follows: c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

12

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

Figure 4. A running example showing the candidate neighborhood analysis.

Definition 5 Search Space — The search space of a DPG on a system graph S is defined by the candidate neighborhood in S of all DPG nodes. It corresponds to the Cartesian product of the candidate neighborhood for each DPG node: ϕ(u1 ), × . . . ×, ϕ(uk ) ∈ S , where u1 , . . . , uk ∈ DP . The algorithm can be seen as composed of two phases. The first one starts at line 5 by calling (line 7) the forwardNeighborhoodAnalysis function (lines 13-21) which computes the candidate c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

13

neighborhood for each node uh in the DPG. This function ends after performing a pruning step of the search space (better described in the following). The second phase (on lines 23-44) performs a search, over the product ϕ(u1 ) × . . . × ϕ(uk ) using a depth-first traversal, to find a sub-graph isomorphism. The Match(i) function iterates on the ith node to find valid bindings for that node. Procedure checkNeighborhoodBindings(ui, v) examines if ui can be mapped to v by considering their edges and attributes. Line 28 maps the node ui to v. Lines 29-33 continue to search for the next node or, if it is the last node, evaluate the predicate AC to check constraints. If it is true, then a valid binding ϕ : V (DP ) → V (S) has been found and added to the list (line 32). Since the worst-case complexity of the matching algorithm is O(nk ), where n = |S| and k = |P |, to make the algorithm usable on real systems, a search space reduction technique must be used. Our approach uses system and pattern information to reduce the size of candidate neighborhoods and exploits a look-ahead requiring, for each node ui of the DPG, a valid (partial) binding of the neighborhood sub-graph centered in ui and having a fixed distance r from it. For each candidate neighborhoods, structural information (e.g., nodes and edges) and predicates on attributes (types and properties of nodes and edges) are used to prune matches that would not produce acceptable solutions. This neighborhood knowledge can be exploited to prune unfeasible sub-graph at an early stage and obtain a reduced set of candidates on which to perform the full depth first matching (that is resource- and time- expensive). There is a trade-off with respect to how candidate neighborhood sub-graphs are built. They increase pruning power as the look-ahead increases, but their construction is of polynomial complexity (with respect to look-ahead). The current implementation uses a look-ahead equals to 1 (immediate neighborhood). We found no improvement with a look-ahead equals to 2 since even if for some patterns (those that have a rich structure or highly constrained) time was greatly reduced, for others the bigger neighborhood analysis increased the total time (i.e., the average time remained almost the same). Figure 4 reports a simple running example of the detection process. It shows the candidate neighborhood analysis and the resulting bindings related to the pattern specification of Figure 2 as performed on a subset of the roles (only AS and AO) of the Observer DPG and on a small portion of a system graph (respectively DP and S in the Figure 4). For each pair of nodes ui ∈ V (DP ) and vj ∈ V (S), the neighborhood sub-graph of ui and vj are matched to find candidate neighborhood. In the step 1 the pair considered is (Text, AS) and hence the immediate neighborhoods of respectively DP and S are considered. In this case the match fails since pattern node neighborhood is not a sub-graph of S. By converse the step 2 on pair (AS, Figure) is a successful match since AS and Figure neighborhoods are congruent and all the constraints are satisfied (nodes and edges are of the same types, and multiplicity constraints are met). Hence several conditioned bindings are established for the matched candidate neighborhood. The step h is for the pair (Figure, AO). Due to structural differences this match fails (correctly) avoiding an unfeasible candidate neighborhood (since the Figure is not an AbstractObserver). This because the structure of the Observer DP has a quite good pruning power. Simpler patterns may generate a higher number of candidate neighborhoods that must be taken into account in the second phase of the algorithm in which the full depth-first matching performed increases time and space requirements. Hence, to further reduce search space, the algorithm is executed on all the DPGs in the specification repository and the previous bindings are taken into account when performing the subsequent candidate neighborhood analysis. For some pattern model, the same element could be bound, in the general case, to several patterns. This however is not true for all pattern elements. For instance, the binding of the visit() for the Visitor pattern method should not allow bindings of another patterns (e.g., like the execute() method of a Command or the notify() method of an Observer). When a pattern model explicitly forbids multiple bindings for a pattern element, existing established bindings of already analyzed patterns are used as further constraints to improve the search space reduction (pruning unfeasible bindings as early as possible). Variants are handled in the same way as other specifications, with no special treatment within the detection process. The only difference regards how their sub-graphs are built (taking into account the specification inheritance relationships and using a flattening approach). The resulting variant graph contains both the properties inherited from their super-specification and the overridden ones. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

14

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

The candidate neighborhood analysis is performed for each pattern role and for each system element ¶ . After the analysis the bindings are merged together and verified as a whole and hence results are not influenced by role ordering. Moreover, after candidate neighborhood has been performed, all the candidate bindings are verified to see: (i) if they cover all mandatory pattern properties, and (ii) if all the matched candidate bindings hold. If both conditions are met the set of candidate bindings represents a valid pattern instance linking each found pattern role to the set of system elements implementing it. The execution times are influenced by the ordering of the pattern specifications since existing bindings for single-bind roles are used to prune the list of system elements to consider during subsequent candidate neighborhood analyses (reducing execution times).

4. THE PATTERN SPECIFICATIONS CATALOG A set of DSL specifications of the most commonly used DPs has been written and stored in the DP’s specifications catalog repository. The currently defined catalog is shown in Figure 5, where each DP is represented as the root of a hierarchy (i.e., the darker rectangle(s) in each box) while each descendant (i.e., the lighter rectangles) represents a DP variant. The catalog is composed of 18 patterns detected by 56 variants. The detection relationship between a variant and the mined pattern is depicted using a dashed arrow. Inheritance between specification is depicted as the standard UML generalization. A description of the DSL code and DPGs for the most relevant DPs specifications and their variants is provided in the Appendix. In the remaining of this section, in order to show how DSL statements are used to specify design patterns to mine, the Observer multi-event variant is described in more details. 4.1. Observer In Section 3 a description of both DSL and DPG for the Observer DP has been provided. Here, we give the DSL specification and DPG of a variant of this DP, known as multi-event Observer and often used in real-world software systems. This variant takes into account also notify methods with one or more parameters and indirect calls to update methods. The structure of this specification, reported in Figure 6, is quite different from the one proposed in literature (shown in Figure 2) and requires the following properties: • • • • • •

a single AbstractObserver (AO) and several ConcreteObserver (CO); a single AbstractSubject (AS) and several ConcreteSubject (CS); a container of AbstractObservers to be defined in the ConcreteSubject (the field “o”); an abstract event class EH modeling the event concept; one or more concrete event classes CE modeling the concrete events; the methods A and R (that play roles of add and remove) to be defined in the AbstractSubject and overridden in ConcreteSubjects; • a Delegation to be defined between A and R of ConcreteSubject and the add/remove methods Container type; • notify methods set (called “N”); each method of the set must contain an invocation towards the update method of the AbstractObserver classifier; • an object creation (to initialize the field “o”) in the constructor of the ConcreteSubject type. These properties together define the DPG reported in Figure 7. As a final consideration, the DSL specifications are, of course, critical to the whole detection process that is deeply affected by the correctness, the completeness and the conciseness by which ¶ Actually

for each system element that has not already bounded to a pattern role that explicitly requires unique bindings

for it c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

Prototype

15

Proxy

Observer

GoF Observer GoF Prototype

Deep Clone Prototype

GoF Prototype With Manager

Client-init Prototype ProxyIndirection

Multiple Events Observer with Parameter

Client-init Prototype With Manager

Multiple Events Observer with Hierarchy

Deep Clone Prototype With Manager

Command

Factory Method

Abstract Factory

Template Method

GoF Command

Command With Generics

Factory Method Classic GoF

CommandInnerClasses

Command with Explicit Execution Context

Single Creator

Delegation Based

Abstract Factory Classic GoF

Template Method with Generics

Parametrized

Command with Undo/Redo

Command with Undo/Redo separate Engine

Single Factory

RegistryBased

Parametrized AF

Composite

Singleton

Decorator Memento

GoF Composite EnumerationSingleton Container within Component

GoF Singleton

Composite with Components Interator

Decorator

Multiple Composites Pool

Relaxed Sigleton

AdapterWithGenerics

TwoWay-Adapter

Reflective Singleton

Multiple Composites Roots

Adapter

Bridge

Object Adapter

GoF Bridge

AdapterInnerClasses

Class Adapter

GoF Memebto

Iterator

GoF Iterator

Bridge With Generics

Deferred Bridge

DynamicAdapter

State

internal-multiple

external

magic-cookie

external-nested

Strategy

Builder

Visitor

GoF Strategy

GoF Builder

GoF Visitor

GoF State

uml-statechart

Figure 5. The DPs specifications hierarchies

they are written. For instance writing a specification with conflicting rules is legal within our DSL, but will result in bad mining performances and will obviously produce no results. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

16 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

pattern type has has has }

Observer multievent { AS( 1 ) { method A , R methods−set N container o of −type CO

type AO( 1 ) { has methods−set U

}

type EH( 1 ) { has method−set G

}

type CE( ∗ ) { i n h e r i t s from EH

}

type CO( ∗ ) { i n h e r i t s −from AO } type CS( ∗ ) { i n h e r i t s from AS has constructor c { object −c r e a t i o n o }

overrides method A , R { delegates to o }

overrides methods−set N each { delegates to o c a l l s u i n CO. U }

}

}

Figure 6. DSL of the Multi-event Observer specification.

Figure 7. The Multi-Event Observer Design Pattern Graph

5. THE DESIGN PATTERNS FINDER TOOL The Design Patterns Finder (DPF) Tool implements all the steps of the identification process. It was developed as a set of Ecplise plug-ins based upon JDT, and upon the EMF framework. Figure 8 shows the overall architecture of the DPF tool. It is a layered architecture: the bottom layer includes c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

17

Figure 8. The architecture of Design Pattern Finder tool.

Figure 9. DP-Finder Tool: DPF integration in Project Explorer

c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

18

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

Figure 10. DP-Finder Tool: The DP Specification Editor

the Eclipse Foundation Components [1]. Indeed the tool uses the JDT and the Eclipse Modeling Platform (Xpand, Xtext and MoDisco) to extract the needed information about the systems static structure. The middle layer (DPF Core) includes the main components of the DPs identification process. Three main sub-systems are included in this layer. The Project Analyzer produces an instance of the meta-model (i.e., the system Graph) of the analyzed system by static analysis. At this aim the following information is extracted from the analyzed system: type hierarchy, type inner structure (attributes, their types and scopes and so on), methods and constructors signatures, method calls, object creations and container support in order to express containment within types, static member information, delegation. The Pattern Specification Parser and Translator sub-system registers the DSL specifications of each DP, and parses them to produce the corresponding DPGs. Each specification is written using the defined DSL that is translated, by means of the Xtext-based DSL translator in a set of constraints. A Pattern Catalog stores the specifications (both DSLs and corresponding DPGs) of the DPs to be mined (currently the Factory Method, Prototype, Singleton, Adapter, State, Strategy, Composite, Decorator, Observer, Memento, Template Method, Command, Proxy, Bridge and Visitor DPs are included in the catalog). Each pattern specification can be standalone or can override a base specification by changing some of its properties. The Design Pattern Finder Engine is the heart of the mining process. It executes the detection algorithm to identify the DP instances by searching the system graph for sub-graphs matching a defined DPG. Once the system has been parsed and the meta-model instance is built, the user can select which DPs are to be detected. The execution of the algorithm produces a model of the system elements annotated with information on the detected patterns (including with their internal members and pattern roles). Each identified pattern instance is traced to the source code elements implementing it and the tool allows a user to visualize, inspect and analyze such code components. The results of the identification process are also stored in the central repository. The DPF IDE layer allows interactions with users using the tool as shown in figures 9 and 10. The user can select which patterns are to be searched and analyzed in a system by means of a tree viewer or using the Eclipse Visualizer. The Visualizer also shows a summary of patterns found in the analyzed system at different levels (package or class). c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

System Name

Version

Junit Lexi JHotDraw QuickUML Nutch PMD JRefactory Log4J JHotDraw Voldemort Apache Avro JDT

3.7 0.1 5.1 2.1 0.4 1.8 2.6.24 1.2.15 7 1.3.x 1.6 3.6.1

Size (KLOC) 4,9K 7,1K 8,9K 9,2K 23,6K 41,5K 98,5K 43,7K 78,5K 85,9K 125,2K 511,5K

#Types 104 100 174 230 335 519 568 176 567 382 1085 1655

19

#Methods 648 677 1316 1082 1854 3665 4234 863 5728 5312 8451 24153

Table II. Analyzed systems characteristics

However, a full description of the funtionalities provided by the DPF tool and how a user can interacts with DPF are out of the scope of this paper.

6. CASE STUDY The effectiveness and efficiency of the proposed approach has been validated applying it to twelve OO systems. A first group contains seven open source java software systems of increasing sizes from the publicly available benchmarks proposed in [32] and in [2]. Moreover, we consider a second group of five open source java software systems (Log4j, JHotDraw7, Apache Avro, JDT, Voldemort) selected to perform a direct comparison with a similar design patterns mining approach proposed by Tsantalis in [61]. All the analyzed systems along with the main structural characteristics are listed in the Table II. Systems, in both groups, were chosen of increasing sizes to evaluate the scalability of the algorithm and to validate the quality of results on a large code base. The DPs considered in this case study are the ones reported in Figure 5. The validation was performed using the DPF tool developed to support the approach. According to [55], design pattern recovery techniques can be evaluated by computing precision and recall [51], in order to asses their effectiveness and correctness. To compute recall and precision we assume that a pattern instance can be classified into one of four categories: • • • •

true-positive (TP : correctly found), false-positive (FP : incorrectly found), true-negative (TN : correctly missed), false-negative (FN : incorrectly missed).

Precision is defined as the ratio of correctly found occurrences to occurrences provided by the tool and is given by: P recision = TP /(TP + FP ) (1) Recall is the ratio of correctly found occurrences to all correct occurrences and is given by: Recall = TP /(TP + FN ).

(2)

To verify the correctness of the results, in the case of the first group of seven systems, we considered as Gold Standard (GS) the union of both the benchmarks cited in [32] and in [2] c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

20

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

(assumed to be correct) with the correct results produced by our approach (i.e., instances not in the benchmarcks and mainly due to DP variants detection evaluated by code inspection). ∥ Each DP instance in the resulting GS was classified as a DP variant according to the defined catalog. For the remaining five systems, the GS was computed using the correct results produced by both DPF and DPD tools. Hence it could lack pattern instances missed by both tools (overestimating recall) but allows to perform a direct (and reliable) comparison on precision.

7. DISCUSSION OF RESULTS 7.1. Results on benchmarks The study was carried on by applying DPF to find the DPs contained in the first group of seven java software systems reported in Table II. In particular, the analysis on JRefactory 2.6.24 system (shown in Figure IX) allows a direct comparison with results provided for DPD tool in [61], for the tool presented in [2] and for ePad Eclipse-based tool presented in [20]. Tables from III to IX report, for each of the analyzed systems: the name of the DPs searched in the code (first column), the number of true positive instances as provided by the benchmark (GS), the number of each searched pattern detected by the proposed approach (column D), the number of true positive found by DPF (column Tp), the number of false positive and negatives found by DPF (columns Fp and Fn). The last two columns report respectively precision (P) and recall (R) computed on the results as provided by DPF (using the gold standard). In order to improve readability, only the variants for which either the benchmarks or DPF produced results are reported. The sets of instances detected for the variants of a given DP can be non-disjoint sets, since an overlap among istances satisfying more variants’ specifications is possible. This means that, with respect to the values reported in the tables, the sum of the numbers of the instances of all the variants of a DP may be greater than the instances of that DP actually implemented in the code. As shown from Tables III to IX, the average value for precisione and recall are respectively, 0.95 and 0.87. These values are also consistent for increasing system sizes. Patterns like Command, Composite, Observer but also Visitor are more precisely identified since their specifications include both static and behavioral relationships. This is confirmed by the number of false positives that is lower than patterns with a less constrained structure or with limited or absent behavioral properties. This, as highlighted in the Section 3.2, is due to the higher pruning power of complex DPGs with respect to the simpler ones. As results show, the proposed approach is able (depending on how specifications are written) to distinguish among patterns that have the same static structure but different behaviors. For example, for the Command pattern, in order to distinguish it from the Adapter one (the object version), the specification uses the invocation property requiring the execute method, in the concrete subclass, to invoke a method of a class bound to a Command. The same happens for Composite and Decorator, where the Decorator is required to specify a delegation towards the decorated object. Singletons are mined using several variants, from the classic GoF (requiring a private constructor, a final class declaration and a single getter static method taking no parameters) to the “relaxed” one (that remove such constraints). To include singletons instantiated using enumerations or static fields with public access the EnumerationSingleton variant was defined. As Table IX, according to DSL specifications some several singleton instances were found that are not compliant to GoF definition and would be missed without the variant specifications. In particular, DPF detected 43 of 45 instances of RelaxedSingleton of which 8 instances are not compliant to GoF, two are False Negatives and the remaining 35 ones share both GoF and Relaxed Singleton Specification (i.e. both specifications were met). In addition, 24 instances were found that satisfy the EnumerationSingleton specification. ∥ Of

course, the different formats of the benchmarks were translated into a unique common format to store the considered

GS. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

System→

21

01-junit

↓Design Pattern

GS

D

TP

FP

FN

P

Adapter/spec{InnerClass}

10

10

10

0

0

1

Adapter/spec{GoFObject}

22

18

17

1

5

0,94

0,77

Command/spec{ExecutionEngine}

57

59

54

5

3

0,92

0,95

Composite/spec{GoF}

3

3

3

0

0

Composite/spec{MultipleCompositeRoot}

4

4

3

1

1

0,75

Composite/spec{MultipleComposite}

4

5

4

1

0

0,8

1

Decorator/spec{GoF}

4

5

4

1

0

0,8

1

Factory Method/spec{SingleCreator}

3

3

3

0

0

1

Factory Method/spec{GoF}

8

7

7

0

1

1

Factory Method/spec{SingleFactory}

11

11

10

1

1

Factory Method/spec{Parametrized}

4

4

4

0

0

1

Factory Method/spec{DelegationBased}

3

2

2

0

1

1

Iterator/spec{External}

6

6

6

0

0

1

1

Memento/spec{GoF}

2

2

2

0

0

1

1

Observer/spec{EventsAsParams}

6

6

6

0

0

1

1

Singleton/spec{Enumerative}

2

2

2

0

0

1

1

Singleton/spec{Relaxed}

2

2

2

0

0

1

Strategy/spec{GoF}

14

12

12

0

2

1

Template Method/spec{GoF}

22

24

19

5

3

1

0,91

0,79

R 1

1 0,75

1 0,88 0,91 1 0,67

1 0,86 0,86

Table III. Results obtained on benchmark 01-junit

System→

02-Lexi

↓Design Pattern

GS

D

TP

FP

FN

Adapter/spec{InnerClass}

36

35

33

2

3

P

R

0,94

0,92

Builder/spec{GoF}

5

5

5

0

0

Command/spec{GoF}

36

33

32

1

4

0,97

0,89

Command/spec{InnerClasses}

37

35

34

1

3

0,97

0,92

Factory Method/spec{SingleCreator}

4

3

3

0

1

1

0,75

Factory Method/spec{GoF}

4

3

3

0

1

1

0,75

Factory Method/spec{SingleFactory}

2

1

1

0

1

1

0,5

Factory Method/spec{Parametrized}

5

4

4

0

1

1

0,8

Factory Method/spec{RegistryBased}

3

2

2

0

1

1

0,67

Factory Method/spec{DelegationBased}

5

4

4

0

1

1

0,8

Observer/spec{GoF}

5

5

5

0

0

1

1

Observer/spec{EventsAsParams}

6

6

6

0

0

1

Singleton/spec{GoF}

3

2

2

0

1

1

Singleton/spec{Enumerative}

5

4

3

1

2

0,75

0,6

Singleton/spec{Relaxed}

10

9

8

1

2

0,89

0,8

State/spec{GoF}

2

2

2

0

0

Strategy/spec{GoF}

11

12

11

1

0

Template Method/spec{GoF}

5

4

4

0

1

1

1 0,92 1

1

1 0,67

1 1 0,8

Table IV. Results obtained on benchmark 02-Lexi

Similar considerations can be made for Factory DPs: the Parametrized variant for both Abstract Factory and Factory method allowed to discover several istances of creator method (requiring parameters to select the product) that otherwise would be missed by DPF. Indeed, 18 Factory Method Parametrized instances were found in addition to the 16 GoF ones. Moreover, inspecting the c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

22

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

System→

03-JHotDraw

↓Design Pattern

GS

D

TP

FP

FN

P

R

Adapter/spec{GoFObject}

32

29

29

0

3

1

0,91

Bridge/spec{GoF}

37

36

35

1

2

Builder/spec{GoF}

2

0

0

0

2

Command/spec{UndoRedo}

15

11

11

0

4

Command/spec{GoF}

65

64

61

3

4

Command/spec{UndoRedoEngine}

11

10

10

0

1

1

Command/spec{ExecutionEngine}

20

20

20

0

0

1

Composite/spec{GoF}

16

19

14

5

2

Decorator/spec{GoF}

31

29

29

0

2

1

0,94

Factory Method/spec{SingleCreator}

10

9

9

0

1

1

0,9

Factory Method/spec{GoF}

3

2

2

0

1

1

0,67

Factory Method/spec{SingleFactory}

10

9

9

0

1

1

0,9

Factory Method/spec{Parametrized}

15

14

12

2

3

Factory Method/spec{RegistryBased}

2

1

1

0

1

1

0,5

Factory Method/spec{DelegationBased}

8

5

5

0

3

1

0,62

Memento/spec{GoF}

2

2

2

0

0

1

1

Observer/spec{EventsAsHierarchy}

4

4

4

0

0

1

1

Observer/spec{GoF}

5

5

5

0

0

1

1

Observer/spec{EventsAsParams}

5

5

5

0

0

1

1

Prototype/spec{GoF}

8

8

8

0

0

1

Singleton/spec{GoF}

6

5

5

0

1

1

Singleton/spec{Enumerative}

3

3

3

0

0

1

Singleton/spec{Relaxed}

4

4

4

0

0

1

Strategy/spec{GoF}

49

37

36

1

13

0,97

0,73

Template Method/spec{GoF}

67

61

59

2

8

0,97

0,88

0,97

0,95 0

1 0,95

0,74

0,86

0,73 0,94 0,91 1 0,88

0,8

1 0,83 1 1

Table V. Results obtained on benchmark 03-JHotDraw

results, we noted that in the benchmark out of 18 GoF instances, 7 were DelagationBased variants (delegating the creation to other objects ) of which 6 instances were correctly mined by DPF. We found that the number of false negatives was dramatically reduced by adding new variants inheriting existing specifications and taking into account the structural differences that caused the tool to miss them. The false negatives (computed as GS − Tp in Tables from III to IX) were related to patterns implemented differently from what assumed in the specification (our catalog is, for the most part, based on the definitions given in literature [29] and their known variants ). As an example, Table III shows the variants detected for the JUnit 3.7 system. In this small system a limited number of pattern is found, however is interesting that the five variants of Factory Method identify several factory methods that were not present in the benchmarks. It’s worth observing that in some cases (not all), variants are not mutually exclusive with respect to the instances. Single creator factory mtehods share the 7 GoF instances and the same happens for the Composite variants for which the MultipleComposite or MultipleCompositeRoot and Decorator variants can overlap on the same instances. DPF found several Command instances using an external execution engine. These patterns actually are not part of the main code of the framework (and correctly not included in the benchmark) but are sample tests of the junit distribution shipped with the framework. These tests indeed use the TestRunner as an executor implementing the structure and behaviour required for the proposed Command-ExecutionEngine variant (and for these reason were added to the gold standard). It’s also worth observing that mining results obtained for specifications focused on the detection of pattern variants using inner classes are quite good raising the resulting precision and recall of the overall pattern family. This is especially true for Commands (in Lexi, QuickUML and Nuch c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

System→

23

04-QuickUML

↓Design Pattern

GS

D

TP

FP

FN

P

R

Abstract Factory/spec{GoF}

11

9

9

0

2

1

0,82

Adapter/spec{InnerClass}

19

20

18

2

1

0,9

0,95

Adapter/spec{GoFObject}

48

44

41

3

7

0,93

0,85

Builder/spec{GoF}

12

11

10

1

2

0,91

0,83

Command/spec{GoF}

10

8

7

1

3

0,88

0,7

Command/spec{InnerClasses}

12

12

12

0

0

1

Composite/spec{ContainerWithinComposite}

6

4

4

0

2

1

0,67

Composite/spec{GoF}

6

4

4

0

2

1

0,67

Composite/spec{MultipleCompositeRoot}

6

6

6

0

0

1

Composite/spec{MultipleComposite}

6

4

4

0

2

1

Factory Method/spec{Parametrized}

2

2

2

0

0

1

1

Factory Method/spec{RegistryBased}

1

1

1

0

0

1

1

Factory Method/spec{DelegationBased}

2

2

2

0

0

1

Observer/spec{GoF}

17

17

16

1

1

Prototype/spec{GoF}

10

9

9

0

1

1

0,9

Proxy/spec{Indirection}

4

3

3

0

1

1

0,75

Proxy/spec{GoF}

6

3

3

0

3

1

0,5

Singleton/spec{GoF}

3

3

3

0

0

1

1

Singleton/spec{Enumerative}

2

3

2

1

0

Singleton/spec{Relaxed}

3

3

3

0

0

Strategy/spec{GoF}

15

18

12

6

3

0,67

0,8

Template Method/spec{GoF}

38

30

29

1

9

0,97

0,76

0,94

0,67 1

1

1 0,67

1 0,94

1 1

Table VI. Results obtained on benchmark 04-QuickUML

systems) and Adapter (in JUnit, QuickUML ). For instance, in JUnit the overall recall of Adapter with InnerClass variant is 0.87 (evaluated considering all the instances of the first two rows of Table III). Removing the Adapter-InnerClass variant the recall becomes 0.53 (evaluated considering the 10 Adapter-InnerClass instances as false negatives). Moreover result confirms that patterns mined with a higher number of (non-overlapping) specifications have better overal results with respect to ones with few specifications. This is also influenced by the pattern complexity since simple patterns (from structural and behavioural point of view) require fewer variants than complex ones to be mined efficiently. Two notable examples are the Singleton and Factory Method pattern families comprised respectively of 5 and 6 variants having the highest overall precision and recall on all the systems. 7.2. A comparison with DPD approach A comparison between results of our approach and those obtained using the similarity scoring approach [61] was performed. DPD approach was chosen mainly because it adopts a similar technique (exploiting a similar set of information, but used in a different way). We would like to point out that the comparison is between the two approaches not between the two tools. A comparison of DPF tool with other DP mining tools available in literature would be our future work. The results obtained are synthesized in Tables X (for the systems JHotDraw 7 and Avro 1.6), in Table XI (systems Log4J 1.2 and JDT 3.6) and in Table XII (system Voldemort). The tables report the name of the detected DPs (first column) and the number of patterns considered as Gold Standard (GS). The Gold Standard (GS) used as reference is the set of all true positive instances. It was computed using the correct results produced by both DPF and DPD tools (version 4.5). Hence it could lack pattern instances missed by both tools but allows a direct comparison. The remaining columns in the table, for each tool, report the number of D (detected), c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

24

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

System→ ↓Design Pattern

05-nutch GS

D

TP

FP

FN

P

Abstract Factory/spec{GoF}

1

1

1

0

0

1

Adapter/spec{GoFObject}

68

66

64

2

4

0,97

0,94

Bridge/spec{GoF}

25

23

22

1

3

0,96

0,88

Command/spec{GoF}

50

47

46

1

4

0,98

0,92

Command/spec{InnerClasses}

9

7

7

0

2

Command/spec{ExecutionEngine}

12

7

6

1

6

0,86

0,5

Decorator/spec{GoF}

27

26

25

1

2

0,96

0,93

Factory Method/spec{SingleCreator}

1

1

1

0

0

1

Factory Method/spec{GoF}

11

7

7

0

4

1

Factory Method/spec{SingleFactory}

3

3

3

0

0

Factory Method/spec{Parametrized}

26

30

26

4

0

Factory Method/spec{DelegationBased}

8

5

5

0

3

1

1 0,87

R 1

0,78

1 0,64 1 1

1

0,62

1

0,75

Iterator/spec{GoF}

4

3

3

0

1

Memento/spec{GoF}

15

14

13

1

2

Prototype/spec{GoF}

2

2

2

0

0

1

1

Prototype/spec{GoFManager}

2

2

2

0

0

1

1

Proxy/spec{Indirection}

7

7

7

0

0

1

Proxy/spec{GoF}

43

39

38

1

5

Singleton/spec{GoF}

11

7

7

0

4

1

Singleton/spec{Enumerative}

2

2

2

0

0

1

Singleton/spec{Pool}

4

4

4

0

0

1

Singleton/spec{Relaxed}

46

43

42

1

4

Strategy/spec{GoF}

39

36

36

0

3

1

0,92

Template Method/spec{GoF}

38

31

31

0

7

1

0,82

0,93

0,97

0,98

0,87

1 0,88 0,64 1 1 0,91

Table VII. Results obtained on benchmark 05-nutch

Tp (true positives), Fp (false positives) and the computed values of Precision (P) and Recall (R) computed on the results as provided by the tool and validated by an expert. The first consideration about the results is related to the presence of false positive and false negative. The percentage of false positive is less than 0.4% and 2% for respectively DPF and DPD that is quite acceptable for both tools. However for some patterns, and for both the approaches, the number of false positive is particularly higher than for other patterns. This happens, for DPF on JHotDraw 7, for Template Method patterns in which of 110 instances, 10 instances were not template methods. Inspecting those cases revealed a problem in the structure of the specification that, even correct, was too relaxed. This caused some internal helper methods to be considered as template methods. For the Observer pattern (in JHotDraw 7) the results were similar, since the approach detected 105 observers instances (one for each concrete participant) but 9 of them were not Observers. The case of Observer design pattern is also interesting for what concerns the detection of patterns variants. DPF detected 96 true Observer instances on JHotDraw 7 (with 9 false positives) and 48 instances on Apache Avro 1.6 (no false positives found). Our pattern specification repository actually was comprised of 2 variants for the Observer pattern. The first one exploits standard Java types (Observable class and Listener interface). The second one is the multi-event observer reported in Figure 7. Inspecting the matched instances we found that, for the both JHotDraw 7 and Apache Avro 1.6 systems, the 96 and 48 instances respectively were all variants of the second type. This also explains why DPD tool, that is based on the classic variant, was not able to find observers on these two systems. The situation is inverted in JDT and Log4J systems for which in both cases a single event observer is found by DPD but not by DPF. Inspecting source code we found that the c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

System→

25

06-PMD

↓Design Pattern

GS

D

TP

FP

FN

P

Adapter/spec{GoFObject}

10

10

10

0

0

1

Builder/spec{GoF}

12

12

11

1

1

Command/spec{GoF}

10

6

6

0

4

Composite/spec{GoF}

8

8

7

1

1

Decorator/spec{GoF}

5

5

5

0

0

1

Factory Method/spec{SingleCreator}

3

2

2

0

1

1

Factory Method/spec{GoF}

22

18

17

1

5

Factory Method/spec{SingleFactory}

15

12

12

0

3

Factory Method/spec{Parametrized}

7

6

5

1

2

0,83

Factory Method/spec{RegistryBased}

2

3

2

1

0

0,67

Factory Method/spec{DelegationBased}

9

7

7

0

2

1

Iterator/spec{GoF}

4

5

4

1

0

0,8

1

Observer/spec{GoF}

3

3

3

0

0

1

1

Proxy/spec{GoF}

3

4

3

1

0

Singleton/spec{GoF}

9

9

9

0

0

1

Singleton/spec{Enumerative}

1

1

1

0

0

1

Singleton/spec{Relaxed}

8

8

7

1

1

Strategy/spec{GoF}

29

28

28

0

1

1

Template Method/spec{GoF}

12

12

12

0

0

1

Visitor/spec{GoF}

92

80

80

0

12

1

0,92 1 0,88

0,94 1

0,75

0,88

R 1 0,92 0,6 0,88 1 0,67 0,77 0,8 0,71 1 0,78

1 1 1 0,88 0,97 1 0,87

Table VIII. Results obtained on benchmark 06-PMD

DSL specifications missed the observer with a notify method taking one or more arguments or is indirectly called (before the actual call to the update(. . . ) methods on listener). Another interesting case is related to the Adapter and Command patterns since they have very similar static structure. The DPF tool is able, if needed, to distinguish between them by adding behavioural constraints (clarifying how pattern is used by its context) while DPD (and many other structural approaches) is not able to do this. For JhotDraw and Avro our specification was not able to distinguish among them while in JDT and Log4J cases we added further constraints leading to a better identification. In particular for the Command pattern, in both cases, DPF obtained better results in terms of both precision and recall. Actually for precision and recall we can observe that while precision is quite high for both the tools, recall for DPF is generally higher than DPD (as shown in Figure 11∗∗ ). This results are confirmed by the details showing that both tools are good at keeping the number of FP low, but for DPF the number of TP is, in the average, higher than DPD. Finally, the DPs detection was performed on the Voldemort system. The obtained results are in Table XII. Voldemort system was selected since it contains a large portion of automatically generated code, allowing to study the impact of code generation on pattern mining. As shown in Table XII, the values of precision and recall for DPF are quite high (respectively greater than 0.64 and 0.82). For DPD, while the average value of precision is similar (0.91) to the DPF one, the values of recall are not satisfactory if compared to the ones of DPF. A manual code inspection revealed that the high number of missed Prototype, Adapter and Singleton instances are mostly present in the classes generated from the ProtocolBuffer code generator†† specifications. The specifications we used in the catalog helped to obtain better results since we added variants capable of detecting inner

∗∗ The

average is performed taking into account all the meaningful precision/recall scores obtained for all searched patterns on the considered systems. †† The Google protocol buffer code generator hosted in http://code.google.com/p/protobuf/. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

26

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

System→

07-JRefactory

↓Design Pattern

GS

D

TP

FP

FN

P

R

Abstract Factory/spec{GoF}

2

2

2

0

0

1

1

Abstract Factory/spec{Parametrized}

1

1

1

0

0

1

Adapter/spec{GoFObject}

49

39

38

1

11

0,97

0,78

Adapter/spec{Dynamic}

38

33

29

4

9

0,88

0,76

Builder/spec{GoF}

4

4

4

0

0

Command/spec{GoF}

84

87

82

5

2

Command/spec{ExecutionEngine}

10

6

6

0

4

1

0,6

Factory Method/spec{SingleCreator}

9

7

7

0

2

1

0,78

Factory Method/spec{GoF}

18

16

15

1

3

Factory Method/spec{SingleFactory}

14

12

12

0

2

1

0,86

Factory Method/spec{Parametrized}

22

18

18

0

4

1

0,82

Factory Method/spec{RegistryBased}

11

11

11

0

0

1

Factory Method/spec{DelegationBased}

7

6

4

2

3

Memento/spec{GoF}

11

10

10

0

1

1

Observer/spec{EventsAsParams}

3

3

3

0

0

1

Proxy/spec{Indirection}

31

33

31

2

0

0,94

Proxy/spec{GoF}

31

31

30

1

1

0,97

Singleton/spec{GoF}

12

12

12

0

0

1

Singleton/spec{Enumerative}

24

24

24

0

0

1

Singleton/spec{Relaxed}

45

43

43

0

2

1

1

1

1

0,94

0,94

0,98

0,83

1

0,67

0,57 0,91 1 1 0,97 1 1 0,96

State/spec{GoF}

8

8

7

1

1

Strategy/spec{GoF}

50

49

49

0

1

Template Method/spec{GoF}

131

155

131

24

0

0,85

Visitor/spec{GoF}

136

132

131

1

5

0,99

0,88 1

0,88 0,98 1 0,96

Table IX. Results obtained on benchmark 07-JRefactory

System→

JHotDraw 7

Tool→

Apache Avro 1.6

DPF

↓Design Pattern

GS

D

TP

FP

Observer

96

105

96

Singleton

32

32

32

DPD

DPF

DPD

P

R

D

TP

FP

P

R

GS

D

TP

FP

P

R

D

TP

FP

P

9

0.91

1

0

0

0

-

-

48

48

48

0

1

1

0

0

0

-

0

1

1

31

30

1

0.96

0.93

27

27

26

1

0.96

0.96

21

18

3

R -

0.85

0.66

Factory Method

20

18

18

0

1

0.9

4

2

2

0.5

0.1

12

12

12

0

1

1

1

1

0

1

0.1

Template Method

110

110

100

10

0.9

0.9

16

10

6

0.63

0.09

32

28

28

0

1

0.88

13

9

4

0.69

0.28

Adapter/Command

155

123

120

3

0.97

0.77

79

71

8

0.89

0.45

33

30

30

0

1

0.90

12

9

3

0.75

0.27

Decorator

6

6

6

0

1

1

4

4

0

1

0.67

6

5

5

0

1

0.83

6

6

0

1

1

Prototype

46

34

31

3

0,91

0,67

113

40

73

0,35

0,86

5

7

5

2

0.71

1

0

0

0

-

State Strategy

194

168

165

3

0.92

0.85

213

171

42

0.80

0.88

116

137

114

23

0.82

0.98

18

18

0

1

0.15

Composite

6

5

5

0

1

0.83

6

6

0

1

1

0





















-

Table X. Precision and Recall for JHotDraw 7 and Apache Avro 1.6

classes and generic types that are consistently and widely used by the large automatically generated classes. An overview of the average precision and recall obtained for each system by both the DPD and DPF tools is shown in Figure 11. The figure shows that the precision of DPF is always greater than the one of DPD with the exception of the Log4j system. Finally, the recall values obtained for DPF are largely better than the corresponding values for DPD. 7.3. Performance issues When running the DPF tool, we have measured the execution times for each step of the detection process. Table XIII reports the measured values for each of the analyzed systems and Figure 12 c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

27

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH System→

JDT

Tool→

Log4J

DPF

DPD

DPF

DPD

↓Design Pattern

GS

D

TP

FP

P

R

D

TP

FP

P

R

GS

D

TP

FP

P

R

D

TP

FP

P

Observer

4

0

0

0

-

-

4

4

0

1

1

1

0

0

0

-

-

1

1

0

1

1

Singleton

53

66

51

15

0.77

0.96

31

31

0

1

0.58

1

33

32

1

0.97

1

10

10

0

1

0.31

R

Factory Method

288

278

276

2

0.99

0.96

21

18

3

0.86

0.06

12

12

11

1

0.92

0.92

2

2

0

1

0.17

Template Method

564

554

554

0

1

0.98

94

93

1

0.99

0.16

48

48

48

0

1

1

6

6

0

1

0.13

Adapter

673

673

673

0

1

1

274

256

18

0.93

0.38

50

49

49

0

1

0.98

15

15

0

1

0.30

Command

278

281

278

3

0.99

1

0

0

0

-

4

5

4

1

0.80

1

0

0

0

-

-

Decorator

25

24

24

0

1

0.96

10

10

0

1

2

3

2

0

0.67

1

0

0

0

-

-

Prototype

33

35

33

2

0,94

1

0

0

0

-

1

1

1

0

1

1

0

0

0

-

-

State Strategy

181

187

174

13

0.96

0.96

145

140

5

0.96

0.77

17

22

17

5

0.77

1

17

17

0

1

1

0.40 -

Composite

1

0

0

0

-

-

1

1

0

1

1

1

1

1

0

1

1

0

0

0

-

-

Proxy

13

13

13

0

1

1

0

0

0

-

-

2

2

2

0

1

1

0

0

0

-

-

Table XI. Precision and Recall for JDT and Log4J

System→

Voldemort

Tool→

DPF

DPD

↓Design Pattern

GS

D

TP

FP

P

R

D

TP

FP

P

Observer

9

9

9

0

1

1

0

0

0

-

Singleton

148

150

148

2

0.99

1

91

70

21

0.77

0.47

0.78

0.08

R -

Factory

89

84

83

1

0.99

0.93

9

7

2

Template

22

20

18

2

0.90

0.82

10

10

0

Adapter

245

245

245

0

1

1

64

61

3

Command

1

1

1

0

1

1

0

0

0

-

Decorator

14

16

14

2

1

13

13

0

1

0.93

Prototype

139

139

139

0

1

119

119

0

1

0.86

State-Strategy

112

170

109

61

103

96

7

Composite

3

3

3

0

0

0

0

0.88 1 0.64 1

0.97 1

1 0.95

0.93 -

0.45 0.25 -

0.86 -

Table XII. Precision and Recall Voldemort

Figure 11. DPF and DPD Precision (left) and Recall (right) for the analyzed systems

shows the total and average detection times of DPD and DPF tools for each systems. The pattern matching step is the most CPU time consuming. We cannot show detection times for each pattern since our approach uses the successful identifications across pattern specifications as constraints to improve the performance and hence the detection times are dependent on that. However, we calculated the average time to detect a single pattern and it resulted to be comparable to the other structural approaches. The total times in Table XIII, show that DPF exhibits a better scalability with respect to DPD Tool. Experimentation performed for tuning the patterns specifications, showed that c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

28

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

System→

JHotDraw 7

Avro 1.6

Log4J

Voldemort

Total Time

1281.503

6320.815

Average per pattern

142.389

702.313

12619.523

701.936

2930.562

1402.169

77.992

Parsing&AST extraction

69.974

325.618

255.23

414.181

39.203

Meta-model generation

137.809

428.147

1567.785

2782.866

239.872

781.442

Pattern repository detection

627.090

2296.271

4451.886

351.331

1454.022

Total Time

1125.211

4120.287

7369.613

630.406

3034.226

Average per pattern

125.023

457.810

822.631

70.044

387.224

Tool↓

Step↓

DPD

DPF

JDT Times (s) ↓

Table XIII. Execution times of the design patterns detection process

performances can be considerably improved by identifying structural and behavioral constraints that are effective at identifying a well defined variant of a pattern. Hence the approach is more effective when specifications are structured in a hierarchy and each specification is dedicated to specific pattern variants.

Figure 12. DPF and DPD total (left) and average per pattern (right) detection times

Another interesting point is that the algorithm is faster when actually the searched DPs are found in the analyzed system. On the contrary the worst performances are when no DP instances are found in the system. This is because all matches need to be executed anyway and no existing bindings are used to reduce the list of remaining matches to be evaluated. 7.4. Threats to Validity Construct validity threats concern the relationship between theory and observation. There could be imprecision and omissions in the measurements made in the proposed validations for several reasons. One of the most important limitation regards the generation of behavioral properties in presence of late binding. In this case, as already stated, we have built a call graph using Rapid Type Analysis (RTA) to reduce the set of possible callers. However the call graph still contains a super-set of the actual calls. In the properties extraction algorithm we decided to take into account the sets of possible targets in order to perform the matches. In this way, we surely don’t miss any possible binding but exposes the algorithm to the presence of false positives (since the set of successful binding can be a super-set of the actual ones). Further experimentation (with more strict policies) should be performed in order to assess if the behavior of the algorithm improves with respect to this conservative choice. Conclusion validity concerns the relationship between the treatment and the outcome. As explained in Section 6, we performed a comparison with seven systems of open benchmarks. Hence our results are still exposed to bias and human mistakes or subject to interpretation but, being the benchmark publicly available and evolved over several years, these effects should be limited. Conversely, for the second group of analyzed systems, the gold standard was computed comparing results obtained using DPF and DPD approaches. This means that, for these systems, we cannot exclude that the computation of recall is imprecise (it could c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

29

be higher than actual one since there could exist pattern instances missed by both tools used in the study to compute the gold standard). This, in future work, can be improved by performing a full analysis of the searched source code base (or using available benchmarks). Threats to internal validity concern factors that can influence our observations. In this case the identification of pattern instances was based on the expert examination of internal/external documentation and source code and hence could pose a threat to internal validity affecting the number of false negatives. Threats to external validity concern the generalization of our findings. Of course replication on further systems to confirm or contradicts the obtained results is always desirable. Moreover, we cannot claim that our approach produces the same results on different (and larger) systems. Rather, we provide quantitative information on the quality of the search for several real world systems and can affirm that precision and recall have remained consistent and independent with respect to the system size. On the performance side there is a high dependency of the overall detection performance on the quality of pattern specifications. When specifications are badly written (that means few and overlapping constraints) the performance of the algorithm degrades rapidly.

8. CONCLUSIONS AND FUTURE WORK DPs identification in a software system, together with the knowledge about the components involved in each DP instance, greatly helps system comprehension. The approach presented in this paper provides an efficient support to the detection of DP instances in an OO system. It exploits a metamodel and a derived DSL defined to represent both the patterns and the system under study. The detection process is carried out matching each model of a design pattern with the system model one. The defined DSL allows to easily specify the model of the DPs structure by the Properties characterizing them. Moreover, the DSL makes easy to specify the variants of a DP by overriding an already defined DP specification. The DPF tool has been developed to provide an automatic support to the approach. The performed experiments confirmed the ease of use of the DSL to specify a DP model and its variants as well as the correctness of the specifications. The experiments showed the high accuracy of the approach in detecting the DP instances in a system, allowing to distinguish among the DP variants too. In particular, the approach has been assessed by applying the DPF tool first to seven systems of an open benchmark proposed in [32] and [2], and then to five additional systems to compare the results from DPF with the ones obtained from the DPD approach. It is worthwhile to highlight the effectiveness of DPF in detecting DP variants that were not identified by the benchmark or the DPD tool. The average values of precision and recall, evaluated using the GS composed starting from the considered benchmarks, are good, independently of system size. For the latter group of five systems, the results show that DPF performs better and is more efficient than the DPD. As future work, DPF approach will be further improved and a deep comparison with other available approaches and tools will be performed. Future work thus involves the improvement of the metamodel in order to consider a wider set of properties to allow modeling of more complex design and architectural patterns also experimenting with anti-patterns mining.

APPENDIX - PATTERNS SPECIFICATIONS EXAMPLES In the following some DSL specifications and DPGs of of the most relevant DPs and their variants are provided. For sake of brevity and not to bury readers, all the other DSL specifications and DPGs of the remaining DPs stored in the DP catalog repository are not described here, but they are available at our repository [3]. 8.1. Object and Class Adapters The Adapter pattern has two “classic” variants [29], since it can be implemented using inheritance (the class Adapter) or composition (the object Adapter). c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

pattern adapterObject { type ADAPTER( ∗ ) { {

has f i e l d f of −type ADAPTEE ; has constructor c { has param p of type ADAPTEE ; set − f i e l d f ; }

has methods−set a d ap t e rS e t each { delegates to f i n adapteeSet ; }

}

type ADAPTEE( 1 ) { has methods−set adapteeSet ;

}

}

p a t t e r n ad a p t e r C l a s s extends a d a p t e r O b j e c t { override ADAPTER( ∗ ) { i n h e r i t s −from ADAPTEE ; }

override methods−set a d a p t e r S e t each { delegates to methods−set adapteeSet i n ADAPTEE ;

}

}

Figure 13. The DSL specifications of adapter Object- and Class- variants.

Figure 14. Object Adapter and its Class variant specification graphs

The Class Adapter, showed in figures 13 (DSL) and 14 (DPG), is based on inheritance (delegates to inherited methods) while the object Adapter exploits composition. In the DSL specification of the Object Adapter, two template types are defined: an Adapter role ADAPTER incapsulating a reference to an ADAPTEE type. The ADAPTER must define a field f of type ADAPTEE and must implement the methods of the ADAPTER interface in terms of methods of the ADAPTEE one (referring to the adapterSet methodset defined in the specification, each method of the set must delegate to a method of “f” field). The adapterClass instead, is specified as a variant of the adapterObject; in this case ADAPTER is forced to inherit from ADAPTEE and adapterSet is built considering the methods inherited from ADAPTEE. However, in real world systems the adapter pattern can be found in several forms that are slightly different from the models of Figure 13. It is often implemented for generic types, using inner classes or both. In our catalog we have three specifications to cover such implementation. Figure 15 reports the DSL of the inner class Adapter variant. It inherits from the classic Object Adapter specification and introduces the structural elements to detect the adapter as an inner generic type. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

31

p a t t e r n A d a p t e r I n n e r G e n e r i c C l a s s e s extends Ob j ec tA da p te r { override ADAPTER( ∗ ) { g e n e r i c on type T ; }

override ADAPTEE( 1 ) { g e n e r i c on type T ;

}

type CLIENT ( ∗ ) { method c l i e n t { new O b j e c t A d a p t er on type T }

} }

Figure 15. The DSL of Inner-class Generic Object Adapter variant

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

p a t t e r n Composite { type C( 1 ) { has method a { has r e t u r n type void has param c of −type C }

}

}

has method r { has r e t u r n type void has param c of −type C

has methods−set componentSetC

type LF ( ∗ ) { i n h e r i t s −from C has−n o t container co of −type C }

type CM( ∗ ) { i n h e r i t s −from C has container cm of −type C has methods−set componentSetCM each { delegates −to cm }

}

}

Figure 16. The Composite DSL specification

8.2. Composite The Composite DSL specification models the Component hierarchy using three roles: Leaf (LF), Component (C) and CM (Composite). Both Leaf (LF) and Composite (CM) must inherit the Component role. While LF has inner structure (no reference to inner components), CM must own a reference to a collection (named “cm” in the specification) having C as base type. Each method of the composite CM (referred to as “componentSet”) delegates to container methods (to allow adding/removing/iterating over inner components). For Composite, DSL specification and DPGs are respectively reported in figures 16 and 17. In Figure 17, the labels cS1 and cS2 stand respectively for componentSetC and componentSetCM. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

32

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

Figure 17. The Composite DPG

cS1 is type of

C

a

is type of

CM

is type of

cm

delegate to m

has not c

r

co

LF

cS2 delegate to

Figure 18. The Decorator DPG

1 2 3 4 5 6 7 8 9 10 11

p a t t e r n D e c o r a t o r extends Composite { ... override CM { has methods−set componentSetCM each { delegates to cm delegates to m i n C. componentSetC

}

}

}

}

}

Figure 19. An excerpt of the Decorator DSL.

8.3. Decorator The Decorator pattern can be represented as a variant of the described Composite. The added elements are represented in red in Figure 18 while an excerpt of its DSL is reported in Figure 19. The most relevant change in this variant is related to the componentSetCM method set that requires delegation to both the cm field and to the componentSetC (representing the set of method defined by the Component interface). c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

33

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

pattern Singleton { f i n a l type X { X has p r i v a t e constructor c ; X has f i e l d f of type X ; X has public s t a t i c methods−set c re a ti o nH o ok s each { depends on f ;

1 2 3 4 5 6 7 8 9

}

} }

Figure 20. The DSL of the first singleton variant

is type of X

f

X

-c cH

RET

is type of

is container of f

X

c cH

depends on

depends on

f

c cH

depends on Pool

Relaxed Singleton (overriding constructor)

GoF Singleton

*

Figure 21. GoF and relaxed Singleton graphs

8.4. Singleton The Singleton pattern is identified using three variants, whose DPGs are shown in Figure 21. The DSL of the first one, reported in Figure 20, provides a Singleton definition as given in literature [29], implemented with a final class, a private constructor and a public static getter method. To mine multiple instance getters, the variant defines a method set called “creationHooks” (the box labelled by cH in Figure 21). Each method in this set requires a dependency on the static Singleton field “f”. The second relaxed specification, called “relaxed-gof”, removes the private constraint from the constructor (refer to the red block in right part of Figure 21) and final class constraints. Finally the DPG shown on the right side of the Figure 21, models the Pool variant that allows to handle a fixed set of resources. In this case the field “f” is overridden to become a container while the method RET represents the pool’s resource manager. *Bibliography [1] http://www.eclipse.org/modeling/. [2] Comsats institute of information technology. http://research.ciitlahore.edu.pk/Groups/ SERC/DesignPatterns.aspx . [3] https://github.com/UnisannioSoftEng/DPF/wiki/Design-Pattern-Finder-Home. [4] A. Alnusair, T. Zhao, and G. Yan. Automatic recognition of design motifs using semantic conditions. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, pages 1062–1067, New York, NY, USA, 2013. ACM. [5] A. Ampatzoglou, G. Frantzeskou, and I. Stamelos. A methodology to assess the impact of design patterns on software quality. Inf. Softw. Technol., 54(4):331–346, Apr. 2012. [6] G. Antoniol, G. Casazza, M. D. Penta, and R. Fiutem. Object-oriented design patterns recovery. Journal of Systems and Software, 59(2):181–196, 2001. [7] G. Antoniol, R. Fiutem, and L. Cristoforetti. Design pattern recovery in object-oriented software. In Proceedings of the 6th International Workshop on Program Comprehension, IWPC ’98, pages 153–, Washington, DC, USA, 1998. IEEE Computer Society. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

34

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

[8] F. Arcelli and M. Zanoni. A tool for design pattern detection and software architecture reconstruction. Inf. Sci., 181(7):1306–1324, Apr. 2011. [9] F. Arcelli Fontana, A. Caracciolo, and M. Zanoni. Dpb: A benchmark for design pattern detection tools. In Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on, pages 235–244, 2012. [10] Z. Balanyi and R. Ferenc. Mining design patterns from c++ source code. In Proc. International Conference on Software Maintenance ICSM 2003, pages 305–314, Sept. 22–26, 2003. [11] I. Bayley and H. Zhu. On the composition of design patterns. In Quality Software, 2008. QSIC ’08. The Eighth International Conference on, pages 27 –36, aug. 2008. [12] F. Bergenti and A. Poggi. Improving uml designs using automatic design pattern detection. In Proc. 12th. International Conference on Software Engineering and Knowledge Engineering (SEKE 2000, pages 336–343, 2000. [13] M. Bernardi, C. M., and D. L. G.A. A model-driven graph-matching approach for design pattern detection. In 20th Working Conference on Reverse Engineering (WCRE), pages 172–181, 2013. [14] M. L. Bernardi and G. A. Di Lucca. Model-driven detection of design patterns. In Proceedings of the 26th IEEE International Conference on Software Maintenance, ICSM ’10, September 12-18, Timioara, Romania, 2010. [15] D. Beyer. Relational programming with crocopat. In Proceedings of the 28th international conference on Software engineering, ICSE ’06, pages 807–810, New York, NY, USA, 2006. ACM. [16] D. Beyer and C. Lewerentz. Crocopat: efficient pattern analysis in object-oriented programs. pages 294–295, 2003. [17] A. Binun and G. Kniesel. Dpjf - design pattern detection with high accuracy. In Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on, pages 245–254, 2012. [18] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Behavioral pattern identification through visual language parsing and code instrumentation. In Proceedings of the 2009 European Conference on Software Maintenance and Reengineering, CSMR ’09, pages 99–108, Washington, DC, USA, 2009. IEEE Computer Society. [19] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Design pattern recovery through visual language parsing and source code analysis. Journal of Systems and Software, 82(7):1177 – 1193, 2009. [20] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. An eclipse plug-in for the detection of design pattern instances through static and dynamic analysis. In Software Maintenance (ICSM), 2010 IEEE International Conference on, pages 1–6, 2010. [21] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Improving behavioral design pattern detection through model checking. In Software Maintenance and Reengineering (CSMR), 2010 14th European Conference on, pages 176–185, 2010. [22] A. De Lucia, V. Deufemia, C. Gravino, M. Risi, and G. Tortora. An eclipse plug-in for the identification of design pattern variants. In Sixth Workshop of the Italian Eclipse Community (Eclipse-IT 2011, pages 40–51, 2011. [23] J. Dong and Y. Zhao. Experiments on design pattern discovery. In Predictor Models in Software Engineering, 2007. PROMISE’07: ICSE Workshops 2007. International Workshop on, pages 12–12, May 2007. [24] J. Dong, Y. Zhao, and T. Peng. Architecture and design pattern discovery techniques - a review. In H. R. Arabnia and H. Reza, editors, Software Engineering Research and Practice, pages 621–627. CSREA Press, 2007. [25] J. Dong, Y. Zhao, and Y. Sun. A matrix-based approach to recovering design patterns. Trans. Sys. Man Cyber. Part A, 39(6):1271–1282, Nov. 2009. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

35

[26] F. A. Fontana, M. Zanoni, and S. Maggioni. Using design pattern clues to improve the precision of design pattern detection tools. Journal of Object Technology, 10:4: 1–31, 2011. [27] R. France, D. Kim, S. Ghosh, and E. Song. A uml-based pattern specification technique. Software Engineering, IEEE Transactions on, 30(3):193 – 206, march 2004. [28] L. J. Fulop, T. Gyovai, and R. Ferenc. Evaluating c++ design pattern miner tools. In Proceedings of the Sixth IEEE International Workshop on Source Code Analysis and Manipulation, SCAM ’06, pages 127–138, Washington, DC, USA, 2006. IEEE Computer Society. [29] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design patterns: elements of reusable objectoriented software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995. [30] H. A. Ghulam Rasool. Discovering variants of design patterns. Journal of Basic and Applied Scientific Research, pages 139–147, 2013. [31] M. Goldstein and D. Moshkovich. System grokking: a novel approach for software understanding, validation, and evolution. In Proceedings of the 7th international conference on Next generation information technologies and systems, NGITS’09, pages 38–49, Berlin, Heidelberg, 2009. SpringerVerlag. [32] Y. G. Gu´eh´eneuc. P-mart: Pattern-like micro architecture repository,. In Proceedings of the 1st EuroPLoP Focus Group on Pattern Repositories. Michael , Aliaksandr Birukou, and Paolo Giorgini, 2007, http://www.ptidej.net/tool/designpatterns/. [33] Y. G. Gueheneuc and G. Antoniol. Demima: A multilayered approach for design pattern identification. IEEE Transactions on Software Engineering, 34(5):667–684, 2008. [34] Y. G. Gu´eh´eneuc, J. Y. Guyomarc’H, and H. Sahraoui. Improving design-pattern identification: a new approach and an exploratory study. Software Quality Control, 18(1):145–174, Mar. 2010. [35] Y. G. Gu´eh´eneuc, H. A. Sahraoui, and F. Zaidi. Fingerprinting design patterns. In 11th Working Conference on Reverse Engineering (WCRE 2004), pages 172–181, 2004. [36] A. L. Guennec, G. Sunye, and J. marc Jezequel. Precise modeling of design patterns. In In Proceedings of UML00, pages 482–496. Springer Verlag, 2000. [37] J. Heering, P. R. H. Hendriks, P. Klint, and J. Rekers. The syntax definition formalism sdf reference manual. SIGPLAN Not., 24(11):43–75, Nov. 1989. [38] D. Heuzeroth, T. Holl, G. H¨ogstr¨om, and W. L¨owe. Automatic design pattern detection. In Proceedings of the 11th IEEE International Workshop on Program Comprehension, IWPC ’03, pages 94–, Washington, DC, USA, 2003. IEEE Computer Society. [39] H. Huang, S. Zhang, J. Cao, and Y. Duan. A practical pattern recovery approach based on both structural and behavioral analysis. Journal of Systems and Software, 75(12):69 – 87, 2005. Software Engineering Education and Training. [40] O. Kaczor, Y. Gueheneuc, and S. Hamel. Efficient identification of design patterns with bit-vector algorithm. In Proc. 10th European Conference on Software Maintenance and Reengineering CSMR 2006, pages 10 pp.–184, 2006. [41] R. K. Keller, R. Schauer, S. Robataille, and B. Lagu¨e. Advances in software engineering. chapter Pattern-based design recovery with SPOOL, pages 113–135. Springer-Verlag New York, Inc., New York, NY, USA, 2002. [42] H. Kim and C. Boldyreff. A method to recover design patterns using software product metrics. In Proceedings of the 6th International Conerence on Software Reuse: Advances in Software Reusability, ICSR-6, pages 318–335, London, UK, UK, 2000. Springer-Verlag. [43] C. Kramer and L. Prechelt. Design recovery by automated search for structural design patterns in object-oriented software. In Proceedings of the 3rd Working Conference on Reverse Engineering (WCRE ’96), WCRE ’96, pages 208–, Washington, DC, USA, 1996. IEEE Computer Society. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

36

M.L. BERNARDI - M. CIMITILE - G. DI LUCCA

[44] M. P. L. Prechelt, B. Unger-Lamprecht and W. Tichy. Two controlled experiments assessing the usefulness of design pattern documentation in program maintenance. IEEE Trans. Softw. Eng., 28(6):595–606, 2002. [45] K. N. Loo and S. Lee. Representing design pattern interaction roles and variants. In Computer Engineering and Technology (ICCET), 2010 2nd International Conference on, volume 6, pages V6– 470–V6–474, 2010. [46] K. N. Loo, S. P. Lee, and T. K. Chiew. Uml extension for defining the interaction variants of design patterns. Software, IEEE, 29(5):64–72, 2012. [47] J. K. Y. Ng, Y. G. Gueheneuc, and G. Antoniol. Identification of behavioural and creational design motifs through dynamic analysis. J. Softw. Maint. Evol., 22(8):597–627, Dec. 2010. [48] J. Niere, W. Sch¨afer, J. P. Wadsack, L. Wendehals, and J. Welsh. Towards pattern-based design recovery. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pages 338–348, New York, NY, USA, 2002. ACM. [49] J. Paakki, A. Karhinen, J. Gustafsson, L. Nenonen, and A. I. Verkamo. Software metrics by architectural pattern mining. In Proceedings of the International Conference on Software: Theory and Practice (16th IFIP World Computer Congress, pages 325–332, 2000. [50] T. Peng, J. Dong, and Y. Zhao. Verifying behavioral correctness of design pattern implementation. In Proceedings of the Twentieth International Conference on Software Engineering & Knowledge Engineering (SEKE’2008), pages 454–459, 2008. [51] N. Pettersson, W. Lowe, and J. Nivre. Evaluation of accuracy in design pattern occurrence detection. IEEE Trans. Softw. Eng., 36(4):575–590, July 2010. [52] I. Philippow, D. Streitferdt, M. Riebisch, and S. Naumann. An approach for reverse engineering of design patterns. Software and System Modeling, 4(1):55–70, 2005. [53] G. Rasool and P. M¨ader. Flexible design pattern detection based on feature types. In 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), Lawrence, KS, USA, November 6-10, pages 243–252, 2011. [54] G. Rasool, I. Philippow, and P. M¨ader. Design pattern recovery based on annotations. Adv. Eng. Softw., 41(4):519–526, Apr. 2010. [55] G. Rasool and D. Streitfdert. A survey on design pattern recovery techniques. IJCSI International Journal of Computer Science Issues, 8(2):251 – 260, 2011. [56] N. Shi and R. A. Olsson. Reverse engineering of design patterns from java source code. In Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, ASE ’06, pages 123–134, Washington, DC, USA, 2006. IEEE Computer Society. [57] J. M. Smith and D. Stotts. Spqr: flexible automated design pattern extraction from source code. In Proc. 18th IEEE International Conference on Automated Software Engineering, pages 215–224, Oct. 6–10, 2003. [58] K. Stencel and P. Wegrzynowicz. Detection of diverse design pattern variants. In Software Engineering Conference, 2008. APSEC ’08. 15th Asia-Pacific, pages 25–32, Dec 2008. [59] K. Stencel and P. Wegrzynowicz. Implementation variants of the singleton design pattern. In Proceedings of the OTM Confederated International Workshops and Posters on On the Move to Meaningful Internet Systems: 2008 Workshops: ADI, AWeSoMe, COMBEK, EI2N, IWSSA, MONET, OnToContent + QSI, ORM, PerSys, RDDS, SEMELS, and SWWS, OTM ’08, pages 396–406, Berlin, Heidelberg, 2008. Springer-Verlag. [60] P. Tonella, M. Torchiano, B. Du Bois, and T. Syst¨a. Empirical studies in reverse engineering: state of the art and future trends. Empirical Softw. Engg., 12(5):551–571, Oct. 2007. [61] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. T. Halkidis. Design pattern detection using similarity scoring. IEEE Trans. Softw. Eng., 32(11):896–909, Nov. 2006. c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

DESIGN PATTERNS DETECTION USING A DSL-DRIVEN GRAPH MATCHING APPROACH

[62] M. Vok´ac. An efficient tool for recovering design patterns from c++ code. Technology, 5(1):139–157, 2006.

37

Journal of Object

[63] M. von Detten and S. Becker. Combining clustering and pattern detection for the reengineering of component-based software systems. In Proceedings of the joint ACM SIGSOFT conference – QoSA and ACM SIGSOFT symposium – ISARCS on Quality of software architectures – QoSA and architecting critical systems – ISARCS, QoSA-ISARCS ’11, pages 23–32, New York, NY, USA, 2011. ACM. [64] Y. Wang and J. Huang. Formal modeling and specification of design patterns using rtpa. IJCINI, pages 100–111, 2008. [65] P. Wegrzynowicz and K. Stencel. Relaxing queries to detect variants of design patterns. In Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on, pages 1571–1578, 2013. [66] L. Wendehals. Improving design pattern instance recognition by dynamic analysis. In Proceedings of the 2003 International Workshop on Dynamic Systems Analysis, WODA ’03, pages 29–32. ACM, 2003. [67] L. Wendehals and A. Orso. Recognizing behavioral patterns atruntime using finite automata. In Proceedings of the 2006 International Workshop on Dynamic Systems Analysis, WODA ’06, pages 33–40, New York, NY, USA, 2006. ACM.

c 2013 John Wiley & Sons, Ltd. Copyright ⃝ Prepared using smrauth.cls

J. Softw. Maint. Evol.: Res. Pract. (2013) DOI: 10.1002/smr

Suggest Documents