focus
software evolution
An Architecture-Driven Modernization Tool for Calculating Metrics
Javier Luis Cánovas Izquierdo and Jesús García Molina, University of Murcia
A case study shows how Architecture-Driven Modernization (ADM) facilitated interoperability among modernization tools by defining metamodels to generate PL/SQL trigger metrics in legacy Oracle Forms applications.
Model-driven software development (MDD) is gaining increasing acceptance, mainly because it can raise the level of abstraction and automation in software construction. MDD techniques (see the sidebar “MDD Basic Concepts”), such as metamodeling and model transformation, apply not only to the creation of new software systems but also to the evolution of existing ones. These techniques can help reduce software evolution costs by automating many basic activities in software change processes, such as representing source code at a higher level of abstraction, providing information to analyze the impact of changes, or automatically generating software artifacts of the evolved system. Several experiences of applying MDD in platform migration scenarios have recently been published [1, 2], but they define ad hoc metamodels that hinder interoperability.

In 2003, the Object Management Group (OMG) launched the Architecture-Driven Modernization (ADM) initiative for applying MDD technology in software modernization (http://adm.omg.org). ADM’s objective is to develop a set of standard metamodels that represent the information involved in a modernization process and so facilitate interoperability among tools. In ADM, modernization refers to understanding and evolving existing software assets to maintain their business value. Just as OMG’s Model Driven Architecture (MDA) approach has made a great contribution to MDD’s popularity (www.omg.org/mda), ADM standards could play an important role in promoting the application of MDD to software evolution.

To the best of our knowledge, no one has published any practical experiences of applying ADM. This lack of information is understandable because the first implementation of ADM’s core metamodel, the Knowledge Discovery Metamodel (KDM), was delivered in 2008. Several important modernization-tool vendors are working on ADM and have presented some KDM examples (http://kdmanalytics.com/kdmexamples/index.php), but they’re very simple and aren’t part of a detailed case study.

We put ADM into practice by building a modernization tool that generates metric reports for legacy Oracle Forms applications to assess migration efforts. This case study illustrates the two main tasks in an ADM process: KDM model extraction and using these models to generate the artifacts involved in a modernization process. We describe how model transformations (code-to-model, model-to-model, and model-to-text) can automate these tasks.

0740-7459/10/$26.00 © 2010 IEEE. IEEE Software, July/August 2010.

MDD Basic Concepts
The three principal features of model-driven software development (MDD) are
■ domain-specific languages (DSLs) that express models at different abstraction levels,
■ DSL abstract and concrete (that is, notation) syntaxes that are defined separately, and
■ model transformations for generating software artifacts (for example, code) from models, either directly by model-to-text transformations or indirectly by intermediate model-to-model transformations.
An abstract syntax is normally defined by a metamodel that uses a metamodeling language to describe the set of language concepts and their relationships. Metamodeling languages typically provide object-oriented constructs with which to build metamodels. The relationship between a model and its metamodel is commonly referred to as “conforms-to.”
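As a rough illustration of the “conforms-to” relationship described in the MDD Basic Concepts sidebar, the following Python sketch checks a toy model against a toy metamodel. The `Metaclass` and `ModelElement` types, the `conforms_to` function, and the `Trigger` concept are hypothetical names introduced only for this sketch; real metamodeling languages offer far richer constructs (references, multiplicities, inheritance).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metaclass:
    """A language concept: a name plus the attribute names it allows."""
    name: str
    attributes: set[str] = field(default_factory=set)


@dataclass
class ModelElement:
    """A model instance; `type_name` names the metaclass it instantiates."""
    type_name: str
    values: dict[str, object] = field(default_factory=dict)


def conforms_to(model: list[ModelElement], metamodel: dict[str, Metaclass]) -> bool:
    """Return True if every element instantiates a known metaclass and
    uses only attributes that the metaclass declares."""
    for element in model:
        metaclass = metamodel.get(element.type_name)
        if metaclass is None:
            return False
        if not set(element.values) <= metaclass.attributes:
            return False
    return True


# Toy metamodel with a single concept (hypothetical, not taken from KDM).
metamodel = {"Trigger": Metaclass("Trigger", {"name", "body"})}

good_model = [ModelElement("Trigger", {"name": "WHEN-BUTTON-PRESSED"})]
bad_model = [ModelElement("Trigger", {"colour": "red"})]

print(conforms_to(good_model, metamodel))
print(conforms_to(bad_model, metamodel))
```

The check is deliberately shallow: it validates only element types and attribute names, which is enough to show that a metamodel constrains which models are well formed.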
Metamodels in ADM
The ADM initiative includes the definition of seven metamodels, although only three are currently available: version 1.1 of the KDM and beta versions of the Abstract Syntax Tree Metamodel (ASTM) and the Software Measurement Metamodel (SMM). The other four (analysis program, visualization, refactoring, and transformation) are under development.

ASTM and KDM complement each other in modeling software systems’ syntax and semantics. Whereas ASTM lets you use abstract syntax trees (ASTs) mainly to represent the source code’s syntax, KDM lets you represent semantic information about a software system, ranging from source code to higher-level abstractions such as GUI events, platforms, or business rules. To promote reuse, the ASTM metamodel has two parts. A base metamodel called the Generic Abstract Syntax Tree Metamodel (GASTM) factors out the elements common to most programming languages, and a separate metamodel called the Specialized Abstract Syntax Tree Metamodel (SASTM) represents the properties specific to a particular programming language.

KDM is the principal metamodel of ADM because it provides a common interchange format for representing existing software assets, thus allowing tool interoperability. KDM is organized in several domains or packages that each correspond to an architectural view of the system (for example, platform, user interface, or data). These packages are grouped into four abstraction layers to improve modularity and separation of concerns: infrastructure, program elements, runtime resource, and abstractions. Moreover, because some modernization activities (for example, static analysis) require precise semantics for the source code statements, the KDM specification includes the additional micro-KDM package, which provides predefined, non-language-specific semantics.

IEEE Software, www.computer.org/software

SMM is a metamodel that can represent both metrics and measurements. It includes elements to describe metrics on KDM models and elements to represent the measurements of those metrics. SMM requires an execution engine to compute metrics on KDM models, but none is available at this time.

KDM and ASTM define several conformance levels. Each level specifies which models a tool must import or export to comply with that level. In KDM, level 0 (L0) ensures compliance for the packages of both the infrastructure and program elements layers. Level 1 (L1) extends L0 and is defined separately for each package of the runtime resource and abstractions layers and for micro-KDM. A tool is level 2 (L2) compliant when it conforms to L1 for all KDM packages. In ASTM, L0 ensures compliance for the syntax elements of the GASTM and SASTM metamodels, whereas a tool is L1 compliant when it also supports the semantic elements (for example, information on scope and references among elements).
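The KDM conformance levels described above can be sketched as a small classification function. This is a minimal Python illustration, not part of any KDM tooling: the `compliance` function is hypothetical, and the package names assigned to each layer are assumptions made for the example (the text names only platform, user interface, data, and micro-KDM explicitly).

```python
from __future__ import annotations

# Packages of the infrastructure and program elements layers (assumed names).
L0_PACKAGES = {"core", "kdm", "source", "code", "action"}

# Packages of the runtime resource and abstractions layers, plus micro-KDM
# (assumed names). L1 compliance is declared per package in this group.
L1_PACKAGES = {
    "platform", "ui", "event", "data",
    "conceptual", "structure", "build",
    "micro-kdm",
}


def compliance(supported: set[str]) -> str:
    """Classify a tool by the KDM packages it can import or export."""
    if not L0_PACKAGES <= supported:
        return "none"                      # L0 is the minimum for any level
    l1_supported = sorted(supported & L1_PACKAGES)
    if set(l1_supported) == L1_PACKAGES:
        return "L2"                        # L1 for every package
    if l1_supported:
        return "L1 (" + ", ".join(l1_supported) + ")"
    return "L0"


print(compliance(L0_PACKAGES))
print(compliance(L0_PACKAGES | {"micro-kdm"}))
print(compliance(L0_PACKAGES | L1_PACKAGES))
```

Under these assumptions, a tool such as the extractor discussed in this article would fall into the second case: L0 plus L1 for micro-KDM only.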
Overview of a Modernization Tool and Process in ADM
We’ve chosen a practical example that comes from our experience in a modernization project migrating Oracle Forms applications to a Java platform. An important part of the migration involves PL/SQL triggers in legacy Forms code. In our experience, a major factor that determines the time and effort required to migrate a trigger is its coupling to the user interface (UI), that is, the number and kind of statements that access the UI. A tool to analyze this coupling would therefore be useful for estimating migration costs. We built such a tool (http://modelum.es/gra2mol/metrics-adm), which serves as a running example in this article.

Extracting KDM models from source code written in a general-purpose programming language (GPL) is a key activity in applying ADM because it provides models that represent the existing source code at a high level of abstraction (that is, it is a reverse-engineering process). These models are the starting point for automating modernization process activities because they can be transformed into artifacts such as reports for decision making (for example, metrics or control flow graphs), architectural views, or source code. We therefore structured our tool as two components: an extractor that generates KDM models from PL/SQL code and a metrics report generator for these KDM models (see Figure 1). We built these components on the Agile Generative Environment (AGE), which provides domain-specific languages (DSLs) for basic MDD tasks [3], for example, RubyTL [4] for model-to-model (M2M) transformations and Texplate for model-to-text (M2T) transformations.

Extracting models from GPL code requires producing a model that conforms to a target metamodel (for example, KDM) from a program that conforms to a grammar (for example, PL/SQL). Developers usually build dedicated parsers to perform this code-to-model (C2M) transformation. However, when we faced the problem of extracting models from Java code, we decided that a DSL might make this task easier and more productive. So, we extended AGE with Gra2MoL (Grammar to Model Language) [5], a rule-based DSL like RubyTL that integrates a query language specially tailored to navigate ASTs.

Before describing how we implemented each component, we provide an overview of an ADM modernization process (see Figure 2). You should split the KDM model extraction process into two steps because the abstraction gap between source code and KDM is large. First, you write a C2M transformation to extract ASTM models from code, and then you write M2M transformations to generate KDM models from these intermediate models. The data structures most commonly used by modernization tools to represent source code are syntax trees, and ADM offers the ASTM metamodel to represent such structures as models.
For the PL/SQL-to-KDM extractor, we used Gra2MoL to extract an ASTM model from PL/SQL code, particularly from PL/SQL triggers, and we used RubyTL to write the M2M transformations that produce the KDM models (see Figure 2). Once you have KDM models, you must define mappings between existing system elements and target elements to generate the desired solution, for instance, PL/SQL metrics reports in our tool. These mappings are implemented by M2M transformations whose target models conform either to KDM or to other ADM metamodels that represent some sort of metadata related to modernization (for example, metrics or visualization data). However, the current ADM status might force us to define ad hoc metamodels for some metadata. For instance, in our metric generator, we defined our own Metrics metamodel to account for each trigger’s coupling because the ADM metamodel for metrics, SMM, still requires the definition of a DSL (that is, the concrete syntax and the DSL execution engine) to specify metrics and calculate measurements on KDM models. We obtained the reports using RubyTL for KDM-to-Metrics transformations and Texplate to generate reports from Metrics models (see Figure 2b).

Although extracting ASTM models and transforming them into KDM models is common to any ADM-based modernization project, what is done with the KDM models depends on the activity being automated. For instance, to generate source code in a migration project, M2M transformations should transform the extracted KDM models into KDM models representing the target architecture, and M2T transformations should then be applied to the result.

Figure 1. An overview of the developed tool. The tool uses ADM metamodels for generating metrics reports from PL/SQL source code automatically. [Diagram: within ADM-PLSQL-Tool, the KDM extractor (Gra2MoL, RubyTL) turns PL/SQL source code into a KDM model, and the metric generator (RubyTL, Texplate) turns the KDM model into a metric report. Both components are built on the Agile Generative Environment.]
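The overall process described in this section (C2M extraction, two M2M transformations, and a final M2T step) can be pictured as a chain of functions. The Python sketch below is only an analogy: it uses plain dictionaries as stand-ins for models, and every function name is hypothetical. It does not reproduce Gra2MoL, RubyTL, or Texplate, and the naive splitting on semicolons stands in for real parsing.

```python
from typing import Callable, Sequence

Model = dict  # stand-in for a model that conforms to some metamodel


def extract_astm(source: str) -> Model:
    """C2M step: naive statement splitting instead of real parsing."""
    statements = [s.strip() for s in source.split(";") if s.strip()]
    return {"metamodel": "ASTM", "statements": statements}


def astm_to_kdm(astm: Model) -> Model:
    """M2M step: one stand-in action element per statement."""
    return {"metamodel": "KDM", "actions": [{"text": s} for s in astm["statements"]]}


def kdm_to_metrics(kdm: Model) -> Model:
    """M2M step: derive a (trivial) measurement from the KDM stand-in."""
    return {"metamodel": "Metrics", "action_count": len(kdm["actions"])}


def metrics_to_report(metrics: Model) -> str:
    """M2T step: render the measurement as one CSV line."""
    return f"action_count,{metrics['action_count']}"


def run_pipeline(source: str, stages: Sequence[Callable]):
    """Feed the output of each stage into the next one."""
    artifact = source
    for stage in stages:
        artifact = stage(artifact)
    return artifact


report = run_pipeline(
    "found := FALSE; :WCGA := NULL;",
    [extract_astm, astm_to_kdm, kdm_to_metrics, metrics_to_report],
)
print(report)
```

Because each stage consumes and produces a model, replacing the last two stages (for example, with KDM-to-KDM and code generation steps) changes the automated activity without touching the extraction part.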
ASTM Model Extraction
To extract ASTM models from GPL code, you must apply a C2M transformation that establishes a mapping between the GPL’s grammar and the ASTM metamodel, which is a combination of the GASTM metamodel and the SASTM metamodel of the GPL. You must therefore create the SASTM metamodel before you implement this transformation. In the case of our tool, the SASTM metamodel extends the GASTM metamodel to represent specific PL/SQL concepts, such as data definition language statements. Figure 3 shows an excerpt of these metamodels in which SASTM metaclasses inherit from GASTM metaclasses.

When extracting models from GPL code, the main task is collecting the scattered information needed to create the model elements that represent the source code statements. This scattering is caused mainly by the way GPL source code represents references between elements. Whereas such references are explicit in the models, they’re implicitly established in the source code through the use of identifiers, such as the reference between a variable identifier and its declaration. Transforming an identifier-based reference into an explicit reference involves looking for the identified element in the source code. Figure 4 illustrates the scattering problem when extracting model elements from a PL/SQL variable.

The scattering problem requires complex processing to locate the correspondences between source code and model elements. To facilitate this, Gra2MoL incorporates a powerful XPath-like query language specially built for resolving references. This language lets you express which elements you want to collect without specifying how to obtain them. Moreover, by using this query language, we’ve obtained L1 ASTM models because it lets us establish references among elements (see query examples at http://modelum.es/gra2mol/metrics-adm).

Figure 2. Architecture-Driven Modernization process. It comprises two main steps: the Knowledge Discovery Metamodel (KDM) model extraction and metric report generation. The information items in parentheses link the process to our tool example. [Diagram: source code (PL/SQL) → c2m (Gra2MoL) → ASTM model → m2m (RubyTL) → KDM model → m2m (RubyTL) → Metrics model → m2t (Texplate) → metric report. The first two transformations form the KDM model extraction process, and the last two form the metric report generator. The ASTM model conforms to the ASTM metamodel (GASTM plus the PL/SQL SASTM), the KDM model to the KDM metamodel, and the Metrics model to the Metrics metamodel.]

Figure 3. An excerpt of the Generic Abstract Syntax Tree Metamodel (GASTM) and PL/SQL Specialized Abstract Syntax Tree Metamodel (SASTM) metamodels. It shows only the syntax elements. [Diagram: GASTM metaclasses shown: GASTMSyntaxObject, DefinitionObject, Type, DeclarationOrDefinition, Datatype, Declaration, Definition, Expression, Statement, PrimitiveType, OtherSyntaxObject, IfStatement, FunctionCallExpression, LoopStatement, BinaryExpression. PL/SQL SASTM metaclasses shown: RDBTableType, RDBSelectStatement, RDBTableDefinition, RDBColumnType, RDBModifyStatement, RDBColumnDefinition, RDBDatabaseType, RDBInsertStatement, RDBSelectExpression, RDBHostVariableExpression.]
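To make the scattering problem more concrete, the following Python sketch turns identifier-based references into explicit object references with a simple two-pass lookup. It does not use Gra2MoL or its query language; the class names loosely follow the element names visible in Figure 4, and `resolve_references` is a hypothetical helper. The sketch also ignores scoping, which a real extractor would have to handle.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableDeclaration:
    name: str
    type_name: str


@dataclass
class IdentifierReference:
    name: str
    # Explicit reference to the declaration; empty until resolution runs.
    declaration: Optional[VariableDeclaration] = None


def resolve_references(
    declarations: list[VariableDeclaration],
    references: list[IdentifierReference],
) -> list[str]:
    """Link each identifier use to its declaration and return the names
    that could not be resolved."""
    # Pass 1: index declarations by name. Unquoted PL/SQL identifiers are
    # case-insensitive, so names are normalized to lowercase.
    by_name = {d.name.lower(): d for d in declarations}

    # Pass 2: replace the implicit, name-based link with an object reference.
    unresolved = []
    for ref in references:
        ref.declaration = by_name.get(ref.name.lower())
        if ref.declaration is None:
            unresolved.append(ref.name)
    return unresolved


# Mirrors the snippet in Figure 4: "found" is declared, then assigned.
decl = VariableDeclaration("found", "boolean")
refs = [IdentifierReference("FOUND"), IdentifierReference("total")]
unresolved = resolve_references([decl], refs)

print(refs[0].declaration is decl)
print(unresolved)
```

The point of the example is the separation between collecting declarations and linking uses: the information needed to build one model element is found in a different place in the source than the element itself.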
Generating KDM Models
We generated KDM models from ASTM models by applying a chain of M2M transformations. First, an initial transformation generates an L0 KDM model from the ASTM model. Then you can convert this generated model into one or more L1 KDM models, depending on which architectural views of the legacy system you need to achieve the desired results. In our case, an L1 micro-KDM compliant model was sufficient, because this level provides the models required to perform the measurements on PL/SQL (that is, an abstract representation of the code).

Figure 4. Example of the scattering problem in a PL/SQL code snippet. The references between elements are implicitly established through the use of identifiers. The only reference in this example appears as a filled red arrow. [Diagram: the snippet DECLARE found boolean; BEGIN ... found := FALSE; ... END is shown alongside the model elements extracted from it. Labels recovered from the diagram: VariableDeclaration, identifierName, Name (nameString = found), type, UnnamedTypeReference, Boolean, ExpressionStatement, operator, Equal, leftOperand, rightOperand, IdentifierReference, name, Name (nameString = found), Literal (value = "False").]

Figure 5. Model examples used to measure metrics. (a) The Metrics metamodel. (b) The Knowledge Discovery Metamodel (KDM) example for the :WCGA := NULL statement. (c) A metrics model example for the KDM model shown. [Diagram labels recovered: (a) Measurement (name : String), ComplexMeasurement (measurements, 0..*), ValueMeasurement (value : String), LinkMeasurement (elements, 0..* ModelElement from KDM::Core); (b) ActionElement (kind = assignment), codeElement, Value (name = NULL), Writes (from, to), StorableUnit, name = :WCGA; (c) ComplexMeasurement (name = "Imperative Writings"), measurements, ValueMeasurement (value = 1), LinkMeasurement, elements.]

The L0 KDM model consists of two models related to the KDM infrastructure and program elements layers: code and inventory models. Whereas the code model represents the existing software system’s source code, the inventory model is a catalog of the system’s software artifacts (for example, physical files and directories). The KDM metamodel doesn’t contain elements to represent specific statements or expressions of a particular programming language, but represents them as ActionElement elements. The type of operation an ActionElement performs is established by the kind attribute, whose values (that is, strings) can be those that the micro-KDM package specifies, for example, assignments and binary operators (see Figure 5b). So, we obtained an L1 micro-KDM compliant model by filling the kind attribute of each ActionElement element in an L0 KDM model.

Figure 6. The coupling distribution of three Oracle Forms of a student management system. Each bar indicates the proportion of triggers that contain a particular kind of coupling. A trigger can contain more than one type of coupling.

Coupling distribution (percentage of triggers per form)
Coupling          Expense   Exams   Europrojects
R/W imperative    31.25     60.00   93.33
R/W declarative   18.75     42.86   22.67
Reflective         0.00      0.00   16.00

In our KDM extractor, we used RubyTL to obtain L1 micro-KDM compliant models from ASTM models. To achieve a separation of concerns, we exploited the RubyTL modularity mechanism [4] to separate building the code and inventory models into two phases. The first phase generates the L0 code model and assigns micro-KDM actions to the kind attribute. The main issue to address in this phase is the creation of the corresponding ActionElement for statements and expressions. The second phase generates the inventory model, which is built from the triggers of the PL/SQL code so that each trigger refers to its implementation in the code model.
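The first phase described above, which creates an ActionElement for each statement or expression and fills its kind attribute, can be sketched as a recursive mapping over a syntax tree. The Python code below is an assumption-laden illustration rather than the RubyTL transformation used in the tool: the `ActionElement` class is a simplified stand-in for the KDM metaclass, the AST is a nested dictionary, and the kind strings in `KIND_BY_NODE_TYPE` are placeholders (only "assignment" appears in the article, in Figure 5b).

```python
from dataclasses import dataclass, field


@dataclass
class ActionElement:
    """Simplified stand-in for KDM's ActionElement."""
    kind: str
    name: str = ""
    children: list = field(default_factory=list)


# Hypothetical mapping from AST node types to kind strings.
KIND_BY_NODE_TYPE = {
    "AssignmentStatement": "assignment",
    "BinaryExpression": "binary-operator",
    "FunctionCallExpression": "call",
}


def to_action_element(node: dict) -> ActionElement:
    """Create one ActionElement per AST node and fill its kind attribute."""
    kind = KIND_BY_NODE_TYPE.get(node["type"], "unknown")
    children = [to_action_element(child) for child in node.get("children", [])]
    return ActionElement(kind=kind, name=node.get("text", ""), children=children)


# Toy AST for an assignment whose right-hand side is a function call.
ast = {
    "type": "AssignmentStatement",
    "text": "v := NAME_IN('block.item')",
    "children": [
        {"type": "FunctionCallExpression", "text": "NAME_IN('block.item')"},
    ],
}

action = to_action_element(ast)
print(action.kind, [child.kind for child in action.children])
```

Keeping the node-type-to-kind table separate from the traversal is one way to make the language-independent part of such a mapping reusable across grammars.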
Using KDM Models to Measure Metrics
Once we have KDM models, we have an appropriate representation of the existing system from which to generate artifacts related to the modernization process. In our case, we used L1 micro-KDM compliant models to obtain metrics on the coupling between the code and the UI in PL/SQL triggers of Oracle Forms applications.

We defined several metrics to measure the coupling that influences the effort of migrating triggers (the more coupling, the more difficult the migration). These metrics are based on the UI statements’ count, location, and type (reading or writing). According to the PL/SQL construct used to access the UI, we classified the coupling into three categories: reflective (for example, the use of the NAME_IN or COPY functions), declarative (for example, variables in a select statement), and imperative (for example, an assignment statement). Reflective coupling is the most difficult to migrate because it requires studying the source code’s runtime context. So, you must locate the reflective statements and the UI reading and writing statements in each trigger and, for the latter, determine whether the operation is performed in an imperative or a declarative statement.

The extracted KDM models, namely the code model, make these computations easier because reads and writes are identified by elements that represent such operations (that is, elements of the types Reads and Writes, respectively). To distinguish between an imperative and a declarative statement, you must analyze the kind attribute of the action element. We located the reflective statements by looking for elements that represent function calls in the source code (that is, elements of the type Calls). We implemented all these computations in a RubyTL transformation definition that converts the extracted KDM model into a Metrics model (Figures 5a through 5c show the Metrics metamodel and an example model). Finally, to visualize the data, we converted these Metrics models into a comma-separated value (CSV) file by using an M2T transformation in Texplate.

We applied these metrics to the PL/SQL code of a legacy Forms application used in the University of Murcia’s student management system. Figure 6 shows the coupling detected in the triggers of three forms. The information visualized helps us understand how difficult the migration of each form is. In this case, if we consider only the code-UI coupling, the Europrojects form would be more difficult to migrate than the other two forms, but we must also take into account other aspects such as the form’s size and complexity.
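The coupling computation described in this section can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the RubyTL and Texplate implementation: triggers are modeled as lists of dictionaries, relations are (type, target) pairs, UI items are assumed to be recognizable by a leading colon (as in :WCGA), and the "select" kind label for declarative statements is a placeholder. The per-form percentage mirrors what Figure 6 reports, where a trigger can count toward more than one category.

```python
import csv
import io

REFLECTIVE_FUNCTIONS = {"NAME_IN", "COPY"}
DECLARATIVE_KINDS = {"select"}  # placeholder kind label for SQL statements


def coupling_kinds(trigger):
    """Return the set of coupling categories found in one trigger."""
    kinds = set()
    for action in trigger:
        for relation, target in action["relations"]:
            if relation == "Calls" and target.upper() in REFLECTIVE_FUNCTIONS:
                kinds.add("reflective")
            elif relation in ("Reads", "Writes") and target.startswith(":"):
                declarative = action["kind"] in DECLARATIVE_KINDS
                kinds.add("declarative" if declarative else "imperative")
    return kinds


def coupling_distribution(triggers):
    """Percentage of triggers that contain each kind of coupling."""
    counts = {"imperative": 0, "declarative": 0, "reflective": 0}
    for trigger in triggers:
        for kind in coupling_kinds(trigger):
            counts[kind] += 1
    total = len(triggers)
    return {
        kind: (round(100.0 * count / total, 2) if total else 0.0)
        for kind, count in counts.items()
    }


def to_csv(form_name, distribution):
    """Render one form's distribution as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["form", "coupling", "triggers_percent"])
    for kind, percent in distribution.items():
        writer.writerow([form_name, kind, percent])
    return buffer.getvalue()


# Four toy triggers (invented data, not from the case study).
triggers = [
    [{"kind": "assignment", "relations": [("Writes", ":WCGA")]}],
    [
        {"kind": "select", "relations": [("Reads", ":EMP_ID")]},
        {"kind": "call", "relations": [("Calls", "name_in")]},
    ],
    [{"kind": "assignment", "relations": [("Writes", "local_total")]}],
    [{"kind": "assignment", "relations": [("Reads", ":WCGA")]}],
]

distribution = coupling_distribution(triggers)
print(to_csv("Demo", distribution))
```

Because the categories are collected as a set per trigger, a trigger with several imperative writes still counts once for the imperative category, which matches a "proportion of triggers" reading of the metric.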
Lessons Learned
A good knowledge of the source grammar and of the ASTM and KDM metamodels is necessary to specify the code-to-ASTM and ASTM-to-KDM mappings involved. KDM is a very large metamodel, but the user must deal with only the KDM domains related to the activity to be automated (that is, the compliance level). Moreover, the lack of examples of KDM models makes the metamodel more difficult to understand. ASTM is simpler than KDM, but you must define an SASTM metamodel for your GPL if one doesn’t exist. This step requires an in-depth knowledge of the GPL’s structure to extend the GASTM metamodel.

The developer must choose the KDM compliance level, which is determined by the aspects (that is, domains) of the existing system to be addressed. For instance, our extractor produces only L1 micro-KDM compliant models because the activity the generator performs is analyzing PL/SQL statements. You might require other L1 compliant KDM models in other activities, for instance, data models to improve the data quality. L1 compliance in ASTM makes the ASTM-to-KDM transformation easier by adding cross references to the parse tree.

You can reduce the effort you need to implement a model extraction in two ways: by using DSLs instead of ad hoc parsers and by reusing grammars. We designed Gra2MoL to construct models from source code represented as a parse tree and to use grammars written in ANTLR (Another Tool for Language Recognition; http://www.antlr.org) format, for which more than 100 grammar definitions exist (for example, we reused the PL/SQL grammar).

The organization of ASTM into generic and specific metamodels lets you achieve reusable M2M transformations. Because the GASTM metamodel contains elements common to any GPL, you can implement reusable GASTM-to-KDM transformations. In addition, using a language such as RubyTL, which supports a modularity mechanism, helps you write composable M2M transformations. We used this mechanism to organize the ASTM-to-KDM transformation into two phases and to compose the GASTM-to-KDM transformation with the SASTM-to-KDM transformation that is specific to PL/SQL.

In an ADM solution, you should identify separate components that are integrated by exchanging KDM models to promote reuse. We created a producer (that is, the extractor) and a consumer (that is, the metrics generator) of KDM models that are L1 micro-KDM compliant. We could also integrate them with other micro-KDM compliant tools that could import and export KDM models of PL/SQL code.
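The reuse idea discussed above, in which generic GASTM-to-KDM rules are composed with PL/SQL-specific SASTM-to-KDM rules, can be illustrated with plain rule tables. The Python sketch below is not RubyTL’s phasing mechanism; `compose`, `transform`, the rule tables, and the kind strings are all hypothetical and serve only to show how a language-independent rule set can be kept separate and then combined with a language-specific one.

```python
def compose(*rule_sets):
    """Merge rule sets; later sets override earlier ones for the same node type."""
    merged = {}
    for rules in rule_sets:
        merged.update(rules)
    return merged


def transform(nodes, rules):
    """Apply the rule registered for each node type; skip unknown node types."""
    return [rules[node["type"]](node) for node in nodes if node["type"] in rules]


# Reusable rules for node types that a generic metamodel would share.
GASTM_RULES = {
    "IfStatement": lambda node: {"element": "ActionElement", "kind": "condition"},
    "LoopStatement": lambda node: {"element": "ActionElement", "kind": "loop"},
}

# Language-specific rules for node types that only the PL/SQL extension defines.
PLSQL_SASTM_RULES = {
    "RDBSelectStatement": lambda node: {"element": "ActionElement", "kind": "select"},
}

nodes = [
    {"type": "IfStatement"},
    {"type": "RDBSelectStatement"},
    {"type": "Comment"},
]

generic_only = transform(nodes, GASTM_RULES)
combined = transform(nodes, compose(GASTM_RULES, PLSQL_SASTM_RULES))
print(len(generic_only), len(combined))
```

With this arrangement, supporting another language would mean writing only a new specific rule table while the generic one is reused as is.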
The two main benefits of ADM in applying MDD to software evolution are that it facilitates integration and interoperability among different vendors’ modernization tools and that it provides standard metamodels for basic concerns in understanding and evolving existing systems. Using these metamodels saves developers the time and effort of creating their own. Moreover, they’ll use high-quality metamodels created by expert practitioners. However, these metamodels, particularly KDM, are complex and difficult to learn and to manage in model transformations, which is a serious obstacle to ADM’s wider use. The availability of KDM examples and ADM case studies could help ease the learning curve.

Because the OMG is still defining the metamodels that will complete the ADM proposal, researchers and industry must make an effort to understand and test the metamodels in practice by applying them in modernization projects. They should also focus on adapting existing software evolution techniques and methods to a model-driven approach and on creating bridges between the representation formats used by current tools and the KDM metamodel.

About the Authors
Javier Luis Cánovas Izquierdo is a PhD candidate at the University of Murcia. His research interests are domain-specific languages and model-driven modernization. Izquierdo has a master’s in computer science from the University of Murcia. Contact him at [email protected].
Jesús García Molina is a full professor in the Department of Computers and Systems at the University of Murcia, where he leads the Modelum Research Group. His research interests include model-driven development, domain-specific languages, and model-driven modernization. Molina has a PhD in chemistry from the University of Murcia. Contact him at [email protected].
Acknowledgements
The Fundación Séneca (Spain) supported this work through grant 08797/PI/08 and a doctoral grant for Javier Luis Cánovas Izquierdo.
References
1. R. Heckel et al., “Architectural Transformations: From Legacy to Three-Tier and Services,” Software Evolution, T. Mens and S. Demeyer, eds., Springer, 2008, pp. 139–170.
2. T. Reus, H. Geers, and A. van Deursen, “Harvesting Software Systems for MDA-Based Reengineering,” Proc. 2nd European Conf. Model Driven Architecture: Foundations and Applications (ECMDA-FA 06), LNCS 4066, Springer, 2006, pp. 213–225.
3. J. Sánchez Cuadrado and J. García Molina, “Building Domain-Specific Languages for Model-Driven Development,” IEEE Software, vol. 24, no. 5, 2007, pp. 48–55.
4. J. Sánchez Cuadrado and J. García Molina, “Modularization of Model Transformations through a Phasing Mechanism,” Software and System Modeling, vol. 8, no. 3, 2009, pp. 325–345.
5. J. Cánovas and J. García Molina, “A Domain Specific Language for Extracting Models in Software Modernization,” Proc. 5th European Conf. Model Driven Architecture: Foundations and Applications (ECMDA-FA 09), LNCS 5562, Springer, 2009, pp. 82–97; http://adm.omg.org/docs/ecmda09.pdf.
Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.