Definition of Reference Architectures based on Existing Systems
WP 5.2, Lifecycle and Process for Family Integration

Authors: Joachim Bayer, Thomas Forster, Dharmalingam Ganesan, Jean-François Girard, Isabel John, Jens Knodel, Ronny Kolb, Dirk Muthig

Eureka Σ! 2023 Programme, ITEA project ip00004

IESE-Report No. 034.04/E
Version 1.0
March 31, 2004
A publication by Fraunhofer IESE
Fraunhofer IESE is an institute of the Fraunhofer-Gesellschaft. The institute transfers innovative software development techniques, methods, and tools into industrial practice, assists companies in building software competencies customized to their needs, and helps them to establish a competitive market position.

Fraunhofer IESE is directed by Prof. Dr. Dieter Rombach
Sauerwiesen 6
67661 Kaiserslautern
Abstract
Migration from single system development to product family engineering is a challenging task. The success of a product family depends greatly on the quality of its reference architecture, one of the central artifacts in product family engineering. It is therefore important to leverage the experience embodied in successful existing systems coming from the same domains or from the same development organization in order to achieve high-quality reference architectures. Despite the importance of leveraging existing knowledge, the literature provides limited guidance on how to mine prior, related systems for this specific purpose. This report addresses this issue by introducing the PuLSE™-DSSA¹ approach, which explicitly takes information from existing systems into account and gives guidance for migrating it into a product family. PuLSE™-DSSA is a method that combines forward engineering design activities with reverse engineering tasks. In particular, we provide a systematic approach to defining the reference architecture by integrating experience from existing systems. We regard the architecture as the interface between top-down design and bottom-up reverse engineering, and as the communication vehicle among stakeholders. Furthermore, we present the concept of view-based architectures and give an overview of typical views. A selection of beneficial techniques for request-driven reverse architecting in the product family context shows what kind of information can be integrated into the reference architecture. An open source case study then exemplifies how the approach works by analyzing different plug-ins of the Eclipse platform and defining a reference architecture supporting new development activities. In short, this report presents PuLSE-DSSA, an approach that integrates a systematic recovery of information from existing artifacts with activities directly working towards the goals of the overall migration.
The recovery activities are thereby fully driven and controlled by the architectural design (i.e., by the forward engineering activities).

Keywords: architecture analysis, product family architecture, PuLSE-DSSA, reengineering, reference architecture, request-driven reverse architecting, reverse architecting analysis catalogue, reverse engineering, software architecture, software product families, view-based architectures, ITEA

¹ PuLSE is a registered trademark of the Fraunhofer Institute for Experimental Software Engineering (IESE). PuLSE stands for Product Line Software Engineering. PuLSE-DSSA is a technical component of PuLSE; DSSA stands for Domain-Specific Software Architecture.
Copyright © Fraunhofer IESE 2004
Table of Contents

1 Introduction
1.1 Typical Business Cases
1.1.1 Business Case 1: Single System to Product Family
1.1.2 Business Case 2: Multiple Systems to Product Family
1.1.3 Business Case 3: Product Family to a new Product Family
1.1.4 Business Case 4: Existing Product Families to Product Populations
1.2 Concepts of the Approach
1.3 Overview of the Remainder of this Document

2 Definition of Product Family Architectures
2.1 Introduction
2.2 Roles
2.2.1 Product Family Architect
2.2.2 Reverse Architect
2.3 Design
2.4 Reverse Engineering
2.5 Integration of Forward Engineering and Reverse Engineering
2.6 Summary

3 View-based Software Architectures
3.1 Software Architecture as Interface
3.2 Architecture Descriptions
3.3 Tailoring Architecture Descriptions

4 Typical Views and their Recovery
4.1 View Notation
4.2 Conceptual Views
4.2.1 Meta-model
4.2.2 Elements of the Conceptual View
4.2.3 Recovery Techniques
4.3 Module Views
4.3.1 Meta-model
4.3.2 Elements of the Module View
4.3.3 Recovery Techniques
4.4 Code Views
4.4.1 Meta-model
4.4.2 Elements of the Code View
4.4.3 Recovery Techniques
4.5 Execution Views
4.5.1 Meta-model
4.5.2 Elements of the Execution View
4.5.3 Recovery Techniques
4.6 Behavioral Views
4.6.1 Meta-model
4.6.2 Elements of the Behavioral View
4.6.3 Recovery Techniques
4.7 Build-Time Views
4.7.1 Meta-model
4.7.2 Elements of the Build-Time View
4.7.3 Recovery Techniques
4.8 Feature Views
4.8.1 Meta-model
4.8.2 Elements of the Feature View
4.8.3 Recovery Techniques
4.9 Data Structure Views
4.9.1 Meta-model
4.9.2 Elements of the Data Structure View
4.9.3 Recovery Techniques
4.10 Integration of the Views
4.10.1 Motivation
4.10.2 Integrating Architectural Views

5 Request-driven Reverse Architecting
5.1 Architecture Comparison
5.1.1 Purpose
5.1.2 Realization
5.1.3 Summary
5.2 Pattern Completion
5.2.1 Purpose
5.2.2 Realization
5.2.3 Summary
5.3 Feature Location
5.3.1 Purpose
5.3.2 Realization
5.3.3 Summary
5.4 SARA: Reconstruction of Modules and Subsystems
5.4.1 Purpose
5.4.2 Realization
5.4.3 Summary
5.5 Architectural Tracking
5.5.1 Purpose
5.5.2 Realization
5.5.3 Summary
5.6 Identifying Reusable Software Components using Metrics
5.6.1 Purpose
5.6.2 Realization
5.6.3 Summary
5.7 Conceptual View Reconstruction
5.7.1 Purpose
5.7.2 Realization
5.7.3 Summary
5.8 CaVE – Commonality and Variability Extraction
5.8.1 Purpose
5.8.2 Realization
5.8.3 Summary
5.9 Synthesizing a Layered Architecture
5.9.1 Purpose
5.9.2 Realization
5.9.3 Summary
5.10 Recovery of Abstract Data Types and Abstract Data Objects
5.10.1 Purpose
5.10.2 Realization
5.10.3 Summary
5.11 Interface Analysis
5.11.1 Purpose
5.11.2 Realization
5.11.3 Summary

6 Open Source Case Study
6.1 The Eclipse Platform
6.2 The Plug-in Mechanism
6.3 The Individual Systems
6.3.1 Java Development Tools (JDT)
6.3.2 C++ Development Tools (CDT)
6.3.3 Cobol Development Tools (CobolDT)
6.3.4 KobrA Component Development Tools (KobrA-DT)
6.3.5 Frame Processor Development Tools (FP-DT)
6.3.6 Motivation for the Reference Architecture
6.4 Case Study Experiences
6.4.1 Infrastructure set up
6.4.2 Fact extraction
6.4.3 Feature Trace
6.4.4 Context Analysis for the Model Element Package
6.4.5 Reusable concepts within the Java Model
6.4.6 Refined Pattern Completion
6.4.7 Conceptual Models of IDE Plug-Ins
6.5 Reference Architecture
6.5.1 Generic Architecture Description
6.6 Case Study Summary

7 Conclusion
7.1 Compliance to Business Cases
7.2 Outlook

References
1 Introduction
Software development rarely happens on a green field; it must take already existing software systems into account. For instance, a predecessor system is exploited to define its next generation (partially based on new technology), or a set of independently built systems is merged to share maintenance effort. In all of these situations, existing artifacts (e.g., code, documentation) are often insufficient for realizing such goals. Reverse engineering techniques must then typically be used to identify the necessary (architectural) information in the existing systems or artifacts. To date, this identification is neither trivial nor fully automatable. The required reverse engineering techniques and technology are still under research, and no practically useful approaches exist that can be applied by non-experts.

The research activities that led to this report have been driven by the vision of supporting people in (optimally and practically) exploiting existing software artifacts during the setup of new product families. The overall task of providing practitioners with a systematic approach to constructing new systems on the basis of existing ones has proven rather difficult. Our solution is to integrate a systematic recovery of information with the architectural design that directly works towards the desired goals. This report presents PuLSE-DSSA, an approach that realizes the intended combination of forward design and reverse engineering in the development process.

Section 1.1 gives an overview of typical situations in which reverse engineering is often required. Section 1.2 introduces the central concepts of PuLSE-DSSA. Section 1.3 gives an overview of the remainder of this report.

1.1 Typical Business Cases

Product families are rarely developed independently of any predecessor system. On the contrary, they are often based on pre-existing systems and need to take information about these systems into account. We now describe some typical cases in order to illustrate how our approach can help in integrating reverse engineering and architecture design:
1.1.1 Business Case 1: Single System to Product Family

In a typical example of reengineering-based product family development, a system already exists that has evolved over several years. While it is still possible to evolve this system over time, it is inappropriate as a basis for a product family. The reasons for this may include:

• Only little documentation on the existing system is available.
• Although the existing architecture fits the original system's needs, it does not support new, originally unforeseen market requirements. As a consequence, its functionality should be systematically repackaged.
• The software development technologies that are used do not have the flexibility required by the new product family (e.g., if core functionality of the existing system is implemented in FORTRAN).
• The system is getting harder to maintain because of the ongoing degeneration of its structure over time. To be successful with further evolution of the system, certain restructuring steps have to be taken.

In this case, a thorough analysis of the functionality of the existing system and of the mapping of this functionality to code must be performed. This information can be used as a basis for designing a reference architecture that repackages the same or similar functionality in a systematic manner. The resulting architecture will then be able to support the variability required by the market.
1.1.2 Business Case 2: Multiple Systems to Product Family

A similar but more complex situation exists if a number of different systems already exist that have so far been developed individually but should now be integrated into a single product family. This case is more complex than the previous one because information from a number of different sources needs to be integrated. The development of a common platform will usually proceed through the identification of components in the different products and their subsequent integration. In some cases, however, the integration effort will be regarded as too high. Then, one single system is chosen first as the basis for the product family infrastructure, and the others are integrated later (or never); this situation can be mapped to the first business case. As the identified components will be directly integrated into the platform, the necessary information on these components must be made available for all relevant views.
1.1.3 Business Case 3: Product Family to a new Product Family

An extreme case occurs when an already existing, insufficiently documented product family cannot be adapted to changes in its environment, and a new product family has to be developed that encompasses the scope of the current family as well as the new needs. The following examples illustrate three types of changes:

• The existing product family has to fulfill new business goals, functional requirements, or quality attributes for which its reference architecture is not suitable.
• The domains covered by the product family should be extended or have been modified (e.g., new laws have come into force and affect the domain), and this domain extension cannot be integrated well into the current reference architecture.
• The product family should exploit different technologies that are not compatible with the existing infrastructure.

To leverage the experience of the existing product family in the development of the new one, the existing product family architecture has to be analyzed and documented. The analysis should elicit the properties of the existing architecture that address the new requirements and identify component candidates for reuse in the new family. The documentation should include the rationales and trade-offs of still relevant strategies and the key success factors of the first product family. With the help of this information, the architect of the new family constructs a reference architecture meeting the new requirements while reusing as many of the old family's components as possible.
1.1.4 Business Case 4: Existing Product Families to Product Populations

This business case occurs when not just a number of individual systems but a number of existing product families have to be merged. This happens, for example, when a software company buys another company and their existing product families have to be merged to form a common basis for further development. The different product families then have to be integrated into one resulting family. The problems in this migration are amplified compared to the migration of single systems into a product family: the variability is higher, and commonalities present within one of the individual families may be reduced in the product population case.
1.2 Concepts of the Approach

The PuLSE-DSSA approach we describe in this report integrates architecture development with the analysis of existing systems in order to obtain a reference architecture that takes optimal advantage of existing systems. We start with understanding the business goals and requirements for the new reference architecture since, together with the scope of the product family, these determine the functionality and qualities that have to be provided. Information from existing systems, and the experience gained while developing them, supports the design of the reference architecture. This information is obtained by request-driven reverse architecting activities.

To achieve our goals, we apply the following underlying concepts. Each concept is elaborated in detail in the remainder of the document:

• Top-down design: the architecture is designed from the top level and detailed in several iterations.
• Bottom-up recovery: information from existing artifacts is first extracted and then abstracted to higher levels.
• View-based architectures as interface: architectural views serve as the communication vehicle between design and recovery, and among stakeholders.
• Scenario-based reference architecture definition: the reference architecture of a product family is designed and evaluated with the help of prioritized scenarios.
• Request-driven reverse architecting: analyses of the systems and their artifacts are performed in a request-driven manner, so that the right information is provided when it is needed.
1.3 Overview of the Remainder of this Document

The typical cases above illustrate, each from its own context, the need for reverse engineering as part of the migration of software development towards product family engineering. This document presents a selection of techniques to analyze one or more existing systems with the following objectives:

• to gather experience
• to analyze commonalities and variabilities at the architectural level
• to characterize successful solution patterns
• to identify reusable candidates
All of these activities provide input for our approach to designing reference architectures. The architecture represents the medium that conveys experience and success from the past to the product family architecture. In other words, the architecture constitutes the interface between forward engineering and reverse engineering activities. This report contributes by introducing PuLSE-DSSA, an approach to successfully design such reference architectures in a systematic, view-based way by exploiting existing artifacts when beneficial.

To explain PuLSE-DSSA, the document is structured mainly along its underlying concepts. We provided exemplary business cases in Section 1.1, which show that the approach is relevant and where it might be applied. Section 2 presents PuLSE-DSSA, our approach for designing reference architectures in the presence of existing systems. Section 3 describes the concept of view-based architectures and how to derive an architectural description with the right set of views. Section 4 then gives an overview of typical architectural views and how these views can be recovered; we focus on the one hand on the well-known Siemens view set, and on the other hand on context-specific views that capture information not covered by the first view set. In Section 5, we present a collection of request-driven reverse architecting analyses, with particular emphasis on the product family context. The analyses in this collection aim at reconstructing partial views of the system whose information is then processed in order to design the reference architecture of the product family. Such an analysis exploits existing assets (source code and/or other documentation) and produces (architectural) views that capture certain aspects of one individual system or of the product family. Experiences gained in a case study with the Eclipse platform are presented in Section 6.
Finally, Section 7 summarizes the contribution of this deliverable and draws some conclusions.
2 Definition of Product Family Architectures
One of the central artifacts of a product family is the underlying reference architecture, the architecture that supports all products of the product family. This architecture is crucial to the success of the product family. In order to design high-quality and future-proof architectures, we propose an approach that incorporates the knowledge and experience contained in existing, successful software systems coming from the same set of domains or developed by the same organization. This section introduces the PuLSE™-DSSA method by first describing the concepts of a view-based architecture, then the forward engineering activities for designing reference architectures, and then the reverse engineering approach. Finally, it is shown how the integration of both directions, forward and reverse engineering, works.

PuLSE™-DSSA is a method to systematically design a reference architecture supported by reverse engineering activities in order to learn and benefit from existing software systems. The exploitation of the given artifacts enables a successful and efficient migration towards product family engineering.

2.1 Introduction

Software architectures are an important means to master the complexities that arise in the development and evolution of software systems. There are numerous reasons that make the development and evolution of software systems a complex task, including continuously changing requirements, inconsistent and ambiguous software specifications, or the fact that the application being developed is itself complex.

A software architecture is a central artifact in software development that enables communication among the different roles from application and technical domains in a software development project. An architecture facilitates the assessment of characteristics of the developed software system without having to wait for it to be actually implemented, including the prediction of quality characteristics. The different roles in a project can be supported in their tasks by architecture descriptions that enable the analysis of certain aspects of a system in separation, based on views and notations customized for the respective roles, for example for security or performance. One architectural view describes the software architecture from a certain perspective and contains only information that is relevant from that perspective; all other information is suppressed. The architecture is completely described by the composition of the different architectural views.

An architectural view is defined by determining the types of relevant components and the possible relations among these components. Examples of architectural views are conceptual, structural, or behavioral views. A number of architectural view models have been published, for example in [HNS2000], which presents a view model consisting of conceptual view, module view, execution view, and code view. Another view model has been published in [Kruc1995]. The majority of the published view models concentrate on functional aspects of an architecture. Functional aspects are, however, often not sufficient to describe an architecture completely. Rather, they need to be augmented with other views, for example views that capture domain-specific aspects or views that capture quality aspects of the architecture.

The view-based description of software architectures has a number of consequences. Each view concentrates on one or only a few aspects of the documented system. Consequently, the complexity of the descriptions in single views and in the complete architecture description is reduced, leading to simplified development, usage, and maintenance of the architecture. A higher number of views does, on the other hand, increase the effort for creating an architecture, especially because the different views must be created and kept consistent. To find the right set of views for describing a product family architecture in the respective context, the organization and its software development projects must be investigated. The problem is to find a set of views that is neither too small (increasing the complexity of single views) nor too large (complicating consistency). All aspects that are relevant for the architecture must therefore be identified and used as a basis for developing a customized set of architectural views.
The architecture of a software system describes the system's basic organization and structure; that is, it describes the fundamental parts of the system and the relationships among them, as well as the relationships between the system and its environment. In the development of a software system, the software architecture is the first artifact that describes the system as a solution to the problems stated in the requirements. In the context of software product families, the importance of the software architecture is even higher, since the architectures of all members of the product family are handled together. A product family architecture covers a number of functionally similar systems; to this end, common and variable aspects are documented together in one common architecture description.
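To make the view concept concrete, the following sketch models a view as a set of typed elements and typed relations, with the complete architecture description formed by composing views. This is only an illustration of the idea; the class names, view names, and element names are our own and not part of PuLSE-DSSA:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Element:
    name: str
    type: str          # e.g., "module", "subsystem", "process"

@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    type: str          # e.g., "uses", "contains", "communicates-with"

@dataclass
class View:
    name: str          # the perspective, e.g., "module view"
    element_types: set
    relation_types: set
    elements: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add(self, element: Element):
        # A view admits only information relevant to its perspective;
        # everything else is suppressed.
        if element.type not in self.element_types:
            raise ValueError(f"{element.type} is not relevant in {self.name}")
        self.elements.append(element)

@dataclass
class ArchitectureDescription:
    # The architecture is described by the composition of its views.
    views: dict = field(default_factory=dict)

    def add_view(self, view: View):
        self.views[view.name] = view

# Hypothetical example: a module view admitting modules and subsystems.
module_view = View("module view", {"module", "subsystem"}, {"uses", "contains"})
module_view.add(Element("JavaModel", "module"))

description = ArchitectureDescription()
description.add_view(module_view)
```

The `element_types`/`relation_types` sets encode the rule stated above: a view is defined by the types of components and relations it may contain.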
When developing such a reference architecture for a software product family, a number of wide-ranging decisions have to be made that have an impact on the success of the product family. The architecture for a software product family can only rarely be developed from scratch; it must take the architecture of already existing systems into account. One reason for this is that a software architecture is more than a simple software development artifact. The system architecture is often reflected in the organizational units that develop these systems: there are organizational units (e.g., groups or departments) that are responsible for certain partial systems or components. Consequently, changes to the architecture potentially cause analogous changes to the organizational structure, which can be difficult and expensive. Thus, revolutionary changes to the software architecture must be evaluated with great care.

The different architectures a software development organization builds are often similar, and these similarities are stable over time. This is due to similarities in the application domain, in the developed systems themselves, and in the solutions provided by the different systems. Thus, architectures contain recurring strategic solutions; the commonalities in the architectures can be exploited by reusing them in a software product family.

2.2 Roles

In order to design a reference architecture based on existing systems, knowledge of both forward engineering and reverse engineering is required. We see two roles in particular that fulfill these requirements: the product family architect and the reverse architect. Both should work in close cooperation in order to interact, discuss decisions, clarify open issues, and give each other feedback.
2.2.1 Product Family Architect

The product family architect concentrates on architectural style and principles and describes the boundary between framework and product. The product family architects are responsible for ensuring traceability between (product) requirements and architectural solutions. Furthermore, they communicate the architecture to the various stakeholders and inform them about architectural changes.
2.2.2 Reverse Architect

The reverse architects have a strong background in architectural analysis, reverse engineering, reengineering, and reverse architecting. They are responsible for scheduling analyses and for ensuring that the chosen analyses are feasible and reasonable. Moreover, the reverse architects support the integration of analysis results gained from existing systems into the design process of reference architectures.
2.3 Design

The development of high-quality and future-proof reference architectures is one of the central challenges when introducing product families. The goal of the architecture development process is the construction of an architecture that appropriately supports the functional, quality, and business goals of the numerous products in the product family and that is documented using a number of architectural views.

This section discusses the design of reference architectures with PuLSE-DSSA, an integrated, iterative, and quality-centered method for the design and assessment of product family (or reference) architectures. It is a customizable method developed at Fraunhofer IESE for building product families that covers the whole product family development life cycle and that can be introduced incrementally. The basic ideas of PuLSE-DSSA are to develop a reference architecture incrementally by applying generic scenarios in decreasing order of architectural significance, and to integrate assessment into architecture creation. The architecture is described using different views, and the quality of the architecture is monitored and assured in each development iteration. PuLSE-DSSA makes use of information provided in quality models and in patterns documented according to the meta-model depicted in Figure 1. A quality model describes the quality goals that apply to a product family, how these quality goals influence each other, and the architectural means for addressing them. The PuLSE-DSSA method is customizable to different application domains and contexts, in particular with respect to a number of factors, including the supported architectural views and the processes and techniques used to integrate existing components.
[Figure 1 is a UML class diagram relating Quality Model, Quality Attribute (with subcharacteristics), Metric, Business Goal (with subgoals), Scenario with its Stimulus and Response, Means, and Pattern with its Problem, Solution, and Rationale, via associations labeled /effects, /influence, /satisfies, /uses, /specializes, and /conflicts.]

Figure 1: Quality Model Meta-model
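Read as a data model, the meta-model of Figure 1 can be approximated by the sketch below. The class and field names are our own rendering of the figure's labels (business goals are simplified to strings), not an official PuLSE-DSSA schema, and the example instance is invented:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pattern:
    # A proven solution documented with its problem, solution, and rationale.
    name: str
    problem: str
    solution: str
    rationale: str

@dataclass
class Means:
    # An architectural means addressing quality goals, possibly via patterns.
    description: str
    uses: List[Pattern] = field(default_factory=list)

@dataclass
class Scenario:
    # Scenarios are captured as stimulus/response pairs.
    stimulus: str
    response: str

@dataclass
class QualityAttribute:
    name: str
    metrics: List[str] = field(default_factory=list)
    scenarios: List[Scenario] = field(default_factory=list)
    means: List[Means] = field(default_factory=list)
    subcharacteristics: List["QualityAttribute"] = field(default_factory=list)

@dataclass
class QualityModel:
    # Quality goals of the product family plus the means addressing them.
    business_goals: List[str]
    attributes: List[QualityAttribute]

# Invented example instance.
availability = QualityAttribute(
    "availability",
    metrics=["mean time to repair"],
    means=[Means(
        "redundant services",
        uses=[Pattern(
            "Watchdog",
            problem="detect failed components",
            solution="periodic liveness checks",
            rationale="keeps services available after hardware errors",
        )],
    )],
)
model = QualityModel(["keep services available"], [availability])
```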
The basis for the development of an architecture is a prioritized list of business goals, functional requirements, and quality requirements. In the context of product families, the commonalities and variabilities among these goals and requirements also have to be known. In order to ensure that an architecture fulfills all stated requirements, architecture evaluation methods such as SAAM or ATAM [CKK2002] are commonly used. These methods enable organizations to evaluate or assess a given architecture with respect to relevant quality attributes by using scenarios and involving a number of stakeholders. In current industrial practice, an architecture is typically evaluated only once the architecture creation process has been finished. The problem with this practice, however, is that at this point a large number of design decisions have already been made, so it might be necessary to make major changes as a result of the evaluation. This particularly holds in the context of software product families, in which a large number of stakeholders and concerns have to be taken into account and integrated. In order to avoid wrong design decisions and to be able to take appropriate action in case of problems, it is therefore necessary to integrate the assessment and analysis of the architecture directly with its
design. The effort required for an architecture assessment can be minimized by using the scenarios required for the assessment also in the design of the architecture.

Scenarios are short textual descriptions of possible interactions with a system, but they also include anticipated changes. They can be compared to the use cases that are used to describe functional and non-functional requirements. Generally, scenarios are captured as stimulus/response pairs. A stimulus is an event that causes an architecture or system to react in some particular manner. The response, on the other hand, describes the activity or property that can be observed as a result of the stimulus. For example, the quality requirement "The system shall be able to provide services even in case of a hardware error" has the stimulus "Hardware Error" and the response "Provision of Services". In the context of product families, scenarios are generic; that is, they capture not only common but also variable requirements of the instances in the product family. Using the same scenarios for the design and the assessment of an architecture simplifies and accelerates the assessment, since the scenarios that have to be applied to the architecture during an individual assessment already exist.

Besides a continuous assessment of the architecture and quality assurance of the documentation, an iterative approach to the development of an architecture is indispensable in order to handle complexity and to successively consolidate the different requirements and concerns of the involved stakeholders. Furthermore, an iterative approach enables the separation of product-family-related design decisions from product-specific decisions. The architecture is created in a number of iterations by stepwise application of scenarios and by using proven solutions to recurring problems, such as architectural patterns.
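The stimulus/response encoding of the example requirement above can be written down directly; the `common` flag is one possible way to record whether a generic scenario's requirement is common to all family members or variable (the representation is our own, illustrative choice):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenericScenario:
    stimulus: str         # event causing the architecture/system to react
    response: str         # observable activity or property resulting from it
    common: bool = True   # False: requirement varies across family members

# "The system shall be able to provide services even in case of a
# hardware error", captured as a stimulus/response pair:
fault_tolerance = GenericScenario(
    stimulus="Hardware Error",
    response="Provision of Services",
)
```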
Iterations are performed until all scenarios have been applied and no problems arise from the final assessment of the architecture. The iterative development of an architecture using PuLSE-DSSA follows the Plan-Do-Check-Act paradigm, a general approach for continuous development and quality improvement [Demi1986]. This approach consists of four consecutive stages: Plan, Do, Check, Act. In the “Plan” stage, the root cause of the problem is determined and a change or a test aimed at improvement is planned. This change or test is then carried out during the “Do” stage, preferably in a pilot or on a small scale. In the “Check” stage, it is verified whether the desired result was achieved, whether anything went wrong, and what was learned. Finally, in the “Act” stage, the change is adopted if the desired result was achieved. If the result was not as desired, however, the cycle is repeated using the knowledge obtained from the previous cycle. Adopting the Plan-Do-Check-Act paradigm for the design of reference architectures results in the following four phases: planning the next iteration (plan), realizing and documenting the scenario application (do), assessing the architecture (check), and, if required, refining and revising design decisions as part of planning the next iteration (act/plan).
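Mapped onto code, the iteration cycle can be sketched as a simple loop. This is our own schematic illustration; the batch size, list representation, and function names are placeholders, not prescribed by the method:

```python
def design_iteration(pending_scenarios, architecture, assess):
    """One iteration in the Plan-Do-Check-Act style (schematic sketch).

    pending_scenarios: scenarios not yet applied, highest priority first
    architecture: current set of applied scenarios/design decisions
    assess: callable(architecture) -> list of problematic items (empty if ok)
    """
    # Plan: select a limited number of scenarios for this iteration.
    batch, rest = pending_scenarios[:3], pending_scenarios[3:]
    # Do: apply the scenarios, i.e. take design decisions (placeholder).
    architecture = architecture + batch
    # Check: assess the architecture (optional in some iterations).
    problems = assess(architecture)
    # Act: problematic items are revised and replanned for a later iteration.
    if problems:
        rest = problems + rest
        architecture = [d for d in architecture if d not in problems]
    return rest, architecture

pending = ["s1", "s2", "s3", "s4"]
arch = []
# Iterate until all scenarios have been applied without problems.
while pending:
    pending, arch = design_iteration(pending, arch, assess=lambda a: [])
print(arch)
```

The loop terminates exactly under the condition stated above: all scenarios applied and no problems reported by the assessment.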
As shown in Figure 2, the reference architecture design process takes as input a prioritized list of business goals, functional requirements, and quality goals, and produces a reference architecture that satisfies the specified goals and requirements and that is documented using a number of previously selected or defined architectural views. In detail, a single iteration, as shown in Figure 2, consists of the following four phases:

Planning. The planning phase defines the contents and delineates the scope of the current iteration. This includes the selection of a limited number of scenarios to be used in the iteration, the identification of the relevant stakeholders and roles, the selection and definition of the views that have to be created, as well as the decision whether an assessment of the architecture should be performed at the end of the iteration. The roles determined to be relevant have a direct impact on the selection of scenarios and the views that have to be created. Based on the concerns of the identified stakeholders, scenarios are prioritized and the necessary architectural views are selected and defined. The scenarios describe the functional and quality requirements of the product family the architecture is designed for. As the order in which scenarios are addressed is very important, the scenarios considered to have the highest significance for the architecture should be selected for the first iteration. In the next iteration, the second most important group is selected, and so forth. Prioritizing scenarios should follow a simple, basic rule: the bigger the impact of a scenario on the architecture, the higher the scenario’s priority. Though simple in theory, evaluating the architectural impact of a scenario from its description is a difficult, non-trivial task that generally requires much experience. In the following, some criteria that can be used for prioritizing scenarios are summarized.
These criteria can be used as an indication of the expected architectural impact of scenarios.

– Economic Value. The value, from an economic point of view, that will be added to the product family if a scenario is realized. The product family scope can deliver this information. A high economic value indicates a high scenario priority.
– Typicality and Criticality. A scenario is of high typicality when it reflects routine operations, whereas a critical scenario occurs rather sporadically and when the user does not expect it. Typical scenarios should therefore be implemented first.
– Future-proof. A scenario is future-proof when it considers possible evolution points. Future-proof scenarios are likely to have a big impact on the architecture and should therefore be assigned high priorities.
– Effort. An estimation of the effort required for realizing a scenario. Since the architecture creation plan also assigns responsibilities to personnel, the effort estimation helps in planning.
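The criteria above can be combined into a simple scoring scheme. The following sketch is only an illustration of how such a prioritization might be operationalized; the weights and the example ratings are arbitrary assumptions, and in practice the weighting has to come from the organization's experience:

```python
def scenario_priority(economic_value, typicality, future_proof, effort):
    """Rough priority score for a scenario (sketch; weights are arbitrary).

    economic_value, typicality, future_proof: ratings in 0..10
    effort: estimated effort in person-days (used here only to break ties)
    """
    # Higher impact criteria dominate; economic value weighted most heavily.
    impact = 3 * economic_value + 2 * typicality + 2 * future_proof
    # Among scenarios of equal impact, prefer the cheaper one.
    return (impact, -effort)

# Hypothetical example scenarios with made-up ratings:
scenarios = {
    "24/7 availability": scenario_priority(9, 8, 7, 30),
    "Configurable skins": scenario_priority(3, 2, 6, 5),
    "Remote diagnostics": scenario_priority(7, 4, 9, 20),
}
ranked = sorted(scenarios, key=scenarios.get, reverse=True)
print(ranked)
```

The ranking then directly drives the selection of scenario groups for the first and subsequent iterations.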
Figure 2: Architecture Creation Process (inputs: business goals, functional requirements, quality goals; phases: planning, including stakeholder analysis and view selection and definition, realization, documentation, and an optional assessment; output: a view-based reference architecture)
If an architecture assessment is to be performed at the end of the iteration, assessment criteria have to be defined according to the business and quality goals. The assessment criteria specify the conditions that have to be met by the architecture. They are defined based on the business and quality goals for the product family and its members, taking the constraints into account. Defining assessment criteria before the actual design begins has a number of benefits, including a better understanding of the requirements and the avoidance of criteria that, due to an already biased perspective, merely confirm what has been developed.

Realization. In the realization phase, solutions are selected and design decisions are taken in order to fulfill the requirements given by the scenarios. Existing knowledge and experience, such as architectural patterns and design principles, can be reused in this phase. This is where information gained from reverse engineering activities comes in; the reverse engineering activities are triggered by the realization needs of the architect. For solutions that have already been successfully applied in earlier products, the impact on quality attributes as well as the suitability for certain requirements are known and possibly even documented. When selecting and applying the solutions, an implicit assessment regarding their suitability for the given requirements and their compatibility with design decisions made
in earlier iterations is made. As an input for this step, a catalog of means and patterns is used. Means are principles, techniques, or mechanisms that facilitate the achievement of certain qualities in an architecture, whereas patterns are concrete solutions to recurring problems in the design of architectures. The means and patterns are described according to the meta-model depicted in Figure 1. In a first step, appropriate means are selected. The selection is done by comparing the scenarios associated with the means in a catalog with the scenarios describing the actual requirements. Let, for example, “The system must be available 24/7 even in the case of a hardware fault” be a quality requirement for the product or component under design. This requirement is covered by the scenario “High availability in the presence of hardware faults” with the stimulus “Hardware Fault” and the response “High Availability”. This scenario is, for instance, related to the architectural means “Redundancy”, and therefore redundancy is chosen for the architecture. Once the means are selected, the patterns that specialize the respective means are selected. This is again done by comparing the scenarios related to the patterns in a pattern catalog with the scenarios for the actual requirements. The patterns can be taken from published pattern collections (e.g., [BMRS+1996], [Doug1999], and [GHJV1993]), can have been identified and documented during earlier architecture creation projects, or can have been discovered in prior systems during architecture recovery. In the next step, the selected patterns are instantiated in order to address the functional requirements. During the instantiation, the architect associates the elements of the pattern with concrete architectural elements.

Documentation.
In the documentation phase, the results of the realization phase are documented using the architectural views selected and defined earlier using the process outlined in Section 3.3, “Tailoring Architecture Descriptions”. If an architecture description already exists from earlier iterations, it is updated and refined. In order to guarantee the consistency and completeness of the architecture description in general and the views in particular, quality assurance approaches such as reviews and inspections [QA2004] have to be applied to the architecture documentation. Contrary to the assessment phase, it is not checked whether and how well the architecture itself fulfills the specified requirements, but what the quality of the architecture documentation is.

Assessment. The goal of the assessment phase is to analyze and evaluate the resulting architecture using the defined assessment criteria and involving the different stakeholders. In particular, the given architecture is checked with respect to functional and quality requirements and the achievement of business goals. For the assessment of the architecture, architecture evaluation methods like ATAM and SAAM are applied. As a basis for these scenario-based evaluation methods, the scenarios already used in the creation of the architecture are used. In addition to the architecture evaluation methods, available quality models can be used. The explicit relation between patterns, scenarios, and quality attributes documented in the quality models, as well as the rationale documented for a
pattern, helps in evaluating the given architecture. To minimize the effort involved in the evaluation, only the functionality newly added in the last iteration is evaluated in detail. Nevertheless, it is essential to ensure that all scenarios from previous iterations are still supported. Contrary to the preceding phases, this phase is optional; that is, it does not have to be performed in each iteration. The architecture creation process, however, can only be finished once all scenarios have been applied successfully (i.e., no problems have been detected in the assessment) and all assessment criteria have been fulfilled. If one or more assessment criteria are not, or not sufficiently, fulfilled by the architecture, or some scenarios have not been applied, the architecture creation process either continues or is stopped anyway. In the latter case, however, the reasons for stopping have to be documented. If the assessment of the architecture showed that at least one of the defined assessment criteria was not fulfilled, the underlying problem has to be examined in order to determine how the architecture creation process can continue. The analysis focuses on whether the current set of scenarios could be applied successfully to the architecture that resulted from the previous iterations and includes in-depth reasoning about the chosen architectural approaches and decisions. In the best case, changing just the last iteration may solve the problem. In the worst case, provided a solution supporting all scenarios in all variants exists at all, it is necessary to track back to the first iteration. There may also be cases where it is decided not to realize a scenario in order to avoid the problem. The input to the analysis is the result of the architecture evaluation activity. The outputs of the activity are a problem analysis and an assignment of action items.
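Returning to the realization phase: the scenario-driven selection of means and patterns can be sketched as a lookup against a catalog. The catalog contents below are illustrative assumptions (only the Redundancy entry comes from the example in the text; the other entries and all names are invented for the sketch):

```python
# Minimal catalog: scenario (stimulus, response) -> architectural means,
# and means -> candidate patterns specializing it (contents illustrative).
MEANS_CATALOG = {
    ("Hardware Fault", "High Availability"): "Redundancy",
    ("Load Peak", "Constant Response Time"): "Load Balancing",
}
PATTERN_CATALOG = {
    "Redundancy": ["Hot Standby", "Triple Modular Redundancy"],
    "Load Balancing": ["Broker", "Master-Slave"],
}

def select_solutions(stimulus, response):
    """Map a requirement scenario to a means and candidate patterns (sketch)."""
    means = MEANS_CATALOG.get((stimulus, response))
    if means is None:
        return None, []     # no catalog entry: the architect designs from scratch
    return means, PATTERN_CATALOG.get(means, [])

# The 24/7 availability requirement from the text:
means, patterns = select_solutions("Hardware Fault", "High Availability")
print(means, patterns)
```

The selected patterns are then instantiated by associating their elements with concrete architectural elements, as described above.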
This section has discussed the design and evaluation of reference architectures independent of information gained from reverse engineering activities. In the following section, reverse engineering is introduced. The integration of reverse engineering activities with the design and analysis of a reference architecture is then discussed in Section 2.5.

2.4 Reverse Engineering

Reverse engineering is the process of analyzing a system to identify its components and their interrelationships and to create representations of the system in other forms or at a higher level of abstraction [ChCr1990]. The main goals in the context of product family engineering are the following:

• Documentation of the architecture in order to assess the applied solutions and problems of an individual system.
• Enabling reuse in order to integrate components (or whole subsystems) into the product family.
• Recovery of lost information in order to benefit from field-tested solutions and experiences.
• Localization of single features in the source code in order to reuse this functionality in the product family.

To achieve these goals, we apply a reverse engineering process consisting of four phases (see Figure 3): first, characterization of the target system and setup of the infrastructure; second, extraction of facts about the system; third, execution of basic analyses. If necessary, detailed analyses are performed in a fourth phase, which may trigger further activities in the first two phases.
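The four phases, including the loop-back from detailed analyses to infrastructure setup and extraction, can be sketched as a small driver. All function names, the request format, and the "parser" notion are our own placeholder assumptions, not part of the method's tooling:

```python
def extract_facts(infrastructure):
    """Placeholder extractor: one fact set per configured parser."""
    return [f"facts-from-{p}" for p in infrastructure["parsers"]]

def basic_analysis(facts, request):
    """Placeholder basic analysis producing a (partial) view."""
    return ("basic", request["topic"], len(facts))

def reverse_engineer(system, requests):
    """Sketch of the four-phase reverse engineering process."""
    infrastructure = {"parsers": ["c"], "system": system}    # phase 1: setup
    facts = extract_facts(infrastructure)                    # phase 2: extraction
    views = [basic_analysis(facts, r) for r in requests]     # phase 3: basic
    for request in requests:
        if request.get("detailed"):                          # phase 4: detailed
            # A detailed analysis may require extending the infrastructure
            # and extracting further facts (loop back to phases 1 and 2).
            infrastructure["parsers"].append(request["extra_parser"])
            facts = extract_facts(infrastructure)
            views.append(("detailed", request["topic"], len(facts)))
    return views

views = reverse_engineer("legacy-system", [
    {"topic": "call graph"},
    {"topic": "feature location", "detailed": True, "extra_parser": "make"},
])
print(views)
```

The second request triggers the fourth phase, which in turn re-runs setup and extraction before producing its view.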
Figure 3: Reverse Engineering Process
Set up infrastructure. In the first phase, a reverse engineering infrastructure is arranged that supports the application of reverse engineering techniques. Such an infrastructure provides the reverse architect with methods, techniques, and tools to extract facts about a system and to analyze it. Furthermore, it offers the possibility to compile and execute the system in order to gather dynamic run-time data. Available documentation of relevant system aspects is part of the infrastructure, as are mechanisms for presenting the results of analyses in a textual or graphical manner. The infrastructure is highly dependent on the characteristics of the systems to be analyzed. To set up a reverse engineering infrastructure, the system in focus is first classified with respect to the following dimensions:
• Programming language and compiler. The programming language with its underlying paradigms (e.g., procedural, object-oriented, functional) and the compiler used have a great impact on the reverse engineering infrastructure. The programming language prescribes which data can be extracted from the source code: extracting Java package hierarchies, for instance, is only possible if the actual implementation was done in Java, and the visualization of the inheritance tree is only possible for an object-oriented software system. Compilers can introduce special build-time mechanisms that have to be considered when analyzing the system (e.g., code generation). In heterogeneous development environments, where software systems are realized with more than one programming language or compiler, each combination has to be included in the analysis.
• System organization. The type of system organization indicates which additional data sources have to be taken into account. For example, if there are several instances of an individual system realized by conditional compilation, the infrastructure should allow generating each instance so that the different instances can be examined separately. The system organization shows which non-source-code files have to be processed in order to get a complete view of the software system. In the case of a framework-based software system that operates with complex configuration files, the different configurations contain data that may bring additional information to the analysis.
• Topology. Software systems can be distributed in the development environment and file systems of the organization in different ways. The scale ranges from global development at different sites with distributed source code, over complex directory trees, to flat distributions (extreme cases: all files in one directory, or one single source code file).
The storage of the physical items of the software system can be supported by software configuration management systems. Configuration management, being responsible for controlling modifications and releasing items, brings data about evolutionary aspects into the reverse engineering infrastructure through history and log files. The above-mentioned dimensions characterize existing software systems; appropriate support for fact extraction tools and reverse engineering techniques in the infrastructure is derived based on this characterization. Next to the tool support facilities, the infrastructure itself contains raw data about the artifacts of the software system, i.e., discrete and objective facts about documents, events, and entities. Examples of artifacts are, of course, the source code, but also code comments, configuration files, configuration management data, bug tracking data, user documentation, architectural descriptions, requirements, questionnaires, etc. Views produced by former analyses are regarded as artifacts, too, and can be processed further. In short, the first phase supplies the reverse engineering process with the artifacts needed. Some special analyses may require
an infrastructure extension in order to adjust the infrastructure to the requirements of the analysis. Moreover, the reverse engineering infrastructure provides mechanisms to create, store, access, manipulate, and maintain fact bases for a software system. The fact base is filled in the next phase, the fact extraction.

Fact Extraction. The second phase processes the raw data contained in the artifacts provided by the infrastructure into a fact base. Tools mostly automate the fact extraction activity and generate the fact base. A fact represents one basic piece of information about a software system (e.g., there are three classes A, B, and C; A calls B and inherits from C; the implementation of class D changed quite often). All facts together are aggregated in a fact base, the foundation of all further analyses. The fact base is often represented as a graph (common notations are GXL, the Graph eXchange Language [GXL2004], and XMI, XML Metadata Interchange [XMI2004]). Parsing and pattern matching are two common techniques for fact extraction from source code and other text-based files (e.g., configuration files); see for example [MoWo2003] and [MuNo1995a]. In some cases, a combination of both is needed [KnPi2003]. Tools for performing the fact extraction should reside in the reverse engineering infrastructure. For some artifacts, like questionnaires or user documentation, manual or semi-automatic processing is required because the data cannot be extracted in an automated way. Additional input for the fact base can be gained by interviewing experts. The fact extraction activity results in information about the software system. In our terminology, information is classified data in which the data entities are connected to each other by relations; data becomes information when it is semantically interpreted, that is, when data entities are put into the context of other data entities.
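Fact extraction by pattern matching can be sketched with a few regular expressions that produce a fact base of (subject, relation, object) triples. Real extractors use full parsers; this toy sketch of ours only recognizes a Java-like `extends` clause and one hard-coded call pattern in an invented source fragment:

```python
import re

SOURCE = """
class A extends C {
    void run() { B.start(); }
}
class B {}
class C {}
"""

def extract_facts(source):
    """Toy fact extractor: returns (subject, relation, object) triples."""
    facts = set()
    # Inheritance facts from "class X extends Y" declarations.
    for cls, parent in re.findall(r"class\s+(\w+)\s+extends\s+(\w+)", source):
        facts.add((cls, "inherits", parent))
    # Call facts: the class whose body contains a "<callee>.start()" call.
    for caller, callee in re.findall(r"class\s+(\w+).*?(\w+)\.start\(\)",
                                     source, re.DOTALL):
        facts.add((caller, "calls", callee))
    return facts

facts = extract_facts(SOURCE)
print(sorted(facts))
```

The resulting triples correspond to the example facts in the text (A calls B and inherits from C) and could be serialized to a graph notation such as GXL.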
The fact base usually contains a huge amount of information, so that the information relevant to a certain problem or request is hidden in overcrowded low-level models. In order to extract the relevant aspects from the fact base, goal-oriented analysis activities are performed in the third phase, resulting in views or partial views. A view is a representation of a particular software system, or a part of a system, that captures it from a particular perspective only (see Sections 3 and 4 for more details on views). The third and fourth phases comprise the analysis of a software system; that is, the low-level information of the fact base is processed further and views are constructed. Such an analysis is usually done in a semi-automatic way and requires expert involvement to some extent. However, there are analyses that can be performed fully automatically. The results of analyses are new, modified, or augmented (architectural) views or subsets of views, which are then the basis for comprehension by the architects. This comprehension, of course, has to be done manually and results in knowledge that is gained from the available
information. Knowledge is the result of a learning process; it is based on data and information but, in contrast to them, it is bound to individuals and does not exist outside of them. The background and the experience of a person influence how the information is interpreted. The main goal of every analysis in the context of reverse engineering is to provide the right information at the right level of abstraction, presented in such a way that the architects can transform the given information into knowledge and use this knowledge for their purposes (in this case, the design of the reference architecture). Each reverse engineering analysis serves to achieve at least one of the following high-level goals:

• To understand aspects of the software system
• To learn about dependencies between entities of the system
• To identify coherent system entities
• To exploit information for other purposes
• To assess a software system
We differentiate between two types of analyses: basic (the third phase) and detailed (the fourth phase) analyses.

Basic analysis. The third phase involves the execution of basic analyses. Each analysis is executed on the information contained in the fact base. These analyses can be regarded as a standard set that can be parameterized and executed directly with only slight adaptation. It pays off to collect predefined, reusable basic analyses in a catalogue so that the reverse engineer can execute an analysis on demand. Examples of such basic analyses are context analysis of code entities, architecture reconstruction, visualization of class and inheritance hierarchies, call graphs, standard data flow analysis, and design pattern recognition.

Detailed analysis. In the fourth phase, detailed analyses are realized. In contrast to basic analyses, detailed analyses may require significant additional effort for the setup of the infrastructure or the fact extraction. They aim at a deeper understanding of specific system aspects; therefore, more effort is required and the involvement of experts increases. For each analysis, concrete goals have to be defined in order to perform the analysis in an efficient way.

Each analysis (both basic and detailed) is characterized by the following criteria:

• Type of information: The type of information wanted about the existing software system characterizes the analysis to be done. To work in a goal-oriented manner, it is important to know what kind of information is needed. For instance, an analysis can deal with the context of single source code entities, or it can identify single components (or subsystems), extract such a component, and prepare its reuse. Another case is the revelation of the underlying architectural styles of a system and the reasoning about them. These analyses differ strongly in scope and goal, and in order to operate efficiently while performing an analysis, it is necessary to have a clear understanding of the type of information (i.e., the wanted results).
• Available resources: The available resources of a software system determine which information sources can be accessed. Beyond the pure source code, existing documentation, although it might contain inconsistencies, is very helpful. The involvement of experts via questionnaires or interviews is essential for some detailed analyses, like the recovery of design rationale. It is crucial to know which data sources can contribute to the fact base, in order not to ignore important facts about the existing system or to select an analysis technique for which the information sources are not available.
• Type of analysis: In general, there are two ways of analyzing a system: statically or dynamically. During static analysis, only the artifacts of the system are regarded; relations between the entities are analyzed offline. Dynamic analyses gather information about the system during runtime by executing predefined scenarios and instrumenting the system. In some cases, the combination of static and dynamic analyses is beneficial.
• Technical domains: The technical domains in which a software system operates (e.g., real-time systems, database management systems, and embedded systems) deliver the basic conditions for the subsequent analyses. For example, analyses of embedded systems imply the consideration of core factors like resource consumption and runtime behavior.
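A basic analysis such as the context analysis of a code entity can be phrased as a simple query over a triple fact base. The following is a sketch under the assumption that facts are stored as (subject, relation, object) triples; the fact contents are invented:

```python
def context(facts, entity):
    """Context analysis (sketch): incoming and outgoing relations of an entity."""
    outgoing = sorted((rel, obj) for subj, rel, obj in facts if subj == entity)
    incoming = sorted((subj, rel) for subj, rel, obj in facts if obj == entity)
    return {"outgoing": outgoing, "incoming": incoming}

# Hypothetical fact base for three classes:
FACTS = {
    ("A", "calls", "B"),
    ("A", "inherits", "C"),
    ("B", "calls", "C"),
}
print(context(FACTS, "C"))
```

Because such a query is generic over the relation names, it can be parameterized and kept in the analysis catalogue, in line with the standard set of basic analyses described above.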
Figure 4: Schematic Efforts for Reverse Engineering Activities (effort over time for infrastructure setup, extraction, basic analysis, and detailed analysis, with a system-specific effort variance)
Figure 4 shows, in a simplified, schematic way, the typical efforts needed in the different reverse engineering phases. After a basic effort to set up the reverse engineering infrastructure and the initial fact extraction, prefabricated basic analyses can be executed with relatively low effort. Detailed analyses, which aim at a special, often fine-grained examination of the software system, may require significant additional effort, depending on the type of the analysis. To avoid delays in later phases, the setup of the infrastructure should start as early as possible. The next section shows how forward and reverse engineering influence each other, and how they can be combined in order to design an architecture that is based on existing systems. Section 5 then introduces a selection of analyses from our analysis catalogue that we regard as important in the context of product family engineering, comprising basic as well as detailed analyses.
2.5 Integration of Forward Engineering and Reverse Engineering

The goal of integrating as many existing artifacts as possible into the new reference or product family architecture could tempt an organization to first conclude all reverse engineering activities and only then, based on the knowledge gained, start the design of the reference architecture. But because reverse engineering requires effort and time, this pure approach is not always appropriate. For this reason, we propose a request-driven approach that produces results from reverse engineering activities on demand. An organization should decide quite early whether or not it wants to apply reverse engineering techniques at all. If it is decided to work with reverse engineering, an initial effort to set up the reverse engineering infrastructure and to extract the facts has to be made; the earlier this investment is made, the earlier the reverse engineering analyses can start. Basic analyses can then be selected from our catalogue, adjusted to the concrete context via parameterization, and executed. Since detailed analyses may, depending on the type of analysis, require significant additional effort, the investment for such analyses has to be weighed against the expected usefulness and profit. The design of the reference architecture starts concurrently with the setup of the infrastructure and the fact extraction. During the iteration cycles, the need for information about existing artifacts may arise at certain points; in these cases, reverse engineering activities are initiated. Typically, such requests concern critical aspects or success factors of the existing systems. Experience gained with, and architectural consequences of, the solutions chosen in past systems can be recognized and migrated; the analyses also help to identify alternatives for similar problems. In this way, the underlying knowledge and solutions can be preserved and merged into the reference architecture.
The results of reverse engineering activities are used in all four phases of architectural development, but they are especially beneficial in the realization phase.
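The request/response coupling between architecture design and reverse engineering can be sketched as follows. This is a sketch of the interaction only, not of any concrete tooling; the class, the registry of parameterized analyses, and the example analysis are all illustrative assumptions:

```python
class ReverseEngineeringService:
    """Answers on-demand requests from the architect with partial views (sketch)."""

    def __init__(self, fact_base):
        self.fact_base = fact_base
        self.analyses = {}   # catalogue of named, parameterized basic analyses

    def register(self, name, analysis):
        self.analyses[name] = analysis

    def handle(self, request):
        """request: (analysis name, parameters) -> partial view, or None."""
        name, params = request
        analysis = self.analyses.get(name)
        if analysis is None:
            return None      # infeasible: not supported by the infrastructure
        return analysis(self.fact_base, **params)

# Hypothetical fact base and one registered basic analysis:
facts = {("A", "calls", "B"), ("B", "calls", "C")}
service = ReverseEngineeringService(facts)
service.register(
    "callers_of",
    lambda fb, entity: sorted(s for s, r, o in fb if r == "calls" and o == entity),
)

# During a design iteration, the architect issues a request on demand:
view = service.handle(("callers_of", {"entity": "C"}))
print(view)
```

Returning `None` for an unregistered analysis mirrors the feasibility check discussed below: the reverse architect knows what the instantiated infrastructure can actually answer.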
Figure 5: Integration of Architectural Design and Reverse Engineering (inputs: business goals, functional requirements, and quality goals on the design side; existing artifacts, documents, and systems on the reverse engineering side. The architecture development cycle of planning, realization, documentation, and optional assessment issues requests to the reverse engineering process, which answers them via infrastructure setup, extraction, and basic analysis, and, after an optional infrastructure extension and further extraction, detailed analysis; responses flow back as reverse engineering information into the view-based product family architecture)
Figure 5 shows how forward and reverse engineering interact. During the design iterations, one or more requests initiate reverse engineering activities. Both processes, forward and reverse engineering, can run concurrently as long as the required quality of the reference architecture is achieved and the design iterations are only marginally dependent on the expected results of an analysis. The results of reverse engineering (i.e., the responses to requests) flow back into the design process, where the gained information can be used to learn from past experience. The outputs of reverse engineering are partial views that concentrate on specific aspects, characteristics, or entities of an existing system. Depending on the type of analysis performed, these views can be system-specific or product-family-specific. The partial views are then processed in the architectural design, where the goal is to document a system with complete views. The communication between forward and reverse engineering activities operates in a request-response manner, but experts from both sides (i.e., product family
architect and reverse architect) should be involved both when initiating a request and when integrating a response. When starting a request, this has the following reasons:

• When shaping a request, it is essential to know to what extent the request is feasible at all. The reverse architect knows what is realizable with the instantiated reverse engineering infrastructure and can therefore assess which enquiries from the product family architect are realizable.
• Another important issue for a request is the effort for conducting an analysis with respect to factors like computation time, working hours, expert involvement, etc. Knowing this is essential in order to plan and manage the cycles for design and reverse engineering activities and to schedule sessions in which the knowledge gained from the existing systems is transferred.
• Some analyses may require additional effort for setting up the infrastructure or for the fact extraction. To avoid delays during the conduct of an analysis, the need for this additional effort has to be communicated right from the start; such additional requirements have to be identified as early as possible.
• The reverse engineering infrastructure includes a collection of analysis tools. Analyses may have contradicting or overlapping constraints on this environment, which have to be resolved at the beginning so that the infrastructure supports all of the selected analyses.

After the analysis has been conducted, the request produces a response in the form of (partial) views. The incorporation of such partial views should be conducted jointly by the product family and reverse architects for the following reasons:

• In order to clarify questions directly, the product family architect and the reverse architect should review the response together, so that ambiguities can be resolved immediately.
• The product family architect may give the reverse architect feedback about the analysis in order to improve the quality of the results and to ease their adaptation when they are processed further.
• The discussion and interpretation of the partial views contained in the response combines the forward and the reverse viewpoints. The supplementary information each individual contributes enriches the communication, and close cooperation enables the knowledge transfer from reverse engineering to the architectural design activities.
• New, changed, or refined requests may be the outcome of the discussion between product family and reverse architect, and these requests can be started immediately. Responses of analyses that can be performed at once can already be reviewed during the discussion session.
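The request-response interplay described above can be made concrete with a small data-structure sketch. This is not part of the report's method, just an illustration; all names (AnalysisRequest, Feasibility, plan, etc.) are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum


class Feasibility(Enum):
    """Reverse architect's assessment of a request (illustrative scale)."""
    FULL = "realizable with the instantiated infrastructure"
    PARTIAL = "realizable after additional setup or fact extraction"
    NONE = "not realizable"


@dataclass
class AnalysisRequest:
    """A request from the product family architect to the reverse architect."""
    question: str                       # what the design iteration needs to know
    target_systems: list                # existing systems to analyze
    feasibility: Feasibility = Feasibility.FULL
    effort_person_days: float = 0.0     # estimate: computation time, working hours
    extra_setup: list = field(default_factory=list)  # infrastructure prerequisites


@dataclass
class AnalysisResponse:
    """The reverse architect's answer: partial views plus open issues."""
    request: AnalysisRequest
    partial_views: list                 # system- or family-specific partial views
    open_questions: list = field(default_factory=list)  # may trigger refined requests


def plan(requests):
    """Keep only feasible requests and flag those needing early setup work."""
    feasible = [r for r in requests if r.feasibility is not Feasibility.NONE]
    needs_setup = [r for r in feasible if r.extra_setup]
    return feasible, needs_setup
```

Capturing feasibility, effort, and setup needs as explicit fields mirrors the bullet points above: they are exactly the items the two architects have to agree on before an analysis is scheduled.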
In order to have a working integration of architectural design and reverse engineering activities, it is crucial that the communication between the product family and the reverse architect works well. Requests and responses have to be understood by both in order to make optimal use of reverse engineering in PuLSE-DSSA.

2.6 Summary

This section introduced PuLSE-DSSA, an approach that enables the efficient design of high-quality reference architectures and that explicitly includes information from existing software systems. The approach achieves its goals through the following points:
• A view-based documentation of the architectures, where the views are harmonized with the involved stakeholders.
• Scenario-based development in iterations, which enables the continuous evaluation of the architecture.
• Incremental design that prioritizes successive requirements and implements them systematically.
• The direct integration of reverse engineering activities into the design process, so that effort is only spent on demand.
The collection of predefined, parameterized reverse engineering analyses in a reusable catalogue, the generalization of detailed analyses, and the development of refined analyses will help to further reduce the effort for reverse engineering in the future. As a result, the design activities will benefit even more from reverse engineering. Section 5 will introduce a selected subset of our architectural analysis catalogue, while sections 3 and 4 are centered on the concept of views: section 3 presents the concept of view-based architectures, and section 4 gives examples of typical views. The case study documented in section 6 provides a practical example in which we applied PuLSE-DSSA and combined design with reverse engineering to build a reference architecture for Eclipse IDE plug-ins.
3 View-based Software Architectures
The combination of design and recovery is at the heart of developing new reference architectures based on existing systems. In order to discuss this combination, we will now first describe how these two fundamental activities interact in various specific product family development scenarios. In the following section, we will discuss the software architecture in its role as an interface between architecture design and reverse engineering. In section 4, we will then take a look at various architectural views and the information contained in them, and in section 5 we show how reverse engineering techniques can be used to gain the information needed to reconstruct architectural views.

3.1 Software Architecture as Interface

Software architectures have been defined as “the fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution” [BCK1998]. The architecture of a software-intensive system is concerned with the high-level organization and structure of the system and is generally the first artifact that describes a software system from a solution-oriented point of view. As such, it facilitates communication about a system in an early phase of system development and enables discussions about alternative solutions. The architecture is a fundamental artifact in a software development project, and whatever is wrong in the architecture propagates through the complete product life cycle. It is generally accepted that the earlier design decisions are made, the harder they are to change in later phases and the more far-reaching their effects. It is, therefore, crucial to make architectural decisions carefully and to be aware of their impact. Software architectures cover not only functional requirements but also quality and business goals, and hence have a decisive impact on the quality attributes of the resulting systems. Thus, it is important to ensure the best possible architecture. The characteristics of software architectures just mentioned make them a crucial asset in the development of software. Furthermore, these characteristics make architectures the ideal interface for combining recovery from existing systems with the development of a product family based on the recovered information. The architecture of a software system is at a sufficiently abstract level to give an overview of the overall organization of a software system, yet it describes the solution offered by the software system at a level of detail that enables the comparison of different architectures in the same application domain or their
combination in a product family or reference architecture. Therefore, it can be used to exchange and reconcile architecture information recovered from existing systems that is used as input to design new architectures. The architecture is a software system’s fundamental organization. To use an architecture as an interface between recovery and design activities, it must be documented properly. This is done in architecture descriptions. Figure 6 shows the interaction between recovery and design activities as an IDEFØ diagram. The output of architecture recovery is a number of architecture descriptions of the different existing systems that are used as a basis for a product family architecture. The scope of the product family, as well as the requirements on the product family, are determined and documented during product family scoping and modeling. The recovered architecture descriptions are used as the starting point for the design of the product family architecture.

Figure 6: Combining Design and Recovery
An architecture description is a document usually containing a number of architectural views that capture the software architecture from different perspectives. It is essential that the description techniques for the different architectures (i.e., the recovered architectures, as well as the product family architecture) match or that, at least, the differences in the descriptions are documented. Otherwise, misunderstandings are likely to happen, leading to unwanted results or unnecessary effort.

3.2 Architecture Descriptions

Software architectures encompass structural and behavioral properties of software systems as well as relations to their environments. To describe an architecture completely, several perspectives on it should be taken. This results in architecture descriptions that consist of multiple architectural views. Each architectural view is an abstraction of the software system, but different architectural views abstract from different details of the system. Architectural views prescribe the types of components and the types of relationships for describing a software system (i.e., the connectors), as well as properties of these component and connector types. Consequently, each architectural view presents different information, is used by different stakeholders, and addresses different concerns. The most influential contribution to the view-based documentation of software architectures was made by Philippe Kruchten [Kruc1995]. He proposes a system of four interrelated views (logical, process, development, and physical view) augmented with a fifth, redundant view (scenarios) that abstracts from certain requirements and shows how the four views work together to satisfy the requirements on an architecture. Davis and Williams also propose a set of four views (domain, component, platform, and interface view) augmented with a fifth view, the context view, which describes the dynamic behavior and quality characteristics of the resulting software system [DaWi1997]. Hofmeister, Nord, and Soni elicited their views by investigating which descriptions are actually used to describe architectures in industrial software projects [HNS2000]. The result was a set of four interrelated views (code, module, execution, and conceptual view). These view sets have in common that they focus on functional aspects of an architecture. However, non-functional or quality aspects should also be reflected in an architecture description. Quality attributes can be the basis for defining additional views that highlight how quality requirements are satisfied in the architecture. A quality attribute is a general characteristic of a software system (e.g., performance). A quality requirement is a concrete requirement related to a quality attribute (e.g., method x should have a response time of y). Using quality attributes as a basis for additional views on software architectures has a number of effects. It supports a clear separation of concerns, since one architectural view concentrates on a certain functional or non-functional aspect and only contains elements that depict how that aspect is covered by the architecture.
Such a separation of concerns reduces the complexity of the description of the individual aspects, which, in turn, increases the comprehensibility of the individual views and, consequently, of the overall system. The increased comprehensibility also supports the evolution and maintenance of the architecture, enables traceability, and facilitates reuse. Views for certain quality attributes allow the architects to concentrate on one important aspect of the architecture at a time, which further increases the comprehensibility of the software architecture description. The overall effect of these benefits is that the quality of the documentation (i.e., of the architecture description) is increased. This, in turn, supports the creation, as well as the maintenance and evolution, of the architecture description.
However, there are also challenges that have to be mastered when using views for quality attributes. The first challenge is to select the views such that they form an optimal set that is neither too big nor too small. Once the views have been selected, the consistency among them must be ensured, and they must be maintained properly. The view-based description of software architectures is accepted practice. There are two principal possibilities for doing this in a given project:
• Using a proposed, general-purpose view set, like one of the view sets mentioned above
• Setting up a customized view set
As described above, the general-purpose view sets do not take quality-related views into account. Therefore, a customized set of views must be created when quality attributes are to be reflected in the architecture description. The problem is that there are no methods that support the creation of an integrated set of views or the augmentation of an existing set of views with quality-related views. However, there is an IEEE Recommended Standard for Architecture Description that can be used as a basis for doing so [IEEE2000]. It defines a meta-model that allows the description of an architecture from different viewpoints using architectural views. This meta-model defines the concepts associated with architecture descriptions and relates them to each other. It can, therefore, be used to create customized view sets by specializing the standard’s meta-model. The resulting view sets are compatible with each other, which fosters the reuse of views and view sets across numerous projects. The recommended standard does not, however, provide concrete views or any process support for defining, creating, and using view-based software architectures. Another important aspect when dealing with view-based product family architectures is that each view potentially contains variability.
Therefore, each view must support the description of variability, and a decision model must support the consistent management of the different variabilities in the different views. The meta-model for architecture descriptions proposed by the IEEE Recommended Standard for Architecture Description is shown in Figure 7 (taken from [IEEE2000]). A system is the final product of software development. It fulfils a mission in its environment, which, in turn, influences the system. A system has a number of stakeholders. Stakeholders include customers of the system, persons involved in its development, or other systems collaborating with the system at hand. Each stakeholder has a number of concerns regarding the system under consideration. Concerns are any matter of vested interest a stakeholder has in a software system. The architecture of a system is documented in the respective architecture description. The description is organized by a number of views that conform to viewpoints. A viewpoint is a perspective taken by a stakeholder on the system
and is used to cover one or more concerns. It does this by determining the models used to represent the concern(s), the processes to create them, and the applicable analyses.

Figure 7: Conceptual Model of Architecture Description
A view is the description of a software system from a certain viewpoint. It is associated with exactly one viewpoint, to which it must conform. Views consist of collaborating models and can be composed. A view must conform to the notation determined by its viewpoint and is created using the associated processes. To enable the reuse of viewpoints, the concept of a library viewpoint is introduced. A library viewpoint can be used in different architecture descriptions.
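The concepts just described can be sketched as a minimal object model. This is only an illustration of the relationships (stakeholders have concerns, viewpoints cover concerns, views conform to viewpoints), not the standard's normative schema; all class and method names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concern:
    """A matter of vested interest a stakeholder has in the system."""
    description: str


@dataclass
class Stakeholder:
    name: str
    concerns: List[Concern] = field(default_factory=list)


@dataclass
class Viewpoint:
    """Establishes the conventions (models, processes, analyses) for views.

    A library viewpoint is simply a viewpoint reused across descriptions."""
    name: str
    covers: List[Concern] = field(default_factory=list)


@dataclass
class View:
    """Description of the system from one viewpoint, to which it conforms."""
    viewpoint: Viewpoint
    models: List[str] = field(default_factory=list)


@dataclass
class ArchitecturalDescription:
    views: List[View] = field(default_factory=list)

    def uncovered_concerns(self, stakeholders: List[Stakeholder]) -> List[Concern]:
        """Concerns no selected viewpoint covers -- a simple completeness check."""
        covered = {id(c) for v in self.views for c in v.viewpoint.covers}
        return [c for s in stakeholders for c in s.concerns if id(c) not in covered]
```

A completeness check like uncovered_concerns corresponds to the view-selection challenge discussed above: every stakeholder concern should be addressed by at least one viewpoint in the customized set.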
3.3 Tailoring Architecture Descriptions

As described above, there are cases in which the views in a pre-defined view set do not fit the respective context. Unfortunately, approaches for view-based descriptions of software architectures do not provide systematic means for extending their view sets. Therefore, we developed an approach for extending existing view sets that elicits and defines the views necessary in a given context, based on an approach for view-based software documentation. The goal of the view customization activity is to define an optimal set of architectural views that are used to document the architectures of the planned products. There are cases in which an existing view set can be used without adaptation. Often, however, the proposed views are not sufficient to describe all relevant architectural aspects. Then, a new view set must be defined.

Figure 8: Customizing Architecture Views
The process for customizing architecture views is shown in Figure 8. To prepare the view set definition, business and quality goals are elicited and documented. These are, together with an existing view set that is to be extended, the input to the actual view elicitation and definition activities. The individual activities are described in more detail in the following. Business goals are the starting point for view set definition. They are taken into account to enable the documentation of the impact of an organization’s business strategy on the products the organization builds. Certain business goals influence a product’s software architecture. These influences can be documented by providing views that show how the respective business goals are realized in the architecture. In this initial step, the business goals of the organization developing the product family are elicited and documented. In a moderated brainstorming session, the business goals influencing the product family are collected from the domain and application experts and consolidated.
If a documented business strategy exists, it is taken into account as well. Constraints on the product family are also taken as input in this step. A quality model is a description of the quality goals that apply to a product family and of how these quality goals influence each other. The (non-functional) requirements on the different products are investigated with the domain and application experts to set up a quality model. To do this, the quality goals are collected, consolidated, and documented. The result is a quality model that captures the quality attributes relevant to the product family, as well as concrete quality requirements on the product family architecture in the form of scenarios. The quality goal elicitation is followed by the actual view elicitation and definition. As described above, existing view sets can be used as a starting point to which project-specific views are added. It is also possible to start from scratch. As a first step, an initial set of views is created. In moderated brainstorming sessions, views that highlight important aspects of the system being developed and its architecture are collected. The goal is to elicit all concerns and aspects that should be made explicit and to find views that enable modeling them in the architecture description. Sources for views are the quality model, as well as existing view sets that serve as a framework into which the specific views are integrated (in case there is one). The result is a list of views that should be used. Typically, the number of views is too large. Therefore, the view set is analyzed in a second step called view consolidation. The initial set of views is analyzed to find the optimal set (in terms of the number of views and which views to select). If certain important aspects of the product family architectures cannot be covered with the initial views, view elicitation is revisited based on the incomplete view set. Once the view set is consolidated, this final set of views is defined.
The result is a meta-model that defines the element types and the relationships that can be used in the different models, as well as the relations among the different views. Additionally, the notations used to document the views are defined. We found three principal possibilities for representing architecture views when augmenting an existing set of views:
• New view: a new view representation, that is, new representations for the elements and relationships defined in the meta-model are created.
• Filtered or highlighted view: elements in an existing view are filtered out (in case they are not important for the new view) or highlighted (in case they are the focus of attention). As an example, the structure and behavior of employed patterns can be shown by means of filtered logical and process views. An example of highlighting is a structural architecture view in which the elements that are made persistent are marked for a persistence view.
• Augmented view: new elements are added to an existing view, for example annotations for performance data in dynamic views.
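The filtered, highlighted, and augmented possibilities can be sketched as operations on a simple view model. This is an illustrative sketch, not tooling from the report; the dictionary-based view representation and all function names are assumptions:

```python
def filtered_view(view, keep):
    """Filtered view: drop elements not relevant to the new aspect,
    along with relations that no longer connect two kept elements."""
    elements = [e for e in view["elements"] if keep(e)]
    names = {e["name"] for e in elements}
    relations = [(a, b) for (a, b) in view["relations"]
                 if a in names and b in names]
    return {"elements": elements, "relations": relations}


def highlighted_view(view, focus):
    """Highlighted view: keep everything, mark the elements in focus."""
    return {
        "elements": [dict(e, highlighted=focus(e)) for e in view["elements"]],
        "relations": list(view["relations"]),
    }


def augmented_view(view, annotations):
    """Augmented view: attach new information (e.g., performance data)
    to existing elements, keyed by element name."""
    return {
        "elements": [dict(e, **annotations.get(e["name"], {}))
                     for e in view["elements"]],
        "relations": list(view["relations"]),
    }
```

For the persistence-view example above, filtered_view with a keep-predicate on a persistent flag yields the filtered variant, while highlighted_view with the same predicate yields the marked structural view.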
In the next section, we will introduce a number of typical architectural views, followed in section 5 by a selection of request-driven reverse architecting analyses, which help to reconstruct such views for existing systems.
4 Typical Views and their Recovery
The software architecture as an interface between forward and reverse engineering is described with the help of views, which highlight only certain aspects of the system. This section gives an overview of typical views and their interaction. We start with the Siemens view set (Hofmeister et al. [HNS2000]), a set of four views widely accepted in research and industry: the code view, the module view, the conceptual view, and the execution view. Furthermore, we expand the Siemens view set with other views that we consider equally important, namely the behavioral view, the build-time view, the data structure view, and the feature view. These supplemental views focus on special properties of software systems in order to capture important characteristics not reflected in the Siemens view set. Reconstructing those views constitutes a good illustration of the broad scope of the possible recovery techniques and their mutual support. The reconstruction of views exercises dynamic and static analyses, top-down and bottom-up techniques, as well as expert-driven and automatic recovery. A reconstructed view may support the construction of the others. The views constitute the canvas on which the product family architects and the reverse architects together paint the design rationales and the successful means to meet requirements, including the commonalities and variabilities among the various prior systems. This section briefly introduces request-driven reverse architecting analysis techniques for each view. Section 5 then reports on a selection of our analysis catalogue containing the request-driven reverse architecting techniques that we developed in order to produce the view set. Our focus thereby lies, of course, on techniques that address concerns in a product family context.
The next subsection shows our notation template for the view description meta-models; then the different views themselves are introduced. Each view is described with respect to the following points: the meta-model, the individual elements of the model, and a list of recovery techniques that can be applied to reconstruct the view. Finally, we show how different views can be integrated into an architectural description and how the views interact.
4.1 View Notation

A view set such as the ones proposed in the literature can be used directly or as a starting point for defining a customized view set that is tailored to the respective project context. The customization process has been presented in the preceding section. Its result is a customized meta-model that defines the views to be used to describe an architecture. The notations that are used to model the views accompany the meta-model. We use the following template to define the viewpoint a view conforms to. As defined in IEEE Recommended Standard 1471 for Architecture Description, a viewpoint establishes the conventions by which a view is created, depicted, and analyzed [IEEE2000]. In the following, the view set proposed by Hofmeister, Nord, and Soni [HNS2000], as well as the additional views, are modeled using the viewpoint template.

Table 1: Viewpoint Template

Viewpoint       Name of the viewpoint
Concerns        The concerns the viewpoint addresses
Meta-model      Pointer to the meta-model defining the view
Representation  The notation used to depict the view
Process         Pointer to the process description that captures how the view is created
Analyses        Pointer to the applicable analyses

4.2 Conceptual Views

The conceptual view is the view closest to the application domain. As such, it can be a key facilitator for interacting with domain experts who are not interested in the details of the software system, but in what the system does in terms of domain concepts. When inputs from such domain experts are required by the reference architecture design process, it is important to reconstruct this view by using the subsystem decomposition of the module view. Examples of the conceptual view’s concerns can be expressed by the following questions:
• How is domain-specific hardware and software incorporated into the system?
• How does the system meet its requirements?
• How are COTS components integrated and how do they interact (at the functional level) with the rest of the system?
• How is the impact of changes in requirements or the domain minimized?
• How is the system connected to its environment?
The conceptual view serves as a communication platform where stakeholders and developers can discuss the key concepts underlying the software system. Dependencies within a system and between the software system and other systems are captured in this view as well.

4.2.1 Meta-model

The conceptual architecture view captures the application domain by mapping the functionality of the system to conceptual components. Conceptual connectors are used to coordinate the conceptual components and to exchange data among them. The interplay and relationships among conceptual components and conceptual connectors are captured in the conceptual configuration. The conceptual view depicts a system from an application domain viewpoint that is independent of solution aspects, like software or hardware techniques. The meta-model used here is a slightly simplified version of the meta-model in the originally published view set [HNS2000]. We removed the conceptual ports and roles, as well as the protocols, because we usually describe the conceptual architecture at a level of detail at which these concepts are not necessary. The process for creating a conceptual architecture view is not replicated from the original reference because the focus in this deliverable is on the views playing the role of an interface between recovery and design activities. Hofmeister, Soni, and Nord do not provide analyses for the views; since our focus is not on that aspect of viewpoints either, we do not provide analyses.
Table 2: Conceptual Architecture Viewpoint

Viewpoint       Conceptual Architecture Viewpoint
Concerns        • Functional requirements
                • COTS integration
                • Legacy system incorporation
                • Domain-specific hardware/software integration
                • Partitioning into product releases
Meta-model      A conceptual configuration containing conceptual components and conceptual connectors (see the elements below)
Representation  • Structural models for the decomposition of functional requirements into conceptual components (e.g., UML class diagrams with appropriate stereotypes)
                • Behavioral models for modeling the conceptual connectors (e.g., UML sequence diagrams)
Process         cf. Design Activities for the Conceptual Architecture View
Analyses        –

4.2.2 Elements of the Conceptual View

We have chosen three elements of the meta-model of the conceptual architecture view:

Conceptual Configuration
The interplay and relationships among conceptual components and the conceptual connectors are captured in the conceptual configuration.

Conceptual Component
The functionality of the system, as seen from the application domain, is mapped to conceptual components. This includes the incorporation of COTS components and legacy systems, and how the system aims at achieving the requirements in general, i.e., independent of concrete technological decisions.
Conceptual Connector
Conceptual connectors are used to coordinate the conceptual components and to exchange data among them. They enable the communication between two conceptual components.

4.2.3 Recovery Techniques

Conceptual views are recovered manually with the help of human experts, usually in an interview-based manner. In cases where the module view is available, our conceptual view reconstruction approach, as described in Section 5.7, provides guidance in reconstructing the view.
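The three elements of the conceptual view can be rendered as a small model sketch. This is only an illustration of the simplified meta-model described above (without ports, roles, and protocols); the class names and the consistency check are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptualComponent:
    """Carries a piece of the system's domain-level functionality."""
    name: str
    functionality: List[str] = field(default_factory=list)


@dataclass
class ConceptualConnector:
    """Coordinates two components and carries the data exchanged between them."""
    source: str
    target: str
    exchanged_data: str = ""


@dataclass
class ConceptualConfiguration:
    """The interplay of components and connectors."""
    components: List[ConceptualComponent] = field(default_factory=list)
    connectors: List[ConceptualConnector] = field(default_factory=list)

    def dangling_connectors(self):
        """Connectors whose endpoints are not components of this configuration --
        a basic well-formedness check on a recovered view."""
        names = {c.name for c in self.components}
        return [k for k in self.connectors
                if k.source not in names or k.target not in names]
```

A check like dangling_connectors is useful when a conceptual view is assembled from interview results, where components and connectors are often collected separately.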
4.3 Module Views

In the Siemens view set [HNS2000], the module view organizes modules into two orthogonal structures: decomposition and layers. The decomposition captures the way a system is decomposed into a hierarchy of subsystems and modules. A module is also assigned to a layer, which constrains its dependencies on other modules, based on the layer in which they are found. The following subsections address these structures in turn: first the decomposition into subsystems and modules, then the layers. The module view differs from the conceptual view in the scope of its components, its structure, and the concerns it addresses. Conceptual components are where the main functionality resides. In the module view, all the application functionality, control functionality, adaptation, and mediation must be mapped to modules [HNS2000]. The structure of the conceptual view is flat, as its role is to provide an overview. The subsystem decomposition of the module view is hierarchical, as it is decomposed down to the individual module level. Examples of the module view’s concerns can be expressed by the following questions (from the reconstruction perspective):
• How are the products mapped to the software platform?
• How are the dependencies among modules minimized?
• Which modules and subsystems could be reused?
• Which techniques are used to isolate changes in COTS, OS, database, standards, protocols, etc., and how successful are they?

4.3.1 Meta-model

The module architecture view maps the components and connectors from the conceptual architecture view to subsystems and modules. The conceptual solution provided in the conceptual architecture view is addressed with available software platforms and technologies. The modules are the places where computation takes place. Modules require and provide interfaces to other modules. Interfaces have no associated implementation. Modules are organized in two orthogonal structures: decomposition and layers. A software system is decomposed into a number of hierarchical subsystems and modules. A module is also assigned to a layer, which then constrains the module’s dependencies on other modules.

Table 3: Module Architecture Viewpoint

Viewpoint       Module Architecture Viewpoint
Concerns        • Mapping of the solution to a software platform
                • Support and services required
                • Testing support
                • Dependencies among modules
                • Reuse of modules/subsystems
Meta-model      Subsystems contain modules and further subsystems; modules provide and require interfaces and are assigned to layers; layers contain further layers and use one another (see the elements below)
Representation  • Structural model for the decomposition of the system into subsystems, layers, modules, and interfaces (e.g., UML class diagram with appropriate stereotypes and packages for subsystems and layers)
Process         cf. Design Activities for the Module Architecture View
Analyses        –
As in the conceptual architecture view, the process for creating a module architecture view is not replicated from the original reference, because the focus of this deliverable is on the views in their role as an interface between recovery and design activities.
4.3.2 Elements of the Module View

The module view describes two orthogonal structures: the functional decomposition into modules and subsystems, and the usage-oriented decomposition into layers.

Subsystems and Modules

We exploit the system decomposition of the module view for two main purposes. First, it serves as a good anchor point for interacting with developers or designers of the system. For them, this view is more abstract than the code view, but still close to their daily understanding, so they can easily use it as a reference point when answering questions about the strategies and solutions employed to implement features or fulfill quality attributes. Second, the decomposition serves as input for the execution and conceptual views. The dynamic part of the execution view analyzes the dynamic interaction among the subsystems and modules defined here. The conceptual view combines the decomposition with top-down information obtained from experts and other sources. Modules and subsystems form a hierarchy in which modules are parts of subsystems. Modules provide some kind of functionality; usually the functionality comprised by a module is logically related.

Layers

Layers organize modules into a partially ordered hierarchy. A module is assigned to exactly one layer. Layers can consist of further layers, and a layer may only use layers that are on the same or a lower level in the hierarchy. In a strictly layered architecture, the used layers are at most one level below.
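These layer usage rules can be checked mechanically once a module-to-layer assignment has been recovered. The following sketch uses an invented assignment and dependency list; it is not taken from any system discussed in this report:

```python
# Check recovered module dependencies against a layer assignment.
# A module may only use modules in the same or a lower layer; a
# strictly layered architecture additionally forbids skipping layers.

def layer_violations(assignment, uses, strict=False):
    """Return the dependencies that violate the layering.

    assignment: module name -> layer index (0 = lowest layer)
    uses:       iterable of (from_module, to_module) dependencies
    """
    violations = []
    for src, dst in uses:
        delta = assignment[src] - assignment[dst]
        # Allowed: same layer (delta == 0) or lower layers (delta > 0);
        # strict layering additionally forbids delta > 1 (skipped layer).
        if delta < 0 or (strict and delta > 1):
            violations.append((src, dst))
    return violations

# Hypothetical assignment for illustration only.
layers = {"ui": 2, "services": 1, "os_adapter": 0}
deps = [("ui", "services"), ("services", "os_adapter"),
        ("os_adapter", "ui"),   # upward use: always a violation
        ("ui", "os_adapter")]   # skipped layer: violates strict layering

print(layer_violations(layers, deps))
print(layer_violations(layers, deps, strict=True))
```

The `strict` flag corresponds to the strict definition above: a module may then only use its own layer or the one directly below.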
4.3.3 Recovery Techniques

Reconstructing the decomposition of the module view manually often requires too much effort to be practical, due to the high number of files and the fact that other sources (e.g., the code view and the conventions) often reflect only a small part of the module structure. For this reason, automatic or semi-automatic solutions can be very useful when they provide a good approximation of this decomposition with limited expert effort. Before considering how to efficiently recover this decomposition, it is helpful to consider why we recover it and how it will be used. The main goal of this recovery is to help design a reference architecture by focusing the architects' and experts' attention on the specific subsystems and modules implementing a needed feature in different prior systems. This allows them to identify, analyze, and compare the strategies and solutions of prior systems and use them as input to specific design questions. A semi-automatic solution must meet a key challenge to be successful: it has to find the right combination of limited expert input and focused review to achieve a decomposition that the expert accepts with few modifications. There exist different automatic [MMRC+1998], [HuBa85] and semi-automatic [ScHa1994], [MuUh1990] techniques that decompose a system into a hierarchy of subsystems and modules (see [BGS2002] for a summary of these techniques). We propose SARA (Semi-automatic Architecture Recovery Approach, see Section 5.4) because it builds a decomposition around representatives provided by experts. For a technique to reconstruct layered architectures by iteratively refined design hypotheses, see Section 5.9.

4.4
Code Views

Code architecture views isolate the construction and development aspects of a software system and organize them in a separate view according to the organization's particular development environment. The implementation language, development tools, development environment, and development process have a strong influence on this view during forward engineering, and thus also affect the recovery of the architecture of existing software systems. The code architecture view is grounded in information found in the source code, in configuration files, or in other information sources directly connected to the development environment. The main goal of the code architecture view is to give an overview of how the different elements are organized in the development environment of an organization.
4.4.1 Meta-model

The code architecture view captures how the implementation of the system described in the other views is organized: how the runtime entities from the execution view are mapped to deployment components, how modules from the module view are mapped to source components, and how deployment components are produced from source components. The source components, intermediate components (e.g., object files or static libraries), and deployment components (e.g., executables or dynamic libraries) are related to the elements from the module view (i.e., subsystem, layer, module, and interface) and from the execution view (i.e., runtime entity).
Table 4: Code Architecture Viewpoint

Viewpoint: Code Architecture Viewpoint
Concerns:
• Version and release management
• Tools for development environments
• Integration
• Testing
Meta-model: class diagram relating the module-view elements (subsystem, layer, module, interface) via trace links to code groups and their contents; code groups contain source components, binary components, libraries, executables, and configuration descriptions; source components import each other and are compiled into binary components, which are linked into libraries and executables; executables instantiate the runtime entities of the execution view
Representation:
•
Process: cf. Design Activities for the Code Architecture View
Analyses: -
The code architecture view shows a static view of the software system, that is, information that can be gathered without executing the system. Fact extractors perform (several) static analyses to collect data and information out of the development environment.

4.4.2 Elements of the Code View

According to Hofmeister et al., the code architecture view is built of several different code groups. Each code group can again contain other code groups. A code group consists of the following elements:
• Source components
• Binary components
• Libraries
• Executables
• Configuration descriptions
In the following, we discuss how the different elements can be recovered from existing systems, which techniques are available for these tasks, and which problems might occur in their usage.

Source components

Source components are artifacts produced by software developers in the implementation phase. They can be programmed manually or with the help of source code generators. Information about source components can be collected using fact extractors. A fact extractor analyzes the complete source code and extracts low-level information. Based on this low-level information, we can build at least two hierarchical perspectives on the software system:

• Physical containment: the physical containment perspective of a software system reflects the file system structure in the development environment. It shows the location of directories and files related to the software system. This perspective is language-independent, that is, every software system consists of these two elements (in the worst case, it is one file in a flat directory structure).
• Structural containment: the structural containment perspective shows the logical abstraction structure of the source code entities. This is obviously a highly language-dependent perspective on the software system. Therefore, we have to choose an appropriate fact extractor with respect to the language of the source code. If the developers used different programming languages or dialects within a system, several fact extractors are needed, or the fact extractor has to be adapted in some way. An example of structural containment is an object-oriented software system programmed in Java. In Java, we have the following structural hierarchy: at the top level are the Java packages, which build a hierarchy and contain one or more classes. Each class may consist of several attributes, methods, and inner classes; the same holds recursively for inner classes.

Attributes can be class variables or instances of other classes; methods operate on local variables and members and invoke methods of other classes. A fact extractor can provide this structural information and the relations between the entities. In contrast to a Java system, a procedural system programmed in C has a different structure. It consists of functions and procedures, which may call each other, pass variables and parameters between each other, and set and use global data structures.

Binary Components

When compiled, the statements of source components are translated into machine code. A source component is transformed into at least one binary component.
Binary components are then linked together into libraries and executables, which enable the user to run the program.

Libraries

Libraries often contain general support or special-purpose functionality like mathematical computation, file handling, or graphical user interface stencils. When analyzing an existing system, the reverse architects want to set a certain focus; for example, they may only want to learn about the software system itself (i.e., without any libraries) or about the connections between the software system and a specific group of libraries. For this reason, recovery techniques should be able to filter the information that is put into the fact base. We have to differentiate between three types of libraries:

• Third-party libraries: third-party libraries are often part of the development environment and located in other directories than the software system itself. The libraries are linked to the binary components during the build process. Filters have to be applied to certain standard include directories, and to user-defined directories passed via switches to the linker.
• In-house libraries across systems: a software system can consist of other software systems, which play the role of libraries. In this case, the libraries serve many systems and have to be handled very carefully because of their high importance to the organization. A concrete specification of whether the binary components are part of the libraries or part of the software system under examination is needed.
• In-house libraries within a single program: these libraries are a normal part of the software development environment; the source components are packaged into libraries.

The separation of libraries and the software system is a matter of filtering: the reverse architect has to leave out certain parts that contribute to the software system but should be disregarded in the recovery process.
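In practice, such filtering can be approximated by classifying the paths of extracted files; the prefix lists below are invented placeholders for what would, in a real analysis, be derived from linker switches and build settings:

```python
# Classify extracted file paths so that library code can be filtered out
# of the fact base. The prefix lists are hypothetical examples; in
# practice they come from linker switches and build configuration.

THIRD_PARTY_PREFIXES = ("/usr/include/", "/opt/vendor/")
IN_HOUSE_LIB_PREFIXES = ("/projects/common/",)

def classify(path):
    """Assign an extracted path to one of the three library categories."""
    if path.startswith(THIRD_PARTY_PREFIXES):
        return "third-party"
    if path.startswith(IN_HOUSE_LIB_PREFIXES):
        return "in-house library"
    return "system under analysis"

paths = ["/usr/include/stdio.h",
         "/projects/common/logging/log.c",
         "/projects/product_a/src/main.c"]
print([classify(p) for p in paths])
```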
Executables

Executables are the result of the transformation of source components into binary components through compiling and linking. Executables are mostly machine- and operating-system-dependent, i.e., the translation process introduces various dependencies into the source components.

Configuration Descriptions

The configuration description of a software system has a very strong influence on the other elements of the code architecture view. Configurations are one means of making generic implementations specific, that is, of building concrete instances from the generic source code. A configuration description can include, among others, information about mechanisms (i.e., how certain characteristics are realized) and results (i.e., what the values of certain characteristics are) of the software system. Mechanisms deal with:

• Preprocessing conditions
• Compiler switches
• Include paths for libraries
• Make file generation and execution
Results directly connected to the configuration might be, among others:
• Which parts are included in one particular instance of a software system, and which are not
• Initial values for variables and constants
• Distribution to client and server

The configuration description is a very crucial part in the analysis of existing software systems because of its deep impact on source and binary components, libraries, and executables. The more configurable a software system is, the more complex the situation becomes for the reverse architect. Instead of analyzing all possible instances at the same time, the focus is set on one concrete instance first (often the most common one). Later, other instances can be analyzed and the differences can be modeled in the view.

4.4.3 Recovery Techniques

When recovering source components from a software system, it is most important to have the right fact extractor(s) because of language-dependent differences. Fact extractors store the information in a fact base. Several fact extractors may contribute to one and the same fact base in order to capture product-specific information not provided by a single fact extractor. There are different types of fact extractors:

• Parsers: a parser is a program that receives input in the form of source code instructions and breaks them up into parts (for example, the nouns (objects), verbs (methods), and their attributes or options) that can then be put into a fact base or processed by other tools.
• Lexical pattern matchers: a lexical pattern matcher evaluates the source code based on regular expressions, that is, it scans the code for strings matching a search criterion (or several criteria for more complex patterns) and puts the hits into the fact base.

In general, (most) fact extractors are able to collect at least the following relations between source code components:
• Imports: a dependency between files; one file needs information that is part of another file.
• Call: function or procedure calls in a procedural software system; for an object-oriented system, method invocations are extracted here.
• Inheritance: inheritance can occur between classes; that means one class inherits the attributes and/or methods of another class.
• Set: an attribute can be set by a method, or a variable can be set by a function or procedure.
• Use: the same applies for the use relation.
• Of-type: the of-type relation indicates the types of variables, fields, or objects used in the signature of a method, function, or procedure.

These relations can be lifted to various levels of abstraction. Depending on the questions the reverse architects want to answer, they may neglect some relations or even some source components. Static analysis with the help of fact extractors cannot resolve all relations between source code entities, since some decisions can only be made at a later time, that is, at compile time or run time of the system. Therefore, the code architecture view should be complemented by the execution architecture view, where the interaction of the different source code entities is modeled. In special cases where the source code itself has been lost (e.g., after a hard disk crash), there are the following possibilities to analyze entities other than source components:

• Binary components: for the analysis of binary components, there is one type of fact extractor: a decompiler converts object code back into the code of a higher-level language. A major problem of this technique is that some of the naming conventions present in the original source code are replaced with insignificant terms in the decompiled version.
• Libraries: libraries can be analyzed with the help of disassemblers. A disassembler converts a program in its ready-to-run form (sometimes called object code) into a representation in some form of assembler language so that it is readable by a human; it performs the inverse of the task that an assembler does.
• Executables: a disassembler can also be applied to executables.

The understandability and readability of the code, and therefore of the resulting code architecture view, decrease when working only with decompilers. As with decompilers, the value of the information extracted by disassemblers is degraded by incompleteness.
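As a minimal illustration of the lexical approach described above, the following sketch extracts call facts from C-like source text. The regular expression is deliberately naive: it also reports function definitions and misses calls obscured by preprocessor tricks, which is exactly the precision trade-off of lexical pattern matching compared to parsing:

```python
import re

# Naive lexical fact extractor: records (file, callee_name) call facts
# by scanning C-like source text for identifier( patterns.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}

def extract_calls(file_name, source_text):
    """Collect call facts; keywords are filtered as obvious false hits."""
    facts = []
    for name in CALL.findall(source_text):
        if name not in KEYWORDS:
            facts.append((file_name, name))
    return facts

code = """
int main(void) {
    if (init())
        run(load(42));
    return 0;
}
"""
# Note: 'main' is reported although it is a definition, not a call -
# a typical imprecision of lexical extraction.
print(extract_calls("main.c", code))
```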
Due to the large variety of possible combinations of source code languages, configurations, tools, and make file properties, not every case is addressed in the above description; it gives a general overview of recovery techniques for the code architecture view. The elements of the code architecture view are transformed from one into another, starting from the source code and finally ending at libraries and executables. The translation process of the source components has to be documented in order to support the organization of concurrent development, compilation, and build processes. Different development teams need a clear decomposition of the source components, so that the integration process of the whole software system is smoothed and the management of different versions of the source components is possible. Abstractions, aggregations, and groupings of the code architecture view elements build the higher-level views (i.e., first the module view, then the conceptual view). The higher-level, abstracted entities and relations can be mapped to concrete ones at the code level. During architecture recovery activities, the code and execution architecture views produce a lot of facts about the software system. These facts are then the foundation for the two other views. In case of an ambiguity in the higher views, the reverse architect can at any time step down into the details and inspect the assumptions made. For a large-scale software system, code views will contain a lot of elements (even for medium-scale systems up to several million elements), which can hardly be managed by humans. For this reason, the foundation has to be shifted to a manageable level (i.e., mapped to elements of higher abstraction, for instance elements of the module and conceptual architecture views). Having all these views, an analysis of the software system can be performed in a top-down (guidance from the higher levels) manner, in a bottom-up (from details to abstractions) manner, or both combined.

4.5
Execution Views

The execution architecture view comprises the runtime aspects of the software system; it explains the deployment of the system and how the elements of the code, module, and conceptual views can be mapped to concrete external elements (i.e., operating system mechanisms and hardware elements). Hence, the execution architecture view captures how modules from the module architecture view are mapped to the elements provided by the respective runtime platform, and how these are mapped to the hardware architecture.
4.5.1 Meta-model

The modules identified in the module architecture view are assigned to runtime entities. Two or more runtime entities communicate via a communication path. The types of available runtime entities and communication paths depend on the software platform, which provides platform elements and communication mechanisms. The software platform also determines the available platform resources, which can be assigned to appropriate hardware resources. Questions answered by execution views are how runtime entities interact, what conceptual messages (messages in the domain terminology) are exchanged between the runtime entities, and how these can be mapped to architectural entities from other views. Runtime entities interact during the execution of a program using communication mechanisms like RPC (remote procedure call), IPC (inter-process communication), and shared memory. In a large-scale system, it is not obvious which runtime entities interact and what kind of conceptual messages they exchange for a given scenario or feature. Hence, this information has to be recovered.

Table 5: Execution Architecture Viewpoint

Viewpoint: Execution Architecture Viewpoint
Concerns:
• Performance
• Recovery
• Reconfiguration
• Resource Utilization
• Concurrency
• Replication
• Distribution
Meta-model: class diagram in which the software platform contains platform elements and communication mechanisms; runtime entities (to which modules from the module view are assigned) communicate over communication paths that use communication mechanisms; platform elements consume platform resources, which are assigned to hardware resources
Representation:
• Deployment diagrams that show the mapping of software onto hardware (e.g., UML deployment diagrams)
Process: cf. Design Activities for the Execution Architecture View
Analyses: -
4.5.2 Elements of the Execution View

The execution view provides information on how runtime entities are mapped onto the hardware architecture of the machines executing a program. Execution views are a good means to illustrate the connection of the various hardware elements on the one hand, and the distribution of the system onto different machines on the other hand. The simplest case, common for small systems, is that every runtime entity operates on the same machine in the same process, accessing only one shared memory area. But as most software systems are complex and communicate over networks or the Internet, it is crucial to know which parts run on which hardware.

Runtime Entities

The basic elements are the runtime entities (for example, threads, processes, and objects) and their relations to each other, expressed via communication paths. A runtime entity represents a running instance of a module or a code group.

Communication Paths

Two runtime entities can interact with each other via communication paths. A communication path can be a method invocation, a procedure call, or anything else that connects two runtime entities. Two runtime entities can influence each other's behavior by passing messages or by changing internal state. The implementation of a runtime entity specifies the limits and the range of its state and behavior.

Communication Mechanisms

Two runtime entities communicate over a communication path using a communication mechanism. There are many different possible mechanisms, so the following list contains only some important ones:
• TCP/IP: this protocol family realizes communication via the Internet.
• RPC: remote procedure calls are used to forward requests and responses in a local area network.
• DCOM: this technology is responsible for establishing the communication of distributed components.
• CORBA: another middleware technology used for communication among distributed objects.
Source components are responsible for the implementation of these mechanisms. The code architecture view can thus provide information about the corresponding parts of a communication path, that is, which source components send requests and which reply with responses. For instance, in Java systems, the so-called stubs can be localized and connected to the communication path. Dynamic information can then report on the usage of the specific paths.
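A simple way to localize such endpoints is, again, lexical: scan the sources (or the fact base) for characteristic API names of each communication mechanism. The patterns below are illustrative guesses for POSIX-style and Sun-RPC APIs, not a complete catalogue:

```python
import re

# Map source text to probable communication mechanisms by scanning for
# characteristic API names. The pattern list is an illustrative guess;
# a real catalogue would be calibrated for the platform under analysis.
MECHANISM_PATTERNS = {
    "socket (TCP/IP)": re.compile(r"\bsocket\s*\(|\bconnect\s*\("),
    "RPC":             re.compile(r"\bclnt_call\s*\("),
    "shared memory":   re.compile(r"\bshmget\s*\(|\bmmap\s*\("),
}

def mechanisms_in(source_text):
    """Return the (sorted) mechanisms whose patterns match the text."""
    return sorted(name for name, pattern in MECHANISM_PATTERNS.items()
                  if pattern.search(source_text))

snippet = "int s = socket(AF_INET, SOCK_STREAM, 0); connect(s, addr, len);"
print(mechanisms_in(snippet))
```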
Platform elements

Each runtime entity contributes to a platform element. Platform elements are operating-system-dependent containers in which runtime entities run or share data. A platform element can be, among others, a:

• Process
• Thread
• Queue
• Shared memory
• Socket
• Shared library
• File handle

A platform element is composed of at least one runtime entity. To figure out which platform elements are in use, instantiated, or working together, the reverse architects can monitor the software system and the operating system. The evaluation of this data will show, for instance, whether new processes were created by the software system, or whether some memory was allocated by the system. Furthermore, the runtime entities responsible for those actions can be identified when combining the execution view with dynamic aspects. The code view also indicates the usage of platform elements. For example, Java threads can only be instantiated in distinct, predefined ways. If the reverse architects are able to find those critical spots, they can reveal the connection between a platform element and a source component.

Platform resources

Platform elements consume platform resources, which means that every element needs at least one resource for its operation, but it may also be split over several resources. Among others, the following items are considered platform resources:

• CPU
• Memory
• Timer
• Address space
• Semaphore
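Some of this monitoring can be done from inside the runtime itself. The sketch below (Python standard library only; the thread name is invented) snapshots the threads of the current process before and after the observed system starts a worker, revealing the newly created platform element:

```python
import os
import threading
import time

# Snapshot the platform elements (here: threads) of the current process
# before and after some activity of the observed system.
def worker():
    time.sleep(0.1)  # stands in for real work of the system under analysis

before = {t.name for t in threading.enumerate()}
new_thread = threading.Thread(target=worker, name="analysis-worker")
new_thread.start()
after = {t.name for t in threading.enumerate()}
new_thread.join()

print("observed process id:", os.getpid())
print("newly created threads:", sorted(after - before))
```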
Platform resources are directly linked to the hardware of the machine the software system runs on. To accommodate different hardware combinations, configuration files often specify system properties.

4.5.3 Recovery Techniques

Execution views show how a software system is connected to the hardware of the machine on which it runs. Different hardware combinations can lead
to different execution views when the software system is configured in a way that takes hardware properties into account. Otherwise, the system may be implemented in a way that minimizes the hardware influence. For instance, if a software system runs in just one process, it will do so in any case, whether the underlying machine is a multi-processor system or not. The dynamic parts of the execution architecture view and the code architecture view can help to disclose the source components related to the execution view elements, since some of the elements are always implemented in a certain, consistent manner. The execution view influences the module and conceptual views of the software architecture. When recovering the deployment, it makes a great difference whether the reverse engineers analyze a system running on a multiprocessor hardware platform, or a web application distributed over several servers that uses some kind of middleware to access an underlying database management system.

4.6
Behavioral Views

Behavioral views connect statically extracted information to dynamically executed scenarios or use cases. This view captures how the structural elements of a software system interact for given scenarios. In contrast to the execution view, which focuses on the mapping of runtime entities to hardware and operating system mechanisms, the behavioral view concentrates on the interactions of structural elements on a more abstract level. The interactions can be used to answer the following questions:
• How do architectural entities interact in order to fulfill a specific task?
• Which architectural entities are unique to a scenario, and which are involved in several scenarios?
4.6.1 Meta-model

The behavioral view captures how conceptual components and modules interact during the execution of scenarios or use cases. It describes the dynamic behavior of a software system on a high level of abstraction by reporting which conceptual components and which modules are involved when realizing execution scenarios.
Table 6: Behavioral Viewpoint

Viewpoint: Behavioral Viewpoint
Concerns:
• Scenarios
• Use cases
• Conceptual component interaction
• Module interaction
Meta-model: class diagram in which scenarios and use cases require conceptual components (ConceptualView::ConceptualComponent) and modules (ModuleView::Module); conceptual components interact with each other, as do modules
Representation:
• Activity diagrams
• Collaboration diagrams
Process: cf. Design Activities for the Behavioral Architecture View
Analyses: -

4.6.2
Elements of the Behavioral View

In the following subsections, we discuss the different elements of the behavioral architecture view and how they can be recovered from existing systems.

Scenarios, Use Cases

Scenarios, sometimes called use cases, describe a certain behavior of the system by capturing how the static elements of the conceptual architecture view (i.e., conceptual components) or the static modules of the module view interact, showing the activities and the order in which a scenario is realized.
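As a toy illustration of how such interaction information can be captured, Python's trace hook can record which functions execute while a scenario runs and lift the result to module names; the functions and the function-to-module mapping here are invented:

```python
import sys

# Record which functions execute while a scenario runs, then lift the
# trace to (hypothetical) module names: the essence of mapping dynamic
# information back to static architectural elements.
executed = []

def tracer(frame, event, arg):
    if event == "call":
        executed.append(frame.f_code.co_name)
    return tracer

def parse(data):      # pretend parse and render belong to different modules
    return data.split(",")

def render(items):
    return " | ".join(items)

def scenario():       # the use case being exercised
    render(parse("a,b,c"))

sys.settrace(tracer)
scenario()
sys.settrace(None)

FUNCTION_TO_MODULE = {"parse": "InputModule", "render": "OutputModule",
                      "scenario": "Controller"}
print([FUNCTION_TO_MODULE.get(name, "?") for name in executed])
```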
4.6.3 Recovery Techniques

Various techniques recover behavioral information from existing systems, each following the same principle:
• Instrumentation of code
• Execution of scenario
• Abstraction or mapping

First, the source code of the software system is instrumented in order to gather information about which source code was executed during a scenario. Conceptual components or modules are then mapped to the executed source code, and the information is lifted to the respective level of abstraction. Several techniques follow this general principle. Riva [RiRo2002] proposes recovering architectural views of a system by combining static views containing hierarchical decompositions with dynamic information showing interactions between the static entities, and by horizontal and vertical abstraction through grouping of entities; Riva's approach uses message sequence charts for visualizing and manipulating behavioral views. Jerding introduces a technique to extract components and connectors using a Solaris tool called ISVis, as described in [JeRu1997], but it requires a manual mapping of components to code. This approach is able to detect interaction patterns, i.e., a recurring ordered list of interactions, and to match all instances of such a pattern.

4.7
Build-Time Views

The build-time architecture view [TuGo2001] captures complex build-time properties that are not explicitly addressed by the view set proposed by Hofmeister, Nord, and Soni [HNS2000]. Especially large-scale software systems often exhibit conditional compilation, multiple configurations, code generation, and dynamic behavior in the build process. The build-time architecture view pulls together the interrelationships in the build process between the source components of the code architecture view and the runtime entities of the execution architecture view. Thus, the impact of build-time mechanisms and build-time information on a software system is captured in this view.
4.7.1 Meta-model

Build-time architecture views isolate the build-process-related aspects of a software system. This view is strongly related to the code and execution architecture views, since it captures how the software system is built from collections of source components into executables, which then instantiate runtime entities. The build-time architecture view models configurations and build-time properties extractable from build-time artifacts, such as build scripts (e.g., make files, project files, shell scripts), source and object files, and configuration choices. Therefore, it explicitly captures the build process, which allows repeatable build procedures. The build-time view provides information about the compilation dependencies among manually implemented and automatically generated source components, the time sequence of compilation steps, and which configuration alternatives (if there is more than one) were chosen to produce a certain running version of the system.
Table 7: Build-Time Architecture Viewpoint

Viewpoint: Build-Time Architecture Viewpoint
Concerns:
• Build management
• Build time reduction
• Build tools
• Configurations
• Code generation
• Compilation procedure
Meta-model: class diagram in which code-view source components are built into build-time source components using build-time mechanisms; build-time source components depend on build-time information and are linked into build-time executables, which instantiate code-view executables and execution-view runtime entities
Representation:
• Structural model (e.g., UML class diagram with appropriate stereotypes for implemented and generated classes)
Process: cf. Design Activities for the Build-Time Architecture View
Analyses: -
The build-time architecture view used here is for the most part based on the work of [TuGo2001]. It consists of the following elements:
• Build-time information
• Build-time mechanism
• Build-time source component
• Build-time executable
The build-time architecture view shows the software system during the build process, that is, information on how the system is actually built and what mechanisms are applied to produce the system. Fact extractors can perform static analyses to collect data and information from the build environment.
4.7.2 Elements of the Build-Time View

In the following subsections, we will discuss the different elements of the build-time architecture and how they can be recovered from existing systems.

Build-time information
The build-time information determines what parts of the source components are actually used to build the system. For instance, two conditional compilations with different compiler flags may lead to two different instances of the software system with only a small, shared group of source components. In order to keep track of those build alternatives, it is important to record the build-time information and the consequences of that information (i.e., what source components are included for a certain conditional compilation). The software is produced based on the build-time information and the mechanisms. To recover build-time information, lexical pattern matching can be applied to analyze configuration files [KnPi2003]. Another possibility is to use make file analyzers (or parsers) to extract information about the build process. Data flow analyses can help to trace the consequences of a certain configuration or a conditional compilation throughout the whole build process.

Build-time mechanisms
A build-time mechanism is an element that is used to build the software system by transforming source components into executables under consideration of the build-time information. An overview of common build-time mechanisms to create variants of a software system can be found in [MuPa2002]. Techniques like frame processing, conditional compilation, parametric polymorphism, or refinements are described there. Other build-time mechanisms are compilers, cross-compilers, make files, interface builders (e.g., to glue together different programming languages), and code generators (e.g., lex and yacc, which generate scanners and parsers for compilers, or code generators that add platform-specific code to make a system buildable).
Different build-time mechanisms may overlap and interfere with each other. In these cases, it is necessary to know which mechanism caused which effect. Interviewing the developers responsible for the build process can reveal build-time mechanisms, just as tracing the build process can expose other mechanisms. An analysis of make files can often point to further mechanisms applied in the build process (e.g., if a code generator is used, it will be called at certain points during the build process).

Build-time source components
Static source components as modeled in the code architecture view can be dynamically changed or transformed into build-time source components. Furthermore, other build-time source components can be generated during the build process. The build mechanism and the content of the build-time information are responsible for these actions.
Such modifications and creations of build-time source components are only visible in the build process. Build-time components can extend the structural containment perspective of the code architecture view. The fact extraction techniques described in Section 4.4.2 can be applied to build-time components as well.

Build-time executables
Similar to the code architecture view, build-time components are linked into build-time executables that can then be instantiated during the execution of the software system. In contrast to the code view, the decision which build-time executables are created by build-time mechanisms is made dynamically by the build-time information. The fact extraction techniques described in Section 4.4.2 can be applied to build-time executables as well.

4.7.3 Recovery Techniques

Build-time architectural views can close the gap between the code architecture and the execution architecture view by explicitly describing the build process and its elements. Modeling which source components are modified, transformed, or created during the build process facilitates the understanding of how the build works.

In product family engineering, build-time mechanisms are quite frequently used to implement variations between different product instances. The mechanisms identified for an individual system can be propagated to the design process of the reference architecture. Experiences and problems with the chosen build-time mechanisms, and a possible reuse of these mechanisms in other contexts, may help to migrate from the already existing systems towards a product family. Furthermore, the already field-tested build-time mechanisms can be exploited to realize variations among the instances of a product family. Recovery of build-time views is therefore an important task when migrating existing, individual systems towards a product family.
4.8 Feature Views

Features are prominent, distinctive, user-visible aspects, qualities, or characteristics of software systems. The feature view as described here captures the following:
• A map between a feature and the source code artifacts that realize it.
• A feature model that captures the features of a system and their dependencies on other features.
4.8.1 Meta-model

The meta-model of the feature view is shown in Table 8.

Viewpoint: Feature Viewpoint

Concerns:
• Features in a system
• Map between components at the source level and features
• Understanding dependencies among features
• Map between conceptual components and features

Meta-model: [diagram: a Feature is realized by Conceptual Components and Source Components (realize), may require other Features (require), and may refine other Features (refine)]

Representation:
• Component Map
• Feature Model

Process: –

Analyses: –

Table 8: Feature Viewpoint

4.8.2
Elements of the Feature View

In the following subsections, we will discuss the different elements of the feature view:

Feature
Features represent functionality visible to the user of a software system. A feature is realized with the help of conceptual components and, on the implementation level, implemented in source components. There are two main ways to document features:

• Component Map: The component map contains information about a given set of features and the set of components that implement them. Components can be a module or a set of files, functions, and global variables. The component map helps architects and developers understand how certain features are realized and which implementation parts contribute to a feature. Moreover, maintainers know where to start bug fixing (bug reports from customers often name a concrete feature, which is the starting point for the debugging tasks).

• Feature Model: A feature model captures the relationships among the features in a system. For modeling the relationships among the features, we use the refine and require relations as proposed in [FFB2002]. These relations are briefly explained below.

Two distinct features can have relations to each other. We distinguish between the following types of relations:

• Refine Relation: The purpose of the refine relation is to describe the features of a system stepwise at a lower abstraction level with additional details. A refinement is a more detailed description of (the services of) a feature.

• Require Relation: Many features in a product are connected to each other. The require relation expresses a dependency between two features: one of them needs the presence of the other one in the same product for its correct operation.

4.8.3 Recovery Techniques

Using the existing source code and the technique described in Section 5.3, the map between a feature and the set of source code artifacts that implement it can be captured and documented. Using user documentation and the CAVE technique described in Section 5.8, the features that are present in a given system can be recovered.
4.9 Data Structure Views

The data structure view captures the key data entities of the software system in a hierarchical composition. This view focuses only on data structures and their specific context, so that the data structures of a system can be understood on a high level of abstraction as well as from a low-level, that is, atomic perspective. The explicit concentration on data aspects enables this specialized view to complement the other views mentioned above. The data structure architecture view is related to the code architecture view, because the source components realize data structure entities. Also, the runtime entities of the execution architecture view are instantiated by them. Since modules are the entities in which computation takes place, they may be traced to the data structure groups, just as interfaces typically provide access to data structure entities. In order to have persistent data, data structure storages may have to be set up and maintained.
4.9.1 Meta-model

The responsibility of data structure views is to capture the key data entities of the software system in a hierarchical composition. Data and how it is processed are the essential characteristics that basically distinguish one software system from another. For this reason, data plays a very important role in the success or failure of a system. Within a software system, data is represented with the help of data structures, so the data structure view focuses only on data structures and their specific context. The goal of this view is to understand the key data structures on a high level of abstraction as well as from a low-level perspective.

Viewpoint: Data Structure Viewpoint

Concerns:
• Database management
• Data modeling
• Database analysis
• Database design

Meta-model: –

Representation:
• Data models (e.g., UML structural diagrams or Entity-Relationship (ER) model diagrams)

Process: cf. Design Activities for the Data Structure Architecture View

Analyses: –
The data structure architecture view comprises the following meta-model entities:
• Data Structure Group
• Data Structure Entity
• Data Structure Storage
A data structure group contains related data structure entities. Data structure entities may be composed of several other entities. In this way, existing data structures can be combined to form more complex data structures.

4.9.2 Elements of the Data Structure View

In the following subsections, we will discuss the different elements of the data structure architecture view and how they can be recovered from existing systems.

Data Structure Group
A data structure group is a collection of related data entities. The relation among data entities may be logical (i.e., they represent related or similar data) or physical (i.e., they are located in the same data storage or processed on the same machine). Each data structure group consists of one or more data structure entities.

Data Structure Entity
A data structure entity is a container for data; it may be composed of other data entities. At the lowest level, there are atomic data structure entities. In the following subsections, we show examples of how to extract special types of such data structure entities.

Data Structure Storage
A data structure entity usually has one or more places where it is stored: data structure storages. Storage elements can be databases or files. Another option is to represent the data with concepts like abstract data types and abstract data objects.
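The hierarchical composition of data structure entities described above can be sketched as a simple composite structure; the entity names below are invented for illustration.

```python
# Composite sketch of the meta-model entities: atomic data structure entities
# are composed into larger entities, which can in turn be composed further.
class DataStructureEntity:
    def __init__(self, name, parts=()):
        self.name = name
        self.parts = list(parts)  # empty list means an atomic entity

    def atomic_entities(self):
        """Flatten the composition down to the atomic entities."""
        if not self.parts:
            return [self.name]
        result = []
        for part in self.parts:
            result.extend(part.atomic_entities())
        return result

# Atomic entities combined into more complex data structures.
street = DataStructureEntity("street")
city = DataStructureEntity("city")
address = DataStructureEntity("address", [street, city])
customer = DataStructureEntity("customer", [DataStructureEntity("name"), address])

print(customer.atomic_entities())  # ['name', 'street', 'city']
```

A data structure group would simply be a named collection of such entities, and a storage would record where each top-level entity is persisted.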
4.9.3 Recovery Techniques

Data structure views capture a key element of the software system. They show the main data structures, which are important for understanding the system as a whole. By revealing the context of such data structure entities (i.e., how a piece of data is used and processed), these views help to recognize possible side effects of changes and to learn how the system works. In product family engineering, we can assume that the instances of a product family share a common set of core data structure entities. When analyzing several individual and similar systems, the (potentially) different data structures have to be merged or adapted into one common structure in order to define the data structure view for the reference architecture of the product family. Furthermore, access and manipulation routines of the data structures may be integrated into the product family as well.
Recovery of data structures in databases is performed by reconstructing different views on the database models. [Blah1998] introduces a reverse engineering process for databases consisting of three phases. The first phase, implementation recovery, deals with the context of the database, in which the software system as a whole is operating, and with the extraction of the atomic database entities. The second phase is called data structure extraction; its goal is to resolve the relations between the atomic entities. In the third phase, data structure conceptualization, the data structure views are interpreted, refined if necessary, and abstracted if possible. In Section 5.10, we will present a technique that captures data structures implemented with the quite frequently used concepts of abstract data types and abstract data objects.
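A minimal sketch of the first two phases, against a small in-memory SQLite database whose schema is invented for illustration: implementation recovery lists the atomic entities (here, the tables), and data structure extraction resolves their relations via the declared foreign keys.

```python
import sqlite3

# Implementation recovery and data structure extraction on a toy schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)
    );
""")

# Phase 1: extract the atomic database entities (tables).
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]

# Phase 2: resolve relations between the atomic entities via foreign keys.
relations = []
for table in tables:
    for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
        relations.append((table, fk[2]))  # (referencing table, referenced table)

print(tables)     # ['customer', 'orders']
print(relations)  # [('orders', 'customer')]
```

The third phase, conceptualization, would then interpret and abstract these raw relations, e.g., into an ER model for the data structure view.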
4.10 Integration of the Views

The four views of the Siemens view set (i.e., the code, the module, the conceptual, and the execution architecture view) and the additional views introduced above (i.e., the build-time, the feature, the behavioral, and the data structure architecture view) complement each other. Each view contributes to the complete description by revealing certain aspects of the software system not captured by another view. This section illustrates the interplay among a number of architectural views that together describe an architecture completely. We first motivate the integration of architecture views by discussing the different uses of integration information. Then, we describe different techniques for documenting integration information.
4.10.1 Motivation

A view-based description of a software architecture consists of a number of overlapping views that together capture all relevant aspects of the described architecture. The overlap is due to the fact that the same architecture is described from different perspectives in the different views and that the views share certain concepts. For example, the execution view illustrates the dynamic interaction among the components from the module or the conceptual view. The completeness and consistency of the view sets used for architecture descriptions are crucial for view-based software architecture descriptions. Such a set is complete if the resulting architecture description covers all important aspects of the architecture. The number and the complexity of the different views used to represent a product family architecture directly affect its creation and evolution. When the product family architecture is created, all views must be created, and when it is evolved, potentially all views representing it have to be changed.
This is a strong argument for keeping the number of architectural views that make up a product family architecture description as low as possible. Nevertheless, reducing the number of architectural views is not only beneficial. The concentration on certain concerns in one architectural view makes it easier to understand and, therefore, also to evolve. The more views are used, the fewer concerns are "packed" into each of the views. Of course, the handling of the information on how the different views are related becomes more expensive. But as Jackson pointed out [Jack1990], there is no way to avoid the information on the relations among views; it will be handled implicitly within the views when their number is low. From this perspective, the number of views should not be too low. So, there is a trade-off between having more views to reduce the complexity of evolutionary changes and having fewer views to keep the number of changes low. Therefore, we argue for customized views that are tailored to the characteristics of the environment in which the product family architecture is developed.

The consistency of views is an issue that is especially important when project-specific view sets are used. Then, consistency must be ensured for the customized view sets. The most common problems when using multiple views are confusion and conflation. Confusion means that one concept appears with different names in distinct views. Conflation means that two different concepts appear with the same name in distinct views. These problems are shared by all approaches for view-based description of software (like, for example, aspect-oriented approaches). A proper integration of the view set used to describe a software architecture can avoid consistency problems. In order to provide a coherent set of architectural views that complement each other, the overlapping views must be explicitly related to each other.
This is necessary to make sure that the separately modeled views together describe the software architecture. There are several possible uses of information on view integration, one being the possibility to understand the complete software architecture. Another important use is the explicit documentation of consistency information as a basis for performing consistency checks on the different architectural views. Traceability, that is, the ability to describe and follow the life of a concept throughout the different stages of software development, is another important use of view integration information. View integration information must also be established when new views are added to an existing set of architectural views. The next section describes a number of approaches for integrating architecture views.
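As one example of such a consistency check, a conflation check over explicitly documented view integration information might look as follows; the integration triples below are invented for illustration.

```python
from collections import defaultdict

def find_conflations(integration):
    """Conflation check: the same element name used in different views for
    different underlying concepts.
    integration: list of (view, element_name, concept_id) triples."""
    by_name = defaultdict(set)
    for view, name, concept in integration:
        by_name[name].add(concept)
    return sorted(name for name, concepts in by_name.items()
                  if len(concepts) > 1)

integration = [
    ("code",       "Executable", "compiled-artifact"),
    ("build-time", "Executable", "compiled-artifact"),  # consistent reuse
    ("execution",  "Component",  "runtime-process"),
    ("conceptual", "Component",  "design-element"),     # conflation!
]
print(find_conflations(integration))  # ['Component']
```

A symmetric check (grouping by concept and flagging concepts with several names) would detect confusion in the same way.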
4.10.2 Integrating Architectural Views

There are different ways to document view integration information, that is, information that captures the relationships among a number of architectural views. Which technique is used depends on the use of the integration information. Kruchten proposed scenarios as a means to describe how the different architecture views interact to fulfill certain requirements [Kruc1995]. This rather informal way of documenting integration information is, however, inappropriate here, since the architecture description is used as the interface between design and reverse engineering activities and must, therefore, be unambiguous. This can only be achieved if the views and the relationships among them are defined properly using meta-models.

[diagram: relations among Build-Time::Mechanism, Build-Time::SourceComponent, Build-Time::Information, Build-Time::Executable, CodeView::SourceComponent, CodeView::Executable, and ExecutionView::RuntimeEntity — use mechanism, build, depend on, link, instantiate]

Figure 9: Build-Time View Meta-model
The examples of view definition given in the sections above use cross references to denote the fact that a concept defined in one view is used is another one. Figure 9 shows the meta-model of the build-time view. The figure shows that build-time views use four concepts defined for build-time views, namely source component, mechanism, executable, and information. Additionally, three concepts are imported from other view definitions; these are runtime entity from
Copyright © Fraunhofer IESE 2004
63
Typical Views and their Recovery
the execution view, as well as executable and source component from the code view. The cross references used in the meta-models enable the description of relationships among different views, the relationships between the view elements can expressed as well (e.g., the instantiation relationship between executable from the build-time view and runtime entity from the execution view). A drawback of this technique is the fact that the integration information is distributed over the different view definitions. To avoid this, special diagrams can be used that capture global integration information in a single diagram. These diagrams are usually used in addition to the cross reference techniques to highlight the integration of views. Such a specific diagram that captures the relationships among a number of views is the view connectivity diagram. A view connectivity diagram abstracts from the information contained in the metamodel of a number of views by concentrating on the elements that connect the views. Additionally, a view connectivity diagram makes the relationships among the views explicit using typed associations among model elements. Figure 10 shows a partial view connectivity diagram that contains the parts related to the build-time view (the part also shown in Figure 9). Executable
BuildTime
Execution
Figure 10:
Runtime Entity
Component
SourceComponent Executable
View Connectivity Diagram for the Build-Time View
View connectivity diagrams can also be used to capture all views in a view set. A view in a view connectivity diagram is a set that contains model elements. The relationships can thus be defined as relations among these sets. Views are depicted in view connectivity diagrams as stereotyped classes. View elements are smaller boxes attached to views, for example, the model element "runtime entity" for the execution view. The view elements that are depicted in a view connectivity diagram must be defined in the meta-model of the respective view. The composition relationship is a stereotyped association that is directed. The type of the relation (instance_of) is defined as follows (Execution is abbreviated to E, BuildTime to BT; rte and exe stand for runtime entity and executable, respectively; oclType is a predicate taken from the Object Constraint Language (OCL) [WaKl1999]):

instance_of = {(exe, rte) | ∀ exe ∈ BT ∃ rte ∈ E : exe.oclType = rte}
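The set-comprehension definition above can be sketched operationally: build the instance_of relation as pairs and check the totality condition that every build-time executable instantiates some runtime entity type. The executable and entity-type names below are invented for illustration.

```python
# Sketch of the instance_of relation between build-time executables (BT)
# and runtime entity types (E), plus the totality condition from the formula.
def instance_of(build_time_executables, runtime_entity_types):
    """build_time_executables: exe -> declared runtime entity type."""
    relation = {(exe, rte) for exe, rte in build_time_executables.items()
                if rte in runtime_entity_types}
    # totality: for all exe in BT there exists a matching rte in E
    total = all(rte in runtime_entity_types
                for rte in build_time_executables.values())
    return relation, total

executables = {"server.exe": "ServerProcess", "cli.exe": "CliProcess"}
entity_types = {"ServerProcess", "CliProcess"}

relation, total = instance_of(executables, entity_types)
print(sorted(relation), total)
# [('cli.exe', 'CliProcess'), ('server.exe', 'ServerProcess')] True
```

Such a check is exactly the kind of consistency test that explicit view integration information enables across view definitions.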
The integration of different architectural views can be documented using view connectivity diagrams. In this way, the views that have been separated to concentrate on specific aspects of a software architecture can be put together again. This section introduced a number of architectural views, as well as techniques to integrate them. In the following section, request-driven reverse architecting is presented, which provides information from existing systems for the different architectural views.
5 Request-driven Reverse Architecting
Architectural analyses contribute to the design of reference architectures by exploiting information from existing software systems. This section will introduce a couple of such analyses, which reveal crucial information, especially in the context of a product family. Existing artifacts are processed into facts, which then build the foundation for analyses on top of them. The result of each analysis is an (architectural) view or parts of a view; either new, modified, or augmented views.

[diagram: a request drives the flow from artifacts through fact extraction into a fact base, on which an analysis produces views]

Figure 11: Request-driven Reverse Architecting
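The flow of Figure 11 can be sketched in a few lines: artifacts are processed into a fact base, and a concrete request selects one parameterized analysis from the catalogue. The fact extractor and the single analysis shown here (a fan-in query over invented call relations) are placeholders for the concrete techniques introduced below.

```python
# Sketch of the request-driven flow: artifacts -> fact extraction -> fact base
# -> analysis -> (partial) view. All names and relations are illustrative.
def extract_facts(artifacts):
    """Fact extraction: collect call relations from all artifacts."""
    facts = []
    for artifact in artifacts:
        facts.extend(artifact["calls"])
    return facts

def fan_in_analysis(fact_base, component):
    """One parameterized analysis from the catalogue: which components
    depend on (call into) the given component?"""
    return sorted({caller for caller, callee in fact_base
                   if callee == component})

artifacts = [
    {"name": "ui.c",  "calls": [("ui.c", "core.c")]},
    {"name": "net.c", "calls": [("net.c", "core.c"), ("net.c", "log.c")]},
]
fact_base = extract_facts(artifacts)

# A concrete request: "which components depend on core.c?"
print(fan_in_analysis(fact_base, "core.c"))  # ['net.c', 'ui.c']
```

New analyses are added to the catalogue simply as further functions over the same fact base, which is what makes the catalogue extensible.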
An analysis is always initiated by a concrete request that has a specific goal and an expected result. Based on the request, it is decided which kind of analysis is appropriate with respect to the requirements. It is beneficial to have a catalogue of standard, parameterized analyses from which to choose. An analysis results in a response that is returned to the requester. By this request-driven mode of operation for performing architectural analyses, this reverse architecting approach achieves the following goals:

• Efficient and goal-oriented execution of analyses, because concrete requests provide the determining factors and the basic conditions of the architectural analysis.
• The requested information is supplied on demand.
• By providing a catalogue of predefined, parameterized analyses, the approach can react flexibly to different types of requests.
• The catalogue of analyses is extensible, so that new analyses can be added when appropriate.
• Consideration of all available artifacts and sensible expert involvement where necessary leads to the reasonable utilization of available resources.

The remainder of this section will introduce a couple of selected architectural analyses from our catalogue in detail, which are especially beneficial when migrating towards a product family, but can also be used when analyzing just a single system. The subsections are structured as follows: the purpose explains why the request-driven reverse architecting technique is relevant and useful in the product family context and which view it produces. The mode of operation of each analysis technique is described in the realization part of the subsections. For each technique, a short summary paraphrases the key points of the technique.

5.1 Architecture Comparison

5.1.1 Purpose

Based on the architectural descriptions of prior systems (or at least partial descriptions containing the most relevant information), we can start the architecture comparison. The goal of a comparison is to learn about different solutions applied in the same domain, to identify advantages and drawbacks of the solutions, and to rate the solutions with respect to scenarios. By achieving these goals, architecture comparisons contribute to the fulfillment of the quality goals. Results can be used in the design process of reference architectures. Architecture comparisons are initiated by requests, returning partial views as responses. The approach is expert-driven, iterative, and highly sensitive to the context of the request posed.
5.1.2 Realization

In order to conduct an architecture comparison, we propose, as shown in Figure 12, the following four major steps in our iterative approach: request analysis, system selection, comparison, and recording of the responses.

[diagram: a request passes through request analysis, system selection, comparison, and response recording, producing a response]

Figure 12: Architecture Comparison Process
We distinguish a couple of different high-level requests that can initiate the architecture comparison process:

• What are common or variable features?
• What patterns are used and what is their impact?
• What is the solution to a requirement or a design problem?
• How is a given quality (e.g., performance, maintainability) addressed?
• What are alternatives to architectural means?
• What are the consequences of an applied strategy?
• What are the difficulties and drawbacks associated with applying a strategy?

In most cases, such high-level requests are not answerable at once; therefore, we start with a request analysis that aims at producing fine-grained requests that are then processed further.

5.1.2.1 Request Analysis

In the first step, the request coming in from forward engineering is analyzed. It may be the case that the request is too abstract or too complex, so that it is not possible to answer it immediately. Therefore, it has to be analyzed with the goal of breaking it down into operational pieces that can be processed by the following steps. Each piece of a request will then be handled, and all pieces together deliver a response to the overall request. Some responses to concrete requests may lead to a stop of further processing, while others can result in new concrete requests. The analysis of requests has to be performed jointly by the product family architect and the reverse architect.

Another activity during the request analysis is to prioritize the different fine-grained requests in order to be able to schedule the execution of the analyses in a proper way. Therefore, dependencies between fine-grained requests have to be revealed, and concrete decision points for termination and for assessing the successful achievement of the response have to be defined. The request analysis is performed in close cooperation with the system experts. The product family architect breaks down the requests together with the reverse architect.
The break-down steps can be supported by questionnaires if the request deals with a typical problem; otherwise, this step is conducted in discussion or interview sessions. All stakeholders involved in this step should agree on a common understanding of what the goal is and what other requests have to be posed in order to be able to give a response to the initiating request.

An example for breaking down an abstract request is the following: a product family architect may pose the abstract request "What are the experiences with the model-view-controller (MVC) pattern in the existing systems?" The request is not answerable at once; on the contrary, it has to be refined to be processed further. Resulting requests that are derived from this one may be, among others, the following:

• Is the MVC pattern applied at all?
• In which systems is the MVC pattern applied?
• Why did the developers choose not to use the MVC pattern?
• What are the consequences of using MVC with respect to certain criteria?
• Do the pattern instances in the different systems aim at solving the same problems? Are these instances comparable at all?
• Was the application of the MVC pattern successful with respect to certain criteria?

The examples mentioned above illustrate what other requests may be derived from the initial one. The first follow-up question can terminate the architecture comparison analysis, because if there are no instances of the MVC pattern present in the systems, there is no reason to continue analyzing them. Other questions, like the second one, may lead to even more requests, in this case: which classes implement the model or the views, or is it a real MVC pattern, or are model and controller combined somehow. The request analysis and the direction that the resulting refinements take are highly dependent on the context in which the request is set up. There is no general rule on how to break down such a request, but the goal has to be a fine-grained request that is answerable or that provides a decision point for determining whether to continue or to terminate.

5.1.2.2 System Selection

In this step, the reverse architect and the experts select the systems they will compare. The goal is to select a few systems that will bring the most insights to answer the request from the product family architect. Each time the architect provides one or more requests, a new system selection is performed. This section describes the main factors that influence the system selection.
These factors are the style and stage of reference architecture design, the availability of experts, the type of request from the architect, the presence of information about the systems and the effort available for comparing systems. The two main styles of reference architecture design based on existing systems are characterized by the presence of a dominant system or its absence. An existing system is considered as dominant in the target domain of the product family if it will be used basis for the product family. This means that most of the current architecture and design of the dominant system will be transferred into
Copyright © Fraunhofer IESE 2004
69
Request-driven Reverse Architecting
the product family architecture and that a great amount of the source code can be reused as well. The goal of the architecture design in the presence of a dominant system is to correct weaknesses identified in its architecture and to generalize its architecture to support the product family. In this context, the ideal system to compare against the dominant one offers a different solution that is better (i.e. it corrects a weakness, is more general, or is complementary) and could be easily integrated into the dominant architecture. When selecting system candidates who should be compared against the dominant one two issues have to be taken into account. On the one hand, systems that differ from the dominant one are appealing, because they are more likely to offer a different solution. On the other hand, they should not be too different, otherwise, it is likely that the integration into the dominant architecture is not easy. For this reason, it is important to strike a balance. In the absence of a dominant system, the goal of the architecture comparison is to collect a wide range of successful solution patterns together with the experience gathered by using them and use them as input to define the reference architecture. From this point of view the selected systems should offer this wide range of answers to the architect’s request. However, the solution patterns used in the available system are not always known before they are selected. For this reason, the reverse architect and the experts often choose the systems based on their quality and the similarity among the systems as a whole. The stage of the reference architecture design also affects the selection of systems: At an early stage of the design, the architect is usually more open to consider a couple of different solutions, which might require a higher integration effort and reworking. As the architect comes closer to completing the architecture design, he is less likely to accept solutions that require rework. 
For this reason, it is important to select systems that are similar or that provide a solution likely to be compatible. One of the most important factors for the system selection is the availability of system experts at the time where the comparison should be performed. This factor is particularly important, because the comparison strongly relies on the expert knowledge of the system (e.g., rationales) and his experience with the system (e.g., the consequence of a decision on the system maintainability). The type of request influences the systems selection in different ways: When the request involves automatically identifying pattern instances that can be applied to all systems before the selection, then the reverse architect and the experts use the information about the presence of these patterns in the systems to perform the selection. For a request that involves evaluating which systems address a given quality well, the system expert can directly identify systems whose main success factor are related to this quality.
70
Copyright © Fraunhofer IESE 2004
Request-driven Reverse Architecting
The presence of information about a system also plays an important role in the system selection. The availability and completeness of the views needed to perform the comparison is particularly important. The availability of data about the development and maintenance of a system (e.g., effort, defect information) provides a way to compare the quality of the various systems. For requests, which can be directly mapped to the source code with limited effort, this data provides indices for the quality of the pattern implementation and its use. Data about user satisfaction and market share can also be used as input for systems selection. The effort available for comparing systems and the effort required to analyze each individual system determine how many systems can be evaluated. When the system experts are available also to identify the success factors and the critical aspect of their system, then they should do so. This information can facilitate the system selection in general, but is particularly important when the experts are not taking part in system selection at later stage. The actual system selection is a human-based activity that depends strongly on the context. There are no fix rules how to combine the different factors. However, the reverse architects have a better chance to select the right system if they consider the factors described above. 5.1.2.3 Composition The third step in our architecture comparison approach includes the comparison itself and composes the information gained from several individual systems together. In order to be able to make a statement about two different software systems, they must have something in common and their architectural description, the basis for the comparison has to be on the same level of abstraction. In some cases this requires effort in refining the architecture description of one or more of the selected candidates. There might be cases where it is economically unfeasible to refine all descriptions. 
In these cases it can be appropriate to abstract all systems to one matching level. Some other requests may require additional information that is not yet part of the architectural documentation of the existing systems. For example, there might be the need for a new architectural view or a specific metric not yet calculated but with certain expressiveness with respect to the request posed. In this case other reverse engineering analysis have to be started in order to exploit the requested information from the given software systems. The possible refinement of the existing systems can be regarded as the preparation phase of the comparison step. The goal of the preparation is to be able to
Copyright © Fraunhofer IESE 2004
71
Request-driven Reverse Architecting
work with comparable views. Which views are needed, is determined by the request itself. We then propose one of the following main activities in order to evaluate alternatives presented in the selected systems. These activities aim at collecting the same type information for each of the individual, selected systems. It is possible to apply different or more than just one activity for evaluating the alternatives depending on how well such an activity can be conducted for a system. The activities are in detail: • Simulation: Each existing software system is examined within a simulation. The simulation offers conditions in which the reverse architect is able to measure or to observe those characteristics of the system, which are directly correlated to the request. For example, possible deadlocks can be detected with the help of simulation. • Instrumentation: Each system is instrumented and profiled, so that information can be collected during runtime. Only the source code directly related to a request should be instrumented to emphasize the coherence between the output gained during runtime and the request. The output has then to be analyzed with respect to certain characteristics or properties that are associated with the request. • Prototype: In this case we consider prototyping not as implementing a certain aspect of the system but as adjusting the existing implementation to the given request. This means that only the aspects related to the request are left in the system, and everything else is removed. This procedure works only if the aspects related to the requests can be more or less isolated from the rest of the system. The benefits are that prototypes can be easily modified in order to get further information or a deeper understanding. It is also possible to play around with a couple of variations of the prototype. The prototypes of the different systems are then executed one after another. 
During execution of each prototype, observations are made, data is collected and measurement can be taken. • Scenario evaluation: The existing systems are analyzed with the help of different scenarios. To execute such kind of analysis, scenarios related to the request are conducted with each of the existing software systems. The behavior and the characteristics, how a certain scenario is accomplished, are recorded. For each of the systems, feedback from the experts and other stakeholders helps to identify risks with respect to certain criteria, to find out about rationales of applied solutions, and to learn how a system addresses certain requirements. The scenario evaluation by involving experts and their experience is the preferred way of comparing the individual systems, since it is able to gather logical information reported by human beings.
72
Copyright © Fraunhofer IESE 2004
Request-driven Reverse Architecting
Depending on the type of the request, an evaluation technique may be better than another one, so that the decision, which one to chose, is highly contextsensitive. However, the evaluation of alternatives implemented in different, existing software system will produce information about only one individual system at a time. Therefore the next activity within the composition step is the creation of a joint abstraction, spanning the product family space and leaving the scope of individual systems. We propose the following means to do this: • Comparison table: A comparison table lists the different, individual systems selected as columns and the criteria related to the request as rows. The responses for each system (or the subjects of the analysis) are then filled into the proper cells. The reverse architect and the product family architect can now reason about advantages and disadvantages of the different alternatives. • Partial rankings: In some cases, it may be possible to rate the individual systems with respect to certain criteria, although it often is only a relative rating like system A is better than B. These partial rankings help to analyze what the best and the worst solutions are, and in ongoing iterations to answer what were reasons for the decision to do the implementation in the way it was done. • Product family views: Product family views cover al selected systems and focus on the whole product family. They contain information that considers more than one system, for example, for presenting commonalities among systems, it is important to know what are common parts in different systems (i.e., what was found in all of the selected systems). The product family aspects presented with the above-mentioned means contribute by bringing information about variabilities and commonalities, applied solution strategies, field-tested experiences, possible reuse candidates, consequences and background information to the response of a request. 
When there is a high amount of information within a response, it might be necessary to package and condense it to its core in order to not overstrain the stakeholders and to avoid hiding relevant information in overcrowded responses. 5.1.2.4 Response Recording The last step in the architecture comparison approach is responsible for recording the packaged information of the responses together with the requests that initiated the analysis. If available, integration and usage information of the gained response are stored as well. Another aspect to be included is what systems were considered in the analysis and the reasons for their selection.
Copyright © Fraunhofer IESE 2004
73
Request-driven Reverse Architecting
The recording supports further iterations in documenting decisions made at an earlier time. Furthermore, when there are similar requests to the current one, the responses might be similar as well. In order to benefit from this characteristic, it is important to record the context of an iteration after is has been conducted. 5.1.2.5 Architecture Comparison Example This section describes an example how architecture comparison can be used in order to gain information about existing systems. Let’s assume that there are five different individual systems all implementing a printing feature. The systems vary in the supported file format (e.g., pdf, ps, ascii, or graphical files), and in the operating systems where printing should be possible. When migrating towards product family engineering, there was the requirement that the printing functionality for PDFs should be a commonality, and therefore the goal is to reuse one of the existing implementations. The high-level request asks which implementation of the printing features is the best with respect to the required formats and reusability. The requests analysis produces fine-grained requests: • Do all of the systems implement the printing feature? • Which systems support what formats? • Where are the different formats implemented? In the system selection step, there were three systems selected. Two were left out, because one was considered as too old and too degenerated, and for the other one, no experts were available. Then the architectural descristimptions were refined with respect to the questions mentioned above. After some iteration, we are able to produce a comparison table as shown in Figure 13. It contains information about which classes implement the printing feature for each system. An empty cell means that the respective printing feature is not present for this system. The figure shows that in system 1, the different formats are connected to each other, because they share a couple of common classes. 
In system 2, the implementation is less connected than in system 1, the two different formats share only one class. Another fact is that the number of implemented classes in system 3 is smaller than in system 1. System 3 requires only one class to realize the PDF printing functionality, and the other printing formats are implemented in other classes, so that PDF is independent. PDF PS ASCII Text Graphics Figure 13:
74
System 1 A, B, C, D
System 2 M, N
A, E, F B, E, G
N, O, P
System 3 X Y Z
Comparison Table
Copyright © Fraunhofer IESE 2004
Request-driven Reverse Architecting
Returning this table as a response leads to further requests. The context of the printing features implementation for system 2 and 3 is analyzed in more details: • How flexible are the implementations? • What are their dependencies to other parts of the system? • What are the dependencies of the requests? Finally, the implementation of system 2 was chosen as the candidate to be integrated into the product family because when compared to the other candidate, it was considered as better one with respect to maintainability and flexibility of the implementation. 5.1.3
Summary Architecture comparison is an iterative, expert-driven approach in order to learn about alternatives solution strategies embodied in existing systems by setting up requests for information. During all the steps, feedback, learning effects, and insights gained with the help of other analyses are integrated into the responses to a request. This will influence already running and further iterations. The main goal is to mine existing software systems in order to create product family views that span over more than one system. The results of architecture comparisons are information that is useful for the design of reference architectures. The product family architect and the reverse architect are made aware of the context of applied strategies like dependencies, constraints, risks and consequences of the implementation. They learn as well about advantages and drawbacks of the individual systems in contrast to other, existing systems. Furthermore, rationales, applied means and decisions made are recorded. In short, architecture comparison contributes to the design of a product family by providing the possibility to learn from existing systems and by having the possibility to evaluate design decisions early within the individual systems, which will be the future product family instances.
5.2
Pattern Completion
5.2.1
Purpose A design pattern describes a problem and the design of a generic solution for that problem. Since design patterns are abstract entities, it is difficult to find instances of them directly in the source code. Often, the map between design patterns and source code entities are not documented. Locating the source code elements that implement the design patterns is helpful for understanding
Copyright © Fraunhofer IESE 2004
75
Request-driven Reverse Architecting
and also for reusing the pattern and its associated code in another system. Often the expert already knows about certain code elements that implement a part of the design pattern. Pattern completion is a process that uses the pattern definition and code identified by the expert in order to locate the rest of the code that contributes to the pattern. This subsection describes a process model for pattern completion; an example is also given to demonstrate the application of this process model to Model-View-Controller pattern. 5.2.2
Realization The process of pattern completion has the following four phases (see Figure 14): • Extraction • Partial map definition • Complete Pattern • Review Extraction This phase is required to fill the fact base with information about the software system. The fact base contains program entities (e.g., packages, classes, methods) and the relationships (e.g., inherit, call, read_attribute) among them. Partial map definition Using the pattern definition and the source code, the reverse architect defines a partial map from pattern elements to the source code. This map is partial in the sense that the reverse architect does not need to define a complete map but just gives a starting anchor point. This partial map is used as input to complete the rest of the pattern elements. Complete Pattern This phase completes the pattern by using the fact base and the pattern definition. Starting points are source code entities given in the partial map of the pattern That is, the source code entities that implement the rest of the pattern elements are identified and reported. This phase can be fully automated.
76
Copyright © Fraunhofer IESE 2004
Request-driven Reverse Architecting
Pattern Definition
Source Code
Extract Facts
Fact Base
Partial Map Definition
Mapping
Complete Pattern Review Candidates Feedback
Figure 14:
Pattern Completion Process
Review Phase This phase is necessary because all the candidates identified in the previous phase need not be necessarily implementing functionalities related to rest of the pattern elements. The human expert reviews the candidates that are identified in the previous step. The partial map can also be redefined once he gets better understanding of the patterns at the implementation level. Pattern completion of Model-View-Controller: An Example The Model-View-Controller architectural pattern (MVC) divides an interactive application into three components. The model contains the core functionality and data. Views display information to the user, while controller handles user inputs. Views and controllers together comprise the user interface. A changepropagation mechanism ensures the consistency between the user interface and the model. The expert had partial understanding of portion of system that contributes to the model as well as the view of MVC. He defines the partial-map between the model of the MVC pattern and a set of classes he considers/believes to be part of the model. A similar mapping was defined between the view of MVC pattern and another set of classes. He wants to locate the controller automatically for understanding the MVC at the implementation level. The candidates for the controller of MVC pattern are identified automatically using the fact base, the pattern definition and the mapping defined by the reverse architect in the previous phase. The pattern definition says that the controller invokes methods of the model, the view, or both. All classes matching these criteria are considered as controller. Because there may be false-positives, the human expert reviews the candidates. Finally, the pattern instance is complete, documented and the
Copyright © Fraunhofer IESE 2004
77
Request-driven Reverse Architecting
reverse architect knows the implementation location. If necessary, the initially defined map has to be reworked because the expectations of the expert were wrong. This approach integrates such feedback iteratively until pattern instances are found completely documented. 5.2.3
Summary Design Patterns are an accepted architectural mean to solve typical problems and to achieve certain qualities. Since design patterns are abstract entities, it is difficult to locate instances of them in the source code. The presented analysis technique, pattern completion, helps to identify pattern instances based on a starting point and the pattern definition. Pattern completion results in documented pattern instances (i.e., the source code implementing a pattern is reported) and enables the reuse of such instances in other contexts.
5.3
Feature Location
5.3.1
Purpose In software maintenance and evolution, change request often involve modifying or adding specific features. Before we can start implementing the required feature, we must locate the implementation of the concepts in the source code. Feature location is a process that maps domain concepts to the software components. The input is the maintenance request, expressed in natural language using the domain terminology [EKS2003]. The output of the mapping is a set of components that implement the feature or concept, see Figure 15.
Domain level Request
Concept Location
Components implementing concept Figure 15:
78
Concept Location Process
Copyright © Fraunhofer IESE 2004
Request-driven Reverse Architecting
Reusing the existing software assets helps to reduce the cost of migration to product-family engineering. Before reusing the software assets, one must locate feature in the existing source code. The approach described in this section [EKS2003] helps to locate source code elements that implements feature specific logic in a semi-automatic way. Terminology • A feature f is a realized functional requirement. Generally, the term feature also subsumes non-functional requirements. However, in this section, we consider only the functional features. • A scenario s is a sequence of user inputs triggering actions of a system that yields an observable result to an actor. A scenario is said to execute a feature if the observable result is executed by the scenario’s actions. • A component is a computational unit of a system. Components consist of an interface, which exposes the services offered by the component and the implementation of these services. • A subprogram is a procedure or function or method. Subprograms are lowest level of components. • The execution summary of a given program run lists all subprograms called during the run. The execution trace lists the sequence of all performed calls. • A feature-component map describes which components implement a given set of relevant features. 5.3.2
Realization This section describes a technique that combines static and dynamic analyses to identify the components implementing a set of features. Dynamic information is obtained from the execution summaries generated by a profiler for different scenarios and static information is obtained by parsing the source code. Beyond simply localizing all required subprograms, concept analysis [LiSn1997] is used to derive detailed relationships between features and executed subprograms. These relationships identify subprograms jointly required by any subset of features, classify subprograms as low-level or high-level with respect to the given set of features, reveal additional dependencies between subprograms, and help to identify the subprograms that together constitute a larger component during static analysis. The general process is as follows:
Copyright © Fraunhofer IESE 2004
79
Request-driven Reverse Architecting
• Identify set of relevant features F = {f1, …, fn} • Identify scenarios A = {S1, …, Sq} so that features in F are covered • Generate execution summaries (profiler). This step yields all required subprograms O = {s1, …, sp} for each scenario. • Create relation table R such that (S1, s1), (S2, s2), …, (Sq, sp)∈R • Perform concept analysis for (O, A, R) • Identify relationships between scenarios and subprograms • Perform static dependency analyses 5.3.2.1 Concept Analysis This subsection introduces the concept analysis with an example. Basic understanding of concept analysis is necessary for understanding the technique described here. Concept analysis is a mathematical technique that can be used to investigate binary relations. The binary relation in application of concept analysis to derive the scenario-subprogram relationships states which subprograms are required when a feature is invoked. Concept analysis is based on a Relation R between a set O of objects and set of attributes A, then R ⊆ O x A. The tuple C = (O, A, R) is called formal context. For a set of objects O ⊆ O, the set of common attributes σ is defined as: σ(O) = {a∈A | ∀(o∈ O)(o, a) ∈R} Analogously, the set of common objects τ for a set of attributes A ⊆ A is defined as: τ (A) = {o∈O | ∀(a∈A )(o, a) ∈R} The formal context for applying concept analysis to derive the scenariosubprogram relationships will be laid down as follows: • subprograms will be considered as objects • scenarios will be considered attributes • a pair (subprogram s, scenario S) is in relation R if s is executed when S is performed
80
Copyright © Fraunhofer IESE 2004
Request-driven Reverse Architecting
Figure 16 shows an example relation. An object oi has an attribute aj if the row i and column j is market with an X.
o1 o2 o3 o4 Figure 16:
a1 X
a2 X
a3
a4
a5
X X X
X X X
X X
a6
a7
a8
X X
X X
X X
Example Relation Table
A pair (O ,A) is called concept if A = σ(O) and O = τ (A). For a concept c = (O, A), O is the extend of c, denoted by extent(c), and A is the intent of c, denoted by intent(c). Figure 17 shows the concepts for the relation in Figure 16. C1 C2 C3 C4 C5 C6 C7 Figure 17:
({o1, o2, o3, o4}, φ) ({o2, o3,o4}, {a3, a4}) ({o1}, {a1, a2}) ({o2, o4}, {a3, a4, a5}) ({o3, o4}, {a3, a4, a6, a7, a8}) ({o4}, {a3, a4, a5, a6, a7, a8}) (φ, {a1, a2, a3, a4, a5, a6, a7, a8})
Concepts for the Example Relations
The set of all concepts of a given formal context forms a partial order via: (O1, A1) ≤ (O2, A2) ⇔ O1 ⊆ O2. The set L of all concepts of a given context and partial order ≤ form a complete lattice, called concept lattice: L(C) = {(O, A) ∈ 2
O
x2
A
| A = σ(O) ∧ O = τ (A)}.
The infimum of two concepts in this lattice is computed by intersecting their extents as follows: (O1, A1) ∧ (O2, A2) = (O1∩O2, σ(O1∩O2)) The infimum describes a set of common attribute of two sets of objects. Similarly, the supremum is determined by intersecting the intents: (O1, A1) ∨ (O2, A2) = (τ(A1∩A2), A1∩A2)
Copyright © Fraunhofer IESE 2004
81
Request-driven Reverse Architecting
The supremum ascertains the set of common objects, which share all attributes in the intersection of two sets of attributes. The concept lattice for the example relation in Figure 16 can be graphically depicted as a directed acyclic graph whose nodes represent concepts and whose edges denote the relation < as shown in Figure 18. The most general concept is called the top element and is denoted by T. The most special concept is called the bottom element and is denoted by ⊥.
T C1
C2 C3
concept
C4
C5
len] = t; l->len++; }
/*--- file stack.c ---*/ static List stack; void init () { stack = empty (); } void push (T t) { prepend (&stack, t); } T pop () { T result = first (stack); stack = rest (stack); return result; }
Example C Program Excerpt
5.10.2.1 Global Variable References Yeh et al. [YHR1995] identify ADOs by grouping global variables and all the routines that access them, regardless of where they are declared. We will refer to this strategy as Global Reference. In the case of a global variable used to log the occurrence of errors, this approach would group all the routines that use this log variable into one ADO. Yeh et al. propose to exclude frequently used variables from the analysis to avoid this unwanted effect. Applied to the example of Figure 34, Global Reference would find the ADO {stack, init, push, pop}.
Copyright © Fraunhofer IESE 2004
115
Request-driven Reverse Architecting
5.10.2.2 Same Module Heuristic One simple heuristic that follows programming conventions and is easy to apply is to group together routines only with those data types in their signature and referenced global variables that are declared in the same module. In the case of Ada, a package body and its specification would form a module. In C, modules do not exist, but programmers simulate the lacking concept by a header file f.h for the specification and a C file f.c for the body. Same Module [GiKo1997] assumes that programmers are disciplined and follow this convention. Despite its rather simple nature, Same Module yielded very good results in our experiment. This heuristic can be applied to detect ADTs as well as ADOs. For the example of Figure 34, Same Module would propose one ADT {T, List, empty, prepend, first, last} and one ADO {stack, init, push, pop}. 5.10.2.3 Part Type Heuristic Often, we find abstract data types that represent some sort of container of other abstract data types. For example, queues are containers of processes, or an account contains data about its owner and the deposited money. For such abstract data types there is usually an operation that takes an element and puts it into the container. For a process queue, for example, there will be an insert routine with two arguments: the process to be inserted and the queue itself. Even though both types are mentioned in its signature, we would not consider insert to be an operation for processes but for queues. The Part Type heuristic reflects this perception. It is based on the part type relationship which is defined as: • a type PT that is used in the declaration of another type T is called a part type of T, denoted PT < T • the part type relationship is transitive; i.e., if PT < T1 and T1 < T, then PT < T holds Part Type groups a routine with those types in its signature that are not a part type of another type in the same signature. 
This can be illustrated with the example of Figure 34 in which we find the following declarations: typedef ... T;
struct List {int len; T cont[100];}; void prepend (struct List *l, T t) { … }
Here, T is a part type of List. That is why prepend would be an operation of List according to Part Type and not of T though both are mentioned in the signature of prepend. Liu and Wilde proposed this heuristic in [LiWi1990]. Its basic
116
Copyright © Fraunhofer IESE 2004
Request-driven Reverse Architecting
assumption is that a part type is actually used to be put into its container or to be retrieved from it. It does not check this assumption. Part Type can only be applied for detecting abstract data types. Because T is a part type of List in the example in Figure 34, Part Type would detect one ADT consisting of {List, empty, prepend, first, last}. 5.10.2.4 Internal Access Heuristic The purpose of an abstract data type is to hide implementation details of the internal data structure by providing access to it exclusively through a welldefined set of operations. The idealized encapsulation principle entails that all routines that access internal components of the abstract data type are considered to be the data type’s operations, which is exactly the attitude of the Internal Access heuristic. For a record or a union any access to a field is an internal access. The Internal Access heuristic associates types with those routines that have an internal access to them. Yeh et al. presented this heuristic in [YHR1995]. In the example of Figure 34, Internal Access propose the ADT {List, empty, prepend, first, and last}, because of their internal access to List. 5.10.2.5 Interconnectivity Metric The approach proposed by Canfora et al. [CCM1993], [CCM1996] uses a heuristic based on usage patterns to generate ADO candidates. These candidates are then ranked according to an index which measures the variation in “internal connectivity” of the system due to the introduction of these ADOs. The method selects candidates with a value of internal connectivity above a threshold obtained by statistical sampling. The heuristic and the evaluation metric are defined on the variables reference graph that describes the usage of global variables by functions. 
They can be explained more easily in terms of the following definitions, given a function f and a global variable v:
• the context of f is the set of all variables it sets or uses
• functions related to f are all functions that set or use variables in the context of f
• closely related functions of f are all functions that set or use only variables in the context of f
• functions referencing v are all functions that set or use v
Given the variables reference graph of Figure 35, with F as the function under consideration, the context of F is {v1, v2}, the functions related to F are {R1, R2, R3}, and the closely related functions are {R1, R2}.
Figure 35: Example of a Variables Reference Graph (function F and functions R1, R2, R3 referencing the variables v1, v2, v3)
The candidate that is proposed as an ADO consists of all closely related routines of the given function F plus the context of F, i.e., all variables set or used by F. In the example of Figure 35, this is {v1, v2, F, R1, R2} for the given function F. Note that the proposed clusters depend upon the given function: suppose F also references variable v3, then the cluster for F would be {v1, v2, v3, F, R1, R2, R3}; from the perspective of R3, we would get the cluster {v2, v3, R2, R3}. Thus, clusters can overlap. The internal connectivity measure (IC) and the improvement in internal connectivity (∆IC) are defined as:
IC(F) = |closely related functions of F| / |functions related to F|

∆IC(F) = IC(F) − Σ_{v in context of F} |{f | context of f = {v}}| / |functions referencing v|
The underlying intuition is to have only few references to the cluster's variables from outside the cluster (this motivates the internal connectivity measure IC) and only few routines in the cluster that reference only one variable of the cluster (the second term in the formula for ∆IC). The latter is aimed at clusters whose parts are more tightly coupled. The algorithm used is outlined in Figure 36.
repeat
    build variables reference graph
    create cluster candidates using a heuristic
    for each candidate
        compute improvement in cohesion metric (∆IC)
        if improvement >= threshold then
            select candidate
        else
            slice remaining functions using different clusters' variables
until graph contains only isolated subgraphs consisting of a variable
      grouping with one or more functions

Figure 36: Delta IC Algorithm
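Given a variables reference graph, the definitions and the ∆IC formula translate directly into code. The following is a minimal Python sketch, not the original tooling; it reproduces the sets stated for Figure 35, and the concrete edges assumed for R1, R2, and R3 are chosen to be consistent with those sets:

```python
def related(f, refs):
    # functions (other than f) that set or use variables in the context of f
    ctx = refs[f]
    return {g for g, vs in refs.items() if g != f and vs & ctx}

def closely_related(f, refs):
    # functions (other than f) that set or use ONLY variables in the context of f
    ctx = refs[f]
    return {g for g, vs in refs.items() if g != f and vs and vs <= ctx}

def ic(f, refs):
    rel = related(f, refs)
    return len(closely_related(f, refs)) / len(rel) if rel else 0.0

def delta_ic(f, refs):
    # subtract, per variable v in the context, the share of functions whose
    # context is exactly {v} among all functions referencing v
    penalty = sum(
        len([g for g, vs in refs.items() if vs == {v}])
        / len([g for g, vs in refs.items() if v in vs])
        for v in refs[f])
    return ic(f, refs) - penalty

# refs maps each function to the set of variables it sets or uses
refs = {"F": {"v1", "v2"}, "R1": {"v1"}, "R2": {"v2"}, "R3": {"v2", "v3"}}
print(sorted(related("F", refs)))          # ['R1', 'R2', 'R3']
print(sorted(closely_related("F", refs)))  # ['R1', 'R2']
print(round(ic("F", refs), 3))             # 0.667
```

With these assumed edges, ∆IC(F) = 2/3 − 1/2 − 1/3 = −1/6, illustrating how the single-variable penalty can outweigh the connectivity of a small cluster.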
The original approach developed by Canfora et al. [CCM1996] was applied in a reengineering context and applied slicing [Weis1984] to candidates that were not above the improvement threshold. This additional slicing step is not appropriate in the context of architecture reconstruction, because it modifies the code; thus, it has been left out. The reverse engineer discovers the appropriate threshold by calibrating it to recover ADOs that were identified by hand. In the simple example of Figure 34, the approach recognizes the ADO {stack, push, pop, init}.

5.10.2.6 Similarity Clustering Approach

Schwanke proposed a Similarity Clustering approach to detect subsystems using a similarity metric between routines [Schw1991]. We adapted the approach to atomic component identification by generalizing the similarity metric, adding informal information and relationship-based weights, and adapting many of its parameters [GiKo1997]. The extended Similarity Clustering approach groups entities (functions, user-defined types, and global variables) according to the proportion of features (entities they access, their name, the file where they are defined, etc.) they have in common. It is best explained in terms of the resource flow graph. The resource flow graph (RFG) is a graph abstraction of a system that captures typical relationships between routines, variables, and user-defined types, the building blocks of atomic components. The relationships considered in this approach are illustrated in the entity-relationship diagram of Figure 37.
Figure 37: An Entity-relationship Model for the Resource Flow Graph (routines, global variables, and user-defined types, related by calls, uses, sets, actual-parameter, of-type, return-type, parameter-of-type, local-var-of-type, and is-part-of-type relationships of cardinality 1:n or m:n)
This entity-relationship model defines the structure of a resource flow graph, which is an abstraction of the source code. Nodes of this graph stand for the entities, and edges express the relationships among the entities.

The intuition of the Similarity Clustering approach is that if these features reflect the correct direct and indirect relationships between the entities, then entities that have the most similar relationships should belong to the same ADT or ADO. Functions, variables, and types are grouped according to the following algorithm:

    Place each entity in a group by itself
    repeat
        Identify the most similar groups
        Combine them
    until the existing groups are satisfactory

In each iteration of this algorithm, a similarity metric measures the proportion of features that are shared. The algorithm terminates when the "existing groups are satisfactory". Groups are considered satisfactory when the most similar groups have a similarity that is below a certain threshold. This threshold is established experimentally on systems where some atomic components are known. The similarity metric is constructed of three layers:
• The similarity between two groups of entities, which is defined in terms of the similarity between entities across groups.
• The similarity between two entities, which is a weighted sum of various aspects of similarity.
• Each specific aspect of similarity between two entities.

The similarity between two groups of entities S1 and S2 is defined as the average of the similarities of all pairs of entities in the two sets:
GSim(S1, S2) = ( Σ_{si ∈ S1, sj ∈ S2} Sim(si, sj) ) / ( |S1| × |S2| )
The similarity between two entities A and B is the weighted sum of various aspects of similarity (the factors xi are used to adjust the influence of the various aspects):
Sim(A, B) = ( x1·Sim_indirect(A, B) + x2·Sim_direct(A, B) + x3·Sim_informal(A, B) ) / ( x1 + x2 + x3 )
Each aspect is normalized to obtain values between 0 and 1, so the resulting similarity is also normalized. The following aspects are included in the similarity metric for the experiment reported here:
• The direct relation aspect, denoted by Sim_direct(A, B), captures the relations between the two entities. It is computed on the resource flow graph as the weighted sum of edges between A and B divided by the weighted sum of all possible edge types between A and B.
• The indirect relation aspect, denoted by Sim_indirect(A, B), captures the relations with common neighbors in the resource flow graph.
• The informal information aspect, denoted by Sim_informal(A, B), captures some of the information that is ignored by the semantics of programming languages but is used by programmers to communicate the intent of a program, like comments, identifier names, file organization, etc. Here, it is computed as the proportion of common substrings in the identifiers and in the names of the files containing the entities A and B.

The exact formulae, the rationale behind them, and the experience calibrating and evaluating Similarity Clustering are reported in [GKS1999]. With the right calibration, it finds the ADT {List, first, rest, prepend} and the ADO {stack, init, push, pop}.
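The three-layer metric and the grouping loop can be sketched as follows. This is a minimal Python rendering under simplifying assumptions: a single toy informal aspect (proportion of common name prefix) stands in for the calibrated aspects of [GKS1999], and the entity names are invented:

```python
def sim(a, b, aspects, weights):
    """Weighted sum of similarity aspects; each aspect yields a value in [0, 1]."""
    return sum(w * f(a, b) for w, f in zip(weights, aspects)) / sum(weights)

def gsim(s1, s2, aspects, weights):
    """Average pairwise similarity between two groups of entities."""
    return (sum(sim(a, b, aspects, weights) for a in s1 for b in s2)
            / (len(s1) * len(s2)))

def cluster(entities, aspects, weights, threshold):
    """Merge the most similar groups until similarity falls below the threshold."""
    groups = [{e} for e in entities]
    while len(groups) > 1:
        best, i, j = max((gsim(g1, g2, aspects, weights), i, j)
                         for i, g1 in enumerate(groups)
                         for j, g2 in enumerate(groups) if i < j)
        if best < threshold:
            break
        groups[i] |= groups.pop(j)
    return groups

# Toy informal aspect: proportion of common name prefix (an assumption,
# not the calibrated substring metric reported in [GKS1999]).
def informal(a, b):
    n = min(len(a), len(b))
    common = next((i for i in range(n) if a[i] != b[i]), n)
    return common / max(len(a), len(b))

entities = ["list_first", "list_rest", "stack_push", "stack_pop"]
groups = cluster(entities, [informal], [1.0], threshold=0.4)
print(sorted(sorted(g) for g in groups))
# [['list_first', 'list_rest'], ['stack_pop', 'stack_push']]
```

Even this crude single-aspect calibration separates the list entities from the stack entities; the real metric additionally weighs direct and indirect RFG relations.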
5.10.3 Summary

Identifying abstract data types (ADTs) and abstract data objects (ADOs) in procedural languages like C contributes to constructing an up-to-date data view of a system and helps to understand some key data structures and their operations. This section described six techniques that automatically identify ADT and ADO candidates. Since these techniques work well for different instances of ADTs and ADOs, and their success depends on the system, the best solution is for a software engineer to select techniques for a specific system and then review and combine their results.

5.11 Interface Analysis
5.11.1 Purpose

Reuse is a promising answer to the challenges software-developing organizations face: the need to reduce cost, effort, and time-to-market, the increasing complexity and size of software systems, and the growing demand for high-quality software and individually customized products for each customer. Documented interfaces are one of the prerequisites of effective reuse of components. Reuse works when the developers know which functionality is provided by a component and how to access the functionality implemented in such a component. Components in the context of interface analysis are collections of source code entities (e.g., files, groups of logically related routines, single classes or groups of classes or packages, or even whole subsystems). Applications of the interface analysis technique are motivated as follows:
• Reduction of the complexity of given components with respect to the number of offered routines, by minimizing the provided interfaces to only the actually used interfaces in order to facilitate reuse
• Documentation of source code spots in usage lists, where accesses to a component have to be changed when migrating the software system towards component-based development
• Extension of architectural descriptions (e.g., the module and/or the code view) by explicit notation of the functionality a component provides
• Migration of a group of entities towards an encapsulated component with explicit boundaries

Interface analysis reveals the connections of the subject component to the rest of the software system, or, if it should become a real component in the future, it
documents the spots to be changed and how the future component is embedded in the system.

5.11.2 Realization

Like the other analyses, the interface analysis operates on the fact base. A prerequisite for this analysis is that the fact base contains interface-related information (e.g., calls or invocations of class methods). Figure 38 shows the inputs to the interface analysis: a component, or a collection of entities that should be migrated towards a component, together with the fact base. The outputs are an interface description, a usage list, and a reduced interface.

Figure 38: Interface Analysis
The inputs are then analyzed with respect to the dependencies of the component (or the collection of entities) on the rest of the software system, where dependencies are, for instance, data dependencies, import relations, inheritance, or caller-callee dependencies. The resulting interface description is twofold:
• Required dependencies: the required dependencies are entities that the component needs in order to work properly. Usually, a component communicates with other components to accomplish its tasks.
• Provided dependencies: the provided dependencies (or usage list) are the parts of the given component where the system accesses the component, that is, where it needs some functionality implemented in the component in order to work properly.

There are two options: on the one hand, we can document only the currently used entities of the component (i.e., when changing the signature of a routine, it is beneficial to locate each place where a call has to be adjusted); on the other hand, we can document the complete interface offered to the outside (i.e., when reusing the component in another context, it is beneficial to know the complete interface). Documenting both is required when planning to reduce the size of the interface.

An interface description usually contains a list of routines and global variables that can be accessed from the outside. The interface description we produce
includes the signature and the return parameter of the specific routines, as well as the locations of the different usages and the file in which each routine is implemented. The description basically resolves the kind of dependency that is present for each instance (e.g., calls, inheritance, data dependencies, imports, etc.), but it is possible to focus only on specific kinds of relations. Figure 39 shows an example of such an interface description. Tools automate the task of analyzing the interfaces by querying the fact base, and the analysis can be parameterized by the list of components (or collections of entities) for which the interfaces should be described. Components or collections of entities will usually overlap in their interface descriptions or usage lists. For this reason, we can create a single description for all the components that contains no redundancies. These descriptions can be sorted by different criteria (e.g., places of usage, names of routines, classes, etc.).

Required Routines of class1.method1:
    Class2.method2
        Located in: c:/code/file2.java
        Type:       method
        Signature:  void method2 ( value: int )
        Returns:    void

Figure 39: Interface Description Example
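Over a fact base of caller/callee relations, the split into required and provided dependencies reduces to two set filters. The sketch below is a minimal illustration; the pair representation of the fact base and the entity names are assumptions:

```python
def interface_of(component, calls):
    """Derive required and provided dependencies of a component.

    component: set of entity names belonging to the component
    calls: (source, target) dependency facts from the fact base; other
    relation kinds (imports, inheritance, ...) work the same way.
    """
    required = {t for s, t in calls if s in component and t not in component}
    provided = {t for s, t in calls if s not in component and t in component}
    return required, provided

# Hypothetical fact base entries:
calls = [("comp.parse", "util.log"),   # the component needs util.log
         ("main.run", "comp.parse"),   # the system uses the component
         ("comp.parse", "comp.emit"),  # internal, appears in no interface
         ("main.run", "util.log")]
required, provided = interface_of({"comp.parse", "comp.emit"}, calls)
print(required)  # {'util.log'}
print(provided)  # {'comp.parse'}
```

The provided set corresponds to the usage list of currently used entities; the complete offered interface would additionally list all externally visible entities of the component, whether called or not.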
5.11.3 Summary

The interface analysis documents the provided interfaces and the actually used interfaces of source code entities. This information can be used for the following purposes:
• Migrating a group of entities into an encapsulated component
• Reducing the provided interfaces to only the actually needed ones, and thereby reducing complexity in terms of interface size
• Documenting the description and usage of entities in an automated way

This technique helps to counteract the degeneration of interfaces.
6 Open Source Case Study
This section presents a case study in which we show the applicability of our approach, PuLSE-DSSA. The case study is centered on Eclipse, an open source tool platform.

6.1 The Eclipse Platform

The Eclipse project was founded in the late 1990s. At that time, an increasing number of Internet technologies brought a diversity of monolithic tools, which in most cases did not integrate sufficiently with each other. To address this problem, IBM and OTI developed Eclipse, a generic platform that allows the construction of tools for application development. The platform provides support for the development and seamless integration of tools that manipulate arbitrary types of content. As Eclipse is mainly written in Java, it can be used on different operating systems, including Windows®, Mac OS, and Linux™.
Adding functionality (or tools) to Eclipse is, metaphorically speaking, similar to plugging a connector into an outlet, so the components contributing functionality to the platform are called plug-ins. Generally speaking, everything in Eclipse is a plug-in, except the platform runtime, which provides the functionality to start up the system and to manage the plug-ins. The basic mechanisms for integrating plug-ins with the platform are described in more detail in Section 6.2.
Figure 40: Eclipse architectural overview
As shown in Figure 40, the Eclipse platform basically comprises five components. The workbench constitutes the view of Eclipse and provides several frameworks that allow the fast development of graphical user interfaces that integrate seamlessly with the look and feel of the workbench user interface. The workspace constitutes the model of the Eclipse platform; it provides a wrapper for the physical file system and manages the resources. The platform runtime provides the infrastructure to manage and to run the plug-ins and is itself not a plug-in. Finally, the help component implements the help system of the Eclipse platform, whereas the team part of the platform supports the integration of repository tools such as CVS.

6.2 The Plug-in Mechanism

Plug-ins are the smallest units of functionality within Eclipse. Adding further functionality to the platform means contributing it through the implementation of one or more plug-ins. The contribution mechanism is based on hooks that are provided by existing plug-ins; in Eclipse, these hooks are called extension points. At these extension points, other plug-ins can add their functionality. Within the Eclipse platform, the workbench part provides most of these hooks, but it is also possible to contribute to the workspace, the help, or the team component. The plug-in mechanism is not restricted to the Eclipse platform: plug-ins that are not part of the platform can also provide extension points. One well-known plug-in is the Java Development Tools (JDT). The JDT is very often thought to be Eclipse; actually, it is not even part of the Eclipse
platform. As shown in Figure 41, the JDT is an example of a plug-in that uses extension points of the platform but also provides its own extension points.
Figure 41: Eclipse plug-in architecture
To add a plug-in, there are some rules to follow and mechanisms to use, so that new functionality can be integrated in a clean and seamless way. The rules are arranged in a style guide and emphasize that the look and feel of the plug-ins should not differ from that of the platform's workbench. The mechanisms that have to be used are exposed through well-defined API interfaces, classes, and methods. To facilitate the development of new functionality, the platform provides several frameworks for implementing the model and view functionality. To implement a plug-in, it is first necessary to define a descriptor file, which declares the extension point to which the plug-in will contribute. In this context, it is noteworthy that the plug-in mechanism is based on Java reflection: the plug-in providing an extension point expects an interface to be implemented or a certain class type. This class has to be specified in the plug-in's descriptor file. This way, the hosting plug-in can evaluate the descriptor file at runtime, get the name and class path of the class that is to be hosted, and finally instantiate it via reflection using that information. Depending on the kind of contribution, it is necessary to implement some infrastructure code so that, for example, a tree viewer gets integrated into the workbench, or menu entries with underlying actions can be executed within the action framework and can perform the necessary tasks. Optionally, it is possible to define a singleton plug-in class, which constitutes the proxy for the plug-in in the system. This class can serve as a listener for life-cycle events of the platform or can be used as an interface through which others can communicate with the plug-in at runtime.
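The descriptor-driven, reflective instantiation can be sketched independently of Eclipse. In the following Python analogy, a plain dictionary stands in for the XML descriptor file, `importlib` stands in for Java reflection, and the descriptor keys are invented for illustration (they are not the actual Eclipse descriptor schema):

```python
import importlib

def load_extension(descriptor):
    """Instantiate the class a plug-in descriptor points to, via reflection."""
    module = importlib.import_module(descriptor["module"])
    cls = getattr(module, descriptor["class"])
    return cls()  # the hosting plug-in instantiates the contributed class

# A descriptor contributing a standard-library class, for demonstration:
descriptor = {"module": "collections", "class": "OrderedDict"}
extension = load_extension(descriptor)
print(type(extension).__name__)  # OrderedDict
```

In Eclipse itself, the hosting plug-in additionally checks that the loaded class implements the interface expected at the extension point before using it.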
6.3 The Individual Systems

In the case study, we analyzed three existing Integrated Development Environments (IDEs) in order to benefit from these existing systems in our current development of two new IDEs. All these systems have in common that they are not standalone IDEs but rely on the infrastructure of the Eclipse platform, i.e., they are, or will be, available only as Eclipse plug-ins.
6.3.1 Java Development Tools (JDT)

The JDT [JDT2004] is the most prominent representative among the Eclipse IDE plug-ins. It contributes a full-featured Java development environment to the Eclipse platform. The JDT consists of two major parts. The first one is the Java Core, which can be seen as the analogue of the Eclipse platform's workspace; Java Core realizes a Java model that is built upon the workspace model of the platform. The second part, the JDT UI, is the view part of the JDT. It implements the workbench contributions that are Java-specific. The main features of the IDE are listed below:
• Incremental Java builder
• Java Model providing an API for navigating the Java element tree
• Code assist and code select support
• Search infrastructure used for searching, code assist, type hierarchy computation, and refactoring
• Evaluation support
• Several Java-specific views, like the package view, and creation wizards
• Java Editor providing features like syntax coloring, context-specific code assist and code select, margin annotations, API help integration, and code formatting
• Debugger
• Ant integration

6.3.2 C++ Development Tools (CDT)

The CDT (C/C++ Development Tools) project [CDT2004] is a quite new IDE project. The goal of the CDT is to provide a fully functional C and C++ integrated development environment for the Eclipse platform. Analogous to the JDT, the CDT separates the system into a model part and a view part. The features implemented by the current version of the CDT are listed below:
• C/C++ Debugger (APIs and default implementation, using GDB)
• C/C++ Model that provides an API for navigating the element tree
• Different views (e.g., a tree view providing project and source file navigation, or an outline view for source files)
• Different wizards for project and class creation
• C/C++ Editor with syntax highlighting and code completion
• C/C++ Launcher (APIs and default implementation, responsible for launching external applications)
• Parser
• Search Engine
• Content Assist Provider
• Makefile generator

6.3.3 Cobol Development Tools (CobolDT)

The Cobol IDE Eclipse subproject [Cobo2004] will provide a fully functional Cobol IDE. At the time of writing this report, a beta version of the Cobol IDE plug-in is available. Currently, only parts of the features listed below are implemented:
• Editor supporting all COBOL source code formats (i.e., fixed format, variable format, and free format)
• Views for structural outlines, templates, and a template assistance wizard
• A builder that allows target execution by specifying environment variables, setting compiler and linker options, and analyzing dependencies. The builder implements a bridge interface for invoking a native COBOL compiler
• A COBOL Debugger that allows setting breakpoints and displaying variable values and call stacks, and that follows the Eclipse debug perspective by providing a bridge interface for native COBOL debuggers

6.3.4 KobrA Component Development Tools (KobrA-DT)

The KobrA-DT is currently in the requirements elicitation phase. It is planned as a tool that will assist software architects and programmers in developing component-based software using the KobrA method [ABBK+2001]. The KobrA method meets the requirements of both software product family engineering and single-system development. The development of the KobrA method was influenced by the concepts of the Model Driven Architecture [MDA2004]. Basically, the method prescribes a recursive component development process with an emphasis on the design of UML models for the behavior, structure, and functionality of each component. The planned features of the KobrA-DT are listed below:
• Instances of the KobrA component model (i.e., a project under development) will be visualized through several views (e.g., tree views) at different levels of granularity. It will also be possible to apply user-defined filters to the views. Other tools will be able to access the model from outside at runtime via an API.
• Functionality comparable to that of an editor will be realized through the integration of UML CASE tools: the different UML models that make up a KobrA component can be created and edited through the integrated tool. In-memory model information provided by these CASE tools will be accessed and manipulated from the KobrA tool. Changes affecting the in-memory model of the KobrA tool (e.g., through the creation of KobrA components) or changes in the in-memory model of the CASE tool (for example, by changing a UML diagram) will be synchronized in either direction.
• Different wizards will provide New, Import, and Export functionality for UML models, KobrA components (and their sub-elements), and KobrA projects.
• Regarding the product family engineering capabilities of the KobrA method, the tool will support the instantiation of generic UML models using the Decision/Decision Resolution Models that are part of a generic KobrA component.
• Code generation for different target technologies based on the KobrA models will be possible from the instantiated models, through the supply of an interface for different generators.
• Consistency checks and updates between dependent UML models will be possible. Errors will be reported or fixed automatically during the component design process. A search engine for KobrA model elements will help in project navigation and consistency checking tasks.

6.3.5 Frame Processor Development Tools (FP-DT)

A frame processor is a tool supporting frame technology [Bass1997], a technique to support reuse in practice. In frame technology, the implementation units, called frames, have the same appearance as those in any major programming language. They form a group of symbols (e.g., source code or frame code) that can be consistently referenced. Frames contain both source code and frame-specific code providing adaptation, which enables reuse. Frame-specific code consists of frame commands and frame variables that make variation points explicit by distinguishing between common and variable text. Frames can be arranged in hierarchies and are resolved at compile time by the frame processor, an advanced preprocessor. The frame processor processes the frame hierarchy and finally generates pure source code. In product family engineering, this technique is used to produce different product instances from
a family by explicit variation points in one common code base (see [MuPa2002] for a detailed description of frame technology). The planned features of the FP-DT Eclipse plug-in, currently under development, are listed below:
• Navigation and management of frame hierarchies in frame processor projects, their physical mapping to directories and frame files, as well as visualization in a frame view and a frame files view
• Refactoring of frame hierarchies
• Management of dependencies between normal and leaf frames (adapters and adaptees)
• Insertion, replacement, and removal of frame commands and frame variables in source code in order to introduce variants
• Creation, modification, and deletion of variation points in the source code
• Syntax highlighting of frame commands, frame variables, and variation points
• Wizard-supported creation, modification, and deletion of frames
• Import of existing frame hierarchies into the Eclipse plug-in
• Transfer of common or variable code fragments into frames and vice versa
• Build of product family instances by resolving the frame hierarchy with the help of an underlying decision model

6.3.6 Motivation for the Reference Architecture

The goal of the case study was to design a reference architecture for IDE plug-ins of the Eclipse development platform, because we want to develop two new IDEs (the KobrA component browser and the frame processor). The open source community has already successfully developed three other IDEs (JDT, CDT, and CobolDT). Rather than starting our development from scratch, we analyzed the existing plug-ins to define a reference architecture that fulfills the needs of our planned new tools. The reference architecture includes the commonalities among the plug-ins; for example, all five depend on the Eclipse platform as the basis for their functionality. However, since each plug-in aims at being an IDE for a different purpose, the reference architecture will of course contain variable elements and a decision model to create the different, product-specific instances. By defining a common reference architecture, we claim the following advantages and benefits:
• Benefit from the experiences gained with the products developed by the open source community
• Reuse field-tested and proven concepts and solutions revealed in the existing products
• Know about alternatives and their consequences when there are different implementations for the same or similar problems concerning a plug-in
• Be aware of common difficulties in developing an IDE and avoid bottlenecks and problems present in the existing systems
• Learn from the existing systems and profit from the work already done by the open source community
• Reduce the effort required for the development of the two new products by sharing a single reference architecture and reusing existing artifacts

All the above-mentioned reasons will improve the overall quality of the new products. In the case study, we applied our approach PuLSE-DSSA, and we will show how information can be mined from successful, existing systems in a systematic, explicit way, and moreover how this information is then integrated into the design process of the reference architecture.

6.4 Case Study Experiences

6.4.1 Infrastructure Set-up

In the Eclipse plug-ins case study, the reverse engineering infrastructure was set up first. Based on the classification mentioned in Section 2.4, the appropriate methods and mechanisms were selected in order to provide all tools necessary for the analysis. The tool set included extraction tools and mechanisms for dynamic and static analyses. The infrastructure comprises the complete source code for each of the plug-ins (i.e., the JDT, CDT, and CobolDT) and the source code for the Eclipse IDE. An independent working copy of the Eclipse core, with a single plug-in added to the core, was also created; this way, dynamic analyses of the instances were possible. (Each combination was backed up at the beginning to be able to restore the initial states.) Thus, we established the working environment for the reverse architects: each plug-in instance was executable and analyzable, and we prepared the fact base of the reverse engineering infrastructure to be filled in the following fact extraction step.
6.4.2 Fact Extraction

Fact extraction is the process of extracting certain facts from the source code. Facts are entities (e.g., packages, classes, interfaces, methods, attributes) and the relationships among them (e.g., call, inherit). All facts together are stored in an intermediate representation in the fact base. The fact base forms the foundation for the later analyses. We applied our fact extractor to the following five plug-ins of the Eclipse IDE:
• JDT
• CDT
• COBOL DT
• Eclipse Workspace
• Eclipse Workbench
Relation            | Start Entity Set           | End Entity Set                        | Description
Call                | method                     | method                                | Method calls another method
Inherit             | {class, interface}         | {class, interface}                    | Class/interface inherits from class/interface
Contain             | {package, class}           | {class, method, attribute}            | A package contains classes; a class contains methods and attributes
Read_attribute      | method                     | attribute                             | Method reads an attribute
Write_attribute     | method                     | attribute                             | Method writes an attribute
Attribute_data_type | attribute                  | data_type                             | An attribute has a data type, either a user-defined type or a type supported by the language
Scope               | {class, method, attribute} | {public, private, protected, default} | Every class, method, and attribute has a scope
Implement_interface | class                      | interface                             | Class implements interfaces

Table 11: Extracted Entities and Relations
We extracted facts from the source code of the Eclipse core packages, since the plug-ins are anchored into the Eclipse IDE core. The extracted entities and relations are shown in Table 11. After the infrastructure was set up and the fact bases were prepared for all systems, we were ready to start different analyses in order to reach our goal of defining a plug-in reference architecture.
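A fact base of this shape can be held as simple relation triples and filtered per analysis request. The sketch below is illustrative only; the triple representation and the entity names are assumptions, not our extractor's actual format:

```python
def query(facts, relation=None, start=None, end=None):
    """Filter (relation, start_entity, end_entity) triples from the fact base."""
    return [(r, s, e) for r, s, e in facts
            if (relation is None or r == relation)
            and (start is None or s == start)
            and (end is None or e == end)]

# Hypothetical facts following the shape of Table 11:
facts = [("contain", "org.example.core", "Model"),
         ("inherit", "Model", "Openable"),
         ("call", "Model.getProject", "Model.getHandle"),
         ("scope", "Model", "public")]
print(query(facts, relation="inherit"))
# [('inherit', 'Model', 'Openable')]
print(query(facts, start="Model"))
# [('inherit', 'Model', 'Openable'), ('scope', 'Model', 'public')]
```

Each analysis in the following subsections can then be phrased as a combination of such queries over the extracted relations.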
6.4.3 Feature Trace

The Eclipse IDE and the JDT plug-in provide a large set of features, which are used within several scenarios. To identify components and finer-grained entities with the potential to be reused within the domain of developing tools and the planned reference architecture, we requested the analysis of parts of the different IDEs' execution architectures. As a systematic analysis approach, we used a scenario-based feature trace technique. One scenario we requested an analysis for was the selection and opening of a file through a double click in the project navigation tree. This scenario depends on an "open file" feature, which handles loading the respective resource and displaying it in an editor. The second scenario we found worth considering is the change of a Java class file within the IDE's editor and the subsequent saving of the changes. The first feature that is part of this scenario is the "background update" feature: changes in the structure of a class file are propagated to all views that provide a perspective on this resource, without an explicit user request. To complete the scenario, users have to explicitly commit the changes they made in the editor to the physical file system; the second feature we identified within this scenario is therefore the "save file" feature. This scenario is described in detail in the next section.

Save Feature

To get a starting point, we requested the list of components involved in the "save file" feature. As an input to the analysis, we assumed that it would be helpful to search for methods carrying the pattern "save" in their names. Based on this assumption, the analysis recovered the process flow shown in Figure 42. To decrease the complexity of the extracted information, any process flow that involves mechanisms related to the user interface, for example updating the view after the save activity has finished, is suppressed.
The swim lanes within the diagram name the coarse-grained components identified within the Eclipse core and JDT architecture. The activities presented in the behavioral view abstract the functionality these components fulfill as parts of the considered process flow.
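The name-pattern heuristic that seeded this trace can be sketched as follows: from all extracted method names, select those whose names contain "save". The method names below are illustrative only.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch of the naming-pattern heuristic used as input to the
// "save file" feature trace.
public class SaveTrace {
    private static final Pattern SAVE =
            Pattern.compile("save", Pattern.CASE_INSENSITIVE);

    // Select all method names carrying the pattern "save".
    public static List<String> findSaveMethods(List<String> methodNames) {
        return methodNames.stream()
                .filter(name -> SAVE.matcher(name).find())
                .toList();
    }

    public static void main(String[] args) {
        List<String> extracted = List.of(
                "EditorPart.doSave", "Openable.save", "Buffer.getContents",
                "JavaModelOperation.runSaveOperation", "WorkingCopy.commit");
        System.out.println(findSaveMethods(extracted));
        // prints [EditorPart.doSave, Openable.save, JavaModelOperation.runSaveOperation]
    }
}
```

Such a selection only yields candidate methods; the actual process flow in Figure 42 is then recovered by following the call relations in the fact base from these candidates.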
[Activity diagram with swim lanes for the Action Framework, Editor Framework, Eclipse Workspace Model, and Java Workspace Model components; the activities are Save Editor Part, Do Save, Get Editor Input, Create Workspace Modify Operation, Execute Modify Operation, Get Corresponding Model Resource, Commit Changes, Create Java Model Save Operation, Run Java Model Save Operation, Synchronize Editor and Model, and Synchronize Model and File.]

Figure 42: Behavioral view for the save feature
The behavioral view delivered in the response helped us to identify four major components of the Eclipse architecture that take part in the save activity. The two swim lanes on the left represent parts of the Eclipse core UI and the JDT UI. Here, we take a closer look at the two swim lanes on the right, namely the Eclipse Workspace Model and the JDT Java Workspace Model components. We will refer to these two parts as Eclipse model and JDT model, respectively. The behavioral view shows how a controller captures the "Save" event and delegates it to the editor framework. The editor framework uses functionality
provided by the Eclipse workspace and the JDT workspace. This functionality provides the means to synchronize the editor input with the underlying physical resource. The conceptual view in Figure 43 describes the structure of the participating model components.
Figure 43: Conceptual View of model elements involved in save activity
Figure 43 shows all entities that are part of the model and involved in the synchronization activity between the physical file abstraction of Eclipse (the File class), the model element representing the file within the Java model (the CompilationUnit class), and the working copy (the WorkingCopy class), which buffers the editor input. The Buffer component can be queried to retrieve the corresponding buffers of the original and the working copy, so that synchronization between the two is possible. The update feature and the open feature are not discussed here in detail, as they do not provide additional information relevant for this discussion.
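The working-copy mechanism just described can be sketched as follows: the editor works on a WorkingCopy whose buffer is committed back to the original CompilationUnit on save. The class names follow Figure 43, but the bodies are illustrative only, not the actual JDT implementation.

```java
// Illustrative sketch of the buffer synchronization between original
// and working copy described in the text.
class Buffer {
    private String contents = "";
    String getContents() { return contents; }
    void setContents(String c) { contents = c; }
}

class CompilationUnit {
    final Buffer buffer = new Buffer();

    // The working copy starts with the same buffer contents as the original.
    WorkingCopy getWorkingCopy() {
        WorkingCopy wc = new WorkingCopy(this);
        wc.buffer.setContents(buffer.getContents());
        return wc;
    }
}

class WorkingCopy {
    final Buffer buffer = new Buffer();   // buffers the editor input
    private final CompilationUnit original;

    WorkingCopy(CompilationUnit original) { this.original = original; }

    // Synchronize the original with the edited working-copy buffer.
    void commit() { original.buffer.setContents(buffer.getContents()); }
}
```

Keeping two separate buffers is what allows the IDE to discard unsaved edits simply by dropping the working copy.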
6.4.4 Context Analysis for the Model Element Package

From the inheritance hierarchy between the classes Openable (implementing the synchronization logic), JavaElement, WorkingCopy, and CompilationUnit, we assumed that an important coherence exists. It turned out that these classes are part of a model component, which serves as the JDT's internal abstraction of the Java programming language. Therefore, we formulated further requests to gain information concerning the Java model. First, we wanted to know which classes are part of the model and, based on this response, which of them have the potential to be reused. The analysis method used was the metrics-based selection of reuse candidates described in Section 5.6. We reviewed the code of the classes the response delivered in order to allocate them to specific categories and to sort them according to their degree of reusability. Besides the classes realizing the concepts of the Java programming language, we found a category of classes that hold meta-information about the model classes. Additionally, we identified a caching functionality that provides fast access to allocated model elements. A fourth category consisted of classes implementing means to execute operations on the Java model. Finally, we found classes that, like the Openable class mentioned above, extend Java model elements with certain characteristics that are not Java-specific and can therefore be called crosscutting.
Figure 44: Module view on an excerpt of the Java model
Figure 44 shows parts of this interface hierarchy and its implementing classes. The grey-shaded interface ICompilationUnit and its implementing class CompilationUnit represent the concept of a Java source file and are part of the Java model's straightforward realization of an abstraction of the Java programming language. All other (non-shaded) classes within this diagram introduce the crosscutting concerns mentioned above. These concerns are added to the Java model elements by the implementation of interfaces or by inheritance. We describe these concerns in greater detail in the following section, with a focus on the reusability of the classes implementing them.
6.4.5 Reusable Concepts within the Java Model

As mentioned above, we concentrate in this section on the recovered category of classes and interfaces that implement specific functionality which is not Java domain-specific, but can be reused in other contexts, too. The first interface we want to describe here is the IParent interface. If a class implements this interface, it indicates that this class aggregates other model elements. The class CompilationUnit, for example, implements this interface, as it is not a leaf in the Java model element hierarchy and can contain other model elements like types, methods, and so on. The ISourceReference interface is implemented by classes that have a reference within a physical source file (e.g., the Java model element Method implements this interface, as it is physically present in source code at a specific offset and with a certain length). IWorkingCopy indicates that a certain instance of a class is a copy of the underlying original resource; this concept was already introduced with the save activity described in Section 6.4.3. We obtained analogous results for the CDT and the CobolDT: all interfaces and aspects found in the JDT model were also part of the programming models provided by these two IDEs. The IOpenable interface indicates that a model element can be saved and has a physical reference in the file system. Our assumption in this case is that the Openable class would, with some minor changes, be reusable within the whole domain of plug-in IDE models, as the concept of model elements representing files that contain source code occurs in all considered IDEs. Moreover, the IOpenable interface is not implemented separately by each class that has a physical representation in the file system; it has its own default implementation, namely the Openable class. This indicates that the reuse potential is high, as the functionality the class provides is included in the Java model element inheritance hierarchy directly through subclassing, not by separate implementation.
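The reuse structure just described, an interface for the crosscutting concern plus one inherited default implementation, can be sketched as follows. The names follow the JDT classes discussed above, but the method bodies are illustrative only.

```java
// Sketch of the IOpenable/Openable reuse structure: the interface
// captures the crosscutting "has a file reference and can be saved"
// concern, and Openable provides one default implementation that
// model elements inherit instead of re-implementing.
interface IOpenable {
    void open();
    void save();
}

// Default implementation shared by all model elements with a file reference.
abstract class Openable implements IOpenable {
    private boolean open;

    public boolean isOpen() { return open; }

    public void open() { open = true; }   // e.g. load the buffer from the file

    public void save() {                  // e.g. commit the buffer to the file
        if (!open) open();
        // synchronize buffer with the underlying resource ...
    }
}

// A language-specific model element reuses the logic by subclassing;
// saving and opening are inherited from Openable unchanged.
class CompilationUnit extends Openable {
    // Java-specific behavior only.
}
```

Because the saving logic lives in one superclass, porting it to another plug-in IDE model mainly means subclassing Openable from that model's element classes, which is what makes the reuse potential high.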
6.4.6 Refined Pattern Completion

The save feature was mainly concerned with the model parts of the considered IDEs, as they provide the mechanisms to synchronize with physical entities and to commit changes to these entities. In our case study, these models represent an abstraction of the programming language the considered IDEs are designed for. But the IDE's model is not only for application-internal use; it is made visible in the user interface in several views. The structure of a class, for example, is displayed within an outline view using the classes of the underlying Java model.
All changes made to the model by changing the class file structure in the editor are immediately propagated to the corresponding views. Besides using the editor, there are several other possibilities to manipulate the model through user interaction. Therefore, we assumed that a Model View Controller framework is part of the Eclipse architecture, where the basic infrastructure of the framework provides means to add models, views, and parts of the controller. Before going into a more detailed analysis, we first take a closer look at the Model View Controller pattern in general, to show what the analysis is based on.

6.4.6.1 The Model View Controller Pattern

Many applications provide a screen presentation (i.e., a view) of their underlying data model. In this context, several problems have to be solved. First of all, the coupling of model and view should not be too tight, so that the parts can easily be reused separately. Closely related is the requirement that it should be possible, without major effort, to provide different perspectives on the application's data model by contributing different views. Another important point concerns the synchronization of view and model. If the model changes at runtime, either through user interaction within the application or from outside, for instance through the deletion of a physical entity that is somehow represented in the application's data model, these changes have to be propagated to all views, so they can update their representation of the respective application data. The behavior of the application in response to user interaction is also of importance. If a user selects a certain button on the screen or a key on the keyboard, the application should respond in the desired way. To intercept and handle these inputs, an appropriate mechanism has to be provided. The Model View Controller pattern is widely used to solve these problems [GHJV1993].

Through the distribution of the concerns mentioned above to three components, namely the model, the view, and the controller, the decoupling of these architectural elements is enforced. In the following, the responsibilities of the different components are briefly discussed.

Model: The model encapsulates the state and the functionality of an application or a part of it. Generally, a model constitutes the computational abstraction of a real-world entity like a system or a process. Therefore, an application's data model is not necessarily limited to static data. A model also provides the means to act upon its data from outside.

View: The view implements the logic for displaying the model or parts of it and provides possibilities for user interaction.

Controller: The controller processes events produced by user interaction, for example keyboard or mouse events captured on control elements that are part
of the GUI. The controller maps these user events to operations on the model data.

Related Patterns: The MVC pattern is a combination of several other design patterns. For the sake of reusability, model and view are loosely coupled using the Observer pattern. This pattern describes a publish-subscribe mechanism: views that want to be notified of model changes have to implement an interface and register with the model through this interface. If model changes occur, they are propagated to all registered listeners, that is, the views implementing the interface.
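The Observer mechanism described above can be sketched in a few lines: views register with the model through a listener interface and are notified on every change. All names here are illustrative, not taken from the Eclipse code.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the publish-subscribe coupling between model and view.
interface ModelChangeListener {
    void modelChanged(String change);
}

class Model {
    private final List<ModelChangeListener> listeners = new ArrayList<>();
    private String data = "";

    // Views subscribe through the listener interface.
    public void addListener(ModelChangeListener l) { listeners.add(l); }

    // Every change is propagated to all registered listeners.
    public void setData(String newData) {
        data = newData;
        for (ModelChangeListener l : listeners) l.modelChanged(newData);
    }

    public String getData() { return data; }
}

class View implements ModelChangeListener {
    String displayed = "";

    public void modelChanged(String change) {
        displayed = change;   // update the screen representation
    }
}
```

A controller would sit on top of this and translate user events into calls like setData; because the model only knows the listener interface, any number of views can be added without touching the model.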
Figure 45: Model View Controller Pattern
The controller mechanism can be realized using the Strategy design pattern: depending on the kind of event, the class implementing the appropriate algorithm is chosen to handle it. The Composite pattern is used to realize the recursive containment of elements implementing the same interface in tree-like structures. This pattern can be found in the GUI of MVC-based architectures, for example in views that are nested within other views, but it may also be used in a model that is organized as a tree [GHJV1993], [Gran2002].

6.4.6.2 The Model View Controller Pattern in Eclipse

The feature traces we requested during the architecture recovery showed that the presence of a data model is one central concept within the JDT's IDE, but
also within the Eclipse core. The visualization of the model in the IDE's views was an aspect we did not consider right at the beginning, but it became apparent that the mechanism behind the visualization was central to the Eclipse core and the JDT. The request we made was twofold: First, it should be checked whether the MVC pattern was realized within the different systems at all. If so, we wanted to see how it was realized. As we already knew which classes were part of the model, the answer should provide the sets of classes constituting the view and the controller. The technique used for the pattern completion is described in Section 5.2.

Recovering the View

Our first request concerned the view classes. The response from the analysis showed that the view classes collaborating with the Java model are located within the package org.eclipse.jdt.ui. For our further requests, we chose only a subset of the classes the response delivered. This subset embraced those classes we supposed to be responsible for the project navigation tree in the Java perspective, namely the classes that realize the package explorer of the JDT plug-in.

Recovering the Controller

We then requested the recovery of the classes handling any user interaction with the package explorer. The response hinted at the existence of a controller framework within the Eclipse core. This framework ensures that classes encapsulating code for handling user interaction with the views and their control elements can be added and get notified through an event mechanism.
[Class diagram with three parts. View: PackageExplorerPart (+setViewer()), PackageExplorerTreeViewer (+addTreeListener(), +addSelectionChangedListener(), +setInput(), +addDoubleClickListener(), +addOpenListener()), PackageExplorerContentProvider, PackageExplorerLabelProvider, and the interfaces IInputSelectionProvider (+getSelection(), +setSelection()), ISelectionChangedListener (+selectionChanged()), and IElementChangedListener (+inputChanged()). Controller: SelectionChangedListener, PackageExplorerActionGroup, and the interface IAction (+run()). Model: JavaCore, JavaModelManager, and JavaModelElement (+addElementChangedListener()).]

Figure 46: Model View Controller Architecture of the Package Explorer
The module view shown in Figure 46 is an excerpt of the results produced by the pattern completion technique. This view shows a subset of the classes that make up the controller component of the package explorer. Those classes within the controller that implement, for example, the ISelectionChangedListener interface can register with the controller framework of the Eclipse core and get notified about events produced within the package explorer view parts. The controller classes provide the logic to manipulate the Java model depending on the user's selection. The package explorer constitutes the view part of the architecture. It initializes the necessary infrastructure to enable the Model View Controller interaction at the IDE plug-in's startup. Additionally, it acquires the Java model and displays it in a tree-like structure. The view is registered with the Java model through the IElementChangedListener interface. If model changes occur, the view gets notified, queries the updated model, and displays it.
6.4.7 Conceptual Models of IDE Plug-Ins

To conclude the case study experience section, we introduce the overall conceptual module view of the considered IDEs (CDT, JDT, and CobolDT) and the Eclipse core (see Figure 47).
[Diagram with a View-Controller column and a Model column. The View-Controller column contains the Eclipse UI with the controller framework and UI framework, the IDE-Workbench-UI, and the Eclipse-Workbench-UI. The Model column contains the IDE-Data-Model, the Eclipse Workspace with the Workspace Resource Framework, and the Eclipse-Resource-Model.]

Figure 47: Conceptual View of the IDEs' architecture
At this level of abstraction, there are no differences between the architectures of the different plug-ins, so they can be visualized within one generic view. Differences become more obvious, but are still not striking, if one browses into the elements that make up the plug-in IDEs, namely the IDE-Resource-Model and the IDE-Workbench-UI. The IDE-Resource-Model was already analyzed exemplarily for the different IDEs, with an emphasis on the JDT. In the following section, the results obtained there serve as input for parts of the reference architecture. An analysis of the workbench parts of the different systems has not yet been requested in greater detail, but will be necessary for our future work, as we assume significant reuse potential within the view parts of the architectures, too.

6.5 Reference Architecture

The ultimate reason for conducting the case study was the development of a product family architecture for Eclipse plug-ins that encompasses two existing plug-ins (the JDT and the CDT; we left out the CobolDT), as well as two new ones (the FP-DT and the KobrA-DT). In this section, we briefly describe the techniques used for documenting a generic product family architecture and for instantiating the generic architecture description to yield individual descriptions for specific product family members. In the following, the model parts of the Eclipse plug-ins are used as an example.

6.5.1 Generic Architecture Description

Generic architecture descriptions encompass the descriptions of the architectures of a number of related software systems. For that reason, a generic architecture consists of a number of generic architecture views, which can potentially contain variation points. Since we use UML models to document architectural views, the stereotype «variant» is used to denote variation points in the generic architecture views. This way, generic architectural views are still UML-conform and can be modeled using standard CASE tools. The «variant» stereotype can be used for all UML model elements. If an architectural view contains variation points, it is accompanied by a decision model. This way, potentially all architecture views can be extended to be generic. A systematic approach for the extension of arbitrary software engineering artifacts for their use in a product family is described in [Muth2002]. Figure 48 shows the generic conceptual view for the model part of the plug-in product family. The generic conceptual model encompasses the model parts of all four considered plug-ins. Elements of the view that are not part of all product family members, that is, variabilities, carry the «variant» stereotype to express their variable nature. Elements of the view that do not have the stereotype are common to all plug-ins.
[Class diagram of the Model part. Common elements: ModelManager, ModelElement, and the interface IOpenable (+save(), +open()). «variant» elements: FrameModel, CModel, KobrAModel, JavaModel, Frame, KobrAComponent, CCompilationUnit, JavaCompilationUnit, Buffer, and XMLPersistenceFramework, connected to the common elements via «realize» relations.]

Figure 48: Generic Conceptual View for the Model Part
Table 12 contains a partial decision model for the generic conceptual view given in Figure 48. A decision model is a table that contains one decision per row. A decision is given by an ID (a letter denoting the model, here C stands for conceptual view, plus a number) and a description. A question follows that asks for the resolution of the respective variability in terms of the application domain. The variation point relates to the model element that is variable. The resolution column then lists the possible resolutions for the variation point, given as the possible value set (the first value in the set is the default value). The last column lists the decisions constrained by a decision. For example, if the JDT, CDT, or FP-DT is chosen, a buffer is needed and the XML persistence framework is not needed. For the KobrA-DT, it is the other way around.
146
Copyright © Fraunhofer IESE 2004
Open Source Case Study
ID | Decision        | Question                              | Variation Point                                     | Resolution | Constrained Decisions
C1 | JDT             | Is the Java plug-in used?             | ConceptualComponents JavaModel, JavaCompilationUnit | {Yes, No}  | C1 = Yes: C5 := No, C6 := Yes
C2 | CDT             | Is the C plug-in used?                | ConceptualComponents CModel, CCompilationUnit       | {No, Yes}  | C2 = Yes: C5 := No, C6 := Yes
C3 | FP-DT           | Is the Frame Processor plug-in used?  | ConceptualComponents FrameModel, Frame              | {No, Yes}  | C3 = Yes: C5 := No, C6 := Yes
C4 | KobrA-DT        | Is the KobrA plug-in used?            | ConceptualComponents KobrAModel, KobrAComponent     | {No, Yes}  | C4 = Yes: C5 := Yes, C6 := No
C5 | XML Persistence | Is an XML persistence framework used? | ConceptualComponent XMLPersistenceFramework         | {No, Yes}  | -
C6 | Buffer          | Is a buffer used?                     | ConceptualComponent Buffer                          | {Yes, No}  | -

Table 12: Decision Model
The decisions given in Table 12 are all optional, meaning that the respective element is either present or not in a specific product family architecture. If an element is present, the «variant» stereotype is removed during instantiation; if it is not present, the respective element is removed.
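The constraint logic of Table 12 can be sketched as a small resolution function: choosing one of the plug-in decisions C1 to C4 fixes the XML persistence decision C5 and the buffer decision C6. The encoding below is illustrative, not part of PuLSE-DSSA.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of resolving the decision model of Table 12 for a single plug-in.
public class DecisionModel {
    // true = the corresponding element is present in the instantiated architecture.
    public static Map<String, Boolean> resolve(String plugin) {
        Map<String, Boolean> d = new LinkedHashMap<>();
        d.put("C1", plugin.equals("JDT"));
        d.put("C2", plugin.equals("CDT"));
        d.put("C3", plugin.equals("FP-DT"));
        d.put("C4", plugin.equals("KobrA-DT"));
        // Constraints from Table 12: the KobrA-DT needs the XML persistence
        // framework and no buffer; the other plug-ins need a buffer only.
        boolean kobra = d.get("C4");
        d.put("C5", kobra);    // XMLPersistenceFramework present?
        d.put("C6", !kobra);   // Buffer present?
        return d;
    }

    public static void main(String[] args) {
        System.out.println(resolve("FP-DT"));
        // prints {C1=false, C2=false, C3=true, C4=false, C5=false, C6=true}
    }
}
```

Resolving all decisions and then dropping the elements mapped to false (and the «variant» stereotype of those mapped to true) yields exactly the kind of instantiated view shown in Figure 49.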
[Class diagram of the instantiated Model part: ModelManager, FrameModel, Frame, Buffer, ModelElement, and the interface IOpenable (+save(), +open()), connected via «realize» relations; all «variant» stereotypes have been removed.]

Figure 49: Frame Processor Instance of the Generic Conceptual View
Figure 49 shows an instantiation of the generic conceptual view for the FP-DT. All elements that do not belong to the FP-DT have been removed, and the stereotypes of the FP-DT-specific elements have been removed as well. Figure 50 shows the behavioral view for the FP-DT.
[Activity diagram with swim lanes for the Action Framework, Editor Framework, Eclipse Workspace Model, and Frameprocessor Workspace Model components; the activities are Save Editor Part, Do Save, Get Editor Input, Create Workspace Modify Operation, Execute Modify Operation, Get Corresponding Frame Model Resource, Commit Changes, Create Frame Model Save Operation, Run Frame Model Save Operation, Synchronize Editor and Frame Model, and Synchronize Model and File.]

Figure 50: Behavioral View for the FP-DT
Figure 51 shows a screenshot of the resulting Frame Processor plug-in. The FP-DT plug-in is not yet fully realized, but since we develop it in an iterative way, a first set of features is already implemented and usable.
Figure 51: Frame Processor IDE
The perspective associated with frame development contains three parts. On the left side, a view of the project's physical resources (i.e., the files that constitute frames) is provided by a tree viewer that is part of the Eclipse JDT. An editor shows the content of one of these files. The editor's base functionality is taken from the Eclipse core editor framework, but the editor associated with frame files was specifically adapted to the needs of the frame processor IDE; in particular, syntax highlighting and parsing capabilities were necessary here. The tree view at the bottom left of the screenshot provides a logical view of the frame hierarchy. This view depends on an underlying frame model and presents logical relations between the frames. These relations are not visible in the physical project view, but are retrieved by parsing the frame files. It is worth mentioning that, besides the save feature shown in Figure 50, the update feature of the JDT described in Section 6.4.3 was adapted here to implement the analogous capability of immediately reflecting the changes made in the editor in the logical view.
6.6 Case Study Summary

The case study showed how we applied PuLSE-DSSA to define a reference architecture based on existing systems. The underlying concepts of the approach (namely view-based architecture, request-driven reverse architecting, and iterative design) became visible during the conduct of the case study. We made explicit how forward design and reverse engineering activities were combined by naming the requests that were performed on demand. We paid special attention to showing how the responses to such requests were integrated into design activities. This finally led to a reference architecture for Eclipse plug-ins, which in turn resulted in the successful construction of the frame processing development tools. In short, the case study showed that our approach, PuLSE-DSSA, attained the goals claimed in Section 2. To achieve these aims, we documented the architecture of Eclipse plug-ins with the views presented in Section 4 by using the request-driven reverse architecting analyses introduced in Section 5.
7 Conclusion

Introducing product family engineering in a development organization is a non-trivial task. Before most software organizations consider it, they have already produced multiple successful systems in the domain of their future product family. The organization usually wants to exploit the experience gathered in developing these systems and even reuse some of their existing, field-tested components. However, this leveraging of previous success is rarely performed in an explicit and systematic fashion. With PuLSE-DSSA, this report introduced an approach to design product family architectures that integrates request-driven reverse architecting techniques and thus fosters the systematic leveraging of previous experience. PuLSE-DSSA is an iterative and scenario-based method for designing reference architectures. The basic ideas of PuLSE-DSSA are to develop a reference architecture documented in different views incrementally, by applying generic scenarios in decreasing order of architectural significance, and to integrate evaluation into architecture creation based on the scope definition, a product family model, and information recovered from individual systems (or existing product families). PuLSE-DSSA elicits the views needed to express the architectures of a specific product family and provides a systematic process to define the reference architecture integrating the experience of past systems. In the approach presented here, the architect of the product family derives the views needed to express the architectures of a specific product family from the business and quality goals. To define these views, the architects usually do not start from scratch; normally, they would use, more or less explicitly, one of the typical view sets. In this document, we used the view set from Hofmeister et al. [HNS2000] and extended it with four other views, which we consider relevant as well.
The report described how a reverse architect uses different techniques to reconstruct (partial) views of prior systems. These (partial) views are processed in design activities and therefore constitute the canvas on which the product family architects and the reverse architect paint the design rationales and the successful means to meet requirements, together with the commonalities and variabilities among the various prior systems.
7.1 Compliance to Business Cases

In Section 1.4, we identified different kinds of migration business cases where it is beneficial to integrate information on prior systems into product family engineering. Furthermore, we claim that the resulting reference architecture will be of higher quality when exploiting existing systems instead of designing a new one from scratch, for the following reasons:

• The architectures of the individual systems reveal information about success factors and critical aspects, and detailed analyses can then react to requests such as why a system is inappropriate, or what the consequences of applying one specific architectural means were. The results are learning effects about essential characteristics and about the consequences of certain strategies of architecture development.

• One main goal is to synthesize architectural means and strategies out of a variety of systems in order to identify potential reuse candidates. Architectural analyses help to classify the individual solutions for requests and their consequences. The product family architect and the reverse architect are able to derive a solution that fits the requirements of the reference architecture in the design process, but is based on experience made with field-tested systems.

• Due to potentially outdated documentation, it is vital to recover the existing, individual architectures and to learn about the scenarios and means applied. For the transition of the means, patterns, and strategies, it is necessary to learn about their context and to reveal the means associated with scenarios.

In short, exploiting the existing systems is worthwhile, since it helps to understand success factors and critical aspects, it avoids bottlenecks, and it promotes learning about applied solutions. Recovering an architectural description and integrating the gained results into the design process are crucial in order to benefit from the existing systems and to reuse embedded knowledge as well as field-tested architectural means (i.e., patterns, strategies, and infrastructure). Hence, with PuLSE-DSSA we have introduced an approach that successfully combines forward and reverse engineering activities, as shown in the case study. This report thus contributes a high-quality design process for reference architectures with the support of reverse engineering.
7.2 Outlook

Our approach to designing reference architectures, PuLSE-DSSA, has proven feasible in the case study centered on IDE plug-ins for the Eclipse platform. The results and experiences gained during the conduct of the case study were used in
order to design and implement the component browser and the frame processor as plug-ins, and they will be beneficial for further IDE plug-ins as well. In the future, we plan to apply PuLSE-DSSA in industrial projects as well as in further case studies, and to improve it during these applications. The presented view set comprises common views. In the future, we may extend the view set and add further views; it is very likely that our general view set will be replaced by several view sets, each customized and tailored to a certain set of domains. Ongoing work includes the extension of our catalogue of request-driven reverse architecting analyses. We will extend existing reverse engineering techniques and plan to develop new analyses, which will strengthen our catalogue and enlarge our capabilities. In doing so, we will focus especially on analyses that are tailored to product family related issues. Furthermore, our analysis catalogue will be integrated more strongly into the architecture design process by providing a consistent description of each technique. For each technique, we will provide information on how it is linked into the design process of architectures and how it can contribute to the design.
References
[ABBK+2001]
Atkinson, Colin, Bayer, Joachim, Bunse, Christian, Kamsties, Erik, Laitenberger, Oliver, Laqua, Roland, Muthig, Dirk, Paech, Barbara, Wüst, Jürgen, and Zettel, Jörg: Component-based Product Line Engineering with UML, Component Software Series, Addison-Wesley, 2001.
[BCK1998]
Bass, Len, Clements, Paul, and Kazman, Rick: Software Architecture in Practice, Addison-Wesley, 1998.
[Bass1997]
Bassett, P.: Framing Software Reuse. Lessons From the Real World, Yourdon Press, 1997.
[BFKL+1999]
Bayer, J., Flege, O., Knauber, P., Laqua, R., Muthig, D., Schmid, K., Widen, T., and DeBaud, J.-M.: PuLSE: A Methodology to Develop Software Product Lines, in Proceedings of the Fifth ACM SIGSOFT Symposium on Software Reusability (SSR'99), ACM, Los Angeles, CA, USA, p. 122-131, May 1999.
[BGS2002]
Bayer, Joachim, Girard, Jean-Francois, and Schmid, Klaus: Architecture Recovery of Existing Systems for Product Families, 2002.
[Blah1998]
Blaha, Michael: On Reverse Engineering of Vendor Databases, in Proceedings of the Working Conference on Reverse Engineering (WCRE), IEEE Computer Society Press, Hawaii, USA, p. 183-190, October 1998.
[BMRS+1996]
Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., and Stal, M.: Pattern-Oriented Software Architecture: A System of Patterns, John Wiley & Sons, 1996.
[CaBa1991]
Caldiera, G., and Basili, V.R.: Identifying and Qualifying Reusable Software Components, IEEE Computer, p. 61-70, 1991.
[CCM1993]
Canfora, G., Cimitile, A., and Munro, M.: A Reverse Engineering Method for Identifying Reusable Abstract Data Types, in Proceedings of the 1993 Working Conference on Reverse Engineering (WCRE'93), Baltimore, Maryland, USA, IEEE Computer Society Press, p. 73-82, May 1993.
[CCM1996]
Canfora, G., Cimitile, A., and Munro, M.: An Improved Algorithm for Identifying Objects in Code, Software-Practice and Experience, p. 25-28, January 1996.
[CDT2004]
CDT: C/C++ Development Tools, http://eclipse.org/cdt/, March 2004.
[ChCr1990]
Chikofsky, E., and Cross, J. H.: Reverse Engineering and Design Recovery: a Taxonomy, IEEE Software, 7(1):13-17, January 1990.
[CiVi1995]
Cimitile, A., and Visaggio, G.: Software Salvaging and the Call Dominance Tree, Journal of Systems Software, 28, p. 117-127, 1995.
[CKK2002]
Clements, Paul, Kazman, Rick, and Klein, Mark: Evaluating Software Architectures: Methods and Case Studies, Addison-Wesley, 2002.
[Cobo2004]
CobolDT: Cobol Development Tools, http://eclipse.org/cobol/, March 2004.
[DaWi1997]
Davis, M. J., and Williams, R. B.: Software Architecture Characterization, in Symposium on Software Reusability, Boston, USA, 1997.
[Demi1986]
Deming, W.: Out of the Crisis, MIT Center for Advanced Engineering Study, Cambridge, USA, 1986.
[Doug1999]
Douglas, B. P.: Doing Hard Time: Developing Real-Time Systems with UML, Addison-Wesley, 1999.
[EKS2003]
Eisenbarth, Thomas, Koschke, Rainer, and Simon, Daniel: Locating Features in Source Code, IEEE Transactions on Software Engineering, March 2003.
Copyright © Fraunhofer IESE 2004
References
[EtDa1996]
Etzkorn, L. H., and Davis, C. G.: Automated Object-Oriented Reusable Component Identification, Knowledge-based Systems, 9, p. 517-524, 1996.
[FFB2002]
Fey, Daniel, Fajta, Robert, and Boros, Andras: Feature Modeling: A Meta-Model to Enhance Usability and Usefulness, in Chastek, Garry (edt.), Proceedings of the Second Software Product Line Conference, LNCS 2379, Springer, San Diego, CA, p. 198-216, August 2002.
[GHJV1993]
Gamma, E., Helm, R., Johnson, R., and Vlissides, J.: Design Patterns: Abstraction and Reuse of Object-Oriented Design, in Proceedings of the 7th European Conference on Object-Oriented Programming (ECOOP'93), Kaiserslautern, Germany, July 1993.
[GiKo1997]
Girard, J.-F., and Koschke, R.: Finding Components in a Hierarchy of Modules: A Step towards Architectural Understanding, in Proceedings of the International Conference on Software Maintenance (ICSM), Bari, Italy, p. 58-65, September 1997.
[GKS1999]
Girard, J.-F., Koschke, R., and Schied, G.: A Metric-based Approach to Detect Abstract Data Types and State Encapsulations, Automated Software Engineering, 6, p. 357-386, October 1999.
[GiKo2000]
Girard, J.F., and Koschke, R.: A Comparison of Abstract Data Type and Objects Recovery Techniques, Science of Computer Programming, Elsevier, 36(2-3):149-181, March 2000.
[Gran2002]
Grand, Mark: Patterns in Java, Volume 1, Wiley Technology Publishing, 2002.
[GXL2004]
GXL: Graph eXchange Language, http://www.gupro.de/GXL/, March 2004.
[HNS2000]
Hofmeister, C., Nord, R., and Soni, D.: Applied Software Architecture, Addison-Wesley, 2000.
[HuBa1985]
Hutchens, D. H., and Basili, V. R.: System Structure Analysis: Clustering with Data Binding, IEEE Transactions on Software Engineering, p. 749-757, August 1985.
[IEEE2000]
IEEE1471: IEEE Recommended Practice for Architectural Description of Software-Intensive Systems, IEEE Std 1471-2000, IEEE Computer Society, 2000.
[Jack1990]
Jackson, M.: Some Complexities in Computer-Based Systems and Their Implications for Software Development, in Proceedings of the International Conference on Computer Systems and Software Engineering, Tel-Aviv, Israel, p. 344-351, May 1990.
[JaDu1998]
Jain, A. K., and Dubes, R. C.: Algorithms for Clustering Data, Prentice Hall, 1998.
[JDT2004]
JDT: Java Development Tools, http://eclipse.org/jdt/, March 2004.
[JeRu1997]
Jerding, D., and Rugaber, S.: Using Visualization for Architectural Localization and Extraction, in Proceedings of the Working Conference on Reverse Engineering (WCRE), October 1997.
[JoMu2002]
John, Isabel, and Muthig, Dirk: Tailoring Use Cases for Product Line Modeling, in Proceedings of the International Workshop on Requirements Engineering for Product Lines (REPL'02), p. 26-32, September 2002.
[JoDö2003]
John, Isabel, and Dörr, Jörg: Elicitation of Requirements from User Documentation, in Proceedings of REFSQ'03, Klagenfurt, Austria, June 2003.
[JDS2004]
John, Isabel, Dörr, Jörg, and Schmid, Klaus: User Documentation Based Product Line Modeling, Fraunhofer IESE, January 2004.
[KaKu1998]
Karypis, G., and Kumar, V.: METIS 4.0: Unstructured Graph Partitioning and Sparse Matrix Ordering System, Department of Computer Science, University of Minnesota, Minnesota, USA, 1998.
[KnPi2003]
Knodel, Jens, and Pinzger, Martin: Improving Fact Extraction of Framework-based Software Systems, in 10th Working Conference on Reverse Engineering, IEEE Computer Society, Victoria, BC, Canada, November 2003.
[Knod2003]
Knodel, Jens: Reconstruction of Architectural Views by Design Hypothesis, Softwaretechnik-Trends, Gesellschaft für Informatik e.V. (GI), 23(2), 2003.
[Kruc1995]
Kruchten, P.: The 4+1 View Model of Architecture, IEEE Software, 12(6):42-50, November 1995.
[LaGr1984]
Lanergan, R. G., and Grasso, C. A.: Software Engineering with Reusable Designs and Code, IEEE Transactions on Software Engineering, 10(5):498-501, September 1984.
[LiSn1997]
Lindig, C., and Snelting, G.: Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis, in Proceedings of the 19th International Conference on Software Engineering (ICSE'97), IEEE Computer Society Press, 1997.
[LiWi1990]
Liu, S., and Wilde, N.: Identifying Objects in a Conventional Procedural Language: An Example of Data Design Recovery, in Proceedings of the IEEE Conference on Software Maintenance, p. 266-271, 1990.
[MMRC+1998]
Mancoridis, S., Mitchell, B.S., Rorres, C., Chen, Y., and Gansner, E.R.: Using Automatic Clustering to Produce High-Level System Organizations of Source Code, in Proceedings of the International Workshop on Program Comprehension, 1998.
[MDA2004]
MDA: Model Driven Architecture, http://www.omg.org/mda/, March 2004.
[MoWo2003]
Moise, D., and Wong, K.: An Industrial Experience in Reverse Engineering, in Proceedings of the Working Conference on Reverse Engineering (WCRE), Victoria, BC, Canada, November 2003.
[MuUh1990]
Muller, H., and Uhl, J.: Composing Subsystem Structures Using (K,2)-partite Graphs, in Proceedings of the IEEE Conference on Software Maintenance, p. 12-19, 1990.
[MuNo1995a]
Murphy, G., and Notkin, D.: Lightweight Source Model Extraction, in Proceedings of the Third ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE3), Washington, D.C., USA, October 1995.
[MuNo1995b]
Murphy, G., and Notkin, D.: Software Reflexion Models: Bridging the Gap between Source and High-Level Models, in Proceedings of the Third ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE3), Washington, D.C., USA, October 1995.
[MuNo1997]
Murphy, G.C., and Notkin, D.: Reengineering with Reflexion Models: A Case Study, IEEE Computer, 30(8), p. 29-36, 1997.
[Muth2002]
Muthig, Dirk: A Light-weight Approach Facilitating an Evolutionary Transition Towards Software Product Lines, PhD Thesis, University of Kaiserslautern, Kaiserslautern, Germany, 2002.
[MuPa2002]
Muthig, Dirk, and Patzke, Thomas: Generic Implementation of Product Line Components, in Proceedings of the Net.ObjectDays (NODE'02), Erfurt, Germany, p. 316-333, October 2002.
[Post2003]
Postma, A.: A Method for Module Architecture Verification and its Application on a large Component-based System, Journal of Information & Software Technology, 45, p. 171-194, 2003.
[PrFr1987]
Prieto-Díaz, R., and Freeman, P.: Classifying Software for Reusability, IEEE Software, 4(1): 6-16, 1987.
[QA2004]
QA: Quality Assurance, http://www.software-kompetenz.de, March 2004.
[RiRo2002]
Riva, C., and Rodriguez, J. V.: Combining Static and Dynamic Views for Architecture Reconstruction, in Proceedings of the Conference on Software Maintenance and Reengineering (CSMR), IEEE Computer Society Press, Budapest, Hungary, March 2002.
[Schw1991]
Schwanke, Robert W.: An Intelligent Tool For Re-engineering Software Modularity, in Proceedings of the 13th International Conference on Software Engineering, p. 83-92, 1991.
[ScHa1994]
Schwanke, Robert W., and Hanson, Stephen J.: Using Neural Networks to Modularize Software, Machine Learning, 15, p. 137-168, 1994.
[TuGo2001]
Tu, Q., and Godfrey, M. W.: The Build-Time Architectural View, in International Conference on Software Maintenance, Florence, Italy, November 2001.
[WaKl1999]
Warmer, J. B., and Kleppe, A. G.: The Object Constraint Language. Precise Modeling with UML, Addison-Wesley, 1999.
[Weis1984]
Weiser, M.: Program Slicing, IEEE Transactions on Software Engineering, 10(4):352-357, July 1984.
[Wigg1997]
Wiggerts, T. A.: Using Clustering Algorithms in Legacy Systems Remodularization, in Proceedings of the Working Conference on Reverse Engineering (WCRE), IEEE Computer Society Press, Amsterdam, p. 33-43, October 1997.
[XMI2004]
XMI: XML Metadata Interchange, http://www.omg.org/technology/xml/index.htm, 2004.
[YHR1995]
Yeh, A.S., Harris, D.R., and Reubenstein, H.B.: Recovering Abstract Data Types and Object Instances from a Conventional Procedural Language, in Proceedings of the Second Working Conference on Reverse Engineering (WCRE'95), Toronto, Ontario, Canada, IEEE Computer Society Press, p. 227-236, July 1995.
[ZhKa2004]
Zhao, Y., and Karypis, G.: Criterion Functions for Document Clustering: Experiments and Analysis, Department of Computer Science, University of Minnesota, Minnesota, USA, 2004.
Document Information
Title:
Definition of Reference Architectures based on Existing Systems
Date: March 31, 2004
Report: IESE-034.04/E
Status: Final
Distribution: Public
Copyright 2004, Fraunhofer IESE. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means including, without limitation, photocopying, recording, or otherwise, without the prior written permission of the publisher. Written permission is not needed if this publication is distributed for non-commercial purposes.