
Investigations in Applying Metrics to Multi-View Architecture Models

Johan Muskens, Michel Chaudron, Christian Lange
Department of Mathematics and Computer Science
Technische Universiteit Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
{J.Muskens, M.R.V.Chaudron, C.F.J.Lange}@tue.nl

Abstract

The goal of our research is to develop industry-proof software architecture and design metrics. We identify a number of problems that arise when computing software architecture and design metrics in industrial settings and that are not encountered when computing source-code metrics. These problems include the absence of a single, unifying representation for architectures, and they arise from the fact that architecture diagrams are used in an informal manner. In this paper we describe our approach towards defining metrics for architectures and designs that are represented in the 4+1 views paradigm using UML, and we report our experiences with architectural metrics in industrial settings.

1. Introduction

Software architecture has emerged as a foundational concept for the development of large, complex systems [3]. We focus on software architecture because it plays a significant role in the success of a software project: it determines the technical quality of the system and the organizational structure of the development, as well as development cost and effort. Given its profound impact, there is a strong need for methods for assessing the quality of software architectures. Assessment is most cost effective in the early stages of a software development project.

One approach to software architecture evaluation is the scenario-based one, which needs a vast amount of expert participation [4, 6]. This paper, however, focuses on metrics. Software metrics have complementary strengths: they are more objective and cheaper in terms of effort. A lot of work has been done on software process metrics [1]. Some work has been done on software product metrics [12, 2]. For the most part, this work deals with the level of source code. Considerably less work has been done in the domain of software architecture and design metrics, even though some of the work on source code metrics can also be applied at the design level. The research on architecture metrics so far has been restricted to single-diagram analysis. Most source code metrics only consider structural properties and can therefore be considered single-view metrics.

The goal of our work is to define industry-proof software architecture product metrics that exploit the information from multiple model views. We have developed metrics and tools applicable to architectures and designs represented in the 4+1 views paradigm [8], which is the current state of practice in industry. We defined metrics based on a meta model of the architecture that combines the information from multiple views, and we validated the metrics on multiple industrial cases. In this paper we describe our approach towards industry-proof architecture and design metrics and our experiences with these metrics in industrial case studies.

This paper is structured as follows. Section 2 discusses architecture representations and section 3 the problems concerned with automated analysis of software architectures. Section 4 discusses the definition of metrics and how we combine information from different views on an architecture in our metrics. In section 5 we describe our experiences with applying software architecture and design metrics to industrial architecture and design models. In section 6 we present our concluding remarks.

2. Architecture Descriptions

In this section we discuss the main paradigm for representing software architectures. Models based on this paradigm will be the input to our methods for computing architecture product metrics. In section 3 we make the case that, in particular in industrial settings, such representations incur a number of problems that prevent the straightforward application of existing design- and source-level approaches.

2.1. Software Architecture Representations

The origins of software architecting can be traced to the early 1960s, but the subject started receiving scientific attention only in the 1990s. An IEEE standard was established only in 2000; it defines an architecture description (AD) as a collection of artifacts that document the structure and behavior of a system. It further describes that an AD is organized into one or more constituents, called views (IEEE 1471 defines a view as a representation of a whole system from the perspective of a related set of concerns). A view may consist of one or more architectural models.

There have been several proposals as to which views should be covered in an architecture description. In the literature, the 4+1 Views approach by Kruchten [8], originating from Rational, and the Hofmeister, Nord and Soni approach [5], originating from Siemens, have received most attention. For the remainder of this paper we focus on the 4+1 Views approach. We summarize the views and the diagrams that this approach suggests for describing architectures.

Logical view: This view represents the functional structuring of a system, i.e. a decomposition into a set of classes and the relations between them. This view is typically represented using class diagrams.

Process view: This view is aimed at illustrating the dynamics of a system. State diagrams depict internal behavior, and the interaction of classes is expressed in sequence diagrams.

Deployment view (also known as the physical view): This view is aimed at expounding the non-functional aspects related to the hardware of the system, such as availability, reliability, performance and scalability. It describes the mapping of logical components to hardware components.

Implementation view (also known as the development view): This view focuses on the organization of the software modules in the development environment. Diagrams show the organization into chunks, libraries, or subsystems that can be developed by a subset of the development team.

Use case view (also known as the scenario view): This view combines the other four views by the use of a small set of important scenarios, which are instances of more general use cases. The scenarios are an abstraction of the requirements. This view is typically denoted using use case diagrams.

3. Problems in Automated Assessment

To successfully automate the analysis of industrial architecture descriptions, the following problems need to be addressed.

Scattered information: The information needed for computing a metric may be scattered over multiple views. For instance, to compute coupling between classes, some dependencies may be indicated in diagrams in the logical view, while other dependencies may be indicated in process view diagrams.

Incompleteness [10]: The scattering may be remedied by defining metrics in such a way that information from all relevant views is taken into account. However, this may limit the applicability of metrics to situations where all relevant views have been completely drawn, while our experience is that many projects knowingly complete only a subset of the architecture views. Furthermore, there is another source of incompleteness within views: Message Sequence Charts (MSCs) are typically used to show some example execution scenarios, but these examples do not define the complete behavior of a system.

Disproportion: Typically, the parts of a system that the architect(s) feel are more complex may be worked out in more detail. Clearly, this imbalance carries through into the outcome of automated assessment techniques. On the one hand this is a disadvantage, because it is desirable that the metrics at the architecture level correspond to the metrics at the implementation level. On the other hand, the fact that some parts are worked out in more detail, or that more diagrams deal with a specific part of a system, is itself an indication of the criticality or complexity of that part. This information could be useful in allocating testing resources.

Inconsistency: Inconsistencies are inevitable in software development [15]. Typically, industrial systems are developed by teams. Differences in understanding of a system and differences in modeling style may lead architects to design inconsistent models. There are several professional tools for drawing UML diagrams. Some provide basic checks on consistency between diagrams. However, the scope of these checks is very limited and leaves much room for introducing inconsistencies between different views.

Incompleteness, disproportion and inconsistency arise to a much stronger degree in software architecture diagrams than they do in source code. This is due to the fact that for source code there are formal criteria for the form (a grammar) and tools for checking these criteria (compilers).

Diagram quality: One design can be represented by diagrams in different ways; for instance, one may or may not decompose a diagram if it contains many elements. While different ways of organizing diagrams do not change the actual design, they may influence the metrics for the design. Hence, there is a risk that the metrics are biased by diagram quality.

Clearly, if the input data to the analysis technique is of poor quality (a model of low quality or with many incomplete spots), it is difficult to produce sensible analysis results ('garbage in – garbage out'). However, the challenge lies in producing methods that yield informative results even under the 'imperfections' of industrial development practice (due to, e.g., time pressure).

4. Design and Architecture Metrics

In this section we first briefly motivate and explain the methods and tools we have used to obtain architecture-level metrics. Next we describe how and why to combine multiple views in the analysis.

4.1. Defining Architecture Metrics

We have argued that, to compute metrics for architecture descriptions, multiple views need to be taken into account. To this end we have defined a single unifying meta model (figure 1) in which we can combine the information from multiple diagrams and views. The meta model and the metrics currently implemented use the information from the most commonly used diagrams in industry (according to a survey we conducted amongst practitioners [11]). Information from the following diagrams is used: use case diagram, message sequence chart (scenario), class diagram, and state diagram. These diagrams are related: a use case diagram is linked to several scenarios (message sequence charts), a scenario contains classes from the class diagram, and the state diagram shows the states and state transitions of classes from the class diagram. Obviously, the architectural diagrams contain more information and more diagrams exist. It is likely that the meta model will be extended for the definition of new metrics. Based on this meta model we are able to calculate equivalents of most of the classical code metrics (coupling, number of services/methods, fan-in, fan-out, depth of scenario). The main advantage is that we are able to calculate these metrics much earlier in the development process.
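As a purely illustrative aid (not the authors' tooling), the relationships between the diagrams described above could be captured in code roughly as follows; the class and attribute names are our own assumptions, simplified from the meta model in figure 1.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical, simplified rendering of the meta model sketched in figure 1.
@dataclass
class ClassElement:
    name: str
    methods: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)
    states: List[str] = field(default_factory=list)        # taken from the state chart diagram

@dataclass
class Scenario:                                            # one message sequence chart
    name: str
    # each message: (caller class, callee class, method called)
    messages: List[Tuple[str, str, str]] = field(default_factory=list)

@dataclass
class UseCase:
    name: str
    scenarios: List[Scenario] = field(default_factory=list)  # scenarios illustrating the use case
```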

Figure 1. Meta model (entities: use case diagram with actors and use cases; message sequence chart with scenarios consisting of messages, related by caller/callee and 'precedes'; class diagram with classes, objects, methods and attributes; state chart diagram with state machines and states, related by 'changes to')

We used the following definitions for coupling, number of methods, fan-in and fan-out. We define the following sets:

U = set of all use cases
S = set of all scenarios
C = set of all classes
T = set of all states
M = set of all services/methods

Based on these sets we consider relations of the following types:

Us ⊆ U × S    ⟨u, s⟩ ∈ Us means s illustrates u
Sm ⊆ S × C × C × M    ⟨s, c1, c2, m⟩ ∈ Sm means that, in the context of s, c1 calls m from c2
Tt ⊆ T × T    ⟨t1, t2⟩ ∈ Tt means there is a transition from t1 to t2
Cm ⊆ C × M    ⟨c, m⟩ ∈ Cm means c provides m
Ct ⊆ C × T    ⟨c, t⟩ ∈ Ct means c can be in state t

We use the following definitions for the classical single-view metrics:

Coupling of class cx: (# ⟨c1, c2⟩ ∈ Cc : c1 = cx)
Number of methods per class cx: (# ⟨c, m⟩ ∈ Cm : c = cx)
Fan-in of class cx: (# ⟨s, c1, c2, m⟩ ∈ Sm : c2 = cx)
Fan-out of class cx: (# ⟨s, c1, c2, m⟩ ∈ Sm : c1 = cx)
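To make these definitions concrete, the following sketch (our own illustration, not the authors' tool) encodes the relations as Python sets of tuples and computes the four classical metrics. The toy data and the reading of Cc as class-to-class associations from the class diagram are assumptions on our part.

```python
# Hypothetical encoding of the relations as sets of tuples.
Cc = {("A", "B"), ("A", "C")}                            # <c1, c2>: c1 is associated with c2 (assumed meaning)
Cm = {("A", "m1"), ("A", "m2"), ("B", "m3")}             # <c, m>: class c provides method m
Sm = {("s1", "A", "B", "m3"), ("s1", "B", "A", "m1")}    # <s, c1, c2, m>: in scenario s, c1 calls m from c2

def coupling(cx):
    return sum(1 for (c1, c2) in Cc if c1 == cx)

def number_of_methods(cx):
    return sum(1 for (c, m) in Cm if c == cx)

def fan_in(cx):
    return sum(1 for (s, c1, c2, m) in Sm if c2 == cx)

def fan_out(cx):
    return sum(1 for (s, c1, c2, m) in Sm if c1 == cx)

print(coupling("A"), number_of_methods("A"), fan_in("A"), fan_out("A"))   # -> 2 2 1 1
```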


4.2. Combining Multiple Views


In an open discussion with architects, we invited them to suggest characteristics of architectural designs that they considered potential indicators for the quality of the design. These discussions have led to the definition of a number of metrics that combine information from multiple views. Consider the following examples.


Figure 2. Complexity of a class (example: Class X with methodA() and methodB(); complexity of services: 7/2)

• Complexity of a class
• Number of classes per use case
• Number of use cases per class
• Lack of cohesion between methods

Complexity of a class: This metric combines the information of the class diagram and the state diagram. It gives an indication of the average complexity of the methods of a class. A class has a number of states and transitions between these states. Method executions are responsible for state changes. Therefore we presume that the methods of a class are more complex when they are responsible for a larger number of state changes. This metric computes the average number of state transitions per method of a class (figure 2). The main thought behind this metric is that complexity of classes and methods is bad for maintainability and extensibility. The complexity of class c is formally defined as

(# ⟨t1, t2⟩ ∈ Tt : ⟨c, t1⟩ ∈ Ct) / (# ⟨c1, m⟩ ∈ Cm : c1 = c)

Number of classes per use case: This metric combines the information of use-case diagrams, scenario diagrams and class diagrams. Use cases describe the functional requirements of a system. The requirements are implemented by classes. When the number of classes used for implementing a use case is high, changes in that use case can have an impact on a large number of classes. More specifically, it means that related functionality is spread over the design. This metric counts the number of classes which call a method, or from which a method is called, in the context of a scenario that is part of a specific use case (figure 3). The main thought behind this metric is that spreading related functionality over the design is bad for maintainability and reusability, because reusing the functionality then means reusing a large number of classes, and understanding how the functionality is implemented requires knowledge of a large number of classes.
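To make the 'complexity of a class' metric concrete, a minimal sketch using the same hypothetical tuple encoding of the Ct, Tt and Cm relations as before (our illustration, not the authors' implementation); the example data is invented to mirror figure 2.

```python
def class_complexity(c, Ct, Tt, Cm):
    """Average number of state transitions per method of class c (sketch)."""
    states_of_c = {t for (cls, t) in Ct if cls == c}
    transitions = sum(1 for (t1, t2) in Tt if t1 in states_of_c)
    methods = sum(1 for (cls, m) in Cm if cls == c)
    return transitions / methods if methods else 0.0

# Hypothetical data: two methods and seven transitions, giving 7/2 = 3.5.
Ct = {("X", "s1"), ("X", "s2"), ("X", "s3")}
Tt = {("s1", "s2"), ("s2", "s1"), ("s1", "s3"), ("s3", "s1"),
      ("s2", "s3"), ("s3", "s2"), ("s2", "s2")}
Cm = {("X", "methodA"), ("X", "methodB")}
print(class_complexity("X", Ct, Tt, Cm))   # -> 3.5
```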

Figure 3. Number of classes per use case (example: seven classes appear in the scenarios of Use Case X)

The number of classes of use case u is formally defined as

(# c ∈ C : (∃ s ∈ S : (⟨s, c, ⊥, ⊥⟩ ∈ Sm ∨ ⟨s, ⊥, c, ⊥⟩ ∈ Sm) ∧ ⟨u, s⟩ ∈ Us))

where ⊥ stands for an arbitrary (don't-care) value.

Number of use cases per class: This metric combines the information of use-case diagrams, scenario diagrams and class diagrams. When a class is used for the implementation of a large number of use cases, this can be an indication that the cohesion between the methods of the class is low. It also means that when the class contains errors, many system features will suffer. Hence, this impairs the robustness of the system. This metric counts the number of use cases that contain a scenario in which a method of a specific class is called or in which that class calls a method (figure 4). The main thought behind this metric is that cohesion within a class should be high, because this is good for maintainability. Furthermore, a defect in a single class should not affect all (or a large part) of the system functionality.
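As an illustration of the two use-case-related metrics above, a sketch under the same assumed tuple encoding of the Us and Sm relations (the ⊥ wildcards are handled simply by ignoring the remaining tuple components):

```python
def classes_per_use_case(u, Us, Sm):
    """Classes appearing as caller or callee in some scenario of use case u (sketch)."""
    scenarios_of_u = {s for (uc, s) in Us if uc == u}
    return len({c for (s, c1, c2, m) in Sm if s in scenarios_of_u for c in (c1, c2)})

def use_cases_per_class(c, Us, Sm):
    """Use cases having a scenario in which class c calls or is called (sketch)."""
    scenarios_of_c = {s for (s, c1, c2, m) in Sm if c in (c1, c2)}
    return len({uc for (uc, s) in Us if s in scenarios_of_c})
```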



Figure 4. Number of use cases per class (example: Class X appears in the scenarios of two use cases)


The number of use cases of class c is formally defined as

(# u ∈ U : (∃ s ∈ S : (⟨s, c, ⊥, ⊥⟩ ∈ Sm ∨ ⟨s, ⊥, c, ⊥⟩ ∈ Sm) ∧ ⟨u, s⟩ ∈ Us))

Lack of cohesion between methods: This metric combines the information of use-case diagrams, scenario diagrams and class diagrams. When the set of methods of a class can be divided into a large number of disjoint subsets, each used only by a subset of the use cases, this is an indication that the cohesion between the methods of the class is low. This metric counts the maximum number of subsets of methods of a class such that the sets of use cases using these subsets are disjoint (figure 5).

Figure 5. Lack of cohesion between methods

Traceability: For some of our multi-view metrics we assume that there is a mapping (or traceability) between model elements (e.g. sequence diagrams belong to use cases). We consider this mapping to be part of the completeness of a model [9].
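A sketch of the 'lack of cohesion between methods' metric under the same assumed encoding: the methods of a class are incrementally merged into groups whenever their sets of using use cases overlap, and the number of resulting, mutually disjoint groups is returned. In this sketch, methods that do not appear in any scenario form singleton groups.

```python
def lack_of_cohesion(c, Us, Sm, Cm):
    """Maximum number of method groups of class c whose use-case sets are mutually disjoint (sketch)."""
    scenario_to_uc = {s: u for (u, s) in Us}
    # Use cases exercising each method of c (a method is exercised when it is called on c).
    uses = {m: set() for (cls, m) in Cm if cls == c}
    for (s, c1, c2, m) in Sm:
        if c2 == c and m in uses and s in scenario_to_uc:
            uses[m].add(scenario_to_uc[s])
    groups = []   # each group: (set of methods, set of use cases)
    for m, ucs in uses.items():
        overlapping = [g for g in groups if g[1] & ucs]
        merged_methods = {m}.union(*(g[0] for g in overlapping))
        merged_ucs = set(ucs).union(*(g[1] for g in overlapping))
        groups = [g for g in groups if g not in overlapping] + [(merged_methods, merged_ucs)]
    return len(groups)
```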

System              ‖U‖   ‖S‖   ‖C‖    ‖M‖   ‖T‖
Information Systems
Expresso             25    15    336   4132   n.a.
Beheer              n.a.   22    110    423   n.a.
Course Adm.          21    45     47    241    18
Building Adm.        16    17     12    106    37
Embedded Systems
MHP (es)             18    39    163   1018   n.a.
Med. ArchNetw.      n.a.  n.a.   163   1035   n.a.
Med. Aquisition     n.a.  n.a.   617   3640   n.a.
Med. Papu           n.a.  n.a.   411    408   n.a.
Med. BeamLim        n.a.  n.a.    59    312   n.a.
Med. FSCRis         n.a.  n.a.    43    228   n.a.
Med. FSCColl        n.a.  n.a.    29     49   n.a.
Med. DMT            n.a.  n.a.    68    258   n.a.
Med. RIS            n.a.  n.a.   111    344   n.a.
Med. ImageCh.       n.a.  n.a.   115    552   n.a.
Med. QA             n.a.  n.a.    18     63   n.a.
Med. Coll.          n.a.  n.a.    70    350   n.a.
Med. UIDS           n.a.  n.a.    89    231   n.a.
Med. IP             n.a.  n.a.   157    754   n.a.

Table 1. Case characteristics

5. Empirical Observations

We used our method to analyze a number of industrial designs. The cases we studied were all designed using Rational Rose™. First we give some general characteristics of the cases. Next we describe our experiences with classical metrics (subsection 5.1) and multi-view metrics (subsection 5.2).

We analyzed 13 subsystems (each designed by a different team of developers) of a software system for medical imaging. The size of the subsystems varies between 24 and 617 classes (see table 1 for information on the case sizes; ‖X‖ denotes the size of set X, see subsection 4.1 for the set definitions). We analyzed 5 other designs, each describing an entire system. These designs originate from the domains of embedded systems and information systems.

We experienced that industrial designs are almost never complete (according to the textbooks) and that usually only a limited number of diagram types is used. Often the focus is on class diagrams, and some behavioral diagrams are used. We observe this in table 1, where 'n.a.' (not applicable) means that the corresponding diagram type is not used (e.g. no use case diagram, hence no use case count).

5.1. Classical Metrics

A collection of experiments enabled us to see how the classical metrics (coupling, number of methods per class, fan-in and fan-out) performed on models instead of code. The results can be found in figures 6, 7, 8, and 9. To identify outliers we computed a threshold:

threshold = average value + 2 × standard deviation
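For illustration, a small sketch of this outlier threshold, assuming the per-element metric values of a model are available as a list of numbers; the paper does not state whether the population or sample standard deviation was used, so we pick the population variant here.

```python
from statistics import mean, pstdev

def outlier_threshold(values):
    """Flagging threshold: average + 2 * standard deviation (sketch)."""
    return mean(values) + 2 * pstdev(values)

fan_in_values = [3, 4, 2, 5, 3, 21, 4]          # hypothetical per-class fan-in values
t = outlier_threshold(fan_in_values)
outliers = [v for v in fan_in_values if v > t]  # -> [21]
```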

The figures show, for each analyzed model, the maximum value, the average value and the threshold for each individual metric. The experiments show that the equivalents of the classical source code metrics identify the same problem elements. In cases where we had the source code as well as the design available, problem elements identified by the metrics based on the design were also marked by the source code metrics. The advantage of the design metrics is that they can be calculated much earlier in the development process. Hence problems can be detected earlier and solved more cost-effectively.

Our measurements show that there are notable differences between embedded systems and information systems. Coupling, fan-in and fan-out are higher for embedded systems. Information systems score higher on the number of methods per class. Information systems focus on providing services, whereas embedded systems focus on controlling hardware. We believe that providing services results in a higher number of methods per class, and that controlling structures tend to have high coupling.

Figure 6. Coupling

Figure 7. Fan-in

Figure 8. Fan-out

Figure 9. Number of methods

5.2. Multiple View Metrics

Typically, multiple architectural views provide more information about the intention of the design. Hence, potentially, new metrics can be defined that are better estimators for quality properties. Some proposals for metrics based on multiple views are described in subsection 4.2; [14] contains a more extensive explanation. These multi-view metrics have been used to identify weak spots in the design using outlier analysis. Initial validation has been performed by asking the architects of the analyzed systems whether they agreed or disagreed with the weak spots indicated by the metrics. The results of this experiment provide data about the correct indications and the 'false positives' (type I errors) produced by the presented metrics. This result was generally well received. To complete this picture, we would also need to know the number of 'false negatives' (type II errors), i.e. weak spots not identified by the metrics. This information is difficult to obtain via interviews with architects, as in most cases these spots are also not found by specialists. We are currently in the process of collecting data from the testing and maintenance phases of systems to empirically validate these results.

The metrics 'number of classes required for a use case' and 'number of use cases that require specific classes' produced notable results on a design for a consumer product at the Philips ASA lab. We analyzed a design of 163 classes implementing 18 use cases. 'Number of classes required for a use case' marked one use case as being critical, and 'number of use cases that require specific classes' marked one class. Interviews with the architect led to the conclusion that the use case was indeed critical and complex and required very large parts of the system. The marked class was subject to some discussion, but the class is heavily used in a lot of scenarios and was also marked by the classical metrics fan-in and fan-out.

The industrial designs available to us did not make use of state diagrams to describe the behavior of the classes; therefore the metric 'complexity of a class' was used only on small designs (for example student projects). In these small test cases the 'complexity of a class' metric was able to identify complex classes (meaning that the architect agreed that the identified classes are very complex). However, it failed to identify some complex classes, because sometimes a single transition requires a very complex implementation, and this is not detected by just looking at (maybe too abstract) state diagrams.

An additional conclusion from these experiments is that a modest number of false positives is not considered a drawback of the method, because it is relatively easy to discard such results. In a way, the indicators provided by the metrics can be considered as a means for focusing an architectural review/inspection procedure.

We have defined and collected metrics that are based on combining the information from multiple views. Although we have collected data from 18 different case studies, we found that the differences in maturity and completeness of the models prevent an objective comparison of these metrics. This does lead to an interesting avenue for research: measuring the completeness of models and its impact on the accuracy and precision of metrics.

6. Concluding Remarks


Software architecture has a significant impact on the quality, cost and development time of software projects. Hence, the quality of software architectures needs to be evaluated in the early stages of the development process. Software product metrics are useful because they are objective, fast, repeatable and can be applied to parts of a product. Metrics can be especially useful in projects with a large number of junior designers; however, specialists are still necessary for the interpretation of the metrics. Time and money can be saved by introducing product metrics in the early stages of software development.

In the early stages of development, designs are usually high level and very abstract, and there is no code yet. The architecture documentation that is available consists of multiple complementary views. In order to perform useful analysis at this stage, it is important to combine the information of the different views. Moreover, the availability of multiple architectural views enables us to define more high-level metrics and consistency/completeness checks.

6.1. Related Work

Some work on metrics for assessing architecting processes is described in [1]. MAISA [16] is a research project aiming at developing methods for the assessment of software quality based on design-level UML diagrams. MAISA uses only single-view metrics, focusing only on the structure of the system. Currently, the MAISA project is developing metrics based on UML design patterns. In [13] a number of metrics based on use case diagrams are proposed and some structural metrics are discussed. Similar to our approach, [7] defines a number of metrics based on a meta model. Their metrics include structural, interaction and use case metrics. However, the paper contains no validation. They do identify potential inconsistencies in UML models as a problem in computing metrics.

6.2. Contributions

Most work on software metrics focuses on process metrics. Product metrics are usually based on source code or on a single design diagram. We have developed metrics and tools that can be applied to software architecture and software design artefacts. Furthermore, we developed metrics that combine information from multiple views. Analysis techniques are often validated only on classroom-size cases; our techniques can be, and are, applied to industrial-size cases in order to validate the proposed metrics. Furthermore, we are keeping a database with industrial metric results, which allows us to do benchmarking.

6.3. Future Work

System architecture representations typically provide more information about the intention of the design. Therefore, we are considering new metrics for predicting system properties. The combination of information from multiple views seems promising. In our future research we will try to develop and validate new metrics that combine the information available in the different views. In our experiments we noticed that architectures are often inconsistent and incomplete. The impact of these inconsistencies and this incompleteness on the predictive value of architectural metrics needs to be investigated [10]. Furthermore, we are investigating how we can use software visualization techniques to effectively gain insight into both the structure of, and the metrics computed on, various software architectures; due to the large volume of data, tabular output does not suffice. The approach needs to be generic for a large class of architectural analysis problems, scalable, and easily customizable for gaining insight into domain-specific aspects.

References

[1] A. Avritzer and E. Weyuker. Investigating metrics for architectural assessment. In Proceedings of the 5th IEEE International Symposium on Software Metrics, March 1998.
[2] V. Basili, L. Briand, and W. Melo. A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering, 22(10):751-761, 1996.
[3] L. Bass, P. Clements, and R. Kazman. Software Architecture in Practice. Addison-Wesley, 1998.
[4] P. Clements, R. Kazman, and M. Klein. Evaluating Software Architectures. SEI Series in Software Engineering. Addison-Wesley, 2002.
[5] C. Hofmeister, R. Nord, and D. Soni. Applied Software Architecture. Addison-Wesley, 1999.
[6] R. Kazman, G. Abowd, L. Bass, and P. Clements. Scenario-based analysis of software architecture. IEEE Software, 47-55, November 1996.
[7] H. Kim and C. Boldyreff. Developing software metrics applicable to UML models. In Proceedings of the 6th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering (QAOOSE 2002), June 2002.
[8] P. Kruchten. Architectural blueprints - the 4+1 view model of architecture. IEEE Software, 12(6), November 1995.
[9] C. Lange and M. Chaudron. An empirical assessment of completeness in UML designs. In Proceedings of the 8th International Conference on Empirical Assessment in Software Engineering, 111-120, May 2004.
[10] C. Lange, M. Chaudron, J. Muskens, L. Somers, and H. Dortmans. An empirical investigation in quantifying inconsistency and incompleteness of UML designs. In Proceedings of the 2nd Workshop on Consistency Problems in UML-Based Software Development (San Francisco), October 2003.
[11] C. F. J. Lange. Empirical Investigations in Software Architecture Completeness. M.Sc. thesis, Technische Universiteit Eindhoven, The Netherlands, 2003.
[12] M. Lorenz and J. Kidd. Object-Oriented Software Metrics. Prentice Hall, 1994.
[13] M. Marchesi. OOA metrics for the Unified Modeling Language. In Proceedings of the 2nd Euromicro Conference on Software Maintenance and Reengineering, Florence, Italy, 67-73, 1998.
[14] J. Muskens. Software Architecture Analysis Tool. M.Sc. thesis, Technische Universiteit Eindhoven, The Netherlands, 2002.
[15] W. Schwanke and E. Kaiser. Living with inconsistency in large systems. In Proceedings of the International Workshop on Software Version and Configuration Control, 98-118, Software Configuration Management Conference, 1988.
[16] A. Verkamo, J. Gustafsson, L. Nenonen, and J. Paaki. Measuring design diagrams for product quality evaluation. In Proceedings of the 12th European Software Control and Metrics Conference, London, England, 357-366, April 2001.
