Object-Oriented Model Size Measurement: Experiences and a Proposal for a Process

Vieri Del Bianco
University of Insubria
Via Mazzini, 5 - 21100 Varese (Italy)
+39-0332218938
[email protected]

Luigi Lavazza
CEFRIEL & University of Insubria
Via Mazzini, 5 - 21100 Varese (Italy)
+39-0223954258
[email protected]

ABSTRACT
The size of software artifacts is universally recognized as one of the most relevant metrics needed to support important software engineering and management practices, such as cost estimation and defect density estimation for test planning. Traditionally, Lines of Code (LOCs) and Function Points (FPs) have been used, often successfully, to characterize the physical and functional size, respectively, of software coded in languages like C, COBOL, or Ada. Since object-oriented (OO) programming became a popular development practice, researchers and practitioners have defined several techniques aimed at measuring object-oriented software. In several respects, however, OO metrics have not yet fully proven their validity. Model-driven development requires that suitable model metrics be defined in order to characterize the size of models, since several features of the resulting code are expected to be estimable on the basis of the model's quantitative characteristics. In this paper we report some experimental evaluations of metrics for OO models. Such metrics were obtained by adapting, for OO models, metrics that were originally conceived for OO code. Based on the reported results, we sketch a methodological approach to the measurement of model size and other quantitative characteristics.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics – Process metrics, Product metrics.

General Terms
Measurement.

Keywords
Object-oriented systems, size measures, Function Point Analysis, empirical validation, Lines of Code, model metrics, UML metrics.

1. INTRODUCTION
Since the introduction of object-oriented programming into industrial practice, several techniques aimed at measuring object-oriented software have been proposed. Metrics have been proposed to evaluate the characteristics of object-oriented design [5], to evaluate the characteristics of UML [8], to estimate the development effort [2][6], and for several other purposes. Among the proposed metrics, a number of approaches adapted the principles of Function Point Analysis [1] to object-oriented systems; these include Object-Oriented Function Points (OOFPs) [2][3] and Class Points [6].
In previous work, we applied several object-oriented metrics in order to assess their validity [7][11]. For this purpose, we measured two sets of programs produced by students in a Software Engineering course, as well as a set of Open Source (OS) programs. All the examined programs were required to implement a chatting system; the programming language adopted was Java. The analysis of the collected data highlighted the relations existing among FP-like OO metrics, traditional OO metrics (e.g., number of classes, attributes, methods, etc.), and lines of code (LOCs).
The paper is organized as follows: Section 2 briefly illustrates the evaluations of object-oriented metrics deriving from our previous work. Section 3 proposes a methodological approach to OO model measurement intended to overcome the limitations of current OO metrics suites. Section 4 draws some conclusions.

2. OO METRICS: LESSONS LEARNED
The evaluations reported here are based on the analysis of two sets of programs. The first one includes the programs developed by master students of a software engineering course. Students were required to develop a simple, but not trivial, chatting system. The development was meant to test the ability of the students to follow a software development process and to carry out object-oriented design and implementation (using UML and Java). The second set of programs includes open source programs (all licensed under the GNU Public License) implementing chatting systems or similar functionalities. We performed the following measurements on all programs:

• Traditional LOC counting. The physical size of the programs was measured by means of the traditional Lines of Code metric. More precisely, we counted effective LOCs, that is, LOCs containing semantically meaningful statements: blank lines, comment lines, and lines containing only syntactic information (open and close braces, etc.) were not counted (a minimal sketch of this counting rule is given after this list). Given the OO nature of the programs, we measured the LOCs per class.

• Basic OO metrics. The number of elements that characterize an OO system was measured. In particular, we measured the number of classes, and the number of attributes and methods per class.

• FP-based OO metrics. We measured OOFPs [2] and Class Points [6].
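For illustration only, the following minimal Java sketch (our own helper, not the counter actually used in MACXIM) applies the effective-LOC counting rule described in the first bullet above; comment handling is deliberately naive (string literals containing comment markers are not handled).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Illustrative effective-LOC counter: skips blank lines, comment lines,
 *  and lines containing only syntactic tokens (braces, parentheses, semicolons). */
public class ELocCounter {

    public static int countEffectiveLoc(Path javaFile) throws IOException {
        List<String> lines = Files.readAllLines(javaFile);
        int eloc = 0;
        boolean inBlockComment = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (inBlockComment) {
                int end = line.indexOf("*/");
                if (end < 0) continue;              // still inside a block comment
                line = line.substring(end + 2).trim();
                inBlockComment = false;
            }
            int lineComment = line.indexOf("//");   // drop line comments
            if (lineComment >= 0) line = line.substring(0, lineComment).trim();
            int blockStart = line.indexOf("/*");    // naively drop a trailing block comment
            if (blockStart >= 0) {
                inBlockComment = line.indexOf("*/", blockStart + 2) < 0;
                line = line.substring(0, blockStart).trim();
            }
            if (line.isEmpty()) continue;           // blank or comment-only line
            if (line.matches("[{}();]+")) continue; // purely syntactic line
            eloc++;
        }
        return eloc;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countEffectiveLoc(Path.of(args[0])));
    }
}
```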

Table 1 summarizes the data obtained from the measurement of the available chatting systems. The top five lines refer to the open source applications, while the bottom lines refer to the students' applications. For each of the latter, two versions with different functionalities were released; so, for instance, BCF1 and BCF2 refer to the two versions of program BCF.

Table 1. The measures of the chatting systems.

Application            eLOC   Classes  Methods  Attributes  OOFP   CP
chateverywhere_java    1747      24      142       104       877   122
chateverywhere_swing   1868      27      171       124      1026   133
llamachat              1275      33      134       112       582   149
chipchat               1381      14      114        70       578    76
freecs                13699     128      978       749      3790   613
BCF2                    627      15       82        35       370    62
CC2                     796      22      102        32       482    71
FDT2                    884      10       78        49       419    41
MDP2                   1441      32      196       125       852   110
RER2                    733      16      124        55       485    72
SM2                     697      17       88        41       387    69
BCF1                    406      13       57        23       266    51
CC1                     631      21       80        33       454    69
FDT1                    657      10       55        46       354    43
MDP1                   1143      26      149       106       720    92
RER1                    465      12       78        41       334    47
SM1                     378      12       58        22       263    46

The data reported in Table 1 were derived directly from the Java code of the applications, by means of MACXIM, an automatic measurement tool we developed [10][11].

2.1 Relations between object-oriented basic metrics and LOCs
The statistical analysis performed in [7] showed a strong linear correlation (r² = 0.97; observed F = 200, critical F at 99% = 7.56) between LOCs and basic metrics, namely the numbers of attributes and methods. For the students' programs, the number of effective LOCs (eLOCs) is a linear combination of the number of methods (NoM) and the number of attributes (NoA): the formula eLOC = 3.3 * NoA + 5.7 * NoM computes eLOC with a reasonably small absolute error (23%).
We observed that this correlation does not hold for the considered set of open source programs. More precisely, OS programs tend to be bigger (+75% on average) than students' programs having a similar number of classes. This seems reasonable, since OS programs implement non-functional requirements that the students' work did not address: for instance, the students' programs were not required to be particularly robust, reliable, secure, etc. In practice, a good correlation between LOCs and the basic OO metrics holds for the students' programs, but it cannot be used as a predictive model for OS programs.
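To make the regression concrete, the following small sketch (the class and method names are ours, for illustration only) applies the fitted formula to one of the students' programs from Table 1.

```java
/** Illustrative use of the regression fitted on the students' programs:
 *  predicted eLOC = 3.3 * NoA + 5.7 * NoM. */
public class ELocRegression {

    static double predictedEloc(int attributes, int methods) {
        return 3.3 * attributes + 5.7 * methods;
    }

    public static void main(String[] args) {
        // MDP2 (a students' program, from Table 1): NoA = 125, NoM = 196, eLOC = 1441.
        double predicted = predictedEloc(125, 196);
        double relativeError = Math.abs(predicted - 1441) / 1441;
        System.out.printf("predicted=%.0f actual=1441 error=%.0f%%%n",
                          predicted, relativeError * 100);
        // 3.3*125 + 5.7*196 = 412.5 + 1117.2 = 1529.7, i.e. about a 6% error for this
        // program; the text above reports a 23% absolute error for the fit overall.
    }
}
```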

2.2 Results concerning object-oriented FP-based metrics
The statistical analysis performed in [7] showed a strong linear correlation between basic metrics (i.e., number of attributes, methods, etc.) and both OOFPs and CPs. More precisely, both FP-based metrics appeared to be linearly proportional to the number of methods (NoM).
In order to explore this issue for OS software, we employed the model derived from the analysis of the students' programs to predict the FP-like measures of the open source programs. The result was that both the OOFPs and the CPs of OS software can be predicted quite well, with a reasonably small average absolute error. This result is particularly relevant, since it raises the suspicion that ambitious metrics that are relatively complex to compute, such as OOFPs and CPs, are actually equivalent to much simpler and more straightforward metrics like the number of methods.
Consistently with the results reported above, a very strong linear correlation between OOFPs and CPs was found. In fact, it was quite easy to observe, even without the help of any sophisticated statistical tool, that OOFPs are generally between 6 and 7 times the CPs (with one noticeable exception, llamachat, probably due to an unusually good organization of the program).
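The 6-to-7 ratio can be checked directly against the Table 1 data; the following small sketch (our own, for illustration only) computes the ratio for three of the measured applications.

```java
/** Quick check of the observed OOFP/CP ratio on three Table 1 entries. */
public class OofpCpRatio {
    public static void main(String[] args) {
        String[] apps = { "chateverywhere_java", "freecs", "llamachat" };
        int[]    oofp = { 877, 3790, 582 };
        int[]    cp   = { 122,  613, 149 };
        for (int i = 0; i < apps.length; i++) {
            System.out.printf("%-20s OOFP/CP = %.1f%n", apps[i], (double) oofp[i] / cp[i]);
        }
        // chateverywhere_java: 7.2, freecs: 6.2, llamachat: 3.9 (the exception noted above).
    }
}
```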

2.3 LOCs and defect density
The work reported in [12] evaluated the possibility of predicting the defect density of a very large industrial software system on the basis of size expressed in LOCs, or on the basis of object-oriented metrics (namely a subset of the Chidamber and Kemerer suite [5]). The results of the analysis confirmed that the intuitive idea "the bigger, the more faulty" is generally valid, even though the number of faults grows only linearly (i.e., the number of faults found is roughly proportional to the number of LOCs considered). On the one hand, using LOCs as a fault predictor allows one to stay on the safe side, avoiding the risk of focusing first on large amounts of non-faulty modules. On the other hand, finding a way to predict fault-proneness accurately would allow verification effort to be optimized: in the case considered in [12], 30% of the system was responsible for over 80% of the faults. However, the experimental results showed that, in the considered case, the OO metrics we used are no better predictors than LOCs: their ability to predict fault density was quite close to that of LOCs. This negative result should not be considered definitive, since the low "quality" of the OO metrics may actually be due to a less than ideal organization of the programs: object-oriented programming had been introduced in the organization only recently, and had not yet been mastered by several programmers.

2.4 Conclusions on metrics evaluations
In conclusion, it is quite hard to estimate LOCs on the basis of the OO metrics of models or, more generally, to establish a correspondence between the number of LOCs and the numbers that quantitatively represent the object-oriented characteristics of a system (NoM, NoA, etc.). This difficulty was experienced with code; therefore, we expect it will prove even harder to establish a correlation between LOCs and the basic OO metrics of the model of the system. An explanation for this phenomenon, still to be proven, is that the number of lines of code depends on the number of classes, attributes, and methods, but also on other factors. The numbers of classes, attributes, and methods tend to provide a good indication of the functional size, as they describe the amount of data to be managed and the operations to be realized; note that the correlation between basic OO metrics and FP-like metrics supports this statement. However, the complexity of the operations and the non-functional requirements of the system clearly affect the way the code is developed, and can therefore contribute to determining the size of the code. In any case, it is not at all clear to what extent LOCs are meaningful in an OO context: research is still ongoing, but previous results do not support much optimism [13]. In conclusion, it is not yet clear whether research aimed at estimating LOCs on the basis of model size metrics can be successful or useful.

Let us now consider the metrics that have been proposed to measure the functional size of software by suitably adapting the principles of Function Point Analysis [1]. Note that the considered FP-like metrics can be applied not only to code, but also to models (namely, to UML diagrams), so that measurement can be performed when the code does not yet exist, according to the principles of FPA. Our results show that, for code, FP-like metrics seem to be equivalent to far less sophisticated OO metrics. Since the elements that determine the value of OO FP-like metrics are the same in code and in models, we expect that applying OO FP-like metrics to models would also yield hardly any better indicators than those provided by the basic metrics (number of attributes, number of methods, etc.).
Although sufficient for some initial evaluation, the work we did does not support general conclusions. We need to analyze more programs, and programs of a different nature, in order to confirm our results.

3. MODEL MEASUREMENT: HOW?
The results illustrated in Section 2 indicate that, in the object-oriented domain, it is often difficult to get practically useful contributions from predefined metrics. More precisely, a set of metrics selected for a given purpose often fails to achieve the goal. On the other hand, the large variety of object-oriented metric definitions proposed in the literature [14][15][16][17] demonstrates that it is quite easy to devise new metrics for the most diverse purposes, metrics which seem reasonable from a conceptual point of view but whose effectiveness in practice is rather hard to prove.

3.1 A two-step approach to measurement
In our opinion, a way to overcome this impasse is to perform wide-range, general-purpose measurements, without committing too early to a specific set of metric definitions. Put roughly, the idea is to measure as much as you can, thus maximizing the possibility that some of the collected data actually prove useful for the given purpose. Although this process could seem unfeasible in general (you have to collect as many different metrics as possible!), when applied to models the approach is made viable by the availability of a finite metamodel. Consider for instance UML: all the features of a UML artifact that can be identified and measured (e.g., the number of attributes, the number of methods, the position of a class in a generalization hierarchy, etc.) are specified in the definition of the UML metamodel [18]. So, the metamodel defines the domain of the elements that can be measured, while the actual metrics can be defined in various ways, by combining different functions involving the elements of the model. In fact, metric definitions range from simple counts of metamodel elements (e.g., the number of classes) to relatively complex algorithms (see for instance the computation of Class Points [6]). This observation suggests that measurement be performed in two steps:

1. The model is analyzed, and the elements of the model, as defined in the metamodel, are identified, extracted, and stored in a database. The elements are classified according to the metamodel; in other words, the conceptual model of the database is directly inspired by the metamodel.

2. Measures are computed by querying the database.

Step 1 guarantees that all the relevant information is captured. It also allows the user to store the information in a persistent and safe repository, from which measures can be retrieved by exploiting powerful query engines. Step 2 guarantees maximum flexibility. A problem with traditional metrics is that, for every interesting metric definition and for every language, a specific tool is needed. When new metrics are defined or conceived by the final user, or when the same metrics are to be collected from a different language, the corresponding tool is often not available, making the measurement process long and expensive. On the contrary, once the model representation is stored in a database, the user just has to execute the proper query in order to retrieve the desired measure. Writing queries is easier than writing measurement tools; moreover, queries can be stored and reused very easily. Easing the availability of measures makes it possible to perform several measurements according to different criteria, and then to verify, via statistical analysis, whether any of the collected metrics can be used as a reliable indicator of the model features of interest (size, complexity, design quality, etc.). The proposed approach to measurement needs to be effectively supported by tools; the next subsection describes a tool implementing the proposed process.
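As an illustration of step 2, the sketch below computes a few basic metrics as queries over an intermediate XML representation using the standard Java XPath API. The element names (class, method, attribute) and the file name are assumptions made here for illustration; MACXIM actually stores the representation in an eXist database and expresses metrics in XQuery.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import java.io.File;

/** Illustrative "step 2": metrics computed as queries over a stored model representation.
 *  The element names (class/method/attribute) are hypothetical, not MACXIM's schema. */
public class QueryBasedMetrics {
    public static void main(String[] args) throws Exception {
        Document model = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("intermediate-representation.xml"));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Each metric is just a query; no dedicated measurement tool is needed.
        double classes = (Double) xpath.evaluate("count(//class)", model, XPathConstants.NUMBER);
        double methods = (Double) xpath.evaluate("count(//class/method)", model, XPathConstants.NUMBER);
        double attrs   = (Double) xpath.evaluate("count(//class/attribute)", model, XPathConstants.NUMBER);

        System.out.printf("NoC=%.0f NoM=%.0f NoA=%.0f avg methods/class=%.1f%n",
                          classes, methods, attrs, classes == 0 ? 0 : methods / classes);
    }
}
```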

Models & code (intermediate representation)

UML models & Java code Developer

MACXIM translator

Metrics definitions (XQuery) Measurer

Export directive

Results of Measurement (XML)

Results of Measurement (exported format) Query engine

eXist XML DB

Query engine

Figure 1. The measurement process supported by MACXIM.

3.2 Automatic Measurement
MACXIM (Model And Code XML-based Integrated Meter) is a tool that was developed according to the two-step approach described above [10].

The MACXIM tool parses XMI files exported by UML editing tools, or XML files derived from Java source code, and populates an XML database with the relevant information (e.g., classes, methods, attributes, associations, etc.). The actual measurement is then carried out simply by querying the database (see Figure 1). The XML representation of Java source code is obtained through JavaML (http://www.badros.com/greg/JavaML/); JavaML converts Java source files into XML files through the open source Jikes Java compiler (previously owned by IBM), and can also convert XML files back to Java source code through an XSLT style sheet.
We come now to a very important question: what information should the intermediate representation contain? On the one hand, in order to save space, it would be advisable to compute most of the measures in advance, thus storing only the results of these "preliminary" measurements. On the other hand, the need to support multiple metrics (possibly involving different languages) suggests that the intermediate representation should include as much as possible of the detailed information available in the source code or model. We adopted the following trade-off:

• The coarse-grained information concerning the structure of the system and its main elements (such as classes, functions, types, etc.) is reported in detail in the intermediate representation: for instance, for every Java class we keep complete information about attributes and attribute types, methods and method signatures, inheritance relations, etc.

• The information concerning the source code of finer-grained elements (such as method implementations) is summarized by computing various Source Lines of Code (SLOC) measures (total LOCs, effective LOCs, comment LOCs, etc.), the cyclomatic complexity [11], and the number and types of dependencies of a method (a sketch of such a per-method summary is given after this list).
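As an illustration of the second point, a per-method entry of the intermediate representation might be summarized along the following lines; this is a minimal sketch, and the field names are ours, not MACXIM's actual schema.

```java
/** Hypothetical per-method summary kept in the intermediate representation:
 *  full method bodies are not stored, only their measures. */
public record MethodSummary(
        String qualifiedName,       // e.g. "chat.Server.broadcast(String)"
        int totalLoc,               // all source lines of the method body
        int effectiveLoc,           // semantically meaningful lines only
        int commentLoc,             // comment lines
        int cyclomaticComplexity,   // McCabe cyclomatic complexity
        int dependencies            // number of classes the method depends on
) {}
```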

The tool supports two ways of defining and executing metrics: a metric can be defined in XQuery (the query specifies which XML elements of the intermediate representation should be used and what computation should be performed), or it can be defined (and executed) by providing a specific Java measurement program. In general, the execution of a query can both compute the specified metrics and store the results in the XML database itself. Even though the tool is conceived to let users define their own metrics, the definitions of the most common metrics have already been written and tested, and are readily available to users. We defined various kinds of metrics at different granularities. At the project level, metrics involve different types of artefacts and generally several instances of the same artefact type (e.g., several Java files). Other metrics are defined at the package, class, and method level. In particular, it is possible to derive different kinds of measures from UML models simply by writing the proper queries; the tool is equipped with queries that compute basic OO metrics, Chidamber and Kemerer metrics, and FP-like object-oriented metrics. Measures can themselves be elaborated via queries, for instance to compare the measures of a model with the measures of the corresponding implementation.

4. CONCLUSIONS
This paper presents our view of how model measurement should be approached by developers. As long as metric definitions are not validated and a corpus of knowledge concerning OO model measurement is not available, the best option is, in our opinion, to pursue maximum flexibility. In practice, this means collecting a large number of metrics, and then discarding the ones that do not contribute to the informative goals. This approach, although at first sight it appears to waste time and effort, is made practically feasible by tool support.
It must be observed that here we reported exclusively our own point of view on model size metrics. We did not take into consideration other researchers' findings, leaving the discussion about the integration of available results as a workshop activity.

5. REFERENCES
[1] A.J. Albrecht, "Measuring Application Development Productivity", Proc. Joint SHARE/GUIDE/IBM Application Development Symp., pp. 83-92, 1979.
[2] G. Antoniol, C. Lokan, G. Caldiera, R. Fiutem, "A Function Point-Like Measure for Object-Oriented Software", Empirical Software Engineering, 4(3): 263-287, September 1999.
[3] G. Antoniol, C. Lokan, R. Fiutem, "Object-Oriented Function Points: an Empirical Validation", Empirical Software Engineering, 8(3): 225-254, September 2003.
[4] L. Briand, E. Arisholm, S. Counsell, F. Houdek, and P. Thévenod-Fosse, "Empirical Studies of Object-Oriented Artifacts, Methods, and Processes: State of the Art and Future Directions", J. Empirical Software Eng., vol. 4, pp. 387-404, September 1999.
[5] S.R. Chidamber and C.F. Kemerer, "A Metrics Suite for Object Oriented Design", IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, June 1994.
[6] G. Costagliola, F. Ferrucci, G. Tortora, and G. Vitiello, "Class Point: An Approach for the Size Estimation of Object-Oriented Systems", IEEE Transactions on Software Engineering, vol. 31, no. 1, pp. 52-74, January 2005.
[7] V. Del Bianco and L. Lavazza, "An Empirical Assessment of Function Point-Like Object-Oriented Metrics", METRICS 2005, 11th International Software Metrics Symposium, Como, Italy, 19-22 September 2005.
[8] H. Kim and C. Boldyreff, "Developing Software Metrics Applicable to UML Models", 6th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering, June 11, 2002.
[9] B.A. Kitchenham, N. Fenton, and S.L. Pfleeger, "Towards a Framework for Software Measurement Validation", IEEE Trans. Software Eng., vol. 21, no. 12, pp. 929-944, December 1995.
[10] A.F. Crisà, V. Del Bianco, L. Lavazza, "A tool for the measurement, storage, and pre-elaboration of data supporting the release of public datasets", Workshop on Public Data about Software Development (WoPDaSD 2006), June 10, 2006, Como, Italy.
[11] V. Del Bianco, L. Lavazza, "An Assessment of Function Point-Like Metrics for Object-Oriented Open-Source Software", International Conference on Software Process and Product Measurement (MENSURA 2006), November 2006, Cádiz, Spain.
[12] G. Denaro, L. Lavazza, M. Pezzè, "An Empirical Evaluation of Object Oriented Metrics in Industrial Setting", The 5th CaberNet Plenary Workshop, November 2003, Porto Santo, Madeira Archipelago, Portugal.
[13] P.G. Armour, "Beware of Counting LOC", Communications of the ACM, 47(3), pp. 21-24, March 2004.
[14] M. Lorenz, J. Kidd, "Object-Oriented Software Metrics", Prentice Hall, 1994.
[15] M. Bruntink, "Predicting Class Testability using Object-Oriented Metrics", 4th IEEE International Workshop on Source Code Analysis and Manipulation (SCAM'04), 2004.
[16] R. Harrison, S.J. Counsell, R. Nithi, "An Evaluation of the MOOD Set of Object-Oriented Software Metrics", IEEE Transactions on Software Engineering, 24(6), pp. 491-496, 1998.
[17] J. Hogan, "An Analysis of OO Software Metrics", Technical report, University of Warwick, May 1997.
[18] "Unified Modeling Language: Infrastructure", OMG, formal/05-07-05, March 2006.