The role of inheritance in the maintainability of object-oriented systems
Rachel Harrison and Steve Counsell
Abstract
In this paper, we describe the empirical evaluation of the level of inheritance in five object-oriented systems. The systems studied vary in both size and application domain. Results from our analysis, together with other recent investigations, seem to support the thesis that inheritance is used either sparingly or incorrectly. Statistical correlations between four inheritance metrics and a set of dependent variables (non-comment source lines, software understandability, known errors and error density) provide evidence for this claim. It is also not clear that systems using inheritance will necessarily be more maintainable than those that do not. The data analysed from two of our systems suggests that deeper inheritance trees are attributes of systems which are harder to understand and (by implication) harder to maintain. We analyse why this might be the case, and propose ways of remedying this situation.
1. Introduction
A key feature of the object-oriented (OO) paradigm is inheritance [21]. Use of inheritance is claimed to reduce the amount of software maintenance necessary, ease the burden of testing [1], and produce more reliable, high-quality software [2]. In this paper, we investigate five OO systems, all written in C++, and empirically evaluate them using a subset of Chidamber and Kemerer's (C&K) OO design metrics [11], together with two further inheritance-based design metrics developed as part of the MOOPS project [17][18]. The results of our analysis (and those of other investigations) seem to support the thesis that inheritance is either used sparingly during the development process, or used incorrectly [5][9]. Inheritance does not seem to be delivering the benefits it promised, and it is not clear that systems using inheritance are easier to maintain than those that do not. An analysis of the faults found in three of the five systems showed very little relationship with any of the inheritance-based metrics. The inheritance-based metrics also show a lack of relationship with three of the dependent variables collected (the number of non-comment source lines, the number of known errors and error density). However, an interesting relationship was found between the depth of the inheritance tree and software understandability.
In the next section, we describe related work in this area. In Section 3, we describe the five systems studied, their application domains and the metrics collected for each. Section 4 presents summary data for each of the five systems. Correlations and data analyses of each system are presented in Section 5, incorporating analyses of fault data for three of the systems. The implications of our results are then described (Section 6). Finally, some conclusions and suggestions for further work are presented (Section 7).
2. Related work
Various empirical investigations have been made into the use of inheritance. For example, the seminal paper by Chidamber and Kemerer which describes their metrics [11] details empirical analyses of systems at two sites, one of which used C++ and the other Smalltalk. Whilst the authors conclude that their metrics do offer insights into whether developers are following object-oriented principles, they also note that the extent of inheritance at both sites was small (with median Depth of Inheritance Tree (DIT) values of 1 and 3 for the C++ and Smalltalk sites respectively). The explanation given is that designers favoured comprehensibility and simplicity over reuse. This would seem to imply a view that inheritance hinders the maintenance process. In Chidamber, Darcy and Kemerer [10], three commercial object-oriented systems are empirically investigated; again, none showed significant use of inheritance. Bieman and Zhao [5] describe a study of 19 C++ systems, containing 2,744 classes in total. They found that only 37% of these systems had a median class inheritance depth greater than 1; the GUI applications studied showed the highest mean inheritance depth (3.46). This was significantly smaller than the maximum inheritance depth of 6 plus or minus 2 suggested by Booch [7]. Cartwright and Shepperd [9] describe the collection of a subset of metrics from a large telecommunications system (133,000 lines of C++). Their main finding was a positive correlation between the DIT metric of Chidamber and Kemerer [11] and the number of user-reported problems, casting doubt on the effective use of inheritance. They also report relatively little use of inheritance in the system they analysed, and found that the classes with the highest change densities were low down in the inheritance hierarchy (i.e., away from its root). In Basili, Briand and Melo [3], the results of an empirical study of the Chidamber and Kemerer (C&K) metrics are presented. The metrics are used as predictors of fault-prone classes, using data collected from eight medium-sized management systems developed in C++.
One experimental hypothesis was that a class located deep in the inheritance hierarchy would be more fault-prone than a class higher up in the hierarchy; this hypothesis was supported with statistical significance. This clearly implies that, far from aiding maintenance, use of inheritance had the opposite effect. Lastly, Daly et al. [14] describe an experiment in which subjects were timed performing maintenance tasks on OO systems with varying levels of inheritance. Systems with 3 levels of inheritance were shown to be easier to modify than systems with no inheritance. Systems with 5 levels of inheritance, however, took longer to modify than the systems without inheritance. This would seem to imply that a certain level of inheritance is useful, but that there is an optimum level beyond which maintainability becomes problematic. However, Harrison et al. [15] replicated the experiment and found that flat systems (containing no inheritance) were easier to modify than systems containing three or five levels of inheritance, although the results also indicated that larger systems were equally difficult to understand whether or not they contained inheritance.
3. Empirical analysis
Empirical analysis can be used to investigate the association between proposed software metrics and other indicators of software quality such as maintainability; thorough qualitative or quantitative analyses can be used to support these investigations [20].
3.1. Systems investigated
Our empirical analysis covered five projects:
• System 1, LEDA (Library of Efficient Data types and Algorithms), written in C++ and consisting of about 123 KLOC (197 classes), designed and developed at the Max Planck Institute in Saarbruecken, Germany.
• System 2, the Gnu C++ Class Library, consisting of 53.5 KLOC (96 classes).
• System 3, SEG1, consisting of eleven medium-sized traffic simulation systems written in C++ by undergraduate computer science students. The average size of a system was 2.5 KNCSL and 16 classes.
• System 4, SEG2, consisting of twelve medium-sized traffic simulation systems written in C++ by undergraduate computer science students. The average size of a system was 3.5 KNCSL and 22 classes.
• System 5, SEG3, consisting of twelve medium-sized traffic simulation systems written in C++ by undergraduate computer science students. The average size of a system was 4.5 KNCSL and 27 classes.
3.2. System architectures
The Gnu (System 2) inheritance hierarchies tended to be fairly shallow (with a mean depth of inheritance tree of 0.93). The LEDA (System 1) inheritance structure, on the other hand, contained a large number of small inheritance trees (the mean depth of inheritance tree was 0.63). Consequently, we would expect more coupling due to inheritance to be found in Gnu (System 2). In both LEDA and Gnu, extensive use was made of friend functions, thus avoiding inheritance by means of an invisible form of coupling. Compared with the Gnu and LEDA systems, SEG1 (System 3), SEG2 (System 4) and SEG3 (System 5) contained very little inheritance; their inheritance hierarchies tended to be shallow. None of SEG1, SEG2 or SEG3 used friend functions to the extent of LEDA and Gnu. Interestingly, in all systems there was a high proportion of singleton classes (i.e., classes without any inter-class coupling). In both the LEDA and Gnu systems, evidence was found of several base classes containing a large number of methods, from which a large number of classes inherited. All systems investigated showed some evidence of non-inheritance-based coupling.
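The contrast described above between a few inheritance trees and many small ones can be quantified by treating every base/derived link as an undirected edge and finding the connected components of the resulting graph. The following Python sketch assumes the hierarchy has already been extracted into a hypothetical `parents` mapping (class name to list of direct base classes); the class names are illustrative only. With multiple inheritance a component may be a directed acyclic graph rather than a strict tree. A class in a component of size one lies outside every hierarchy, which is weaker than the paper's notion of a singleton class, since that also excludes non-inheritance coupling.

```python
from __future__ import annotations
from collections import deque


def inheritance_forest(parents: dict[str, list[str]]) -> list[set[str]]:
    """Split classes into inheritance trees (connected components of the base/derived graph)."""
    # Undirected adjacency: each base/derived link connects both classes.
    adj: dict[str, set[str]] = {}
    for cls, bases in parents.items():
        adj.setdefault(cls, set())
        for base in bases:
            adj[cls].add(base)
            adj.setdefault(base, set()).add(cls)

    seen: set[str] = set()
    components: list[set[str]] = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every class reachable from start.
        component: set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical hierarchy: two small trees and one class outside any hierarchy.
parents = {
    "Shape": [], "Circle": ["Shape"], "Square": ["Shape"],
    "List": [], "SortedList": ["List"],
    "Logger": [],
}
forest = inheritance_forest(parents)
tree_sizes = sorted((len(c) for c in forest if len(c) > 1), reverse=True)
isolated = [next(iter(c)) for c in forest if len(c) == 1]
print(tree_sizes, isolated)  # [3, 2] ['Logger']
```

Comparing the number and sizes of components across systems gives one simple way to tell a single large hierarchy from a forest of small, disjoint subtrees.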
3.3. Data collected
For each system, the C&K inheritance metrics collected for each class were:
• Depth of the Inheritance Tree (DIT). DIT was defined as the depth of inheritance of a class, i.e., its location in the inheritance hierarchy.
• Number Of Children (NOC). NOC was defined to be the number of immediate subclasses subordinate to a class in the class hierarchy.
In addition, two further OO design metrics, developed using the Goal Question Metric (GQM) method [4], were also collected. These metrics are collectable at design time from the relevant object model:
• The Number of Methods Inherited by a class (NMI): the total number of methods which can potentially be inherited by a class from all its superclasses.
• The Number of Methods Overridden per class (NMO): the number of virtual base class methods overridden by a class method.
Furthermore, for each class in a system, we collected the following metrics:
• NCSL: the number of non-comment, non-blank source lines.
• SU: Software Understanding [6], which ranks software according to structure, application clarity and self-descriptiveness. Software understanding is rated on an ordinal scale of 1 to 5 (where 1 represents the simplest and easiest to understand class, and 5 the most complex and difficult to understand). A program with strong modularity, well documented and with a clear match between program and application world-views would be considered easily understandable and given a rating of 1. A program with low cohesion, no match between program and application world-views and containing obscure code would be rated at the other end of the scale (5). Although subjective in nature, Boehm's SU metric represents an easily collectible, consistent and useful reflection of class complexity.
The data was collected as follows. NCSL was measured using an automated software tool. For all systems, the subjective complexity rating was provided by two data collectors, neither of whom had significant involvement in the development of any of the systems, but both of whom had a background in software engineering. Limited system and user documentation was available for the three SEG systems only. Finally, the software engineering practices used to develop SEG1, SEG2 and SEG3 were well known to both collectors.
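The four inheritance metrics defined above can be computed from a simple object model. The Python sketch below is illustrative only: the `ClassInfo` record and the example classes are hypothetical. It takes the DIT of a root class as 0 (consistent with the minimum values in Appendix A) and, under multiple inheritance, as the longest path to a root, as C&K specify. The paper does not spell out the remaining choices, so they are assumptions: NMI counts distinct method names declared in any ancestor, regardless of access specifier, and NMO counts the methods of a class that match a method declared virtual in some ancestor.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    bases: list[str] = field(default_factory=list)   # direct base classes
    methods: set[str] = field(default_factory=set)   # methods declared in this class
    virtuals: set[str] = field(default_factory=set)  # subset of methods declared virtual


def ancestors(model: dict[str, ClassInfo], name: str) -> set[str]:
    """All direct and indirect superclasses of a class."""
    found: set[str] = set()
    stack = list(model[name].bases)
    while stack:
        base = stack.pop()
        if base not in found:
            found.add(base)
            stack.extend(model[base].bases)
    return found


def dit(model: dict[str, ClassInfo], name: str) -> int:
    """Depth of Inheritance Tree: 0 for a root, otherwise the longest path to a root."""
    bases = model[name].bases
    return 0 if not bases else 1 + max(dit(model, b) for b in bases)


def noc(model: dict[str, ClassInfo], name: str) -> int:
    """Number Of Children: classes that list `name` as a direct base."""
    return sum(1 for info in model.values() if name in info.bases)


def nmi(model: dict[str, ClassInfo], name: str) -> int:
    """Number of Methods Inherited: distinct method names declared in any ancestor."""
    inherited: set[str] = set()
    for anc in ancestors(model, name):
        inherited |= model[anc].methods
    return len(inherited)


def nmo(model: dict[str, ClassInfo], name: str) -> int:
    """Number of Methods Overridden: own methods matching a virtual method of an ancestor."""
    # In C++ a method declared virtual stays virtual in every subclass,
    # so collecting explicit virtual declarations from all ancestors suffices.
    virtual_in_ancestors: set[str] = set()
    for anc in ancestors(model, name):
        virtual_in_ancestors |= model[anc].virtuals
    return len(model[name].methods & virtual_in_ancestors)


# Hypothetical three-level hierarchy.
model = {
    "Shape": ClassInfo(methods={"area", "draw", "name"}, virtuals={"area", "draw"}),
    "Circle": ClassInfo(bases=["Shape"], methods={"area", "radius"}),
    "Disc": ClassInfo(bases=["Circle"], methods={"draw"}),
}
for cls in model:
    print(cls, dit(model, cls), noc(model, cls), nmi(model, cls), nmo(model, cls))
# Shape 0 1 0 0
# Circle 1 1 3 1
# Disc 2 0 4 1
```

The example shows that NMI grows with the number of methods in the ancestors, not only with depth. This is why a hierarchy with a low DIT but a few method-heavy base classes can still yield large NMI values.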
3.4. Fault data
For Systems 3 (SEG1), 4 (SEG2) and 5 (SEG3), data relating to faults were also collected, as defined below:
• KE: the number of known errors (per class) found during testing.
• KE/KNCSL: the density of errors per KNCSL (per class).
Each of the three systems was tested using a large set of pre-planned tests. A single tester (one of the data collectors) was allocated to each of SEG1, SEG2 and SEG3. Whenever an error was discovered in any of the systems, a Fault Report Form was completed, indicating the exact nature of the error, the classes affected, a severity code reflecting how seriously the error affected the system's continued operation, and any other comments useful for fixing it. The error was then repaired by the tester, and the set of tests repeated. This gave a good picture of the distribution of errors across the classes in each of the systems, and of the overall density per KNCSL. In the next section, we describe the summary data for each of the systems studied.
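The two fault measures can be derived from the Fault Report Forms described above. The Python sketch below assumes a hypothetical representation in which each report is reduced to the set of classes it names, and that a fault affecting several classes is counted once against each of them. The paper does not say how such faults were apportioned. The class names and sizes are invented for illustration.

```python
from __future__ import annotations
from collections import Counter
from collections.abc import Iterable


def known_errors(reports: Iterable[set[str]]) -> Counter:
    """KE per class: the number of fault reports that name each class."""
    counts: Counter = Counter()
    for affected in reports:
        counts.update(affected)  # one count per class named in this report
    return counts


def error_density(ke: int, ncsl: int) -> float:
    """KE/KNCSL: known errors per thousand non-comment source lines."""
    if ncsl <= 0:
        raise ValueError("NCSL must be positive")
    return ke * 1000 / ncsl


# Hypothetical fault reports and class sizes (NCSL).
reports = [{"Vehicle"}, {"Vehicle", "Road"}, {"Junction"}]
ncsl = {"Vehicle": 400, "Road": 150, "Junction": 250, "Light": 100}
ke = known_errors(reports)
for cls, size in ncsl.items():
    print(cls, ke[cls], round(error_density(ke[cls], size), 2))
# Vehicle 2 5.0
# Road 1 6.67
# Junction 1 4.0
# Light 0 0.0
```

Because density divides by class size, a small class with a single fault (Road here) can have a higher KE/KNCSL than a larger class with more faults. This is one reason KE and KE/KNCSL can correlate differently with the inheritance metrics.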
4. Summary data
The summary data for NCSL, total classes and methods are shown in Table 1. The summary data for the C&K metrics (DIT and NOC) and the two additional design metrics (NMI and NMO) are shown for all five systems in Appendix A. For System 1 (LEDA), the median values of the four metrics collected are zero in three cases and one in the other case. There is, however, evidence of large numbers of inherited methods. For System 2 (Gnu), the median values of the metrics are again small, although there is some evidence of large numbers of inherited methods. Systems 3, 4 and 5 (SEG1, SEG2 and SEG3) show very little evidence of any inheritance, as indicated by their median values.

Table 1: Summary metrics for Systems 1, 2, 3, 4 and 5
| | System 1 (LEDA) | System 2 (Gnu) | System 3 (SEG1) | System 4 (SEG2) | System 5 (SEG3) |
|---|---|---|---|---|---|
| NCSL | 123000 | 53000 | 27500 | 42000 | 54000 |
| Classes | 197 | 96 | 113 | 172 | 317 |
| Methods | 4751 | 1366 | 1082 | 919 | 682 |

The evidence suggests that DIT is low in many cases, but that a large number of methods are inherited; this is particularly pronounced in the two larger systems (Systems 1 and 2). The large number of inherited methods in Systems 1 and 2 can be explained by the intrinsic functionality of classes in each system. In earlier work [16], a commercial retail system written in C++ and the Microsoft Foundation Classes were investigated in a similar way; again, large numbers of inherited methods were found in inheritance hierarchies with a DIT of four or less.
5. Data analysis
Correlations were computed for each of the systems against SU and NCSL. The SU and NCSL correlations for System 1 (LEDA) are shown in Table 2; only relationships that were identified are entered in the table. The only notable feature is the significant positive relationship between DIT and SU. This indicates that as the depth of the inheritance tree increases, the SU score increases, reflecting a decrease in understandability. This would seem to support the claim that increasing complexity at deeper levels of the inheritance hierarchy causes more faults to be introduced at those levels.

Table 2: SU and NCSL correlations for System 1 (LEDA)
| Metric | SU | NCSL |
|---|---|---|
| DIT | +ive# | |
| NOC | | |
| NMI | | |
| NMO | | |
+ive: positive relationship; # significant at the 5% level

The SU and NCSL correlations for System 2 (Gnu) are shown in Table 3. In common with LEDA, Table 3 shows a significant positive relationship between DIT and SU; this supports the view that classes at the bottom of an inheritance hierarchy may be more difficult to understand and maintain than those higher up.

Table 3: SU and NCSL correlations for System 2 (Gnu)
| Metric | SU | NCSL |
|---|---|---|
| DIT | +ive# | -ive* |
| NOC | | |
| NMI | | -ive* |
| NMO | | -ive# |
+ive: positive relationship; -ive: negative relationship; * significant at the 1% level; # significant at the 5% level

Table 3 also shows a significant negative relationship between DIT and NCSL, indicating that, for this system, classes at lower levels of the inheritance hierarchy are smaller. Taken together, the SU and NCSL results suggest that although classes at lower levels of the hierarchy are smaller, they are not easier to understand. One explanation could be that a developer maintaining a class at a lower level of the inheritance hierarchy focuses more on the classes above it than on the class itself. The significant negative relationship between NMI and NCSL indicates that, in this system, the size of a class tended to decrease as the number of methods it inherited increased. This would seem to be intuitively correct, and the same applies to the significant negative relationship between NMO and NCSL.
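The paper does not state which correlation coefficient was used. Because SU is an ordinal 1 to 5 rating and DIT takes only a few integer values, a rank correlation such as Spearman's rho is a natural choice, and the Python sketch below assumes it. Tied values receive the average of the ranks they span; ties are very common here, since most classes have a DIT of 0. Rho is then the Pearson correlation of the two rank vectors. The input values are hypothetical, not data from the paper.

```python
from __future__ import annotations
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # one variable has no variation, so rho is undefined
    return cov / (sx * sy)


# Hypothetical per-class values, not data from the paper.
dit_values = [0, 0, 0, 1, 1, 2, 3]
su_values = [1, 2, 2, 2, 3, 4, 4]
print(round(spearman_rho(dit_values, su_values), 3))  # 0.882
```

In practice, `scipy.stats.spearmanr` computes the same tie-corrected coefficient and also returns a p-value, which would be needed to reproduce the 1% and 5% significance markers used in the tables.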
The class correlations for NCSL and SU for Systems 3, 4 and 5 (SEG1, SEG2 and SEG3) are given in Table 4. Interestingly, two of the SEG systems (SEG1 and SEG2) show a significant negative relationship between SU and DIT: classes at deeper levels of the inheritance hierarchy were found to be easier to understand than those higher up. One possibility is that classes at DIT 1 tended to inherit most of their functionality from a few key classes and were consequently trivial to understand. Further experiments are required before these results can be generalised, however, particularly in view of the large number of singleton classes in the SEG systems.
Table 4: SU and NCSL correlations for the three SEG systems investigated
System SEG1
Dep. Var. SU NCSL
DIT -ive#
NOC
SEG2
SU -ive* NCSL SEG3 SU NCSL +ive: positive relationship; -ive: negative relationship * significant at the 1% level; # significant at the 5% level
NMI
NMO
+ive* +ive*
5.1. Fault analysis
As well as the above correlations, an analysis was carried out on the faults found during testing of Systems 3, 4 and 5 (SEG1, SEG2 and SEG3). The results of this analysis are shown in Table 5. The only noticeable feature of Table 5 is the significant negative relationship between DIT and the number of known errors (KE), suggesting that as DIT increases, the number of faults found decreases; this could be explained by class specialisation at deeper levels of the inheritance hierarchy making classes at those levels less prone to errors. This would seem to contradict the view that more faults are found at deeper levels of the inheritance hierarchy. However, the use of key classes providing the required functionality at the root of the inheritance hierarchy (in all three SEG systems) could explain this result: examination of the Fault Report Forms indicated that these key classes were responsible for the majority of the faults found in each of the SEG systems. This also explains the significant results for NMI. No relationships were found for NOC or NMO.
Table 5: KE and KE/KNCSL correlations for the SEG systems investigated
System SEG1
Dep. Var. KE KE/KNCSL
DIT -ive#
NOC
SEG2
KE -ive* KE/KNCSL SEG3 KE KE/KNCSL +ive: positive relationship; -ive: negative relationship * significant at the 1% level; # significant at the 5% level
NMI -ive# +ive*
NMO
6. Implications of these results
In OO systems, inheritance can be considered a form of coupling, in that to understand the functionality of one class, other related classes may have to be understood as well; this adds a level of structural and cognitive complexity to a system. Hence, classes at the bottom of a deep inheritance hierarchy would be more difficult to maintain than those higher up [13]. From a maintenance point of view, this would seem to favour either zero or relatively low amounts of inheritance in systems, or inheritance in the form of a number of small, disjoint subtrees. We find support for this in [3], which hypothesises that a system with a large, single inheritance structure is more fault-prone than a system comprising many independent subtrees. The shape of a forest of inheritance trees can also affect the amount of code reuse in an object-oriented system [5]. Our results from Systems 1 and 2 show that understandability (and, by implication, maintainability) decreases as DIT increases. A possible reason for the confusion over the use of inheritance is that inheritance itself is not an easy concept to learn and use well; the lack of use of inheritance may result from a fear of the effects of using it. For example, in C++, the distinction between public, private and protected inheritance is subtle; such idiosyncrasies add to the complexity of learning to use inheritance, perhaps discouraging its use during the design and implementation of OO systems. In [13], common abuses of inheritance are described; these can be attributed to a lack of experience on the part of the programmer. Use of C++ friends as a simple alternative to inheritance may also explain the lack of inheritance in C++ systems. Friends can be used to change the functionality of a system after it has been written, without major changes to the structure of the design or the resulting code. This would be indicative of a faulty design and potential maintenance problems [12].
In addition, by violating encapsulation, friends are considered to be bad programming practice [8]. In proposing a set of OO metrics, Lorenz and Kidd [19] suggest an upper limit of zero on friends because of their potential for harming the structure of a system. It could also be argued that inheritance falls into the class of programming language features which are accepted and adopted without thorough analysis, only to be subsequently rejected when experience has shown them to be unhelpful. It would appear that encapsulation is more relevant and applicable than inheritance, and its benefits to the maintenance process are more obvious. In [12], it is suggested that in future systems, architectures based on aggregation will be more appropriate than those based on inheritance. This is particularly true of systems incorporating multiple inheritance, whose structures have tended to be elaborately concocted. Interestingly, the NOC metric showed no significant correlations in any of the tables. One explanation might be that a maintainer or developer rarely views inheritance breadth-wise; DIT provides a depth-wise view which is more relevant from a maintenance point of view, in terms of which classes are affected when a modification is made. Lastly, contrary to what was originally thought, it may be that the functionality of most OO systems simply does not lend itself well to inheritance. A number of lessons have been learnt from the research contained in this paper. Firstly, the nature of the application domain of a system has a strong influence on the resulting architecture of that system. Clear patterns emerge in the shape of inheritance hierarchies and in the functionality offered by classes. This has serious implications for the maintenance process, particularly if such system features can be predicted.
Secondly, the motivation for using inheritance and the factors behind its use are still not well understood. The extensive use of friends in two of the systems (possibly as a means of avoiding the use of inheritance) indicates that friends are more convenient to use than inheritance. More empirical work needs to be undertaken to assess the effects of varying levels of developer experience on the use and understandability of inheritance. Finally, although use of inheritance has been encouraged in the past, this could be considered a short-sighted attitude. It is still not clear exactly what benefits can be gained from using inheritance. Again, more investigations need to be undertaken in this area.
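The argument in this section that inheritance acts as a form of coupling can be made concrete. To understand a class, a maintainer may need to read the class together with all of its ancestors, while a change to a class can affect all of its descendants. The Python sketch below illustrates both sets, assuming a hypothetical `parents` mapping from each class to its direct base classes; the class names are invented for illustration.

```python
from __future__ import annotations


def ancestors(parents: dict[str, list[str]], cls: str) -> set[str]:
    """All direct and indirect base classes of cls (handles multiple inheritance)."""
    found: set[str] = set()
    stack = list(parents.get(cls, []))
    while stack:
        base = stack.pop()
        if base not in found:
            found.add(base)
            stack.extend(parents.get(base, []))
    return found


def understanding_set(parents: dict[str, list[str]], cls: str) -> set[str]:
    """Classes a maintainer may need to read to understand cls: cls and its ancestors."""
    return {cls} | ancestors(parents, cls)


def impact_set(parents: dict[str, list[str]], cls: str) -> set[str]:
    """Classes inheriting from cls directly or indirectly, hence possibly affected by a change to it."""
    return {c for c in parents if cls in ancestors(parents, c)}


# Hypothetical hierarchy.
parents = {
    "Vehicle": [],
    "Car": ["Vehicle"],
    "Taxi": ["Car"],
    "Bus": ["Vehicle"],
}
print(sorted(understanding_set(parents, "Taxi")))  # ['Car', 'Taxi', 'Vehicle']
print(sorted(impact_set(parents, "Vehicle")))      # ['Bus', 'Car', 'Taxi']
```

Under single inheritance the understanding set always has DIT + 1 members. It therefore grows with the depth of a class, not with the NOC of any class in the hierarchy.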
7. Conclusions and further work
There is clearly a need for research to address many urgent issues arising from the use of inheritance and its effect on the process of maintenance. Fundamental issues, such as whether inheritance can make maintenance of OO systems easier, whether there is an optimum level of inheritance, and whether we should focus on alternative features as a means of reducing the maintenance burden, all need to be addressed. Ideally, this empirical research should be carried out on as many real-world systems as possible (with subjects of varying experience), supported by well-designed hypotheses. Industrial-strength tools need to be provided to aid the speedy collection of data and dissemination of results.
9. References
[1] Basili, V.R., “Viewing maintenance as reuse-oriented software development”, IEEE Software, 7(1), pp. 19--25, 1990.
[2] Basili, V.R., Briand, L.C., and Melo, W.L., “How reuse influences productivity in object-oriented systems”, Communications of the ACM, 39(10), pp. 104--116, 1996.
[3] Basili, V.R., Briand, L.C., and Melo, W.L., “A validation of object-oriented design metrics as quality indicators”, IEEE Trans. on Software Engineering, 22(10), pp. 751--761, 1996.
[4] Basili, V.R., and Rombach, H.D., “The TAME project: Towards improvement-oriented software environments”, IEEE Trans. on Software Engineering, 14(6), pp. 758--773, 1988.
[5] Bieman, J.M., and Zhao, J.X., “Reuse through inheritance: A quantitative study of C++ software”, Proceedings of ACM Software Reusability Symposium (SRS 94), 1994.
[6] Boehm, B.W., Clark, B., Horowitz, E., et al., “COCOMO 2.0”, Annals of Software Engineering, 1(1), pp. 1--24, 1995.
[7] Booch, G., “OO Analysis and Design with Applications”, Benjamin-Cummings, 1994.
[8] Briand, L., Devanbu, P., and Melo, W., “An investigation into coupling measures for C++”, Proceedings of the 19th Int. Conf. on Soft. Engineering (ICSE'97), Boston, USA, 1997.
[9] Cartwright, M., and Shepperd, M., “An empirical investigation of an object-oriented software system”, (to appear in IEEE Transactions on Software Engineering).
[10] Chidamber, S.R., Darcy, P.D., and Kemerer, C.F., “Managerial use of object-oriented software metrics”, Working Paper Series No. 750, University of Pittsburgh, 1996.
[11] Chidamber, S.R., and Kemerer, C.F., “A metrics suite for object oriented design”, IEEE Transactions on Software Engineering, 20(6), pp. 476--493, 1994.
[12] Coad, P., and Mayfield, M., “Java-inspired design: Use composition rather than inheritance”, American Programmer, pp. 23--31, Jan. 1997.
[13] Coplien, J.O., “Advanced C++”, Addison-Wesley, 1992.
[14] Daly, J., Brooks, A., Miller, J., Roper, M., and Wood, M., “Evaluating inheritance depth on the maintainability of object-oriented software”, Empirical Software Engineering, 1(2), pp. 109--132, 1996.
[15] Harrison, R., Counsell, S., and Nithi, R., “Experimental assessment of the effect of inheritance on the maintainability of object-oriented systems”, Proceedings of Empirical Assessment in Software Engineering (EASE) '99, Keele, UK, 1999.
[16] Harrison, R., and Counsell, S.J., “An assessment of the impact of inheritance on the maintainability of OO systems”, ICSM 97 Workshop on Empirical Soft. Studies, Bari, Italy.
[17] Harrison, R., Counsell, S., and Nithi, R., “An Evaluation of the MOOD Set of Object-Oriented Software Metrics”, IEEE Trans. on Soft. Engineering, 24(6), pp. 491--496, 1998.
[18] Harrison, R., Counsell, S., and Nithi, R., “An investigation into the applicability and validity of object-oriented design metrics”, International Journal of Empirical Software Engineering, vol. 3, pp. 255--273, 1998.
[19] Lorenz, M., and Kidd, J., “Object-oriented Software Metrics”, Prentice Hall Object-Oriented Series, 1994.
[20] Schneidewind, N.F., “Methodology for validating software metrics”, IEEE Transactions on Software Engineering, 18(5), pp. 410--422, 1992.
[21] Stroustrup, B., “Adding classes to the C language: An exercise in language evolution”, Software -- Practice and Experience, vol. 13, pp. 139--161, 1983.
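Appendix A: Summary data for the five systems
| System | Metric | Average | Min | Max | Median |
|---|---|---|---|---|---|
| System 1 (LEDA) | DIT | 0.63 | 0 | 4 | 0 |
| System 1 (LEDA) | NOC | 0.2 | 0 | 4 | 0 |
| System 1 (LEDA) | NMI | 21.71 | 0 | 307 | 1 |
| System 1 (LEDA) | NMO | 1.78 | 0 | 30 | 0 |
| System 2 (Gnu) | DIT | 0.93 | 0 | 4 | 1 |
| System 2 (Gnu) | NOC | 0.78 | 0 | 5 | 0 |
| System 2 (Gnu) | NMI | 17.58 | 0 | 137 | 3 |
| System 2 (Gnu) | NMO | 0.09 | 0 | 2 | 0 |
| System 3 (SEG1) | DIT | 0.22 | 0 | 2 | 0 |
| System 3 (SEG1) | NOC | 0.17 | 0 | 4 | 0 |
| System 3 (SEG1) | NMI | 1.11 | 0 | 13 | 0 |
| System 3 (SEG1) | NMO | 0.46 | 0 | 2 | 0 |
| System 4 (SEG2) | DIT | 0.09 | 0 | 2 | 0 |
| System 4 (SEG2) | NOC | 0.12 | 0 | 7 | 0 |
| System 4 (SEG2) | NMI | 0.52 | 0 | 33 | 0 |
| System 4 (SEG2) | NMO | 0.03 | 0 | 5 | 0 |
| System 5 (SEG3) | DIT | 0.01 | 0 | 1 | 0 |
| System 5 (SEG3) | NOC | 0.02 | 0 | 2 | 0 |
| System 5 (SEG3) | NMI | 0.1 | 0 | 12 | 0 |
| System 5 (SEG3) | NMO | 0.01 | 0 | 1 | 0 |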