Entropy of the degree distribution and Object-Oriented Software Quality
I. Turnu, M. Marchesi, R. Tonelli
DIEE - University of Cagliari, piazza d'Armi, Cagliari 09123, Italy
ABSTRACT
The entropy of the degree distribution has been considered by many authors as a measure of a network's heterogeneity and, consequently, of its resilience to random failures. In this paper we propose the entropy of the degree distribution as a new measure of software quality. We present a study where software systems are considered as complex networks characterized by a heterogeneous distribution of links. On such complex software networks we computed the entropy of the degree distribution. We analyzed various releases of the publicly available Eclipse and Netbeans software systems, calculating the entropy of the degree distribution for every release analyzed. Our results display a good correlation between the entropy of the degree distribution and the number of bugs for Eclipse and Netbeans. Complexity and quality metrics are in general computed on every system module, while the entropy is a single scalar number that characterizes a whole system. This result suggests that the entropy of the degree distribution could be considered as a global quality metric for large software systems. Our results need, however, to be confirmed for other large software systems.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics - complexity measures; G.3 [Probability and Statistics]: Miscellaneous - correlation and regression analysis; H.4 [Information Systems Applications]: Miscellaneous

General Terms
Measurement

Keywords
Software Metrics, Complex Networks, Entropy.

1. INTRODUCTION
Many real systems have been described as complex networks, where nodes represent specific parts of the system and connections represent relationships among them. Examples of such networks come from very different contexts [18], [14], [6]. The study of networks in the form of mathematical graph theory is a fundamental topic of discrete mathematics and has a long history, starting in the eighteenth century with Euler. More recently, an important contribution to the study of such systems came from the social sciences, whose interest started to emerge around the 1930s. However, recent years have witnessed a substantial new movement in the study of complex networks. Driven by the fast increase of computational power, the focus has shifted from the study of a single, relatively small graph to the analysis of the statistical properties of families of large systems, made of many thousands of nodes intricately connected by millions of links [7]. Traditional theories suggest representing complex systems as random graphs, according to the models proposed by Erdős and Rényi [12]. However, there is increasing evidence that many real-world systems display global statistical properties that are not accounted for by the random graph model. One of these properties concerns the degree distribution of these networks, which often follows a power law [5, 4, 10]. Networks that exhibit this kind of distribution are known as scale-free networks, indicating the presence of a few highly connected nodes (usually called hubs) and of a large number of nodes with small degree. Another important property is the small-world feature, also known as the generalization of the famous 'six degrees of separation' [13]. In small-world networks, a very small number of steps is required to reach a given node starting from any other node.
These properties have also been found for software networks [14, 17, 18]. This justifies the assumption that software networks are complex networks, and motivates studying them with the same approach. In software networks the in-degree distributions are power laws, like other software properties [16]. The out-degree distributions are more controversial: some authors found power-law behavior [14, 17, 18], whereas others found lognormal behavior [9]. The entropy of the degree distribution has been studied by Wang et al. [19] to measure the heterogeneity of complex networks. They conclude that the entropy of the degree distribution is an effective measure of a network's resilience to random failures. In this paper we investigate the correlation between the entropy of the degree distribution and the number of bugs of different releases of the Eclipse and Netbeans software systems. Our results show a high correlation between the entropy of the degree distribution and the system's bug proneness, which is a reliable indicator of software quality. Of course, our study does not claim to determine software quality in general by using only a single scalar number. Software quality is multifaceted and presents many different aspects, more or less correlated, such as maintainability, code readability, efficiency, defect density and so on. We focus our research on software fault proneness in an entire system release, and look for a single scalar index which can provide a reliable indication of it at a first glance. Such an index, the entropy of the degree distribution, is not to be intended as an absolute indicator of software bug proneness, but rather as a comparative indicator. In fact, our research makes no comparison among indexes belonging to software systems which differ in project, purpose, size and so on. What we look for is how such an index changes in comparison to changes in the number of bugs across different releases of the same large software system. Looking at this kind of correlation between bug proneness and entropy of the degree distribution allows software engineers to keep the global software quality under control, by understanding whether the release under development is more or less defect prone than previous ones. For this reason, we believe that the entropy of the degree distribution could be a new synthetic metric describing the complexity and the quality of a software system.

The paper is organized as follows. In Section 2 we explain the motivation for this study. In Section 3 we recall the concept of software network, and how it is derived from source code analysis. Section 4 defines the complexity measure based on the entropy of the degree distribution. In Section 5 we describe how the bug extraction for the Eclipse and Netbeans systems works. In Section 6 we present results about the correlation between the entropy and the number of bugs of the Eclipse and Netbeans systems. We report the threats to validity in Section 7. Section 8 concludes the paper.
2. RESEARCH QUESTIONS
The literature contains preliminary results on the suitability of entropy as a global metric for describing the quality of a software system, and more specifically for analyzing code degradation and the decrease of maintainability as the software system evolves. The entropy of the degree distribution has been considered by many authors as a measure of a network's heterogeneity, and consequently of its resilience to random failures, but never as a measure of software quality. In order to cover this topic, we organized our study according to the following research questions:
RQ1: Is the entropy of the degree distribution correlated to software quality, as computed by defect proneness?
RQ2: Is the entropy of the degree distribution correlated to software quality, as computed by Chidamber and Kemerer's (CK) software metrics?
RQ3: Are the correlation coefficients statistically significant for the systems analyzed?
3. SOFTWARE NETWORKS
The source code of an OO software system is composed of a set of class definitions. From a structural point of view, classes can be related through inheritance and composition. These are called data dependencies because they come from the definition of values. From a functional point of view, classes can depend on one another because in their code they call methods defined in other classes, or use temporary variables or objects belonging to other classes. These are called call dependencies. By analyzing the source code of an OO system, it is possible to build its class graph: a graph whose nodes are the classes and whose edges represent directed relationships (dependencies) between classes [9]. In this graph, the in-degree of a class is the number of edges directed toward the class, and is related to the usage level of this class in the system. The out-degree of a class, on the other hand, is the number of edges leaving the class, and represents the level of usage the class makes of other classes in the system. Once the network is generated, it is possible to compute on it all the metrics that can be associated with a network.
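The construction described in this section, together with the degree-distribution entropy that Section 4 defines on the resulting graph, can be illustrated with a minimal sketch. It is not the tooling used in the study: the dependency list and class names are hypothetical, the natural logarithm is an assumption (the paper does not state the base), and classes without any dependency do not appear because the graph is built from edges only.

```python
from collections import Counter
from math import log

# Hypothetical dependency list: (source, target) means that the source
# class depends on the target class (inheritance, composition or a call).
dependencies = [
    ("Editor", "Document"),
    ("Editor", "Toolbar"),
    ("Toolbar", "Icon"),
    ("Document", "Paragraph"),
    ("Paragraph", "Icon"),
    ("Editor", "Icon"),
]


def degrees(edges):
    """Return (in_degree, out_degree) counters of a directed class graph."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in set(edges):  # each directed edge is counted once
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


def degree_entropy(edges):
    """Entropy of the empirical distribution p(k) of the total degree."""
    in_deg, out_deg = degrees(edges)
    nodes = set(in_deg) | set(out_deg)
    # Total degree = in-links + out-links, as stated in Section 4.
    total = [in_deg[n] + out_deg[n] for n in nodes]
    counts = Counter(total)  # number of classes having each degree k
    n = len(nodes)
    # Natural logarithm is an assumption; another base only rescales H.
    return -sum((c / n) * log(c / n) for c in counts.values())


in_deg, out_deg = degrees(dependencies)
print("in-degree of Icon:", in_deg["Icon"])
print("out-degree of Editor:", out_deg["Editor"])
print("H =", round(degree_entropy(dependencies), 4))
```

In this toy graph three classes have degree 2 and two have degree 3, so the entropy only reflects how evenly the classes spread over the observed degree values, not which classes hold them.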
4. ENTROPY OF THE DEGREE DISTRIBUTION
Using the approach above, we parsed the source code of five releases of Eclipse and eight of Netbeans, and we built the graph associated with each software network. Each graph node (class) is connected to other nodes (other classes) through graph edges (class dependencies). The binary relationships among classes are oriented, so we consider the software graph as directed and compute the degree of a node as the sum of all the in-links and out-links of the corresponding class. The entropy of the degree distribution can be defined as follows:

$$H = -\sum_{k=1}^{N} p(k)\,\log p(k) \qquad (1)$$
where N is the total number of nodes in the network and p(k) is the degree distribution. To compute p(k) we used the empirical frequencies of the node degrees. In Fig. 1 we report the Complementary Cumulative Distribution Functions (CCDF) computed from these empirical probabilities for one release of Eclipse (left) and one of Netbeans (right). At the same time, we analyzed the CCDF of the bugs affecting the system's classes, obtaining a picture which is very similar to that of the degree distribution (Fig. 2). Both CCDFs show a fat tail, typical of complex systems, where large values have a non-negligible probability of existing. In one case, this means that classes with a large number of dependencies are quite common in such software systems. In the other case, it means that classes with a large number of bugs are quite common as well. This similarity suggests that the two properties may be significantly correlated. The entropy, as described above, is a global index summarizing and characterizing the overall shape of the statistical distribution. This is the reason why we decided to analyze the correlation between software defect proneness and entropy of the degree distribution, in order to obtain a single global index for software quality.

Figure 1: The Complementary Cumulative Distribution Functions (CCDF) of the degree for Eclipse and Netbeans projects. Panels: (a) Eclipse3.2, (b) Netbeans3.2. Axes: x (horizontal) and Pr(X ≥ x) (vertical), both logarithmic.

Figure 2: The Complementary Cumulative Distribution Functions (CCDF) of the number of bugs for Eclipse and Netbeans projects. Panels: (a) Eclipse3.2, (b) Netbeans3.2. Axes: both logarithmic.

5. BUG EXTRACTION
Our analyses are based on open source systems, which give free access to the source code repository and to the bug tracking system. Bug Tracking Systems (BTS) are commonly used to keep track of Bugs, enhancements and features of software systems, collectively called Issues. The open source systems studied, Eclipse and Netbeans, make use of the BTSs Bugzilla and Issuezilla, respectively. Each Issue inside a BTS is uniquely identified by a positive integer, the Issue-ID. For each tracked Issue, a BTS stores its characteristics, its life-cycle, the software releases where it appears, and other data. In Bugzilla [1], a valid Bug is an Issue with a resolution of fixed, a status of closed, resolved or verified, and a severity that is not enhancement, as pointed out by Eaddy et al. [11]. Thus, Bugs are a subset of Issues. For Issuezilla, it is possible to adopt an equivalent definition: a Bug is an Issue with resolution and status as above, and with type defect.

The subset of Issues satisfying the conditions reported by Eaddy et al. is the Bug metric we have used [11]. Clearly, not all source modules changed because of a Bug are to be considered "faulty". Some changes may be made to realign a correct piece of code with another piece of code that was modified to fix the Bug. So, what we measure is to what extent a Bug affects one, some or many CUs, and not whether they were really faulty.

On the other hand, software configuration management systems like CVS keep track of all maintenance operations on software systems. These operations are recorded inside CVS [2] in an unstructured way; it is not possible, for instance, to query CVS in a simple way to know which operations were done to fix Bugs, to introduce a new feature, or to make an enhancement. All these operations are performed on files, called Compilation Units (CUs), which may contain one or more classes. In Eclipse almost all CUs contain only one class, and only ten percent of CUs contain more than one. Such files contain one larger class plus one, or at most two, smaller classes that serve the first one. In order to identify the Issues (Bugs) affecting system CUs, we had to match the data stored in the BTS with the data recorded in the CVS of Eclipse. Every commit operation is recorded in the CVS log as a single entry. Each entry contains various data, among which the date, the developer who made the change, a text message stating the reason for the commit, and the list of CUs affected by the commit. To obtain a correct mapping between Issues and the related CUs we analyzed the CVS log messages, to identify commits associated with maintenance operations where Issues are fixed. If a maintenance operation is done on a CU to address an Issue, we consider the CU as affected by this Issue.

In our approach, we first analyzed the text of commit messages, looking for Issue-IDs. Unfortunately, every positive integer is a potential Issue-ID, but sometimes numbers refer to maintenance operations not related to Issue-ID resolution, such as branching, data, release numbers, copyright updates, and so on. To avoid wrong mappings between Issue-IDs and CUs, we applied the following strategies:
• For each release, a CU can be affected only by Issues which the BTS refers to the same release.
• We did not consider some numeric intervals particularly prone to host false positive Issue-IDs.
The latter condition is not particularly restrictive in our study, because we did not consider the first releases of Eclipse, where Issues with 'low' IDs appear. All IDs not filtered out are considered Issues and associated with the addition or modification of one or more CUs, as reported in the commit logs. We then assigned the bugs to classes in the corresponding CUs, choosing the larger class for the few CUs containing more than one class, after a manual inspection of the code. This method might not completely address the problems of the mapping between bugs and CUs [3]. In any case we checked manually:
- 10% of the CU-bug associations (randomly chosen) for each release
- all CU-bug associations for three sub-projects
without finding any error. A bias may still remain due to lack of information in CVS [3].
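The mapping between commit messages and CUs described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the authors' tool: the commit records, the set of valid Bug IDs for the release, and the excluded numeric interval are all hypothetical, and the real CVS log would first have to be parsed into this form.

```python
import re

# Hypothetical commit entries already parsed from a CVS log.
commits = [
    {"message": "Fix for bug 123456: NPE in parser", "files": ["Parser.java"]},
    {"message": "Copyright update 2006", "files": ["Parser.java", "Lexer.java"]},
    {"message": "123999 refactor lexer", "files": ["Lexer.java"]},
]

# Hypothetical Bug IDs that the BTS attributes to the release under study.
valid_bug_ids = {2006, 123456, 123999}

# Assumed interval prone to false positives (numbers that look like years).
excluded_ranges = [(1900, 2100)]


def bugs_per_cu(commits, valid_bug_ids, excluded_ranges):
    """Map each compilation unit to the set of Bug IDs fixed on it."""
    result = {}
    for commit in commits:
        # Every standalone positive integer is a candidate Issue-ID.
        candidates = {int(tok) for tok in re.findall(r"\b\d+\b", commit["message"])}
        ids = {
            i for i in candidates
            if i in valid_bug_ids  # same-release filter
            and not any(lo <= i <= hi for lo, hi in excluded_ranges)
        }
        if not ids:
            continue  # commit not linked to any Bug
        for cu in commit["files"]:
            result.setdefault(cu, set()).update(ids)
    return result


mapping = bugs_per_cu(commits, valid_bug_ids, excluded_ranges)
for cu, ids in sorted(mapping.items()):
    print(cu, sorted(ids))
```

The copyright commit is discarded even though 2006 happens to be a valid ID, which is the kind of false positive the interval filter is meant to catch; the price is that genuine Bugs inside an excluded interval are lost as well.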
6. RESULTS
This section presents the results obtained from the analysis of the Eclipse and Netbeans case studies. For both projects we created the software graph and then extracted the entropy of the degree distribution and the bug metrics across various releases. We analyzed five versions of the Eclipse system and eight versions of the Netbeans system. We found a high correlation between the entropy of the degree distribution (Hdegree) and bugs, as reported in Tables 1 and 2.

Table 1: Correlation coefficients between HDegree and bugs for all the Eclipse versions analyzed. (The Correlation and P-Value rows refer to the correlation of each column with bugs.)

Releases      classes   locs       Hdegree    bugs
Eclipse2.1    8546      779130     5.962486   7023
Eclipse3.0    12254     1118453    6.539403   16986
Eclipse3.1    14235     1351957    5.94605    13836
Eclipse3.2    17165     1638699    6.571031   15481
Eclipse3.3    17881     1657986    5.859742   10451
Correlation   0.389     0.398      0.777      1
P-Value       0.5173    0.5070     0.1278     -
Table 1 shows the details of the correlations between the entropy of the degree distribution and the total number of defects for the Eclipse project. We found a high correlation, though the corresponding p-value is above 0.05, indicating that this correlation is not statistically significant, possibly because of the very low number of releases considered (five).
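As a rough check of the figures in Table 1, the correlation between Hdegree and bugs can be recomputed from the tabulated values. The sketch below assumes that the reported coefficient is Pearson's linear correlation, which the paper does not state explicitly; the helper name `pearson` is ours.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Values copied from Table 1 (Eclipse 2.1 to 3.3).
h_degree = [5.962486, 6.539403, 5.94605, 6.571031, 5.859742]
bugs = [7023, 16986, 13836, 15481, 10451]

r = pearson(h_degree, bugs)
n = len(bugs)
# t statistic for the null hypothesis of zero correlation; it follows a
# Student t distribution with n - 2 degrees of freedom.
t = r * sqrt(n - 2) / sqrt(1 - r * r)
print(f"r = {r:.3f}, t = {t:.3f}, degrees of freedom = {n - 2}")
```

Under this assumption the coefficient should land close to the value in the table, with small differences possibly due to rounding of the published data. With only three degrees of freedom the two-sided 5 percent critical value of t is about 3.18, so even a coefficient of this size falls short of significance.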
Table 2: Correlation coefficients between some software metrics and the entropy for all the Netbeans versions analyzed. (The Correlation and P-Value rows refer to the correlation of each column with bugs.)

Releases      classes   locs       Hdegree    bugs
Netbeans3.2   5563      464938     5.544801   3888
Netbeans3.3   7383      612985     5.502977   5206
Netbeans3.4   10121     866373     5.488119   2951
Netbeans3.5   11158     968419     5.449568   2517
Netbeans4.0   14880     1309379    5.522024   7878
Netbeans4.1   17176     1510954    5.550924   5475
Netbeans5.0   19603     1708117    5.559465   8524
Netbeans6.0   44582     3857563    5.655894   17479
Correlation   0.944     0.943      0.899      1
P-Value       0.0004    0.0004     0.0023     -
Table 2 shows the details of the correlations between the entropy of the degree distribution and the total number of defects for the Netbeans project. In this case the correlation is statistically significant, with a p-value below 0.05, indicating that the null hypothesis of no linear relationship between the two variables can be rejected at the 5 percent significance level. The higher significance is due to the higher number of versions considered (eight).
However, the lack of statistical significance for the five Eclipse versions prompts a closer look at the details of the data distribution. In Figures 3 and 4 we report the scatter plots of the data used to compute the correlations. The Eclipse data are only five points, roughly scattered along a straight line, because the highest values of the entropy correspond to the highest numbers of bugs, and the low entropy values to lower numbers of bugs. On the other hand the statistical significance is low, since the points are too few and quite scattered along the straight line. An outlier analysis does not exclude any point, given that they are too few.

The Netbeans data provide a similar scatter plot, where the points are roughly scattered along a straight line: points with higher values of the entropy also present higher numbers of bugs, and points with low entropy values present lower numbers of bugs. This is true in particular for the two highest points and for the two lowest points. In this case the test of statistical significance provides good confidence in the correlation, since eight points are considered, even if they are quite scattered along the straight line. This time an outlier analysis identifies one outlier, the rightmost (and highest) point, which could be a candidate for exclusion. If one excludes this point, the remaining ones are more scattered along the line, and the test of statistical significance fails. Thus one has to deal with the problem of accepting or excluding the point from the analysis. Usually outlier analysis poses the problem of understanding whether the data collected for the out-of-scale points are genuine, or whether wrong measurements or big errors affect them. If they are genuine, the data may hint at a particular behavior of the system under certain conditions.

In our case the outlier does not suffer from big errors or wrong measurements, and we accepted it as the indication of a behavior of our system. In particular, the difference between this point and the others is that from version 5.0 to 6.0 Netbeans increases enormously in size, going from 19603 to 44582 classes. This is a matter of fact, not a wrong measurement. On the other hand the entropy increases in the same proportion, with respect to its scale of values. Thus we believe this indicates that the system has become much more intricate and complex than in release 5.0, and that this is reflected in a decrease of code quality, corresponding to a proportional increase in the number of bugs. The size effect also counts, since more code also corresponds to more bugs, but we already have the data from Eclipse, where this size effect is not displayed while the entropy behavior is a good representation of the behavior of the number of bugs. Moreover, the outlier point is identified as an outlier only because of the huge increase in system size, and for no other reason. Thus we believe it represents a true phenomenon occurring inside the system, namely that it is becoming far more complex and more difficult to keep under control. If we instead chose to discard some other point at our convenience, such as the one representing release 3.2, which is the first release analyzed and which for some reason may differ from the others, we would end up with a set of points very well aligned along a straight line, only a little scattered, obtaining a very strong correlation and optimal statistical significance. For these reasons we decided not to exclude any point from the analysis, obtaining a good correlation for both systems and a good statistical significance only for Netbeans.
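The sensitivity of the Netbeans correlation to single points, discussed above, can be explored with a leave-one-out computation on the values of Table 2. As before, this is a sketch that assumes Pearson's coefficient; it is not the outlier test used in the study, which the paper does not specify.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Values copied from Table 2 (Netbeans 3.2 to 6.0).
releases = ["3.2", "3.3", "3.4", "3.5", "4.0", "4.1", "5.0", "6.0"]
h_degree = [5.544801, 5.502977, 5.488119, 5.449568,
            5.522024, 5.550924, 5.559465, 5.655894]
bugs = [3888, 5206, 2951, 2517, 7878, 5475, 8524, 17479]

r_all = pearson(h_degree, bugs)
print(f"all releases: r = {r_all:.3f}")

# Recompute the coefficient after dropping one release at a time.
leave_one_out = {}
for i, release in enumerate(releases):
    xs = h_degree[:i] + h_degree[i + 1:]
    ys = bugs[:i] + bugs[i + 1:]
    leave_one_out[release] = pearson(xs, ys)
    print(f"without {release}: r = {leave_one_out[release]:.3f}")
```

With the tabulated values, dropping release 6.0 lowers the coefficient noticeably, which is consistent with the remark that the significance test fails once that point is excluded.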
We also found a high correlation between the entropy of the degree distribution and some traditional metrics [8], as reported in Tables 3 and 4.

Table 3: Correlation coefficients between some CK metrics and the entropy of degree distribution for all the Eclipse versions analyzed. (The Correlation and P-value rows refer to the correlation of each column with Hdegree.)

Releases      CBO        RFC         Hdegree
Eclipse2.1    5.201615   17.211444   5.962486
Eclipse3.0    7.112127   21.138241   6.539403
Eclipse3.1    5.300105   17.374991   5.94605
Eclipse3.2    7.488669   21.206583   6.571031
Eclipse3.3    5.205302   16.931156   5.859742
Correlation   0.990      0.998       1
P-value       0.0010     0.0001      -
Table 4: Correlation coefficients between some CK metrics and the entropy of degree distribution for all the Netbeans versions analyzed. (The Correlation and P-value rows refer to the correlation of each column with Hdegree.)

Releases      CBO        RFC         Hdegree
Netbeans3.2   4.433399   15.145964   5.544801
Netbeans3.3   4.403765   15.216985   5.502977
Netbeans3.4   4.257484   15.263116   5.488119
Netbeans3.5   4.119824   15.21563    5.449568
Netbeans4.0   4.134812   15.348051   5.522024
Netbeans4.1   4.315848   15.534234   5.550924
Netbeans5.5   4.568601   15.553568   5.559465
Netbeans6.0   4.894666   16.617491   5.655894
Correlation   0.891      0.873       1
P-value       0.0031     0.0041      -
Among CK metrics, CBO and RFC are certainly the most validated, and are almost always found to be highly correlated with system fault-proneness, as reported by [15]. Since the CK metrics analyzed (CBO and RFC) refer to single classes, while the entropy of the degree distribution is a global measure of the system, we computed the mean value of each metric for each version of the analyzed systems. These mean values are themselves correlated with the number of bugs (see Table 5).

Table 5: Correlation coefficients between CBO and RFC metrics and the bugs for all the Eclipse and Netbeans versions analyzed. P-values are reported in parentheses.

project       CBO-BUGS         RFC-BUGS
Eclipse       0.787 (0.1145)   0.802 (0.102)
Netbeans3.2   0.819 (0.0129)   0.950 (0.0003)

Figure 3: Scatterplot for Netbeans entropy vs bugs.
We are now ready to answer the research questions posed in Section 2.
RQ1: Is the entropy of the degree distribution correlated to software quality, as computed by defect proneness? The answer is positive, because we have correlation coefficients of about 0.8 and 0.9 for the Eclipse and Netbeans projects, respectively (Tables 1 and 2).
RQ2: Is the entropy of the degree distribution correlated to software quality, as computed by CK software metrics? Also in this case the answer is positive. The entropy of the degree distribution is highly correlated with the CBO and RFC metrics, which have been found to be correlated with the fault proneness of software systems.
RQ3: Are the correlation coefficients statistically significant for the systems analyzed? The correlation coefficient is statistically significant only for the Netbeans project, with a p-value below 0.05. For the Eclipse project we found a high correlation, but the corresponding p-value is above 0.05, which we attribute to the very low number of releases considered (five). On the other hand, we can notice that the number of bugs and the entropy of the degree distribution for the Eclipse project display the same kind of oscillations as the system releases evolve.

Figure 4: Scatterplot for Eclipse entropy vs bugs.
7. THREATS TO VALIDITY
In this work we consider only two software systems, Eclipse and Netbeans. These projects clearly cannot be representative of all Java systems, though they are very large systems, composed of many sub-projects developed by different programmers. Moreover, the results are statistically significant only for the Netbeans project, because we analyzed only five versions of the Eclipse project. Further studies on different systems, both open source and commercial, are needed to further validate our study. Another threat to validity is that Eclipse and Netbeans cannot be considered representative of systems written in programming languages other than Java. Thus, a full investigation of the possibility of using these metrics to describe the complexity of a whole software system must span several languages and systems. Another consideration is that releases are delivered when they are considered stable and the bugs found have been removed; the change in entropy can therefore be an after-effect of the number of faults found and fixed. With more frequent sampling, the entropy and the number of defects might be better related (see footnote 1).
8. CONCLUSIONS
We presented an empirical analysis where the entropy of the degree distribution is calculated for software networks in order to introduce a global quality metric for the whole software system. For this purpose we showed how the entropy of the degree distribution is related to the bugs affecting the whole system, which are a ubiquitous measure of system quality. Our results show strong correlations between the entropy of the degree distribution and the number of bugs affecting the software, for both of the systems analyzed, Eclipse and Netbeans, although the correlation is statistically significant only for Netbeans. Further work is needed to extend the validity of our results to other systems, in particular to different programming languages or software paradigms. This will be the object of our future work.
9. ACKNOWLEDGMENTS
This work was partly supported by a grant from R.A.S. (Regione Autonoma della Sardegna) awarded to I. Turnu, PO Sardegna FSE 2007-2013, L.R.7/2007 "Promotion of the scientific research and technological innovation in Sardinia".
10. REFERENCES
[1] Bugzilla. http://www.bugzilla.org/.
[2] CVS. http://www.nongnu.org/cvs/.
[3] K. Ayari, P. Meshkinfam, G. Antoniol, and M. Di Penta. Threats on building models from CVS and Bugzilla repositories: the Mozilla case study. In Proceedings of the 2007 Conference of the Center for Advanced Studies on Collaborative Research, CASCON '07, pages 215–228, New York, NY, USA, 2007. ACM.
[4] A. Barabasi, R. Albert, and H. Jeong. Scale-free characteristics of random networks: the topology of the world wide web. Phys. A, 281:69–77, 2000.
[5] A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
[6] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Computer Networks, 33:309–320, 2000.
[7] Q. Chen, H. Chang, R. Govindan, S. Jamin, and S. Shenker. The origin of power laws in internet topologies revisited. In Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies, pages 1–4, Los Alamitos, CA, 2002. IEEE Computer Society.
[8] S. Chidamber and C. Kemerer. A metrics suite for object-oriented design. IEEE Trans. Software Eng., 20(6):476–493, June 1994.
[9] G. Concas, M. Marchesi, S. Pinna, and N. Serra. Power-laws in a large object-oriented software system. IEEE Trans. Software Eng., 33:687–708, 2007.
[10] M. Coraddu et al. Weak insensitivity to initial conditions at the edge of chaos in the logistic map. Physica A: Statistical Mechanics and its Applications, 340(1-3):234–239, 2004.
[11] M. Eaddy, T. Zimmermann, K. Sherwood, V. Garg, G. Murphy, et al. Do crosscutting concerns cause defects? IEEE Transactions on Software Engineering, 34(4):497–515, November 2008.
[12] P. Erdős and A. Rényi. On random graphs. I. Publ. Math. Debrecen, 6:290–297, November 1959.
[13] S. Milgram. The small world problem. Psych. Today, 2:60–67, 1967.
[14] C. R. Myers. Software systems as complex networks: Structure, function, and evolvability of software collaboration graphs. Phys. Rev. E, 68(4):046116, Oct 2003.
[15] R. Shatnawi. A quantitative investigation of the acceptable risk levels of object-oriented metrics in open-source systems. IEEE Trans. Softw. Eng., 36:216–225, March 2010.
[16] I. Turnu, G. Concas, M. Marchesi, S. Pinna, and R. Tonelli. A modified Yule process to model the evolution of some object-oriented system properties. Information Sciences, 181:883–902, 2011.
[17] S. Valverde, R. Ferrer-Cancho, and R. V. Solé. Scale-free networks from optimal design. Europhys. Lett., 60:512–517, 2002.
[18] S. Valverde and R. V. Solé. Hierarchical small worlds in software architecture. arXiv:cond-mat/0307278v2, 2003.
[19] B. Wang, H. Tang, C. Guo, and Z. Xiu. Entropy optimization of scale-free networks robustness to random failures. Physica A, 363:591–596, 2005.

Footnote 1: The authors thank an anonymous referee for these considerations.