Measuring Disruption from Software Evolution Activities Using Graph-Based Metrics

Prashant Paymal, Rajvardhan Patil, Sanjukta Bhowmick, Harvey Siy
Department of Computer Science, University of Nebraska, Omaha
{ppaymal,rpatil,sbhowmick,hsiy}@mail.unomaha.edu

Abstract: In this paper, we investigate how class relationships are disrupted after large-scale changes. We use graphs to represent different software versions and study changes to graph properties. We explore different combinatorial metrics to measure the extent of disruption after perfective maintenance activities. Our early results on JHotDraw demonstrate that combinatorial metrics can provide a good indicator of the degree to which relationships are disrupted or preserved across different versions.

Index Terms: perfective maintenance, graph theory, dynamic network analysis

TABLE I. COMMITS WITH PERFECTIVE CHANGES IN JHOTDRAW

| Ver | Date | Files | Commit Messages |
|-----|------|-------|-----------------|
| V1 | 3/9/2001 | 304 | Merge to JHotDraw 5.2 (using JFC/Swing GUI components) |
| V2 | 10/24/2001 | 720 | before merge for version 5.3 (dnd, undo,...), merge dnd (before 5.3) |
| V3 | 8/4/2002 | 392 | after various merges... (before 5.4 release) |
| V4 | 11/8/2002 | 2 | Refactor to use StandardStorageFormat as a superclass. |
| V5 | 5/8/2003 | 44 | Refactoring of Cursor: - java.awt.Cursor (class) has been sistematically replaced |
| V6 | 1/9/2004 | 484 | After renaming the CH.ifa.draw to org.jhotdraw |

I. INTRODUCTION

Understanding how software systems evolved is important for assessing the long-term maintainability of these systems. However, due to the large number of components in most real-world systems, it is difficult to get a quick summary of what has changed from one version to another. Since the complexity of most object-oriented systems lies in the complexity of the inter-class relationships, we are interested in quantifying the extent to which such relationships are disrupted or preserved in the midst of software evolution. The ability to measure this disruption between two versions can be useful for evaluating alternative paths along which the software could evolve.

To this end, we apply techniques borrowed from the analysis of large dynamic networks [1], [2]. Large-scale networks are extensively used in modeling systems of entities and their mutual dependence, such as social networks. Such networks evolve over time. However, there has been little study of metrics that characterize the amount of change in dynamic networks. In this paper, we explore combinatorial, or graph-theory-based, metrics to quantify and evaluate the difference between networks representing several software versions. Our results show that these statistics provide important insight into how the JHotDraw code evolved over time.

This study is part of a larger work being conducted on understanding the impact of perfective maintenance activities on software evolution [3]. Perfective maintenance [4] is the process of modifying or rewriting the structure and organization of program source code to facilitate continued software evolution. We posit that perfective changes tend to be disruptive because they mostly have global effects, causing significant reorganization of the code. We measure the degree of disruption between several snapshots of the JHotDraw code that culminate in perfective maintenance changes.

II. METHODOLOGY

Our case study consists of six versions of JHotDraw 5 (http://www.jhotdraw.org) from March 2001 to January 2004, referred to as Version 1 to Version 6 (V1 to V6) in this text. The range from 2001 to 2004 was selected because there was a continuous evolution punctuated by some large commits. The specific versions are listed in Table I. These versions were selected because they are large commits or refactorings that we have verified to involve perfective maintenance activities [3].

From these versions, use relationships (inheritance and implementation, method calls and class member access, object declaration and instantiation) were extracted and modeled as a directed network, where each edge (u,v) is a dependency from class u to class v.

Our objective is to extract key combinatorial properties from these six networks that would enable us to detect evolutionary characteristics, such as points of significant change in the software and how these changes affect crucial classes in the network. Several vertex properties are computed for each network:

Degree: The degree of a vertex is the number of edges connected to it. Indegree (Outdegree) refers to the number of incoming (outgoing) edges.

Betweenness Centrality: Betweenness centrality of a vertex represents how often it occurs within dependency paths. The higher the betweenness centrality, the more dependency paths pass through the class represented by the vertex. Betweenness centrality of a vertex v is computed as the ratio of the number of shortest paths in the network that include v to the total number of shortest paths in the network.

Clustering Coefficient: Clustering coefficient of a vertex represents the ratio of the edges between the neighbors of a vertex to the total possible connections between them. The higher the clustering coefficient, the more likely it is that a vertex is part of a dense module with closely interconnected dependencies.

Articulation Point: A vertex is an articulation point if its removal would cause the network to become disconnected.

We examine how the relationships between these properties changed from one version to the next. We then estimate the extent to which relationships between classes are disrupted by calculating the following metrics:

$$\text{Value Disruption} = \frac{\sum_{v \in AllVertices} \left| Value^{v}_{i+1} - Value^{v}_{i} \right|}{Max(Value_{i+1})}$$

$$\text{Rank Disruption} = \frac{\sum_{v \in AllVertices} \left| Rank^{v}_{i+1} - Rank^{v}_{i} \right|}{TotalVertexNumber}$$

where $Value^{v}_{i}$ and $Rank^{v}_{i}$ denote the value and rank of a property of vertex v in version i.

To put these disruption metrics in context, we also examine how much was preserved, in terms of the percentage of vertices that maintained their properties across all versions. Next, we analyze properties of new vertices to see what types of classes were added. Finally, we validate the disruption by investigating its impact on subsequent bug frequency.

III. RESULTS AND ANALYSIS

A. Relationships Between Vertex Properties

Table II shows the correlation between the metrics for ranking vertex properties. (We used Spearman rank correlations because the data values were not normally distributed.) There is a positive correlation between vertex degrees and betweenness centrality, indicating that classes on important paths have many dependencies. However, the distribution of clustering coefficients changes over time, and this is reflected in its correlations with betweenness centrality and degrees. In V1 and V2, there is a negative correlation between clustering coefficient and betweenness centrality. In V3 and V4, this correlation is still negative, but no longer statistically significant. In V5 and V6, the correlation has become positive. This indicates that, in V2 and, to a lesser extent, V3 and V4, the new vertices are clusters of interdependent modules added at the end of the paths. This also hints that in V5 and V6, the newly added vertices tend to be at the periphery of the network. These are examined further in Section III-D.

B. Disruption

Table III lists the vertex-based properties of the network. The table shows the disruption in values and rank and also compares the set of the top (highest ranked) 25 vertices. The highest and second highest disruptions are marked with bold and italic fonts respectively. We see that there was significant evolution between V2 and V3 and between V4 and V5.
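As an illustration of the value and rank disruption metrics defined in Section II, the following is a minimal Python sketch applied to total degree (indegree + outdegree). It is not the authors' implementation. Several details are assumptions, because the text does not specify them: only vertices present in both versions contribute to the sums, TotalVertexNumber is taken to be the vertex count of version i+1, ties in rank are broken by vertex name, and the function names and toy edge lists are hypothetical.

```python
from collections import Counter


def total_degree(edges):
    """Indegree + outdegree of every vertex in a directed edge list."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1  # outgoing edge of source
        degree[target] += 1  # incoming edge of target
    return dict(degree)


def rank_by_value(values):
    """Rank 1 is the highest value; ties are broken by vertex name (an assumption)."""
    ordered = sorted(values, key=lambda v: (-values[v], str(v)))
    return {v: position for position, v in enumerate(ordered, start=1)}


def value_disruption(prev, curr):
    """Sum of |Value_{i+1} - Value_i| divided by Max(Value_{i+1})."""
    shared = prev.keys() & curr.keys()  # assumption: added/deleted vertices are skipped
    peak = max(curr.values())
    if peak == 0:
        return 0.0
    return sum(abs(curr[v] - prev[v]) for v in shared) / peak


def rank_disruption(prev, curr):
    """Sum of |Rank_{i+1} - Rank_i| divided by the vertex count of version i+1."""
    shared = prev.keys() & curr.keys()  # assumption: added/deleted vertices are skipped
    prev_rank = rank_by_value(prev)
    curr_rank = rank_by_value(curr)
    return sum(abs(curr_rank[v] - prev_rank[v]) for v in shared) / len(curr)


# Toy data only: two small "versions" of a dependency network.
prev_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
curr_edges = prev_edges + [("E", "D"), ("E", "A"), ("D", "B")]

prev_degree = total_degree(prev_edges)
curr_degree = total_degree(curr_edges)

print(value_disruption(prev_degree, curr_degree))
print(rank_disruption(prev_degree, curr_degree))
```

The same two functions can be fed any per-vertex property (betweenness centrality, clustering coefficient, and so on), since they only depend on a mapping from vertex to value.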
We also compare sets of the top 25 highest ranked vertices for each property. Retained Vertices refers to the set of top 25 vertices that were common to the consecutive versions. Vertices in Vi (Vi+1) only refers to vertices that were in the top 25 in the previous (next) version but not in the next (previous). Note that these vertices were present in both versions; they were just not in the top 25 list of one of them. Newly Added Vertices refers to vertices that were added in the next version and are in the high-ranked category.

The most significant changes in the sets happen for clustering coefficients (least number of retained vertices), which indicates once again that the changes involve adding a set of interdependent modules, rather than adding modules separately to different parts of the software. Also, neither betweenness centrality nor indegree or outdegree shows much change in the highest ranked vertices. This indicates that the critical path of the software is probably left unchanged.

TABLE II. SUMMARY STATISTICS OF DIFFERENT VERSIONS OF JHOTDRAW

| | V1 | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|---|
| Vertices | 159 | 177 | 302 | 339 | 528 | 544 |
| Add (Del) | 0 (0) | 18 (0) | 125 (0) | 38 (1) | 190 (1) | 16 (0) |
| Edges | 775 | 832 | 1454 | 1684 | 2136 | 2167 |
| Add (Del) | 0 (0) | 74 (17) | 655 (33) | 256 (26) | 466 (14) | 64 (33) |
| Articulation Pts | 7 | 8 | 26 | 33 | 104 | 105 |
| Add (Del) | 0 (0) | 1 (0) | 18 (0) | 7 (0) | 71 (0) | 1 (0) |
| Indegree+Outdegree vs Betweenness Centrality (Spearman's ρ) | | | | | | |
| ρ | 0.6458 | 0.6380 | 0.7397 | 0.7510 | 0.8375 | 0.8374 |
| p-value | < 0.01 | < 0.01 | < 0.01 | < 0.01 | < 0.01 | < 0.01 |
| Clustering Coefficient vs Betweenness Centrality (Spearman's ρ) | | | | | | |
| ρ | -0.156 | -0.212 | -0.062 | -0.045 | 0.392 | 0.419 |
| p-value | ≈ 0.05 | < 0.01 | > 0.10 | > 0.10 | < 0.01 | < 0.01 |
| Indegree+Outdegree vs Clustering Coefficient (Spearman's ρ) | | | | | | |
| ρ | -0.313 | -0.320 | -0.110 | -0.056 | 0.517 | 0.544 |
| p-value | < 0.01 | < 0.01 | ≈ 0.05 | > 0.10 | < 0.01 | < 0.01 |

C. Identifying Crucial Vertices

We now analyze the composition of each network based on the importance of the vertices. We classify a vertex as High if it ranks within the top 25 in at least one of the following categories: betweenness centrality, indegree, outdegree, or clustering coefficient, or if it is an articulation point. A vertex is marked as Extra High if it satisfies at least two of the listed categories. On the other side of the spectrum, a vertex is termed Low if it has a zero value for any one of the vertex-based properties and is not marked as a High vertex. A vertex is termed Extra Low if it has a zero value for both betweenness centrality and clustering coefficient. High and Extra High vertices represent the most important classes in the network, and Low and Extra Low vertices represent the peripheral classes that do not have any significant impact on the software as a whole. The vertices that do not fall in any of these four categories are marked as Other.

Figure 1(a) shows the distribution of vertices in each category over the six versions. From the bar graphs, we see that V1 and V2 exhibit a similar distribution, as do V3 and V4, and V5 and V6.
This matches our previous findings that the main evolution steps occurred between V2 and V3 and between V4 and V5. Also note that the first two versions (V1 and V2) have the greatest percentage of High vertices, indicating that the earlier stages of the software consisted of developing the more critical classes. In contrast, the last two (V5 and V6) have the highest percentage of Low vertices, which shows that as the software matures, more peripheral classes are added.

TABLE III. CHANGE IN VERTEX-BASED PROPERTIES
(The last four rows of each property give the change in the set of top 25 vertices.)

| Property | Measure | V1-V2 | V2-V3 | V3-V4 | V4-V5 | V5-V6 |
|---|---|---|---|---|---|---|
| Indegree | Value Disruption | .0022 | .0138 | .0025 | .0083 | .0007 |
| Indegree | Rank Disruption | .014 | .252 | .06 | .112 | .016 |
| Indegree | Retained Vertices | 24 | 20 | 20 | 21 | 23 |
| Indegree | Vertices in Vi only | 1 | 5 | 5 | 4 | 2 |
| Indegree | Vertices in Vi+1 only | 1 | 1 | 2 | 3 | 2 |
| Indegree | Newly Added Vertices | 0 | 4 | 3 | 1 | 0 |
| Outdegree | Value Disruption | .0025 | .0213 | .009 | .002 | .002 |
| Outdegree | Rank Disruption | .045 | .292 | .069 | .209 | .009 |
| Outdegree | Retained Vertices | 24 | 17 | 20 | 24 | 24 |
| Outdegree | Vertices in Vi only | 1 | 8 | 5 | 1 | 1 |
| Outdegree | Vertices in Vi+1 only | 1 | 4 | 4 | 1 | 1 |
| Outdegree | Newly Added Vertices | 0 | 4 | 1 | 0 | 0 |
| Betweenness Centrality | Value Disruption | .0004 | .0027 | .0017 | .0107 | .0016 |
| Betweenness Centrality | Rank Disruption | .051 | .286 | .074 | .212 | .012 |
| Betweenness Centrality | Retained Vertices | 24 | 17 | 20 | 17 | 22 |
| Betweenness Centrality | Vertices in Vi only | 1 | 8 | 5 | 8 | 3 |
| Betweenness Centrality | Vertices in Vi+1 only | 1 | 5 | 3 | 7 | 3 |
| Betweenness Centrality | Newly Added Vertices | 0 | 3 | 2 | 1 | 0 |
| Clustering Coefficients | Value Disruption | 0 | .0088 | 0 | .0056 | 0 |
| Clustering Coefficients | Rank Disruption | .078 | .370 | .074 | .157 | .021 |
| Clustering Coefficients | Retained Vertices | 16 | 13 | 21 | 14 | 19 |
| Clustering Coefficients | Vertices in Vi only | 8 | 12 | 3 | 11 | 3 |
| Clustering Coefficients | Vertices in Vi+1 only | 1 | 2 | 3 | 2 | 3 |
| Clustering Coefficients | Newly Added Vertices | 8 | 10 | 0 | 9 | 3 |

Fig. 1. Distribution of vertices according to rank. (a) Percentage breakdown of all vertices in each version. (b) Percentage breakdown of vertices that are common to all versions. The vertices are labeled according to their value of vertex-based properties. The label Only High (Only Low) marks vertices that are only High (Low) but not Extra High (Extra Low).

There are 158 common vertices present in all versions. Of these, 30% retained the same categorization across all versions. Figure 1(b) shows the distribution (vertices that changed categorization were marked Other). We see that 25% remained High or Extra High, indicating that important classes stay important.
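The High / Low categorization of Section III-C can be sketched as follows. This is an illustrative reading of the rules, not the authors' code. It assumes the per-vertex properties have already been computed and are passed in as plain dictionaries. The precedence used when rules overlap (High labels are tested before Low labels), the exclusion of zero values from the top-k sets, the tie-breaking by name, and all identifiers are assumptions. The toy example uses k=1 instead of 25 so that the small input exercises every label.

```python
def top_k(values, k):
    """Vertices holding the k largest positive values (ties broken by name; an assumption)."""
    positive = [v for v in values if values[v] > 0]
    ordered = sorted(positive, key=lambda v: (-values[v], str(v)))
    return set(ordered[:k])


def classify(props, articulation_points, k=25):
    """Label every vertex as Extra High, Only High, Extra Low, Only Low, or Other.

    props maps a property name ("indegree", "outdegree", "betweenness",
    "clustering") to a dict of vertex -> value.
    """
    tops = [top_k(values, k) for values in props.values()]
    vertices = set().union(*(values.keys() for values in props.values()))
    labels = {}
    for v in vertices:
        # Number of "High" criteria met: top-k memberships plus articulation point.
        hits = sum(v in top for top in tops) + (v in articulation_points)
        zero_bc = props["betweenness"].get(v, 0) == 0
        zero_cc = props["clustering"].get(v, 0) == 0
        if hits >= 2:
            labels[v] = "Extra High"
        elif hits == 1:
            labels[v] = "Only High"
        elif zero_bc and zero_cc:
            labels[v] = "Extra Low"
        elif any(values.get(v, 0) == 0 for values in props.values()):
            labels[v] = "Only Low"
        else:
            labels[v] = "Other"
    return labels


# Toy property values for six hypothetical classes.
toy_props = {
    "indegree":    {"A": 3, "B": 1, "C": 0, "D": 1, "E": 2, "F": 0},
    "outdegree":   {"A": 1, "B": 3, "C": 1, "D": 1, "E": 0, "F": 2},
    "betweenness": {"A": 0.5, "B": 0.2, "C": 0.0, "D": 0.1, "E": 0.0, "F": 0.0},
    "clustering":  {"A": 0.3, "B": 0.1, "C": 0.0, "D": 0.2, "E": 0.4, "F": 0.1},
}
toy_labels = classify(toy_props, articulation_points={"A"}, k=1)
for vertex in sorted(toy_labels):
    print(vertex, toy_labels[vertex])
```

Running the classification once per version and counting labels would give the kind of per-version breakdown that Figure 1(a) reports.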

D. Analysis of Newly Added Vertices

The evolution of JHotDraw is primarily driven by the addition of vertices across different versions. Figure 2(b) shows the classification of newly added vertices for each transition. The transition V1-V2 shows a marked increase in high clustering coefficient vertices, which means new highly connected modules have been added. Transitions V2-V3 and V3-V4 show that more zero-valued vertices are added, along with a prevalence of middle-ranked vertices. Transitions to V5 and V6 show a marked increase in vertices with both zero betweenness centrality (at the end of the path) and zero clustering coefficient (neighbors are not connected). These vertices represent peripheral classes in the software.

Fig. 2. Breakdown of newly added vertices according to impact groups. (a) Percentage of new vertices per impact group with respect to the total number of vertices added. (b) Total number of vertices in each group.

Table IV analyzes how the set of newly added vertices (Snew) is connected to the network. The max (mean) degree rows give the maximum (mean) degree (indegree+outdegree) amongst the vertices in Snew. We observe that the maximum degree increases across the versions. This is expected since the total number of nodes also increases. The mean degree, however, remains more or less the same.

TABLE IV. ANALYSIS OF CONNECTIONS OF NEW VERTICES TO THE NETWORK

| Property | V1-V2 | V2-V3 | V3-V4 | V4-V5 | V5-V6 |
|---|---|---|---|---|---|
| Max Degree | 90 | 121 | 137 | 143 | 144 |
| Mean Degree | 27.1 | 28.62 | 34.14 | 24.86 | 21.72 |
| Total Neighbors | 78 | 742 | 259 | 521 | 39 |
| Low Degree Neighbors | 40 (.51) | 385 (.52) | 115 (.44) | 268 (.51) | 14 (.36) |
| High Degree Neighbors | 25 (.32) | 243 (.33) | 88 (.34) | 120 (.23) | 8 (.20) |
| New Vertices | 18 | 125 | 38 | 190 | 16 |
| Connected to New Vertices | 3 (.16) | 23 (.18) | 8 (.21) | 16 (.08) | 13 (.81) |

TABLE V. BUG FREQUENCIES AFTER EACH VERSION

| Interval | Total files changed | Bug fixes | Pct |
|---|---|---|---|
| Post-V1 | 94 | 0 | 0.00% |
| Post-V2 | 176 | 0 | 0.00% |
| Post-V3 | 172 | 38 | 22.09% |
| Post-V4 | 1720 | 120 | 6.98% |
| Post-V5 | 50 | 6 | 12.00% |
| Post-V6 | 89 | 1 | 1.12% |

Row 3, Total Neighbors, lists the total number of vertices connected to Snew. A neighboring vertex is counted multiple times if it has multiple edges to Snew. The neighbors set can also include nodes within Snew. Low (high) degree neighbors lists the number of neighbors with low (high) degree. A vertex with either a small indegree or a small outdegree (≤ 2) is marked as a low degree vertex. A vertex ranked within the top 25 by indegree or outdegree is marked as a high degree vertex. The numbers in parentheses give the fraction of low or high degree neighbors relative to the total number of neighbors. Note that the percentage of low degree neighbors (36% to 52%) is consistently higher than that of high degree neighbors (20% to 34%). This finding matches our hypothesis that most new connections are at the network periphery.

Some newly added vertices are connected only to Snew. The number and fraction (relative to the total number of vertices in Snew) are listed in the last row of Table IV. These vertices represent bulk modular additions. For the transitions to V2, V3, and V4 they represent about 16% to 21% of the new additions; for V6, however, the number goes up to 81%. Only the transition V4-V5 shows a low percentage (8%) of modular additions.
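The neighbor counts discussed for Table IV can be approximated with the sketch below. It is one possible reading of the description, not the authors' procedure. The assumptions are: each edge with an endpoint in Snew contributes the opposite endpoint as one neighbor (so an edge between two new vertices contributes both of its endpoints), degrees are measured in the newer version, ties in the top-k sets are broken by name, and the function and variable names are hypothetical. The toy example uses k=1 rather than 25.

```python
from collections import Counter


def vertices_of(edges):
    """All vertices that appear in a directed edge list."""
    return {v for edge in edges for v in edge}


def top_k(values, k):
    """Vertices holding the k largest values (ties broken by name; an assumption)."""
    ordered = sorted(values, key=lambda v: (-values[v], str(v)))
    return set(ordered[:k])


def analyze_new_vertices(prev_edges, curr_edges, k=25, low_threshold=2):
    """Summarize how vertices added in the newer version attach to the network."""
    everyone = vertices_of(curr_edges)
    new = everyone - vertices_of(prev_edges)  # S_new
    indeg = Counter(target for _, target in curr_edges)
    outdeg = Counter(source for source, _ in curr_edges)
    high = (top_k({v: indeg[v] for v in everyone}, k)
            | top_k({v: outdeg[v] for v in everyone}, k))

    neighbors = []                       # multiset: one entry per edge touching S_new
    adjacent = {v: set() for v in new}   # distinct neighbors of each new vertex
    for source, target in curr_edges:
        if source in new:
            neighbors.append(target)
            adjacent[source].add(target)
        if target in new:
            neighbors.append(source)
            adjacent[target].add(source)

    low_count = sum(1 for n in neighbors
                    if indeg[n] <= low_threshold or outdeg[n] <= low_threshold)
    high_count = sum(1 for n in neighbors if n in high)
    only_new = {v for v in new if adjacent[v] and adjacent[v] <= new}
    return {
        "new_vertices": new,
        "total_neighbors": len(neighbors),
        "low_degree_neighbors": low_count,
        "high_degree_neighbors": high_count,
        "connected_only_to_new": only_new,
    }


# Toy data only: D, E and F are added in the newer version.
old_edges = [("A", "B"), ("B", "C")]
new_edges = old_edges + [("D", "A"), ("D", "E"), ("E", "F"), ("F", "E")]
summary = analyze_new_vertices(old_edges, new_edges, k=1)
for key, value in summary.items():
    print(key, value)
```

In this toy network every vertex has indegree or outdegree of at most 2, so all neighbors count as low degree; a low-degree and a high-degree label are not mutually exclusive under the definitions given in the text.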

E. Impact on Quality

We computed the percentage of files involved in subsequent bug fixes during the intervals between versions. We looked for changes that have the keywords “bug fix” in the change log. The number of files involved in each revision is counted and summed for the period between versions. Table V shows that the period after V3 has the highest percentage of bug fixes, followed by the period after V5. These intervals with high percentages of bug fixes follow the periods with the highest measures of disruption (V2-V3 and V4-V5). Further studies will be conducted to determine whether the bug incidence was due to the perfective change or to prior accumulated changes.

IV. RELATED WORK

Disruption from large-scale changes can be identified in other ways, such as visual inspection [5], growth in size [6], or relative increase in average path length [7]. These depend on overall growth of the code, e.g., size, number of edges, etc. We complement these approaches by looking at the change in ordering of vertex properties independent of overall growth. A shortcoming of our approach is that it can be oversensitive when there are several highly ranked vertices with similar values.

V. CONCLUSION

We have applied different combinatorial, or graph-theory-based, metrics to quantify and evaluate the disruption between versions. Based on these metrics, our observations on JHotDraw can be summarized as follows:

1) In almost all metrics, the significant evolutionary changes occur between V2-V3 and V4-V5. This was true in terms of the growth in vertices and edges, the extent of rank and value disruption, the change in clustering coefficients, and the distribution of important vertices.
2) The network has grown cumulatively. The number of vertices and edges has steadily increased with almost no deletion of vertices. Newer vertices tend to get added to the peripheries of the network.
3) The top 25 ranking of vertices was generally stable across versions. Important nodes stay important. This indicates stability in the design. While large changes culminating in perfective maintenance were carried out, these did not change the basic network structure.
4) The bug frequency is higher after V3 and V5. The degree of disruption can help explain why bug incidence increases. This will be examined in the future.

Dynamic network analysis can be applied to any long-lived software system to gain insight into system evolution. As ongoing work, we are looking into additional algorithmic approaches for quantifying disruption across software versions and applying dynamic network analysis to larger projects.

ACKNOWLEDGMENT

This research was funded in part by a Nebraska EPSCoR First Award grant and by the College of IS&T, University of Nebraska at Omaha.

REFERENCES

[1] N. Belov, M. Martin, J. Patti, J. Reminga, A. Pawlowski, and K. Carley, “Dynamic networks: Rapid assessment of changing scenarios,” 2009.
[2] C. C. Bilgin and B. Yener, “Dynamic network evolution: Models, clustering, anomaly detection,” 2010.
[3] I. Thapa and H. Siy, “Assessing the impact of refactoring activities on the JHotDraw project,” in ACM Symp. on Applied Computing, 2010.
[4] E. B. Swanson, “The dimensions of maintenance,” in Intl. Conference on Software Engineering (ICSE ’76), 1976, pp. 492–497.
[5] J. Wu, R. Holt, and A. Hassan, “Exploring software evolution using spectrographs,” in 11th Working Conf. on Reverse Engineering, 2004.
[6] M. Aoyama, “Metrics and analysis of software architecture evolution with discontinuity,” in Intl. Wksp. on Principles of Software Evolution, 2002.
[7] L. Wang, Z. Wang, C. Yang, L. Zhang, and Q. Ye, “Linux kernels as complex networks: A novel method to study evolution,” in Intl. Conf. on Software Maintenance (ICSM ’09), 2009.
