2014 IEEE International Conference on Software Maintenance and Evolution
How Does Exception Handling Behavior Evolve? An Exploratory Study in Java and C# Applications
Nelio Cacho∗, Eiji Adachi Barbosa†, Juliana Araujo∗, Frederico Pranto∗, Alessandro Garcia†, Thiago Cesar∗, Eliezio Soares∗, Arthur Cassio∗, Thomas Filipe∗ and Israel Garcia∗
∗ Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal, Brazil. Email: [email protected]
† Informatics Department, Pontifical Catholic University of Rio de Janeiro - PUC-Rio, Rio de Janeiro, Brazil. Email: [email protected], [email protected]
Abstract—Exception handling mechanisms (EHMs) were conceived as a means to improve the maintainability and reliability of programs that have to deal with exceptional situations. We classify the different implementations of built-in EHMs into two main categories: reliability-driven and maintenance-driven. Some programming languages, such as Java, provide built-in reliability-driven EHMs. Maintenance-driven EHMs, on the other hand, promote software maintainability by not forcing developers to specify exception handling constraints. Most modern languages, such as C#, Ruby, Python and many others, support this approach. Developers usually have to choose between maintenance-driven and reliability-driven approaches to structure exception handling in their applications. However, there is still little empirical knowledge about the impact that adopting these mechanisms has on software robustness and maintenance. This paper addressed this gap by conducting an empirical study aimed at understanding the relationship between changes in Java and C# programs and their robustness. In particular, we evaluated how changes in the normal and exceptional code were related to exception handling faults. We applied a change impact analysis and a control flow analysis in 116 versions of 16 C# programs and 112 versions of 16 Java programs.
I.
to specify in each method interface a list of exceptions that might be raised during its execution and, therefore, should be handled by client methods. Maintenance-driven EHMs, on the other hand, promote software maintainability by not forcing developers to specify exception handling constraints. In this manner, developers are free of the burden of specifying exceptional interfaces. Most modern languages, such as C#, Ruby, Python and many others, support this approach. However, there is little empirical knowledge on the impact of these two categories of mechanisms on the evolution and robustness of software programs. There is no study in the literature contrasting how developers evolve their exception handling with these different mechanisms. Their different side effects on the evolvability and robustness of existing software projects are actually unknown. There is also no comparison of fault-prone programming scenarios that arise when changing the code under reliability-driven and maintenance-driven exception handling facilities. Existing empirical studies only focus on comparing the types of exception handling policies found in a single version of existing software projects [3]. This paper addressed this gap by conducting an empirical study aimed at understanding the relationship between the evolution of Java and C# programs and their robustness. In particular, we focus on the following research questions in conducting the exploratory study:
INTRODUCTION
Exception handling mechanisms were conceived as a means to improve evolvability of programs that have to deal with exceptional situations [6], [20]. Almost all modern programming languages provide exception handling mechanisms for facilitating the development of robust and evolvable programs. All these mechanisms offer similar facilities to modularly evolve exception handling code in several ways [15], including: (i) the representation of new erroneous situations as new exception types, (ii) removal and addition of handlers (or exceptional code) for existing or new exception types, (iii) definition of code blocks (normal code) as protected regions for handling exception occurrences, (iv) new associations of these protected regions with exception handlers, and (v) changes to the exception handling policies in the exception handlers.
RQ1 RQ2
We focused on studying Java and C# as representatives of reliability-driven and maintenance-driven mechanisms for exception handling. In particular, we evaluated how changes in the normal and exceptional code were related to exception handling faults. We applied a change impact analysis and a control flow analysis in 116 versions of 16 C# programs and 112 versions of 16 Java programs.
However, existing exception handling mechanisms also have different characteristics. We classify them in two main categories: reliability-driven and maintenance-driven. Reliability-driven EHMs promote software reliability by means of the explicit specification of exception handling constraints. Some programming languages, such as Java [2] and Eiffel [21], provide mechanisms that enforce software developers 1063-6773/14 $31.00 © 2014 IEEE DOI 10.1109/ICSME.2014.25
RQ1: How does exception handling code evolve in Java and C# programs?
RQ2: How does robustness evolve in Java and C# programs?
The following findings were observed: (1) addition and modification of normal code tend to require many more modifications in the exceptional code of Java programs than in C# programs, (2) these extra changes in Java were often related to the need to change the exception interfaces of method signatures, (3) addition of new protected regions occurred very
a general exception or the program is terminated. When no handler is found, it is said that the exception was uncaught.
often in C# projects, but they are much more often associated with generic or empty exception handlers – i.e. simple and ineffective exception handlers, (4) differently from what is claimed or expected [27], [24], the number of empty handlers was very low or nonexistent in Java projects, (5) the introduction of more specific handlers was much more common in Java programs, and (6) the robustness was much higher in Java programs than in C# projects – e.g. the number of uncaught exceptions was much higher in C# projects. We also compared the most common fault-prone change scenarios in Java and C#. The remainder of this paper is organized as follows. Section II describes basic concepts and the characteristics of exception handling mechanisms. Section III presents the experimental procedures adopted in this study. Section IV presents the results and their analyses. Section V presents the threats to validity. Section VI discusses related work. Section VII provides some concluding remarks. II.
B. Classification of EHMs
We classify the different implementations of built-in EHMs into two main categories: reliability-driven and maintenance-driven. Reliability-driven EHMs promote software reliability by means of the explicit specification of exception handling constraints. These mechanisms require software developers to specify in each method interface a list of exceptions that might be raised during its execution and, therefore, should be handled by client methods. The list of exceptions that might be raised by a method is called its exceptional interface. In this manner, it is possible to perform automatic reliability checks to verify whether software modules adhere to the specified constraints. Such reliability checks are expected to prevent software developers from introducing faults in the exception handling code. Java, for instance, offers a reserved keyword (throws) to define the exceptional interfaces of modules and separates exceptions into two types: checked and unchecked. The compiler forces checked exceptions to be either associated with a handler or explicitly declared in the exceptional interface. If a consumer method neither handles the exceptions specified in the exceptional interface of a provider method nor specifies them in its own exceptional interface, compilation errors occur. Unchecked exceptions, on the other hand, do not require the programmer to associate a handler with them or to specify them in the exceptional interface.
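The following Java sketch illustrates the reliability-driven mechanism described above. It is only an illustration: the class and method names (ExceptionalInterfaceDemo, readSetting, readOrDefault, readStrict, parsePort) are hypothetical and do not come from any subject program. Only the throws keyword and the checked/unchecked distinction are taken from the Java language.

```java
import java.io.IOException;

// Hypothetical example: all class and method names are illustrative only.
class ExceptionalInterfaceDemo {

    // Provider method. IOException is a checked exception, so it must be
    // declared in the exceptional interface (throws clause) or handled here.
    static String readSetting(String key) throws IOException {
        if (key.isEmpty()) {
            throw new IOException("empty key");
        }
        return "value-of-" + key;
    }

    // Consumer, option 1: associate a handler with the checked exception.
    static String readOrDefault(String key, String fallback) {
        try {
            return readSetting(key);
        } catch (IOException e) {
            return fallback;
        }
    }

    // Consumer, option 2: declare the exception in its own exceptional interface.
    // Removing "throws IOException" here would be a compilation error.
    static String readStrict(String key) throws IOException {
        return readSetting(key);
    }

    // NumberFormatException is unchecked: the compiler requires neither a
    // handler nor a throws clause.
    static int parsePort(String text) {
        return Integer.parseInt(text);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readOrDefault("", "default"));
        System.out.println(readStrict("host"));
        System.out.println(parsePort("8080"));
    }
}
```

In this sketch, every caller of readSetting is forced by the compiler to choose between the two consumer options, whereas callers of parsePort receive no such reminder.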
EXCEPTION HANDLING
Exception handling mechanisms [7], [15], [23] are the most common models to cope with errors in the development of software systems. Their basic concepts and a classification are presented in the next subsections.
A. Basic Concepts
Although EHM implementations vary from language to language, they are typically grounded on the try/catch constructs, as depicted by the following general structure:
try S catch (E1 x) H1
Maintenance-driven EHMs. Maintenance-driven EHMs, alternatively, promote software maintainability by not forcing developers to specify exception handling constraints. In this manner, developers are free of the burden of specifying exceptional interfaces. Most mainstream programming languages adopt maintenance-driven EHMs. In fact, amongst the top 10 most used programming languages¹, eight languages (C#, C++, Objective-C, PHP, (Visual) Basic, Python, JavaScript and Ruby) provide maintenance-driven EHMs, whereas one language (C) does not provide a built-in EHM and one language (Java) provides a reliability-driven EHM. The built-in EHM provided by C# is representative of the maintenance-driven category, since it does not support the specification of exceptional interfaces in method signatures; hence, methods can throw any exception without making this explicit in their signature.
catch (E2 x) H2
The try block delimits a set S, which is a set of statements that are protected from the occurrence of exceptions. The try block defines the normal code, and it is associated with a list of catch blocks. Each catch block defines a set Hn, which is a set of statements that implement the handling actions responsible for coping with an exception. The set Hn consists of the exceptional code, and it is commonly called the exception handler. Each catch block has an argument En, which is an exception type. These arguments are filters that define what types of exceptions each catch block can handle. When a catch block defines as argument an exception type E1, it can handle exceptions of the E1 type, and also exceptions that inherit from E1. When a catch block handles an exception that is a subtype of its argument, it is said that this exception was handled by subsumption.
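As a small illustration of handling by subsumption and of the order in which catch blocks are matched, consider the Java sketch below. The class and method names (SubsumptionDemo, handledBySupertype, firstMatch) are hypothetical. The exception types are standard Java classes, where FileNotFoundException is a subtype of IOException.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical example: class and method names are illustrative only.
class SubsumptionDemo {

    // A single catch block with argument IOException also handles subtypes
    // such as FileNotFoundException (handling by subsumption).
    static String handledBySupertype(IOException raised) {
        try {
            throw raised;
        } catch (IOException e) {
            return "IOException handler caught " + e.getClass().getSimpleName();
        }
    }

    // With several catch blocks, the first one whose argument matches the
    // type of the raised exception is executed.
    static String firstMatch(IOException raised) {
        try {
            throw raised;
        } catch (FileNotFoundException e) {
            return "H1";
        } catch (IOException e) {
            return "H2";
        }
    }

    public static void main(String[] args) {
        System.out.println(handledBySupertype(new FileNotFoundException("missing")));
        System.out.println(firstMatch(new FileNotFoundException("missing")));
        System.out.println(firstMatch(new IOException("generic I/O failure")));
    }
}
```

Here firstMatch returns H1 for a FileNotFoundException and H2 for any other IOException, while handledBySupertype handles both through the same handler.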
III.
Handler Search. If an exception is thrown in the context of the set of protected statements S, the EHM searches at runtime for a proper handler. The search takes into account the list of catch blocks statically attached to the enclosing try block. The type of the exception thrown is compared with the exception types declared as arguments in each catch block. For the first matching pattern, the exception handler of that catch block is executed. If no matching pattern is found, the exception propagates up the call stack until a matching handler is found. If no handler is found in the whole call stack, the exception handling mechanism either propagates
METHODOLOGY
Our exploratory study on the evolution of exceptional behavior comprises three stages: (i) in the first stage, we selected the sample of subject programs (Section III-A); (ii) then we defined the variables of the study and the suite of metrics to be computed from the source code and binaries of the subject programs (Section III-B); and (iii) finally, we used the gathered metrics to perform statistical analysis (Section IV).
¹ http://www.tiobe.com/index.php/content/paperinfo/tpci/
A. Sample
could also observe that the behavior of exception handlers significantly varied in terms of their purpose [3], ranging from error logging to application-specific recovery actions (e.g., rollback). Table I presents our final sample of subject programs.
We selected our sample of subject programs based on the sample of Java and C# programs used in a previous study performed by Cabral and Marques [3]. We opted to select our subject programs from their study because their sample covers a wide spectrum of how reliability-driven and maintenance-driven exception handling is typically used in real software development environments. We divided our subject programs into the same categories used by Cabral and Marques [3]: (i) Libraries: software libraries providing a specific application-domain API; (ii) Applications running on server (ServerApps): software applications running on server programs; (iii) Servers: server programs; and (iv) Stand-alone applications: desktop programs.
B. Variables and Metrics
The selected variables are the metrics quantifying some characteristics of both normal and exceptional code. We chose different metrics to capture changes in the normal and the exceptional code during software evolution, and also metrics that quantify the robustness of a program. Therefore, the variables of interest of this study encompass a suite of metrics classified into two categories: robustness metrics and change metrics. The following subsections describe each suite of metrics and how they were computed.
TABLE I: Subject Programs
| Category | C# | Java |
|---|---|---|
| Libraries | Direct Show Lib | JoSQL |
| Libraries | Dot Net Zip (ZIP) | JUnique |
| Libraries | Report Net | Kasai |
| Libraries | SmartIRC4Net | Thought River Commons |
| Server-Apps | Blog Engine (Admin) | Activity Manager |
| Server-Apps | MVC Music Store | Open Project |
| Server-Apps | PhotoRoom | Open WorkBench |
| Server-Apps | Sharp WebMail | XPlanner |
| Servers | Neat Upload | Apache Tomcat |
| Servers | RnWood SMTP Server | JBoss(core) |
| Servers | Super Socket Server (SocketEngine) | JCGrid |
| Servers | Super Web Socket | Winstone(src) |
| Stand-Alone | Asc Generator | Columba |
| Stand-Alone | CircuitDiagram (EComponent) | Compiere |
| Stand-Alone | NUnit (NUnitCore.Core) | Eclipse (core.resources) |
| Stand-Alone | Sharp Developer (Main Core) | JFTP |
1) Robustness Metrics: Robustness is the ability of a program to properly cope with errors during its execution [17]. If a handler is not defined or not correctly bound to an exception, program robustness decreases. To measure software robustness, we followed the typical approach adopted in the community, in which exception flow information is used as an indicator of robustness [25], [12], [10], [18]. An exception flow is a path in a program call graph that links the method where the exception was raised to the method where the exception was handled. If there is no handler for a specific exception, the exception flow starts in the method where the exception was raised and finishes at the program entry point. In our analysis, we used the Uncaught Flow metric, as it is often used as an indicator of software robustness [25], [12], [10], [18]. The Uncaught Flow metric counts the number of exception flows that leave the bounds of the system without being handled. It is therefore the key indicator of a fault in exception handling behavior and often represents lower robustness. Uncaught exceptions terminate the execution of a program; hence, the higher the number of uncaught exception flows, the lower the system's robustness. In order to assess the relative change in the number of uncaught flows between pairs of versions in the Java and C# programs, we also computed the metric RelativeUncaughtFlow.
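To make the notion of an uncaught exception flow more concrete, the Java sketch below counts uncaught flows on a toy call graph. It is a deliberately simplified illustration and not the analysis implemented by eFlowMining: it assumes that each method has a single caller (so the call graph is an acyclic tree), that handlers match exception types by exact name rather than by subsumption, and that a handler declared in a method protects every raise site reachable from it. All names (UncaughtFlowSketch, addCall, addHandler, isUncaught, countUncaught) are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of exception flows; not the eFlowMining algorithm.
class UncaughtFlowSketch {
    // callerOf.get(m) is the single caller of m; the entry point has no caller.
    private final Map<String, String> callerOf = new HashMap<>();
    // handledIn.get(m) holds the names of the exception types caught inside m.
    private final Map<String, Set<String>> handledIn = new HashMap<>();

    void addCall(String caller, String callee) {
        callerOf.put(callee, caller);
    }

    void addHandler(String method, String exceptionType) {
        handledIn.computeIfAbsent(method, k -> new HashSet<>()).add(exceptionType);
    }

    // Walks from the raising method up to the entry point. The flow is
    // uncaught when no method on that path handles the exception type.
    boolean isUncaught(String raisingMethod, String exceptionType) {
        for (String m = raisingMethod; m != null; m = callerOf.get(m)) {
            Set<String> types = handledIn.get(m);
            if (types != null && types.contains(exceptionType)) {
                return false;
            }
        }
        return true;
    }

    // Each entry of raisedExceptions is a pair {raising method, exception type}.
    int countUncaught(String[][] raisedExceptions) {
        int uncaught = 0;
        for (String[] raised : raisedExceptions) {
            if (isUncaught(raised[0], raised[1])) {
                uncaught++;
            }
        }
        return uncaught;
    }

    public static void main(String[] args) {
        UncaughtFlowSketch graph = new UncaughtFlowSketch();
        graph.addCall("main", "loadConfig");
        graph.addCall("loadConfig", "readFile");
        graph.addHandler("loadConfig", "IOException");
        String[][] raised = {
            {"readFile", "IOException"},           // handled in loadConfig
            {"readFile", "IllegalStateException"}  // reaches the entry point unhandled
        };
        System.out.println(graph.countUncaught(raised)); // prints 1
    }
}
```

In this toy program, one of the two exception flows leaves the bounds of the system, so the Uncaught Flow value under these assumptions is 1.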
In our sample, we tried to include the maximum number of Java and C# programs used in the study of Cabral and Marques. However, since their study analyzed only one version of each subject program, whereas we were interested in analyzing programs during their evolution, we had to discard those programs that had neither more than one version available to download nor a public version control system. In such cases, we replaced the discarded program with another program from the same category. Our final sample comprised a set of 32 open-source Java and C# programs. We analyzed 99 pairs of versions for C# programs and 96 pairs of versions for Java programs. Overall, this corresponds to 4,977,111 lines of source code, of which 240,588 are dedicated to exception handling. For this work, we examined and processed 44,958 catch blocks. Even though we had to perform some replacements in the original sample of the Cabral and Marques study, our sample was also particularly diverse in the way exception handling is employed. For instance, we could find almost all the categories of exception handlers in terms of their structure [11], including nested exception handlers, masking handlers, context-dependent handlers and context-affecting handlers. We
We employed the eFlowMining [13] tool to collect the robustness metrics. eFlowMining is a multi-language static analysis tool that uses the approach proposed in [12] to perform an inter-procedural and intra-procedural dataflow analysis. The tool identifies the exception flows and computes the Uncaught Flow metric. As in [3], we ignored exception flows created by exceptions that are not normally related to the program logic (e.g., System.MissingMethodException). The full list of exceptions not considered in our analysis is detailed in [3].
2) Change Metrics: In order to quantify the changes in both normal and exceptional code, we collected a suite of typical change impact metrics [29]. Change impact metrics count at the program level the number of elements added, changed or removed between two versions of a given system. The elements considered in this study range from more coarse-grained elements, such as classes and methods, to more fine-grained elements, such as try and catch blocks. In our
changed and two are added. Hence, the value of InterfaceTypeChanged is 1 and the value of InterfaceTypeAdded is 2.
Fig. 1: Collecting change impact measures: An example
study, we considered exceptional code only the fragments of code that contained the following instructions: (i) the throws clause, which defines an exception interface; and (ii) the catch block, which handles an exception.
• ClassChurned: It counts the number of classes that have an exception interface or catch block added/changed/removed between a pair of versions. In the example presented in Figure 1, catch blocks and interfaces were added and changed in the same class, so the value of ClassChurned is 1.
• MethodChurned: It counts the number of methods that have an interface or catch block added/changed/removed between a pair of versions. In the example presented in Figure 1, interfaces and catch blocks were added and changed in two different methods, so the value of MethodChurned is 2.
• NormalChurnedLOC: It counts the sum of the added and changed lines of normal code for a pair of versions. In the example presented in Figure 1, one line of normal code is changed (within the first try block) and six lines are added. Hence, the value of NormalChurnedLOC is 7.
• EHChurnedLOC: It counts the number of lines of exceptional code added and changed between a pair of versions. In the example presented in Figure 1, four lines of exceptional code are added and two lines are changed. Hence, the value of EHChurnedLOC is 6.
• CatchBlockAdded: It counts the number of catch blocks added between a pair of versions. In the example presented in Figure 1, one catch block is added. Hence, the value of CatchBlockAdded is 1.
• InterfaceAdded, InterfaceChanged, InterfaceRemoved: They count, respectively, the number of methods that added, changed or removed exceptional interfaces between a pair of versions. In the example presented in Figure 1, one interface is added and one is changed. Hence, the value of InterfaceAdded is 1 and the value of InterfaceChanged is 1.
Our main goal during the change impact analysis was to gather deeper knowledge about the recurrent change scenarios observed and their impact on software robustness. Therefore, we had to not only compute the change metrics, but also describe each change scenario observed and assess its impact on software robustness. Since some of this information is inherently qualitative, we could neither rely on an existing static analysis tool to automatically extract it from the source code nor implement a tool of our own. We therefore had to manually inspect the source code in order to textually describe the change scenarios and assess the impact of the observed changes on software robustness. These tasks were performed by a group of seven master's students, a PhD student and two senior researchers. Each master's student manually inspected the source code of the versions of a given subject program and simultaneously: (i) computed the change metrics, (ii) textually described the observed change scenarios and (iii) assessed the impact of the changes on software robustness by means of an exception flow analysis. The data produced by the master's students were reviewed by the PhD student and the senior researchers. Further clarifications, modifications or improvements were made by the master's students when divergences were identified by the reviewers. Whenever necessary, the students held an open discussion with the reviewers to resolve any conflict and reach a consensus on the data produced. Although this manual inspection process did not allow us to scale our study to a larger sample, it provided us with more accurate data and a better understanding of the change scenarios observed. Moreover, it also allowed us to identify and categorize change scenarios that improved or deteriorated the robustness of the subject programs. Next, each task is described in more detail.
• InterfaceTypeAdded, InterfaceTypeChanged, InterfaceTypeRemoved: They count, respectively, the number of exception types that were added, changed or removed from exceptional interfaces. In the example presented in Figure 1, one exception type is
Change metrics computation and textual description. The change impact metrics were manually computed with the aid of the DiffMerge tool. The DiffMerge tool performs a visual side-by-side comparison of two folders, showing which files are present in only one folder, as well as file pairs (two
The change metrics are computed based on the difference between the source code of a baseline version and the source code of a subsequent version. The definition of each change metric used in this study is presented next. Along with the definitions, there is also an example for each change metric. The examples are based on the code snippet presented in Figure 1. This figure shows two versions of a class, named C1, where the version N-1 (on the left) is the baseline version of C1 and the version N (on the right) is the subsequent version of C1.
TABLE II: Change Scenarios and Corresponding Descriptions
| Group | Scenario | Description |
|---|---|---|
| Changes to Exceptional Code | Generic Handler added | A catch block which catches a generic exception is added |
| Changes to Exceptional Code | Empty Generic Handler added | An empty catch block is added to handle a generic exception |
| Changes to Exceptional Code | Generic Handler added to rethrow exception | A catch block which catches a generic exception is added to construct a new exception which is thrown |
| Changes to Exceptional Code | Generic Handler removed | A catch block which catches a generic exception is removed |
| Changes to Exceptional Code | Specialized Handler added | A catch block which catches a specific exception is added |
| Changes to Exceptional Code | Empty Specialized Handler added | An empty catch block is added to handle a specific exception |
| Changes to Exceptional Code | Specialized Handler added to rethrow exception | A specific handler is added to construct a new exception which is thrown |
| Changes to Exceptional Code | Specialized Handler removed | A specialized handler is removed |
| Changes to Exceptional Code | Changing the Exception Handling Policy | Changes how a specific exception type is handled |
| Changes to Exceptional Code | Changing the catch block to use normal code | The catch block needs to change its code in order to use some references of the normal code |
| Changes to Exceptional Code | No changes to catch blocks | The try block changes but nothing changes in the catch block |
| Changes to Normal Code | Try block added to existing method | A try block is added to an existing method declaration |
| Changes to Normal Code | New method with try block added | A new method declaration with a try block is added |
| Changes to Normal Code | New method invocation added | A new method invocation is added within a try block |
| Changes to Normal Code | Variables modified | A variable within a try block is modified |
| Changes to Normal Code | Control flow modified | A control flow statement is modified within a try block |
| Changes to Normal Code | Try block removed | A try block is removed from a method declaration |
| Changes to Normal Code | Method with try block removed | A method declaration with a try block is removed from the project |
| Changes to Normal Code | No changes to try block | The catch block changes but nothing changes in the try block |
files with the same name but in different folders) that are identical or different. For file pairs with differences, the tool also graphically shows the changes between the two files. The DiffMerge tool was used to compare the project folders of two subsequent versions of a given subject program. For each file pair with differences, the master's students manually computed the change impact metrics by inspecting the differences pinpointed by the tool. For each change scenario analyzed, the students also textually described the observed changes that occurred within try blocks and catch blocks. For instance, one of the students described one of the analyzed change scenarios as follows: “It was observed a new IF statement added to the try block and a new method invocation added to the catch block”. In a further step, the main researcher applied a coding technique to the textual descriptions of each change scenario in order to extract categories of change scenarios, divided into categories of changes in the normal code and categories of changes in the exceptional code. The result of this categorization is presented in Table II.
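To illustrate some of the handler categories listed in Table II, the Java sketch below shows four small methods, each ending up in a different scenario. The code is hypothetical and is not taken from any subject program. It also assumes that a "generic" handler is one whose argument is java.lang.Exception, which is one plausible reading of the category.

```java
import java.io.IOException;

// Hypothetical example: names are illustrative and not taken from the subject programs.
class HandlerScenarios {

    // Normal code shared by the scenarios: may raise a checked IOException
    // or an unchecked NumberFormatException.
    static int parse(String text) throws IOException {
        if (text == null) {
            throw new IOException("no input");
        }
        return Integer.parseInt(text);
    }

    // "Generic Handler added": the catch argument is the generic type Exception.
    static int genericHandler(String text) {
        try {
            return parse(text);
        } catch (Exception e) {
            return -1;
        }
    }

    // "Empty Generic Handler added": the exception is silently swallowed.
    static int emptyGenericHandler(String text) {
        int result = 0;
        try {
            result = parse(text);
        } catch (Exception e) {
        }
        return result;
    }

    // "Generic Handler added to rethrow exception": a new exception is
    // constructed from the caught one and thrown.
    static int genericRethrow(String text) {
        try {
            return parse(text);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // "Specialized Handler added": only IOException is handled, so a
    // NumberFormatException would still propagate to the caller.
    static int specializedHandler(String text) {
        try {
            return parse(text);
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(genericHandler("abc"));
        System.out.println(emptyGenericHandler(null));
        System.out.println(specializedHandler(null));
    }
}
```

The empty handler hides the failure behind a default value, whereas the specialized handler makes explicit which exceptional situation the method is prepared to handle.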
In this manner, by combining the change metrics values, the categories of change scenarios extracted from the textual descriptions and the exception flow analysis, we were able to better understand which categories of change scenarios actually improved or deteriorated the robustness of the subject programs. In particular, the manual inspection process allowed us to systematically discover the scenarios that were more prone to generate uncaught exception flows. IV.
DATA ANALYSIS AND DISCUSSION
This section performs exploratory and statistical analyses of the variables presented in Section III-B. We aim at identifying programming language properties and mechanisms that tend to support or neglect the evolution of exception handling behavior. Section IV-A answers the first research question (RQ1) by explaining how the exception handling code evolves. Section IV-B answers RQ2 by showing how robustness evolves in Java and C# programs. For the statistical tests performed in this section, we used the statistical package SPSS. We assume the commonly used confidence level of 95%² (that is, p-value threshold = 0.05). Spearman's rank correlation test is used to identify highly correlated metrics. This test is used because the metrics in our analysis are nonparametric. For evaluating the results of the correlation tests, we adopted the Hopkins criteria to judge the goodness of a correlation coefficient [16]: < 0.1 means trivial, 0.1-0.3 means minor, 0.3-0.5 means moderate, 0.5-0.7 means large, 0.7-0.9 means very large, and 0.9-1 means almost perfect.
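The statistical procedure described above can be sketched in Java as follows. This is only an illustration of Spearman's rank correlation (computed as the Pearson correlation of average ranks) and of the Hopkins scale; the study itself used SPSS. The class and method names are hypothetical, and assigning each boundary value (0.1, 0.3, and so on) to the higher category is an assumption, since the criteria as quoted do not say which side the boundaries fall on.

```java
import java.util.Arrays;

// Hypothetical sketch; the study used SPSS, not this code.
class SpearmanSketch {

    // Returns 1-based ranks; tied values share the average of their positions.
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] rank = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) {
                j++;
            }
            double average = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                rank[order[k]] = average;
            }
            i = j + 1;
        }
        return rank;
    }

    // Pearson correlation coefficient of two equally sized samples.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Spearman's rho: Pearson correlation applied to the ranks.
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    // Hopkins scale as quoted in the text; boundary handling is an assumption.
    static String hopkinsLabel(double rho) {
        double magnitude = Math.abs(rho);
        if (magnitude < 0.1) return "trivial";
        if (magnitude < 0.3) return "minor";
        if (magnitude < 0.5) return "moderate";
        if (magnitude < 0.7) return "large";
        if (magnitude < 0.9) return "very large";
        return "almost perfect";
    }

    public static void main(String[] args) {
        double[] normalChurn = {10, 25, 40, 80, 200}; // made-up values
        double[] ehChurn = {1, 3, 2, 9, 30};          // made-up values
        double rho = spearman(normalChurn, ehChurn);
        System.out.println(rho + " -> " + hopkinsLabel(rho));
    }
}
```

Because only ranks enter the computation, a monotonic but nonlinear relationship between two metrics still yields a coefficient of 1. The input values in main are invented for the example and are not data from the study.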
Exception flow analysis. Simultaneously with the computation of the change metrics, the students also performed an exception flow analysis for each change scenario analyzed. The exception flow analysis was performed with the aid of the eFlowMining tool. For each pair of subsequent versions analyzed, eFlowMining computed the difference between the number of uncaught exception flows observed for each method. The difference was computed as:
$\Delta \mathit{Uncaught} = \mathit{Uncaught}_{\mathit{Subsequent}} - \mathit{Uncaught}_{\mathit{Baseline}}$
If the value of the difference was higher than zero, then the number of uncaught exception flows increased during the evolution; therefore, the software robustness decreased in the given change scenario. On the other hand, if the value of the difference was lower than zero, then the number of uncaught exception flows decreased during the evolution; therefore, the software robustness increased in the given change scenario. Finally, if the value of the difference was equal to zero, then no changes were observed in the number of uncaught exception flows; therefore, the change scenarios observed had no impact on software robustness.
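The classification rule above is simple enough to be written down directly. The Java sketch below is a minimal illustration; the class, enum and method names are hypothetical, and the inputs are the uncaught flow counts of the baseline and subsequent versions.

```java
// Hypothetical sketch of the classification rule; names are illustrative only.
class DeltaUncaught {

    enum Impact { ROBUSTNESS_DECREASED, ROBUSTNESS_INCREASED, NO_IMPACT }

    // Difference between the uncaught flows of the subsequent and baseline versions.
    static int delta(int uncaughtBaseline, int uncaughtSubsequent) {
        return uncaughtSubsequent - uncaughtBaseline;
    }

    // More uncaught flows means lower robustness, fewer means higher robustness.
    static Impact classify(int uncaughtBaseline, int uncaughtSubsequent) {
        int difference = delta(uncaughtBaseline, uncaughtSubsequent);
        if (difference > 0) {
            return Impact.ROBUSTNESS_DECREASED;
        }
        if (difference < 0) {
            return Impact.ROBUSTNESS_INCREASED;
        }
        return Impact.NO_IMPACT;
    }

    public static void main(String[] args) {
        System.out.println(classify(3, 5)); // two more uncaught flows
        System.out.println(classify(5, 3)); // two fewer uncaught flows
        System.out.println(classify(4, 4)); // unchanged
    }
}
```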
A. RQ1: Evolution of Exception Handling Code
As a starting point to answer RQ1, we correlated the metrics NormalChurnedLOC, EHChurnedLOC, ClassChurned and MethodChurned for both Java and C# programs. The values shown in Table III depict the Spearman correlation coefficients quantified for these metrics.
² Correlations at the 95% level are marked with * in Table III, while those at the 99% level are marked with **
11% of the scenarios were related to the Try block added to existing method scenario. The New method added with try block scenario usually co-occurred with scenarios involving the addition of generic handlers: 64.6% of Generic Handler added and 16.5% of Empty Generic Handler added. Similarly, the Try block added to existing method scenario also frequently co-occurred with scenarios involving the addition of generic handlers: 19.5% of Generic Handler added, 24.4% of Empty Generic Handler added and 34.1% of Generic Handler added to rethrow. For the Java group, the addition of try blocks was a bit less common, accounting for approximately 32.3% of the total change scenarios of normal code, where 27.2% were related to the New method added with try block scenario and 5.1% were related to the Try block added to existing method scenario. Similarly to the C# group, for the Java group the New method added with try block scenario usually co-occurred with the Generic Handler added scenario (49.6%). The scenarios New method added with try block and Specialized Handler added also frequently co-occurred (41.5%). The Try block added to existing method scenario, on the other hand, co-occurred more commonly with the Specialized Handler added scenario (60.2%). In fact, the addition of specialized handlers co-occurred more frequently with the addition of try blocks for the Java group than for the C# group. Given this significant difference between Java and C#, this result seems to suggest that the EHM of Java provides more precise information about exceptions. This is probably one of the key reasons why developers are encouraged to implement more specialized handlers in Java. On the other hand, the EHM of C# does not provide enough information about exceptions, stimulating developers to initially implement generic handlers.
TABLE III: Correlations Between Churned Metrics
| Correlated with | Metric | Java | C# |
|---|---|---|---|
| NormalChurnedLOC | ClassChurned | 0.817** | 0.383** |
| NormalChurnedLOC | MethodChurned | 0.838** | 0.339** |
| NormalChurnedLOC | EHChurnedLOC | 0.853** | 0.205* |
| Uncaught Flow | NormalChurnedLOC | 0.165 | 0.437** |
| Uncaught Flow | EHChurnedLOC | 0.120 | 0.462** |
| Uncaught Flow | CatchBlockAdded | 0.154 | 0.405** |
As can be observed in the first three rows of Table III, the correlations between the metrics NormalChurnedLOC, ClassChurned, MethodChurned and EHChurnedLOC are significant for both the Java and C# groups. It can also be observed in the first three rows that the correlation coefficients for these metrics are considerably higher in the Java group than in the C# group. The higher correlation between the metrics NormalChurnedLOC, ClassChurned and MethodChurned observed in the Java group may suggest that when developers modify the normal code, they are required to modify exceptional code in a larger number of classes and methods in Java programs than in C# programs. Similarly, the higher correlation between the metrics NormalChurnedLOC and EHChurnedLOC observed in the Java group may suggest that the EHM of Java forces developers to implement more lines of exceptional code when they add or modify lines of normal code, compared to the EHM of C#. In order to properly investigate this difference, we then manually inspected all 9,692 change scenarios encompassing addition, modification and deletion of try-catch blocks and exception interfaces.
For the C# group, the addition of try blocks also commonly co-occurred with the Empty Generic Handler added scenario. The Empty Generic Handler added scenario co-occurred in 24,5% of the cases with the Try block added to existing method scenario and in 16,5% of the cases with the New method added with try block scenario. For the Java group, these cooccurrences were almost not observed. In fact, empty handlers (both Empty Generic Handler added and Empty Specialized Handler added scenarios) were more commonly observed in the C# group than in the Java group. It surprised us the higher percentage of empty handlers in C# programs when compared to Java programs. It was often assumed that the implementation of empty handlers in Java programs stem from the automatic reliability verification performed by the Java compiler [27], [24]. In this manner, we expected that a higher percentage of empty handlers would be observed in the Java group. This assumption was initially hold since there is no automatic reliability verification of exceptions in C# and, therefore, developers are not forced to handle them. This result suggests that empty handlers are not actually a side effect of automatic reliability verifications bothering developers to handle exceptions. Instead, it may be a side effect of the lack of proper information provided by the underlying EHM, which does not aid developers to properly implement handling actions for their exceptions.
The manual inspection underwent the following procedures. First, we distinguished scenarios where changes in the exceptional code and normal code co-occurred, from scenarios where independent changes were made to either of them. Second, each change scenario was classified in terms of the action performed in each of the try and catch blocks. Table IV depicts the scenarios involving co-changes between normal (columns) and exceptional (rows) code. The frequency of independent changes made to the normal code is captured in the row No changes to handlers; whereas independent changes to the exceptional code is captured in the last column No changes to try blocks. The last row labeled Percentage shows the distribution of the frequency for normal change scenarios. It is worth mentioning that some of the change scenarios of exceptional code observed in Java were related to changes in the exceptional interface. We will come back to this point later in this section. Since C# does not have explicit exceptional interfaces, we opted to not show these Java change scenarios, so that we could compare the two groups. It can be observed in the last row of Table IV that the most common change scenarios of normal code for the C# group were related to the addition of try blocks, accounting for approximately 45% of the total. More specifically, a tally of 34% of the change scenarios of normal code were related to the New method added with try block scenario. The remaining
TABLE IV: Classification of Normal and Exceptional Change Scenarios
(Rows: changes in the exceptional code. Columns: changes in the normal code, try blocks only. Cells give C# / Java. Only the two columns whose cell layout survived extraction are shown.)
Changes in the exceptional code         Try block added to existing method   New method added with try block
Generic Handler added                   19.5% / 35.4%                        64.6% / 49.6%
Empty Generic Handler added             24.4% / (not legible)                16.5% / 0.5%
Generic Handler added to rethrow        34.1% / 1.9%                         3.1% / 0.1%
Specialized Handler added               12.2% / 60.2%                        10.2% / 41.5%
Empty Specialized Handler added         2.4% / 0.4%                          0.8% / 0.5%
Specialized Handler added to rethrow    7.3% / 1.9%                          4.7% / 7.6%
Percentage                              11.0% / 5.1%                         34.0% / 27.2%
Remaining columns: Control flow modified; New method invocation added; Variables modified; Try block removed; Method with try block removed; No changes to try blocks. Remaining rows: Generic Handler removed; Specialized Handler removed; Changing the Exception Handling Policy; Changing the catch block to use normal code; No changes to handlers.
The analysis of the independent changes performed to the exceptional code (column labeled as No changes to try blocks) reveals that this type of change occurred in similar
proportions for both groups (row labeled as Percentage). This result suggests that both groups made similar attempts to change only the exceptional code in order to improve it. When we analyzed the distribution of these independent changes, we observed that their focus was different for the Java and C# groups. For the Java group, the great majority of the independent changes to the exceptional code were related to the Changing the exception handling policy scenario (86.8%). For the C# group, most of the independent changes were also related to the Changing the exception handling policy scenario, but at a lower percentage (53.8%). The C# group also showed independent changes of the types Changing the catch block to use normal code (30.8%) and Generic handler removed (7.7%). The greater focus on changing the exception handling policy observed in the Java group suggests that Java developers are keener than C# developers on improving the exception handling policy during the evolution of their systems. It might also be the case that the EHM of C# does not properly support developers in changing the exception handling policies of their systems during evolution. In that case, the EHM of C# may be intended to ease changes in the source code, but it does not properly support developers in performing them.
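The handler categories that label the change scenarios of Table IV can be made concrete with a short Java sketch. It is only an illustration of what generic, specialized, empty and rethrowing handlers look like; the class `HandlerKinds`, the `read` helper and all messages are hypothetical and do not come from the subject programs.

```java
import java.io.IOException;

// Hypothetical example: one method per handler category used in Table IV.
final class HandlerKinds {

    // Stand-in for normal code that may fail with a checked exception.
    static String read(String path) throws IOException {
        if (path.isEmpty()) {
            throw new IOException("empty path");
        }
        return "contents of " + path;
    }

    // Specialized handler: catches the precise exception type raised by read().
    static String specialized(String path) {
        try {
            return read(path);
        } catch (IOException e) {
            return "fallback: " + e.getMessage();
        }
    }

    // Generic handler: catches the root type, so unrelated failures
    // (for instance a NullPointerException) are absorbed as well.
    static String generic(String path) {
        try {
            return read(path);
        } catch (Exception e) {
            return "fallback";
        }
    }

    // Empty generic handler: the exception is silently swallowed.
    static String emptyGeneric(String path) {
        String result = null;
        try {
            result = read(path);
        } catch (Exception e) {
            // intentionally left empty
        }
        return result;
    }

    // Generic handler added to rethrow: remaps whatever was caught
    // to another exception type and propagates it.
    static String genericRethrow(String path) {
        try {
            return read(path);
        } catch (Exception e) {
            throw new IllegalStateException("read failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(specialized(""));
        System.out.println(generic(""));
        System.out.println(emptyGeneric(""));
    }
}
```

Passing `null` to `generic` shows why the category matters: the resulting `NullPointerException` is absorbed by the generic handler, whereas `specialized` would let it propagate.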
Finally, for the Java group, almost 35% of the change scenarios of exceptional code also encompassed changes in the exceptional interface. This result suggests that Java developers spent considerable effort maintaining the exceptional interfaces. C# developers, on the other hand, are free of the burden of maintaining exceptional interfaces, since they do not exist in C# programs. To further assess the impact of exceptional interfaces on the maintenance of Java programs, we computed metrics that estimate the effort to maintain these interfaces. Table V depicts the average values of the interface metrics over the pairs of versions of the Java programs.
TABLE V: Values computed for the interface metrics
Metric                    Total
Interface Added           23.98
Interface Added Type      29.20
Interface Changed         1.43
Interface Changed Type    1.74
Interface Removed         11.20
Interface Removed Type    13.73
As can be seen in Table V, the effort to maintain the exceptional interface was concentrated mostly on the addition of new exception types to existing exceptional interfaces (InterfaceAddedType) and the addition of exceptional interfaces to methods (InterfaceAdded). This happened mostly because new features threw checked exceptions that had to be propagated, which required either adding a new exceptional interface or adding a new exception type to an existing exceptional interface in the methods along the call chain. There were also cases where exception types were removed from the exceptional interface (InterfaceRemovedType) and where the whole exceptional interface was removed (InterfaceRemoved). Most of these cases were related to checked exceptions being remapped to unchecked exceptions. Thus, the exceptional interfaces of these methods were no longer required and were removed. Finally, we observed very few cases of changes in the exception types declared in the exceptional interface in our sample (InterfaceChanged and InterfaceChangedType). It might be the case that changing an exceptional interface requires so much maintenance effort that developers avoid it. Instead, they seem to prefer remapping checked exceptions to unchecked exceptions and removing the exceptional interface completely.
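The remapping of checked exceptions to unchecked exceptions described above can be sketched in Java as follows. This is a minimal illustration under assumed names (`InterfaceRemapping`, `loadChecked`, `loadUnchecked`, `describe` are all hypothetical); it shows how the `throws` clause, that is, the exceptional interface, disappears once the exception is remapped, and how callers are no longer forced by the compiler to handle it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example of removing an exceptional interface by remapping.
final class InterfaceRemapping {

    // Before: the checked exception is part of the exceptional interface,
    // so every caller must either catch it or declare it in turn.
    static String loadChecked(String key) throws IOException {
        if (key.startsWith("missing")) {
            throw new IOException("no entry for " + key);
        }
        return "value-of-" + key;
    }

    // After: the checked exception is remapped to an unchecked one,
    // and the throws clause is no longer required.
    static String loadUnchecked(String key) {
        try {
            return loadChecked(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The caller compiles without any handler or throws clause. If nothing
    // up the call chain catches UncheckedIOException, the flow is uncaught.
    static String describe(String key) {
        return "[" + loadUnchecked(key) + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe("a"));
    }
}
```

The sketch also hints at the robustness cost discussed in Section IV-B: after the remapping, the compiler no longer reports callers that forget to handle the failure.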
specific handlers in C# creates new uncaught flows in 71.4% of the scenarios. To avoid this, as discussed in Section IV-A, C# programmers tend to use generic handlers more frequently; these affect software robustness less, since only 4.0% of such scenarios increase the number of uncaught flows. Additionally, Section IV-A shows that the majority of the independent changes to the exceptional code in Java were related to the Changing exception handling policy scenario and that Java developers are keener on making this kind of change. Despite that, Table VI shows that Changing exception handling policy is the Java scenario with the highest percentage of cases that increase the number of uncaught flows.
B. RQ2: Evolution of Software Robustness
In order to answer our second research question, we first analyze the relationship between changes in the normal and exceptional behavior and their impact on software robustness, which is described in Table III. This table presents the values of Spearman's rank correlation coefficient for the variables UncaughtFlow, NormalChurnedLOC, EHChurnedLOC, and CatchBlockAdded for both Java and C# programs. The results show that all correlations are significant for the C# programs. The data therefore point to a trend: when the normal code (NormalChurnedLOC) and the exceptional code (EHChurnedLOC) change, these changes are likely to adversely impact the robustness of C# programs. In particular, when additions and modifications in the normal and exceptional code are not accompanied by proper exception handlers, the number of uncaught flows often increases in C# programs. In contrast, no significant relationship was observed for the Java programs.
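For readers unfamiliar with the statistic, Spearman's rank correlation coefficient can be computed as the Pearson correlation of the ranks of the two variables, with tied values receiving their average rank. The following Java sketch illustrates that standard definition; it is not the tooling used in the study, the class name `Spearman` is hypothetical, and the numbers in `main` are made-up placeholders rather than measured values of NormalChurnedLOC or UncaughtFlow.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: Spearman's rho as Pearson correlation of average ranks.
final class Spearman {

    // Assigns 1-based ranks; tied values share the average of their positions.
    static double[] ranks(double[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int p = 0; p < n; p++) {
            order[p] = p;
        }
        Arrays.sort(order, Comparator.comparingDouble(a -> v[a]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && v[order[j + 1]] == v[order[i]]) {
                j++;
            }
            double avg = (i + j) / 2.0 + 1.0; // mean of positions i+1 .. j+1
            for (int k = i; k <= j; k++) {
                r[order[k]] = avg;
            }
            i = j + 1;
        }
        return r;
    }

    // Plain Pearson correlation; yields NaN if either input is constant.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    static double rho(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("samples must have equal length");
        }
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        double[] churn = {120, 45, 300, 80};   // placeholder values
        double[] uncaught = {3, 1, 9, 2};      // placeholder values
        System.out.println(rho(churn, uncaught));
    }
}
```

Because only ranks enter the computation, any monotonic relationship between the two variables yields a coefficient of 1 or -1, regardless of whether it is linear.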
TABLE VII: The Percentage of Change Scenarios Leading to Uncaught Exceptions
Scenarios                                              Java     C#
No Changes To Catch Block                              10.4%    14.3%
Adding Specific Handler                                16.6%    23.8%
Adding Generic Handler                                 5.6%     4.8%
Adding Generic Handler Rethrowing                      -        26.2%
Adding Specific Handler Rethrowing                     3.0%     11.9%
Removing Generic Handler                               1.6%     2.4%
Removing Specific Handler                              12.7%    -
Changing Exception Handling Policy                     48.6%    7.1%
Changing Code Inside Catch Block To Use Normal Code    1.5%     9.5%
To further understand the scenarios that negatively impact software robustness, Table VII lists the change scenarios that most often generated uncaught exceptions. For Java, the Changing exception handling policy scenario was the most common one, responsible for generating 48.6% of the uncaught exceptions. Most of these cases were related to checked exceptions being remapped to unchecked exceptions. This finding is consistent with the high number of exception interfaces removed from method signatures (Table V).
In order to understand which change scenarios have the greatest impact on software robustness, we classified the scenarios according to ΔUncaught, as described in Section III-B2. Table VI shows how often each change scenario decreased, increased or had no effect on the number of uncaught flows. An analysis of the last row of Table VI reveals that 81.5% of all Java change scenarios had no impact on the number of uncaught flows and 6.1% of the scenarios reduced the number of uncaught flows. Only 12.4% of the Java change scenarios increased the number of uncaught flows, whereas this value almost doubled, to 23.5%, for the C# programs. The major difference between Java and C# can be observed in the change scenarios where the reliability checks provided by the programming language avoided the introduction of new uncaught flows. For instance, 78.6% of the C# change scenarios that added a generic handler to rethrow exceptions created new uncaught flows, whereas none did so in the Java programs.
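The classification behind Table VI can be sketched as follows. Since Section III-B2 is not reproduced here, the sketch assumes that ΔUncaught is the number of uncaught flows after a change minus the number before it; the class `UncaughtDelta` and its methods are hypothetical names, and the input pairs are placeholders rather than study data.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: classify change scenarios by the sign of ΔUncaught
// (assumed here to be uncaught flows after the change minus before it).
final class UncaughtDelta {

    enum Effect { DECREASE, NO_EFFECT, INCREASE }

    static Effect classify(int uncaughtBefore, int uncaughtAfter) {
        int delta = uncaughtAfter - uncaughtBefore;
        if (delta < 0) {
            return Effect.DECREASE;
        }
        if (delta > 0) {
            return Effect.INCREASE;
        }
        return Effect.NO_EFFECT;
    }

    // Each element of pairs is {before, after} for one change scenario.
    // Returns the percentage of scenarios falling into each category.
    static Map<Effect, Double> distribution(int[][] pairs) {
        Map<Effect, Double> share = new EnumMap<>(Effect.class);
        for (Effect e : Effect.values()) {
            share.put(e, 0.0);
        }
        if (pairs.length == 0) {
            return share; // nothing to classify
        }
        for (int[] p : pairs) {
            share.merge(classify(p[0], p[1]), 1.0, Double::sum);
        }
        for (Effect e : Effect.values()) {
            share.put(e, 100.0 * share.get(e) / pairs.length);
        }
        return share;
    }

    public static void main(String[] args) {
        int[][] pairs = {{3, 3}, {2, 5}, {4, 1}, {0, 0}}; // placeholder data
        System.out.println(distribution(pairs));
    }
}
```

Each row of Table VI corresponds to one such distribution, computed over the occurrences of a single change scenario in one language group.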
In C#, the addition of catch blocks seemed to increase the number of uncaught flows. In particular, this was the case when adding generic or specific handlers intended to remap and rethrow exceptions. These cases represented 26.2% and 11.9%, respectively, of the uncaught exceptions (last column of Table VII). Uncaught exceptions in C# were also often caused by changes to the handlers to catch more specific exception types, which represented 23.8% of the uncaught flows. Surprisingly, all these types of changes were (apparently) aimed at improving exception handling, yet they recurrently led to exception occurrences being uncaught. Remapping of exceptions, for instance, is commonly regarded as good programming practice, but it was a common source of bugs in the C# programs.
As can be observed in Table VI, the Java programs generally present a lower increase in the number of uncaught flows than the C# programs. In order to assess whether the results for the two groups (Java and C# programs) are different, we tested whether there was a statistically significant difference between the values of the metric RelativeUncaughtFlow computed for each group. The Mann-Whitney U test for RelativeUncaughtFlow (U = 3155.00; z = -4.095; p < 0.001) showed a statistically significant difference between them. C# programs showed a higher median value of RelativeUncaughtFlow (Mdn = 2.56) than Java programs (Mdn = 0). This result suggests that modifications in the normal and exceptional code exert a greater negative impact on the robustness of C# programs.
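As a reminder of what the reported U and z values measure, the sketch below computes the Mann-Whitney U statistic by pair counting, together with the usual normal approximation of z. It is a textbook illustration only: the approximation omits the tie correction, the class `MannWhitney` is a hypothetical name, the samples in `main` are placeholders, and the study's own statistics were not produced with this code.

```java
// Hypothetical sketch of the Mann-Whitney U statistic (pair-counting form).
final class MannWhitney {

    // U for sample a against sample b: the number of pairs (x, y) with
    // x from a, y from b and x > y; a tie contributes one half.
    static double u(double[] a, double[] b) {
        double u = 0.0;
        for (double x : a) {
            for (double y : b) {
                if (x > y) {
                    u += 1.0;
                } else if (x == y) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    // Normal approximation of the test statistic, without tie correction:
    // z = (U - nm/2) / sqrt(nm(n + m + 1) / 12).
    static double zApprox(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        double mean = n * m / 2.0;
        double sd = Math.sqrt(n * m * (n + m + 1) / 12.0);
        return (u(a, b) - mean) / sd;
    }

    public static void main(String[] args) {
        double[] javaGroup = {0.0, 0.0, 1.2, 0.5};   // placeholder values
        double[] csharpGroup = {2.5, 3.1, 0.0, 4.0}; // placeholder values
        System.out.println(u(javaGroup, csharpGroup));
        System.out.println(zApprox(javaGroup, csharpGroup));
    }
}
```

A useful sanity check is that `u(a, b) + u(b, a)` always equals the product of the two sample sizes, so a U far from half that product indicates that one group tends to have larger values than the other.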
V. THREATS TO VALIDITY
This section discusses threats to validity that can affect the results reported in this paper.
a) Internal Validity: Threats to internal validity are mainly concerned with unknown factors that may have had an influence on the experimental results [28]. To reduce this threat, we selected a set of subject programs whose developers were unaware of this study and therefore could not have artificially modified their coding practices because of it. Additionally, our results can be influenced by the performance,
Table VI also helps to explain the findings of Section IV-A. For instance, the second row of Table VI shows that adding
TABLE VI: The Percentage of Change Scenarios Increasing or Decreasing Uncaught Flows.
                                                        Num. Uncaught Flows
                                                        Decrease          No Effect         Increase
Scenarios                                               Java     C#       Java     C#       Java     C#
No Changes To Catch Block                               6.5%     11.8%    82.9%    52.9%    10.7%    35.3%
Adding Specific Handler                                 5.0%     7.1%     75.9%    21.4%    19.2%    71.4%
Adding Generic Handler                                  4.2%     4.0%     80.2%    92.0%    15.6%    4.0%
Adding Empty Generic Handler                            18.2%    0.0%     81.8%    100.0%   0.0%     0.0%
Adding Generic Handler Rethrowing Exception             0.0%     0.0%     100.0%   21.4%    0.0%     78.6%
Adding Specific Handler Rethrowing Exception            0.0%     0.0%     84.4%    16.7%    15.6%    83.3%
Adding Empty Specific Handler                           50.0%    100.0%   50.0%    0.0%     0.0%     0.0%
Removing Generic Handler                                11.1%    27.1%    86.0%    70.8%    2.9%     2.1%
Removing Specific Handler                               13.8%    25.0%    71.7%    75.0%    14.5%    0.0%
Changing Exception Handling Policy                      2.5%     14.3%    76.5%    42.9%    21.0%    42.9%
Changing Code Inside Catch Block To Use Normal Code     12.3%    0.0%     81.1%    55.6%    6.6%     44.4%
Total                                                   6.1%     11.7%    81.5%    64.8%    12.4%    23.5%
in terms of precision and recall, of the tools used. We tried to limit the number of false positives through manual validation. This validation mitigates threats to internal validity but does not completely remove them, as a certain amount of imprecision cannot be avoided by tools that generally deal with undecidable problems. Likewise, the change metrics collected manually were validated by one Ph.D. student who was not aware of the experimental goal.
b) Construct Validity: Threats to construct validity concern the relationship between the concepts and theories behind the experiment and what is measured and affected [28]. To reduce these threats, we use metrics that are widely employed to quantify the extent of changes over software evolution [29], [22], [14] and to measure software robustness [25], [12], [10], [18].
c) External Validity: External validity issues may arise from the fact that all the data were collected from 32 software systems that might not be representative of industrial practice. However, the heterogeneity of these systems helps to reduce this risk. They are implemented in Java and C#, which are two representative languages in the state of object-oriented practice. Most of the applications are widely used and have been extensively evaluated in previous research [3]. To conclude, the characteristics of the selected systems, when contrasted with the state of practice, represent a first step towards the generalization of the achieved results.
d) Reliability Validity: This threat concerns the possibility of replicating this study. The source code of the subject programs is publicly available. The way our data was collected is described in detail in Section III-B. Moreover, the eFlowMining tool is available to obtain the same data. Hence, all the details about this study are available elsewhere [8] to enable other researchers to check it.
VI. RELATED WORK
Cabral and Marques [3] analyzed exception handlers of 32 different open-source projects from different domains, implemented in both Java and C#. Their results revealed that the majority of exception handlers implemented in their subject programs are overly simplistic, implementing only one of the following actions: logging, user notification and application termination. The study of Cabral and Marques and ours are the only ones that assessed and compared EHMs implemented in Java and C#. Our study extends theirs by analyzing the subject programs during their evolution, since Cabral and Marques analyzed only one version of each of their subject programs. Moreover, our study also extends theirs by analyzing the subject programs in terms of robustness and by comparing the approaches adopted in Java programs with those adopted in C# programs.
Coelho et al. [10] investigated the interplay between exception handling and software robustness in the context of AspectJ programs. Their results revealed that the number of uncaught exceptions increased during the evolution of the programs. Marinescu [19] and Sawadpong et al. [26] studied the relationship between exceptions and robustness in the context of Java programs. Marinescu [19] revealed that classes that implement exception handling are more likely to exhibit defects than classes that do not use exceptions. Sawadpong et al. [26] revealed that the defect density of exceptional code is almost three times higher than the overall defect density. The defect density is defined as “the number of known defects caused by exceptions divided by the number of lines of non-commented source code” [26]. Our study also investigated the relationship between exceptions and robustness, but we went deeper into the analysis of more fine-grained properties of exception handling code. Our study not only collected robustness and change metrics, but also adopted a qualitative approach to understand and describe in detail a set of fault-prone exception handling scenarios. In this sense, our work can be seen as a more accurate complement to the work of Coelho et al., Marinescu and Sawadpong et al.
In a previous study [9], we performed a thorough analysis of the relationship between exceptions and robustness in the context of evolving C# programs. We revealed that most problems hindering robustness during the evolution of the C# programs analyzed stemmed from changes in the normal code. We also revealed that many faults were introduced during changes in the exceptional code. In this paper we extended our previous work by also assessing Java programs and by
comparing the EHMs of Java and C# in terms of robustness. In fact, this work is the first one to compare different EHMs in terms of robustness in the context of evolving programs.
VII. CONCLUSION
In this paper, we extended our previous work [9] by comparing the EHMs implemented by Java and C# and assessing their impact on the robustness and the evolution of programs. Our results suggest that the EHM of Java requires more modifications in the exceptional code during evolution than the mechanism implemented by C#. On the other hand, the EHM of C# was more fragile and more often decreased program robustness than the mechanism implemented by Java. Moreover, the most fault-prone change scenarios observed in both languages were actually related to attempts to improve the exception handling code. This suggests that current EHMs still do not provide proper support for developers maintaining exception handling code during evolution. Our ongoing work encompasses the implementation of the EFlow [4], [5], [1] model for the C# programming language and the comparison of its results with the Java ones (obtained using EJFlow [5]). EFlow aims to improve the simultaneous satisfaction of software maintainability and reliability by using the notion of explicit exception channels, which support modular representation of global exceptional-behavior properties.
ACKNOWLEDGMENT
This work was partially supported by the following institutions: National Institute of Science and Technology for Software Engineering - INES, grant 573964/2008-4; and the Brazilian National Agency of Petroleum, Natural Gas and Biofuels, PRH-22/ANP/MCTI Program.
REFERENCES
[1] J. Araujo, R. Souza, N. Cacho, A. Martins, and P.A.S. Neto. Handling contract violations in java card using explicit exception channels. In Exception Handling (WEH), 2012 5th International Workshop on, pages 34–40, June 2012.
[2] Ken Arnold, James Gosling, and David Holmes. The Java Programming Language. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2000.
[3] Bruno Cabral and Paulo Marques. Exception handling: A field study in java and .net. In ECOOP 2007 - 21st European Conference Object-Oriented Programming, volume 4609 of Lecture Notes in Computer Science, pages 151–175. Springer, 2007.
[4] Nelio Cacho, Thomas Cottenier, and Alessandro Garcia. Improving robustness of evolving exceptional behaviour in executable models. In Proceedings of the 4th International Workshop on Exception Handling, WEH ’08, pages 39–46, New York, NY, USA, 2008. ACM.
[5] Nelio Cacho, Fernando Castor Filho, Alessandro Garcia, and Eduardo Figueiredo. Ejflow: taming exceptional control flows in aspect-oriented programming. In AOSD ’08: Proceedings of the 7th international conference on Aspect-oriented software development, pages 72–83, New York, NY, USA, 2008. ACM.
[6] Flaviu Cristian. A recovery mechanism for modular software. In ICSE ’79, pages 42–50, 1979.
[7] Alessandro F. Garcia et al. A comparative study of exception handling mechanisms for building dependable object-oriented software. The Journal of Systems and Software, 59(2):197–222, 2001.
[8] Nelio Cacho et al. Study website: http://www.dimap.ufrn.br/ neliocacho/csharpjavastudy.htm, May 2014.
[9] Nelio Cacho et al. Trading robustness for maintainability: An empirical study of evolving c# programs. In ICSE ’14: Proceedings of the 36th International Conference on Software Engineering, New York, NY, USA, 2014. ACM.
[10] Roberta Coelho et al. Assessing the impact of aspects on exception flows: An exploratory study. volume 5142 of Lecture Notes in Computer Science, pages 207–234. Springer, 2008.
[11] Fernando Castor Filho, Alessandro Garcia, and Cecilia Mary F. Rubira. Extracting error handling to aspects: A cookbook. In IEEE International Conference on Software Maintenance, 2007. ICSM 2007, pages 134–143, 2007.
[12] Chen Fu and Barbara G. Ryder. Exception-chain analysis: Revealing exception handling architecture in java server applications. In ICSE ’07, pages 230–239. IEEE Computer Society, 2007.
[13] I. Garcia and N. Cacho. eflowmining: An exception-flow analysis tool for .net applications. In Dependable Computing Workshops (LADCW), 2011 Fifth Latin-American Symposium on, pages 1–8, 2011.
[14] Emanuel Giger, Martin Pinzger, and Harald C. Gall. Comparing fine-grained source code changes and code churn for bug prediction. In Proceedings of the 8th Working Conference on Mining Software Repositories, MSR ’11, pages 83–92, New York, NY, USA, 2011. ACM.
[15] John B. Goodenough. Exception handling: issues and a proposed notation. Commun. ACM, 18(12):683–696, 1975.
[16] W. G. Hopkins. A new view of statistics, November 2013.
[17] Peter A. Lee and Thomas Anderson. Fault Tolerance: Principles and Practice. Dependable computing and fault-tolerant systems. Berlin; New York, second edition, 1990.
[18] A.K. Maji, F.A. Arshad, S. Bagchi, and J.S. Rellermeyer. An empirical study of the robustness of inter-component communication in android. In Dependable Systems and Networks (DSN), 2012 42nd Annual IEEE/IFIP International Conference on, pages 1–12, 2012.
[19] Cristina Marinescu. Are the classes that use exceptions defect prone? In Proceedings of the 12th International Workshop on Principles of Software Evolution and the 7th annual ERCIM Workshop on Software Evolution, IWPSE-EVOL ’11, pages 56–60, New York, NY, USA, 2011. ACM.
[20] P. M. Melliar-Smith and B. Randell. Software reliability: The role of programmed exception handling. In Proceedings of an ACM conference on Language design for reliable software, pages 95–100, 1977.
[21] Bertrand Meyer. Object-Oriented Software Construction. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1997.
[22] Nachiappan Nagappan and Thomas Ball. Use of relative code churn measures to predict system defect density. In Proceedings of the 27th international conference on Software engineering, ICSE ’05, pages 284–292, New York, NY, USA, 2005. ACM.
[23] D. L. Parnas and H. Wurges. Response to undesired events in software systems. In ICSE ’76: Proceedings of the 2nd international conference on Software engineering, pages 437–446, Los Alamitos, CA, USA, 1976. IEEE Computer Society Press.
[24] D. Reimer and H. Srinivasan. Analyzing exception usage in large java applications. In Proceedings of ECOOP’2003 Workshop on Exception Handling in Object-Oriented Systems, pages 10–18, 2003.
[25] Martin P. Robillard and Gail C. Murphy. Static analysis to support the evolution of exception structure in object-oriented systems. ACM Trans. Softw. Eng. Methodol., 12(2):191–221, 2003.
[26] P. Sawadpong, E.B. Allen, and B.J. Williams. Exception handling defects: An empirical study. In High-Assurance Systems Engineering (HASE), 2012 IEEE 14th International Symposium on, pages 90–97, 2012.
[27] Marko van Dooren and Eric Steegmans. Combining the robustness of checked exceptions with the flexibility of unchecked exceptions using anchored exception declarations. In OOPSLA ’05, pages 455–471, New York, NY, USA, 2005. ACM Press.
[28] Claes Wohlin, Per Runeson, Martin Höst, Magnus Ohlsson, Bjorn Regnell, and Anders Wesslén. Experimentation in Software Engineering - An Introduction. Springer, 2000.
[29] Stephen S. Yau and James S. Collofello. Design stability measures for software maintenance. IEEE Trans. Softw. Eng., 11(9):849–856, 1985.