Predicting Maintenance Performance Using Object-Oriented Design Complexity Metrics

Rajendra K. Bandi, Vijay K. Vaishnavi, Fellow, IEEE, and Daniel E. Turk, Member, IEEE

Abstract—The Object-Oriented (OO) paradigm has become increasingly popular in recent years. Researchers agree that, although maintenance may turn out to be easier for OO systems, it is unlikely that the maintenance burden will completely disappear. One approach to controlling software maintenance costs is the utilization of software metrics during the development phase to help identify potential problem areas. Many new metrics have been proposed for OO systems, but only a few of them have been validated. The purpose of this research is to empirically explore the validation of three existing OO design complexity metrics and, specifically, to assess their ability to predict maintenance time. This research reports the results of validating three metrics: Interaction Level (IL), Interface Size (IS), and Operation Argument Complexity (OAC). A controlled experiment was conducted to investigate the effect of design complexity (as measured by the above metrics) on maintenance time. Each of the three metrics by itself was found to be useful in the experiment in predicting maintenance performance.

Index Terms—Object-oriented metrics, software maintenance, metrics validation, predicting software maintenance time.
1 INTRODUCTION
The object-oriented (OO) paradigm has become increasingly popular in recent years, as is evident by more and more organizations introducing object-oriented methods and languages into their software development practices. Claimed advantages of OOP (object-oriented programming) include easier maintenance through better data encapsulation [10]. There is some evidence to support the claim that these benefits may be achieved in practice [36], [44]. Although maintenance may turn out to be easier for programs written in OO languages, it is unlikely that the maintenance burden will completely disappear [50]. Maintenance, in its widest sense of "post deployment software support," is likely to continue to represent a very large fraction of total system costs. Maintainability of software thus continues to remain a critical area even in the object-oriented era. Object-oriented design can play an important role in maintenance, especially if design-code consistency is maintained [6], [24]. The control of software maintenance costs can be approached in several ways. One approach is the utilization of software metrics during the development phase. These metrics can be utilized as indicators of system quality and can help identify potential problem areas [19], [38], [43].
. R.K. Bandi is with the Quantitative Methods and Information Systems Department, Indian Institute of Management, Bannerghatta Rd., Bangalore-560 076, India. E-mail: [email protected].
. V.K. Vaishnavi is with the Department of Computer Information Systems, Georgia State University, PO Box 4015, Atlanta, GA 30302-4015. E-mail: [email protected].
. D.E. Turk is with the Computer Information Systems Department, Colorado State University, 026 Rockwell Hall, Fort Collins, CO 80523. E-mail: [email protected].

Manuscript received 25 June 2001; revised 23 May 2002; accepted 12 Sept. 2002. Recommended for acceptance by G. Canfora.
Several metrics applicable during the design phase have been developed. Several studies have examined the relationships between design complexity metrics and maintenance performance and have concluded that design-based complexity metrics can be used as predictors of maintenance performance; many of these studies, however, were done in the context of traditional software systems [20], [25], [29], [40], [41].

The OO approach involves modeling the real world in terms of its objects, while more traditional approaches emphasize a function-oriented view that separates data and procedures. Chidamber and Kemerer [17] argue that, because of the fundamentally different notions inherent in these two views, software metrics developed with traditional methods in mind do not direct themselves to notions such as classes, inheritance, encapsulation, and message passing. Therefore, given that such metrics do not support key OO concepts, it seems appropriate to have new metrics especially designed to measure the unique aspects of OO design [1], [2], [3], [4], [11], [17], [21], [22], [23], [27], [28], [35], [46], [47]. To be useful in practice, such metrics also need to be validated. This exact course of action has been suggested by at least one set of researchers: "metrics which reflect the specificities of the OO paradigm must be defined and validated" ([8], p. 751).

Thus far, only a few empirical studies have investigated the relationship between the proposed metrics and OO design quality attributes such as maintainability [5], [8], [11], [12], [14], [26], [31], [33], [34]. Most of these studies have investigated two metrics sets (Chidamber and Kemerer's metrics suite [17] and the MOOD metrics [3], [4]). The work of Briand et al. [11] is an exception: it defines a number of new metrics for coupling and cohesion and investigates their relationship to fault-proneness in three large-scale projects. Another exception is the work of Cartwright and Shepperd [14], who show how accurate prediction systems for size and defects, based on certain simple counts, can be empirically built to suit a local context.
Li et al. [34] study the metrics proposed by Chidamber and Kemerer [17] with reference to the maintenance effort in two commercial systems and conclude that these metrics in general can be used as predictors of maintenance effort [33], [34]. The major criticism of this work is that "maintenance effort" was operationalized as the number of lines of code changed, which is perhaps as controversial as using lines of code (LOC) as a size metric. Another experimental study [8] conducted to validate these metrics as predictors of reliability (fault-proneness) found that five of the six metrics seem to be useful for predicting class reliability during the early phases of the life cycle. Kolewe [31] confirms, based on a field study, that two of the metrics (class coupling and response for class) correlate with defect densities. Briand et al. [12] study, in a university setting, a number of available metrics, including some of the Chidamber/Kemerer metrics, that measure structural properties of OO designs and report that the coupling and inheritance metrics studied are significantly related to the probability of detecting a fault in a class during testing. The work of Chidamber and Kemerer, however, is not without criticisms; several researchers have pointed out ambiguities associated with some of these metrics [18], [30], [33].

For the MOOD metrics, Abreu and Melo [5] report that, in an experimental study, they found these metrics to correlate with system reliability and maintainability. In another study, Harrison et al. [26] report that the MOOD metrics can be said to be theoretically valid, but only if appropriate changes are made to rectify existing problematic discontinuities.

OO designs are relatively rich in information and, therefore, metrics, if properly defined, can take advantage of the information available at an early stage in the life cycle. Unfortunately, most of the prior research does not exploit this additional information. Three metrics, interaction level [1], [2], interface size [1], and operation argument complexity [15], which are the focus of the current paper, are among the proposed and/or studied metrics that seem to take advantage of some of the additional information available in an OO design. Their definitions use interface size information in slightly different ways. The interaction level metric is the most complex of the three and additionally captures the potential interactions that may occur in an execution sequence. Operation argument complexity is the simplest of the three metrics.

None of the studies of the interaction level [1], [2], [9], interface size [1], and operation argument complexity [15] metrics has validated the proposed metrics empirically. The metrics have, however, been subjectively validated, where the metric values are compared to expert judgments. In one such study of 16 OO design quality metrics (including three Chidamber/Kemerer metrics [17]) by Binkley and Schach [9], the interaction level metric (also known as the permitted interaction metric) was found to be the second best for predicting implementation and maintenance effort.

The objective of the current paper is to present the results of a study that assessed the validity of predicting maintenance time from the design complexity of a system as measured by the three metrics mentioned above. These metrics have also been analytically validated [7] using the relevant mathematical properties specified by Weyuker [49].
This paper, however, focuses only on the empirical study.
Fig. 1. Research design.
The rest of the paper proceeds as follows: Section 2 discusses the design of the study, describing the dependent and independent variables, and the metrics that are to be validated. It presents the hypotheses to be tested, describes the subjects who participated in the study, and finally explains the data collection procedures, measurement instruments, and data collected. Section 3 presents an analysis of the data. Section 4 draws conclusions and makes suggestions for further work.
2 DESIGN OF THE EMPIRICAL STUDY
Fig. 1 summarizes the research design of this study,1 which suggests that design complexity, maintenance task, and programmer ability all influence maintenance performance. Maintenance performance is the dependent variable; design complexity, maintenance task, and programmer ability are independent variables. This paper reports on only the first two of these independent variables. The figure suggests that design complexity, maintenance task, and programmer ability may have some sort of causal effect on maintenance performance. However, this study simply looked at whether a relationship exists and whether these variables might be used to predict maintenance time; it does not make any claims with respect to causality. The empirical study was carried out using a controlled experiment in which students at a US university participated as subjects.

1. In order to make it easy to replicate the experiment, all instruments, forms, designs, instructions, etc., have been made available at the Web site: http://www.biz.colostate.edu/faculty/dant/pages/papers/IEEETrxSE-OOMetrics/20020917/.
2.1 Dependent Variable
Maintainability is defined as the ease with which systems can be understood and modified [25]. In past studies, it has been operationalized as "number of lines of code changed" [33], [34], as time (required to make changes) and accuracy [20], [25], and as "time to understand, develop, and implement modification" [39]. In this study, following Rising [39], maintainability was operationalized as "time to understand, develop, and actually make modifications to existing programs." We did not include accuracy in the maintenance measurement for the following reasons:
1) An inverse relationship exists between time (for making changes) and accuracy. 2) For the measured accuracy to be statistically useful, the maintenance would have to be done within a restricted amount of time. 3) To counter the criticism of Rombach [40] that students used as participants lack motivation, we designed the experiment as a required assignment for a course; the assignment was graded and the grade counted toward the course grade. With this as the motivating factor, it was not feasible to restrict the students to finishing the assignment in a constrained amount of time in a single sitting.
2.2 Independent Variables
2.2.1 Design Complexity
Interaction level (IL) [1], [2], interface size (IS) [1], and operation argument complexity (OAC) [15] were chosen as measures of design complexity in this study. All three metrics have been subjectively validated by comparing their values to experts' judgments and have been found to perform well [1], [2], [9], [15]. The three metrics are described below.

The fundamental basis for the interaction level metric, as well as for the other two metrics, is the assumption that the larger the interface, the more scope there is for (direct) interactions, and interaction increases complexity. This assumption is consistent with the notions of complexity suggested by various researchers. Weyuker has developed a formal list of properties for software complexity metrics [49]; interaction is one of these properties. Bunge defines the complexity of a "thing" to be the "numerosity of its composition," implying that a complex "thing" has a large number of properties [13]. Using this definition as a base, the complexity of an object class can be defined as the cardinality of its set of properties. Abbott [1] extends this notion further and defines complexity as a function of the interactions among a class's set of properties. In the case of objects and classes, the methods and data attributes are the set of properties; therefore, the complexity of a class is a function of the interaction between its methods and data attributes.

The concept of IL specifies the amount of potential (direct) interaction that can occur in a system, class, or method. For example, the IL of a method indicates the amount of (direct) interaction that can occur whenever the method is invoked. To explain further, whenever a method is invoked, its parameters are used for some internal computation along with some of the data attributes associated with the class to which the method belongs. Also, a value (object) may be passed back to the calling routine. (Thus, the parameter count used in IL includes both the regular method parameters and any return value if one exists.) There is said to be an "interaction" between two entities A and B if the value of entity A is calculated directly based on the value of entity B, or vice versa. In the context of the interaction level metric, if the value of some data attribute is calculated directly based on the value of one or more of the parameters, or vice versa, then there is said to be an interaction between the parameters and the data attribute. It is expected that a higher interaction level will correlate with an increased difficulty in determining how to implement or modify a design.

The interaction level metric can be computed at varying levels of granularity: the interaction level of a class is the sum of the interaction levels of its methods.
Fig. 2. Parameter/attribute sizes and their types.
The interaction level of a design is the sum of the interaction levels of its classes. The current study validates IL and the other two metrics at the design level.

Both the interaction level and interface size metrics use the concepts of "number" and "strength." For example, the interaction level of a method depends on the number of interactions and the strength of those interactions. The size of a parameter (argument) or attribute is a specified constant (see Fig. 2),2 signifying the complexity of the parameter/attribute type. The strength of an interaction is defined as the product of the sizes of the parameters/attributes involved in the interaction. It is necessary to use both number and strength because they typically have an inverse relationship, in the sense that decreasing one increases the other and vice versa. Also, a large increase in either the number or the strength of interactions could increase the complexity. Accordingly, the interaction level (IL) of a method is defined as:

IL = K1 * (number of interactions) + K2 * (sum of strengths of interactions).

The constants K1 and K2 used in the linear combination are tentatively set to 1 for simplicity and to balance the effect of the strength of interactions and the number of interactions. They are, however, subject to revision as experience is gained with the metric. This approach is consistent with assumptions made by other researchers in tentatively fixing a value for the constants in metric definitions [16].

It is to be noted that the interaction level metric is derived from the number and the strength of the interactions "permitted" by the design. These interactions may or may not actually occur in realizing the method. For example, a parameter of a method may, upon implementation, be seen to interact with only one of the data attributes, not all of them. Nonetheless, the design of the method has created the mechanism for these interactions to occur and hence "permits" them. Whether or not all the interactions occur, and how many times they occur, is an implementation issue. The presence or absence of the mechanism is a design issue and, hence, serves as an appropriate basis for a design metric.

The concept of interface size gives a measure of the means for information to flow in and out of a class's encapsulation. Some classes define many methods, perhaps many of which have complex signatures (i.e., parameter lists) that provide abundant means for information to flow in and out of their encapsulation.

2. The size values in Fig. 2 are based on the values suggested in [1] and [15]. The size value for the Boolean type is used if a parameter's intended use (as a Boolean) is clear from the context.
Other classes may provide few methods, many of which have simple signatures. It is expected that a larger interface size will correlate with an increased difficulty in comprehending how to select and correctly use the services provided by a class. The interface size (IS) of a method is defined as:

IS = K3 * (number of parameters) + K4 * (sum of sizes of parameters).

As in the case of the definition of IL, the constants K3 and K4 used in the linear combination are tentatively set to 1 for simplicity and to balance the effect of the number of parameters and the sizes of the parameters. They are, however, subject to revision as experience is gained with the metric [16]. The interface size of a class is the sum of the interface sizes of its methods. The interface size of a design (the focus of the current study) is the sum of the interface sizes of its classes.

Operation argument complexity is the simplest of the three metrics. The operation argument complexity (OAC) of a method is defined as:

OAC = Σ P(i), where P(i) is the size of parameter i as specified in Fig. 2.

The operation argument complexity of a class is the sum of the operation argument complexities of its methods. The operation argument complexity of a design (the focus of the current study) is the sum of the operation argument complexities of its classes.

Example. The following example demonstrates the computation of these metrics. Consider a sample class "Quadrilateral" which has eight float attributes (x1, x2, x3, x4, y1, y2, y3, y4) to store its four vertices. Assume that this class has one method, "hasVertex", which takes a point (two float variables x, y) as a parameter and returns a Boolean value indicating whether the point is inside the quadrilateral. Accordingly, the class can be defined as follows:

Class Quadrilateral
  Attributes:
    float x1, y1;
    float x2, y2;
    float x3, y3;
    float x4, y4;
  Methods:
    Boolean hasVertex(float x, float y)

The interactions permitted in the class are shown in Fig. 3.

Fig. 3. Class quadrilateral's interactions.

Based on the above information and Fig. 2, we note that: 1) the size of each float data attribute (x1, ..., x4, y1, ..., y4) = 2; 2) the size of each float parameter (x, y) = 2; 3) the size of the Boolean return value = 0; 4) the strength of each interaction involving parameter x or parameter y = size of a data attribute (2) * size of x or y (2) = 4; and 5) the strength of each interaction involving the return value = size of a data attribute (2) * size of the return value (0) = 0.
The interaction level (IL) for the method "hasVertex" = K1 * (number of interactions) + K2 * (sum of strengths of interactions), where K1 = K2 = 1. Thus, IL = 1 * (number of data attributes * number of method parameters) + 1 * (sum of the strengths of eight interactions with parameter x, eight interactions with parameter y, and eight interactions with the return value) = (8*3) + ((8*4) + (8*4) + (8*0)) = 24 + 64 = 88. (Note that the return value of a method is treated in the same way as a method parameter.) Therefore, IL = 88. Since the class has only one method, the IL of the class = 88.

The interface size (IS) for the method "hasVertex" = K3 * (number of parameters) + K4 * (sum of sizes of parameters), where K3 = K4 = 1; the method has 3 parameters (including the return value), the size of parameter x = 2, the size of parameter y = 2, and the size of the return value (treated as a parameter) = 0. Therefore, IS = 1*(3) + 1*(2+2+0) = 7. Since the class has only one method, the IS of the class = 7.

The operation argument complexity (OAC) for the method "hasVertex" = Σ P(i), where P(i) is the size of each parameter of the method, = (size of parameter x + size of parameter y + size of return value) = (2+2+0) = 4. Therefore, the OAC of the class = 4.
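For readers who wish to automate these counts, the following short Python sketch (ours, not part of the original study) reproduces the values derived above. It assumes the two type sizes given in the example (float = 2, Boolean = 0; the remaining sizes would be taken from Fig. 2) and K1 = K2 = K3 = K4 = 1, as in the paper.

# Minimal sketch of computing IL, IS, and OAC for one method.
TYPE_SIZE = {"float": 2, "Boolean": 0}   # partial size table from the example (see Fig. 2)

def method_metrics(param_types, attribute_types):
    # param_types includes the return type, which the paper treats as a parameter.
    param_sizes = [TYPE_SIZE[t] for t in param_types]
    attr_sizes = [TYPE_SIZE[t] for t in attribute_types]

    # Interaction level: every parameter (and the return value) is permitted
    # to interact with every data attribute of the class.
    n_interactions = len(param_sizes) * len(attr_sizes)
    strength = sum(p * a for p in param_sizes for a in attr_sizes)
    il = 1 * n_interactions + 1 * strength        # IL = K1*number + K2*strength

    # Interface size: number of parameters plus the sum of their sizes.
    interface_size = 1 * len(param_sizes) + 1 * sum(param_sizes)

    # Operation argument complexity: just the sum of the parameter sizes.
    oac = sum(param_sizes)
    return il, interface_size, oac

# hasVertex(float x, float y) -> Boolean, in a class with eight float attributes
il, interface_size, oac = method_metrics(["float", "float", "Boolean"], ["float"] * 8)
print(il, interface_size, oac)   # 88 7 4, matching the worked example

Class- and design-level values would then be obtained, as the definitions above state, by summing these per-method values over all methods and classes.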
2.2.2 Maintenance Task
The second independent variable in the study was the maintenance task. Most researchers categorize maintenance activities as adaptive, corrective, and perfective [32]. Adaptive maintenance is environment driven: the need for it arises when there are changes in hardware, operating systems, files, or compilers that impact the system. Corrective maintenance is error driven: this activity is equivalent to debugging, but it occurs after the system is placed in operation, and since programs are never truly error free, corrective maintenance is required throughout the life of a system. Perfective maintenance is user driven: most perfective maintenance occurs in the form of report modifications to meet changing user requirements [32], and the bulk of maintenance activities are of this latter type. To be representative, two maintenance tasks were used in the study, one perfective and the other corrective.

2.2.3 Variables Held Constant
Languages, hardware, interface modes, tools, and techniques used in software production may influence performance and constitute environmental measures. Managerial styles, such as team structures, communication, and feedback, are examples of managerial measures.
In this research, environmental and managerial measures were kept constant and hence were not included in the research model.
2.3 Summary of Research Variables
Based on the above research model, our main research objective in this study was to focus on the relationship between the design complexity metrics and maintenance performance. Since we measured design complexity using the metrics we wished to validate, if these metrics are indeed valid metrics of design complexity, we expected to see a positive correlation between design complexity and maintenance time. We studied this relationship in the contexts of both perfective and corrective maintenance tasks.

2.4 Hypotheses
The hypotheses for the study are derived from the following proposition, a generic statement based on the research model discussed earlier (Fig. 1):

P1. There is a relationship between the complexity of a system's design and the maintenance time required to make changes.

There are numerous ways to assess whether "a relationship" exists between two variables: t-test/ANOVA, correlation, regression, etc. For each of the metrics of interest in this study, we performed three types of tests (ANOVA, correlation, and regression) to assess whether a relationship indeed seems to exist, that is, whether each complexity metric can be used as a reliable indicator of expected maintenance time. Each test is expressed in terms of a hypothesis. Both the null (HO) and the alternate (HA) hypotheses are shown. Each null hypothesis says that maintenance time does not vary as a function of the metric. If a metric is valid, we expected to find a significant relationship between the metric and maintenance time; hence, our objective is to be able to reject the null hypotheses. The following hypotheses formalize these tests:

H1O: There is no difference in the maintenance time required to make changes to systems, irrespective of whether they have low- or high-complexity designs: μ1 = μ2.

H1A: There is a difference in the maintenance time required to make changes to systems, depending on whether they have low- or high-complexity designs: μ1 ≠ μ2.

H2O: There is no correlation between the complexity of a system's design and the maintenance time required to make changes to that system: ρ = 0.

H2A: There is a nonzero correlation between the complexity of a system's design and the maintenance time required to make changes to that system: ρ ≠ 0.

H3O: There is no linear regression relationship between the complexity of a system's design and the maintenance time required to make changes to that system: βi = 0.
H3A: There is a nonzero linear regression relationship between the complexity of a system's design and the maintenance time required to make changes to that system: βi ≠ 0.

We measured a system's complexity with each of the three metrics, IL, IS, and OAC, and applied each of the hypotheses to each of the three metrics. Thus, nine tests were run in order to assess Proposition P1. Proposition P1 and the resulting nine tests served as the main objective of our research, which was to validate the metrics (IL, IS, and OAC).
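As an illustration of how the three tests might be carried out for one metric, the Python/SciPy sketch below runs an ANOVA, a Pearson correlation, and a simple linear regression on a small hypothetical data set (the arrays, values, and variable names are ours, not the paper's; the actual analysis was of course performed on the collected experimental data).

import numpy as np
from scipy import stats

# Hypothetical data: maintenance time (minutes) per subject and the IL value
# of the design version each subject maintained (low vs. high complexity).
time_minutes = np.array([95, 102, 88, 130, 118, 125, 99, 140])
il           = np.array([60, 60, 60, 120, 120, 120, 60, 120])

low = time_minutes[il == 60]
high = time_minutes[il == 120]

# H1: one-way ANOVA (equivalent to a t-test for two groups) on mean time
f_stat, p_anova = stats.f_oneway(low, high)

# H2: Pearson correlation between the metric and maintenance time
r, p_corr = stats.pearsonr(il, time_minutes)

# H3: simple linear regression, time = b0 + b1 * IL; test whether b1 differs from 0
slope, intercept, r_value, p_reg, stderr = stats.linregress(il, time_minutes)

print(p_anova, p_corr, p_reg)   # reject each null hypothesis if p < 0.05

Repeating the same three tests for IS and OAC gives the nine tests referred to above.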
2.5 Study Participants and the Experimental Treatments
The experiment was conducted over a duration of two quarters, and subjects came from a total of five course sections (three sections in one quarter and two in the other), with the number of students per section ranging from 10 to 25. These sections were taught by a total of four different instructors (one instructor taught two sections). The subjects participating in this research were students taking the undergraduate "Advanced Object-Oriented Programming" course at a US university. The prerequisites of this course include that the students must have successfully completed at least the "Introduction to OO Programming" course. There were 93 subjects who, on average, had taken 14 credits of CIS coursework, had a GPA of 3.5, and had two years of some type of system development experience, including eight months of OO experience. Fig. 4 (first table) summarizes this information, which was collected using a questionnaire filled out by all subjects. (This questionnaire is available as Appendix F at the Web site for the paper; see footnote 1.)

Two independent treatments were used in the experiment, one involving corrective maintenance and the other involving perfective maintenance, which constituted a required assignment (see Appendix E at the paper Web site for a sample of an actual assignment used). Two versions of each treatment were constructed and designated as the "low-complexity" version and the "high-complexity" version based on their corresponding metric (IL, IS, OAC) values. All the subjects from each of the five sections were assigned to work on either the low-complexity or the high-complexity version of each of the two treatments. (They were not told which version they had; these designations were used for the researchers' identification only.) Electronic and hard copies of the source code, along with the design specifications and proper documentation, were given to the participants. The maintenance timings were self-reported.

The assignment of each section to the two treatments was based primarily on the desire to have at least thirty students for each treatment/version combination and secondarily on the desire to let a section have the low-complexity version of one treatment and the high-complexity version of the other. The allocation of subjects to treatments is summarized in the second table of Fig. 4. Fifty-eight students completed the low-complexity version of Treatment 1 (Perfective, "Quadrilateral"); 35 completed the high-complexity version. Fifty-seven students completed the low-complexity version of Treatment 2 (Corrective, "Tractor-Trailer"); 36 completed the high-complexity version. Seven students did not finish the experiment or did not complete the profile survey.
Fig. 4. Research design data.
The first treatment involved a system called "Quadrilateral" (refer to Appendices A and B at the paper Web site). The subjects were required to perform a perfective maintenance task on this system. This task involved adding new functionality to the system: computing the area and perimeter of the quadrilateral. The second treatment involved a system called "Tractor-Trailer" (refer to Appendices C and D at the paper Web site). The subjects were required to perform a corrective maintenance task on this system. This task involved changing existing functionality of the system: changing the way taxes are computed for the tractor-trailer. The characteristics of these two systems, as well as the corresponding metric values, are summarized in the third table of Fig. 4. All four system designs (two versions of each of the two systems) were pilot tested before the experiment. The pilot test was conducted with students in the doctoral program in computer information systems at the same US university where the experiment was performed.
2.6 Research Design and Threats to Validity
While there are limits to how widely a study based on students can be generalized, students are frequently accepted as valid subjects for experiments (e.g., [8], [12]). Likewise, the software that the subjects were asked to modify is quite simple compared with industrial systems. The scope of the systems used in the study was limited in order to ensure that the participants could understand them and perform the changes in a reasonable amount of time. For similar reasons, the treatments were built from "constructed" systems rather than industrial-strength code, and only four levels of each metric were used. However, if a relationship is found between the complexity of these "toy" systems and maintenance time, then an even more distinctive relationship is likely to be seen in the real world, where far more complex systems are used. Thus, while the results may not be directly generalizable to
professional developers and real-world systems, the academic setting is less complex and allows us to make useful initial assessments of whether these metrics would be worth studying further in environments where it may be more difficult and more costly to assess the proposed relationships. Another potential limitation and confounding factor is that the subjects were learning advanced OO programming at the time of the study; thus, their maintenance time might be much greater than that of "real" programmers, and differences that showed statistical significance in the study may not show significance in the "real world." Likewise, the fact that the maintenance timings were self-reported and were assumed to be accurate may be of concern. How precisely the students actually timed themselves is a valid issue, which is why particular attention was paid, in the design of the experiment assignments, to clear instructions for reporting maintenance time. Moreover, since the students' grades were not based on how much time they spent on the task, there is no inherent reason to believe that they would not report accurate times. Other research design issues include the different quarters in which the experiment was conducted and the different instructors involved. We decided to determine through analysis whether the quarter or the instructor affected maintenance time. Limitations of the study are discussed further in Section 4.1.
3 DATA ANALYSIS
As mentioned above, our main objective in this experimental study was to empirically explore the validation of the three design complexity metrics by assessing their ability to predict maintenance time. Accordingly, we focused on the relationship between these and the amount
of maintenance time, based on the research model discussed earlier. We conducted ANOVAs to determine if the mean maintenance times for the high- and low-complexity versions (categorized as high or low based on all three metric values) were significantly different, to be able to reject null hypothesis H1 for all three metrics. We conducted additional ANOVA tests to rule out the possibility that the different instructors who had taught the course sections, and the quarter in which the study was conducted, had any significant effect on the maintenance times observed. We also conducted a correlation analysis and both simple as well as multiple regression analyses to examine the importance of each metric (IL, IS, and OAC) in determining maintenance time. We found in all cases that the results were significant and, thus, were able to reject the null hypotheses H2 and H3 for all three metrics. In order to validate our analyses, we divided the data into model building and holdout data sets [37], [42], [45], [48]. Comparing the results obtained from the model-building and holdout data sets allowed us to gain confidence that our models and conclusions were valid. We built models using three different holdout sizes: 28 percent, 16 percent, and no holdout.
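A minimal sketch of this model-building/holdout check is given below (the data, the simple deterministic 72/28 split, and the variable names are hypothetical assumptions of ours; the paper cites [37], [42], [45], [48] for the general approach but does not spell out its exact splitting procedure).

import numpy as np
from scipy import stats

# Hypothetical (metric value, maintenance time in minutes) pairs, one per subject.
metric   = np.array([60, 75, 120, 110, 62, 131, 58, 118, 70, 125, 122, 64, 68, 115])
time_min = np.array([90, 97, 128, 115, 88, 131, 101, 119, 94, 140, 122, 85, 99, 126])

# Hold out roughly 28 percent of the observations for validation.
split = int(round(0.72 * len(metric)))             # first ~72% build the model
model_x, model_y = metric[:split], time_min[:split]
hold_x, hold_y = metric[split:], time_min[split:]

# Fit the simple regression on the model-building data only ...
slope, intercept, r_model, p_model, _ = stats.linregress(model_x, model_y)

# ... then check whether the relationship also appears in the untouched holdout data.
r_hold, p_hold = stats.pearsonr(hold_x, hold_y)
print(round(r_model, 2), round(r_hold, 2))   # similar values lend confidence to the model

If the correlation estimated from the holdout data is similar in sign and magnitude to that of the model-building data, the fitted relationship is less likely to be an artifact of a particular sample.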
3.1 Experimental Results
The following analyses were conducted on the data gathered.

3.1.1 Complexity versus Maintenance Time—Analysis of Variance (ANOVA)
As mentioned above, each subject received two treatments: a quadrilateral system requiring perfective maintenance (Treatment 1) and a tractor-trailer system requiring corrective maintenance (Treatment 2). For each treatment, a single-factor ANOVA was performed to test whether the true means of the dependent variable (maintenance time) for the two groups (high-complexity and low-complexity systems) were equal.

Fig. 5 (first table) shows the mean maintenance times for the first treatment, the quadrilateral system (perfective maintenance). As expected, the high-complexity version of the system had a higher mean time (about 125 minutes) than the low-complexity version (about 98 minutes). ANOVA was performed to test the statistical significance of this difference, and the results are shown in the first table of Fig. 5. From the analysis we see that the P-value is less than 0.0001: assuming as a null hypothesis that complexity has no effect on maintenance time, the probability of obtaining means as different as these by chance is less than 0.0001. Therefore, as expected, this confirms proposition P1. Since the system versions were categorized as high or low complexity based on the values of the three metrics, we can say that a system with greater IL (or IS, or OAC) requires more time to perform a given maintenance task than a system with lower IL (or IS, or OAC). Therefore, we can conclude that IS, IL, and OAC are valid complexity metrics and can reject the null hypothesis H1 for the first treatment for all three metrics.

Similar analysis was done on the maintenance times for the second treatment, the tractor-trailer system (corrective maintenance). Here again, the relative maintenance times are as
expected. Fig. 5 (first table) shows the mean maintenance times for the second treatment. The high-complexity version of the system had a higher mean time of about 114 minutes, compared to 84 minutes for the low-complexity version. ANOVA was performed to test the statistical significance of this difference, and the results are shown in the first table of Fig. 5. From the analysis, we see that the P-value is less than 0.0001: assuming as a null hypothesis that complexity has no effect on maintenance time, the probability of obtaining means as different as these by chance is less than 0.0001. Therefore, this also, as expected, confirms proposition P1. Since the system versions were categorized as high or low complexity based on the values of the three metrics, we can say that a system with greater IL (or IS, or OAC) requires more time to perform a given maintenance task than a system with lower IL (or IS, or OAC). Therefore, we can conclude that IS, IL, and OAC are valid complexity metrics and can reject the null hypothesis H1 for the second treatment as well, for all three metrics.

We notice that, for both treatments (perfective and corrective maintenance tasks), we are able to reject the null hypothesis H1 for all three metrics. Thus, the metrics IS, IL, and OAC can be used to predict which system needs more maintenance time. This is consistent with the requirement for a valid complexity metric. We can therefore argue that the metrics IL, IS, and OAC are valid OO design complexity metrics. Before fully accepting this conclusion, however, we needed to perform additional analysis to rule out the possibility that extraneous variables, such as the instructor who taught the course section or the quarter during which the study was done, were responsible for the observed differences. Thus, it was decided to investigate whether any of these factors had an impact on the observed timings. The results are described below.

For Treatment 1 (quadrilateral), the low-complexity version was administered to students from three sections taught by three different instructors. The mean times for the three groups were 96, 98, and 99 minutes, respectively. When we conducted an ANOVA, we found that the differences are not statistically significant (P-value of 0.9122). Thus, we fail to reject the hypothesis that the true means of these three groups are equal. Similar analysis was done for the high-complexity version of Treatment 1 (quadrilateral) and for both the low- and high-complexity versions of Treatment 2 (tractor-trailer). The results are shown in the second table of Fig. 5. In all the cases, we fail to reject the null hypothesis of equality of means. This eliminates the instructor as an explanation for the differences in the maintenance timings. This is not surprising given that all the instructors used the exact same lecture notes (transparencies) and gave similar assignments, which was made possible by the group effort in developing the course material.

Next, we investigated whether the quarter in which the experiment was performed had any impact on maintenance time. Since the experiment was conducted over a period of two quarters, we wanted to ensure that there was no significant difference between the two quarters. The data for each of the treatments was
Fig. 5. Experiment ANOVAs.
separated by quarter and individual ANOVAs were conducted for each quarter. The ANOVA results (shown in Fig. 5, third table) indicate that, in all three cases, the high-complexity version took a significantly longer time than the low-complexity version. Thus, we could eliminate the quarter as a variable impacting maintenance time.
3.1.2 Impact of Complexity Metrics—Correlation Analysis
Our next step in assessing the relationship between the metrics and maintenance time was a simple correlation analysis. Fig. 6 (first table) shows the results. These correlations were calculated based on the model-building data, not the holdout data. We also assessed the correlations from the holdout data; Fig. 6 (second table) summarizes these results. Note that, while the correlations are not the same as in the model-building data sets, they are all significant at the 0.05 level except for IL in the holdout data. It is a concern that IL does not correlate significantly with maintenance time in the holdout samples. However, IL explained the smallest amount of variance (see Fig. 7), and the holdout sample sizes were small, which might explain why these results were not significant. Thus, in almost every case we can reject the null hypothesis H2 for all three metrics and can conclude that IL, IS, and OAC are all useful empirical predictors of maintenance time.
3.1.3 Impact of Complexity Metrics—Regression Analysis
In this section, we discuss the results of the regression analysis conducted to investigate the importance of each of the three complexity metrics (independent variables) in determining the maintenance time (dependent variable). Linear regression with one independent variable was performed for each of the three metrics. Each of the variables IL, IS, and OAC was found to have a statistically significant positive relationship with maintenance time. Fig. 7 summarizes the test statistics. Based on these results, we can reject the null hypothesis H3 for each of the metrics and can again conclude that all three are valid predictors of maintenance time.

Multiple regression analysis with all three variables together was then performed to determine the combined explanatory power of these variables. This, however, did not show any increase in the adjusted R-square. Instead, the regression coefficients of the variables were found to be highly unstable. The estimated standard deviation of the regression coefficient for each of the three metrics was very high, and the regression coefficient for each of the variables was statistically nonsignificant. Further, the regression coefficient for the interaction level variable became negative; it was positive in the simple regression analysis and was also expected to be positive from theoretical considerations. All of these are classic symptoms of multicollinearity. This problem appeared when the three
Fig. 6. Results of correlational analysis.
Fig. 7. Regression analysis—metrics and maintenance time.
variables IL, IS, and OAC were used together in the multiple regression. We therefore conclude that the variables interaction level (IL), interface size (IS), and operation argument complexity (OAC) have a high degree of multicollinearity, which means that these three variables account for most of the same variance in the maintenance time. This is not really surprising if one looks at the definitions of these metrics. One observation that can be made based on this is that deriving all three metric values from essentially the same design information appears to be overkill; from a practitioner's perspective, one would need to measure using only one of the three metrics (IL, IS, OAC). Finally, we see that IS and OAC explained more of the variance than did IL (25 percent each versus 12 percent for IL; see Fig. 7). Thus, we conclude that it would be more useful to use IS or OAC to predict maintenance time than to use IL.
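For illustration, the kind of diagnostic that exposes such multicollinearity can be sketched as follows (the per-design metric values below are hypothetical assumptions of ours; pairwise correlations near 1 and variance inflation factors well above 10 are the usual warning signs).

import numpy as np

# Hypothetical IL, IS, OAC values for a handful of designs; because all three
# metrics are driven by the same interface-size information, they move together.
il  = np.array([88., 120., 64., 150., 96., 132.])
is_ = np.array([7., 10., 6., 13., 8., 11.])
oac = np.array([4., 7., 3., 9., 5., 8.])

X = np.column_stack([il, is_, oac])

# Pairwise correlations close to 1 are the first symptom.
print(np.corrcoef(X, rowvar=False).round(3))

# Variance inflation factor: regress each metric on the other two and
# compute 1 / (1 - R^2). Values well above 10 signal severe multicollinearity.
def vif(X, j):
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 1) for j in range(X.shape[1])])

High VIFs explain the unstable, nonsignificant coefficients observed when all three metrics are entered into one regression model.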
3.2 Summary of Empirical Validation
To summarize, our analysis showed that the metrics interaction level, interface size, and operation argument complexity are empirically valid metrics of OO design complexity. In theory, systems with higher complexity require more time than those with lower complexity to perform a given maintenance task. In this study, we categorized the relative complexity of the systems as high or low based on the values of the three metrics (IL, IS, and OAC). Based on this classification, we found that the systems categorized as high complexity needed more time (for maintenance) than those categorized as low complexity. Thus, we conclude that the metrics IL, IS, and OAC are useful and valid metrics for measuring the complexity of a system design. We are confident in our results because we used three different analyses (ANOVA, correlation, and regression) and various model-building and holdout sample sizes and, in all cases (except for the correlation analysis for IL in the holdout samples), obtained results that support the same conclusion: all three metrics are valid predictors of maintenance time.

The complexity metrics interaction level (IL), interface size (IS), and operation argument complexity (OAC) were each, by themselves, found to be useful predictors of maintenance time. However, IL, IS, and OAC all seem to measure similar properties of the system design and, hence, are redundant. Computing only one of the three metrics should be sufficient; since IS and OAC each explained more of the variance than IL did, either IS or OAC may be the best choice.
4 CONCLUSIONS AND DISCUSSION
The main objective of this research was to empirically explore the validation of three object-oriented design complexity metrics: interaction level (IL), interface size (IS), and operation argument complexity (OAC). While not the focus of the current paper, the metrics have also been analytically validated [7] against the relevant set of properties [49]. For the empirical validation, a controlled laboratory experiment was conducted to achieve the research objective. Analysis of variance (ANOVA), correlation, and single and multiple regression analyses were used to quantitatively analyze the experimental data. A summary of the major findings from the experiment follows: 1) Each of the three complexity metrics by itself was found to be useful in measuring design complexity. 2) It is not necessary to measure all three metrics for a given design. Instead, any one of the three metrics (IL, IS, OAC) may be used in predicting maintenance performance (time to perform a given maintenance task). Given that IS and OAC each explained more of the variance than did IL, using one of them may be the best approach. The relative performance of IL in this regard was somewhat surprising, since IL is the most complex of the three metrics and ranked well in an earlier subjective comparison [9]. The research study does not make any claim about causality.
4.1 Limitations
As discussed in Section 2.6, it could be argued that the limited size of the systems modified, the limited number of levels of the metrics present, the use of student subjects, and the fact that the subjects were learning advanced OO techniques during the study are concerns great enough that the conclusions of the study could be suspect in the "real world." However, we believe that, even with these issues, the study can provide useful information to software engineering practitioners and researchers. Other possible limitations of the study include the small effect size observed and the simple model used.

Effect size: One large deficiency in the results of this study is the small adjusted R-square value of 12 to 25 percent. Only a small amount of the variation in maintenance time is explained by the complexity measured by the metrics (with IS as well as OAC at the 25 percent level). It would be nice to see a large adjusted R-square (75 percent, 90 percent, etc.), but because software maintenance is such a complex task, there are likely many issues that come into play besides the complexity of the design as measured by these metrics. Thus, a more complex and comprehensive model would clearly be desirable. However, the goal of the study was not to come up with a comprehensive model; it was simply to assess whether or not these metrics are useful in predicting maintenance time. The fact that they were found to be statistically significant predictors indicates that the study was successful within its scope; a negative but statistically significant result would have been a useful result as well.

A further limitation of the study is that the computation of the metrics is not explicitly defined by their respective authors with respect to association, aggregation, and inheritance, and the study does not include treatments using these concepts. Even though it is easy to interpret the computation of the metrics with respect to these concepts, the metrics have not been validated in such contexts. Yet another limitation is that we have focused on only three of the many design metrics that have been proposed in the literature. This study does not attempt to evaluate the other metrics or to compare the three metrics used in the study with other metrics; additional research is needed in this direction. In spite of its limitations, the study constitutes an important initial empirical work on the OO design metrics studied.

4.2 Further Research
The experimental study can be extended and replicated in several directions:
1. The original metric definitions did not explicitly address unique object-oriented concepts such as inheritance. Future research can define appropriate metric computations for inheritance, aggregation, and association, and conduct a study to validate the metrics with respect to these OO concepts.
2. For the design complexity metrics studied here, a study can be conducted to separately capture the time required to understand the system and task, make changes, and test the changes. Also, an analysis of the different ways the changes are made can be performed. This can provide additional information on the impact of design complexity on detailed maintenance activities.
3. A longitudinal investigation of one or more actively maintained systems can be conducted. The design complexity metrics being studied should be applied to the systems at the outset of the study and recomputed after each modification. Data can be gathered to evaluate how design complexity contributes to system deterioration, frequency of maintenance changes, system reliability, etc. This should provide useful information both to project managers and to system developers.
4. A study can be conducted to compare the three design complexity metrics studied here to the other design metrics that have been proposed in the literature for their ability to predict maintenance performance.
ACKNOWLEDGMENTS
The authors are indebted to the three anonymous reviewers and the associate editor, Dr. Gerardo Canfora, for their careful reading of the paper and their constructive suggestions for its improvement. This work was partially supported by a research grant that V.K. Vaishnavi received from the Robinson College of Business, Georgia State University, Atlanta.
REFERENCES
[1] D. Abbott, "A Design Complexity Metric for Object-Oriented Development," Masters thesis, Dept. of Computer Science, Clemson Univ., 1993.
[2] D.H. Abbott, T.D. Korson, and J.D. McGregor, "A Proposed Design Complexity Metric for Object-Oriented Development," Technical Report TR 94-105, Computer Science Dept., Clemson Univ., 1994.
[3] B.F. Abreu and R. Carapuca, "Candidate Metrics for Object-Oriented Software within a Taxonomy Framework," J. Systems and Software, vol. 26, pp. 87-96, 1994.
[4] B.F. Abreu, M. Goulao, and R. Esteves, "Toward the Design Quality Evaluation of Object-Oriented Software Systems," Proc. Fifth Int'l Conf. Software Quality, Oct. 1995.
[5] B.F. Abreu and W.L. Melo, "Evaluating the Impact of Object-Oriented Design on Software Quality," Proc. Third Int'l Software Metrics Symp., Mar. 1996.
[6] G. Antoniol, B. Caprile, A. Potrich, and P. Tonella, "Design-Code Traceability for Object-Oriented Systems," Annals of Software Eng., vol. 9, pp. 35-58, 2000.
[7] R.K. Bandi, "Using Object-Oriented Design Complexity Metrics to Predict Maintenance Performance," PhD dissertation, Georgia State Univ., 1998.
[8] V.R. Basili, L.C. Briand, and W.L. Melo, "A Validation of Object-Oriented Design Metrics as Quality Indicators," IEEE Trans. Software Eng., vol. 22, no. 10, pp. 751-761, Oct. 1996.
[9] A.B. Binkley and S.R. Schach, "A Comparison of Sixteen Quality Metrics for Object-Oriented Design," Information Processing Letters, vol. 58, pp. 271-275, 1996.
[10] G. Booch, "Object-Oriented Development," IEEE Trans. Software Eng., vol. 12, no. 2, pp. 211-221, Feb. 1986.
[11] L.C. Briand, S. Morasca, and V.C. Basili, "Defining and Validating Measures for Object-Based High-Level Design," IEEE Trans. Software Eng., vol. 25, no. 5, pp. 722-743, 1999.
[12] L.C. Briand, J. Wust, J.W. Daly, and D.V. Porter, "Exploring the Relationships between Design Measures and Software Quality in Object-Oriented Systems," The J. Systems and Software, vol. 51, pp. 245-273, 2000.
[13] M. Bunge, Treatise on Basic Philosophy: Ontology I: The Furniture of the World. Boston: Riedel, 1977.
[14] M. Cartwright and M. Shepperd, "An Empirical Investigation of an Object-Oriented Software System," IEEE Trans. Software Eng., vol. 26, pp. 786-796, Aug. 2000.
[15] J-Y. Chen and J-F. Lu, "A New Metric for Object-Oriented Design," Information and Software Technology, pp. 232-240, Apr. 1993.
[16] S.R. Chidamber and C.F. Kemerer, "Towards Metric Suite for Object-Oriented Design," Proc. Sixth ACM Conf. Object Oriented Programming Systems, Languages, and Applications (OOPSLA), pp. 197-211, Nov. 1991.
[17] S.R. Chidamber and C.F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Trans. Software Eng., pp. 476-493, June 1994.
[18] N.I. Churcher and M.J. Shepperd, "Comments on 'A Metrics Suite for Object Oriented Design'," IEEE Trans. Software Eng., vol. 21, pp. 263-265, Mar. 1995.
[19] D. Coleman, D. Lowther, and P. Oman, "The Application of Software Maintainability Models in Industrial Software Systems," J. Systems Software, vol. 29, pp. 3-16, 1995.
[20] B. Curtis, S.B. Shepperd, P. Milliman, M.A. Borst, and T. Love, "Measuring the Psychological Complexity of Software Maintenance Tasks with the Halstead and McCabe Metrics," IEEE Trans. Software Eng., pp. 96-104, Mar. 1979.
[21] D. De Champeaux, Object-Oriented Development Process and Metrics. Prentice-Hall, 1997.
[22] L. Etzcorn, J. Bansiya, and C. Davis, "Design and Code Complexity Metrics for OO Classes," J. Object-Oriented Programming, vol. 12, no. 1, pp. 35-40, 1999.
[23] N.E. Fenton and S.L. Pfleeger, Software Metrics, A Rigorous and Practical Approach, second ed. Boston: Int'l Thomson Computer Press, 1997.
[24] R. Fiutem and A. Antoniol, "Identifying Design-Code Inconsistencies in Object-Oriented Software: A Case Study," Proc. Int'l Conf. Software Maintenance, pp. 94-102, 1998.
[25] V.R. Gibson and J.A. Senn, "System Structure and Software Maintenance Performance," Comm. ACM, pp. 347-358, Mar. 1989.
[26] R. Harrison, S.J. Counsell, and R.V. Nithi, "An Evaluation of the MOOD Set of Object-Oriented Software Metrics," IEEE Trans. Software Eng., vol. 24, pp. 491-496, June 1998.
[27] B. Henderson-Sellers, Object-Oriented Metrics: Measures of Complexity. Prentice-Hall, 1996.
[28] B. Henderson-Sellers, "The Mathematical Validity of Software Metrics," Software Eng. Notes, vol. 21, no. 5, pp. 89-94, Sept. 1996.
[29] S. Henry and C. Selig, "Predicting Source-Code Complexity at the Design Stage," IEEE Software, pp. 36-44, Mar. 1990.
[30] M. Hitz and B. Montazeri, "Chidamber and Kemerer's Metrics Suite: A Measurement Theory Perspective," IEEE Trans. Software Eng., vol. 22, no. 4, pp. 267-271, Apr. 1996.
[31] R. Kolewe, "Metrics in Object-Oriented Design and Programming," Software Development, pp. 53-62, Oct. 1993.
[32] B.P. Lientz and E.B. Swanson, Software Maintenance Management. Reading, Mass.: Addison-Wesley, 1980.
[33] W. Li and S. Henry, "Object Oriented Metrics that Predict Maintainability," J. Systems and Software, pp. 111-122, Nov. 1993.
[34] W. Li, S. Henry, D. Kafura, and R. Schulman, "Measuring Object-Oriented Design," J. Object-Oriented Programming, vol. 8, no. 4, pp. 48-55, July/Aug. 1995.
[35] M. Lorenz and J. Kidd, Object Oriented Software Metrics. Englewood Cliffs, N.J.: Prentice Hall, 1994.
[36] D. Mancl and W. Havanas, "A Study of the Impact of C++ on Software Maintenance," Proc. IEEE Conf. Software Maintenance, pp. 63-69, Nov. 1990.
[37] J. Neter, W. Wasserman, and M. Kutner, Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs, third ed. Boston: Irwin, 1990.
[38] T. Pearse and P. Oman, "Maintainability Measurements on Industrial Source Code Maintenance Activities," Proc. Int'l Conf. Software Maintenance, pp. 295-303, 1995.
[39] L.S. Rising, "Information Hiding Metrics for Modular Programming Languages," PhD dissertation, Arizona State Univ., 1992.
[40] H.D. Rombach, "A Controlled Experiment on the Impact of Software Structure on Maintainability," IEEE Trans. Software Eng., pp. 344-354, Mar. 1987.
[41] H.D. Rombach, "Design Measurement: Some Lessons Learned," IEEE Software, pp. 17-25, Mar. 1990.
[42] R.D. Snee, "Validation of Regression Models: Methods and Examples," Technometrics, vol. 19, pp. 415-428, 1977.
[43] H.M. Sneed, "Applying Size Complexity and Quality Metrics to an Object-Oriented Application," Proc. European Software Control and Metrics Conf.-Software Certification Programme in Europe (ESCOM-SCOPE), 1999.
[44] H.M. Sneed and T. Dombovari, "Comprehending a Complex, Distributed, Object-Oriented Software System: A Report from the Field," Proc. Seventh Int'l Workshop Program Comprehension, 1999.
[45] M. Stone, "Cross-Validatory Choice and Assessment of Statistical Predictions (with Discussion)," J. Royal Statistical Soc. B, vol. 36, pp. 111-147, 1974.
[46] D. Taylor, "Software Metrics for Object-Oriented Technology," Object Magazine, pp. 22-28, Mar.-Apr. 1993.
[47] D.P. Tegarden, S.D. Sheetz, and D.E. Monarchi, "A Software Complexity Model of Object-Oriented Systems," Decision Support Systems, vol. 13, nos. 3/4, pp. 241-262, Mar. 1995.
[48] W.N. Venables and B.D. Ripley, Modern Applied Statistics with S-PLUS. Springer, 1999.
[49] E.J. Weyuker, "Evaluating Software Complexity Measures," IEEE Trans. Software Eng., pp. 1357-1365, Sept. 1988.
[50] N. Wilde and R. Huitt, "Maintenance Support for Object-Oriented Programs," IEEE Trans. Software Eng., pp. 1038-1044, Dec. 1992.

Rajendra K. Bandi received the PhD degree from the Robinson College of Business, Georgia State University. He is currently an assistant professor in the area of information systems at the Indian Institute of Management, Bangalore, India. His research and teaching interests include object-oriented analysis and design, software measurement, software reuse, and software development process models and maturity. In recent years, Dr. Bandi has been involved in research in the areas of knowledge management and social/ethical issues in a computing-based society.
Vijay K. Vaishnavi (SM'89-F'01) received the BE degree (with distinction) in electrical engineering from Jammu and Kashmir University, received the MTech and PhD degrees in electrical engineering (with a major in computer science) from the Indian Institute of Technology, Kanpur, and conducted postdoctoral work in computer science for two years at McMaster University, Canada. Dr. Vaishnavi is currently a professor in the Department of Computer Information Systems at the Robinson College of Business, Georgia State University. His current areas of research interest include interorganizational systems (directory services, Web-based virtual communities, coordination, security), software development (object-oriented metrics, software specifications and their maturity, object-oriented modeling and design), and data structures and algorithms (multisensor networks and fusion). He has authored numerous papers in these and related areas. The US National Science Foundation and private organizations, including IBM, Nortel, and AT&T, have supported his research. His papers have appeared in IEEE Transactions on Software Engineering, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Computers, SIAM Journal on Computing, Journal of Algorithms, and several other major international journals and conference proceedings. Dr. Vaishnavi is an IEEE Fellow. He is also a member of the IEEE Computer Society, the Association for Computing Machinery (ACM), and the Association for Information Systems (AIS).

Daniel E. Turk received the MS degree in computer science from Andrews University, Berrien Springs, Michigan, in 1988 and the PhD degree in business administration (computer information systems) from Georgia State University, Atlanta, in 1999. He is currently an assistant professor in the Computer Information Systems Department at Colorado State University, Fort Collins. His research interests are in the areas of object-oriented systems, software engineering, business- and system-level modeling, software development process modeling, the value of modeling, and process improvement. He has published papers and articles in the Journal of Database Management, Information Technology & Management, and The Journal of Systems and Software, has presented papers at numerous conferences, including AIS, IRMA, OOIS, and OOPSLA, and has helped organize conferences and workshops in both the US and Europe. He is a member of the IEEE and the ACM.