Estimating the Size of Changes for Evolving Object Oriented Systems: a Case Study
G. Antoniol, G. Canfora, and A. De Lucia
University of Sannio, Faculty of Engineering
Palazzo Bosco Lucarelli, Piazza Roma, I-82100 Benevento, Italy
[email protected], {g.canfora, [email protected]}

Category: Experience Report
Keywords: impact analysis, effort prediction, traceability, versions compliance check, object orientation

Abstract

Size-related measures have traditionally been the basis for effort estimation models that predict the costs of software activities along the entire software product life cycle. Object-Oriented (OO) systems are developed and evolve by adding/removing classes and modifying existing entities. We propose an approach to predict the size of changes of evolving OO systems based on the analysis of the classes impacted by a change request. Our approach can be used both in iterative development processes and during software maintenance. A first empirical evaluation of the proposed approach has been obtained by applying our tools to the post-release evolution of OO software systems available on the net. The systems were analyzed, and models to predict added/modified LOCs from added/modified classes were statistically validated. In this paper, preliminary results of the evaluation outlined above are presented.

1. Introduction

Cost and effort prediction is an important aspect of the management of software projects. Experience shows that accurate prediction is difficult: an average error of 100% may be considered "good" and an average error of 32% "outstanding" [36]. Most methods for predicting effort require an estimate of the size of the software. Once a size estimate is available, models can be used that relate size to effort. Cost estimation is not a one-time activity performed in the early phases of a project. Estimates should be refined continually throughout a project [12]; furthermore, estimates must also be provided for post-release maintenance activities. Thus, it is necessary to predict size repeatedly throughout the entire software lifecycle. Most research on size prediction has dealt with traditional applications and traditional software development practices. Few methods have been proposed for Object-Oriented (OO) software development. In this work we present and validate an approach to predict the size of changes during maintenance activities of OO systems. In a previous paper [1] we presented an approach to establish and maintain traceability links, at the interface level, between subsequent releases of an OO software system. In this paper, we extend our approach to obtain a finer grained local detail of changes by integrating our tools with the GNU diff utility. As a result, the interface level evolution (added, deleted, modified classes and methods) is complemented with more detailed metrics data (LOC added, modified, and deleted). We use this enhanced traceability process to predict the size of changes in terms of added/modified LOCs from an estimate of the number of impacted classes. The approach has been experimented on 31 subsequent releases of the DDD1 software and 9 releases of the LEDA2 library. Our process works on a code intermediate representation that encompasses the essentials of the class diagram in an OO design description language: the Abstract Object Language (AOL) [2, 13]. The process recovers an "as is" design from the code, compares the recovered designs of subsequent software releases, and builds a mapping between them. Bunge's ontology [5, 6] has been taken as the conceptual framework to define the similarity criterion. An object is viewed as an individual which possesses properties. Comparing individuals for similarity translates into checking the similarity of the individuals' properties. When instantiated in the context of version traceability, individuals become classes, while properties are mapped onto attributes and methods. We adopt a multi-step approach: first, a class interface average similarity is derived from class and attribute/method names and signatures by means of string edit distances [15]. Then, a maximum match algorithm [9] computes the best mapping between releases; finally, finer grained information is recovered by differencing method bodies. Based on the recovered mapping and using the diff tool, detailed information is extracted in terms of added, deleted, and modified methods/classes and LOCs. This information is used to build linear models to predict the dependent variable, the size of modified code (i.e., added and modified LOCs), with independent variables that can be obtained from the impact analysis of the change request. Model performance on future observations was assessed by means of a leave-one-out cross validation, which guarantees a nearly unbiased estimate of the prediction error.

The paper is organized as follows: Section 2 introduces our version traceability recovery process. In Section 3 we describe the case study, while in Section 4 we present the experimental results. In Section 5 we compare our approach to related work. Finally, in Section 6 we draw some preliminary conclusions and outline future work.

1 http://www.cs.tu-bs.de/softech/ddd/
2 http://www.mpi-sb.mpg.de/LEDA/
Figure 1. Version Comparison Process. (C++ source code of versions Vi and Vj is translated by Code2AOL into AOL specifications; metrics extraction, AOL parsing and similarity computation, version comparison, and difference computation yield the Vi - Vj differences.)
2. Recovering Version Traceability

The release comparison process is represented in Figure 1. The process consists of the following activities:

1. AOL Representation Extraction: in this phase an AOL system representation is recovered from the code through a Code2AOL extractor;

2. Software Metrics Extraction: class level as well as function level software metrics are computed;

3. AOL Parsing and Similarity Computation: for any given class in version Vi and any given class in version Vj a similarity weight is assigned;

4. Version Comparison: by means of a maximum flow algorithm an optimum matching is computed;

5. Difference Computation: the code of corresponding classes and methods is compared to identify added, deleted, and modified LOC.
In the following subsections we highlight the key issues of these activities and the related implications.
2.1. AOL Representation Extraction

AOL has been designed to capture OO concepts in a formalism independent of programming languages and tools. AOL is a general-purpose design description language, capable of expressing concepts available at the design stage of OO software development. The language resembles other OO design/interface specification languages such as IDL [20, 25] and ODL [21]. More details on AOL can be found in [13, 2]. In a previous paper [1], the AOL representation was used to compare classes and methods at the interface level; in this paper we augment the AOL representation by including the methods' code. This code is used in the last step of the comparison process (difference computation).
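To make the interface-level view concrete, the small C++ class below (a hypothetical example of ours, not code from DDD or LEDA) shows the kind of information the comparison works on: the class name, the attributes with their types, and the method names with their signatures; method bodies are only consulted later, during difference computation.

// A hypothetical class used only to illustrate what interface-level
// information the comparison relies on.
class Breakpoint {
public:
    Breakpoint(int line, bool enabled);
    int  line() const;          // method name and signature are compared
    bool is_enabled() const;
    void toggle();
private:
    int  line_;                 // attribute name and type are compared
    bool enabled_;
};
// Interface-level facts recovered for this class (informally):
//   class Breakpoint
//   attributes: line_ : int, enabled_ : bool
//   methods:    Breakpoint(int, bool), line() : int,
//               is_enabled() : bool, toggle() : void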
2.2. AOL Parsing and Similarity Computation

AOL parsing and similarity computation is a two step process. The first step parses the AOL representation of two software releases and assigns weights to each class property. The second step computes an optimum match between the given classes. Chidamber and Kemerer [8] proposed a representation of substantial individuals (objects) as a finite collection of properties:

$$X = \langle x, P(x) \rangle \qquad (1)$$

where the object X is identified by its unique identifier, x, and P(x) is its finite collection of properties. In general, two objects X and Y may possess different properties. Thus, a preliminary step in the definition of a similarity measure between them is the introduction of a mapping between a subset of the properties of X and a subset of the properties of Y. The remaining properties from P(x) and P(y) are unmatched properties of X and Y, respectively. If the similarity of two things is defined as the intersection of their sets of properties, we can immediately derive that two individuals are indistinguishable if and only if they share the same name and possess the same collection of properties. However, as software evolves, implementations may deviate due to maintenance interventions: a criterion imposing substantial identity of individuals is unnecessarily stringent, possibly leading to unsatisfactory results. Therefore, a less stringent similarity of two things was experimented with in the present work. More precisely, let Vi and Vj be the compared software versions. For any given class Ei,k = <ei,k, P(ei,k)> in version Vi and any given class Ej,l = <ej,l, P(ej,l)> in Vj we introduce a similarity between individual properties as follows:

$$w(x, y) = \alpha \, s(e_{i,k}, e_{j,l}) + (1 - \alpha) \, s(x, y) \qquad (2)$$

where x belongs to P(ei,k), y belongs to P(ej,l), the weight associated with class name matching is α in [0, 1], and s(u, v) is the complemented edit distance [9] between strings:

$$s(u, v) = 1 - \frac{d(u, v)}{|u| + |v|} \qquad (3)$$

Once the similarity between each pair of properties is available, an optimum match between Ei,k and Ej,l can be inferred by applying the maximum match algorithm [9] to the bipartite graph in which nodes are respectively properties of Ei,k and Ej,l and edges between nodes are weighted by the similarity score (equation 2). The similarity between Ei,k and Ej,l is defined as the average optimum weight between properties, as computed by the maximum match algorithm.
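To illustrate equations (2) and (3), the following C++ fragment sketches how the complemented edit distance and the weighted property similarity could be computed. The function names and the default value alpha = 0.3 (the class-name weight used later in the paper) are our own choices for this sketch, not the authors' implementation, which is written in C and C++ as described in Section 2.5.

#include <algorithm>
#include <string>
#include <vector>

// Classical Levenshtein edit distance d(u, v).
static std::size_t editDistance(const std::string& u, const std::string& v) {
    std::vector<std::vector<std::size_t>> d(u.size() + 1,
                                            std::vector<std::size_t>(v.size() + 1));
    for (std::size_t i = 0; i <= u.size(); ++i) d[i][0] = i;
    for (std::size_t j = 0; j <= v.size(); ++j) d[0][j] = j;
    for (std::size_t i = 1; i <= u.size(); ++i)
        for (std::size_t j = 1; j <= v.size(); ++j)
            d[i][j] = std::min({ d[i - 1][j] + 1,                               // deletion
                                 d[i][j - 1] + 1,                               // insertion
                                 d[i - 1][j - 1] + (u[i - 1] != v[j - 1]) });   // substitution
    return d[u.size()][v.size()];
}

// Complemented edit distance, equation (3): s(u, v) = 1 - d(u, v) / (|u| + |v|).
static double stringSimilarity(const std::string& u, const std::string& v) {
    if (u.empty() && v.empty()) return 1.0;
    return 1.0 - static_cast<double>(editDistance(u, v)) / (u.size() + v.size());
}

// Property-level similarity, equation (2): a weighted mix of class name
// similarity and property (attribute/method signature) similarity.
// alpha = 0.3 is the class-name weight reported later in the paper.
static double propertySimilarity(const std::string& classNameI, const std::string& classNameJ,
                                 const std::string& propI, const std::string& propJ,
                                 double alpha = 0.3) {
    return alpha * stringSimilarity(classNameI, classNameJ)
         + (1.0 - alpha) * stringSimilarity(propI, propJ);
}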
Figure 2. LOCs of DDD releases.
2.3. Version Comparison at the Interface Level
The results of the activities carried out in the previous phase may again be thought of as the definition of a bipartite graph in which nodes are entities and edges between entities exist if an optimum weight was computed. Clearly, each entity of version Vi may be connected to each entity of version Vj. By analyzing several releases of an OO system we noticed that dramatic changes or deep restructurings seldom occur. Similarities among entities tend to be significant above a given threshold that may depend on the subject system. In a previous paper [1] we defined a pruning threshold to remove edges that are introduced but are unlikely to represent a real mapping between entities. Removing these edges may produce a graph with isolated nodes, which represent either items deleted from the old version or items added in the new release. Finally, on the pruned bipartite graph the maximum match algorithm is applied to induce the mapping function between Vi and Vj. The pairs of nodes resulting from this mapping represent items in common in the two releases, i.e., items evolved from the old to the new release. We have also demonstrated in a previous case study that the best results are achieved by using the weight α = 30% and a pruning threshold of 70%.
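For illustration, the sketch below shows the class-level mapping step with the parameters reported above (a pruning threshold of 70%). To keep it short, it replaces the maximum match algorithm [9] with a simple greedy assignment over the pruned similarity matrix, so it only approximates the optimum matching computed by the actual process; the names and structure are ours.

#include <cstddef>
#include <vector>

struct ClassPair { std::size_t oldIdx, newIdx; double score; };

// Given a matrix sim[i][j] of class-level similarities between release Vi
// (rows) and Vj (columns), prune edges below `threshold` and greedily pick
// the best remaining pair until no admissible edge is left. Unmatched rows
// correspond to deleted classes, unmatched columns to added classes.
static std::vector<ClassPair> greedyMatch(const std::vector<std::vector<double>>& sim,
                                          double threshold = 0.70) {
    std::vector<ClassPair> mapping;
    std::vector<bool> oldUsed(sim.size(), false);
    std::vector<bool> newUsed(sim.empty() ? 0 : sim[0].size(), false);
    for (;;) {
        ClassPair best{0, 0, -1.0};
        for (std::size_t i = 0; i < sim.size(); ++i) {
            if (oldUsed[i]) continue;
            for (std::size_t j = 0; j < sim[i].size(); ++j) {
                if (newUsed[j] || sim[i][j] < threshold) continue;
                if (sim[i][j] > best.score) best = {i, j, sim[i][j]};
            }
        }
        if (best.score < 0.0) break;          // no admissible edge left
        oldUsed[best.oldIdx] = true;
        newUsed[best.newIdx] = true;
        mapping.push_back(best);
    }
    return mapping;
}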
2.4. Code Difference Computation

In the previous two steps, similarity has been measured on the basis of string matching between class names, attributes (including attribute types) and method names (including signatures). Thus we are not guaranteed that, if two classes obtained a similarity of 100%, no modification occurred between them passing from release Vi to Vj. A mechanism to identify method changes at the statement level would therefore be desirable. However, we do not aim to highlight minor changes; for example, if comments are added to a chunk of code or the code is re-indented to improve readability, we would like to hide such detail unless explicitly required. These requirements are met in the difference computation phase, where the diff tool is used to discover the differences in the code (excluding comments) of corresponding methods. In particular, for these methods we identify the number of added, deleted and modified LOC; we also count as added/deleted LOC the code of added/deleted classes and methods.

Figure 3. Number of Classes of DDD releases.

2.5. Tool Support

The process for comparing software versions shown in Figure 1 has been completely automated. A Code2AOL Extractor module has been developed to extract the AOL representation from C++ code. Details of the tool can be found in [2]. The Code2AOL Extractor also extracts information about class relationships and class and function level metrics, among which the classical class level metrics (number of public, private and protected attributes and methods, number of direct subclasses, number of direct superclasses, LOC, etc.). Function level metrics include cyclomatic complexity, number of statements, number of LOC, number of passed parameters, number of operators, number of function calls and return points. The matching phase relies on an AOL parser. Once the abstract syntax trees of the compared versions are available, they are traversed and similarities computed. The computed weights are used to build a bipartite graph and are passed to the maximum match algorithm. The AOL parser and the edit distance computation have been implemented in C, while the maximum match algorithm has been written in C++. Finally, given the recovered mapping, a script computes the differences in the code of corresponding pairs of methods. For each such pair the GNU tool diff is used to compare method bodies. The result is summarized as the number of added, deleted, and modified LOC; the number of added/deleted LOC is augmented with the LOC of added/deleted classes and methods computed by the metric extractor.
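The listing below is a minimal sketch of how the diff-based counting could be scripted in C++ (assuming a POSIX popen and method bodies already stripped of comments). It derives the counts from the change commands of diff's normal output ('a', 'd', 'c'); the rule used to split a 'c' hunk into modified plus added/deleted lines is one plausible choice of ours, not necessarily the exact rule of the authors' script.

#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <stdio.h>
#include <string>

struct LocDelta { int added = 0, deleted = 0, modified = 0; };

// Parse a diff range such as "12" or "12,15" and return its length.
static int rangeLength(const std::string& r) {
    std::size_t comma = r.find(',');
    if (comma == std::string::npos) return 1;
    return std::atoi(r.c_str() + comma + 1) - std::atoi(r.c_str()) + 1;
}

// Run `diff oldFile newFile` and classify lines from the change commands
// of the normal output format: "XaY" (add), "XdY" (delete), "XcY" (change).
static LocDelta diffMethods(const std::string& oldFile, const std::string& newFile) {
    LocDelta d;
    std::string cmd = "diff " + oldFile + " " + newFile;
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) return d;
    char buf[512];
    while (fgets(buf, sizeof(buf), pipe)) {
        std::string line(buf);
        std::size_t op = line.find_first_of("adc");
        if (op == std::string::npos || !std::isdigit(static_cast<unsigned char>(line[0])))
            continue;                              // skip "<", ">", "---" detail lines
        int oldLen = rangeLength(line.substr(0, op));
        int newLen = rangeLength(line.substr(op + 1));
        switch (line[op]) {
            case 'a': d.added   += newLen; break;
            case 'd': d.deleted += oldLen; break;
            case 'c':                              // a change hunk: pair up what we can
                d.modified += std::min(oldLen, newLen);
                d.added    += std::max(0, newLen - oldLen);
                d.deleted  += std::max(0, oldLen - newLen);
                break;
        }
    }
    pclose(pipe);
    return d;
}

For two method bodies differing by one rewritten line and two new lines, diff emits commands such as 3c3 and 5a6,7, which this rule counts as 1 modified and 2 added LOC.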
3. Case Study

The version comparison approach described in the previous sections has been experimented on a public domain C++ software system. We concentrated our effort on the analysis of the source code of different releases of the Data Display Debugger (DDD), a graphical user interface to GDB and DBX, the popular UNIX debuggers, developed at the Technische Universität Braunschweig, Germany (free software, protected by the GNU general public license and available from http://www.cs.tu-bs.de/softech/ddd/). In particular, in our case study we analyzed and compared 31 different releases, ranging from version 1.4d to version 3.1.3. Figures 2 and 3 show the trend of LOC and classes in the different versions of DDD, respectively. The figures show that DDD evolved considerably from the first to the last version (from 46 to 107 KLOC), although the number of classes did not change very much (from 120 in the first release analyzed to 135 in the last release).

We applied the version comparison tool at the interface level to the different releases of DDD, using the weights 30% for class name matching and 70% for method/attribute matching; we used 70% as the threshold to prune pairs recovered by the tool that are unlikely to represent similar pairs. This means that classes in the new (old) release were considered added (deleted) if they did not match any class in the old (new) release or if they matched some class with a similarity value lower than 70%. A class was considered unchanged at the interface level if it matched some class in the previous release with a similarity value of 100%. Reference [1] shows that using these parameters produces an optimal matching, with an error rate of less than 2% in the worst case.

Once a mapping between methods had been built at the interface level, we applied the diff tool to identify the differences in the body of corresponding methods. LOC were classified either as unchanged, or as modified, deleted from the old method, or added to the new method. LOC of added (deleted) classes or methods were added to the final added (deleted) LOC. It is worth noting that generally no classes were deleted passing from one version to the next, while only a few classes were added (at most 3% of total classes) and most of the classes remained unchanged (generally more than 70%). Similarly, the number of deleted LOC was generally low, compared to the number of added and modified LOC; on the other hand, the number of added LOC was greater than the number of modified LOC.

Figure 4 shows the trend of the number of modified and added classes resulting from applying our method; the top curve represents the number of modified classes, while the middle and bottom curves represent the number of classes modified at the interface level and the number of added classes. Figure 5 shows the trend of the number of added (top curve) and modified (bottom curve) LOC.

These observations about the evolution of classes and LOC of the DDD releases suggested that, for this type of system, most of the size of a change was due to change requests that impacted the classes at the interface level (evolution), rather than changes that impacted the classes at the implementation level (maintenance).
4. Experimental Results

The described approach was applied to study the relationships between the evolution of OO entities, as identified by the traceability process, and the size of changes measured as the number of added and modified LOC3.

3 LOCs were measured as the number of non-blank lines, excluding comments and pre-processor directives.

4.1 Empirical data analysis approach

Several regression techniques were considered to model the relationships between the size of the changes and metrics about the evolution of OO entities, such as the number of classes with modified interface. A leave-one-out cross-validation procedure [35] was used to measure model performance: each given model was trained on n - 1 points of the data set L (the sample size was n = 30 for the DDD case study) and its accuracy tested on the withheld datum. The step was repeated for each point in L and the accuracy measures averaged over n. This methodology gives an estimate of future performance on novel data and it is thus indicated in the design of predictive models. Moreover, it enables comparisons among different families of models, different choices of parameters, or data preprocessing. Here, the model error was estimated as the cross-validation version of the normalized mean squared error (NMSE), which is the mean squared error normalized over the variance of the sample. Let yk be a data point belonging to a set of observations of the dependent variable Y and let its estimate be denoted by a hat; analytically:

$$\mathrm{NMSE} = \frac{\sum_{k \in L} (y_k - \hat{y}_k)^2}{\sum_{k \in L} (y_k - \bar{y})^2} \qquad (4)$$

where the mean of the observed values in the sample L is denoted by the bar over y. Where available, the cross-validation estimates of the standard error of the residuals and of the r-squared R² of the fit were also computed. The size of the database suggested the use of models with a reduced number of free parameters. We considered multivariate linear models:

$$Y = b_0 + b_1 X_1 + \ldots + b_n X_n \qquad (5)$$

with n at most 2. Moreover, the use of resistant regression techniques was investigated to handle non-obvious outliers and extreme points.

Model                        b1 (b2)      p-value     R²
Added Classes                1815         6.0e-12     0.80
Modified Classes             309          4.4e-11     0.77
Modified and Added Classes   156 (1075)   2.628e-11   0.86

Table 1. DDD Added classes, Modified classes and multivariate model parameters.
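As a minimal sketch of this validation scheme, the C++ fragment below performs a leave-one-out cross validation of a single-variable linear model without intercept (y = b1 x, fit by least squares) and returns the NMSE of equation (4). It is our own illustration; the study itself used standard regression tooling, including multivariate and robust fits.

#include <cstddef>
#include <vector>

// Least-squares slope for a no-intercept model y = b1 * x, skipping index `skip`.
static double fitSlope(const std::vector<double>& x, const std::vector<double>& y,
                       std::size_t skip) {
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        if (k == skip) continue;
        sxy += x[k] * y[k];
        sxx += x[k] * x[k];
    }
    return sxx > 0.0 ? sxy / sxx : 0.0;
}

// Leave-one-out cross-validation estimate of the NMSE of equation (4):
// each point is predicted by a model trained on the remaining n - 1 points,
// and the squared errors are normalized by the sample variance of y.
static double looNmse(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = y.size();
    double mean = 0.0;
    for (double v : y) mean += v;
    mean /= static_cast<double>(n);

    double sse = 0.0, sst = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        double b1 = fitSlope(x, y, k);          // train on all points but k
        double pred = b1 * x[k];                // predict the withheld point
        sse += (y[k] - pred) * (y[k] - pred);
        sst += (y[k] - mean) * (y[k] - mean);
    }
    return sse / sst;
}

Fed with, for example, x = number of classes with modified interface per release transition and y = added plus modified LOC, it yields the kind of NMSE figure reported in Table 2.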
4.2 DDD results

A preliminary set of experiments was performed to model the size of modified code (i.e., added and modified LOCs) by means of the independent variable(s) number of added classes and/or number of classes with modified interface. It is worth noting that these variables can be estimated from the impact analysis of the change request.

A point of concern is whether an intercept term b0 should be included in the model. It may be reasonable to suppose the existence of support code not directly related to the modification being counted. However, we have no statistical evidence that an intercept value is needed on the DDD data (intercept p-values 0.2, 0.258, 0.4), thus it will not be considered in the following. As shown in Table 1, a multivariate model accounting for classes with modified interface (b1) and added classes (b2) seems to better explain the data. To assess prediction error on future observations, the three models were compared with cross validation. Table 2 reports the cross validation results: the best predictive capability is achieved when the multivariate model is considered. In Table 2, RSE is the mean residual square error, i.e., the total squared difference between the predictions and the observations for the given model, averaged over the number of observations. Of course, we cannot rely on the hypothesis that the dependent variable has a Gaussian distribution (the size of a change cannot be negative); nevertheless, the data of Table 2 clearly demonstrate that the multivariate model is to be preferred.

Model                        NMSE   σ(error)   RSE       R²
Added Classes                0.40   1456       2110899   0.81
Modified Classes             0.35   1379       1864648   0.77
Added and Modified Classes   0.28   1225       1482992   0.87

Table 2. DDD Added classes, Modified classes and multivariate model cross validation performances.

Figure 4. Number of Classes Modified (top curve), Modified at Interface Level (middle curve), and Added (bottom curve) of DDD releases.

Figure 5. Number of Added (top curve) and Modified (bottom curve) LOCs of DDD releases.

The results summarized in Table 2 are encouraging: with two independent variables we obtain an NMSE of 28%, meaning that the square error variance is less than half of the sample variance. From another point of view, the model based on the added and modified (in the interface) classes achieves a cross validation average error of 86%, which can be considered good [36]. We compared the Table 2 figures with models based on the number of classes modified both in the interface and in the implementation. The results obtained were consistently poorer than those of the models based on the number of classes with modified interface. This can be explained by considering that for the DDD case study the number of added LOC is consistently higher than the number of modified LOC; indeed, changes in the interface of a class are likely to induce modifications in its body and in other classes. Robust regression techniques were also investigated to handle non-obvious outliers. The applied robust fit uses Huber's M-estimator and initially uses the median absolute deviation scale estimate based on the residuals. The estimates obtained for the most promising model, i.e., the multivariate model without intercept, improved (NMSE of 24%, error 1136.132); this also supports the hypothesis that influential points and/or outliers may be present in the data set.

Finally, we made an experiment considering the number of modified methods as independent variable, as shown in Table 3. A simple model based only on the number of modified methods outperforms the previous ones; it is further improved if a multivariate model including added classes is considered. Notice that for these models the intercept is statistically significant, as shown in Table 3; as the models' p-value is zero, it is not reported in the table. Cross validation (see Table 4) clearly demonstrates that a simple model suffices: extremely good predictions can be obtained based only on the number of modified methods. Unfortunately, a prediction of the number of methods impacted by a maintenance request is much more difficult to obtain than the number of impacted classes.

Model                                b0 (p-value)     b1 (b2)    Model R²
Modified Methods                     -364 (0.00309)   26         0.9558
Modified Methods and Added Classes   -318 (0.00586)   23 (306)   0.9617

Table 3. DDD Added classes and Modified Methods model parameters.

Model                                NMSE   σ(error)   R²
Modified Methods                     0.05   532        0.96
Added Classes and Modified Methods   0.05   546        0.96

Table 4. DDD Added classes, Modified Methods cross validation performances.
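To see how the class-based model (whose variables a project manager can realistically estimate from impact analysis) would be used in practice, consider a hypothetical change request expected to modify 10 classes at the interface level and to add 2 new classes; these figures are ours, not taken from the DDD history. The multivariate model of Table 1 (no intercept) would then predict about $156 \cdot 10 + 1075 \cdot 2 = 1560 + 2150 = 3710$ added/modified LOC.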
5. Related Work

Few approaches and systems have been presented to deal with the problem of building and maintaining traceability links either between design and code or among software releases [17, 13, 22, 24, 29, 10, 11]. This paper is an evolution of [1] and is closely related to the works in [13, 3, 17]. We share with [13, 3] the general idea and the approach: we both rely on Bunge's ontology and adopt an intermediate representation to build traceability links. This work can be regarded as a natural evolution of [13]. Actually, [13, 3] can be considered the first step of the process, in which a design-to-code mapping is recovered; subsequent steps are carried out with the approach presented here. Furthermore, we also advocate different and more flexible similarity measures, and a new matching algorithm that also adopts a pruning threshold to avoid false matches. The Rational Rose suite [10, 11] provides a mechanism for visually differencing two models. Each element of a model has a unique identifier that is exploited during comparison. In other words, as the manual states [11], two model elements with the same name but with different identifiers are regarded as two different elements in a comparison. While the visual differencing tool offers powerful visualization capabilities, it does not integrate a modified-LOC prediction mechanism. On the contrary, our approach focuses on predicting the impact of a maintenance intervention.

The problem of estimating the effort and costs of software projects has long been recognized as a key to successful software management, and several authors have proposed estimation methods and tools [4, 37]. Notable examples include analogy models [30, 31], which are based on the comparison of a proposed software project with one or more previous projects carried out in the same organization, and algorithmic models [4, 26, 27]. Expert judgment is also a widespread practice, possibly with the help of a systematic approach to combine the opinions of experts, such as Delphi [16, 28]. Most of these methods focus on the initial development stage of a system and do not tackle the problem of predicting effort and costs for post-release maintenance activities. Stensrud and Myrtveit [34] demonstrated that human performance improves when a tool based either on analogy or on regression models is available. Clearly, further work is needed to verify whether the hypothesis validated in [34] for software development can be generalized to maintenance activities. We deal with evolving systems and provide a means for predicting the size of changes, in terms of LOC, from the estimated number of added/modified classes. In addition, most estimation methods work well for systems built with traditional software development practices and may be inadequate when applied to OO software systems. Our method is specific to OO systems and exploits traceability links both at the interface level and at the level of local changes.

Boehm [4] presents one of the first approaches to estimate maintenance effort; he relates the annual maintenance costs to the effort of initially developing the system and to the annual change traffic, which is an estimate of size changes in a typical year. Granja-Alvarez and Barranco-Garcia [14] propose an extension of Boehm's COCOMO model by incorporating indices to measure the maintainability of the system. Sneed develops the SOFTCALC model to estimate maintenance costs from an adjusted size measure [33]. Depending on the language in which the system is coded, size can be expressed as LOC, function points [18], or adaptations/extensions of function points to OO software [32, 23, 7]. Jorgensen [19] develops eleven different models to estimate software maintenance effort and comparatively applies them to industrial data to assess accuracy. Most of these methods assume the existence of a size estimate as a starting point for predicting effort and costs. Our work is complementary, as we provide a means for deriving the needed size estimate.

Version   KLOC   Classes   Methods   Attributes
2.1.1     35     69        1649      201
3.0       34     109       2388      245
3.1.2     61     176       3519      346
3.2.3     69     178       3695      371
3.4       95     208       4967      510
3.4.1     100    211       5104      543
3.4.2     111    210       5197      589
3.5.2     123    235       6124      740
3.7.1     153    410       10260     1177

Table 5. LEDA main features.
6. Conclusions

In this paper we have presented an approach to predict the size of changes (in terms of added and modified LOCs) of evolving OO software systems, based on an estimate of the number of entities impacted by a change request. The approach has been empirically evaluated by analyzing the relationships between the number of added/modified LOCs and the number of added/modified classes on 31 versions of DDD, a public domain software system. The comparison between subsequent versions of DDD is based on a traceability link recovery process that abstracts an intermediate representation from the code and exploits string edit distance and similarity measures to identify a mapping between OO entities. The preliminary results of the experiment validate the hypothesis of using the number of added and modified classes as independent variables. A point of concern is whether or not multivariate models have to be adopted, and which variables have to be included in the model: on the DDD system, given the detailed information available, the best results were obtained using a multivariate model with two independent variables, namely the number of added and modified (in the interface) classes. In particular, we obtained an NMSE of 28%, meaning that the square error variance is less than half of the sample variance.

The experiments performed on DDD were replicated on the LEDA library (see Table 5 for version characteristics). As shown in Table 6, the model based on added classes has a very good p-value; it also exhibits a high R² and there is no statistical evidence that the model really needs the intercept coefficient (b0 p-value 0.02). Table 6 also demonstrates, with strong statistical evidence, that there is no relation between modified classes and the size of a change (model p-value 0.75). This could be explained by the fact that we do not have the entire change request history; hence the main influencing factor is the code added with new classes rather than the code modified. This hypothesis is further supported by the high number of classes added going from one available LEDA version to the next (see Table 5). Furthermore, the predictive performance in terms of added classes, measured as NMSE and average R², is quite good (0.45 and 0.67, respectively), with a mean relative error of 62%; in other words, in this case the model was able to predict freshly developed code, rather than modifications due to maintenance.

Model                        b0 (p-value)    b1 (b2)   p-value   R²
Added Classes                5718 (0.026)    110       0.008     0.72
Modified Classes             12419 (0.118)   -38       0.75      0.017
Modified and Added Classes   5645 (0.2590)   1 (110)   0.042     0.71

Table 6. LEDA Added classes, Modified classes and multivariate model performances.

7. Acknowledgements

We wish to thank Dr. Andreas Zeller, Technische Universität Braunschweig, Abteilung Softwaretechnologie, Bültenweg 88, D-38092 Braunschweig, Germany, who kindly provided the 31 releases of the DDD software.
References

[1] G. Antoniol, G. Canfora, and A. De Lucia. Maintaining traceability during object-oriented software evolution: a case study. In Proceedings of the IEEE International Conference on Software Maintenance, Oxford, UK, IEEE CS Press, 1999.
[2] G. Antoniol, R. Fiutem, and L. Cristoforetti. Using metrics to identify design patterns in object-oriented software. In Proc. of the Fifth International Symposium on Software Metrics - METRICS98, pages 23–34, Nov 2-5 1998.
[3] G. Antoniol, A. Potrich, P. Tonella, and R. Fiutem. Evolving object oriented design to improve code traceability. To appear in Proceedings of the 7th Workshop on Program Comprehension, May 1999.
[4] B. W. Boehm. Software Engineering Economics. Prentice-Hall, Englewood Cliffs, NJ, 1981.
[5] M. Bunge. Treatise on Basic Philosophy: Vol. 3: Ontology I: The Furniture of the World. Reidel, Boston, MA, 1977.
[6] M. Bunge. Treatise on Basic Philosophy: Vol. 4: Ontology II: A World of Systems. Reidel, Boston, MA, 1979.
[7] G. Caldiera, G. Antoniol, R. Fiutem, and C. Lokan. A definition and experimental evaluation of function points for object-oriented systems. In Proc. of the Fifth International Symposium on Software Metrics - METRICS98, pages 167–178, Nov 2-5 1998.
[8] S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 20(6):476–493, June 1994.
[9] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990.
[10] Rational Software Corporation. Rational Rose/C++ Manuals, Version 4.0. 1997.
[11] Rational Software Corporation. Rational Rose 98: Using Rational Rose. Feb 1998.
[12] T. DeMarco. Controlling Software Projects. Yourdon Press, 1982.
[13] R. Fiutem and G. Antoniol. Identifying design-code inconsistencies in object-oriented software: a case study. In Proceedings of the International Conference on Software Maintenance, IEEE Computer Society Press, pages 94–102, Bethesda, Maryland, November 1998.
[14] J. C. Granja-Alvarez and M. J. Barranco-Garcia. A method for estimating maintenance cost in a software project: a case study. Software Maintenance Research and Practice, 9:161–175, 1997.
[15] D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, New York, 1997.
[16] O. Helmer. Social Technology. Basic Books, New York, 1966.
[17] R. Holt and J. Y. Pak. GASE: visualizing software evolution-in-the-large. In Proceedings of the Working Conference on Reverse Engineering, pages 163–166, Monterey, 1996.
[18] IFPUG. Function Point Counting Practices Manual, Release 4.0. International Function Point Users Group, Westerville, Ohio, 1994.
[19] M. Jorgensen. Experience with the accuracy of software maintenance task effort prediction models. IEEE Transactions on Software Engineering, 21(8):674–681, 1996.
[20] D. A. Lamb. IDL: sharing intermediate representations. ACM Transactions on Programming Languages and Systems, 9(3):297–318, July 1987.
[21] D. Lea and C. K. Shank. ODL: language report. Technical Report Draft 5, Rochester Institute of Technology, Nov 1994.
[22] S. Meyers, C. K. Duby, and S. P. Reiss. Constraining the structure and style of object-oriented programs. Technical Report CS-93-12, Brown University, 1993.
[23] A. Minkiewicz. Measuring object-oriented software with predictive object points. In Proceedings of the 8th European Software Control and Metrics Conference, Atlanta, May 1997.
[24] G. C. Murphy, D. Notkin, and K. Sullivan. Software reflexion models: bridging the gap between source and high-level models. In Proceedings of the Third ACM Symposium on the Foundations of Software Engineering, 1995.
[25] OMG. The Common Object Request Broker: Architecture and Specification. OMG Document 91.12.1, OMG, December 1991.
[26] L. Putnam. A general empirical solution to the macro software sizing and estimation problem. IEEE Transactions on Software Engineering, 4(4):345–361, July 1978.
[27] L. H. Putnam. SLIM System Description. Quantitative Software Management, 1980.
[28] R. F. Scott and D. B. Simmons. Programmer productivity and the Delphi technique. Datamation, pages 71–73, May 1974.
[29] M. Sefika, A. Sane, and R. H. Campbell. Monitoring compliance of a software system with its high-level design models. In Proceedings of the International Conference on Software Engineering, pages 387–396, 1996.
[30] M. Shepperd and B. Kitchenham. Effort estimation using analogy. In Proceedings of the International Conference on Software Engineering, pages 170–178, 1996.
[31] M. Shepperd and C. Scholfield. Estimating software project using analogy. IEEE Transactions on Software Engineering, 23(12):736–743, Nov 1997.
[32] H. Sneed. Estimating the costs of object-oriented software. In Proceedings of the Software Cost Estimation Seminar, 1995.
[33] H. M. Sneed. Estimating the cost of software maintenance tasks. In Proceedings of the International Conference on Software Maintenance, IEEE Computer Society Press, pages 168–180, Opio, Nice, Oct. 1995.
[34] E. Stensrud and I. Myrtveit. Human performance estimating with analogy and regression models: an empirical validation. In Proc. of the Fifth International Symposium on Software Metrics - METRICS98, pages 205–213, Nov 2-5 1998.
[35] M. Stone. Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society B, 36:111–147, 1974.
[36] S. Vicinanza, T. Mukhopadhyay, and M. Prietula. Software-effort estimation: an exploratory study of expert performance. Information Systems Research, 2(4):243–262, Dec. 1991.
[37] F. Wellman. Software Costing. Prentice-Hall, Englewood Cliffs, NJ, 1992.