Software Renewal Projects Estimation Using Dynamic Calibration

M.T. Baldassarre, D. Caivano, G. Visaggio
Dipartimento di Informatica – Università di Bari – Via Orabona, 4, 70126 Bari – Italy
{baldassarre, caivano, visaggio}@di.uniba.it

Abstract

Effort estimation is a long-standing problem: in spite of the amount of research devoted to this field, it remains an open issue in the software engineering community. This is especially true for the renewal of legacy systems, where current, well-established approaches also fail. This paper presents an application of the method named Dynamic Calibration for effort estimation of renewal projects, together with its experimental validation. The approach satisfies all the requirements of estimation models. The results obtained by applying dynamic calibration are compared with those obtained with a competitor method, estimation by analogy. The results are promising, although further experimentation in the field is needed.
1. Introduction Interest in the development of software cost estimation models is supported by a wide range of literature on the subject, such as [23,5,24,25,26,27,12]. Many of the models developed work well on standard, capable and well established processes. Numerous methods use the project as the basic component for collecting experience for model calibration. Nevertheless, there are still many critical issues to face, due to the concept of Process Diversity [18], which implies diversity in estimation models [19]. Therefore an estimation model, in general, should satisfy at least the following essential requirements: • The process variables are specific to the organization carrying out the project, so the estimation model should use the available metrics and should be calibrated to local data in order to obtain adequate predictions of the process characteristics [1] and usage context [16]. This is our opinion; nevertheless, there is an open discussion on the use of local vs multi-company data [40, 41] that is far from producing objective results. • During project execution, its processes will undergo frequent changes. Some occur without any external initiative, such as the increasing maturity of the developers. Others are caused by operations made during execution aiming to continuously improve the process [14, 17]. In the first case, the estimation model must help point out when a change in the process has occurred, so that it can be adequately controlled in terms of staffing, budget outlay and scheduling. In the second case, the estimation model must be sensitive to the changes caused by the improvement initiatives, to assess how beneficial they are [9]. In both cases the project manager must be fully aware of all the
independent variables of the estimation models and of their behavior. • Process changes, during project execution, impact on the parameters and drivers determined at the beginning of the project to customize the model to the project and context. Thus the estimation model in use needs to be calibrated on project parts having the smallest grain within the whole project. Legacy system renewal projects [13,21,22], like other innovative ones, are not covered by accurate estimation models [10,15]. A renewal process has further characteristics that imply additional estimation model requirements. In fact, it starts from more detailed software components (i.e. code) than those usually used in development processes (system requirements and analysis documents) and reconstructs high level components such as the system architecture. Furthermore, it includes activities depending on both the legacy system quality parameters and the improvement goals for those parameters [47,48]. Consequently, every renewal project is based upon a diverse process, and often different parts of the legacy system are subjected to different processes [49]. In the scientific and industrial communities, renewal processes have a shorter history, and less accumulated experience, than development and ordinary maintenance processes. For all these reasons, renewal projects involve greater risks in terms of staffing, scheduling and costs. Therefore, intense monitoring during project execution must be carried out in order to manage the risks of underestimation and overestimation. Also, for renewal processes the following requirements must be added to those previously listed: • estimations must be based on code quality metrics whose values are known at the beginning of the project; • estimation must be executed on one project chunk at a time. A project chunk consists of the execution of the renewal process on a small legacy grain. The dimension of the grain depends on the renewal process in use.
To face process variability, some authors suggest calibrating the estimation models and adapting them to the organization carrying out the process [1, 2, 12, 11]. The estimation models must be calibrated on small historical series, consisting of the data collected between two considerable changes occurring in the process during execution of the same project. This paper presents a method for generating estimation models adequate for renewal projects. It satisfies all the fundamental requirements previously listed.
The method is empirically validated by means of a post mortem analysis of the data collected during the execution of a renewal process on an industrial legacy system. The validation is based on analyzing how accurate the effort estimation would have been if estimation models created with the method presented in this work had been used. The rest of the paper is organized as follows: in the following section the related work is presented; in Section 3 the Dynamic Calibration approach is illustrated; Section 4 presents the empirical study in which the approach is validated on an industrial renewal project; in Section 5 a comparison between DC and the estimation by analogy approach is made; finally, in Section 6 conclusions are drawn.
2. Related work The need to be able to refer to estimation models at smaller grains than that of the entire project is investigated in some papers: • the grain of the cost estimation model must be consistent with the available data supporting the estimate [8]; • it is best to carry out stepwise calibration over time, to reduce the estimation error of the prediction model adopted [2]. This contrasts with many important research studies on effort prediction in software development that take the project as the grain (i.e. they use many projects as data points for model calibration). Research directions in this field can be broadly classified as: Model based; Learning Oriented; Expertise Based; Composite. Well known examples in the model based category are COCOMO [5] and SLIM [28]. All the models belonging to this category refer to development processes that build a software system ex novo [13,21,22] and are thus difficult to adapt to renewal processes. These techniques require collecting only what the model needs as input, without other project data, but they tacitly assume a standard process model. In fact, an estimation model constructed from data drawn from one environment and transferred to another with different working practices, problem domains, development techniques etc. will perform poorly [16,29,30]. COCOMO and COCOMO II consider a set of corrective parameters for model calibration [8], and recalibration is often effective [31,32]. However, even when the predictors and corrective parameters are adequately estimated at the beginning of the project, their values tend to change during execution as changes to the process and the relative process variables are introduced. Hence, even a good initial estimation of the predictors will not prevent estimation errors during project execution. In [16] Kemerer reports that the difference between predicted and actual project effort was over 600% in his
independent study. This problem is more critical in the case of renewal projects, where the methods, tools and practices used are often unfamiliar to the developers and also differ deeply from one project to another. Thus: • developers learn a lot during project execution; as a result, the time for executing an activity becomes progressively shorter due to the maturation effect; • it is not possible to reuse experience acquired during past projects, due to their diversity. Furthermore, the parameters used by estimation models such as COCOMO are not enough to capture the diversity of the projects. Methods such as linear regression procedures are often used to develop simple prediction systems. Here historical data are needed to define the model and assess its accuracy. An example is MERMAID, which calibrates the model to the user environment with local data [33]. Such approaches cannot be calibrated on project parts. The learning oriented methods include neural nets, case-based reasoning, rule induction and neuro-fuzzy systems. They are inductive learning techniques and require accurate data for training and validation purposes. Experimental results suggest that neural net approaches offer improved accuracy versus model based approaches [34,42]. In [41] the same authors report on the use of back propagation learning algorithms on a multilayer perceptron for predicting development effort, resulting in an error rate of 29%. The datasets were large (86 projects for training and 136 for validation). In [43] the use of a neural net with a back propagation learning algorithm is reported, with an error of 70%. The experimentation pointed out that neural nets seem to require large amounts of data, 50 or more cases [35]. Another learning based approach is case-based reasoning (CBR). A case is a problem that has been solved; typically a project, in our case. Human experience is widely used for making predictions.
Although there is no strict requirement for systematic historical data, estimators frequently use remembered analogies when possible [38]. Unfortunately there is relatively little research in this area. The composite approach incorporates a combination of two or more techniques to formulate the most appropriate estimation functional form. The Bayesian-COCOMO II model has a good reputation [12]. A distinctive feature of the Bayesian approach is that it allows the investigator to use both sample (data) and prior (expert-judgment) information in a logically consistent manner when making inferences. According to that work, the model inherits all the characteristics of COCOMO II, while enabling management to address decisions more adequately when data are scarce and incomplete. Another composite approach is the Sparse Data Method (SDM) [27]. It is based upon a multi-criteria decision making technique known as the Analytic Hierarchy Process (AHP)
[39], which hierarchically represents the problem by decomposing it into smaller, more meaningful chunks. The experiment in [27] suggests that expert judgment offers a stronger basis for prediction than possibly incomplete data which may fail to capture all relevant factors. Both approaches still rely upon an expert. Clearly, if the estimator has no knowledge of the project requiring prediction, any prediction becomes highly risky. Expert judgment is also difficult to validate. Unfortunately, the methods described also require process data collected from different projects. This is a weakness in general, but it is more burdensome in renewal projects than in others, due to the difficulty of acquiring and transferring experience between different projects. The innovativeness of renewal projects makes expert judgment less effective than in other cases. Some authors have suggested using multi-organizational databases to get round this problem. The experience in the use of public domain metrics in [40] has left a number of unanswered questions. The most relevant for this paper is: how can an organization determine whether a multi-company dataset will be descriptive of its type of operation? Clearly, an organization first needs to establish whether it is a "like" organization in the public data set. Recent research [41] suggests, among other things, that an effective strategy for a company could be to start data collection using the standard factors and then collect factors specific to the organization. Thus the benefits of the global data could be exploited while organizational characteristics are still considered. An attractive approach is that of analogies [37] because, like our proposed approach, it requires only variables capturing the information available when the prediction is required. Consequently, the estimation can be calibrated on the project part having as small a grain as the available information allows, i.e. specific project chunks.
It can work using the available metrics without imposing new ones. The performance of the analogy approach will be related to the number, relevance and quality of past projects stored in the case base. For example the sensitivity analysis in [36] suggests a need for at least 15 cases and [37] suggests a minimum of 10-15 projects. This approach, due to its characteristics similar to the proposed one, will be further investigated in the paper.
3. The Dynamic Calibration In this work a project PR is considered as the execution of a process P where: PR is a sequential set {PR1, PR2,… PRk} of k subprojects or project components; P is a set {P1, P2,… Ph} of h sub processes or process components. The generic component PRi ∈ PR consists of the ordered repetition, n times (n≥1), of the same process component Pi. A sub process Pi∈P is associated with each
subproject PRi∈PR; thus a generic PRi is said to be based on process Pi and PR is said to be based on P. To avoid making the formalism more awkward, and since the method treats all project components in the same way, what is said of one component applies equally to all the other parts of the same project. The following general statements can thus be made: PR indicates a generic part of the project rather than the whole project; P indicates a generic process component rather than the whole process. Accordingly, PR can be seen as the set {P1, P2,… Pn} where Pk is the kth repetition of P. Each repetition must be independent of the others, i.e. the events that occur during a repetition do not affect any other repetition, and the input of a generic repetition Pk is of the same type as that of the other repetitions belonging to PR but is not the output of any of the repetitions required to conclude PR. The repetition index indicates the starting sequence, i.e. Pm starts after Pk ⇔ m>k. The sequence says nothing about when each repetition ends: Pk starts before Pm but may not finish before it. Pk is the process grain on which the estimation model is calibrated. {P1, P2,… Pn} is the historical series on which calibration is made. To define and calibrate the function estimating the effort for a generic project component PR, a subset of the metrics used to measure P is chosen as parameters. The metrics must be measurable or quantifiable before starting to execute any Pk. Before starting the process, the measurable metrics are those that can be observed from the input products, while the quantifiable metrics are used to anchor the effort; their values are fixed as targets before beginning process execution. The metrics that have these characteristics for P are: M={M1,M2,…Mm}. ∀Pi the metrics M assume the values M(Pi)={mi1,mi2,…,mim}. M is independent of the repetition but characteristic of process P.
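The decomposition above can be made concrete with a pair of simple data structures. This is only an illustrative sketch: the class name, the metric names and the two sample repetitions are our own, not part of the method.

```python
from dataclasses import dataclass

@dataclass
class Repetition:
    """One repetition P_k of the process component P (e.g. renewing one program)."""
    metrics: dict            # M(P_k): values measurable before execution starts
    expected_effort: float = 0.0  # Ee_k, filled in by the estimation model
    actual_effort: float = 0.0    # Ea_k, known only after the repetition ends

# PR, a generic project component, is the ordered list of repetitions of P.
PR = [
    Repetition(metrics={"NLOC": 1200.0, "McCABE": 35.0}),
    Repetition(metrics={"NLOC": 800.0, "McCABE": 21.0}),
]
```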
The estimation model consists of two functions: a main function for effort estimation and an error estimation function generated from it. This model can estimate the effort and adjust the value to ensure greater accuracy.
3.1. Definition and Calibration of the Estimation Model. The steps for defining and calibrating the method are briefly described below: 1. Start. On starting PR an estimation model Md0 of the Expected Effort (Ee) is used, called the baseline model: ∀Pi: Eei=Md0(M(Pi)). As seen below, the generic ith version of the estimation model will consist of two functions: Fe, mandatory, main function for estimating the effort; Fc, optional function for estimating the error and adjusting the estimation obtained with the main function. Md0 could be inherited from a previous project,
or be a model based on expert experience, or be chosen among others present in literature. 2. Calculation of the error. At the end of the generic repetition Pi, the Actual Effort required for its execution is known (Eai). Eai yields the prediction error: ERi = (Eai – Eei)/Eai (1). 3. Recording the data point. At the end of each repetition Pi, the data point of the ith repetition, Di = {M(Pi), Eei, Eai}, is recorded. The set of data points acts as a historical data series. 4. Mobile Learning Set. When the error |ERi|>Tr, Tr being the maximum error threshold established by the project manager, the current estimation model Mdk-1 must be checked for inaccuracy. If the error exceeds the threshold n times, the estimation model must be recalibrated using the new learning set Lsk = (Di, Di+1,…, Di+n). For example: if Md0 is inaccurate for the first n repetitions of P, the first learning set Ls1 = (D1, D2,…, Dn) must be collected. The cardinality n of a generic Lsi can be established according to the independent variables in Fei. 5. Calibration of the estimation model. Using Lsk a new estimation model is obtained: 5.1. Principal component analysis. If the set M is very numerous it will need to be reduced by extracting its principal components. Principal component analysis is the easiest way to do this [4], and could result in replacing the elements of M with a smaller number of elements, each of which is a linear combination of the existing metrics. To keep notation simple, we shall go on calling the set of principal components M. 5.2. Adaptation of the estimation functions. Among all the metrics present in M, those best correlated with the values of Eai in the data points of Lsk are selected. We shall call these Me. Using Me as parameters, a new estimation function Fek is obtained by means of the techniques described in the following section.
If no metric can meaningfully account for the effort values, then the metrics in M do not characterize the effort required to carry out the process and new metrics must be introduced. 6. Updating the data points in the learning set. ∀Di∈Lsk the new Eei = Fek(Me(Pi)) is calculated. As the Eai are known, the ERi can be recalculated with (1). 7. Defining the corrective function. Among the metrics left in M after elimination of Me, those that best account for the errors generated by the new estimation function Fek must be identified. They are called Mc and are the estimation corrective factors. The error estimation function is obtained as ERi = Fck(Mc(Pi)), which, by (1), then yields [Eei]adj = Eei/(1–ERi). [Eei]adj is the definitive prediction, from which the new error [ERi]adj = (Eai – [Eei]adj)/Eai can be calculated. If no metric among the (M–Me) can meaningfully account for the error values, then the corresponding Fc cannot be obtained and [Eei]adj
will be equal to Eei. When data points are added to the historical file of the project, the analysis can be redone with a more extensive learning set than the one used to determine the econometric model in use. 8. Continuation of the project. After calibration, with a new model Mdk composed of (Fek, Fck), we go back to step 2 until all the repetitions of the basic process component have been done. At the end, Mdk will be inherited by a successive project based on a similar or identical process, as judged by the project managers.
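Under strong simplifying assumptions, the loop formed by steps 2–8 can be sketched in a few lines. The sketch below assumes a single metric, a least-squares linear fit for recalibration, and a naive learning-set policy (recalibrate as soon as the error exceeds the threshold with n points collected); none of these choices is prescribed by the method, and all names are illustrative.

```python
import statistics

def fit_linear(points):
    """Least-squares fit Ee = b + a*m over (metric, actual_effort) pairs."""
    xs = [m for m, _ in points]
    ys = [e for _, e in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda m: b + a * m

def dynamic_calibration(repetitions, baseline, threshold=0.1, n=5):
    """repetitions: (metric_value, actual_effort) pairs in execution order."""
    model = baseline
    learning_set, errors = [], []
    for m, ea in repetitions:
        ee = model(m)                 # estimate before the repetition runs
        er = (ea - ee) / ea           # prediction error, eq. (1)
        errors.append(er)
        learning_set.append((m, ea))  # step 3: record the data point
        if abs(er) > threshold and len(learning_set) >= n:
            model = fit_linear(learning_set[-n:])  # steps 4-5: recalibrate
            learning_set = []         # start a fresh mobile learning set
    return model, errors
```

With a baseline that systematically underestimates, the loop recalibrates after the first n repetitions and subsequent errors drop to zero on noiseless data.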
3.2 Considerations on the Method. In this section, some relevant aspects of DC are discussed in order to make them clearer. To divide a renewal project PR into a set of repeatable sub projects Pi, the product to renew needs to be broken down into smaller and smaller components until they can be renewed independently. As in [5], a software Work Breakdown Structure consists of two hierarchies, one representing the software product and the other representing the activities needed to renew that product. This approach makes it possible to identify the sub process for renewing each component of the legacy system. The error induced by the effort estimation function directly impacts on the accuracy of staffing and project budgeting. Thus, the interpretation of ER strongly influences the effectiveness of project management. Using (1), ER is calculated as a relative value rather than an absolute one. For instance, ER = +0.3 means the predicted effort was 30% less than the actual value (underestimation), while ER = –0.3 means it was 30% more (overestimation). Both overestimation and underestimation can harm project management. Therefore, estimation accuracy is essential to reduce the risks of a project, and the acceptable error threshold depends on how critical the part of the project is compared to the whole. During project execution, if the variation in ER values is not statistically significant, then the estimation model uses parameters that cannot discern changes in the process variables. On the other hand, if the variance of ER is significant, then the parameters show that a process variable is changing for the better/worse, since the actual effort is less/greater than that expected according to the current estimation model. ER reveals changes in the process during project execution due to any worsening or improvement (made or occurred) of the process. In this sense, DC supports continuous process improvement. In both cases the estimation model should be re-calibrated in order to conform to the changes.
The econometric models are sensitive to the process's maturity: as the process matures, less calibration of the models will be needed. Before the estimation model is scrapped, it must fail (|ERi|>Tr) n times, so that the data points corresponding to the repetitions where the error occurs (Di, Di+1, … Di+n) become a learning set. The project manager must decide the cardinality of the learning set. Although there is no validated rule for
defining an appropriate cardinality, a guideline is to consider at least 15 data points per variable in the estimation function [44]. Until the data points making up the new learning set have been collected, the estimation will suffer from serious errors that negatively affect the project. The cardinality of the learning set derives from a compromise between the need to limit the prediction error and the need to ensure statistical significance of the data. If the cardinality of the learning set Lsk is based on the expected number n of estimation function parameters, the estimation function may turn out to have m>n parameters. In this case, to obtain a more reliable estimation model, it is best to extend the learning set to include 10 to 15 times m data points. For all the data points extending the learning set, if Mdk-1 were used, there would be a high risk of inaccurate estimation, because |ER| would probably be higher than the acceptable threshold for the project. To limit the damage, it is advisable to use Fck during the extension of Lsk; Fck can also be derived from the errors that Fek makes on the data points belonging to Lsk that were estimated by Fek-1. In this way, the estimation model obtained using the extended Lsk will be more accurate, in that it uses more data points to derive the new effort estimation function. The cardinality and the elements of M must be proportionate to the measurement costs. In particular, the metrics included in M must be: able to express the desired project quality factors; automatically measurable; interpretable by the developers. In other words, M must be specific to each project. If the usable estimation model metrics need to be reduced, principal component analysis can be performed. If M doesn't work well, the grain of PR, i.e. the process it is based on, and the associated metric plan M can be changed in order to achieve metrics more meaningful for the process variables.
The lists of parameters published in the literature could be used as reference, e.g. [6,7,5]. The estimation function must be expressed in an easily interpretable fashion to ensure that its results help understand the process and the causes of any deficiencies. Furthermore, complicated methods do not necessarily produce more accurate estimations [20]. It is best to choose one of the following schemes, listed in order of preference: a) Linear: Ee = b + a*m; b) Multi-linear with many parameters: Ee = b + a1*m1 + a2*m2 + … + an*mn; c) Non linear: Ee = b + m^a. Here b expresses the part of the effort that is not sensitive to the predictive variable(s): in the project process there are some activities or parts that do not depend on the characteristics of the input or output products measured by the metric plan, and it is relatively easy to identify these activities if the process is well described. The coefficients {a1,a2,…,an} represent the contributions to the effort of their respective parameters; the absolute value of ai expresses the effect mi has on the effort. Many non linear functions can be used and there is
ample description of them in the literature. One of the best known is the one used in [5]. The DC method proposed does not exclude other techniques for deriving estimation functions, but it recommends multiple forward stepwise regression. This procedure adds one independent variable at a time (the most significant) to a linear model. The effect of the added variable is then removed by transforming the dependent variable into a residual. Next, the impact of each remaining independent variable on the residual is assessed, to identify the next variable to include in the model. The analysis is repeated until all significant variables are found. As non linear functions are difficult to interpret, their use is advised only when linear regression does not provide a sufficiently accurate estimate. The estimation corrective function is derived with the same statistical analysis procedure as the effort estimation function. If even non linear functions fail to yield satisfactory results, then the metrics available do not include the main factors affecting the effort for executing the process [3].
4. The Empirical Study The legacy system involved in the DC validation is an aged banking system. In order to execute the renewal project PR, a specifically designed renewal process P was set up. It is made up of two sub processes: reverse engineering (PRE) and restoration (PREST). Greater detail on these two types of processes can be found in [21,22]. The activities involved in the renewal process P are detailed in Table 1. At the start of the project, the two types of processes had not been defined; they were then rigorously characterized according to their conceptual meaning: reverse engineering aims to rebuild the system documentation without altering the structure of the programs; restoration aims to restructure the system taking into account the semantics of the restructured data and procedures, but without altering its architecture. After distinguishing the activities, it was evident that activity 1 is preliminary and must be carried out whether one or both processes are to be executed; the other activities were assigned to the processes as follows: activities 2,3,4,5,6,9,10,11,12 to PRE and 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17 to PREST. Activities 2,3,4,5,6,7 treat data and are outside the scope of this work, so the corresponding metrics are discarded here. As concerns the DC approach, a program can be seen as a subproject/project chunk PRi∈PR. Each PRi is then based upon PRE or PREST (from here on, RE and REST for clarity). During project execution, the process to apply to each program must be decided according to the desired quality improvement. If the history of a program shows a low frequency of changes, for instance, then it does not need much quality improvement: its contribution to system maintainability will be low and it should only be subjected to reverse engineering. On the other hand, if the data show a high number of changes
and maintenance requests are always demanding, unreliable and time-consuming, then the program is a candidate for restoration. A total of 289 programs were subjected to RE and 349 to REST. Renewal of the whole legacy system was carried out by a team of four developers, whose composition changed according to the tasks. All the developers involved in the renewal process had comparable skills and knowledge of the process model, techniques and tools used. Initially, the application domain was familiar only to the project manager; it was then acquired by the developers thanks to the repeated execution of the renewal process on various programs in the system. The analyzed data were collected daily by the developers and checked weekly. Data accuracy was monitored during the project because the data were also used to check the quality improvements made and their costs. The following data were collected for each program and represent M in the DC approach: • NLOC, number of lines of code in the procedure division, excluding comment lines and blank spaces; if a statement extends over several lines of listing, it is considered a single instruction; • NLOCD, number of lines of code in the data division, calculated with the same criteria as NLOC; • McCABE, cyclomatic complexity, that is, the number of decision points in a program minus one [45]; • HALSTEAD complexity [45]; • NMODI, number of internal modules, i.e. sections and paragraphs, plus the external modules before the renewal process; • NMOD, number of external modules after the renewal process; • McCABE-AV, mean cyclomatic complexity of the remaining modules after the renewal process; • COMPL-GA, gain in complexity per program, derived from (McCABE – McCABE-AV); • EFFORT spent on the renewal process, in man-hours; • ID-DEVELOPER, the developer's ID.
NMOD and McCABE-AV concern the REST process; in RE, no modules were extracted and the program structure remained unchanged after process execution. These factors cannot be measured on the program input to REST; their values were instead established by the project manager at the beginning and used as anchor factors. The project manager decided that the restoration of a program could be stopped once it reached the assigned values for these two metrics. Note that the data collection and the metric plan definition occurred some years ago, for purposes that differ from those of this work. We use these data for the approach validation because they are available, although DC can potentially work with any other data.
4.1 Data Analysis A first useful analysis is the trend of the developers’ performance in reverse engineering and restoration, shown in Figure 1 and Figure 2 respectively. In both figures, the x-axis refers to the time sequence the programs were subject to either RE or REST. The y-axis reports the performances expressed in LOC/hour for each renewed program. The time sequence depends on how critical the programs were to the legacy system execution; each
program to be renewed was subjected to RE or REST according to the project manager's decision. Thus, the trend shown in the graphs occurred in the same period for both types of process and was due to the continuous improvement of each. The programs subjected to REST are numbered from 290 onwards only so that the program number makes clear which renewal process was used. All the activities differed from those normally carried out by the organization executing the renewal to build and maintain software systems. Moreover, there are no previous experiences of these activities reported in the literature, since the other renewal processes described had different aims and activities. Thus, the two processes are innovative and, to control the project risks, they required substantial improvements. The macroscopic improvements made were: 1. From the start of the project and during T1, it was necessary to improve the characterization of the two processes and refine the decision model used to determine which process to carry out for each program. This resulted in a remarkable difference in process performance for both processes; 2. In T2, the tools for the two processes were adapted; in some cases, simple tools that were not available on the market had to be built; 3. In T3, improvements had to be made in the formalization of the reading procedures used to check the programs transformed during and after process execution. The operations carried out on the processes made them unstable. As can be seen in the graphs in Figure 1 and Figure 2, the process performances vary widely during project execution. Thus, any prediction model would be inaccurate if used for the whole project, and so dynamic calibration of the model is necessary during the project.
4.1.1 Application of Dynamic Calibration
Since no reference models were available, as baseline we considered the models used in the real project. Based on the limited experience acquired in pilot projects carried out before starting the project, the models adopted were: for RE, PERF = 400 NLOC/h; for REST, PERF = 200 NLOC/h. This yielded the baseline effort estimation functions, in man-hours: for RE, Fe0 = NLOC/400; for REST, Fe0 = NLOC/200. In both projects, each repetition corresponds to executing the relative process on one program. In the case study there were 289 repetitions of RE and 349 of REST. Each repetition contributes one data point to the statistical data base. In both projects, the acceptable value of |ER| was 0.1. In the following, although the correlation analysis between the variables used (NLOC, NMOD, HALSTEAD, McCABE, NLOCD, COMPL_GA) is not described for space reasons, it produces coefficients varying between −0.38 and 0.48, except for COMPL_GA and McCABE, where it is 0.97.
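The recalibration trigger behind these baseline functions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the example values are our assumptions. The baseline function Fe0 estimates the effort of each incoming program, the magnitude of the relative error |ER| is computed against the actual effort, and when it exceeds the acceptable value of 0.1 the model is flagged for recalibration.

```python
def estimate_effort_re(nloc):
    """Baseline RE estimation function Fe0 = NLOC / 400 (man-hours)."""
    return nloc / 400.0

def relative_error(actual, estimated):
    """Magnitude of the relative error |ER| = |actual - estimated| / actual."""
    return abs(actual - estimated) / actual

def needs_recalibration(actual_effort, nloc, threshold=0.1):
    """Flag the model for recalibration when |ER| exceeds the acceptable value."""
    return relative_error(actual_effort, estimate_effort_re(nloc)) > threshold

# Example (invented values): a 4000-NLOC program estimated at 10 man-hours,
# actually requiring 14: |ER| = |14 - 10| / 14 ~= 0.29 > 0.1
print(needs_recalibration(14.0, 4000))  # prints True
```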
PHASE
1. Inventory
2. Abstract data
3. Reconstruct logical level of data
4. Reconstruct application requirement
5. Reconstruct logical levels of programs
6. Improve logical model of programs
7. Test and debug

ACTIVITY
1. Identification of:
   1.1 Duplicate or obsolete sources
   1.2 Obsolete and useless reports
   1.3 Useless files
   1.4 Temporary files
   1.5 Permanent files
   1.6 Pathological files
2. Identifying meaning of data
3. Changing names of variables
4. Finding dead data
5. Finding redundant data
6. Classifying the data as:
   6.1 Conceptual
   6.2 Structural
   6.3 Control
7. Build dependency diagram with the structure of the data existing in the programs
8. Derive the expected functions in the programs by analyzing the reports and interviewing program users, maintainers and managers
9. Build structure chart using the Sections and Paragraphs
10. Find dead instructions
11. Eliminate sections and paragraphs called only by dead instructions
12. Assign meaning to previously extracted modules
13. Extract modules using procedural IFs
14. Find and eliminate variables now obsolete
15. Extract modules that implement the expected functions
16. Assign meaning to modules extracted previously
17. Equivalence test for programs with altered structure

DELIVERABLE
1. System sources and files to be renewed
2. Problems identified, to be solved with the renewal process
3. Data dictionary
4. Diagram of improved dependencies
5. Tables mapping old to new names of variables
6. Programs with new data names
7. Data classified by new data dictionary
8. Expected functions for deriving conceptual data calculated from raw data
9. Requirements for data reengineering
10. Dependency diagram
11. Expected functional requirements
12. User requirements still unsatisfied
13. Structure chart, without altering the structure of the programs
14. Documentation of modules in structure chart, using meaning assigned and instructions contained
15. Improved structure chart
16. Documentation of modules that appear in improved structure chart
17. Test plan
18. Test log
19. Software debugged

Table 1: Scheme of renewal process
Figure 1: Performance in RE

Figure 2: Performance of REST

Therefore all the variables used in the following can be considered independent. To apply DC, both projects require a first learning set from which to derive a more accurate estimation model. The learning sets consisted of the first 15 programs. Table 2 summarizes the results of the Stepwise Regression for the data points in RE. They show that NLOC is the variable with the highest correlation. The next most significant variable, NMOD, slightly increases the R-square and brings the p-level to borderline significance. Thus, this analysis shows that only NLOC should be used in the model. Table 3 shows the results of the regression analysis for EFFORT vs NLOC, which defines the estimation function that best predicts the effort in the data points of the learning set:

Fe1 = 0.56 + 0.004257*NLOC    (2)

The correlation between the errors that (2) makes in all the data points of the learning set and the other available metrics was computed (not shown here for space reasons) in order to find corrective functions among the metrics not used in the estimation function. The low correlation coefficients obtained show that no variable is a useful corrective factor. Table 8 shows the remaining models obtained by applying DC. For space reasons, only the application of DC in the REST project will be presented in detail. Table 4 shows the regression analysis on the first 15 data points of REST, from which the following estimation function is derived:

Fe1 = −1.09709 + 0.00568*NLOC + 0.1689*NMOD    (3)
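Estimation functions such as (2) and (3) are ordinary least-squares fits of EFFORT on the selected metrics over the learning set. A minimal sketch with numpy follows; the function name and the data values are invented for illustration, not taken from the paper's data set.

```python
import numpy as np

def fit_estimation_function(metrics, effort):
    """Least-squares fit of effort ~= b0 + b1*x1 + ... over the learning set.
    metrics: (n, k) array of predictor values (e.g. NLOC, NMOD);
    effort:  (n,) array of actual effort in man-hours.
    Returns the coefficient vector [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(effort)), metrics])
    coef, *_ = np.linalg.lstsq(X, effort, rcond=None)
    return coef

# Illustrative learning set of 15 programs: effort generated as
# 0.5 + 0.004*NLOC plus small noise, then recovered by the fit.
rng = np.random.default_rng(0)
nloc = rng.integers(500, 5000, size=15).astype(float)
effort = 0.5 + 0.004 * nloc + rng.normal(0.0, 0.1, size=15)
b0, b1 = fit_estimation_function(nloc.reshape(-1, 1), effort)
print(round(b0, 2), round(b1, 4))  # coefficients close to 0.5 and 0.004
```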
Corrective factors were searched for by a correlation analysis between the errors that (3) makes in the learning set and the other available metrics (not reported for space reasons). Results pointed out that there are no possible correction factors. Since the estimation function has two independent variables, it is best to extend the learning set with 15 more data points to obtain a more reliable function. It would not be enough to use the correction of the old estimation function already calculated on the learning set, because it makes very large errors that cannot be adjusted with any corrective parameters. During the estimation of the other 15 programs needed to complete the learning set for REST, function (3) must be used without any correction. With the new learning set, made up of 30 data points, the estimation function can then be recalculated. Table 5 shows the derived function, corresponding to the learning set 290–320. The estimation function has changed considerably, confirming the usefulness of extending the learning set.

For the last learning set in Table 5, two variables are present; the function had to be adjusted with the data points 514–543. In this case we used the procedure established in this approach for minimizing the estimation error during extension of the learning set. With the data points from 514 to 528, the original learning set, the following model was obtained:

Fe = −1.39673 + 0.00406*NLOC + 0.04234*NMOD    (5)

Function (5) shows that the learning set needed to be extended with another 15 data points, due to the presence of two independent variables. In the programs from 529 to 543 the effort was estimated with (5), and the result was adjusted with the error estimation function derived from the errors committed by (5) on the data points from 514 to 528. The resulting error model equation was:

Fc = −0.385852 + 0.003667*NMOD − 0.000061*COMPL_GA    (6)

A small error is obtained in this way. In fact, although it is not shown in the paper, by using formula (4) in Table 5 the error ranges between 4% and 47%, with 50% of the data points between 15% and 42% of error. By using formula (5) the error ranges between 0% and 13%, with 50% of the data points between 1% and 6% of error. Finally, by using the correction formula (6) the error ranges between 0% and 6%, with 50% of the data points between 1% and 4% of error. With this approach the MMRE [46] is reduced from 31% using formula (4), to 4% with (5), and to 3% using (6).

Table 5 shows that three recalibrations for each project were needed, matching the three major improvements made on the projects and discussed in the previous section. Thus, DC is effective in identifying process changes. It is interesting to investigate the error that would have resulted if a single estimation model had been used for the entire project. Figure 3 shows the box plots for the estimation error made using the baseline estimation function (ERRB) for the whole RE project; the error that would have been obtained for the same project if only the first recalibration according to DC had been performed (Err_bas_mod_1); the error using the first and then the second recalibration (Err_bas_mod_1_2); and the error using all three of them (Err_bas_mod_1_2_3), which expresses the performance of the complete DC application. For 50% of the repetitions the error goes from 32% in the real project to 3% with DC; the highest peak was 175% for the real project against 4% for the proposed approach. For clearness' sake, the corresponding MMRE [46] values for the four cases illustrated in Figure 3 are, respectively, 43%, 147%, 44% and 16%. We can conclude that the recalibration of the estimation model in the RE project improves the estimation accuracy.

Figure 4 shows box plots like the ones described in Figure 3, for the REST project. In this case, for 50% of the repetitions the error goes from 11% in the real project to 3% with the proposed approach; the highest peak was 49% for the real project against 20% for the proposed approach.

Figure 3: Box plots of the estimation error in the RE project using the different models (ERRB, Err_bas_mod_1, Err_bas_mod_1_2, Err_bas_mod_1_2_3)
For completeness, the MMRE values for the four cases illustrated in Figure 4 are, respectively, 17%, 57%, 18% and 8%. A primary aspect that Figure 3 and Figure 4 point out is that none of the estimation models used in this work would have been able to estimate the entire project with enough accuracy. The recalibration of the estimation model enacted by DC improved the estimation accuracy for the REST project as well as for the RE one.
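The MMRE figures quoted above are the mean magnitude of relative error over all repetitions; a minimal sketch (the function name is ours, the example values invented):

```python
def mmre(actuals, estimates):
    """Mean Magnitude of Relative Error: mean of |actual - estimate| / actual."""
    errors = [abs(a - e) / a for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)

# Example: three repetitions with 10%, 20% and 30% relative error
print(round(mmre([10.0, 10.0, 10.0], [9.0, 12.0, 13.0]), 2))  # prints 0.2
```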
5. Dynamic Calibration vs Analogy

An analytic comparison with the approaches previously cited in related works is not possible, due to the deep differences between them. Nevertheless, the percentage errors DC would have made on the two industrial projects are far smaller than those reported in the literature for the other approaches. A possible comparison can be made with the approach by analogy which, as pointed out in the previous sections, has many similarities with DC and is therefore a possible competitor. The comparison is done under a specific hypothesis: each repetition of the sub-process in DC is considered a project and, therefore, a data point in the approach by analogy.
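Under the hypothesis above, estimation by analogy retrieves, for each new repetition, the most similar past data points and adapts their effort. A minimal nearest-neighbour sketch follows; the distance on NLOC only, the choice of k, and all names and values are our assumptions, not a specific analogy tool.

```python
def estimate_by_analogy(history, nloc_new, k=2):
    """Estimate effort as the mean effort of the k past repetitions
    whose NLOC is closest to the new program's NLOC.
    history: list of (nloc, effort) pairs already executed."""
    nearest = sorted(history, key=lambda p: abs(p[0] - nloc_new))[:k]
    return sum(effort for _, effort in nearest) / len(nearest)

# Invented history of four completed repetitions (NLOC, man-hours)
history = [(1000, 4.5), (2000, 8.6), (3000, 13.1), (4000, 17.0)]
# Mean effort of the two programs closest in size to 2500 NLOC
print(round(estimate_by_analogy(history, 2500), 2))
```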
Summary of Stepwise Regression; Dependent Variable: EFFORT

Step  Variable (+in/-out)  Multiple R  Multiple R-square  R-square change  F-to-entr/rem  p-level
1     NLOC                 0.992802    0.985656           0.985656         893.2984       0.000000
2     NMOD                 0.994998    0.990021           0.004365         5.2492         0.047688
3     HALSTEAD             0.995911    0.991839           0.001818         2.4502         0.151951
4     McCABE               0.996917    0.993843           0.002004         3.2558         0.104660
5     NLOCD                0.998037    0.996077           0.002234         5.1258         0.049848

Table 2: Stepwise Regression for RE (Learning Set 1-15)

Regression Summary for Dependent Variable: EFFORT; R=.99280206 R²=.98565594 Adjusted R²=.98455255 F(1,13)=893.30 p
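The summary quantities reported for Table 3 (R², adjusted R², F) come from an ordinary least-squares fit over the learning set. A small sketch of how R² and adjusted R² are computed from the residuals; the function name and data are invented for illustration.

```python
import numpy as np

def regression_summary(x, y):
    """R-squared and adjusted R-squared for a simple linear fit y ~= b0 + b1*x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)                 # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    n, k = len(y), 1                              # n points, one predictor
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2

# Nearly linear invented data: R-squared should be very high
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 5.0])
r2, adj = regression_summary(x, y)
print(r2 > 0.99, adj <= r2)  # prints True True
```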