Effort Estimation for Program Comprehension

P. Fiore (*), F. Lanubile (**) and G. Visaggio (**)

(*) Basica spa, Via G. Amendola 162/1, 70126 Bari, Italy. Tel. 39-80-5490111, Fax 39-80-5490292.
(**) Dipartimento di Informatica, University of Bari, Via Orabona 4, 70126 Bari, Italy. E-mail: [email protected]

Workshop on Program Comprehension - Berlin, March 1996

Abstract (1)

This study presents an experience of deriving an econometric model for software comprehension, a process necessary for the renovation of existing software systems. The model uses data on the process and its products. The aim of the econometric model is to minimize the risks run when forecasting the budget and time needed to carry out the project. After the basic model was obtained, correction factors which could reduce the risk of forecast error were considered. In particular, one factor, the suitability of the tool for the process, requires such a marked correction that the basic model is split into two, one for each type of process (automatic or semiautomatic). All the models have the lines of code of the existing program as independent variable; the model for semiautomatic processes uses the number of modules to be extracted to control the risk of forecasting error.

1. Introduction

Many different techniques, methods and processes for understanding programs have been described in the literature; some examples are [Ant94, Can94, Mar94]. Few of these, however, have undergone any real experimentation in the field, and it is difficult to find econometric models, based on measurable variables of programs, which can forecast the effort necessary for carrying out these processes. Since projects for renewing existing software represent notable investments in terms of human and financial resources, a preventive estimation model of the effort required is essential to verify at least the economic feasibility of the renewal process and, above all, to reduce the risk involved in defining the budget and time necessary for carrying out the process. The authors have already performed some quantitative analyses of their renewal process on the basis of data collected during experimentation in the field [Vis95, Vis94a]. These analyses showed that the effort required for the restoration process is influenced by many factors which constitute the variables of the process, such as the adequacy and capability of the tools used, the reverse engineer's experience, and the available knowledge of the application domain. In particular, in [Vis95] the hypothesis is made that the effort necessary for renewal of a program can be anchored to the depth of restoration. In this study, the authors analyze the quantitative data in greater detail with the aim of defining a model for forecasting the effort required to carry out the process.

(1) This work has been partially supported by MURST 40% national funds under the project "V&V in software engineering".



To explain how the effort depends on the quality of the software produced, they have built a model for correcting forecasting errors. In the forecasting model, the independent variable used is the number of LOCs of the program to be restored. This variable can be measured a priori and therefore makes effective forecasting of the effort required to carry out the process possible. The process involved in the experiment is described in section 2 and the experimentation environment in section 3. Section 4 illustrates the basic econometric model; in section 5, the factors for correcting the basic model are discussed and the models for the semiautomatic (section 5.1) and fully automatic (section 5.2) processes are derived. Section 6 illustrates a verification of the model and section 7 the related works. Section 8 presents the conclusions.

2. THE PROCESS MODEL

The process model used in the experiment is described in detail in [Vis94a] and is only briefly outlined below. It is divided into two macrophases: Reverse Engineering and Restoration. They result in a logical model of the existing software, composed of structure charts specifying the modules, data dependence models, a data dictionary, and tables and routes for navigating the database.

The Reverse Engineering phase involves building the documentation of the logical model starting from the program code. This phase is largely automatic. The logical model derived from the reverse engineering process does not improve the level of readability, or comprehension, of the programs; hence, when they are difficult to read, restoration must be carried out to improve the comprehensibility of the logical model. Restoration aims to:
- classify data into the following classes: a) application domain data, which are the attributes of recognizable entities in the application domain; b) control data, which have no correspondence with the application domain but are used to record the occurrence of an event; c) structural data, which are needed to manage the organization of databases or files;
- rename variables, making their identifiers more meaningful;
- extract modules with high internal cohesion from those with low cohesion and isolate them in the structure;
- externalize modules which, in the present process, are in line with the main;
- localize variables declared as global but used locally, both in existing modules and in modules extracted during restoration;
- extract modules with a higher internal consistency level from the modules and procedures of existing programs.

Execution of this phase requires the intervention of an expert in the application domain to which the programs to be restored belong. He helps to classify the data and define the names of the variables and, above all, points out the expected business functions which should be found in a program and then extracted to form modules with high internal cohesion. In our process model, great use is made of slicing techniques in this phase. Many of these techniques are well supported by the tools used but others, such as direct and transform slicing, are not. Human intervention is therefore required to decide which are the most suitable techniques for extracting the modules with the highest cohesion, to perform the operations surrogating the slicing extraction algorithms missing in the tools (with the help of the services provided by the extraction tools), and to verify that the result of these operations is correct. Each phase includes validation and equivalence tests. Validation involves verifying that the documentation produced is coherent with the behavior and meaning of the components of the existing software system. The equivalence test ensures that the restored software behaves in the same way as the working software did.

3. THE EXPERIMENTATION ENVIRONMENT

The sample is composed of a subset of 26 programs belonging to a banking application written in COBOL; further details are given in [Vis94a]. Approximately 31,981 LOCs (instructions in the procedure division, excluding comments) are present, and the overall effort for reverse engineering and restoration required about 170 man-hours. The code is divided into programs; by program, we mean a code unit which is executed autonomously during the operation of the software system. The measurements of the metrics concerning the code subjected to reverse engineering were executed automatically with commercially available tools (the Viasoft ESW toolset). The process data, on the other hand, were captured manually by the workers by means of the Workday Report Form (WRF) shown in Figure 1. The forms are filled in daily by each subject.


WORK DAY REPORT FORM
Header fields: PROJECT NAME, DATE
Entry fields: Phase code, Activity, Hours, Status, INPUT object analyzed, OUTPUT object produced

PHASES:
1. Inventory software system
2. Reconstruct logical level of data
3. Abstract data
4. Reconstruct functional requirements
5. Reconstruct logical level of programs
6. Restore logical model
7. Test and debug
8. Produce documentation

STATUS:
E  Execution
W  Wait: W1 wait for automated task execution; W2 wait for the completion of other activities; W3 communication problems; W4 hardware problems; W5 software problems; W6 resource problems

Figure 1 - The sample work day report form

As each worker is generally only assigned to one program, he has few daily activities, so little time is required for filling in the form and the data provided are therefore reasonably complete and accurate. The people working during the experiment already had experience of data collection with the WRF, and we can assume they filled the forms in correctly. In any case, the forms are submitted to project monitoring and checked for consistency, completeness and comprehensibility. If a form does not pass this validation, its compiler is interviewed to correct the data concerning him. The effort relating to the programs can be extracted from the WRF and the effort concerning the data excluded. Further information on this point can be found in [Vis95, Wol93, Vot94].
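To make the collected data concrete, the following sketch models one WRF entry as a small record type. This is only an illustration: the phase and status codes are taken from Figure 1, but the class and field names (WRFEntry, is_consistent, and so on) are our own and not part of the original work.

```python
from dataclasses import dataclass

# Phase and status codes as listed on the form in Figure 1.
PHASES = {
    1: "Inventory software system",
    2: "Reconstruct logical level of data",
    3: "Abstract data",
    4: "Reconstruct functional requirements",
    5: "Reconstruct logical level of programs",
    6: "Restore logical model",
    7: "Test and debug",
    8: "Produce documentation",
}
WAIT_CODES = {
    "W1": "Wait for automated task execution",
    "W2": "Wait for the completion of other activities",
    "W3": "Communication problems",
    "W4": "Hardware problems",
    "W5": "Software problems",
    "W6": "Resource problems",
}

@dataclass
class WRFEntry:
    """One row of a Workday Report Form (field names are ours)."""
    project: str
    date: str
    phase_code: int    # key of PHASES
    hours: float       # effort spent on the activity
    status: str        # "E" (execution) or one of the wait codes
    input_object: str  # object analyzed
    output_object: str # object produced

    def is_consistent(self) -> bool:
        # The kind of check applied when forms are validated by project monitoring.
        return (self.phase_code in PHASES
                and (self.status == "E" or self.status in WAIT_CODES)
                and self.hours >= 0)
```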


The measurements used in the experiment were:
- LocsProcDiv: lines of code in the Procedure Division of a COBOL program;
- LocsDataDiv: lines of code in the Data Division of a COBOL program;
- NumSections: number of program blocks identified by labels in a COBOL program;
- McCabe: the McCabe complexity metric computed for the original programs;
- Halstead: Halstead's volume metric computed for the original programs;
- NumExtractedMods: number of modules obtained after restoring a program;
- AvgMcCabe: the average of the McCabe complexity values of the modules obtained after restoration;
- Hours: hours required for carrying out the reverse engineering or restoration phase.

Our aim was to characterize the effort, expressed in man-hours, as a function of the other variables [Con86]. Unfortunately, only part of the programs subjected to the process could be subjected to all the measurements: 29, to be precise, of which 26 were used to derive the econometric models and the other three to verify the final model. The measurements for the entire sample are reported in Table 1, together with the PRF values, which indicate the level of performance reached by the process, calculated as the LocsProcDiv/Hours ratio.


Table 1 - Database used for deriving the econometric models

Program | LocsProcDiv | LocsDataDiv | McCabe | Halstead | NumSections | NumExtractedMods | AvgMcCabe | Hours | PRF
Prog-01 | 517 | 6000 | 83 | 23940 | 17 | 20 | 8.700 | 4.5 | 114.89
Prog-02 | 749 | 5247 | 197 | 43060 | 21 | 8 | 23.000 | 4.5 | 166.44
Prog-03 | 1465 | 9355 | 234 | 83667 | 43 | 25 | 18.800 | 11.5 | 127.39
Prog-04 | 387 | 8417 | 72 | 14949 | 14 | 9 | 25.560 | 2.75 | 140.73
Prog-05 | 438 | 6325 | 72 | 22300 | 14 | 14 | 11.500 | 3.5 | 125.14
Prog-06 | 1090 | 8992 | 145 | 54373 | 26 | 10 | 29.740 | 6.75 | 161.48
Prog-07 | 228 | 1004 | 46 | 10701 | 13 | 7 | 26.667 | 1.5 | 152.00
Prog-08 | 386 | 5873 | 83 | 17208 | 7 | 8 | 29.532 | 2.5 | 154.40
Prog-09 | 438 | 9496 | 67 | 19610 | 21 | 23 | 5.545 | 4.75 | 92.21
Prog-10 | 682 | 6137 | 95 | 32360 | 27 | 28 | 4.167 | 7.75 | 88.00
Prog-11 | 122 | 4910 | 21 | 4320 | 8 | 4 | 28.500 | 1 | 122.00
Prog-12 | 663 | 8441 | 89 | 28430 | 11 | 17 | 11.667 | 5 | 132.60
Prog-13 | 1711 | 9196 | 243 | 92697 | 57 | 19 | 13.167 | 8 | 213.88
Prog-14 | 1827 | 7214 | 1365 | 99077 | 39 | 20 | 23.350 | 7.5 | 243.60
Prog-15 | 3024 | 10743 | 890 | 145381 | 174 | 45 | 24.942 | 12.75 | 237.18
Prog-16 | 2307 | 8219 | 495 | 108306 | 40 | 26 | 26.792 | 10 | 230.70
Prog-17 | 2413 | 9596 | 3342 | 137196 | 39 | 15 | 20.800 | 11.5 | 209.83
Prog-18 | 3622 | 9801 | 598 | 179484 | 146 | 43 | 16.953 | 16 | 226.38
Prog-19 | 713 | 6103 | 226 | 35520 | 29 | 5 | 24.800 | 4.25 | 167.76
Prog-20 | 1454 | 8280 | 235 | 67562 | 26 | 32 | 14.650 | 6 | 242.33
Prog-21 | 933 | 2838 | 130 | 41990 | 37 | 10 | 19.400 | 4 | 233.25
Prog-22 | 917 | 6976 | 145 | 44480 | 34 | 16 | 10.375 | 5 | 183.40
Prog-23 | 858 | 9555 | 101 | 38600 | 28 | 22 | 9.560 | 4.25 | 201.88
Prog-24 | 683 | 5399 | 169 | 31560 | 24 | 2 | 51.000 | 3.75 | 182.13
Prog-25 | 470 | 6849 | 68 | 18423 | 29 | 9 | 12.980 | 2 | 235.00
Prog-26 | 3794 | 9764 | 596 | 187404 | 149 | 29 | 22.241 | 17.5 | 216.80
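As a quick consistency check on Table 1, the fragment below recomputes the PRF column as the LocsProcDiv/Hours ratio for a few rows; a minimal sketch, with the values copied from the table.

```python
# PRF = LocsProcDiv / Hours, recomputed for three rows of Table 1.
rows = {
    "Prog-01": (517, 4.5, 114.89),
    "Prog-13": (1711, 8, 213.88),
    "Prog-26": (3794, 17.5, 216.80),
}
for name, (locs_proc_div, hours, prf_reported) in rows.items():
    prf = locs_proc_div / hours
    # Agreement to within the table's two-decimal rounding.
    assert abs(prf - prf_reported) < 0.005, name
```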

4. BASIC ECONOMETRIC MODEL

The independent variables considered for the econometric models correspond to the observables which can be measured on the programs in input to the process: LocsProcDiv, LocsDataDiv, McCabe, Halstead, NumSections. To simplify interpretation of the models, it was decided to use only simple linear regression in this first phase of the experiment [Mak83, Lap83]. The following statistical indicators were used to validate the regression models:

Sample coefficient of determination: R2 = Σ(Hs - Hm)^2 / Σ(H - Hm)^2

Standard Error of the Estimate: SEE = sqrt(Σ(H - Hs)^2 / (N - k - 1))

Mean Magnitude of Relative Error: MRE = (1/N) Σ |(H - Hs) / H|

where Hm is the mean of the observed values, Hs is the value estimated by the regression, N is the number of observations and k is the number of independent variables. R2 lies between 0 and 1: the nearer R2 is to 1, the better the regression model. A good regression is also indicated by small SEE values. Finally, MRE represents the mean relative error made when computing the values by regression. To verify the reliability of the regression models, we considered an F-test of statistical significance:

F = (R2 / k) / [(1 - R2) / (N - k - 1)] with (k, N - k - 1) degrees of freedom


F expresses the ratio between the sample variability explained by the model and the variability left unexplained, so a higher value of F indicates a better regression model. Table 2 shows the correlation between Hours and the independent variables.

Table 2 - Correlation between Hours and other independent variables

Variable | Correlation coeff.
LocsProcDiv | 0.95
LocsDataDiv | 0.65
McCabe | 0.48
Halstead | 0.96
NumSections | 0.83
NumExtractedMods | 0.76
AvgMcCabe | -0.05

The results of performing simple linear regression on the variables most highly correlated with Hours are reported in Table 3.

Table 3 - Simple linear regression models

Variable | R2 | SEE | MRE
LocsProcDiv | 0.903 | 1.385 | 0.226
Halstead | 0.912 | 1.315 | 0.215
NumSections | 0.696 | 2.443 | 0.463
NumExtractedMods | 0.582 | 2.867 | 0.448

The table shows that the best results are obtained using Halstead and LocsProcDiv. We chose LocsProcDiv because this measurement is both easier to derive and more meaningful for the programmer, while its quality parameters are only slightly worse than Halstead's. The value of F is very high for all the independent variables, so much so that the significance associated with F is always practically 100%. The regression model using LocsProcDiv is:

H = 1.567 + 0.004 x LocsProcDiv (1)

The constant term can be read as the effort required to decide on the slicing techniques to be used; it can be seen to be relatively high. The coefficient of the independent variable, on the other hand, expresses the time the tool takes to extract the instructions.
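To illustrate how the indicators defined above are used to validate model (1), the following sketch fits a simple linear regression and computes R2, SEE, MRE and F from paired observations. The routine assumes nothing beyond the formulas of this section; the function name is ours.

```python
import math

def fit_and_validate(x, y):
    """Simple linear regression y = a + b*x, validated with the
    indicators of section 4 (R2, SEE, MRE and the F statistic)."""
    n, k = len(x), 1                             # k = number of independent variables
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    hs = [a + b * xi for xi in x]                # Hs: values estimated by the regression
    ss_res = sum((yi - hi) ** 2 for yi, hi in zip(y, hs))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot                   # equals sum((Hs-Hm)^2)/sum((H-Hm)^2) here
    see = math.sqrt(ss_res / (n - k - 1))        # standard error of the estimate
    mre = sum(abs((yi - hi) / yi) for yi, hi in zip(y, hs)) / n
    f = (r2 / k) / ((1.0 - r2) / (n - k - 1))    # F with (k, n-k-1) degrees of freedom
    return {"a": a, "b": b, "R2": r2, "SEE": see, "MRE": mre, "F": f}
```

Fitting Hours on LocsProcDiv over the 26 programs of Table 1 should reproduce approximately the coefficients of model (1) and the first row of Table 3.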

5. CORRECTIVE FACTORS

Since the mean error obtained by the basic model is very high, it is necessary to take into consideration corrective factors which could improve the forecast. Normally, these factors express the process variables which characterize the process. In this case, many of the process variables are constant across the sample and are therefore already contained in the basic model. In fact, the sample in question is composed of programs of the same software system, to each of which the same process is applied to obtain the logical model. Hence, with reference to [Bai81] and without going into details:
- the factors which express knowledge of the application domain are constant, since the software system is already working, so that its application content is homogeneously known by all the participants in the project;
- the requirements and specifications of the application do not affect the necessary effort, since they are embodied in the working software and have not changed;
- the environments for developing and using the system have been tried and tested and are known not to present any problem during execution of the process;
- the programmers' experience has an influence at the beginning, as the authors observed in [Vis94a], but this evens out during execution of the process on many programs, so that the lower productivity on the earlier programs is compensated for on the later ones.

The process itself must receive particular attention, however. In Table 4, taken from [Vis95], the percentage of activities performed other than those provided for in the description of the process is seen to be very low.

Table 4 - Distribution of effort per activity

Activity | Effort
Utility production | 1.2%
Project management | 1.0%
Waiting status | 1.0%
System support | 0.6%
Activities provided for by the process model | 96.2%

Thus, characterization of the sample elements must be based on the requirements of the main activities in the process model. The first characterization pointed out in [Vis95] is the suitability of the tools used. This factor cannot be expressed by any measured variable, but it can be taken into account by dividing the set of programs into two: the first 12 programs required the techniques of direct slicing or transform slicing [Vis93a], and therefore a great deal of human intervention, since the tools were unable to extract these slices.


The other programs did not need these slicing techniques and their restoration was therefore fully automatic. The two sets thus obtained have statistically different performance. In fact, calculating the mean performance associated with each set, we can test the null hypothesis H0: PERF_I = PERF_II; applying the t-test at the 99% significance level, this hypothesis is rejected (see the sketch below). The two sets are statistically different and are therefore treated separately: a Semiautomatic Restoration model is derived for the first set and an Automatic Restoration model for the second. Another form of characterization is suggested in [Vis95], which states that the effort necessary for carrying out the process can be anchored to the depth of the restoration. In our case, this variable can be expressed by the number of modules extracted and their mean complexity (McCabe). These factors are used in the correction of the two models.
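The comparison between the two groups can be reproduced with a standard two-sample t-test. The sketch below assumes, following the text, that group I consists of the first 12 programs of Table 1 (those requiring direct or transform slicing) and group II of the remaining 14, and that performance is measured by the PRF column.

```python
from statistics import mean, stdev

# PRF values from Table 1. Group I: the first 12 programs (semiautomatic);
# group II: the remaining 14 (automatic).
prf_1 = [114.89, 166.44, 127.39, 140.73, 125.14, 161.48,
         152.00, 154.40, 92.21, 88.00, 122.00, 132.60]
prf_2 = [213.88, 243.60, 237.18, 230.70, 209.83, 226.38, 167.76,
         242.33, 233.25, 183.40, 201.88, 182.13, 235.00, 216.80]

# Two-sample t statistic with pooled variance, testing H0: PERF_I = PERF_II.
n1, n2 = len(prf_1), len(prf_2)
sp2 = ((n1 - 1) * stdev(prf_1) ** 2 + (n2 - 1) * stdev(prf_2) ** 2) / (n1 + n2 - 2)
t = (mean(prf_1) - mean(prf_2)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
print(f"t = {t:.1f} with {n1 + n2 - 2} degrees of freedom")
# |t| is far beyond the two-tailed 99% critical value (about 2.8 for 24 d.o.f.),
# so the null hypothesis of equal mean performance is rejected.
```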


5.1 MODEL FOR SEMIAUTOMATIC RESTORATION

The correlation coefficients (r) and the determination coefficients (R2) with respect to Hours are shown in Table 5, for all the independent variables, over all the programs in the first group.

Table 5 - Correlation with Hours

Variable | r | R2
LocsProcDiv | 0.93 | 0.87
LocsDataDiv | 0.59 | 0.34
McCabe | 0.79 | 0.64
Halstead | 0.91 | 0.83
NumSections | 0.93 | 0.87
NumExtractedMods | 0.75 | 0.57
AvgMcCabe | -0.36 | 0.13

It can be seen from this table that LocsProcDiv has the highest correlation. Using this independent variable, the regression line is found to be:

Hours = 0.331 + 0.007 x LocsProcDiv (2)

The significance of the coefficients can be read as for the basic model. The constant term is relatively small in this regression line because human intervention occurs less at the decision level and more at the operative level. In fact, the engineer has only to decide between two slicing techniques, and he is aided in this decision by the goal he wishes to reach: to extract business function modules he must use transform slicing, otherwise direct slicing. During operation, however, the operator must continually check, by means of the tool used, that the slice extracted contains all and only the desired module. This contribution is expressed by the relatively high value of the LocsProcDiv coefficient. The values of the variables for verifying this regression are: R2 = 0.8639; MRE = 0.183; SEE = 1.1306. The mean error is still too high, approximately 18%, so the other corrective factors should be used. Table 6 shows the values of r and R2 with respect to the error for some corrective variables. A high correlation can be observed between the error and the variables NumExtractedMods and AvgMcCabe.


Table 6 - Correlation with the error

Variable | r | R2
NumExtractedMods | 0.93 | 0.86
AvgMcCabe | -0.91 | 0.83

Furthermore, the correlation coefficient between these two variables is -0.843; such a high value suggests that they should not both be treated as independent variables. A regression line for forecasting the error can thus be obtained:

ERR = 0.025 x NumExtractedMods - 0.419 (3)

The coefficient of the variable is high and strongly affects the possible error in estimating the effort necessary to carry out the process. Using ERR, the forecast can be adjusted with the following formula:

Hoursadj = Hours / (1 - ERR) (4)

With Hoursadj we can derive the corrected error shown in Figure 2, which can be seen to be considerably reduced: MRE = 0.067 for the estimated hours.

[Figure 2 - Representation of the error of the estimate: percentage error (x 1E-02) of the initial estimate and of the estimate with correction, for programs RS-01 through RS-12.]
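Formulas (2), (3) and (4) combine into the corrected semiautomatic forecast; a minimal sketch, with our own function name:

```python
def semiautomatic_forecast(locs_proc_div, num_extracted_mods):
    """Corrected effort forecast (hours) for semiautomatic restoration."""
    hours = 0.331 + 0.007 * locs_proc_div     # formula (2): initial forecast
    err = 0.025 * num_extracted_mods - 0.419  # formula (3): expected relative error
    return hours / (1 - err)                  # formula (4): adjusted forecast
```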

5.2 MODEL FOR AUTOMATIC RESTORATION

Table 7 shows the correlation coefficient (r) between Hours and the other variables, for the second group.

Table 7 - Correlation between Hours and other variables

Variable | Correlation coefficient
LocsProcDiv | 0.99
LocsDataDiv | 0.69
McCabe | 0.41
Halstead | 0.99
NumSections | 0.84
NumExtractedMods | 0.73
AvgMcCabe | 0.01

Again LocsProcDiv shows the highest correlation. The regression line corresponding to this variable is:

Hours = 0.423 + 0.004 x LocsProcDiv (5)

As in the preceding model, the constant expresses the decision effort, which is high because the operator must choose the most suitable slicing technique from among all those offered by the tool (transaction slicing, computational slicing, perform range, etc.). He has no doubts as to the decision other than the need to inspect the slice resulting from application of the chosen technique. Furthermore, the coefficient of LocsProcDiv is relatively small because in this case the effort is almost entirely spent by the tool, which runs through the program extracting the instructions belonging to the required slice. For this model, the validation variables have the values: R2 = 0.9859; MRE = 0.087; SEE = 0.598. The regression line is therefore good and the mean error small (approximately 8.7%), so it is pointless to adjust the error by introducing other factors, particularly bearing in mind that the correlation between the error and the other factors is too small, as shown in Table 8.

Table 8 - Correlation with the error

Variable | Correlation coefficient
LocsProcDiv | 0.034
LocsDataDiv | 0.092
McCabe | 0.089
Halstead | 0.064
NumSections | 0.006
NumExtractedMods | -0.204
AvgMcCabe | 0.0265

This also corresponds to reality: in these programs, the number of modules extracted and their complexity are the result of the work of the tool, not of the engineer.


6. CHECKING THE MODEL

To use the system of econometric equations obtained in this work, the following procedure should be applied (a code sketch follows Table 9):
- for each program to be restored, decide which technique to use on the basis of the knowledge to be acquired and of the initial level of quality;
- if transform or direct slicing techniques are to be used, then: use formula (2), after having counted the LOCs, to obtain the forecast effort; calculate the expected error using formula (3), after having decided how many modules to extract; and adjust the forecast using formula (4);
- if other slicing techniques are to be used, apply formula (5), using the number of LOCs in the program to be subjected to the process.

To check this procedure and the working of the econometric system, we tried it out on three programs in addition to the sample used previously; these three programs were subjected to semiautomatic restoration. Their characteristics are reported in Table 9.

Table 9 - Characteristics of the programs used to check the model

Program | LocsProcDiv | NumExtractedMods | Actual hours
Prg-1 | 370 | 11 | 2.75
Prg-2 | 941 | 18 | 7
Prg-3 | 1350 | 22 | 12
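The whole procedure can be sketched as a single function; the names are ours, and running it on the three programs of Table 9 reproduces, to within rounding, the estimates reported in Table 10.

```python
def forecast_effort(locs_proc_div, num_extracted_mods=None, semiautomatic=True):
    """Effort forecast (hours) following the procedure of section 6."""
    if not semiautomatic:
        return 0.423 + 0.004 * locs_proc_div  # formula (5): automatic restoration
    hours = 0.331 + 0.007 * locs_proc_div     # formula (2): initial estimate
    err = 0.025 * num_extracted_mods - 0.419  # formula (3): expected relative error
    return hours / (1 - err)                  # formula (4): adjusted estimate

# The three verification programs of Table 9 (all semiautomatic):
for name, locs, mods in [("Prg-1", 370, 11), ("Prg-2", 941, 18), ("Prg-3", 1350, 22)]:
    initial = 0.331 + 0.007 * locs
    final = forecast_effort(locs, mods)
    print(f"{name}: initial {initial:.1f} h, final {final:.1f} h")
# Prints 2.9 -> 2.6, 6.9 -> 7.1 and 9.8 -> 11.3 hours, matching Table 10.
```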

Application of the basic model to these programs yields an initial estimate of the expected hours; the correction factor is then calculated on the basis of the number of modules to be extracted, thus obtaining a final estimate. These data are reported in Table 10.


Table 10 - Results from the application of the model

Program | Initial estimate | Final estimate | Error initial vs actuals | Error final vs actuals
Prg-1 | 2.9 | 2.6 | -5.5% | 5.5%
Prg-2 | 6.9 | 7.1 | 1.4% | -1.4%
Prg-3 | 9.8 | 11.3 | 18.3% | 5.8%

The model seems to be well adapted to the real situation: the largest error is obtained when the pair of values (LocsProcDiv, NumExtractedMods) to which the model is applied lies outside the range of values used for building the model, as is the case for Prg-3; even then, however, the final estimate is better than the initial one. By adding new data to the database we could therefore extend the field of applicability of the model.

7. RELATED WORKS

In the field of statistical analysis for obtaining econometric models for software cost estimation, our work has taken into account the method used by Bailey and Basili in [Bai81] and other suggestions given by Conte et al. in [Con86]. Many authors have proposed models for forward engineering and have defined many corrective factors (e.g. cost drivers) for adjusting an initial regression line based on LOCs. In a reverse engineering project, however, many factors depend upon the characteristics of the system to be reengineered and on the particular process which is actually carried out, so many of the corrective factors described in forward engineering cost models cannot be applied. In this work we discuss the effectiveness of such corrective factors and derive a different set: the number of modules obtained after restoration and their mean complexity. These factors differ, for example, from the ones used by Sneed and described in [Sne91]. An analysis of the data coming from the execution of restoration activities in reverse engineering projects was also presented in [Vis95], where the focus was on the productivity that could be achieved in performing these activities; there we also pointed out that the effort for restoration could be anchored to the budget and time available. The results presented in this work reinforce that hypothesis by giving a more complete framework for the estimation of the effort.


8. CONCLUSIONS

This study presents an experimental method for obtaining an econometric model: the model was empirically derived for a process which restores existing software. The experience confirmed that an econometric model depends on the process model and on the environment in which the process is carried out. Bearing in mind the large investments involved in restoring software, an econometric model is necessary in order to forecast, with a reasonable degree of reliability, the budget and time necessary for comprehending it. For this reason, we believe that from experiences like ours, concepts can be extracted which could be used by all those operating in the same sector.


A fundamental principle observed in our experience is that the suitability of the tool for the process greatly affects the cost of its execution, since an inadequate tool requires much more extensive human intervention. This suitability affects not only the forecasting model, i.e. the mean performance of the engineers during execution of the process, but also, and above all, the risk of forecasting error. When a large amount of human intervention is involved, the factors which make it possible to carry out the process at different depths must be identified, so that while managing the project the manager can reduce the forecasting error and thus the risk of the budget and time being insufficient for the whole process. For example, in our process the project manager can control the reliability of the forecast by deciding how many modules to extract from the existing program. Unless the process model has this adaptability, it is difficult to distinguish the risk factors from those affecting them; this reduces the reliability of the forecasting model and makes it more difficult to manage, as the corresponding errors of the two corrected models show. Furthermore, some factors may have such a great influence on the econometric model as to require differentiation of the models according to the values of one or more factors. This occurred in our case with respect to the suitability of the tool, which in turn depended on the techniques used to understand the working programs. For this reason it was found better to define two models: one for the programs in which the process was fully automatic and one for those in which it was semiautomatic. The continuation of this work will concentrate further on the econometric equations, extending the experimental sample.

Acknowledgments

We would like to thank Franco De Matteis for his help in collecting and analyzing the data.

REFERENCES

[Ant94] Antonini P., Canfora G., Cimitile A. "Re-engineering legacy systems to meet quality requirements: an experience report". 16th Int. Conf. on Software Engineering, Sorrento, May 16, 1994. IEEE Computer Society Press.
[Arn94] Arnold R. "Software Reengineering - Tutorial". 16th Int. Conf. on Software Engineering, Sorrento, May 16, 1994. IEEE Computer Society Press.


[Bai81] Bailey J. W., Basili V. R. "A meta-model for software development resource expenditures". Proc. 5th Int. Conf. on Software Engineering, 1981. IEEE Computer Society Press.
[Bas84] Basili V. R., Weiss D. M. "A methodology for collecting valid software engineering data". IEEE Transactions on Software Engineering, Vol. 10, No. 6, Nov. 1984.
[Bas92] Basili V. R. "Software modeling and measurement: the Goal/Question/Metric paradigm". Computer Science Technical Report CS-TR-2956, University of Maryland, UMIACS-TR-92-96, September 1992.
[Can94] Canfora G., De Lucia A., Di Lucca G. A., Fasolino A. R. "Recovering the architectural design for software comprehension". Proc. 3rd Workshop on Program Comprehension, 1994. IEEE Computer Society Press.
[Con86] Conte S. D., Dunsmore H. E., Shen V. Y. "Software Engineering Metrics and Models". The Benjamin/Cummings Publishing Company, 1986.
[Chi90] Chikofsky E. J., Cross II J. H. "Reverse engineering and design recovery: a taxonomy". IEEE Software, Jan. 1990.
[Jar94] Jarzabek S. "Life-cycle approach to strategic re-engineering of software". Journal of Software Maintenance: Research and Practice, Vol. 6, No. 6, 1994.
[Lap83] Lapin L. L. "Probability and Statistics for Modern Engineering". PWS Publishers, B/C Engineering Division, Boston, 1983.
[Mak83] Makridakis S., Wheelwright S. C., McGee V. E. "Forecasting: Methods and Applications", 2nd ed. John Wiley & Sons, 1983.
[Mar94] Markosian L., Newcomb P., Brand R., Burson S., Kitzmiller T. "Using an enabling technology to reengineer legacy systems". Communications of the ACM, Vol. 37, No. 5, May 1994.
[Sne91] Sneed H. M. "Economics of software re-engineering". Journal of Software Maintenance: Research and Practice, Vol. 3, No. 3, Sept. 1991.
[Sne95] Sneed H. M., Nyàry E. "Extracting object-oriented specification from procedurally oriented programs". Proc. 2nd Working Conference on Reverse Engineering, 1995. IEEE Computer Society Press.
[Vis93a] Visaggio G., Cutillo F., Fiore P. "Identification and extraction of domain independent components in large programs". Proc. Working Conference on Reverse Engineering, Baltimore, 1993.
[Vis93b] Visaggio G., Abbattista F., Lanubile F. "Recovering conceptual data models is human intensive". Proc. 5th Int. Conf. on Software Engineering and Knowledge Engineering, 1993. Knowledge Systems Institute.


"Recovering Conceptual Data Models is Human Intensive". The 5th International Conference on Software Engineering and Knowledge Engineering. 1993 Knowledge System Institute [Vis94a] Visaggio G., Abbattista F., Fatone G.M.G., Lanubile F. "Analyzing the application of a reverse engineering Process to a real situation". 3rd Workshop on Program Comprehension Nov 14-15 1994 Washington D.C. USA. IEEE Computer Society Press. [Vis94b] Visaggio G., "Process improvement through data reuse", IEEE Software Vol.11 No.4, July 1994 [Vot94] Votta L. G., Perry D. E., Staudenmayer N. A.


"People, organization, and process improvement" IEEE Software July 1994. [Vis95] Visaggio G., Fiore P., Lanubile F. "Analyzing Empirical Data from a Reverse Engineering Project" Proceedings of the 2nd Working Conference on Reverse Engineering (2nd WCRE) Toronto Canada July 1995. IEEE Comp. Soc. Press [Wol93] Wolf A. L., Rosenblum D. S. "A study in software process data capture and analysis" Proceedings of the 2nd International Conference on the Software Process (ICSP2), Berlin 1993. IEEE Comp. Soc. Press.
