Using code metrics to predict maintenance of legacy programs: a case ...



Using code metrics to predict maintenance of legacy programs: a case study

Macario Polo, Mario Piattini and Francisco Ruiz
Escuela Superior de Informática, University of Castilla-La Mancha
Ronda de Calatrava, 5, 13071-Ciudad Real (Spain)
Tel.: +34-926-295300
{mpolo, mpiattin, fruiz}@inf-cr.uclm.es

Abstract

This paper presents an empirical study on the correlation between simple code metrics and maintenance needs. The goal of the work is to provide a method for estimating maintenance in the initial stages of outsourced maintenance projects, when the maintenance contract is being prepared and very little information is available on the software to be maintained. The paper shows several positive results related to this goal.

Keywords: code metrics, maintenance prediction, outsourcing.

1. Introduction

Outsourcing of software life cycle activities is a growing business area in many sectors influenced by Information Technologies (IT) ([4], [8], [15]). According to [16], the most outsourced areas in IT are software maintenance, software development and network provision. Several authors have studied the advantages and drawbacks of outsourcing (all of them applicable to the maintenance context), but always from the point of view of the Customer organization ([5], [9]). However, accepting an outsourcing project also entails risks for the supplier organization, and these have received little attention. In the case of maintenance, these risks affect the fulfilment (or not) of the Service Level Agreements (SLA) agreed in the contract.

In MANTEMA ([13], [14]), a methodology for software maintenance jointly developed with Atos ODS (the third European organization in the supply of software services, specialized in outsourcing of software maintenance), the establishment of outsourcing relationships receives special attention: its process model includes a set of activities which guide the Maintenance organization through the first stages of the outsourcing. One of the tasks included in MANTEMA is devoted to collecting data on the software to be maintained, in order to help the Maintenance organization assign adequate values to the SLA. MANTEMA distinguishes five types of maintenance:

1. Urgent corrective: a detected error prevents normal system operation and the solution time is critical.
2. Non-urgent corrective: a detected error does not block the normal operation of the system and the solution time is not critical.
3. Perfective: new functionalities are added to the system.
4. Preventive: the software is modified to improve its maintainability and quality properties.
5. Adaptive: the system changes its execution environment.

Maintenance contracts defined in MANTEMA include SLA for some of these types of maintenance:

• TRCA (Time of Resolution of Critic Anomalies): the maximum time that the Maintenance organization may take to fix a critic anomaly (urgent corrective) without being sanctioned.
• TRNCA: the Time of Resolution of Non-Critic Anomalies (non-urgent corrective).
• NMRCA: the Number of Modification Requests related to Critic Anomalies assumable per period (for example: 30 per month).
• NMRNCA: the Number of Modification Requests related to Non-Critic Anomalies (non-urgent corrective) assumable per period.
• DevCA: the maximum Deviation in the number of Critic Anomalies in a shorter period, that is, the maximum number of MR that can be received in, for example, a week with the commitment of being served within TRCA (e.g., NMRCA can be 20 and DevCA=10; this means that the Maintenance organization commits to serve 20 MR per month, but with the number of MR per week limited to 10).
• DevNCA: similar to DevCA, but related to Non-Critic Anomalies.

SLA for perfective and adaptive maintenance are not usually signed in the contract. Preventive interventions are not explicitly signed either, but there is a commitment to do "progressive preventive maintenance" during the outsourced period (for example: the Maintenance organization commits to decrease the number of errors per KLOC). The cost of these interventions is studied and estimated individually.

MANTEMA includes an "Initial questionnaire" (whose template can be found in [14]) that the Maintenance organization must fill in before the contract is signed. This document details some characteristics of the software to be maintained, its hardware environment, the development and maintenance organizations, etc. However, very little quantitative information on the software is usually available at that moment. In these cases, the Maintenance organization has very little information with which to estimate the values of the SLA, which introduces some uncertainty into the proposal and negotiation of SLA with the Customer. Usually, the available data on the software to be maintained is limited to the number of modules and the size of each one. Therefore, the Maintenance organization must use these two variables to make predictions about the maintenance needs of the software and about the SLA.

In this paper we present the results of an empirical study which allows applications to be categorized according to their maintenance needs, as a function of their number of modules and their size in lines of code (LOC). This is very useful for estimating the maintenance effort required by third-party software when the supplier organization has very little information at its disposal.

The paper is organized as follows: Section 2 describes the experiment, explaining its context and some initial results and presenting the working method and its results in depth, including some equations for prediction; in Section 3 we present our conclusions and outline future lines of work.

2. Description of the experiment

The data used in this study correspond to the applications of two large banks, whose maintenance is carried out by Atos ODS. All programs are developed in Cobol/CICS and manage DB2 databases. Both sets of applications have several million lines of code. Atos ODS assumed responsibility for the maintenance of the first bank in May 1999, whereas the maintenance of the other bank's software began in May 2000. The MANTEMA methodology is being used to carry out the maintenance of both banks' software. During these months, there have been many modification requests of the urgent corrective, non-urgent corrective and perfective types, and none of the preventive and adaptive types.

Data collected by Atos ODS in both projects include:

• Number of applications in each bank
• Number of modules per application
• Total size of every application
• Number of modification requests (MR) of urgent corrective, non-urgent corrective and perfective during the respective outsourced months of 1999 and 2000 per application
• Total effort (in hours) devoted to each application in every type of maintenance

Table 1 shows some totalised data of both projects; the original, detailed data appear in Table 2.

Bank      Nº of applications    Nº of modules    Total size (LOC)
1         47                    15,717           10,000,263
2         8                     1,149            3,832,880
Total:    55                    16,866           13,833,143

Table 1. Some data of both maintenance projects

The first four columns of Table 2, together with other additional data, are collected in the "Initial questionnaire" document mentioned in Section 1. Unfortunately, it is common that only "coarse" metrics such as LOC and number of modules can be collected in the questionnaire (when the outsourcing contract is being prepared), so the Maintenance organization can only rely on them to estimate the SLA values, budgets, etc.

The table is listed column by column: each line below gives one column, with its values in row order (the 47 applications of bank 1 followed by the 8 applications of bank 2). Sizes are written without thousands separators. No data on 1999 in bank 2, so the 1999 columns contain only the 47 values of bank 1.

Bank: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2

App#: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 30 31 32 33 34 35 36 37 38 39 4 40 41 42 43 44 45 46 47 5 6 7 8 9 1 11 2 3 4 5 6 7

Programs: 253 193 321 10 618 354 421 160 287 78 166 43 201 88 148 283 438 11 750 134 230 227 366 471 63 332 15 175 243 354 154 298 203 157 232 92 126 492 73 481 36 145 1676 1417 1669 328 705 141 171 285 42 110 54 130 216

Size (LOC): 131427 82254 206500 11165 375194 242070 203560 73169 151317 18546 106311 15110 98162 39108 76719 152348 213886 3165 460223 54693 101221 106337 237160 252498 37677 175999 4786 97318 130653 186845 63115 138779 103824 99736 128185 31048 68324 355746 39715 270054 24360 50474 1408388 1234236 1388623 164979 385256 703540 612451 1065148 112457 304675 168505 353984 512120

1999, UC, MR: 15 0 20 0 81 30 0 0 4 0 0 0 0 0 0 0 0 0 5 2 14 6 0 7 0 1 0 0 0 15 0 0 43 0 26 2 0 15 0 9 0 0 40 143 11 4 18

1999, UC, Eff: 12 0 33 0 214 14 0 0 8 0 0 0 0 0 0 0 0 0 9 2 32 10 0 6 0 1 0 0 0 33 0 0 39 0 60 1 0 26 0 34 0 0 89 308 4 2 35

1999, NUC, MR: 11 2 63 0 110 55 12 4 30 0 0 0 8 0 0 17 7 0 58 5 18 5 19 48 0 3 0 14 22 22 5 14 31 9 19 16 4 58 8 33 0 0 39 62 43 4 42

1999, NUC, Eff: 20 2 176 0 327 75 16 2 45 0 0 0 24 0 0 31 9 0 107 3 51 11 23 105 0 9 0 26 32 33 6 12 28 22 31 42 9 162 15 31 0 0 85 91 43 6 90

1999, Perf, MR: 11 0 134 0 98 143 313 0 755 0 0 0 737 0 0 0 101 0 288 59 6 5 61 239 0 36 0 26 913 162 28 15 45 43 135 12 5 285 6 221 0 0 326 581 8 85 67

1999, Perf, Eff: 24 0 463 0 338 512 1315 0 4557 0 0 0 3263 0 0 0 506 0 1237 192 39 20 196 906 0 142 0 83 5189 697 60 25 160 93 595 40 20 1176 9 887 0 0 1447 1996 20 596 238

2000, UC, MR: 1 0 23 0 19 8 0 0 13 0 0 0 0 0 0 0 0 0 7 0 0 0 0 7 0 5 0 0 0 10 0 0 10 0 21 0 0 7 0 3 0 0 32 40 0 5 2 155 27 102 21 35 69 32 18

2000, UC, Eff: 1 0 41 0 51 7 0 0 34 0 0 0 0 0 0 0 0 0 17 0 0 0 0 14 0 8 0 0 0 46 0 0 6 0 93 0 0 6 0 3 0 0 86 79 0 5 7 466 66 260 39 56 175 92 32

2000, NUC, MR: 10 0 95 0 72 41 27 2 15 0 0 0 29 0 0 9 11 0 95 5 6 11 17 55 0 8 0 5 26 72 12 11 11 22 71 10 6 43 12 45 0 0 25 97 41 4 48 155 27 105 21 35 69 32 23

2000, NUC, Eff: 9 0 217 0 152 71 60 6 46 0 0 0 60 0 0 15 26 0 211 13 11 32 19 101 0 16 0 4 41 106 26 13 8 45 151 19 11 101 37 130 0 0 41 147 50 3 98 466 66 266 39 56 175 92 49

2000, Perf, MR: 41 0 61 0 154 132 32 0 403 0 0 0 0 0 0 0 0 0 527 51 43 33 0 429 0 275 0 9 73 4 113 0 51 28 270 3 0 330 0 58 0 0 115 575 14 36 24 198 18 525 332 51 283 5 427

2000, Perf, Eff: 133 0 186 0 702 392 75 0 2089 0 0 0 0 0 0 0 0 0 3064 248 239 158 0 2591 0 2487 0 29 334 6 556 0 268 113 1162 3 0 1789 0 189 0 0 434 2478 36 121 84 1004 67 3425 1803 93 1393 16 2898

Legend: MR=number of modification requests; Eff=mean effort in hours per modification request; UC=urgent corrective; NUC=non-urgent corrective; Perf=perfective

Table 2. Data used in the study

Some results at first sight

Some initial results we can obtain from the data in Table 2 concern the mean effort of each maintenance type, which can be seen in Table 3.

Type of maintenance       Mean effort
Urgent corrective         2 hours and 15 minutes
Non-urgent corrective     2 hours and 7 minutes
Perfective                5 hours

Table 3. Mean effort per type of maintenance

In [6], Fenton and Ohlsson study several hypotheses on the relationship between the number of faults and failures and several metrics at module level. They observe that a small percentage of modules (10%) contains most or all of the failures. We would probably obtain similar results if we had metrics for the 16,866 modules of our 55 applications; at application level, however, the data show a correspondence close to 1:1 (Table 4).

% Accumulated size    % Accum. corr. MR    % Accum. corr. effort
8.92                  9.40                 8.05
14.01                 17.92                20.05
25.91                 36.88                42.41
43.32                 56.39                61.68
66.79                 78.90                80.44
97.20                 100.00               100.00
100.00                100.00               100.00

Table 4. Relationship between number of corrective MR, effort and size

Other interesting results, which confirm those of [2], [7] and [11], concern the distribution of effort and of the number of modification requests by maintenance type (Figure 1): perfective maintenance requires the most resources and has the greatest number of modification requests.

[Figure 1: two pie charts, "Distribution of Maintenance Requests" (slices UC, NUC, Perf) and "Distribution of effort" (slices UCEff, NUCEff, PerfEff). Percentage labels recovered from the figure: 4%, 8%, 8%, 16%, 76%, 88%.]

Figure 1. Distributions of MR and effort depending on the type of maintenance

Correlating data through logistic regression

Our initial attempt was to find some kind of correlation between the metrics whose collection is proposed in the Initial questionnaire (independent variables) and the number of modification requests of each type, as well as the maintenance effort, in order to estimate the future maintenance effort quantitatively (for example: 100 hours of corrective and 200 of perfective). However, as only some of the proposed metrics can be collected, experiments in this direction have not produced meaningful results.

Other kinds of correlation are nevertheless possible, less ambitious but still very useful: for example, logistic regression can be used to classify data into two categories. Logistic regression relates a set of independent variables to a binomial dependent variable, so that elements can be classified into one of the two categories as a function of the independent variables. This technique has been used by several authors: Briand et al. correlate several object-oriented metrics with the presence or absence of faults in classes, in order to predict the fault-proneness of classes [3]. Khoshgoftaar et al. also find relationships between several metrics and fault-proneness in non-object-oriented telecommunications systems [10]. A complete explanation of logistic regression can be found in [3].
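As a rough illustration of the technique, the following sketch fits a logistic model by plain gradient ascent on the log-likelihood. It is not the procedure used in the study: the data set, the learning rate, the number of iterations and the 0.5 cutoff are all hypothetical choices made only for this example.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Return weights [w0, w1, ..., wn] fitted by batch gradient ascent."""
    n = len(rows[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = y - p                      # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def predict(w, x, cutoff=0.5):
    """Return 1 if the modelled probability reaches the (assumed) cutoff, else 0."""
    p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
    return 1 if p >= cutoff else 0

# Hypothetical toy data: one independent variable (size, in hundreds of KLOC)
# and a binary label (1 = "problematic", 0 = "non-problematic").
toy_x = [[0.1], [0.3], [0.5], [0.8], [1.2], [2.0], [3.5], [4.0], [7.0], [12.0]]
toy_y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

weights = fit_logistic(toy_x, toy_y)
predictions = [predict(weights, x) for x in toy_x]
```

In practice a statistics package would be used instead, since it also reports the log-likelihood and goodness-of-fit figures.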

Categorizing the data

As we pointed out in Section 1, the Maintenance organization must propose several Service Level Agreements to the Customer in order to negotiate them. In particular, the six SLA mentioned in Section 1 are of interest in this paper. To propose reasonable values for these SLA, the Maintenance organization must work with the few metrics it has available: in our case, number of programs and lines of code. A good way of proposing such values is to estimate the future effort of urgent and non-urgent corrective interventions, as they are the two types of maintenance with SLA values in the contract. In this case, logistic regression can be applied to categorize every application according to its maintenance needs.

To find a threshold that gives good results, we calculated, for each application, the mean number of urgent and non-urgent corrective MR received per month, obtaining two columns of 55 rows of data (one row per application). We then took the median of each column as its threshold:

mUC = 1.306 MR/month; mNUC = 4.083 MR/month

If an application has produced a monthly number of urgent-corrective MR lower than mUC, then the application is of category "A" (which could be mapped to "non-problematic"); otherwise, it is of category "B" ("problematic"). The same applies to non-urgent corrective, with mNUC as the threshold.
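The categorization rule can be sketched as follows. The application names and monthly rates below are hypothetical and are not taken from Table 2; only the rule itself (category "A" below the median, "B" otherwise) comes from the text.

```python
from statistics import median

def categorize(monthly_mr, threshold):
    """Map each application to "A" (rate below threshold) or "B" (otherwise)."""
    return {app: ("A" if rate < threshold else "B")
            for app, rate in monthly_mr.items()}

# Hypothetical mean numbers of urgent-corrective MR per month.
uc_rates = {"app1": 0.0, "app2": 0.5, "app3": 1.306, "app4": 3.2, "app5": 12.0}

m_uc = median(uc_rates.values())          # threshold = median of the column
categories = categorize(uc_rates, m_uc)
```

Note that an application whose rate equals the threshold falls in category "B", because the rule uses a strict "less than" comparison.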

Applying logistic regression

With the criteria described above, the values shown in Table 5 are obtained for urgent-corrective maintenance.

                Predicted category
Observed        A        B        Percent correct
A               24       3        88.89%
B               7        21       75.00%
Overall                           81.82%

Table 5. Predictions for urgent corrective

In this type of maintenance, 24 of the 27 non-problematic applications are correctly categorized, whereas 75% of the 28 problematic applications are correctly categorized. The log-likelihood of the model is -75 and R2=0.67, which is a very good result. For non-urgent corrective, the classification table is the following:

                Predicted category
Observed        A        B        Percent correct
A               22       5        81.48%
B               7        21       75.00%
Overall                           78.18%

Table 6. Predictions for non-urgent corrective

As can be seen, the quality of the predictions for this type of maintenance is lower than for urgent corrective. Moreover, the log-likelihood is now 152 and R2=0.51, also worse than before. Both relationships can be seen in Figure 2, where the trend is clearer for urgent corrective: the larger the application, the slightly greater the effort required.
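The percentages in the two classification tables can be recomputed from the four counts of each table. The helper below is only an illustrative sketch; its name and argument order are our own.

```python
def classification_summary(aa, ab, ba, bb):
    """Percent correct for A, for B and overall.

    aa: observed A, predicted A    ab: observed A, predicted B
    ba: observed B, predicted A    bb: observed B, predicted B
    """
    pct_a = 100.0 * aa / (aa + ab)
    pct_b = 100.0 * bb / (ba + bb)
    overall = 100.0 * (aa + bb) / (aa + ab + ba + bb)
    return round(pct_a, 2), round(pct_b, 2), round(overall, 2)

urgent = classification_summary(24, 3, 7, 21)       # counts for urgent corrective
non_urgent = classification_summary(22, 5, 7, 21)   # counts for non-urgent corrective
```

Both tables cover the same 55 applications (27 observed in category A and 28 in category B).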

[Figure 2: plot of mean effort in hours (vertical axis, 0.00 to 3.50) against application size (horizontal axis, 0 to 1,600,000). Series: MeanUCEf, MeanNUCEf, Mean UC Eff (tendency), Mean NUC Eff (tendency).]

Figure 2. Relationship between size and mean effort

Prediction equations

With logistic regression analysis, we can obtain an equation for making predictions, which returns 0 or 1 depending on the probability that the element x is classified in one of the two categories A or B. This equation has the following form:

$$P(x) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + \dots + w_n x_n)}}, \quad x = (x_1, \dots, x_n) \qquad \text{(Eq. 1)}$$

(x1, …, xn) are the independent variables that we use to categorize each element. In our case, from the results obtained through logistic regression, the equations for urgent and non-urgent corrective are the following:

$$P_{UC}(x) = \frac{1}{1 + e^{-(-2.9822 + 3.87 \cdot 10^{-5} \cdot Size - 0.0104 \cdot NumberOfPrograms)}} \qquad \text{(Eq. 2)}$$

$$P_{NUC}(x) = \frac{1}{1 + e^{-(-2.4591 + 8.10 \cdot 10^{-6} \cdot Size + 0.0047 \cdot NumberOfPrograms)}} \qquad \text{(Eq. 3)}$$

In this case, x1 and x2 are Size and NumberOfPrograms respectively. Eq. 2 and Eq. 3 are used to count the number of applications with strong or weak maintenance needs. These data are meaningful enough for the Maintenance organization, although their analysis can be extended by taking into account that the maintenance effort and the number of modification requests follow a Normal distribution (in applications with maintenance needs) at a significance level of 99.9%. The combination of both results could be used, with care, by the Maintenance organization to predict the future maintenance effort.
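The two prediction equations can be evaluated directly, as in the sketch below. Two points are assumptions of this example and are not stated in the paper: that the probability refers to category "B" ("problematic"), and that 0.5 is used as the cutoff between categories. The sample inputs are rows of Table 2.

```python
import math

# Coefficients (w0, w_size, w_programs) of the urgent and non-urgent
# corrective equations.
UC_WEIGHTS = (-2.9822, 3.87e-5, -0.0104)
NUC_WEIGHTS = (-2.4591, 8.10e-6, 0.0047)

def probability(weights, size_loc, n_programs):
    """Evaluate 1 / (1 + e^-(w0 + w1*Size + w2*NumberOfPrograms))."""
    w0, w_size, w_prog = weights
    z = w0 + w_size * size_loc + w_prog * n_programs
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)                 # avoids overflow for very negative z
    return ez / (1.0 + ez)

def category(p, cutoff=0.5):
    """ASSUMPTION: p is the probability of category "B"; the cutoff is ours."""
    return "B" if p >= cutoff else "A"

# Bank 1, application 1: 253 programs, 131,427 LOC.
p_uc_small = probability(UC_WEIGHTS, 131427, 253)
p_nuc_small = probability(NUC_WEIGHTS, 131427, 253)

# Bank 1, application 5: 1,676 programs, 1,408,388 LOC.
p_uc_large = probability(UC_WEIGHTS, 1408388, 1676)
```

Under these assumptions the first application falls in category "A" for both equations (probabilities of about 0.37 and 0.45), while the much larger second application falls in category "B" for urgent corrective.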

Negative results obtained

We have also tried to find some relationship between the mean size of the modules of each application and the corrective effort, with no positive results. This lack of results could be understood as a confirmation of those of [6], [12] and other researchers, who conclude that the size of a module is not a good predictor of its fault-proneness. However, our data do not allow us to draw this conclusion with absolute certainty.

3. Conclusions and future work

This paper has shown some positive results in the area of early prediction of maintenance needs. The field of application of this work is the valuation of Service Level Agreements when preparing maintenance contracts in outsourcing relationships. The results allow the Maintenance organization to set SLA values on a certain empirical basis.

It is important to take into account the context studied: banking applications developed in Cobol/CICS. Our results do not contradict those of other researchers who have not found a meaningful influence of size metrics on the number of faults and failures, since they analyzed metrics at module level, whereas we work at application level. Moreover, the projects they analyze are related to telecommunications systems, probably developed in C or C++, where the context is very different and there are other factors not present in Cobol (e.g., the use of pointers [1]).

Our results invite us to continue the research, probably along the line of finding better correlations and prediction equations directly between code metrics (which are the information available in the initial stages of an outsourced maintenance project) and maintenance effort.

Acknowledgments

This work is part of the MPM project, developed with Atos ODS, S.A. and partially supported by the Ministerio de Ciencia y Tecnología, Programa de Tecnologías de la Información y las Comunicaciones (FIT-070000-2000-307).

References

[1] Antoniol, G., Calzolari, F. and Tonella, P. (1999). Impact of Function Pointers on the Call Graph. Proceedings of the 3rd European Conference on Software Maintenance and Reengineering. IEEE Computer Society, Los Alamitos, CA, USA, pp. 51-59.
[2] Basili, V., Briand, L., Condon, S., Kim, Y., Melo, W. and Valett, J.D. (1996). Understanding and Predicting the Process of Software Maintenance Releases. Proceedings of the International Conference on Software Engineering. IEEE Computer Society, Los Alamitos, CA, USA, pp. 464-474.
[3] Briand, L.C., Wüst, J., Daly, J.W. and Porter, V. (2000). Exploring the Relationships between Design Measures and Software Quality in Object-Oriented Systems. Journal of Systems and Software.
[4] Brower, J.M. (1999). Outsourcing and privatizing information technology. Crosstalk: The Journal of Defense Software Engineering, pp. 28-30.
[5] De Looff, L. (1997). Information systems outsourcing decision making: a managerial approach. Hershey, PA: Idea Group Publishing.
[6] Fenton, N.E. and Ohlsson, N. (2000). Quantitative analysis of faults and failures in a complex software system. IEEE Transactions on Software Engineering.
[7] Frazer, A. (1992). Reverse Engineering: Hype, Hope or Here? Software Reuse and Reverse Engineering in Practice. Chapman & Hall.
[8] Hoffman, T. (1997). Users say move quickly when outsourcing your personnel. Computer World, March 1997, p. 77.
[9] Klepper, R. and Jones, W.O. (1998). Outsourcing Information Technology, Systems and Services. New Jersey: Prentice-Hall.
[10] Khoshgoftaar, T.M., Allen, E.B., Halstead, R., Trio, G. and Flass, R. (1998). Process Measures for Predicting Software Quality. Computer, 31(4), 66-72.
[11] McKee, J.R. (1984). Maintenance as a Function of Design. Proc. of the National Computer Conference, pp. 187-193.
[12] Munson, J.C. and Khoshgoftaar, T.M. (1992). The Detection of Fault-Prone Programs. IEEE Transactions on Software Engineering, 18(5), 423-433.
[13] Polo, M., Piattini, M., Ruiz, F. and Calero, C. (1999). MANTEMA: a complete rigorous methodology for supporting maintenance based on the ISO/IEC Standard. Proc. of the 3rd European Conference on Software Maintenance and Reengineering. IEEE Computer Society, Los Alamitos, CA, USA, pp. 178-181.
[14] Polo, M., Piattini, M. and Ruiz, F. (2000). Managing the Maintenance Process. In van Bon (ed.): World Class IT Service Management Guide. The Hague, The Netherlands: ten Hagem & Stam Publishers.
[15] Rao, H.R., Nam, K. and Chaudhury, A. (1996). Information Systems Outsourcing. Communications of the ACM, 39(7), 27-28.
[16] Van Bon, J. (2000). Sourcing. In van Bon (ed.): World Class IT Service Management Guide. The Hague, The Netherlands: ten Hagem & Stam Publishers.
