International Journal of Software Engineering and Its Applications Vol. 7, No. 2, March, 2013
A Fuzzy Logic Based Software Cost Estimation Model
Ziauddin1, Shahid Kamal2, Shafiullah Khan3 and Jamal Abdul Nasir4
1 COMSATS Institute of Information Technology, Vehari, Pakistan
2 FSKSM, Universiti Teknologi Malaysia
3 Kohat University of Science & Technology, Kohat, Pakistan
4 ICIT, Gomal University, D.I.Khan, Pakistan
[email protected],
[email protected],
[email protected],
[email protected]

Abstract
Software cost estimation is a challenging and onerous task. Estimation by analogy is one of the expedient techniques in the field of software effort estimation. However, the methodology used for estimating software effort by analogy cannot handle categorical data in an explicit and precise manner. Early software estimation models were based on regression analysis or mathematical derivations. Today's models are based on simulation, neural networks, genetic algorithms, soft computing, fuzzy logic modelling, etc. This paper aims to use a fuzzy logic model to improve the accuracy of software effort estimation. In this approach, fuzzy logic is used to fuzzify the input parameters of the COCOMO II model, and the result is defuzzified to obtain the resultant effort. Triangular fuzzy numbers are used to represent the linguistic terms in the COCOMO II model. The results of this model are compared with COCOMO II and the Alaa Sheta model. The proposed model yields better results in terms of MMRE, PRED(n) and VAF.

Keywords: Fuzzy logic, Triangular Fuzzy Number, Membership function and Fuzziness, Software Effort Estimation
1. Introduction
Estimating the work effort and the schedule required to develop and/or maintain a software system is one of the most critical activities in managing software projects. The task is known as Software Cost Estimation [27]. During the development process, cost and time estimates are useful for the initial rough validation and for monitoring the project's progress; after completion, these estimates may be useful for project productivity assessment, for example.

There are different techniques used in software cost estimation. An algorithmic model, also called a parametric model, uses mathematical equations to produce a software estimate. LOC-based models such as [2, 6, 7, 8] are algorithmic models. Ali Idri and Laila Kjiri [9] proposed the use of fuzzy sets in the COCOMO-81 model [8]. Musilek P. and others [18] proposed the f-COCOMO model, using fuzzy sets. The methodology of fuzzy sets giving rise to f-COCOMO [18] is sufficiently general to be applied to other models of software cost estimation such as the function point method [14]. W. Pedrycz and others [19] found that the concept of information granularity, and fuzzy sets in particular, plays an important role in making software cost estimation models more user friendly. Harish Mittal and Pradeep Bhatia [17] used triangular fuzzy numbers for fuzzy logic sizing. Lima, O.S.J. and others [13] proposed the use of concepts and properties from fuzzy set theory to extend function point analysis to fuzzy function point analysis, using trapezoid-shaped fuzzy numbers for the linguistic variables of the function point analysis complexity matrices.

In this paper we propose triangular fuzzy numbers to represent the linguistic variables. The results can be optimised for a given application by varying the fuzziness of the triangular fuzzy numbers. To apply fuzzy logic, fuzzification is first done using triangular fuzzy numbers; the fuzzy output is then evaluated, and the estimate is obtained by the defuzzification technique given in this paper.

1.1. Fuzzy Logic
In 1965, Lotfi Zadeh formally developed multi-valued set theory and introduced the term fuzzy logic [26]. Fuzzy Logic (FL) starts with the concept of fuzzy set theory. It is a theory of classes with un-sharp boundaries, and is considered an extension of classical set theory. The membership μ of an element x of a classical set A, as a subset of the universe X, is defined as follows:

$$\mu_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

A system based on FL has a direct relationship with fuzzy concepts (such as fuzzy sets, linguistic variables, etc.) and fuzzy logic. The popular fuzzy logic systems can be categorised into three types: pure fuzzy logic systems, Takagi and Sugeno's fuzzy system, and the fuzzy logic system with fuzzifier and defuzzifier. Since most engineering applications produce crisp data as input and expect crisp data as output, the last type is the most widely used. The fuzzy logic system with fuzzifier and defuzzifier was first proposed by Mamdani, and it has been successfully applied to a variety of industrial processes and consumer products [16]. The three main steps of applying fuzzy logic to a model are:
Step 1: Fuzzification: It converts a crisp input to a fuzzy set.
Step 2: Fuzzy Rule-Based System: Fuzzy logic systems use fuzzy IF-THEN rules.
Fuzzy Inference Engine: Once all crisp input values are fuzzified into their respective linguistic values, the inference engine accesses the fuzzy rule base to derive linguistic values for the intermediate and output linguistic variables.
Step 3: Defuzzification: It converts the fuzzy output into a crisp output.

The use of fuzzy sets in logical expressions is known as fuzzy logic. A fuzzy set is characterized by a membership function, which associates with each point in the fuzzy set a real number in the interval [0, 1], called the degree or grade of membership [3, 4, 20, 21]. The membership function may be triangular, trapezoidal, parabolic, etc. Fuzzy numbers are special convex and normal fuzzy sets, usually with a single modal value, representing uncertain quantitative information. A fuzzy number is a quantity whose value is imprecise, rather than exact as in the case of ordinary single-valued numbers [4, 9, 20, 21, 23]. Any fuzzy number can be thought of as a function, called the membership function, whose domain is specified, usually the set of real numbers, and whose range is the closed interval [0, 1]. Each numerical value of the domain is assigned a specific membership value; 0 is the smallest possible value of the membership function, while 1 is the largest. In many respects fuzzy numbers depict the physical world more realistically than single-valued numbers. The curve in Figure 1a is a triangular fuzzy number, the curve in Figure 1b is a trapezoidal fuzzy number, and the curve in Figure 1c is a bell-shaped fuzzy number.
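As an illustration of the trapezoidal and bell shapes mentioned above, the following minimal Python sketch evaluates both membership functions. The function names, parameter names and the generalised-bell formula are assumptions chosen for illustration; the paper does not specify them.

```python
def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def bell(x, width, slope, centre):
    """Generalised bell membership: 1 / (1 + |(x - centre) / width|^(2 * slope)).

    This particular bell formula is an assumption, not taken from the paper.
    """
    return 1.0 / (1.0 + abs((x - centre) / width) ** (2 * slope))


# Hypothetical parameters, for illustration only.
for x in (0, 2, 4, 6, 8):
    print(x, trapezoidal(x, 1, 3, 5, 7), round(bell(x, 2, 2, 5), 3))
```

Both functions return values in [0, 1]; a triangular shape is the special case of the trapezoid in which b equals c.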
Figure 1. Fuzzy Membership Functions

A triangular fuzzy number (TFN) is described by a triplet (α, m, β), where m is the modal value, and α and β are the left and right boundaries respectively. The fuzziness of a TFN (α, m, β) is defined as:

$$F = \frac{\beta - \alpha}{2m}, \quad 0 < F < 1$$
The higher the value of fuzziness, the fuzzier the TFN.

1.2. COCOMO II
The COCOMO I model [8] is a regression-based software cost estimation model, which was developed by Boehm in 1981 and is thought to be the most cited and most plausible of all traditional cost estimation models. COCOMO I was a stable model at that time. One of the problems with using COCOMO I today is that it does not match the development environment of the late 1990s. Therefore, in 1997, Boehm developed COCOMO II to solve most of the COCOMO I problems. Figure 2 shows the process of software schedule, cost, and manpower estimation in COCOMO II. The Post-Architecture Model of COCOMO II uses several software attributes: 17 Effort Multipliers (EMs), 5 Scale Factors (SFs) and Software Size (SS), from which the Effort estimate is produced.

Figure 2. COCOMO II Estimation Process (inputs: EMs (17), SF (5), SS (1); outputs: Effort, Schedule, Personnel)

The formula for the process is given by:
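$$\text{Effort (PM)} = A \times \text{Size}^{\,B + 0.01 \sum_{j=1}^{5} SF_j} \times \prod_{i=1}^{17} EM_i$$

$$\text{Schedule} = C \times \text{Effort}^{\,D + 0.2 \times 0.01 \sum_{j=1}^{5} SF_j}$$

$$\text{Personnel} = \frac{\text{Effort}}{\text{Schedule}}$$

where A = 2.94, B = 0.91, C = 3.67 and D = 0.28. More details about the model can be found in [7].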
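To make the calculation concrete, the sketch below evaluates the standard COCOMO II effort, schedule and personnel relations with the constants quoted above. The function names and the sample project (size, scale-factor values, nominal effort multipliers) are assumptions for illustration and are not taken from the paper's data set.

```python
import math

# Calibration constants quoted in the text.
A, B, C, D = 2.94, 0.91, 3.67, 0.28


def cocomo2_effort(size_kloc, scale_factors, effort_multipliers):
    """Effort in person-months: A * Size^(B + 0.01 * sum(SF)) * prod(EM)."""
    exponent = B + 0.01 * sum(scale_factors)
    return A * size_kloc ** exponent * math.prod(effort_multipliers)


def cocomo2_schedule(effort_pm, scale_factors):
    """Schedule in months: C * Effort^(D + 0.2 * 0.01 * sum(SF))."""
    exponent = D + 0.2 * 0.01 * sum(scale_factors)
    return C * effort_pm ** exponent


# Hypothetical project: 18 KLOC with assumed ratings (illustrative only).
sf = [3.72, 3.04, 4.24, 3.29, 4.68]  # assumed values for the 5 scale factors
em = [1.0] * 17                      # all 17 effort multipliers set to 1.0

effort = cocomo2_effort(18, sf, em)
schedule = cocomo2_schedule(effort, sf)
personnel = effort / schedule
print(f"effort={effort:.1f} PM, schedule={schedule:.1f} months, staff={personnel:.1f}")
```

Because the scale factors enter through the exponent, any increase in their sum raises the estimated effort for projects larger than 1 KLOC.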
2. Proposed Model
Our model is based on COCOMO II and fuzzy logic. COCOMO II includes a set of input software attributes, namely 17 Effort Multipliers (EMs), 5 Scale Factors (SFs) and one Size in KDLOC (SZ), and one output, Effort. The architecture of the model is shown in Figure 3.

Figure 3. Fuzzy Logic Model for Software Effort Estimation (components: EM-, SF- and SZ-Fuzzification of the inputs EMs (17), SF (5) and SZ (1); Rule Base; Database; Inference Engine; Defuzzification; output EFFORT)

2.1. Inputs
There are three sets of inputs: Size in KLOC, 17 Effort Multipliers, and 5 Scale Factors. All these inputs are provided as crisp data and are directed to the fuzzification module.

2.2. Fuzzification
This module is subdivided into three sub-modules. The SZ-Fuzzification module converts the Size input to a fuzzy variable, the SF-Fuzzification module converts the Scale Factor inputs to fuzzy variables, and the EM-Fuzzification module converts the Effort Multiplier inputs to fuzzy variables. The terms Very Low, Low, Nominal, High, Very High and Extra High have been defined for each variable. We defined a fuzzy set for each linguistic value using a triangular membership function. A Triangular Membership Function (TMF) is a three-point function [8], defined by minimum (α), maximum (β) and modal (m) values, that is, T(α, m, β), where α ≤ m ≤ β. We take each linguistic variable as a triangular fuzzy number, TFN (α, m, β), with α ≤ m and β ≥ m. Its membership function μ(x) is defined as:

$$\mu(x) = \begin{cases} 0, & x \le \alpha \\ \dfrac{x - \alpha}{m - \alpha}, & \alpha \le x \le m \\ \dfrac{\beta - x}{\beta - m}, & m \le x \le \beta \\ 0, & x \ge \beta \end{cases}$$

where m is the mean of the input sizes.
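The piecewise definition above can be sketched in a few lines of Python. The function name and the sample TFN (10, 20, 30) are hypothetical; the guard for x equal to m is an added assumption so that shoulder-shaped sets (α = m or β = m) do not divide by zero.

```python
def triangular_membership(x, alpha, m, beta):
    """Degree of membership of x in the TFN (alpha, m, beta), with alpha <= m <= beta."""
    if x == m:
        return 1.0  # modal value; also covers alpha == m or beta == m
    if x <= alpha or x >= beta:
        return 0.0  # outside the support of the fuzzy number
    if x < m:
        return (x - alpha) / (m - alpha)  # rising edge
    return (beta - x) / (beta - m)        # falling edge


# Hypothetical TFN for a size term: alpha=10, m=20, beta=30 (KLOC).
for size in (5, 15, 20, 25, 30):
    print(size, triangular_membership(size, 10, 20, 30))
```

The membership rises linearly from 0 at α to 1 at m and falls back to 0 at β, so a size halfway between α and m receives a degree of 0.5.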
The fuzzy set defined for Size is shown in Figure 4a. The Size range has been divided into small intervals to obtain a better fuzzy number. Figure 4b represents the fuzzy set for the Scale Factor RESL, whereas Figure 4c represents the fuzzy set for the Effort Multiplier PERS. All the scale factors and effort multipliers have a scaling range from Very Low to Extra High.
Figure 4a. Fuzzy Set for SIZE
Figure 4b. Fuzzy Set for Risk Resolution (RESL)
Figure 4c. Fuzzy Set for Personnel Capability (PERS)

The fuzzification process converts the crisp data to linguistic variables, which are passed to the Inference Engine.
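As a rough illustration of this step, the sketch below maps a crisp rating to degrees of membership in the six linguistic terms. The 0 to 5 scale and every TFN breakpoint in `TERMS` are hypothetical; the breakpoints actually used in the paper are those of Figure 4 and are not reproduced here.

```python
# Hypothetical TFNs (alpha, m, beta) for the six linguistic terms on a 0-5 scale.
TERMS = {
    "Very Low": (0.0, 0.0, 1.0),
    "Low": (0.0, 1.0, 2.0),
    "Nominal": (1.0, 2.0, 3.0),
    "High": (2.0, 3.0, 4.0),
    "Very High": (3.0, 4.0, 5.0),
    "Extra High": (4.0, 5.0, 5.0),
}


def tfn_membership(x, alpha, m, beta):
    """Triangular membership of x in the TFN (alpha, m, beta)."""
    if x == m:
        return 1.0
    if x <= alpha or x >= beta:
        return 0.0
    if x < m:
        return (x - alpha) / (m - alpha)
    return (beta - x) / (beta - m)


def fuzzify(x):
    """Return {term: degree} for every linguistic term with a non-zero degree."""
    degrees = {name: tfn_membership(x, *tfn) for name, tfn in TERMS.items()}
    return {name: d for name, d in degrees.items() if d > 0.0}


print(fuzzify(2.25))
```

With overlapping neighbouring sets, a crisp value generally belongs to two adjacent terms at once, and it is these degrees that the rules in the inference engine operate on.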
2.3. Inference Engine
The inference engine operates on a set of fuzzy rules. The rules of the fuzzy model contain the linguistic variables related to the project. We have implemented this model using the FIS tool in MATLAB. A sample of the rules is presented below:
if (PREC is VL) then (EFFORT is XH)
if (PREC is LO) then (EFFORT is VH)
if (FLEX is VL) then (EFFORT is XH)
The proposed model uses 281 rules.

2.4. Defuzzification
This module converts the fuzzy output into crisp form. Defuzzification has been performed using the centroid method, which calculates the Centre of Gravity (COG) of the area under the curve as:

$$\mathrm{COG} = \frac{\sum_{i} \mu(x_i)\, x_i}{\sum_{i} \mu(x_i)}$$
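The rule evaluation and centroid defuzzification described above can be sketched as follows. This is a minimal Mamdani-style illustration in Python, not the MATLAB FIS implementation used in the paper: the output sets in `EFFORT_SETS`, the 0 to 100 output scale, the min/max operators and the sampling resolution are all assumptions, and only the three sample rules are included.

```python
def tri(x, alpha, m, beta):
    """Triangular membership of x in the TFN (alpha, m, beta)."""
    if x == m:
        return 1.0
    if x <= alpha or x >= beta:
        return 0.0
    if x < m:
        return (x - alpha) / (m - alpha)
    return (beta - x) / (beta - m)


# Hypothetical output fuzzy sets for EFFORT on a 0-100 scale (illustrative only).
EFFORT_SETS = {"VH": (50.0, 75.0, 100.0), "XH": (75.0, 100.0, 100.0)}

# The three sample rules from the text: (input variable, input term, output term).
RULES = [("PREC", "VL", "XH"), ("PREC", "LO", "VH"), ("FLEX", "VL", "XH")]


def infer_effort(fuzzified, steps=1000):
    """fuzzified: {variable: {term: degree}}. Returns the crisp COG, or None."""
    # Firing strength per output term: strongest rule that concludes that term.
    strength = {}
    for var, term, out in RULES:
        degree = fuzzified.get(var, {}).get(term, 0.0)
        strength[out] = max(strength.get(out, 0.0), degree)

    num = den = 0.0
    for i in range(steps + 1):
        x = 100.0 * i / steps
        # Clip each output set at its firing strength, then aggregate with max.
        mu = max(min(s, tri(x, *EFFORT_SETS[out])) for out, s in strength.items())
        num += mu * x  # numerator of COG: sum of mu(x_i) * x_i
        den += mu      # denominator of COG: sum of mu(x_i)
    return num / den if den else None


print(infer_effort({"PREC": {"VL": 0.3, "LO": 0.7}}))
```

When only the rule concluding VH fires, the result is the centre of the symmetric VH triangle; as the rules concluding XH gain strength, the centroid moves towards the upper end of the scale.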
3. Experimental Study
To evaluate the efficiency of the proposed model, we chose data from 30 projects developed in a local software house, ranging from Very Small to Large. We chose two estimation methods for comparison: COCOMO II and the Alaa Sheta [1] estimation method. We used MMRE, PRED(L) and VAF as the criteria for assessing these models.

The Mean Magnitude of Relative Error (MMRE) is probably the most widely used evaluation criterion for assessing the performance of competing models. One purpose of MMRE is to help select the best model, as it measures the prediction accuracy of the competing models. MMRE is calculated as the average of the Magnitudes of Relative Error (MREs) of the individual estimates. The MRE and MMRE are calculated as:

$$\mathrm{MRE} = \frac{|E - E'|}{E} \times 100, \qquad \mathrm{MMRE} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MRE}_i$$

Variance Accounted For (VAF) is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model, and it provides a measure of how well future outcomes are likely to be predicted by the model. It is calculated as:

$$\mathrm{VAF}(\%) = \left(1 - \frac{\mathrm{var}(E - E')}{\mathrm{var}(E)}\right) \times 100$$

where E is the actual effort, E' is the calculated effort and N is the number of estimates.
Prediction PRED(L) is the probability of the model having a relative error less than or equal to L. It is calculated as:

$$\mathrm{PRED}(L) = \frac{K}{N}$$

where K is the number of observations whose MRE is less than or equal to L and N is the total number of observations.

Table 1 shows the estimated efforts of all three models, together with the MRE of each model for every estimate. Table 2 shows a comparison of these models. The estimation results are shown graphically in Figures 5, 6 and 7. The comparison in Table 2 shows that the proposed model has better estimation accuracy than the other models, with an MMRE of 7.512% and a VAF of 98.77%, the latter indicating how likely the model is to achieve a similar MMRE when tested in a different environment with different information. To further verify the prediction, we analysed the results for PRED at the 25%, 15%, 10% and 8% levels. The results show that 63.33% of the proposed model's estimates have an MRE of at most 8%, which is close to its MMRE and shows the stability of its estimates. This stability can also be verified from Figures 5, 6 and 7. Figure 7 shows only minute variations between the actual and estimated efforts.

Table 1. Estimated Efforts and MREs of all Three Models

| P.No | Size | Actual | COCOMO II | Alaa Sheta | Proposed | MRE COCOMO (%) | MRE Alaa Sheta (%) | MRE Proposed (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | 18 | 301 | 214 | 243 | 293 | 28.90365 | 19.2691 | 2.657807 |
| 2 | 50 | 1063 | 841 | 931 | 984 | 20.88429 | 12.41769 | 7.431797 |
| 3 | 40 | 605 | 601 | 453 | 537 | 0.661157 | 25.12397 | 11.23967 |
| 4 | 22 | 243 | 282 | 181 | 201 | 16.04938 | 25.5144 | 17.28395 |
| 5 | 13 | 141 | 128 | 95 | 127 | 9.219858 | 32.62411 | 9.929078 |
| 6 | 12 | 112 | 131 | 91 | 121 | 16.96429 | 18.75 | 8.035714 |
| 7 | 3 | 16 | 15 | 13 | 16 | 6.25 | 18.75 | 0 |
| 8 | 11 | 91 | 78 | 65 | 82 | 14.28571 | 28.57143 | 9.89011 |
| 9 | 42 | 781 | 751 | 634 | 722 | 3.841229 | 18.82202 | 7.554417 |
| 10 | 16 | 233 | 241 | 201 | 249 | 3.433476 | 13.73391 | 6.866953 |
| 11 | 21 | 334 | 304 | 289 | 318 | 8.982036 | 13.47305 | 4.790419 |
| 12 | 52 | 982 | 1043 | 756 | 890 | 6.211813 | 23.01426 | 9.368635 |
| 13 | 23 | 305 | 361 | 300 | 281 | 18.36066 | 1.639344 | 7.868852 |
| 14 | 27 | 421 | 459 | 398 | 401 | 9.026128 | 5.463183 | 4.750594 |
| 15 | 9 | 45 | 48 | 36 | 41 | 6.666667 | 20 | 8.888889 |
| 16 | 15 | 189 | 197 | 178 | 192 | 4.232804 | 5.820106 | 1.587302 |
| 17 | 4 | 17 | 19 | 14 | 17 | 11.76471 | 17.64706 | 0 |
| 18 | 6 | 22 | 23 | 19 | 22 | 4.545455 | 13.63636 | 0 |
| 19 | 2 | 8 | 8 | 7 | 8 | 0 | 12.5 | 0 |
| 20 | 7 | 34 | 39 | 31 | 29 | 14.70588 | 8.823529 | 14.70588 |
| 21 | 13 | 84 | 97 | 72 | 79 | 15.47619 | 14.28571 | 5.952381 |
| 22 | 18 | 274 | 329 | 241 | 268 | 20.07299 | 12.0438 | 2.189781 |
| 23 | 24 | 403 | 398 | 365 | 379 | 1.240695 | 9.42928 | 5.955335 |
| 24 | 26 | 371 | 352 | 322 | 362 | 5.121294 | 13.20755 | 2.425876 |
| 25 | 61 | 1241 | 1459 | 871 | 1102 | 17.56648 | 29.81467 | 11.20064 |
| 26 | 29 | 677 | 605 | 561 | 638 | 10.63516 | 17.13442 | 5.760709 |
| 27 | 19 | 291 | 271 | 202 | 263 | 6.872852 | 30.58419 | 9.621993 |
| 28 | 10 | 110 | 128 | 94 | 119 | 16.36364 | 14.54545 | 8.181818 |
| 29 | 21 | 239 | 201 | 189 | 213 | 15.89958 | 20.9205 | 10.87866 |
| 30 | 12 | 145 | 168 | 156 | 189 | 15.86207 | 7.586207 | 30.34483 |
Table 2. Comparison of the Models

| Model | MMRE | VAF | PRED(25) | PRED(15) | PRED(10) | PRED(8) |
|---|---|---|---|---|---|---|
| COCOMO II | 11.003% | 95.86% | 93.33% | 63.33% | 50% | 40% |
| Alaa Sheta | 16.838% | 93.90% | 80% | 53.33% | 20% | 10% |
| Proposed | 7.512% | 98.77% | 96.33% | 93.33% | 80% | 63.33% |
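The three evaluation criteria can be computed as in the following sketch, which uses only the first five projects of Table 1 (actual effort and the proposed model's estimate), so its output is not expected to match the 30-project figures in Table 2. The function names are hypothetical, PRED is reported as a percentage to match Table 2, and the use of population variance for VAF is an assumption, since the paper does not state which variance estimator was used.

```python
from statistics import pvariance

# First five projects of Table 1: actual effort and the proposed model's estimate.
actual = [301, 1063, 605, 243, 141]
estimated = [293, 984, 537, 201, 127]


def mre(e, e_hat):
    """Magnitude of relative error of one estimate, in percent."""
    return abs(e - e_hat) / e * 100.0


def mmre(actual, estimated):
    """Mean of the MREs over all estimates, in percent."""
    errors = [mre(a, p) for a, p in zip(actual, estimated)]
    return sum(errors) / len(errors)


def pred(actual, estimated, level):
    """Percentage of estimates whose MRE is at most `level` percent (K / N * 100)."""
    k = sum(1 for a, p in zip(actual, estimated) if mre(a, p) <= level)
    return 100.0 * k / len(actual)


def vaf(actual, estimated):
    """Variance accounted for, in percent: (1 - var(E - E') / var(E)) * 100."""
    residuals = [a - p for a, p in zip(actual, estimated)]
    return (1.0 - pvariance(residuals) / pvariance(actual)) * 100.0


print(f"MMRE={mmre(actual, estimated):.3f}%  "
      f"PRED(10)={pred(actual, estimated, 10):.1f}%  "
      f"VAF={vaf(actual, estimated):.2f}%")
```

Lower MMRE and higher PRED and VAF indicate a more accurate model, which is the sense in which the models are ranked in Table 2.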
Figure 5. COCOMO II Vs Actual Calculation of Effort
Figure 6. Alaa Sheta Vs Actual Calculation of Effort
Figure 7. Proposed Model Vs actual Calculation of Effort
4. Conclusion
In this paper, we have presented a software effort estimation model based on fuzzy logic. A triangular membership function has been used to generate linguistic fuzzy values. The same model can readily be developed using trapezoidal or Gaussian bell membership functions. There is also still room for improvement in the fuzzification process: if it is further optimized, it should result in a substantial reduction in the number of fuzzy rules implemented in the inference engine.
References
[1] A. F. Sheta, "Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects", Journal of Computer Science, vol. 2, no. 2, (2006), pp. 118-123.
[2] A. J. Albrecht, "Measuring application development productivity", SHARE/GUIDE IBM Application Development Symposium.
[3] A. S. Andreou and E. Papatheocharous, "Software Cost Estimation using Fuzzy Decision Trees", in 23rd IEEE/ACM International Conference on Automated Software Engineering, (2008), pp. 371-374.
[4] I. Attarzadeh and S. Hockow, "Improving the Accuracy of Software Cost Estimation Model Based on a New Fuzzy Logic Model", World Applied Sciences Journal, vol. 8, no. 2, (2010), pp. 177-184.
[5] J. W. Bailey and Basili, "A Meta model for software development resource expenditure", Proc. Intl. Conf. Software Engineering, (1981), pp. 107-115.
[6] V. R. Basili and K. Freburger, "Programming Measurement and Estimation in the Software Engineering Laboratory", Journal of System and Software, vol. 2, (1981), pp. 47-57.
[7] B. Boehm, "Software Cost Estimation with COCOMO II", Prentice Hall, (2000).
[8] B. Boehm, "Software Engineering Economics", Englewood Cliffs, NJ, Prentice-Hall, (1981).
[9] A. Idri, A. Abran and L. Kjiri, "COCOMO cost model using Fuzzy Logic", 7th International Conference on Fuzzy Theory & Technology, Atlantic, New Jersey, (2000).
[10] A. Idri, A. Abran and T. M. Khoshgoftaar, "Fuzzy Analogy: A new Approach for Software Cost Estimation", 11th International Workshop in Software Measurements, Montreal, (2001) August 28-29, pp. 93-101.
[11] C. Jones, "Programming Languages Table", Release 8.2, (1996) March.
[12] I. R. Khan, A. Alam and H. Anwar, "Efficient Software Cost Estimation using Neuro-Fuzzy Technique", National Conference on Recent Developments in Computing and its Applications, (2009), pp. 376-381.
[13] O. S. J. Lima, P. P. Farias and A. D. Belchor, "A Fuzzy Model for Function Point Analysis to Development and Enhancement Project Assessments", CLEI EJ, vol. , no. 2, (2002).
[14] J. E. Matson, B. E. Barrett and J. M. Mellichamp, "Software Development Cost Estimation Using Function Points", IEEE Trans. on Software Engineering, vol. 20, no. 4, (1994), pp. 275-287.
[15] E. Mendes and N. Mosley, "Web Cost Estimation: An Introduction, Web engineering: principles and techniques", Ch 8, (2005).
[16] A. Mittal, K. Parkash and H. Mittal, "Software cost estimation using fuzzy logic", SIGSOFT Software Engineering Notes, vol. 35, no. 1, (2010), pp. 1-7.
[17] H. Mittal and P. Bhatia, "Optimization Criterion for Effort Estimation using Fuzzy Technique", CLEI Electronic Journal, vol. 10, no. 1, (2007), pp. 2.
[18] P. Musílek, W. Pedrycz, G. Succi and M. Reformat, "Software Cost Estimation with Fuzzy Models", ACM SIGAPP Applied Computing Review, vol. 8, no. 2, (2000), pp. 24-29.
[19] W. Pedrycz, F. F. Peters and S. Ramanna, "A Fuzzy Set Approach to Cost Estimation of Software Projects", Proceedings of the 1999 IEEE Canadian Conference on Electrical and Computer Engineering, Shaw Conference Center, Edmonton, Alberta, Canada, (1999).
[20] P. V. G. D. Prasad Reddy, K. R. Sudha, P. Rama Sree and S. N. S. V. S. C. Ramesh, "Fuzzy Based Approach for Predicting Software Development Effort", International Journal of Software Engineering (IJSE), vol. 1, no. 1, (2010), pp. 1-11.
[21] P. Reddy PVGD, "Particle Swarm Optimization In The Fine Tuning Of Fuzzy Software Cost Estimation Models", International Journal of Software Engineering (IJSE), vol. 1, no. 2, (2010), pp. 12-23.
[22] R. S. Pressman, "Software Engineering; A Practitioner Approach", McGraw-Hill International Edition, Sixth Edition, (2005).
[23] C. S. Reddy and K. Raju, "An Improved Fuzzy Approach for COCOMO's Effort Estimation using Gaussian Membership Function", Journal of Software, vol. 4, no. 5, (2009), pp. 452-459.
[24] C. Yau and R. H. L. Tsoi, "Assessing the Fuzziness of General System Characteristics in Estimating Software Size", IEEE, (1994), pp. 189-193.
[25] L. A. Zadeh, "From Computing with numbers to computing with words-from manipulation of measurements to manipulation of perceptions", Int. J. Appl. Math. Computer Sci., vol. 12, no. 3, (2002), pp. 307-324.
[26] L. A. Zadeh, "Fuzzy Sets, Information and Control", vol. 8, (1965), pp. 338-353.
[27] Z. Zia, A. Rashid and K. U. Zaman, "Software cost estimation for component based fourth-generation-language software applications", IET Software, vol. 5, no. 1, (2011), pp. 103-110.
Authors

Ziauddin
Dr. Ziauddin is working as an Assistant Professor at COMSATS Institute of Information Technology, Vehari. He has worked at Gomal University for 22 years. He has published more than 15 research papers in reputed international journals. His areas of expertise are Software Engineering, Software Process Improvement, Software Reliability Engineering, Software Development Improvement, Software Quality Assurance and Requirement Engineering. He is a Gold Medalist from Peshawar University in M.Sc. Computer Science. He holds a Diploma in Computer Forensics from the School of Education, Indiana University, USA.

Shahid Kamal
Shahid Kamal Tipu has been working as an Assistant Professor at the Institute of Computing & Information Technology, Gomal University, D.I.Khan, since 2000. He is currently pursuing his Ph.D. in Malaysia. His areas of expertise are Information System Security and Software Quality. He has published his research in reputed international journals. He is a renowned software engineer in the local industry.

Shafiullah Khan
Dr. Shafiullah Khan is an Assistant Professor at the Institute of Information Technology, Kohat University of Science and Technology, Pakistan. He received his PhD from the School of Engineering and Design, Brunel University, UK. His research mainly focuses on wireless broadband network architecture, security and privacy, security threats and mitigating techniques.

Jamal Abdul Nasir
Jamal Abdul Nasir has been working as an Assistant Professor at the Institute of Computing & Information Technology, Gomal University, D.I.Khan, since 1988. He is currently pursuing his Ph.D. at CIIT, Islamabad. His areas of expertise are Data Mining and Artificial Intelligence. He has published his research in reputed international journals. He is one of the pioneers of information technology in the country.