A New Approach by Using Tabu Search and Genetic Algorithms in Software Cost Estimation
Farhad Soleimanian Gharehchopogh, Raheleh Rezaii, Bahman Arasteh
Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran
[email protected]
Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran
[email protected]
Department of Computer Engineering, Hacettepe University, Ankara, Turkey
[email protected]
Abstract- Software cost estimation (SCE) is an important and necessary step before starting any software project, because the implementation cost, required time, required manpower and similar needs must be known in advance. The purpose of SCE is to produce an estimated cost that differs only slightly from the actual cost. Accurate estimation of software projects has always been one of managers' concerns, so researchers have tried to improve SCE with different methods and algorithms. A review of previous work in this field shows that many researchers have contributed efficient methods, but none of them gives a completely accurate estimate. Considering the benefits of metaheuristic algorithms in SCE, in this paper we use a hybrid of the Genetic algorithm and the Tabu search algorithm, two types of metaheuristic algorithm that can play a significant role in engineering optimization problems. The results show that the hybrid of the Tabu search algorithm and the genetic algorithm obtains better results in SCE.
Index Terms- Software Cost Estimation, Genetic Algorithm, Tabu Search, COCOMO
I. INTRODUCTION
To succeed in any work we need precise planning before starting, and software projects are no exception. Without precise planning before the project starts, in other words without cost estimation, effort estimation and a proper schedule, a software project is likely to fail. Cost and effort estimation for software projects is hard and complex work, so researchers have searched over the years for methods that give more exact cost estimates. Depending on its kind, each software project can be estimated more exactly with a particular model or approach; no single solution is suitable for all projects or gives optimum results for all of them. As the complexity of software projects increases, the cost of development also increases, so precise cost estimation is essential during the early stages of project implementation. The main problem is to obtain an exact measure the first time a software project is developed [1].
Since the accuracy of an estimate relies heavily on the accuracy of the underlying model, selecting a good and accurate estimation model is one of the most important issues for software engineers. The intermediate COCOMO model is the most widely used model at different stages of software effort estimation because of its simplicity. In recent years much research has been done in the field of SCE, and various methods have been developed, including algorithmic methods, non-algorithmic methods and artificial intelligence methods [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. According to previous studies, artificial intelligence techniques have shown better performance than algorithmic techniques. Among the algorithmic methods, COCOMO is a model for cost and schedule estimation in software projects. The basic COCOMO model was developed by Barry W. Boehm in 1981 [12]. The intermediate COCOMO model can also be used for decision-making in various areas of software projects [13]. Artificial Neural Networks (ANNs) were used for SCE in [1]. In that work, ANN estimates for 11 of the 60 projects in the NASA database were compared with those of the algorithmic COCOMO model, and in most cases the error rate of the COCOMO model was higher than that of the ANN model. The results show that in more than 90% of cases ANNs provide a much better estimate than the algorithmic COCOMO model. It can be concluded that methods based on artificial intelligence are a good complement to, or substitute for, algorithmic methods. SCE with a hybrid of Scatter Search and Genetic Algorithms has also been surveyed and evaluated [16]. In that hybrid model, the factors that affect the estimate were tested with the Genetic algorithm and trained with Scatter Search, and the model gained better results than the COCOMO model.
The results of the experiments show that the hybrid model is more effective and has a lower mean absolute relative error (MARE) than the COCOMO model. In the presented method, the genetic algorithm is used to find optimum values for the parameters of the COCOMO model; determining the optimum values of these parameters is one of the problems of the algorithmic COCOMO model.
Genetic Algorithm and Tabu Search are metaheuristic algorithms that can be applied to SCE. The genetic algorithm is a well-known, nature-inspired stochastic optimization method that works by evolution and explores new regions of the search space. It can be applied to problems that have a large search space. The genetic algorithm searches for the global optimum by iterating and creating successive generations that cover the problem space in parallel. The Tabu search algorithm tries to improve the solutions of a problem by selecting the best solutions. In this paper, we try to optimize the cost estimation of software projects with a hybrid of the genetic algorithm and Tabu search. This paper is organized as follows: Section 2 describes software cost estimation; Section 3 explains the proposed method; Section 4 evaluates the results; finally, Section 5 presents the conclusions and future work.

II. SOFTWARE COST ESTIMATION
Accurate and reliable cost estimation is an important factor in the success of software projects. Without proper cost estimation, the project manager cannot discern how much time, manpower and other resources the project requires, and cannot assess the likelihood of success, so an estimation error exposes the project to risk. If the estimated cost and effort are higher than the actual values, resources will be wasted; if they are lower, the project will not be completed and will fail. For these reasons, accurate SCE has always attracted the attention of researchers, who have used many methods in trying to provide more precise estimates.
Nearly a decade has passed since the advent of artificial intelligence methods in software engineering; these methods emerged because of problems in the earlier SCE methods. According to previous research, artificial intelligence methods have higher efficiency than algorithmic methods. COCOMO is an abbreviation of Constructive Cost Model. It was presented in 1981 in "Software Engineering Economics" by Barry W. Boehm. The model estimates cost, effort and schedule on the basis of the number of source lines of code (SLOC) [12, 13]. In a SLOC-based cost estimation model, a project starts by specifying its size. In this paper, the NASA database projects are first divided into three categories based on their number of source lines of code, as in the COCOMO model; the categories are shown in Table 1, and each project has its own parameter values depending on its kind. The effort multipliers that affect the software cost are converted to numeric values. Finally, all the values are placed in "Equation 1", and the cost estimate is obtained by the intermediate COCOMO model [14, 15].

$$ PM = a \times (Size)^{b} \times \prod_{i=1}^{15} EM_i + c \qquad (1) $$

Here PM is the estimated effort, Size is the size of the project, EM_i are the effort multipliers, and a, b and c are the constant parameters of the model.
TABLE I. CONSTANT PARAMETERS OF COCOMO MODEL
Kind of project    a      b      c
Organization       2.4    1.4    2.5
Semi-detached      3.0    1.12   2.5
Predicted          3.6    1.20   2.5
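As a minimal sketch, Equation 1 can be evaluated in Python as below. The constants are copied from Table I; the function name `cocomo_effort`, the dictionary keys, the assumption that size is given in thousands of source lines (KLOC), and the example project are illustrative assumptions and not part of the original study.

```python
from math import prod

# Constants (a, b, c) copied from Table I; the keys follow the table's labels.
COCOMO_CONSTANTS = {
    "organization": (2.4, 1.4, 2.5),
    "semi-detached": (3.0, 1.12, 2.5),
    "predicted": (3.6, 1.20, 2.5),
}


def cocomo_effort(size_kloc, effort_multipliers, kind):
    """Effort per Equation 1: a * Size^b * product(EM_i) + c."""
    a, b, c = COCOMO_CONSTANTS[kind]
    return a * size_kloc ** b * prod(effort_multipliers) + c


# Hypothetical project: 10 KLOC (assumed unit), all 15 multipliers nominal (1.0).
effort = cocomo_effort(10.0, [1.0] * 15, "semi-detached")
print(f"estimated effort: {effort:.2f}")
```

With every multiplier at 1.0 the product term has no effect, so the estimate depends only on the size and the three constants.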
III. PROPOSED METHOD
The proposed method uses a hybrid of Genetic Algorithm and Tabu Search to obtain the values of the constant parameters in "Equation 1". In this method, the initial population is created by random selection from the NASA projects. Then 80% of the NASA projects are selected as training projects and 20% as test projects. Both training and test projects are divided, according to the size recorded for them in the NASA data set, into three categories: organizational projects, semi-detached projects and predicted projects. In the next step, the algorithm is trained on all the training data, which yields values of the constant parameters of the COCOMO model (a, b and c) for each of the three categories of training projects. Finally, the obtained values of the constant parameters are applied to the projects selected for testing, and the test is performed. The hybrid algorithm consists of the following:
Inputs: NASA data sets (including the factors that affect the cost estimate of each project)
Outputs: values of the constant parameters of the COCOMO model and classified data
Step 1: Reading the existing data of the data set
Step 2: Separating training data and testing data
Step 3: Classifying training data and testing data according to the type of project (organization, semi-detached and predicted)
Step 4: Calling the hybrid algorithm
Step 5: Initializing the initial population and the constant parameters of the COCOMO model
Step 6: Evaluating the fitness function: the fitness of each chromosome is updated according to its parameters. Here the fitness is MARE, and the aim is to minimize MARE by choosing appropriate values from the specified range. The calculation of MARE is shown in "Equation 2".
Step 7: Sorting the population in ascending order of the fitness function
Step 8: Selecting the new population based on Tabu Search for crossover and mutation
Step 9: Performing crossover on the selected population
Step 10: Performing mutation on the selected population
Step 11: Sorting the population in ascending order of the fitness function
Step 12: Repeating steps 8 to 11
Step 13: Obtaining the values of the constant parameters of the COCOMO model from the chromosome with the best fitness
Step 14: Finishing the work of the hybrid algorithm
Step 15: Applying the obtained values to the test data set
Step 16: Finishing the algorithm.
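The training loop in Steps 5 to 13 might be sketched in Python as follows. The paper does not specify the operators, so everything concrete here is an assumption made for illustration: a chromosome is the tuple (a, b, c), the tabu list stores coarsely rounded chromosomes that were recently used as parents, crossover is uniform, mutation is Gaussian and clamped to the bounds, the best chromosome is carried over unchanged, and the projects are synthetic rather than NASA data. All function and parameter names are hypothetical.

```python
import random
from collections import deque
from math import prod


def effort(params, size, ems):
    a, b, c = params
    return a * size ** b * prod(ems) + c  # Equation 1


def fitness(params, projects):
    """Mean MARE over (size, multipliers, actual_effort) tuples; lower is better."""
    errors = [abs(actual - effort(params, size, ems)) / actual
              for size, ems, actual in projects]
    return sum(errors) / len(errors)


def tabu_key(params, digits=2):
    """Coarse rounding so that near-identical chromosomes share one tabu entry."""
    return tuple(round(p, digits) for p in params)


def hybrid_ga_tabu(projects, bounds, pop_size=30, generations=100,
                   tabu_size=50, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]                      # Step 5
    tabu = deque(maxlen=tabu_size)                               # oldest entries expire
    for _ in range(generations):                                 # Step 12
        population.sort(key=lambda p: fitness(p, projects))      # Steps 6, 7 and 11
        # Step 8: parents are the fittest chromosomes that are not tabu.
        parents = [p for p in population if tabu_key(p) not in tabu][:pop_size // 2]
        if len(parents) < 2:                                     # everything is tabu
            parents = population[:2]
        tabu.extend(tabu_key(p) for p in parents)
        children = []
        while len(children) < pop_size - 1:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]   # Step 9: crossover
            for i, (lo, hi) in enumerate(bounds):                # Step 10: mutation
                if rng.random() < mutation_rate:
                    step = rng.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(tuple(child))
        population = [population[0]] + children                  # keep the best so far
    return min(population, key=lambda p: fitness(p, projects))   # Step 13


# Synthetic, hypothetical training projects generated from known constants.
true_params = (3.0, 1.12, 2.5)
projects = [(s, [1.0] * 15, effort(true_params, s, [1.0] * 15))
            for s in (5.0, 12.0, 30.0, 60.0, 100.0)]
bounds = [(1.0, 5.0), (0.8, 1.5), (0.0, 5.0)]  # assumed search ranges for a, b, c
best = hybrid_ga_tabu(projects, bounds)
print(best, fitness(best, projects))
```

Because the best chromosome is always carried into the next generation, the best fitness never gets worse from one generation to the next; the tabu list only restricts which chromosomes may act as parents, which is one plausible reading of Step 8.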
IV. RESULT AND DISCUSSION
The proposed method yields improved values of the constant parameters a, b and c of the COCOMO model. Substituting these values, together with the effort multipliers and sizes of the test projects in the NASA database, into "Eq. 1" gives the cost estimate. Substituting the estimated cost and the actual cost into "Eq. 2" gives the MARE of each project. Finally, "Eq. 3" gives the mean of the obtained MAREs, which is called MMARE. The methods can be evaluated and compared with each other by comparing their MMARE values.

$$ MARE_i = \frac{|Actual_i - Estimated_i|}{Actual_i} \qquad (2) $$

In Eq. 2, Actual_i is the actual cost of the ith project and Estimated_i is the cost estimated for the ith project by the proposed method.

$$ MMARE = \frac{1}{N} \sum_{i=1}^{N} MARE_i \qquad (3) $$

Figure 1. Comparing Proposed Model with Intermediate COCOMO Model According to MARE on the Training Data over NASA(60)
[Plot of MARE against number of project; series: COCOMO, proposed method]
In "Eq. 3", the sum of the calculated MAREs of all existing projects is divided by the number of existing projects (N) to give the MMARE. The results of comparing the proposed model with the intermediate COCOMO model according to MARE on the training data of the NASA data set with 60 projects, NASA(60), are shown in "Figure 1", and the results of comparing the proposed method, Genetic Algorithm, Tabu Search and the COCOMO model according to MARE on the test data of NASA(60) are shown in "Figure 2". According to the figures, when the proposed method is used in SCE it gains more favorable results than Genetic Algorithm, Tabu Search and the intermediate COCOMO model. The results also show that Genetic Algorithm and Tabu Search improve the accuracy of the cost estimate over the intermediate COCOMO model. In the figures, the horizontal axis represents the number of the project (training or testing) and the vertical axis represents the value of MARE.
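The two error measures of Eq. 2 and Eq. 3 can be sketched in a few lines of Python. The function names and the sample values below are hypothetical and serve only to illustrate the arithmetic; they are not taken from the NASA data sets.

```python
def mare(actual, estimated):
    """Eq. 2: absolute relative error of a single project."""
    return abs(actual - estimated) / actual


def mmare(actuals, estimates):
    """Eq. 3: mean of the per-project MARE values over N projects."""
    errors = [mare(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


# Hypothetical actual and estimated costs for three projects.
actual = [100.0, 50.0, 200.0]
estimated = [90.0, 60.0, 200.0]
score = mmare(actual, estimated)
print(f"MMARE = {score:.4f}")
```

For these illustrative values the per-project errors are 0.1, 0.2 and 0, so the MMARE is 0.1.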
Figure 2. Comparing Proposed Method, Genetic Algorithm, Tabu Search Algorithm and COCOMO Model based on MARE on the Test Data over NASA(60)
[Plot of MARE against number of project; series: COCOMO, GA, Tabu, proposed method]
The comparison of the proposed model with the intermediate COCOMO model according to MARE on the training data of NASA(63) is shown in "Figure 3", and the same comparison for the test projects is shown in "Figure 4". As the figures indicate, in most cases the proposed model performs better than the intermediate COCOMO model.
Figure 3. Comparing Proposed Model with Intermediate COCOMO Model According to MARE on the Training Data over NASA(63)
[Plot of MARE against number of project; series: COCOMO, proposed method]
Figure 6. Comparing Proposed Method, Genetic Algorithm, Tabu Search Algorithm and COCOMO Model based on MARE on the Test Data over NASA(93)
[Plot of MARE against number of project]
Figure 4. Comparing Proposed Method, Genetic Algorithm, Tabu Search Algorithm and COCOMO Model based on MARE on the Test Data over NASA(63)
[Plot of MARE against number of project; series: COCOMO, GA, Tabu, proposed method]
The results of comparing the proposed model with the intermediate COCOMO model according to MARE on the training data of NASA(93) are shown in "Figure 5", and the same comparison for the test projects is shown in "Figure 6". Based on the obtained results, we can conclude that combining the two algorithms improves the results. According to "Figure 3", "Figure 4" and "Figure 6", the proposed method operates more efficiently than the other algorithms for SCE. Finally, to clarify the results of the experiments, the value of MMARE for all compared methods is shown in "Table 2".

TABLE 2. AVERAGE MEAN OF MARE ON THE TEST PROJECTS
Model Name         MMARE NASA(60)   MMARE NASA(63)   MMARE NASA(93)
COCOMO             0.2371           0.3158           0.2973
Genetic            0.1716           0.5989           0.8208
Tabu Search        0.1918           0.3085           0.5961
Proposed Method    0.1309           0.2808           0.2504
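The gain of the proposed method over the intermediate COCOMO model can be read off Table 2 by simple subtraction. The short Python sketch below does this arithmetic; the dictionary layout and the `reduction` helper are illustrative assumptions, and only the MMARE values come from the table.

```python
# MMARE values transcribed from Table 2 (test projects).
MMARE = {
    "COCOMO": {"NASA(60)": 0.2371, "NASA(63)": 0.3158, "NASA(93)": 0.2973},
    "Proposed Method": {"NASA(60)": 0.1309, "NASA(63)": 0.2808, "NASA(93)": 0.2504},
}


def reduction(dataset):
    """Absolute and relative MMARE reduction of the proposed method vs. COCOMO."""
    base = MMARE["COCOMO"][dataset]
    new = MMARE["Proposed Method"][dataset]
    return round(base - new, 4), round((base - new) / base, 4)


reductions = {name: reduction(name) for name in MMARE["COCOMO"]}
for name, (absolute, relative) in reductions.items():
    print(f"{name}: absolute {absolute:.4f}, relative {relative:.1%}")
```

The absolute reductions are 0.1062 for NASA(60), 0.0350 for NASA(63) and 0.0469 for NASA(93).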
Based on these results, the hybrid of the genetic algorithm and the Tabu Search algorithm gives more accurate cost estimates for these projects than the intermediate COCOMO model.
Figure 5. Comparing Proposed Model with Intermediate COCOMO Model According to MARE on the Training Data over NASA(93)
[Plot of MARE against number of project; series: COCOMO, proposed method]

V. CONCLUSIONS AND FUTURE WORKS
Software cost estimation is an important factor in software production, and the success or failure of a project is related to the accuracy of its cost estimate. In this paper a hybrid of the Genetic algorithm and the Tabu search algorithm is used for SCE, with the NASA data sets as input. Because the Tabu search algorithm prevents repeated choices, its hybrid with the genetic algorithm prevents premature convergence and avoids becoming trapped in a local optimum. It therefore allows the algorithm to move toward the global optimum and find a better solution. According to the results based on MARE, the cost estimated by the hybrid algorithm is more accurate than that of the COCOMO model. On the NASA(60) data set, the MMARE produced by the COCOMO model is 0.2371 and the MMARE of the proposed method is 0.1309 (Table 2), a reduction of 0.1062. These results indicate that the proposed method increases the accuracy of cost estimation. In future studies, the execution time of the proposed algorithm in SCE can be reduced and other heuristic methods can be applied to SCE.

REFERENCES
[1] F.S. Gharehchopogh, "Neural Networks Application in Software Cost Estimation: A Case Study", 2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA 2011), Istanbul, Turkey, pp. 69-73, 2011.
[2] I. Maleki, L. Ebrahimi, F.S. Gharehchopogh, "A Hybrid Approach of Firefly and Genetic Algorithms in Software Cost Estimation", Magnt Research Report, 2(6), pp. 372-388, 2014.
[3] F.S. Gharehchopogh, A. Pourali, "A New Approach Based on Continuous Genetic Algorithm in Software Cost Estimation", Journal of Scientific Research and Development, 2(4), pp. 87-94, 2015.
[4] Z.A. Dizaji, R. Ahmadi, H. Gholizadeh, F.S. Gharehchopogh, "A Bee Colony Optimization Algorithm Approach for Software Cost Estimation", International Journal of Computer Applications (IJCA), 104(12), pp. 41-44, November 2014.
[5] Z.A. Khalifelu, F.S. Gharehchopogh, "Comparison and Evaluation Data Mining Techniques with Algorithmic Models in Software Cost Estimation", Elsevier Press, Procedia-Technology Journal, ISSN: 2212-0173, Vol. 1, pp. 65-71, 2012.
[6] F.S. Gharehchopogh, I. Maleki, N. Ghoyunchizad, E. Mostafaee, "A Novel Hybrid Artificial Immune System with Genetic Algorithm for Software Cost Estimation", Magnt Research Report, 2(6), pp. 506-517, Nov 2014.
[7] Z.A. Khalifelu, F.S. Gharehchopogh, "A New Approach in Software Cost Estimation Using Regression Based Classifier", AWERProcedia Information Technology & Computer Science, Vol. 2, pp. 252-256, December 2012.
[8] F.S. Gharehchopogh, Z.A. Dizaji, "A New Approach in Software Cost Estimation with Hybrid of Bee Colony and Chaos Optimizations Algorithms", Magnt Research Report, 2(6), pp. 1263-1271, Nov 2014.
[9] F.S. Gharehchopogh, "Approach and Review of User Oriented Interactive Data Mining", 4th International Conference on Application of Information and Communication Technologies (AICT2010), IEEE, Tashkent, Uzbekistan, pp. 1-4, 12-14 October 2010.
[10] F.S. Gharehchopogh, I. Maleki, S. Sadouni, "Artificial Neural Networks Based Analysis of Software Cost Estimation Model", Magnt Research Report, 2(6), pp. 597-505, Nov 2014.
[11] Z.A. Khalifelu, F.S. Gharehchopogh, "A Survey of Data Mining Techniques in Software Cost Estimation", AWERProcedia Information Technology & Computer Science Journal, Vol. 1, pp. 331-342, 2012.
[12] B.W. Boehm, "Software Engineering Economics", Prentice Hall, 767 pages, 1981.
[13] B.W. Boehm, "Software Cost Estimation with COCOMO II", Prentice Hall PTR, Englewood Cliffs, New Jersey, 544 pages, 2000.
[14] A.F. Sheta, "Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects", Journal of Computer Science, 2(2), pp. 118-123, 2006.
[15] Y. Kultur, B. Turhan, A. Bener, "Ensemble of neural networks with associative memory (ENNA) for estimating software development costs", Knowledge-Based Systems, Vol. 22, pp. 395-402, 2009.
[16] I. Maleki, F.S. Gharehchopogh, L. Ebrahimi, Z. Ayat, "A Novel Hybrid Model of Scatter Search and Genetic Algorithms for Software Cost Estimation", Magnt Research Report, 2(6), pp. 359-371, 2014.