
Roheet Bhatnagar et al. / International Journal of Engineering Science and Technology, Vol. 2(7), 2010, 2950-2956. ISSN: 0975-5462

Software Development Effort Estimation – Neural Network Vs. Regression Modeling Approach

Roheet Bhatnagar*
Associate Professor, Department of Computer Engineering, Sikkim Manipal Institute of Technology, Majitar, Rangpo, East Sikkim, 737 136 INDIA.

Vandana Bhattacharjee
Associate Professor, Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, 835 215 INDIA.

Mrinal Kanti Ghose
Professor & Head, Department of Computer Engineering, Sikkim Manipal Institute of Technology, Majitar, Rangpo, East Sikkim, 737 136 INDIA.

Abstract: The global software development industry has become more mature and complex, and it is making use of newer tools and approaches to software development. The challenge lies in accurately modeling and predicting the software development effort and then creating a project development schedule. This work employs a neural network (NN) approach and a multiple regression modeling approach to model and predict the software development effort based on an available real-life dataset prepared by Lopez-Martin et al. [1, 2]. A comparison between the results obtained by the two approaches is presented. It is concluded that the NN is able to successfully model the complex, non-linear relationship between a large number of effort drivers and the software development effort, with results closely matching the effort estimated by experts.

Keywords: Software Development, Software Development Effort, Project Development Schedule, Neural Network, Regression Modeling.

1. Introduction

Developing a software project with acceptable quality, within budget and on the planned schedule, is the main goal of every software development firm. Schedule estimation has historically been, and continues to be, a major difficulty in managing software development projects [3]. Project failure is mostly attributed to unmet customer quality expectations or to budget and schedule overruns. It is essential for a project manager to know the effort, schedule and functionality of a project in advance. There is little point in starting a project when there is not enough time to finish it, not enough money to fund it, or when the quality will be so inadequate that the end product is useless and unmarketable.

However, project factors change over the duration of the project, and they may change a lot. Worse, one can seldom predict how they will change, yet we need to know all of them before we start. There is no way to calculate them in advance and expect the initial values to remain correct. This does not render the estimates useless. On the contrary, it calls for better estimation techniques, which will yield more accurate early results and guide us to more targeted and effective contingency plans.

Software estimation is the act of predicting the duration and cost of a project. It is a complex process with errors built into its very fabric; however, it is very rewarding when done the right way. The estimation process does not finish until the project finishes. This is the project manager's answer to the ever-changing conditions of the project. An accurate estimate is a critical part of the foundation of an efficient software project.

In this paper we discuss and evaluate two different approaches to estimating the effort of developing software, using a standard dataset. The paper is organised into four sections. Section 1 is the Introduction, where estimation and its importance are discussed. Section 2 briefly discusses the working methodology and the effort estimation using the NN soft computing approach; under the respective headings of that section we also describe the experimentation steps and the findings of the experiments on the standard dataset. Section 3 presents the results and a discussion of the experimental findings. Section 4 summarises the results obtained with the two approaches and concludes which one is the better technique.

2. Working Methodology

In the present work we estimate the Development Time (DT’) by applying first the Feed Forward Backpropagation Neural Network model and then Regression Analysis. The following methodology was adopted to carry out the effort estimation with the NN and Regression Analysis approaches.

2.1. Data Collection

The standard dataset proposed by Lopez-Martin et al. has been used for the experiments. They used sets of system development projects in which the Development Time (DT), Dhama Coupling (DC), McCabe Complexity (MC) and Lines of Code (LOC) metrics were recorded for 41 modules. Since all the programs were written in Pascal, the modules are mostly procedures and functions.
The development time of each of the forty-one modules was recorded across five phases: requirements understanding, algorithm design, coding, compiling and testing [1, 2]. Table I shows the dataset used for the experiments.

No. | Module Description | MC | DC | LOC | DT (minutes)
1 | Calculates t value | 1 | 0.25 | 4 | 13
2 | Inserts a new element in a linked list | 1 | 0.25 | 10 | 13
3 | Calculates a value according to normal distribution equation | 1 | 0.333 | 4 | 9
4 | Calculates the variance | 2 | 0.083 | 10 | 15
5 | Generates range square root | 2 | 0.111 | 23 | 15
6 | Determines both minimum and maximum values from a stored linked list | 2 | 0.125 | 9 | 15
7 | Turns each linked list value into its z value | 2 | 0.125 | 9 | 16
8 | Copies a list of values from a file to an array | 2 | 0.125 | 14 | 16
9 | Determines parity of a number | 2 | 0.167 | 7 | 16
10 | Defines segment limits | 2 | 0.167 | 8 | 18
11 | From two lists (X and Y), returns the product of all xi and yi values | 2 | 0.167 | 10 | 15
12 | Calculates a sum from a vector and its average | 2 | 0.167 | 10 | 15
13 | Calculates q values | 2 | 0.167 | 10 | 18
14 | Generates the sum of a vector components | 2 | 0.2 | 10 | 13
15 | Calculates the sum of a vector values square | 2 | 0.2 | 10 | 14
16 | Calculates the average of the linked list values | 2 | 0.2 | 10 | 15
17 | Counts the number of lines of code including blanks and comments | 2 | 0.2 | 15 | 13
18 | Prints values non zero of a linked list | 2 | 0.25 | 10 | 12
19 | Stores values into a matrix | 2 | 0.25 | 10 | 12
20 | Generates range square root | 3 | 0.083 | 17 | 22
21 | Returns the number of elements in a linked list | 3 | 0.125 | 11 | 19
22 | Calculates the sum of odd segments (Simpson’s formula) | 3 | 0.125 | 15 | 18
23 | Calculates the sum of pair segments (Simpson’s formula) | 3 | 0.125 | 15 | 19
24 | Generates the standard deviation of the linked list values | 3 | 0.143 | 13 | 21
25 | Returns the sum of square roots of a list values | 3 | 0.143 | 14 | 20
26 | Prints a matrix | 3 | 0.143 | 14 | 21
27 | Calculates the sum of odd segments (Simpson’s formula) | 3 | 0.143 | 15 | 19
28 | Calculates the sum of pair segments (Simpson’s formula) | 3 | 0.143 | 15 | 20
29 | Calculates the average of linked list values | 3 | 0.167 | 13 | 15
30 | Returns the sum of a list of values | 3 | 0.167 | 14 | 13
31 | Generates the standard deviation of linked list values | 3 | 0.2 | 18 | 19
32 | Prints a linked list | 3 | 0.25 | 9 | 13
33 | Calculates gamma value (G) | 3 | 0.25 | 12 | 12
34 | Calculates the average of vector components | 3 | 0.25 | 17 | 12
35 | Calculates the range standard deviation | 4 | 0.077 | 16 | 21
36 | Calculates beta 1 value | 4 | 0.077 | 31 | 21
37 | Returns the product between values of two vectors and the number of these pairs | 4 | 0.111 | 16 | 19
38 | Counts commented lines | 4 | 0.2 | 24 | 18
39 | Reduces final matrix (according to Gauss method) | 5 | 0.143 | 22 | 24
40 | Reduces a matrix (according to Gauss method) | 5 | 0.143 | 22 | 25
41 | Counts blank lines | 5 | 0.2 | 22 | 18

Table I. MC: McCabe Complexity, DC: Dhama Coupling, LOC: Lines of Code, DT: Development Time (minutes)

2.2. Neural Network Modeling

Artificial Neural Networks (ANNs) are used in effort estimation because of their ability to learn from previous data [4][5]. They can model complex relationships between the dependent variable (effort) and the independent variables (cost drivers). In addition, they can generalize from the training dataset, which enables them to produce acceptable results for previously unseen data. Most work applying neural networks to effort estimation has used the feed-forward multi-layer perceptron, the back-propagation algorithm and the sigmoid function. However, many researchers avoid ANNs because they are "black boxes": determining why an ANN makes a particular decision is a difficult task. Even so, many different neural network models have been proposed for solving complex real-life problems, and in this paper we likewise discuss the application of an NN model to effort estimation [6].

A simplified NN architecture, shown in Figure-1, was designed using the Matlab NN Toolbox. It has one input layer with three neurons, one per input (MC, DC and LOC), one hidden layer with a minimum of three neurons, and one output layer with a single output (DT).

Figure-1 NN Architectural Model
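The forward pass of such a three-input, three-hidden-neuron, one-output network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper built its model in the Matlab NN Toolbox and does not list its activation functions or weights, so the logistic hidden activation, the linear output neuron, and the `random_weights` helper below are assumptions made for the example, and the weights are untrained.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, used here for the hidden layer (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 3-3-1 feed-forward network.

    x        : [MC, DC, LOC] for one module (ideally scaled beforehand)
    w_hidden : 3x3 list of hidden-layer weights, one row per hidden neuron
    b_hidden : 3 hidden-layer biases
    w_out    : 3 output-layer weights
    b_out    : output-layer bias
    Returns the predicted development time DT' as a float.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Linear output neuron (assumed), since DT is a continuous quantity.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_weights(seed=0):
    """Hypothetical helper: small random starting weights before training."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
    b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    b_out = rng.uniform(-0.5, 0.5)
    return w_hidden, b_hidden, w_out, b_out


# Untrained prediction for a module with MC=1, DC=0.25, LOC=10.
dt_pred = forward([1, 0.25, 10], *random_weights())
```

Back-propagation training would then adjust the 16 parameters (9 + 3 hidden, 3 + 1 output) to reduce the error between `dt_pred` and the recorded DT.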

The model was then trained on 25 records (about 60% of the dataset) from Table I. Of the remaining records, 8 (about 20% of the dataset) were used to validate the model and the other 8 (about 20%) to test it. The records for all three subsets were selected randomly by the NN model. The resulting plot is given in Figure-2.


Figure-2 NN plot for Training, Validation and Testing data
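A random 25/8/8 partition of the 41 records can be sketched as follows. The paper leaves the selection to the Matlab toolbox and does not report a seed or the procedure, so the function name `split_indices` and the fixed seed are assumptions for illustration only; the sketch will not reproduce the paper's exact subsets.

```python
import random


def split_indices(n=41, n_train=25, n_val=8, seed=0):
    """Shuffle record indices 0..n-1 and cut them into three disjoint subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains (8 records for n=41)
    return train, val, test


train_idx, val_idx, test_idx = split_indices()
```

Keeping the three subsets disjoint matters because the validation and testing records are meant to measure how well the trained network handles data it has not seen.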

Table II shows the actual effort (DT), the feed-forward NN predicted development time (DT’) and the relative errors.

MC | DC | LOC | DT | NN prediction (DT’) | Error %

TRAINING DATA SET
1 | 0.25 | 10 | 13 | 12.43 | 4.38
1 | 0.333 | 4 | 9 | 9.35 | -3.89
2 | 0.083 | 10 | 15 | 18.84 | -25.60
2 | 0.125 | 9 | 15 | 17.18 | -14.53
2 | 0.2 | 10 | 13 | 14.19 | -9.15
2 | 0.2 | 10 | 14 | 14.19 | -1.36
2 | 0.2 | 10 | 15 | 14.19 | 5.40
2 | 0.2 | 15 | 13 | 15.56 | -19.69
2 | 0.25 | 10 | 12 | 12.53 | -4.42
2 | 0.25 | 10 | 12 | 12.53 | -4.42
3 | 0.083 | 17 | 22 | 19.89 | 9.59
3 | 0.125 | 11 | 19 | 18.18 | 4.32
3 | 0.125 | 15 | 18 | 19.08 | -6.00
3 | 0.125 | 15 | 19 | 19.08 | -0.42
3 | 0.143 | 13 | 21 | 18.05 | 14.05
3 | 0.143 | 14 | 20 | 18.32 | 8.40
3 | 0.143 | 15 | 19 | 18.56 | 2.32
3 | 0.143 | 15 | 20 | 18.56 | 7.20
3 | 0.167 | 13 | 15 | 17.02 | -13.47
3 | 0.2 | 18 | 19 | 16.75 | 11.84
3 | 0.25 | 9 | 13 | 13.32 | -2.46
3 | 0.25 | 17 | 12 | 12.74 | -6.17
4 | 0.111 | 16 | 19 | 20.63 | -8.58
4 | 0.2 | 24 | 18 | 18.54 | -3.00
5 | 0.143 | 22 | 25 | 23.36 | 6.56

VALIDATION DATA SET
2 | 0.111 | 23 | 15 | 17.65 | -17.66
2 | 0.167 | 7 | 16 | 14.67 | 8.31
3 | 0.143 | 14 | 21 | 18.32 | 12.76
3 | 0.167 | 14 | 13 | 15.33 | -17.92
3 | 0.25 | 12 | 12 | 13.08 | -9.00
4 | 0.077 | 16 | 21 | 20.86 | 0.67
4 | 0.077 | 31 | 21 | 20.59 | 1.95
5 | 0.2 | 22 | 18 | 21.40 | -18.89

TESTING DATA SET
1 | 0.25 | 4 | 13 | 12.29 | 5.46
2 | 0.125 | 9 | 16 | 17.18 | -7.38
2 | 0.125 | 14 | 16 | 18.52 | -15.75
2 | 0.167 | 8 | 18 | 15.54 | 13.66
2 | 0.167 | 10 | 15 | 15.58 | -3.87
2 | 0.167 | 10 | 15 | 15.58 | -3.87
2 | 0.167 | 10 | 18 | 15.58 | 13.44
5 | 0.143 | 22 | 24 | 23.36 | 2.67

Table II – Actual Effort (DT) and NN predicted Efforts (DT’)
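The paper does not state how the "Error %" column is computed. The tabulated values are consistent with (DT - DT’) / DT × 100, so a positive error means the network under-estimated the actual time. The sketch below applies that inferred formula to the testing rows of Table II; the function names are illustrative, and recomputed values can differ from the printed ones in the last digit because the predictions are themselves rounded.

```python
def percent_error(actual, predicted):
    """Relative error in percent; the sign convention is inferred from Table II."""
    return (actual - predicted) / actual * 100.0


def error_range(pairs):
    """Smallest and largest percent error over (actual, predicted) pairs."""
    errors = [percent_error(actual, predicted) for actual, predicted in pairs]
    return min(errors), max(errors)


# (DT, DT') pairs of the testing data set in Table II.
testing_set = [
    (13, 12.29), (16, 17.18), (16, 18.52), (18, 15.54),
    (15, 15.58), (15, 15.58), (18, 15.58), (24, 23.36),
]

low, high = error_range(testing_set)
```

The same two helpers can be applied to the training and validation rows to obtain the per-subset error ranges.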

2.3. Statistical Analysis and Regression Modeling

Before conducting the regression analysis, we checked whether the data were normally distributed. Figure-3 shows the histogram of the development time (DT) data, which follows a normal distribution.

Figure-3 Histogram showing normal distribution of development time data (DT)

From the dataset, MC, DC and LOC were taken as inputs and DT as the output. A linear regression model was obtained by stepwise regression using the commercial package STATISTICA. Table III lists the DT predicted by the regression analysis.

Actual (DT) | Predicted by Regression Analysis (DT’) | Error %
13.00000 | 10.85161 | 16.52607692
13.00000 | 10.85161 | 16.52607692
9.00000 | 8.18266 | 9.081555556
15.00000 | 18.09036 | -20.6024
15.00000 | 17.18999 | -14.59993333
15.00000 | 16.73981 | -11.59873333
16.00000 | 16.73981 | -4.6238125
16.00000 | 16.73981 | -4.6238125
16.00000 | 15.38925 | 3.8171875
18.00000 | 15.38925 | 14.50416667
15.00000 | 15.38925 | -2.595
15.00000 | 15.38925 | -2.595
18.00000 | 15.38925 | 14.50416667
13.00000 | 14.32810 | -10.21615385
14.00000 | 14.32810 | -2.343571429
15.00000 | 14.32810 | 4.479333333
13.00000 | 14.32810 | -10.21615385
12.00000 | 12.72030 | -6.0025
12.00000 | 12.72030 | -6.0025
22.00000 | 19.95905 | 9.277045455
19.00000 | 18.60849 | 2.060578947
18.00000 | 18.60849 | -3.3805
19.00000 | 18.60849 | 2.060578947
21.00000 | 18.02969 | 14.14433333
20.00000 | 18.02969 | 9.85155
21.00000 | 18.02969 | 14.14433333
19.00000 | 18.02969 | 5.106894737
20.00000 | 18.02969 | 9.85155
15.00000 | 17.25794 | -15.05293333
13.00000 | 17.25794 | -32.75338462
19.00000 | 16.19679 | 14.75373684
13.00000 | 14.58899 | -12.223
12.00000 | 14.58899 | -21.57491667
12.00000 | 14.58899 | -21.57491667
21.00000 | 22.02067 | -4.860333333
21.00000 | 22.02067 | -4.860333333
19.00000 | 20.92737 | -10.14405263
18.00000 | 18.06548 | -0.363777778
24.00000 | 21.76707 | 9.303875
25.00000 | 21.76707 | 12.93172
18.00000 | 19.93417 | -10.74538889

Table III – Actual Effort (DT) and Regression Analysis Predicted Efforts (DT’)
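The paper does not print the fitted regression equation. In Table III, however, modules with the same MC and DC receive the same prediction regardless of LOC, which suggests (as an inference, not a statement from the paper) that the stepwise procedure retained only MC and DC. Under that assumption, the coefficients of a model DT’ = b0 + b1·MC + b2·DC can be recovered from three tabulated predictions and checked against others. All names and coefficient values below are back-derived by this sketch.

```python
# Regression predictions from Table III, keyed by (MC, DC) taken from Table I.
known = {
    (1, 0.25): 10.85161,
    (2, 0.083): 18.09036,
    (2, 0.25): 12.72030,
}

# Slope for DC: two points with the same MC (= 2).
b2 = (known[(2, 0.25)] - known[(2, 0.083)]) / (0.25 - 0.083)
# Slope for MC: two points with the same DC (= 0.25), one unit of MC apart.
b1 = known[(2, 0.25)] - known[(1, 0.25)]
# Intercept from any single point.
b0 = known[(1, 0.25)] - b1 * 1 - b2 * 0.25


def predict_dt(mc, dc):
    """Development time implied by the back-derived linear model (assumed form)."""
    return b0 + b1 * mc + b2 * dc


# Compare with rows of Table III that were not used in the derivation.
checks = {(1, 0.333): 8.18266, (4, 0.077): 22.02067, (5, 0.143): 21.76707}
max_gap = max(abs(predict_dt(mc, dc) - dt) for (mc, dc), dt in checks.items())
```

With these three points the derivation gives roughly b0 ≈ 17.02, b1 ≈ 1.87 and b2 ≈ -32.16, that is, predicted time rises with McCabe Complexity and falls as Dhama Coupling increases.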

3. Result and Discussion

A comparison of the 3-3-1 NN output with the measured effort values shows the percentage error ranging from +14.05 to -25.60 for the training dataset (25 records), from +12.76 to -18.89 for the validation dataset (8 records) and from +13.66 to -15.75 for the testing dataset (8 records). A much simplified NN architecture was thus able to model the non-linear relationship between the three input variables and the single output parameter. The performance of the NN can be improved further by increasing the number of neurons in the hidden layer and retraining the model with the data. Performance is also expected to improve with larger datasets.

4. Conclusion

In this paper, the effectiveness of the NN modeling approach to effort estimation on a standard dataset was presented. The NN model trained on experimental data was found to have good generalization capabilities and is able to predict effort that closely matches the experimental observations. Since the effect of the various cost drivers on effort is often quite complex, an ANN can be used as an effective tool to model and predict the development effort. However, the models should also be evaluated on a variety of historical and unseen input data, and the model can be adapted and tested for early effort estimation in software development.

5. References

[1] C. Lopez-Martin, C. Yanez-Marquez, A. Gutierrez-Tornes, “Predictive accuracy comparison of fuzzy models for software development effort of small programs”, The Journal of Systems and Software, Vol. 81, Issue 6, 2008, pp. 949-960.
[2] C.L. Martin, J.L. Pasquier, M.C. Yanez, T.A. Gutierrez, “Software Development Effort Estimation Using Fuzzy Logic: A Case Study”, IEEE Proceedings of the Sixth Mexican International Conference on Computer Science (ENC’05), 2005, pp. 113-120.
[3] Steve McConnell, Rapid Development: Taming Wild Software Schedules, Microsoft Press, 1996.
[4] A. Idri, T.M. Khoshgoftaar, A. Abran, “Can neural networks be easily interpreted in software cost estimation?”, IEEE Trans. Software Engineering, Vol. 2, 2002, pp. 1162-1167.
[5] A. Idri, A. Abran, T.M. Khoshgoftaar, “Estimating software project effort by analogy based on linguistic values”, in Proceedings of the Eighth IEEE Symposium on Software Metrics, 4-7 June 2002, pp. 21-30.
[6] H. Park, S. Baek, “An empirical validation of a neural network model for software effort estimation”, Expert Systems with Applications, 2007.
