JOURNAL OF SOFTWARE, VOL. 9, NO. 8, AUGUST 2014


An Early Software Effort Estimation Method Based on Use Cases and Conceptual Classes

Tülin Erçelebi Ayyıldız (1, 2)
Altan Koçyiğit (2)

(1) Başkent University/ Computer Engineering, Ankara, Turkey. Email: [email protected]
(2) Middle East Technical University/ Informatics Institute, Ankara, Turkey. Email: [email protected]

Abstract: Predicting as early as possible how much effort a software project will require is an important issue in the software industry. Software size is one of the commonly used attributes in effort estimation. In this paper, we propose an early software size and effort estimation method based on a conceptual model of the problem domain. Our method utilizes the noteworthy domain concepts identified mainly from the use cases written in the requirements phase of the software development lifecycle. In order to develop the model and evaluate its prediction quality, the use cases written and the effort data collected for 14 industrial software development projects of a CMMI level 3 certified defense industry company have been used. Evaluation results reveal a high correlation between the number of conceptual classes (i.e., domain objects) identified during the requirements analysis, the number of classes constituting the resulting software and the actual effort spent. Moreover, we have used the use case point (UCP) method to estimate the effort needed for each project and compared the results of UCP analysis with the results obtained with our method. The comparisons have shown that, for the projects considered, our method gives a better effort estimation than the UCP method.

Index Terms: effort estimation, use cases, use case point, conceptual classes

© 2014 ACADEMY PUBLISHER doi:10.4304/jsw.9.8.2169-2173

I. INTRODUCTION

Poor effort estimation is one of the main problems that affect the success of software projects. Underestimation results in schedule and budget overruns, whereas overestimation can result in inefficiency and wasted resources. A multitude of effort estimation methods has been proposed in the literature. These methods mainly use some of the attributes of the software to be developed. Size is the most commonly used and significant software attribute considered in the estimations [1]. Hence, size measurement plays a very important role in effort estimation.

Several size measures, including source lines of code (SLOC) and function points, have been defined for software [2]. SLOC is the simplest measure, but it depends on the programming language used and on how a source line of code is defined. More importantly, it can only be measured precisely at the end of the project, so using SLOC in early size estimation is difficult [3]. Functional size measurement methods are much more suitable for early size measurement. They are independent of the programming language and the coding style of the developers. However, many of these methods are suitable for procedural business information systems, and effort estimation based on these measures does not usually consider the software development methodology used [4].

In this paper, we consider the object oriented software development methodology, which is used extensively in industrial projects. We propose an effort estimation method for projects employing this methodology. The essential elements in object oriented software are classes and their instances, objects. In [5], the Class Point method is proposed to measure the size of object oriented systems. The Class Point method utilizes the design documents created in the design phase. However, as the design documents are not available until the end of the design phase, this method cannot be used in the analysis phase.

Use cases are broadly applied in object oriented software development, and use cases are usually key requirements inputs to object oriented analysis and design activities. Therefore, use cases are valuable resources for software size measurement and effort estimation [6]. Many use case based methods have been proposed for software size measurement and effort estimation. The Use Case Points (UCP) method [7] is one such widely used effort estimation method.

We focus on the attributes that are directly related to the basic building blocks of object oriented software. Hence, we appreciate the Class Point method [5] for software effort estimation. Class diagrams are one of the primary artifacts created by object-oriented design. However, in the analysis phase, it is hard to determine implementation classes together with their methods and interactions. Therefore, we considered problem domain models rather than design class diagrams for size and effort estimation. As a result, in a project, our method can be applied much earlier than the Class Point analysis method. More accurate estimation can be done by using Class Point or some other method in the later phases of development.


A domain model illustrates noteworthy concepts, their attributes and associations in the problem domain of the software to be developed [8]. Problem domain classes constituting the domain model serve as a source of inspiration while designing software objects. Therefore, the number of concepts in the problem domain model can give an idea about the size of the software in general and the number of classes to be created in the development phases in particular. Domain models are usually created in the requirements analysis and object oriented analysis phases. One of the widely used methods for finding problem domain concepts (a.k.a. real situation conceptual classes) is identifying nouns and noun phrases in requirements documents or textual problem domain descriptions. Although any type of requirements specification document can be used for this purpose, in our method, we considered use cases as the primary sources to identify problem domain concepts.

In order to develop the model and validate the usefulness of domain models in software size and effort estimation, we conducted a case study by using the use cases and the development effort collected for 14 completed industrial software development projects of a CMMI level 3 certified defense industry company operating in Turkey. We also compared the predictions made by our method with the predictions made by a widely used use case based size and effort estimation method, UCP.

The rest of the paper is organized as follows: Section II presents an overview of the UCP effort estimation method that we used to make comparisons. In Section III, brief information about the 14 real life software projects used to develop and evaluate our model is given. Our size and effort estimation method based on conceptual classes is introduced in Section IV. The results of the case study conducted are presented in Section V. Finally, in Section VI, we discuss our findings.

II. USE CASE POINT (UCP)

UCP is the basic technique proposed by Gustav Karner [7] for estimating effort based on use cases. The method assigns quantitative weight factors (WF) to actors and use cases according to their classification as Simple, Average and Complex. The UCP method is outlined in Figure 1. The sum of all the weights assigned to actors gives the Unadjusted Actor Weight (UAW). Similarly, the sum of the weights assigned to the use cases gives the Unadjusted Use Case Weight (UUCW). Thus, Unadjusted Use Case Points (UUCP) is computed as:

$$\mathrm{UUCP} = \mathrm{UAW} + \mathrm{UUCW} \tag{1}$$

Technical complexity of the projects considered is denoted by TCF. Thirteen technical factors have some specific weights and a score between 0 and 5 is assigned to each factor depending on its influence on the project. A value of zero means that the factor is irrelevant for this project; five means that it is essential.


Figure 1. The UCP effort estimation steps [9]

TFactor is calculated by multiplying the value of each factor by its weight and then adding all these numbers. TCF is then computed as:

$$\mathrm{TCF} = 0.6 + (0.01 \times \mathrm{TFactor}) \tag{2}$$

Development resources are denoted by EF (a.k.a. experience factors). The UCP model describes eight such factors contributing to the effectiveness of the development team. EFactor is the sum of the value of each factor (between zero and five) multiplied by its weight. Then, EF is computed as:

$$\mathrm{EF} = 1.4 + (-0.03 \times \mathrm{EFactor}) \tag{3}$$

After computing TCF and EF, they are multiplied by the UUCP to yield Adjusted Use Case Points (AUCP) as:

$$\mathrm{AUCP} = \mathrm{UUCP} \times \mathrm{TCF} \times \mathrm{EF} \tag{4}$$

In order to predict the effort in man-hours, AUCP is multiplied by the Productivity Factor (PF) as:

$$\mathrm{Effort} = \mathrm{AUCP} \times \mathrm{PF} \tag{5}$$
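As an illustration, equations (1) to (5) can be chained in a few lines of Python. This is only a sketch: the actor and use case weights are the values commonly attributed to Karner's method (they are not listed in this paper), the project counts and the TFactor and EFactor inputs are hypothetical, and the function names are our own. The default productivity factor is Karner's 20 hours per UCP.

```python
# Weights commonly attributed to Karner's UCP method. They are NOT listed
# in this paper, so treat them as assumptions.
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}


def unadjusted_weight(counts, weights):
    """Sum of (number of items in a class) x (weight of that class)."""
    return sum(weights[kind] * n for kind, n in counts.items())


def ucp_effort(actor_counts, use_case_counts, tfactor, efactor, pf=20):
    """Estimated effort in man-hours following equations (1) to (5).

    tfactor and efactor are the already weighted sums of the thirteen
    technical and eight experience factor scores.
    """
    uaw = unadjusted_weight(actor_counts, ACTOR_WEIGHTS)
    uucw = unadjusted_weight(use_case_counts, USE_CASE_WEIGHTS)
    uucp = uaw + uucw                # equation (1)
    tcf = 0.6 + 0.01 * tfactor       # equation (2)
    ef = 1.4 + (-0.03 * efactor)     # equation (3)
    aucp = uucp * tcf * ef           # equation (4)
    return aucp * pf                 # equation (5)


# Hypothetical project, for illustration only.
effort = ucp_effort(
    actor_counts={"simple": 2, "average": 1, "complex": 1},
    use_case_counts={"simple": 3, "average": 4, "complex": 2},
    tfactor=30,
    efactor=15,
)
print(f"Estimated effort: {effort:.1f} man-hours")
```

Passing the weighted factor sums directly keeps the sketch independent of the individual factor weights, which are not reproduced in this paper.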

Karner used the PF of 20 hours per UCP [7]. We used the same PF value in our evaluations.

III. PROJECTS

In order to develop our estimation method and to validate it, the requirements and effort data collected for 14 previously completed industrial software projects were taken from a CMMI level 3 certified defense industry company. Each of these projects is a real-time process control software development project coded in the Java programming language. A waterfall based software development lifecycle had been employed in these projects. Each project was developed by a team of 4-8 professional software engineers, and their actual effort in terms of man-hours and schedule logs were collected during development. We have used the detailed fully dressed use cases which were created and used during the software development and the final source code to identify the conceptual and software classes, respectively. The projects considered in this study are named A, B, C, D, E, F, G, H, I, J, K, L, M, and N. Further details of the projects and the company cannot be given in this paper for confidentiality reasons.

IV. SIZE AND EFFORT ESTIMATION BASED ON CONCEPTUAL CLASSES

In order to investigate the relation between the number of conceptual classes in the domain model and the size of the software project in terms of the number of software classes, we extracted conceptual classes from text based use case descriptions by manually identifying nouns and noun phrases. The identification of nouns and noun phrases was performed by the same person. In this analysis, we used the use case texts rather than the requirements specification documents. After the nouns and noun phrases were identified, duplicates and synonyms were eliminated. This gave us the set of conceptual classes and their attributes. The number of conceptual classes that we identified, the number of software classes implemented and the actual effort for the projects are given in Table I.

TABLE I. NUMBER OF CONCEPTUAL CLASSES AND NUMBER OF SOFTWARE CLASSES

Project Name   Number of Conceptual Classes   Number of Software Classes   Actual Effort (man-hours)
Project A      517                            341                          10561
Project B      715                            484                          13105
Project C      243                            189                          5819
Project D      383                            302                          8342
Project E      80                             62                           2165
Project F      99                             61                           2354
Project G      195                            157                          4667
Project H      199                            152                          6439
Project I      343                            292                          7210
Project J      209                            174                          5336
Project K      132                            99                           5597
Project L      105                            79                           2989
Project M      680                            513                          11286
Project N      121                            78                           2678

A. Correlation Between Number of Conceptual Classes and Number of Software Classes

After identifying and counting the conceptual classes and counting all of the software classes from the code, the Pearson correlation coefficient was computed with the help of the Minitab statistics tool [10]. The correlation between two variables is a measure of how well the variables are related. The most common measure of correlation in statistics is the Pearson Correlation (or the Pearson Product Moment Correlation, PPMC), which shows the linear relationship between two variables. The correlation is between -1 and 1. A result of -1 means that there is a perfect negative correlation between the two variables, while a result of 1 means that there is a perfect positive correlation between the two variables. A result of zero means that there is no linear relationship between the two variables. Results between 0.5 and 1.0 are generally accepted as indicating high correlation [11]. The scatterplot of the number of concepts versus the number of classes is given in Figure 2. The following result is obtained from the Pearson correlation coefficient analysis:

Pearson correlation of Number of Conceptual Classes and Number of Software Classes = 0.990    (6)

This result indicates that there is a high correlation between the number of conceptual classes and the number of software classes.

Figure 2. Scatterplot of number of concepts versus number of classes
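The correlation analysis can be retraced outside Minitab. The following sketch computes the Pearson product moment correlation directly from its definition, using the class counts listed in Table I; the function and variable names are our own, and the code illustrates the calculation rather than reproducing the Minitab procedure.

```python
from math import sqrt

# Class counts for projects A..N, copied from Table I.
conceptual_classes = [517, 715, 243, 383, 80, 99, 195, 199, 343, 209, 132, 105, 680, 121]
software_classes = [341, 484, 189, 302, 62, 61, 157, 152, 292, 174, 99, 79, 513, 78]


def pearson(xs, ys):
    """Pearson product moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson(conceptual_classes, software_classes)
print(f"Pearson correlation (conceptual vs. software classes): {r:.3f}")
```

The same function can be applied to the actual effort column of Table I to examine the relation between conceptual classes and effort.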

We also derived a regression equation to predict the number of software classes corresponding to the number of concepts in the problem domain. A regression equation takes the form y = a + bx, where "y" is the dependent variable that the equation tries to predict, "x" is the independent variable that is being used to predict "y", "a" is the y-intercept of the line and "b" is its slope [5]. The regression equation for the number of software classes ($N_{sw}$) for a given number of conceptual classes ($N_c$), which we obtained using the Minitab tool, is:

$$N_{sw} = 6.9 + 0.72 \times N_c \tag{7}$$
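For reference, if the line is fitted by ordinary least squares (an assumption on our part; the fitting options used in Minitab are not stated here), the coefficients of y = a + bx follow from the sample means $\bar{x}$ and $\bar{y}$ of the n observations:

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

Here $x_i$ is the number of conceptual classes and $y_i$ the number of software classes of project i. Published coefficients are rounded, so values recomputed from tabulated data need not match them digit for digit.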

B. Correlation Between Number of Conceptual Classes and Actual Effort

After identifying the correlation between the number of conceptual classes and the number of software classes, we used the Pearson correlation coefficient again to investigate the correlation between the number of conceptual classes and the actual effort. The Pearson correlation coefficient for the number of conceptual classes and the actual effort (AE) is found as:

Pearson correlation of Number of Conceptual Classes and Actual Effort = 0.965    (8)

Since results close to 1.0 indicate high correlation, we can say that the number of conceptual classes and the actual effort are highly correlated. The regression equation relating the number of conceptual classes ($N_c$) to the actual effort (AE) is found as:

$$AE = 1825 + 15.7 \times N_c \tag{9}$$

V. EVALUATION AND COMPARISON

In this section we evaluate the accuracy of the predictions that are made by using the regression equation found in the previous section. In order to make comparisons, we also estimated the effort for each of the software projects considered with the UCP method. We compared the results obtained from our proposed method and the results obtained from the UCP method to the actual effort and assessed the accuracy of both estimation methods.

An important question that needs to be answered by any estimation method is "how accurate are the predictions?" As evaluation criteria, we applied Magnitude of Relative Error (MRE), Prediction Quality (Pred(e)) and Adjusted Mean Squared Error (AMSE). The accuracy of an estimation technique is inversely proportional to the MRE [3], which is calculated as:

$$\mathrm{MRE} = \frac{|AE - EE|}{AE} \tag{10}$$

where AE is the actual effort, and EE is the estimated effort. Prediction quality (Pred(e) = k/n) is calculated on a set of n projects, where k is the number of projects for which MRE is less than or equal to "e". In this study, we take e = 0.25. Conte et al. suggested that for an acceptable estimation model, the value of Pred(0.25) should exceed 0.75 [12]. The interpretation of the MRE and Pred criteria is that the accuracy of an estimation technique is proportional to the Pred and inversely proportional to the MRE.

The simplest raw measure is the mean squared error; however, it depends on the mean of the data sets and is thus difficult to interpret or use in comparisons. Instead, we use AMSE, the adjusted mean squared error. AMSE is the sum of the squared errors, divided by the product of the means of the predicted and observed outputs. Let $E_i$ be the vector of n predictions and $\hat{E}_i$ the vector of the observed values; then the AMSE of the predictor is:

$$\mathrm{AMSE} = \frac{\sum_{i=1}^{n} (E_i - \hat{E}_i)^2}{\bar{E} \times \bar{\hat{E}}} \tag{11}$$

where $\bar{E}$ and $\bar{\hat{E}}$ are the means of the predicted and observed values, respectively. The MRE, Pred and AMSE for the projects considered are presented in Table II.

TABLE II. RESULTS

                               UCP                         Concepts Based
Project      Actual Effort     Estimated Effort   MRE      Estimated Effort   MRE
Project A    10561             11595              0.09     9942               0.05
Project B    13105             14770              0.12     13051              0.004
Project C    5819              6177               0.06     5640               0.03
Project D    8342              10848              0.31     7838               0.06
Project E    2165              1388               0.35     3081               0.42
Project F    2354              1816               0.22     3379               0.43
Project G    4667              5180               0.10     4887               0.04
Project H    6439              7230               0.12     4949               0.23
Project I    7210              8335               0.15     7210               0
Project J    5336              6824               0.27     5106               0.04
Project K    5597              6320               0.13     3897               0.30
Project L    2989              3412               0.14     3474               0.16
Project M    11286             13480              0.19     12501              0.10
Project N    2678              3812               0.42     3725               0.39

Pred (0.25)                    0.71                        0.71
AMSE                           0.49                        0.26
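The evaluation can be retraced with a short script. The sketch below applies the regression equation AE = 1825 + 15.7 * Nc to the conceptual class counts of Table I, takes the UCP estimates as reported in Table II (the UCP inputs themselves are not published), and computes MRE, Pred(0.25) and AMSE as defined above. All function and variable names are our own.

```python
from statistics import mean

# Projects A..N: conceptual class counts and actual effort from Table I,
# UCP estimates as reported in Table II.
conceptual_classes = [517, 715, 243, 383, 80, 99, 195, 199, 343, 209, 132, 105, 680, 121]
actual_effort = [10561, 13105, 5819, 8342, 2165, 2354, 4667, 6439, 7210, 5336, 5597, 2989, 11286, 2678]
ucp_estimates = [11595, 14770, 6177, 10848, 1388, 1816, 5180, 7230, 8335, 6824, 6320, 3412, 13480, 3812]


def concept_based_effort(nc):
    """Estimated effort in man-hours from the number of conceptual classes."""
    return 1825 + 15.7 * nc


def mre(actual, estimated):
    """Magnitude of relative error for a single project."""
    return abs(actual - estimated) / actual


def pred(actuals, estimates, e=0.25):
    """Fraction of projects whose MRE is less than or equal to e."""
    hits = sum(1 for a, p in zip(actuals, estimates) if mre(a, p) <= e)
    return hits / len(actuals)


def amse(actuals, estimates):
    """Sum of squared errors divided by the product of the two means."""
    sse = sum((a - p) ** 2 for a, p in zip(actuals, estimates))
    return sse / (mean(actuals) * mean(estimates))


concept_estimates = [concept_based_effort(nc) for nc in conceptual_classes]

for name, estimates in (("UCP", ucp_estimates), ("Concepts based", concept_estimates)):
    print(f"{name}: Pred(0.25) = {pred(actual_effort, estimates):.2f}, "
          f"AMSE = {amse(actual_effort, estimates):.2f}")
```

Because the concept based estimates are recomputed without rounding, individual MRE values can differ in the last digit from those in Table II; the aggregate Pred(0.25) and AMSE values should nevertheless agree with the table to two decimal places.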

According to the MRE results, for nine of the fourteen projects (A, B, C, D, G, I, J, M, N), our proposed method gives effort estimates closer to the actual effort than the estimates made by using Karner's original UCP method. However, for the remaining projects (E, F, H, K, L), the UCP method gives better results. When all projects are considered, the two methods obtain the same Pred(0.25) value. However, the AMSE results show that our method gives much more accurate predictions.

VI. CONCLUSION

In this paper, we proposed an early software size and effort prediction approach based on conceptual models derived from use cases or other requirements artifacts. In order to develop the model, we have used the requirements and effort data collected for 14 industrial software development projects. Our analyses have shown that there is a high correlation between the number of conceptual classes and the number of software classes. In addition, we observed a high correlation between the number of conceptual classes and the actual software development effort. These correlations suggest that the number of concepts can be used to predict the size of the software and the required effort for object oriented software development projects. We derived a regression equation to relate the required effort to the number of conceptual classes. In addition, we applied the UCP method and our proposed method to the data collected from the 14 software projects. According to the results obtained, for the projects considered, our proposed method gives better effort predictions than the UCP method.

As future work, we will further investigate the accuracy of our method by increasing the number of projects, and we will extend our idea to provide better estimations. In addition, we are working on a tool that identifies nouns and noun phrases automatically, in order to minimize the time and effort spent on estimation.
REFERENCES

[1] Ribu, K.; Estimating Object-Oriented Software Projects with Use Cases, University of Oslo, Norway, 2001.
[2] Abran, A.; Gallego, J.J.; Software Estimation Models & Economies of Scale, 21st International Conference on Software Engineering and Knowledge Engineering, SEKE'2009, pp. 625-630, 2009.
[3] Gencel, Ç.; Buglione, L.; Demirörs, O.; Efe, P.; A Case Study on the Evaluation of COSMIC-FFP and Use Case Points, Software Measurement European Forum, SMEF 2006.
[4] Ozkan, B.; Turetken, O.; Demirörs, O.; Software Functional Size: For Cost Estimation and More, Paper presented at the EuroSPI, Dublin, Ireland, 2008.
[5] Costagliola, G.; Tortora, G.; Class Point: An Approach for the Size Estimation of Object-Oriented Systems, IEEE Transactions on Software Engineering, Vol. 31, No. 1, pp. 52-74, 2005.
[6] Ouwerkerk, J.; Abran, A.; An Evaluation of the Design of Use Case Points (UCP), Proceedings of the International Conference on Software Process and Product Measurement, MENSURA 2006.
[7] Karner, G.; Metrics for Objectory. Diploma thesis, University of Linkoping, Sweden. No. LiTH-IDA-EX9344, 21, December 1993.
[8] Larman, C.; Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development, 3rd ed., Addison Wesley, 2004, pp. 131-136.
[9] Kim, S.; Lively, W.; Simmons, D.; An Effort Estimation by UML Points in the Early Stage of Software Development, Proceedings of the International Conference on Software Engineering Research and Practice. Nevada, USA, pp. 415-421, 2006.
[10] Brook, Quentin.; Lean Six Sigma Minitab: The Complete Toolbox Guide for all Lean Six Sigma Practitioners, 3rd ed., 2010.
[11] DeSanto, C.; Totoro, M.; Moscartelli, R.; Introduction to Statistics 9th ed., Pearson, 2010, pp. 169-182.
[12] Conte, S.D.; Dunsmore H.E.; Shen V.Y.; Software Engineering Metrics and Models, Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, 1986.

Tülin Erçelebi Ayyıldız received the B.Sc. degree from the Computer Engineering Department of Çankaya University, Turkey, in 2005 and the M.Sc. degree from the Computer Engineering Department of Hacettepe University, Turkey, in 2008. She is a Ph.D. student in the Informatics Institute at Middle East Technical University. She is now a research assistant at Başkent University in the Department of Computer Engineering. Her main research interests are software size and effort estimation, software measurement and software process improvement.

Altan Koçyiğit received the B.Sc., M.Sc. and Ph.D. degrees from the Electric and Electronic Engineering Department of Middle East Technical University, Turkey, in 1993, 1997 and 2001, respectively. He has been with the Middle East Technical University Informatics Institute since 2002. His research interests include computer networking, software engineering, and parallel/distributed processing.