SOFTWARE EFFORT ESTIMATION APPROACHES – A REVIEW

S.K. MOHANTY(1) & A.K. BISOI(2)

(1) WIPRO Technologies Limited, INDIA
(2) School of Computer Engineering, KIIT University, INDIA

Abstract: Software estimation is the process of predicting the effort and cost required to develop software. This review paper provides a general overview of software estimation models and techniques. Models can be categorized as size-based, function-based, learning-based and expertise-based. Both size-based and function-based models can be termed parametric, as they use a function or formula of fixed form for software cost/effort estimation. Each has its own strengths and weaknesses. A key factor in selecting an estimation model is the accuracy of its estimates. Unfortunately, no single technique is best for all situations, and a careful comparison of the results of several approaches is most likely to produce realistic estimates.

Keywords: effort estimation, project estimation, cost models, metrics.

1. INTRODUCTION

Software estimation involves the determination of one or more of the following parameters:

- Effort (usually in person-months)
- Project duration (in calendar time)
- Cost (in dollars)

Most estimation models attempt to generate an effort estimate, which can then be converted into project duration and cost. Although effort and cost are closely related, they are not necessarily related by a simple transformation function. Effort is often measured in person-months of the programmers, analysts, project managers, etc.

One of the most critical activities in software project management during the project inception phase is to estimate the effort and cost needed to complete the project tasks. Decisions related to financial and human-resource aspects depend on the estimate of the effort needed to build the software. The common problems with estimation are overestimation or underestimation of the effort needed. Underestimation leads to low employee morale, decline in reputation and a stressful work environment. Overestimation, on the other hand, can lead to committing too many resources to the project, losing the bid, or making poor decisions about outsourcing parts of the project versus developing them internally. In general, the tendency in the software industry is to underestimate the effort, which leads to the realization during project execution that the milestones cannot be met.

2. ALGORITHMIC & PARAMETRIC MODELS

In parametric models the development time and effort are estimated as a function of a number of variables that represent the most important cost drivers. The core of an estimation model is a set of algorithms and parameters, empirically calibrated to actual data from previously completed software projects; algorithmic models therefore need to be calibrated relative to the local environment in which they are used. These models can be classified into two broad categories. The first includes black-box or requirement-based estimation techniques, in which estimates are obtained once the software project scope is clearly defined in terms of the required functionality; widely used techniques such as function points and use case points are in this category. The second includes techniques based on the projected size of the final software product in terms of the number of lines of code (LOC); COCOMO is a widely used technique in this category.

In recent years, software has become the most expensive component of computer-based systems, so proper estimation plays an important role in any software project. Many estimation models have been proposed over the last 40 years. Although most researchers started working on estimation models at about the same time, they all faced the same dilemma: as software grew in size and importance it also grew in complexity, making it very difficult to accurately predict the cost and effort of software development. Like any other field, software estimation has its pitfalls: the fast-changing nature of software development has made it very difficult to build parametric models that yield high accuracy across all domains.

International Journal of Internet Computing, ISSN 2231-6965, Vol. 1, Iss. 3, 2012

a) Function Point Analysis (FPA)

The user view of software is functional, while the developer view is technical. A technical view is difficult to obtain in the early stages of the development cycle, but a functional view is possible with fair accuracy. Hence, estimating software size on the basis of functionality has been proposed [1], [2]. A fair assumption is that the more functions, features and facilities a system provides, the larger its size. To generate a function-dependent size estimate, researchers suggested a method based on function points. A function point is a measure of functionality [3]; a simple function has fewer function points than a complex one [4].

The complexity of software and the effort needed to develop it are a function of the number and type of five different kinds of functional components that can be obtained and assessed at the requirements specification phase [5]:

- Internal Files (IF): the database files that are created and maintained within the application to be developed.
- External Files (EF): the files owned and maintained by other applications but used by the application to be developed.
- External Inputs (EI): the inputs that affect the control flow and internal logic of the application, leading to the creation and maintenance of data.
- External Outputs (EO): the data leaving the application to output devices, files or external systems.
- External Inquiries (EIQ): simple user queries and the responses to them.

The first two types of components are referred to as data-based components and the other three as transaction-based components.

Table 1: Weight Factors for Function-Point Metrics

Component                  Simple   Average   Complex
External Input (EI)        3        4         6
External Output (EO)       4        5         7
External Inquiries (EIQ)   3        4         6
External File (EF)         7        10        15
Internal File (IF)         5        7         10

Each component is assigned a points value on the basis of its type and complexity. The points values of all the components are then summed to give the size of the system in unadjusted function points (UFPs):

UFP = Σ(EI × w) + Σ(EO × w) + Σ(EIQ × w) + Σ(EF × w) + Σ(IF × w)    (1.1)

where w is the Table 1 weight for the component's complexity.

The technical complexity factor quantifies the effect of fourteen General Application Characteristics that affect the complexity of carrying out the design and implementation task.

Table 2: General Application Characteristics Affecting the Complexity of Software Projects

1. Reliable Backup and Recovery
2. Data Communications
3. Distributed Functions
4. Performance
5. Heavily Used Configurations
6. Real-Time Data Entry
7. Ease of Use
8. Real-Time Update Needed
9. Complexity of the Interfaces
10. Complexity of the Processing
11. Reusability
12. Ease of Installation
13. Multiple Sites
14. Easy to Change

The degree of influence of each characteristic ranges from zero (not present, or no effect) to five (a strong influence throughout). The sum of the fourteen ratings gives the total degrees of influence (DI):

DI = Σ Fi,  i = 1 … 14    (1.2)

which is converted to the technical complexity factor (TCF):

TCF = 0.65 + 0.01 × DI    (1.3)

The TCF is then used to adjust the size of the system, giving the overall size in function points:

FP = UFP × TCF

The mappings from FP to LOC for different languages are maintained by the International Function Point Users Group (IFPUG). Another estimation technique, "feature points", extends function points [6].

Advantages and disadvantages of FPA [7]:

- FPA is independent of technology, i.e. OS, programming language, database, developer productivity and methodology.
- The FPA concept is simple to understand; hence it is a good quick measure for comparative analysis.
- Identification of files is tricky, and the number of files is uncertain.
- The system may have more characteristics than the 14 considered.
- The internal processing complexity due to complex business rules, algorithms, calculations, etc. is not weighted properly.
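For concreteness, the FPA arithmetic of equations 1.1–1.3 can be sketched in a few lines of code; the function and table names below are illustrative, not part of any FPA standard:

```python
# Sketch of the FPA calculation described above (illustrative names).
# Weights follow Table 1; DI is the sum of the fourteen characteristic ratings (0-5).

WEIGHTS = {  # component type -> (simple, average, complex)
    "EI":  (3, 4, 6),
    "EO":  (4, 5, 7),
    "EIQ": (3, 4, 6),
    "EF":  (7, 10, 15),
    "IF":  (5, 7, 10),
}
LEVELS = {"simple": 0, "average": 1, "complex": 2}

def unadjusted_function_points(counts):
    """counts: {(component_type, complexity_level): number_of_components} -> UFP (eq. 1.1)."""
    return sum(n * WEIGHTS[ctype][LEVELS[level]]
               for (ctype, level), n in counts.items())

def function_points(counts, degrees_of_influence):
    """degrees_of_influence: the fourteen ratings, each 0-5 (eq. 1.2)."""
    di = sum(degrees_of_influence)
    tcf = 0.65 + 0.01 * di          # eq. 1.3
    return unadjusted_function_points(counts) * tcf

counts = {("EI", "simple"): 10, ("EO", "average"): 5, ("IF", "complex"): 2}
# UFP = 10*3 + 5*5 + 2*10 = 75; with all fourteen DIs rated 3, TCF = 0.65 + 0.42 = 1.07
print(round(function_points(counts, [3] * 14), 2))  # 75 * 1.07 = 80.25
```

With, for instance, ten simple inputs, five average outputs and two complex internal files, UFP = 75; rating all fourteen characteristics at 3 gives TCF = 1.07 and FP = 80.25.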


b) Use Case Point (UCP)

The Use Case Point (UCP) method is a software effort estimation technique introduced by Karner in 1993. It extends the function point method using the use cases in the use case model of a software system. The unadjusted actor weight (UAW) is the sum of the complexity values assigned to each actor, and the unadjusted use case weight (UUCW) is the sum of the complexity values assigned to each use case. The total unadjusted use case points (UUCP) is the sum of UAW and UUCW. The number of adjusted use case points (AUCP) is computed by multiplying UUCP by the product of two adjustment factors: the technical complexity factor (TCF) and the environmental factor (EF).

Table 3: Actor Complexity Values

Actor Type   Value
Simple       1
Average      2
Complex      3

UAW = Σ (number of actors of each type × actor complexity value)

Table 4: Use-Case Complexity Values

Use-Case Type                    Value
Simple: ≤ 3 transactions         5
Average: 4 to 7 transactions     10
Complex: > 7 transactions        15

UUCW = Σ (number of use cases of each type × use-case complexity value)

UUCP = UAW + UUCW

Table 5: Technical Complexity Factors for UCP Metrics

Technical Factor               Weight
Distributed System             2
Performance Requirements       1
End-User Efficiency            1
Internal Processing            1
Reusability of Code            1
Installation Ease              0.5
Usability Requirements         0.5
Portability Requirements       2
Changeability Requirements     1
Concurrency                    1
Security Requirements          1
Direct Access to 3rd-Party     1
User Training Facility         1

TCF = 0.6 + 0.01 × TFactor

where TFactor is the sum of each technical factor's rating (0 to 5) multiplied by its weight.

Table 6: Environmental Factors for UCP Metrics

Environmental Factor               Weight
Familiarity with Project           1.5
Application Experience             0.5
Object-Oriented Experience         1
Lead-Analyst Capabilities          0.5
Motivation                         1
Stability of Requirements          2
Part-Time Staff                    -1
Programming Language Difficulty    -1

EF = 1.4 − 0.03 × EFactor

where EFactor is the sum of each environmental factor's rating (0 to 5) multiplied by its weight.

AUCP = UUCP × TCF × EF

c) COCOMO (COnstructive COst MOdel)

The COCOMO cost and schedule estimation model was originally published by Boehm in 1981. In the COCOMO models, code size is given in thousands of LOC (KLOC) and effort in person-months [8], [9], [10].

A) Basic COCOMO. This model uses three sets of coefficients {a, b, c, d}, depending on the complexity of the software:

Table 7: Coefficients Used in the Basic COCOMO Model

Product Type     a     b      c     d
Organic          2.4   1.05   2.5   0.38
Semi-Detached    3.0   1.12   2.5   0.35
Embedded         3.6   1.20   2.5   0.32

E = a × Size^b
D = c × E^d
P = E / D

where E is the effort in person-months, D is the development time in months and P is the estimated number of persons needed [11].

B) Intermediate and Detailed COCOMO. In the intermediate COCOMO, the nominal effort estimate is obtained using a power function with three sets of {a, b}, coefficient a being slightly different from that of the basic COCOMO [12], [13]:

Table 8: Coefficients Used in the Intermediate COCOMO Model

Product Type     a     b
Organic          3.2   1.05
Semi-Detached    3.0   1.12
Embedded         2.8   1.20

E = a × Size^b × EAF

where a and b are given in Table 8, Size is the estimated KLOC, and EAF (the effort adjustment factor) is computed from the cost driver ratings given in Table 9.
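The UCP computation (UAW, UUCW, TCF, EF and AUCP as defined above) can be sketched as follows; the helper names and the example project are illustrative only:

```python
# Sketch of the UCP steps described in the text (illustrative names).
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}       # Table 3
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}  # Table 4

def adjusted_use_case_points(actors, use_cases, tfactor, efactor):
    """actors / use_cases: {complexity: count};
    tfactor / efactor: weighted sums of the Table 5 / Table 6 ratings."""
    uaw = sum(n * ACTOR_WEIGHTS[c] for c, n in actors.items())
    uucw = sum(n * USE_CASE_WEIGHTS[c] for c, n in use_cases.items())
    uucp = uaw + uucw
    tcf = 0.6 + 0.01 * tfactor   # technical complexity factor
    ef = 1.4 - 0.03 * efactor    # environmental factor
    return uucp * tcf * ef

aucp = adjusted_use_case_points(
    actors={"simple": 2, "average": 2, "complex": 1},     # UAW = 9
    use_cases={"simple": 4, "average": 6, "complex": 2},  # UUCW = 110
    tfactor=25, efactor=15)
print(round(aucp, 2))  # 119 * 0.85 * 0.95 = 96.09
```

The AUCP value is then typically multiplied by a productivity factor (person-hours per use case point) to obtain an effort figure.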


Table 9: Cost Driver Ratings Used in the Intermediate COCOMO Model

Cost Driver                              Very Low   Low    Nominal   High   Very High   Extra High
Product Attributes
  Required Software Reliability          0.75       0.88   1.00      1.15   1.40        -
  Size of Application Database           -          0.94   1.00      1.08   1.16        -
  Complexity of the Product              0.70       0.85   1.00      1.15   1.30        1.65
Hardware Attributes
  Run-Time Performance Constraints       -          -      1.00      1.11   1.30        1.66
  Memory Constraints                     -          -      1.00      1.06   1.21        1.56
  Volatility of the Virtual Machine      -          0.87   1.00      1.15   1.30        -
  Required Turnabout Time                -          0.87   1.00      1.07   1.15        -
Personnel Attributes
  Analyst Capability                     1.46       1.19   1.00      0.86   0.71        -
  Application Experience                 1.29       1.13   1.00      0.91   0.82        -
  Software Engineer Capability           1.42       1.17   1.00      0.86   0.70        -
  Virtual Machine Experience             1.21       1.10   1.00      0.90   -           -
  Programming Language Experience        1.14       1.07   1.00      0.95   -           -
Project Attributes
  Use of Software Tools                  1.24       1.10   1.00      0.91   0.82        -
  Application of S/W Engineering Methods 1.24       1.10   1.00      0.91   0.83        -
  Required Development Schedule          1.23       1.08   1.00      1.04   1.10        -
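Putting Table 8 and Table 9 together, an intermediate COCOMO estimate can be sketched as below; the function name and the example project are illustrative:

```python
# Sketch of the intermediate COCOMO estimate: E = a * Size^b * EAF,
# with {a, b} from Table 8 and EAF the product of the selected Table 9 ratings.
COEFFS = {"organic": (3.2, 1.05), "semi-detached": (3.0, 1.12), "embedded": (2.8, 1.20)}

def intermediate_cocomo(kloc, product_type, multipliers):
    """multipliers: the chosen Table 9 rating for each relevant cost driver."""
    a, b = COEFFS[product_type]
    eaf = 1.0
    for m in multipliers:
        eaf *= m                    # EAF is the product of the cost driver ratings
    return a * kloc ** b * eaf      # effort in person-months

# 32 KLOC organic project, high required reliability (1.15), high analyst capability (0.86):
effort = intermediate_cocomo(32, "organic", [1.15, 0.86])
print(round(effort, 1))  # ~120.4 person-months
```

For this 32 KLOC organic example the two multipliers nearly cancel (EAF = 0.989), so the adjusted estimate stays close to the nominal one.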


3. LEARNING-BASED MODELS

Learning-oriented techniques include some of the oldest as well as the newest techniques applied to estimation. The former are represented by case studies; the more recent ones are represented by neural networks, which attempt to automate improvements in the estimation process by building models that "learn" from previous experience. Machine learning techniques have in the last decade been used as a complement or an alternative to the previous two categories [14].

a) Neural Networks

Neural networks are estimation models that can be "trained" using historical data to produce ever better results by automatically adjusting their algorithmic parameter values to reduce the delta between known actuals and model predictions. The most common form of neural network used in the context of software estimation is a "back-propagation trained feed-forward" network, as depicted in Fig-2.

4. EXPERTISE-BASED MODELS

Expert judgment has been widely used. Expert opinion, although always difficult to quantify, can be an effective estimating tool on its own or as an adjusting factor for algorithmic/parametric models. These techniques are useful in the absence of quantified, empirical data. Two techniques that capture expert judgment are the Delphi technique and the Work Breakdown Structure [15], [16].

a) Delphi Technique

The Delphi technique was developed at the Rand Corporation in the late 1940s, originally as a way of making predictions about future events. It is useful for reaching a conclusion on an issue when the only information available is based more on "expert opinion" than on hard empirical data. A group of domain experts is asked to assess the issue individually in a preliminary round, without consulting the other participants. The first-round results are collected, tabulated and returned to each participant for a second round, in which the participants are again asked to assess the same issue, this time with knowledge of what the other participants did in the first round. The second round usually narrows the range of assessments, pointing to a reasonable middle ground on the issue of concern. The original Delphi technique avoided group discussion; the Wideband Delphi technique accommodates group discussion between assessment rounds.

b) WBS (Work Breakdown Structure)

The WBS is a way of organizing project elements into a hierarchy that simplifies the tasks of budget estimation and control. It helps determine exactly what costs are being estimated. If an organization consistently uses a standard WBS for all of its projects, over time it will accrue a valuable database reflecting its software cost distributions. This data can be used to develop a software cost estimation model tailored to the organization's own experience and practices.
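As an illustration of the back-propagation idea (not the specific network of Fig-2), a minimal feed-forward network with one sigmoid hidden layer can be trained on a toy size-to-effort dataset in pure Python; all data and names here are synthetic:

```python
# A minimal back-propagation-trained feed-forward network of the kind the text
# describes, fitted to a toy normalized size -> effort mapping (synthetic data).
import math
import random

random.seed(0)

# Toy data: normalized project size -> normalized effort (not real project data).
data = [(x / 10.0, (x / 10.0) ** 1.2) for x in range(1, 10)]

H = 4                                              # hidden units
w1 = [random.uniform(-1, 1) for _ in range(H)]     # input -> hidden weights
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]     # hidden -> output weights
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    hidden = [sigmoid(w1[j] * x + b1[j]) for j in range(H)]
    return sum(w2[j] * hidden[j] for j in range(H)) + b2, hidden

lr = 0.2
for _ in range(8000):                              # gradient descent on squared error
    for x, y in data:
        out, hidden = forward(x)
        err = out - y                              # delta between prediction and actual
        for j in range(H):
            grad_h = err * w2[j] * hidden[j] * (1 - hidden[j])  # back-propagated gradient
            w2[j] -= lr * err * hidden[j]
            b1[j] -= lr * grad_h
            w1[j] -= lr * grad_h * x
        b2 -= lr * err

print(forward(0.5)[0])  # should be close to 0.5 ** 1.2 ≈ 0.435
```

In practice such networks are trained on historical (size, cost driver, effort) records rather than a synthetic curve, and libraries handle the gradient computation; the loop above only makes the "adjust parameters to reduce the delta" mechanism concrete.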


5. CONCLUSIONS

Since no single technique fits all scenarios across environments, it takes expertise as well as exposure to combine multiple techniques where possible and then calibrate them [17], [18]. The approach practitioners take to reduce the risk of underestimation is to produce estimates using different techniques performed by different experts. Differences between the estimated efforts can then be reconciled using statistical techniques [19], [20]. For example, if three estimates ELow, EMid and EHigh are obtained, such that ELow < EMid < EHigh, the value of E can be computed using the equation:

E = (ELow + 4 × EMid + EHigh) / 6

Recently, many researchers have investigated the applicability of soft computing and machine learning techniques to the software effort and cost estimation problem. Artificial neural networks (ANNs), genetic algorithms (GAs), genetic programming (GP), linear regression (LR) and fuzzy logic have all been used to provide methodologies for software cost estimation [21]. Even a fuzzy COCOMO model has been proposed, and particle swarm optimization (PSO) has been used to tune the parameters of the COCOMO model. Many hybrid schemes have also been investigated, including neuro-GA, neuro-fuzzy, etc. Although there are many potential benefits in using more than one technique, there is no way to know which techniques to use before processing the data [22].

REFERENCES

[1]. IFPUG Counting Practices Committee, "Function Point Counting Practices Manual", Release 3.0, 1990.
[2]. C.R. Symons, "Software Sizing and Estimating: Mk II FPA", John Wiley & Sons, 1991.
[3]. P. Vickers, "An Introduction to Function Point Analysis".
[4]. IFPUG, "Function Point Counting Practices Manual", Release 4.0, International Function Point Users Group, Westerville, Ohio, 1994.
[5]. D.R. Jeffery, G.C. Low and M. Barnes, "A comparison of function point counting techniques", IEEE Transactions on Software Engineering, vol. 19, no. 5, 1993, pp. 529-532.
[6]. D. St-Pierre, M. Maya, A. Abran, J. Desharnais and P. Bourque, "Full Function Points: Counting Practice Manual", Technical Report 1997-04, University of Quebec at Montreal, 1997.
[7]. A. Abran and P.N. Robillard, "Function point analysis: an empirical study of its measurement processes", IEEE Transactions on Software Engineering, vol. 22, no. 12, 1996, pp. 895-910.
[8]. B.W. Boehm et al., "The COCOMO 2.0 Software Cost Estimation Model", American Programmer, July 1996.
[9]. S.D. Chulani, "Incorporating Bayesian Analysis to Improve the Accuracy of COCOMO II and its Quality Model Extension", Ph.D. Qualifying Exam Report, USC, February 1998.
[10]. B.W. Boehm et al., "Cost Models for Future Software Life Cycle Processes: COCOMO 2.0", Annals of Software Engineering on Software Process and Product Measurement, Amsterdam, 1995.
[11]. USC-CSE, "COCOMO II Model Definition Manual", Center for Software Engineering, Computer Science Department, University of Southern California, Los Angeles, CA, 1997. http://csse.usc.edu/csse/research/COCOMOII/cocomo_main.html
[12]. B. Clark, S. Chulani and B. Boehm, "Calibrating the COCOMO II Post Architecture Model", International Conference on Software Engineering, April 1998.
[13]. A.F. Sheta, "Estimation of the COCOMO model parameters using genetic algorithms for NASA software projects", Journal of Computer Science, vol. 2, no. 2, 2006, pp. 118-123.
[14]. I. Attarzadeh and S.H. Ow, "A Novel Algorithmic Cost Estimation Model Based on Soft Computing Technique", Journal of Computer Science, 6(2), 2010, pp. 117-125.
[15]. W.B. Frakes and K. Kang, "Software Reuse Research: Status and Future", IEEE Transactions on Software Engineering, 31(7), 2005, pp. 529-536.
[16]. C. Jones, "Strengths and Weaknesses of Software Metrics", Software Productivity Research LLC, version 5, 2006, pp. 1-17.
[17]. M.R.J. Qureshi and S.A. Hussain, "A Reusable Software Component-Based Development Process Model", Advances in Engineering Software, 39(2), 2008, pp. 88-94.
[18]. K.A. Saleh, "Software Engineering", J. Ross Publishing, 2009.
[19]. W.S. Jawadekar, "Software Engineering: Principles and Practice", Tata McGraw-Hill, 2004.
[20]. R.S. Pressman, "Software Engineering: A Practitioner's Approach", 5th ed., McGraw-Hill, 2000, pp. 722-742.
[21]. H. Mittal and P. Bhatia, "Optimization criteria for effort estimation using fuzzy technique", CLEI Electronic Journal, vol. 10, no. 1, 2007, pp. 1-11.
[22]. A. Zaid, M.H. Selamat, A.A.A. Ghani, R. Atan and K.T. Wei, "Issues in Software Cost Estimation", IJCSNS International Journal of Computer Science and Network Security, 8(11), 2008, pp. 350-356.
