Int. J. Software Engineering, Technology and Applications, Vol. 1, Nos. 2/3/4, 2015 145
Estimation of software testing effort using fuzzy multiple linear regression
Praveen Ranjan Srivastava
Information Technology and System Group, Indian Institute of Management, Rohtak, 1240011, India
Email: praveenrsrivastava@gmail
Abstract: Prior to the development of any software project, an estimate of the testing effort involved in it needs to be determined. The estimation is very important because the optimisation of the costs associated with the development depends on the accuracy of the estimation. Due to complex and varied nature of software applications and associated uncertainty, prediction of the cost of software development at an early stage is a tough job. In this research, by using a widely used algorithmic model COCOMO II, fuzzy logic and fuzzy multiple linear regression techniques, we provide an algorithm for software testing effort estimation.
Keywords: software development effort; SDE; software testing effort; STE; fuzzy logic; fuzzy multiple linear regression; test effort drivers; TEDs.
Reference to this paper should be made as follows: Srivastava, P.R. (2015) ‘Estimation of software testing effort using fuzzy multiple linear regression’, Int. J. Software Engineering, Technology and Applications, Vol. 1, Nos. 2/3/4, pp.145–154.
Biographical notes: Praveen Ranjan Srivastava is working as an Assistant Professor in the Information Technology and Systems Group at the Indian Institute of Management (IIM), Rohtak, India. He is currently doing research in the area of software management and decision science and data analytics. His research areas are software testing, quality and effort management, software release management, e-commerce and business intelligence, expert system and advanced soft computing techniques. He has published more than 120 research papers in various leading international journals and conferences in the area of software engineering and management. He is the Editor-in-Chief of the International Journal of Software Engineering, Technology and Applications (IJSETA), published by Inderscience. He is also a member of the editorial board of various leading journals.
1 Introduction
Software engineering is the systematic study and practice of producing software that is of better quality, lower cost and higher efficiency, and that can be built faster and maintained more easily (Nau and Randell, 1968). Software development generally follows the guidelines of a software development life cycle (Somerville, 2005). Software testing is an important activity in this life cycle; its aim is not only to find errors in the developed software but also to make sure that the software fulfils the
Copyright © 2015 Inderscience Enterprises Ltd.
client’s requirements (Somerville, 2005). During testing, the software is verified and validated to check that it meets the requirements that guided its design and development, and that it is implemented with the expected characteristics. Efficient testing leads to good software quality, user satisfaction and lower maintenance cost, whereas ineffective testing leads to low-quality products, increased maintenance costs, dissatisfied users and inaccurate results. Nearly 35% of the elapsed time and more than 50% of the total cost of software development are expended in testing, which makes testing an important activity of the development process (Harrold, 2000; Beizer, 1990; Nageswaran, 2001; Bennatan, 2003; Kushwaha and Misra, 2008; Jones, 1996; Dawson, 1998). The estimation of software development effort (SDE) is based on uncertain and/or noisy inputs (Beizer, 1990; Boehm, 1981), which is why most software projects face an effort estimation problem. A list of acronyms used in this paper is provided in the Appendix.
Several models for estimating SDE are available, including the COnstructive COst MOdel (COCOMO) and function point analysis (Somerville, 2005; Pressman, 2004). COCOMO is an algorithmic software cost estimation model that is based on historical project data, project characteristics and other variables. Software testing effort (STE) is generally around 40–50% of SDE (Nageswaran, 2001; Rubin, 1995), and it is usually estimated as a percentage of the calculated SDE, based on heuristics and previous experience. STE therefore cannot be determined independently (i.e., without an SDE estimate), and there is no standard procedure for determining an accurate value for it.
This research presents an algorithm for estimating STE that integrates COCOMO II (Boehm, 1981), fuzzy logic (Zadeh, 1965), weighting techniques (Ganesh, 2006), fuzzy regression analysis (Marza and Seyyedi, 2009) and test effort drivers (TEDs) (Srivastava et al., 2011; Bhattacharya et al., 2012). Trapezoidal membership functions with monotonic constraints (Gu et al., 2006) are used in this algorithm to achieve good generalisation. The presented algorithm is validated with a case study. The rest of the paper is organised as follows. Section 2 discusses the concepts and theories proposed to date on the use of fuzzy logic in STE estimation. Section 3 describes the methodologies and concepts used in this paper. Section 4 provides the steps of the algorithm. Section 5 presents a case study using a students’ project dataset. Finally, the conclusions and future scope of this work are presented in Section 6.
2 Background work
In software development, designing software that is optimal in terms of time, cost, effort and other resources is essential. A systematic formulation of STE is needed to ensure that these optimisation constraints are satisfied before the software is released. Estimating the cost and duration of testing remains a major challenge. Early estimates of STE are based on testing metrics, which generally overestimate the effort, depending on the expertise of the software testing team (Hamer and Frewin, 1982).
Halstead developed a set of metrics to measure the complexity of a program module directly from its source code (Halstead, 1978). Kushwaha and Misra (2008) used a cognitive information complexity measure to estimate STE and proposed that the time required to comprehend software is proportional to its cognitive complexity. Jones (1996) determined STE using function points, with the disadvantage that function points require detailed requirements in advance. Nagappan (2004) proposed a metric suite on the quality of STE. Nageswaran (2001) presented a method, based on use case points, for estimating the STE needed to perform all functional test activities. Jorgensen (2004) emphasised the human factors in SDE estimation. Aranha and Borba (2007) proposed an estimation model for test execution effort based on test specifications. Xu and Rajlich (2004) presented a classification of debugging activities based on the levels of Bloom’s taxonomy. Dawson (1998) presented an STE estimate based on neural network theory. Sheta et al. (2008) presented an approach for developing SDE and schedule estimation models using soft-computing techniques; it builds a suitable model structure to improve SDE estimates for NASA software projects. Srivastava et al. (2009) presented an approach for STE estimation that integrates Halstead metrics with fuzzy logic; its main disadvantage is that the source code must be available in advance. Srivastava et al. (2011) presented an intelligent approach that estimates STE reliably using fuzzy logic. Bhattacharya et al. (2012) used a particle swarm technique, whose main issues are the number of particles and its complexity. In the present research, we estimate STE by using fuzzy logic and fuzzy multiple linear regression techniques to develop TEDs.
Before explaining the presented method, we review several standard techniques on which it relies.
3 An overview of different methods used in this research
3.1 COCOMO model
COCOMO is a simple cost model, based on the size of the project, that estimates the number of person-months and the development schedule (in months) required to develop the software. It also estimates the distribution of effort and schedule across the major phases. The basic COCOMO model was initially presented by Boehm (1981). Although the basic COCOMO model is a useful starting point, it is hard to estimate the kilo lines of source code (KLOC) accurately during the early stages of a project. The model is also extremely vulnerable to misclassification of the development mode, and its success depends heavily on tuning it to the needs of the organisation using historical data, which is not always available. COCOMO II (Boehm et al., 2000) preserves the openness of the original COCOMO but is a more comprehensive SDE model. COCOMO II is used in the present research because it is better suited to modern software development projects and to updated project databases. COCOMO II consists of three sub-models of increasing reliability (Somerville, 2005; Pressman, 2004): application composition, early design and post-architecture. In this study we use the early design model because we assume that the user requirements have been agreed and that an initial stage of the system design has been carried out. The application composition model is based on the number of object points and reusability factors, whereas the post-architecture model depends on a complete design and requires at least a few modules to be ready to run.
3.2 Fuzzy logic
Fuzzy logic was proposed by Zadeh as part of his fuzzy set theory (Zadeh, 1965). It is a form of multi-valued logic that allows intermediate values to be defined between conventional evaluations such as yes/no, true/false and black/white. The fuzzy logic method comprises:
1 taking inputs for analysis
2 processing these inputs according to if-then rules, expressed in plain-language words
3 defuzzifying the resulting outputs by averaging and weighting techniques to obtain a final precise ‘crisp’ value.
3.3 Fuzzy multiple linear regression
Fuzzy multiple regression is better suited to this study than other fuzzy techniques (Bargielaa et al., 2007) because a large number of parameters are considered; fuzzy multiple linear regression is therefore used to quantify the TEDs.
4 The algorithm
This paper attempts to optimise the estimation of STE using various TEDs, COCOMO II and fuzzy multiple linear regression. The proposed algorithm takes the work of Srivastava et al. (2011) as its base. That work used the COCOMO model together with fuzzy logic, and computed the testing effort on the assumption that the new KLOC value is always less than the predicted value. The proposed algorithm uses COCOMO II instead of COCOMO, because COCOMO II follows the requirements analysis more closely, which helps to estimate the testing effort more accurately. We also improve the confidence value and the measurement of the new KLOC, use fuzzy multiple linear regression instead of plain fuzzy logic for the TEDs, and finally calculate the error relative to the actual test effort value. The steps of the algorithm used to estimate the STE are as follows:
Step 1
Prepare a software requirement specification document for the project/module under consideration.
Step 2
Estimate the KLOC for the project/module with the project team.
Step 3
Estimate the factors influencing the confidence value (C) (Srivastava et al., 2011). When assessing a software project, organisations generally rely on the project manager’s estimate, which is why some projects run over or under budget, or run into problems at the time of software release. To address this problem, calculate the confidence (C) of the manager. The confidence value depends on the following factors:
• weighted mean experience of the team (measured in number of years) covering various aspects of the team such as application experience, programming language experience, platform experience, etc., depending on the nature of the project
• team capability (measured as the success rate of the project manager, normalised to a number between 0–10) along with a weighted mean of the project analyst capabilities and programmer capabilities, depending on the nature of project
• familiarity of the project team (measured as the number of similar projects they have done in the past)
• use of software tools and disciplined methods (rated on a scale of 0–10).
Note that these factors can be altered as per the needs of the testing team but the methodology remains the same.
Step 4
Calculate the value of C by the development of a fuzzy rule base, max-min composition and defuzzification using the centre of gravity (COG) method (Ganesh, 2006; Rajasekaran and Vijayalakshmi, 2004).
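The computation in Step 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the two inputs (team capability and tool use, both on the 0–10 scale), the trapezoidal term shapes, the output terms for C and the rule "the output term is the weaker of the two input terms" are all hypothetical and are not the rule base used in this paper. The sketch uses min for rule firing, max for aggregation and a sampled centre of gravity for defuzzification.

```python
# Illustrative Mamdani-style inference for the confidence value C.
# All term shapes and rules below are assumed for the example.

LEVELS = ("low", "medium", "high")

# Input terms on the 0-10 rating scale (assumed trapezoids a, b, c, d).
IN_TERMS = {"low": (0, 0, 2, 5), "medium": (2, 4, 6, 8), "high": (5, 8, 10, 10)}
# Output terms for C on [0, 1] (assumed trapezoids).
OUT_TERMS = {"low": (0.0, 0.0, 0.2, 0.5),
             "medium": (0.2, 0.4, 0.6, 0.8),
             "high": (0.5, 0.8, 1.0, 1.0)}


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify(value):
    """Membership grade of a crisp 0-10 rating in each linguistic term."""
    return {name: trapezoid(value, *shape) for name, shape in IN_TERMS.items()}


def confidence(capability, tool_use, steps=1000):
    """Crisp confidence C from two 0-10 ratings (max-min composition + COG)."""
    mu_cap, mu_tool = fuzzify(capability), fuzzify(tool_use)
    strength = {name: 0.0 for name in LEVELS}
    for i, t_cap in enumerate(LEVELS):
        for j, t_tool in enumerate(LEVELS):
            fire = min(mu_cap[t_cap], mu_tool[t_tool])   # AND as min
            out = LEVELS[min(i, j)]                      # assumed rule: weaker term wins
            strength[out] = max(strength[out], fire)     # aggregate as max
    num = den = 0.0
    for s in range(steps + 1):                           # sampled centre of gravity
        c = s / steps
        mu = max(min(strength[name], trapezoid(c, *OUT_TERMS[name]))
                 for name in LEVELS)
        num += c * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired for these inputs")
    return num / den


c_high = confidence(10, 10)
c_low = confidence(0, 0)
```

With these assumed shapes, a team rated 10 on both inputs receives a higher confidence than a team rated 0 on both, and the result always stays strictly between 0 and 1. A real rule base would cover all four factors listed in Step 3.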
Step 5
Estimate a range for KLOC as follows: KLOC range = [min (KLOC), max (KLOC)] = [C * KLOC, KLOC/C]. Here, C is the factor for estimating the variation of the actual KLOC from the value of KLOC mandated by the project team, and it depends on the aforementioned factors. Multiplying KLOC by C gives the minimum value of KLOC and dividing KLOC by C gives the maximum value, which presumes that 0 < C ≤ 1.
Step 6
Using the two KLOC values from Step 5, estimate a range for SDE (E), i.e., [min (E), max (E)], using the COCOMO II model as follows: $E = 2.94 \times EAF \times (KLOC)^{b}$, where EAF is the effort adjustment factor derived from the cost drivers and b is an exponent derived from the five scale drivers (Somerville, 2005) (the cost and scale drivers are defined in the COCOMO II model). For a project in which all cost drivers and scale drivers are nominal, EAF = 1.00 and b = 1.0997.
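Steps 5 and 6 amount to the following arithmetic, sketched here in Python. The function names and the example inputs (KLOC = 5, C = 0.8) are hypothetical; the constants 2.94, EAF = 1.00 and b = 1.0997 are the nominal values quoted in Step 6, and C is assumed to lie in (0, 1].

```python
def kloc_range(kloc, c):
    """Step 5: [C * KLOC, KLOC / C], assuming 0 < C <= 1."""
    if not 0.0 < c <= 1.0:
        raise ValueError("confidence C is assumed to lie in (0, 1]")
    return c * kloc, kloc / c


def cocomo2_effort(kloc, eaf=1.00, b=1.0997, a=2.94):
    """Step 6: E = a * EAF * KLOC^b, in person-months."""
    return a * eaf * kloc ** b


def sde_range(kloc, c, eaf=1.00, b=1.0997):
    """Range [min(E), max(E)] obtained from the two KLOC bounds."""
    low, high = kloc_range(kloc, c)
    return cocomo2_effort(low, eaf, b), cocomo2_effort(high, eaf, b)


# Hypothetical example: team estimate of 5 KLOC with confidence 0.8.
min_kloc, max_kloc = kloc_range(5.0, 0.8)
min_e, max_e = sde_range(5.0, 0.8)
```

For this example the KLOC range is 4 to 6.25, and the effort bounds follow by applying the same equation to each end of that range.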
Step 7
Determine the TEDs and their influence on the software testing process for the module under consideration. Extremely project specific factors can be researched or a generalised list can be followed which comprises (Srivastava et al., 2011):
• schedule pressure (SP) is inversely proportional to STE
• software complexity (SC) is directly proportional to STE
• software quality (SQ) is directly proportional to STE
• developer’s characteristics (DC) is directly proportional to STE.
These factors can vary based on the nature of the project or the module under consideration.
Step 8
Determine the team estimated cycle time (a) and the management mandated cycle time (b) (Agrawal and Chari, 2007) and then calculate SP as: SP = (a – b)/a. SP is not used directly as the crisp input. Because SP is inversely related to the testing effort (if SP is low, the testing effort is high) whereas the other factors are directly related, we use SP′ = 1 – SP as the crisp input to keep all the drivers homogeneous (Agrawal and Chari, 2007).
Step 9
Estimate SC by assigning a value to it on the scale of 0–10.
Step 10
On a scale of 0–10, rate the factors (Bhattacharya et al., 2012): functionality (f), reliability (g), usability (h), flexibility (i), efficiency (j) and transferability (k), and then calculate SQ as: SQ = (f + g + h + i + j + k) / 6.
Step 11
Estimate the factors: equipment experience (m), measured in number of years; experience in project management (n), measured in number of years; types of development and support tools (o); and programming practices (p); and then calculate DC as: DC = [(m + n) / 2 + o + p] / 3.
Step 12
Calculate the percentage factor P for STE. Since it has been observed that STE is 40–50% of SDE, the value of P is assumed to lie in this range for most projects.
Step 13
To apply fuzzy multiple regression to calculate STE′, acquire the historical data for the projects and calculate the fitness membership function ($U_i$) for each dataset. From the values of SP′, SC, SQ, DC and STE, the normal distribution membership function is calculated as:
$$U_i = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{y_i - \mu}{\sigma}\right)^{2}}$$
where $y_i$ = ‘P’ values, σ = standard deviation of ‘P’ values = 0.070604763 and µ = average or mean of ‘P’ values = 0.558138462.
Step 14
To obtain the regression parameters $b_0, b_1, b_2, \ldots, b_k$ and the percentage P′ (using the parameters explained in Steps 8 to 11), apply the following equations (Marza and Seyyedi, 2009; Gu et al., 2006):
$$\begin{aligned}
S_{11}b_1 + S_{12}b_2 + \cdots + S_{1k}b_k &= S_{1y}\\
S_{21}b_1 + S_{22}b_2 + \cdots + S_{2k}b_k &= S_{2y}\\
&\;\;\vdots\\
S_{k1}b_1 + S_{k2}b_2 + \cdots + S_{kk}b_k &= S_{ky}
\end{aligned}$$
where
$$S_{ij} = \sum u \sum u x_i x_j - \sum u x_i \sum u x_j$$
$$S_{iy} = \sum u \sum u x_i y - \sum u x_i \sum u y, \qquad i, j = 1, 2, \ldots, k$$
$$b_0 = \frac{\sum u y}{\sum u} - b_1 \frac{\sum u x_1}{\sum u} - b_2 \frac{\sum u x_2}{\sum u} - \cdots - b_k \frac{\sum u x_k}{\sum u}$$
Here the sums run over the datasets and u is the membership value of each dataset.
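Steps 8 to 14 can be sketched in Python as follows. The helper names are hypothetical, the population standard deviation is assumed for σ, and the small Gauss-Jordan solver merely stands in for any linear solver. The data in the usage lines are invented for illustration (two drivers, with P generated from a known linear rule) and are not the dataset of this paper.

```python
import math


def ted_vector(a, b, sc, quality, m, n, o, p):
    """Return (SP', SC, SQ, DC) following Steps 8 to 11."""
    sp_prime = 1.0 - (a - b) / a           # Step 8: SP' = 1 - SP
    sq = sum(quality) / len(quality)       # Step 10: mean of the six ratings
    dc = ((m + n) / 2.0 + o + p) / 3.0     # Step 11
    return (sp_prime, sc, sq, dc)


def gaussian_membership(ys):
    """Step 13: normal-distribution membership U_i of each P value."""
    mu = sum(ys) / len(ys)
    # Population standard deviation (an assumption of this sketch).
    sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys))
    norm = sigma * math.sqrt(2.0 * math.pi)
    return [math.exp(-0.5 * ((y - mu) / sigma) ** 2) / norm for y in ys]


def solve(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting (fails if singular)."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fuzzy_mlr(xs, ys, us):
    """Step 14: [b0, b1, ..., bk] from the membership-weighted equations."""
    k = len(xs[0])
    su = sum(us)
    sux = [sum(u * x[i] for u, x in zip(us, xs)) for i in range(k)]
    suy = sum(u * y for u, y in zip(us, ys))
    s = [[su * sum(u * x[i] * x[j] for u, x in zip(us, xs)) - sux[i] * sux[j]
          for j in range(k)] for i in range(k)]
    sy = [su * sum(u * x[i] * y for u, x, y in zip(us, xs, ys)) - sux[i] * suy
          for i in range(k)]
    b = solve(s, sy)
    b0 = (suy - sum(bj * sj for bj, sj in zip(b, sux))) / su
    return [b0] + b


def predict(coeffs, x):
    """P' = b0 + b1*x1 + ... + bk*xk."""
    return coeffs[0] + sum(bj * xj for bj, xj in zip(coeffs[1:], x))


# Invented data: two drivers, P generated from a known linear rule.
xs = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
ys = [0.1 + 0.02 * x1 + 0.03 * x2 for x1, x2 in xs]
coeffs = fuzzy_mlr(xs, ys, gaussian_membership(ys))
teds = ted_vector(10, 8, 6, [6, 6, 6, 6, 6, 6], 4, 2, 5, 7)
```

Because the invented P values are exactly linear in the drivers, the fit should recover the generating coefficients regardless of the membership weights, which makes the example a convenient sanity check. In the algorithm itself, each row of `xs` would hold the four TED values (SP′, SC, SQ, DC) of one historical project.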
Step 15
Calculate the actual testing effort as: STE = P * SDE and the estimated testing effort as: STE′ = P′ * SDE′, where STE, SDE and P are the values calculated from the students’ project dataset (in-house development projects such as a login management system, a registration system, etc.) and STE′, SDE′ and P′ are the values predicted by the presented approach.
Step 16
Calculate the magnitude of relative error (MRE) as a performance metric for each data point, i.e., for each pair of actual testing effort (STE) and predicted testing effort (STE′):
$$MRE_i = \frac{\left|\text{Actual Effort}_i - \text{Predicted Effort}_i\right|}{\text{Actual Effort}_i}$$
A case study of the algorithm by using students’ projects is presented in the next section.
5 A case study
Project description: the case study involves students’ projects. The dataset was obtained from experimental results of projects undertaken by students of the Birla Institute of Technology and Science, Pilani and the Indian Institute of Management, Rohtak (most of the projects are in-house and are currently used by the students). For instance, one of the projects deals with the preparation of a GUI for user login and for updating the user database using Java and Oracle Database 10g. The projects vary from 1.8 to 9.1 KLOC and cover various programming languages such as Java, C, C++, PHP and Ruby. STE (actual testing effort, in person-months) is compared with STE′ (testing effort predicted by the presented approach), and MRE is calculated as a performance metric for each data point. The values are listed in Table 1.
Table 1 MRE values between STE and STE′
STE          STE′         MRE
234.2763     214.8786     0.082798
117.2198     89.34402     0.237808
116.5112     99.03691     0.14998
145.4406     106.8031     0.265658
76.9447      60.04885     0.219584
142.494      139.9913     0.017564
28.745565    26.42657     0.080676
19.84194     18.87789     0.048586
42.94775     38.48371     0.103941
6.325771     4.732243     0.251911
6.119541     6.254253     0.022013
16.99266     13.64813     0.196822
3.468962     2.739771     0.210204
Note: Average MRE = 0.145
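As a check on Table 1, the MRE of Step 16 can be recomputed from the listed STE and STE′ values. The short Python sketch below assumes the absolute-value form of MRE and copies the table values as printed; the variable names are illustrative.

```python
# (STE, STE') pairs copied from Table 1.
TABLE_1 = [
    (234.2763, 214.8786), (117.2198, 89.34402), (116.5112, 99.03691),
    (145.4406, 106.8031), (76.9447, 60.04885), (142.494, 139.9913),
    (28.745565, 26.42657), (19.84194, 18.87789), (42.94775, 38.48371),
    (6.325771, 4.732243), (6.119541, 6.254253), (16.99266, 13.64813),
    (3.468962, 2.739771),
]


def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


mres = [mre(actual, predicted) for actual, predicted in TABLE_1]
average_mre = sum(mres) / len(mres)
print(round(average_mre, 3))  # 0.145, matching the note under Table 1
```

The absolute value matters for the eleventh row, where the predicted effort exceeds the actual effort and the tabulated MRE is still positive.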
These values show that the variation between the actual and predicted STEs is small. The variation was much larger when we used traditional approaches such as COCOMO, test point analysis and Halstead software science. Because these are well-known approaches, we do not report their deviations here; we are interested only in the difference between the actual values and those predicted by our method. These numbers suggest that the presented algorithm is more useful than existing methods such as COCOMO or COCOMO II. Compared with the work of Srivastava et al. (2011), the differences are in the SDE value and in the use of COCOMO II instead of COCOMO. A further difference, and the main contribution of this paper, is the use of fuzzy multiple linear regression instead of simple fuzzy theory.
6 Conclusions
The aim of this research is to utilise a fuzzy approach and multiple regression in estimating STE. However, one of the greatest difficulties in using this algorithm is determining and fine-tuning the fuzzy rules, which depends on the exposure and experience of the decision maker and on the accuracy of the parameters estimated by the project team. We are working on applying other soft-computing techniques to address the discontinuities in the membership functions.
References
Agrawal, M. and Chari, K. (2007) ‘Software effort, quality, and cycle time: a study of CMM level 5 projects’, IEEE Transactions on Software Engineering, Vol. 33, No. 3, pp.145–156.
Aranha, E. and Borba, P. (2007) ‘An estimation model for test execution effort’, 1st International Symposium on Empirical Software Engineering and Measurement, pp.107–116, Madrid, Spain.
Bargielaa, A., Pedryczb, W. and Nakashimac, T. (2007) ‘Multiple regression with fuzzy data’, Fuzzy Sets and Systems, Elsevier, Vol. 158, No. 19, pp.2169–2188.
Beizer, B. (1990) Software Testing Techniques, 2nd ed., Van Nostrand Reinhold Company Limited, UK.
Bennatan, E.M. (2003) ‘So what is the state of software estimation?’, Cutter Consortium, February [online] http://www.cutter.com/research/2003/edge030211.html (accessed 13 December 2014).
Bhattacharya, P., Srivastava, P.R. and Prasad, B. (2012) ‘Software test effort estimation using particle swarm optimization’, India 2012, Advances in Intelligent and Soft Computing (AISC), Vol. 132, pp.827–835.
Boehm, B. (1981) Software Engineering Economics, Prentice-Hall, USA.
Boehm, B.W., Abts, C., Brown, A.W., Chulani, S., Clark, B.K., Horowitz, E., Madachy, R., Reifer, D. and Steece, B. (2000) Software Cost Estimation with COCOMO II, Prentice-Hall, USA.
Dawson, C.W. (1998) An Artificial Neural Network Approach to Software Testing Effort Estimation, Vol. 20, Information and Communication Technologies, Transaction of the Wessex Institute, UK.
Ganesh, M. (2006) Introduction to Fuzzy Sets and Fuzzy Logic, Prentice Hall of India Pvt. Ltd, India.
Gu, X., Song, G. and Xiao, L. (2006) ‘Design of a fuzzy decision-making model and its application to software functional size measurement’, International Conference on Computational Intelligence for Modeling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, pp.199–204, Sydney, Australia.
Halstead, M.H. (1978) ‘Software science – a progress report’, Second Software Life Cycle Management Workshop, Atlanta, GA, USA.
Hamer, P.G. and Frewin, G.D. (1982) ‘M.H. Halstead’s software science – a critical examination’, International Conference on Software Engineering, pp.197–206, Tokyo, Japan.
Harrold, M.J. (2000) ‘Testing: a roadmap’, 22nd International Conference on Software Engineering, Future of Software Engineering Track, Limerick, Ireland.
Jones, C. (1996) Applied Software Measurement, McGraw-Hill, USA.
Jorgensen, M. (2004) ‘Realism in assessment of effort estimation uncertainty: it matters how you ask’, IEEE Transactions on Software Engineering, Vol. 30, No. 4, pp.209–217.
Kushwaha, D.S. and Misra, A.K. (2008) ‘Software test effort estimation’, ACM SIGSOFT Software Engineering Notes, Vol. 33, No. 3, pp.1–6.
Marza, V. and Seyyedi, M.A. (2009) ‘Fuzzy multiple regression model for estimating software development time’, International Journal of Engineering Business Management, Vol. 1, No. 2, pp.79–82.
Nagappan, N. (2004) ‘Toward a software testing and reliability early warning metric suite’, 26th International Conference on Software Engineering, pp.60–62, Edinburgh, Scotland, UK.
Nageswaran, S. (2001) ‘Test effort estimation using use case points’, Quality Week, June, San Francisco, California, USA.
Nau, P. and Randell, B. (1968) Software Engineering: Report of A Conference Sponsored by the NATO Science Committee, Scientific Affairs Division, NATO, Garmisch, Germany.
Pressman, R.S. (2004) Software Engineering: A Practitioner’s Approach, 6th ed., McGraw-Hill, USA.
Rajasekaran, S. and Vijayalakshmi, G.A. (2004) Neural Networks, Fuzzy Logic and Genetic Algorithms: Synthesis and Applications, Prentice-Hall of India Pvt. Ltd, India.
Rubin, H. (1995) Worldwide Benchmark Project Report, Rubin Systems Inc., USA.
Sheta, A., Rine, D. and Ayesh, A. (2008) ‘Development of software effort and schedule estimation models using soft computing techniques’, IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp.1283–1289, Hong Kong.
Somerville, I. (2005) Software Engineering, 7th ed., Pearson Education, India.
Srivastava, P.R., Kumar, S., Singh, A.P. and Raghurama, G. (2011) ‘Software testing effort: an assessment through fuzzy criteria approach’, Journal of Uncertain Systems, World Academic Press, UK, Vol. 5, No. 3, pp.183–201, ISSN: 1752-8909.
Srivastava, P.R., Saggar, S., Singh, A.P. and Raghurama, G. (2009) ‘Optimization of software testing effort using fuzzy logic’, International Journal of Computer Sciences and Engineering Systems, Vol. 3, No. 3, pp.179–184.
Xu, S. and Rajlich, V. (2004) ‘Cognitive process during program debugging’, 3rd IEEE International Conference on Cognitive Informatics, pp.176–182, Victoria, Canada.
Zadeh, L.A. (1965) ‘Fuzzy sets’, Information and Control, Vol. 8, No. 3, pp.338–353.
Appendix
SDE      software development effort
STE      software testing effort
COCOMO   constructive cost model
TEDs     test effort drivers
KLOC     kilo lines of code.