Proceedings of National Conference on Challenges & Opportunities in Information Technology (COIT-2007) RIMT-IET, Mandi Gobindgarh. March 23, 2007.
Software Testing Product Metrics – A Survey

Dr. Arvinder Kaur (1), Mrs. Bharti Suri (2), Ms. Abhilasha Sharma (3)

(1) Reader, University School of Information Technology, GGS Indraprastha University, Kashmere Gate, Delhi, [email protected]
(2) Lecturer, University School of Information Technology, GGS Indraprastha University, Kashmere Gate, Delhi, [email protected]
(3) M.Tech Fellow (GATE Scholar), University School of Information Technology, GGS Indraprastha University, Kashmere Gate, Delhi, [email protected]
Abstract – Metrics are gaining importance and acceptance in the corporate sector as organizations grow, mature and strive to improve enterprise quality. Measurement of the test process is a required competence for an effective software test manager in designing and evaluating a cost-effective test strategy. A number of metrics have been proposed by various researchers in the area of software testing. They enable us to detect trends and anticipate problems, with probable solutions, in effective cost control, quality improvement, and time and risk reduction; thus, they help ensure and achieve optimal business objectives in a globally competitive market. This paper, focusing on software testing product metrics, surveys, classifies and systematically analyzes the various metrics proposed by researchers during the last decades. The advantages and disadvantages of each product metric, along with its need/purpose, suitability, effect, data calibration and interpretation, are also discussed.

Keywords – Software Testing, Software Testing Metrics, Software Testing Product Metrics.

1. INTRODUCTION
In recent years software testing technologies have emerged as a dominant software engineering practice that helps in effective cost control, quality improvement, and time and risk reduction. The growth of testing practices has required software testers to find new ways of estimating their projects. A key research area in this field has been measurement of, and metrics for, software testing. Measurement plays a critical role in effective and efficient software development; however, making measurements of the software development and test process is very complex [2]. Research in proposing, applying, validating and extending metrics for software testing has therefore reached a critical mass, as is evident from various publications, presentations and products [2][9][16]. However, without the benefit of a bird's-eye perspective, researchers in different groups have continued to discover and propose metrics. For instance, Schneidewind [15] in 1994 and Chen et al. [2] in 2004 proposed metrics that deal with the product level, whereas Kan et al. [7] in 2001 and Nagappan [10] in 2005 proposed metrics that deal with the process level in software testing.

The aim of this paper, therefore, is to survey the proposed product metrics for software testing categorically, so as to understand the research conducted so far in this area; this research is overviewed in Section 2. Section 3 gives a perspective on software testing and demonstrates the need for and importance of measurement and the usefulness of software metrics, whereas Section 4 identifies the requirement for metrics in software testing and systematically classifies the various product-level metrics. The complete set of metrics is captured from a survey of the literature. Conclusions drawn from the literature survey, the classification and their analysis are highlighted in Section 5.

2. OVERVIEW OF RELATED RESEARCH AND PROJECTS

A number of useful measurements related to product-level testing metrics have been reported in the literature. Stark, Durst and Pelnik [17] presented a set of metrics used during the testing of NASA's MCCU. Their use of the metric set throughout the testing effort led to identifying risks and problems early in the test process, minimizing the impact of problems. Further, by having a metric set with good coverage, managers were provided with more insight into the causes of problems, improving the effectiveness of their responses. Rosenberg, Hammer and Huffman [14] used an automated tool to track requirements and their test cases, which opened the door to the use of new requirements and testing measures. The tool can be used to point out requirements that may be subject to testing problems. Based on their research work, they conclude that metrics are available in the requirements phase to assess test plans. Their work discusses the efforts to evaluate testing for the requirements. Schneidewind emphasizes a unified product evaluation [15].
His conclusion, based on both predictive and retrospective use of reliability, risk and test metrics, was that it is feasible to measure and assess product quality.

Yanping Chen, Probert and Robenson [2] presented a real-life project conducted by the IBM ECD test team to evaluate a set of metrics. Based on the results of the project, they analyzed the effectiveness of a set of complementary metrics for cost, time and quality in measuring the quality of the test process. Their future work includes evaluating new metrics to measure the performance of individual test phases, giving suggestions to test teams and support teams for necessary changes in the test process, and implementing the whole set of metrics in a production test environment.

3. SOFTWARE TESTING AND SOFTWARE METRICS

A. Software Testing Guidelines
Software testing is the technical kernel of software quality engineering. Developing critical and complex software systems requires not only complete, consistent and unambiguous design and implementation methods, but also a suitable testing environment that meets certain requirements to face the complexity issues [3]. Traditional methods are not effective in understanding software's overall complex behavior, as they are time consuming and insufficient for the dynamism of modern business. Today's innovative software testing technologies, however, enable the creation of more sophisticated quality assurance. We create tests, run them and report the progress and results of our work. When a new version of the product comes, we might reuse some old tests, check whether bugs have been fixed, and/or try new tests. Eventually, the product is either cancelled or put into production.

Testing is the process of executing a program with the intention of finding faults [1]. Managers traditionally have evaluated test progress using programmer and test team intuition [14]. The IEEE/ANSI standards define testing as "the process of operating a system or component under specified conditions, observing or recording the results and making an evaluation of some aspect of the system or component" and as "the process of analyzing a software item to detect the differences between existing and required conditions and to evaluate the features of the software item".

The question arises: why do we test? The two main reasons are to make a judgment about quality or acceptability and to discover problems [6]. Although testing is an expensive activity, launching software without testing may lead to a cost potentially much higher than that of testing [1]. The primary advantage of testing is that the software being developed can be executed in its appropriate environment, and the results of these executions with the test cases provide confidence that the software will perform as intended and satisfy the user's requirements [12].

B. Measurement – Need and Importance
An eminent physicist, Lord Kelvin, said: "When you can measure what you are speaking about, and can express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the stage of science." Hence, everything should be measurable; if it is not, we should make an effort to make it so [1]. Measurement lies at the heart of many systems that govern our lives. It plays a critical role in effective software development and provides the scientific basis for software testing. Measurement can be defined as the process of empirical, objective assignment of numbers to properties of objects or events of the real world in such a way as to describe them [5]. It is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to characterize them according to clearly defined rules [4]. Fundamental measurement is a means by which numbers can be assigned according to natural laws to represent a property, and yet does not presuppose measurement of any variables other than the one being measured [18]. An effective measurement activity should be able to evaluate the current process and provide suggestions to the manager for future improvement [2].

C. Software Metrics and Their Usefulness
In applying measurement to software engineering, several types of metrics are available, called software metrics. A metric can be defined as "the continuous application of measurement-based techniques to the software development process and its products to supply meaningful and timely management information, together with the use of those techniques to improve that process and its products" [1]. Schulmeyer defines a metric as "a quantitative measure of the degree to which a system, component or process possesses a given attribute" [16].

Software metrics are used to evaluate the software development process and the quality of the resulting product [17]. Software metrics aid evaluation of the testing process and the software product by providing objective criteria and measurements for management decision making. Their association with early detection and correction of problems makes them important in software. Software metrics are all about measurement, which in turn involves numbers: the use of numbers to make things better, to improve the process of developing software and to improve all aspects of the management of that process.
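To make the notion of a "quantitative measure of an attribute" concrete, the sketch below computes a classic product metric, defect density (defects per thousand lines of code). The function name and the sample figures are illustrative only and are not drawn from any of the surveyed studies.

```python
def defect_density(defects_found: int, size_kloc: float) -> float:
    """Quantitative measure of a quality attribute: defects per KLOC."""
    if size_kloc <= 0:
        raise ValueError("size must be positive")
    return defects_found / size_kloc

# Illustrative figures only: 42 defects found in a 12.5 KLOC product.
print(round(defect_density(42, 12.5), 2))  # 3.36
```

Tracked release over release, such a ratio is exactly the kind of number-based management information the metric definitions above describe.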
4. SOFTWARE TESTING METRICS

Software metrics are applicable to the whole development life cycle, from initiation, when costs must be estimated, to monitoring the reliability of the end product in the field and the way that product changes over time with enhancement. Software metrics hold particular importance in the testing phase, as software testing metrics act as indicators of software quality and fault proneness. Testing metrics exhibit trends and characteristics over time that are indicative of the stability of the process [11]. The essential step in establishing test metrics is to identify the key software testing processes that can be objectively measured.

A. Test Metrics – Need and Importance
A major percentage of software projects suffer from quality problems, which in turn requires new testing metrics to measure test processes effectively. Test metrics are key "facts" that project managers can use: (i) to understand their current position and (ii) to prioritize their activities to reduce the risk of schedule over-runs on software releases.

Test metrics are a powerful risk management tool that helps us measure current performance. The importance of having test metrics [13]: (i) they provide a basis for estimation and facilitate planning for closure of the performance gap; (ii) they provide a means for control/status reporting; (iii) they identify risk areas that require more testing; (iv) they quickly identify and help resolve potential problems and identify areas of improvement; (v) they provide an objective measure of the effectiveness and efficiency of testing.

Software metrics are used to provide quantitative checks on software design. They include information about productivity, quality and product or process effectiveness. Use of software metrics does not ensure survival, but it improves the probability of survival.

Table 1: Classification of metrics
- Product metrics: describe the characteristics of the product. Examples: size, performance, efficiency.
- Process metrics: describe the effectiveness and quality of the process that produces the software product. Examples: effort, time, no. of defects found during testing.
- Project metrics: describe the project's characteristics and execution. Examples: no. of software developers, cost, schedule.

Some metrics belong to multiple categories; for example, a quality metric may belong to all three categories.

B. Software Testing Product Metrics
Software testing product metrics are used for measurement of testing parameters at the product level. As per the literature survey [2][7][8][10][13], and in general, the criteria for the classification of testing product metrics are as follows: (i) high-quality product, (ii) time-to-market, (iii) cost-to-market, (iv) parameters for a specific test phase, (v) test session efficiency and (vi) test focus. The salient features of each type of metric based on the above criteria are briefed in Table 2.

5. CONCLUSION

We have presented a survey of existing metrics proposed for software testing, focusing on product metrics. In particular, we have analyzed the metrics with respect to the coverage of effective cost control, quality improvement, and risk and time reduction. The survey does not concentrate on the specific concerns of individual metrics, as one metric is not sufficient to capture the quintessence of testing efficiency. Hence, the survey results indicate that a carefully chosen testing metric suite can be very beneficial in software product measurement and can provide useful information to test managers for decision making and future improvements. The survey provides a historical view and uncovers gaps in existing research. The presented literature is based on a uniform representation of existing metrics and is aided by storing the information in a structured repository for ease of processing.
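The classification in Table 1, including the observation that a single metric may belong to several categories at once, can be sketched as a small lookup structure. The metric names below are hypothetical examples, not entries from the survey.

```python
# Hypothetical sketch of the Table 1 classification using bit flags,
# so that one metric can carry more than one category.
from enum import Flag, auto

class Category(Flag):
    PRODUCT = auto()   # characteristics of the product (size, performance)
    PROCESS = auto()   # effectiveness/quality of the process (effort, defects found)
    PROJECT = auto()   # project characteristics (developers, cost, schedule)

METRICS = {
    "size_kloc": Category.PRODUCT,
    "defects_found_in_test": Category.PROCESS,
    "team_size": Category.PROJECT,
    # A quality metric may belong to all three categories at once:
    "quality": Category.PRODUCT | Category.PROCESS | Category.PROJECT,
}

def in_category(name: str, cat: Category) -> bool:
    """True if the named metric carries the given category flag."""
    return bool(METRICS[name] & cat)
```

Using a flag type rather than a single enum value is one way to model the multi-category membership noted under Table 1.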
Table 2: Comparison of Software Testing Product Metrics

Metrics 1-14 below were evaluated in 2004 [2]; metrics 15-22 were applied to the 815,362 SLOC of NASA's MCCU project in 1992 [17].

1. Quality of code (QC) – 2004
   Definition/Purpose: Captures the relation between the number of weighted defects and the size of the product release. Purpose: deliver a high-quality product.
   Formula: QC = (WTP + WF)/KCSI
   Effect: The lower the value of QC, indicating fewer or less serious defects found, the higher the quality of the code delivered.

2. Quality of the product (QP) – 2004
   Definition/Purpose: Shows the relation between the number of weighted defects shipped to customers and the size of the product release. Purpose: deliver a high-quality product.
   Formula: QP = WF/KCSI
   Effect: A low QP indicates that fewer defects, or less serious defects, were shipped, implying a higher quality of the code delivered by the test teams.

3. Test improvement (TI) – 2004
   Definition/Purpose: Shows the relation between the number of weighted defects detected by the test team during testing and the size of the product release. Purpose: deliver a high-quality product.
   Formula: TI = WTTP/KCSI
   Effect: The higher the number, indicating that more defects or more important defects were detected, the higher the improvement to product quality attributable to the test teams.

4. Test effectiveness (TE) – 2004
   Definition/Purpose: Shows the relation between the number of weighted defects detected during testing and the total number of weighted defects in the product. Purpose: deliver a high-quality product.
   Formula: TE = WT/(WTP + WF) x 100%
   Effect: The higher the TE, indicating that a higher ratio of defects or important defects was detected before release, the higher the effectiveness of the test organization in driving out defects.

5. Test time (TT) – 2004
   Definition/Purpose: Shows the relation between time spent on testing and the size of the product release. Purpose: decrease time-to-market.
   Formula: TT/KCSI
   Effect: The lower this number, the less time required by the test teams to test the product.

6. Test time over development time (TD) – 2004
   Definition/Purpose: Shows the relation between time spent on testing and time spent on developing. Purpose: decrease time-to-market.
   Formula: TT/TD x 100%
   Effect: The lower this number, the lower the amount of time required by the test teams to test the product compared to the development team.

7. Test cost normalized to product size (TCS) – 2004
   Definition/Purpose: Shows the relation between resources or money spent on testing and the size of the product release. Purpose: decrease cost-to-market.
   Formula: CT/KCSI
   Effect: The lower this number, the lower the cost required to test each thousand lines of code.

8. Test cost as a ratio of development cost (TCD) – 2004
   Definition/Purpose: Shows the relation between the testing cost and the development cost of the product. Purpose: decrease cost-to-market.
   Formula: CT/CD x 100%
   Effect: The lower this number, the lower the cost required by the test teams to test the product compared to the development team.

9. Cost per weighted defect unit (CWD) – 2004
   Definition/Purpose: Shows the relation between money spent by the test team and the number of weighted defects detected during testing. Purpose: decrease cost-to-market.
   Formula: CT/WT
   Effect: The lower this number, the lower the cost of finding one defect unit, and the more cost-effective the test process.

10. Test improvement in product quality (per test phase) – 2004
    Definition/Purpose: Shows the relation between the number of weighted defects detected in one specific test phase and the size of the product release.
    Formula: WP/KCSI
    Effect: The higher this number, the higher the improvement to product quality contributed during this test phase.

11. Test time needed normalized to size of product (per test phase) – 2004
    Definition/Purpose: Shows the relation between time spent on the test phase and the size of the product release.
    Formula: TTP/KCSI
    Effect: The lower this number, the less time required for the test phase, relatively.

12. Test cost normalized to size of product (per test phase) – 2004
    Definition/Purpose: Shows the relation between resources or money spent on the test phase and the size of the product release.
    Formula: CTP/KCSI
    Effect: The lower this number, the lower the cost required to test each thousand lines of code in the test phase.

13. Cost per weighted defect unit (per test phase) – 2004
    Definition/Purpose: Shows the relation between money spent on the test phase and the number of weighted defects detected.
    Formula: CTP/WT
    Effect: The lower this number, the lower the cost of finding one defect unit in the test phase, and the more cost-effective this test phase.

14. Test effectiveness for driving out defects in each test phase – 2004
    Definition/Purpose: Shows the relation between the number of one type of defect detected in one specific test phase and the total number of that type of defect in the product.
    Formula: WD/(WD + WN) x 100%
    Effect: The higher this number, indicating that a higher ratio of defects or important defects was detected in the "appropriate" test phase, the higher the effectiveness of this test phase in driving out its target type of defects.

15. Software size – NASA's MCCU project, 815,362 SLOC, 1992
    Definition/Purpose: Measured by a count of source lines of code (SLOC). Goal: show the risks to the system over time.
    Formula: (none)
    Effect: Large increases in software size late in the development cycle often result in increased testing and maintenance activities.

16. Software reliability – NASA's MCCU project, 815,362 SLOC, 1992
    Definition/Purpose: The probability of failure-free operation of a computer program for a specified time in a specified environment.
    Formula: Z(t) = h exp(-ht/N)
    Effect: A decreasing failure rate represents growth in the reliability of the software.

17. Test session efficiency – NASA's MCCU project, 815,362 SLOC, 1992
    Definition/Purpose: Goal: identify trends in the effectiveness of the scheduled test time.
    Formula: SYSE = Active Test Time/Scheduled Test Time; TE = total no. of good runs/total no. of runs
    Effect: Both should be greater than 80%.

18. Test focus – NASA's MCCU project, 815,362 SLOC, 1992
    Definition/Purpose: Goal: identify the amount of effort spent finding and fixing real faults versus the effort spent either eliminating false defects or waiting for a hardware fix.
    Formula: TF = no. of DRs closed with a software fix/total no. of DRs
    Effect: In the ideal case, TF approaches unity as testing proceeds.

19. Software maturity – NASA's MCCU project, 815,362 SLOC, 1992
    Definition/Purpose: Goals: (i) quantify the relative stabilization of a software subsystem; (ii) identify any possible over-testing or testing bottlenecks by examining the fault density of the subsystem over time. Three components: T, O, H.
    Formula: T = total no. of DRs charged to a subsystem/1000 SLOC; O = no. of currently open subsystem DRs/1000 SLOC; H = active test hours per subsystem/1000 SLOC
    Effect: A graph of T versus H should begin with a near-infinite slope and approach a zero slope; otherwise a low-quality subsystem is indicated and should be investigated. A graph of O versus H indicates how rapidly faults are being fixed: it should begin with a positive slope and then, as debuggers begin to correct the faults, the slope should become negative.

20. Subprogram complexity – NASA's MCCU project, 815,362 SLOC, 1992
    Definition/Purpose: Goal: identify the complexity of each function and track the progress of functions with a relatively high complexity, as they represent the highest risk.
    Formula: % of functions with a complexity greater than a recommended threshold
    Effect: A positive trend indicates the need to re-evaluate the software change philosophy, possibly resulting in some re-design; a negative trend indicates the changed functions have been redesigned to reduce complexity and increase maintainability.

21. Test coverage – NASA's MCCU project, 815,362 SLOC, 1992
    Definition/Purpose: Goal: examine the efficiency of testing over time.
    Formula: % of code branches that have been executed during testing
    Effect: (none stated)

22. Computer resource utilization – NASA's MCCU project, 815,362 SLOC, 1992
    Definition/Purpose: Goal: estimate the utilized capacity of the system prior to operations to ensure that sufficient resources exist.
    Formula: (none)
    Effect: (none stated)

Legend for Table 2: WTP – no. of weighted defects found in the product under test (before official release); WF – no. of weighted defects found in the product after release; KCSI – no. of new or changed source lines of code, in thousands; WTTP – no. of weighted defects found by the test team in the test cycle of the product; WT – no. of weighted defects found by the test team during the product cycle; TT – no. of business days used for product testing; TD – no. of business days used for product development; CT – total cost of testing the product, in dollars; CD – total cost of developing the product, in dollars; WP – no. of weighted defects found in one specific test phase; TTP – no. of business days used for a specific test phase; CTP – total cost of a specific test phase, in dollars; WD – no. of weighted defects of this defect type detected in the test phase; WN – no. of weighted defects of this defect type that remain uncovered after the test phase (missed defects).
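The software reliability entry in Table 2 has the form of a Musa-style exponential failure-rate model. The sketch below assumes the exponent is negative, as required for the decreasing failure rate described in that entry's Effect, and uses invented parameter values.

```python
import math

def failure_rate(t: float, h: float, n: float) -> float:
    """Z(t) = h * exp(-h*t/N): failure rate after t units of test time,
    given initial failure rate h and N faults inherent in the program."""
    return h * math.exp(-h * t / n)

# Invented parameters: h = 2 failures/day, N = 100 inherent faults.
# Z(0) equals h, and the rate decays monotonically as testing proceeds,
# which is the reliability growth the metric is meant to show.
rates = [failure_rate(t, 2.0, 100.0) for t in (0, 50, 100)]
```

Plotting such a curve against actual failure observations is one way a test manager could check whether reliability is growing as expected.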
Legend for Table 2 (continued): Z(t) – instantaneous failure rate; h – failure rate prior to the start of testing; N – no. of faults inherent in the program prior to the start of testing; SYSE – system efficiency; TE – tester efficiency; DR – discrepancy report; TF – test focus; T – total density; O – open density; H – test hours.

REFERENCES

[1] K. K. Aggarwal and Yogesh Singh, "Software Engineering: Programs, Documentation, Operating Procedures", Second Edition, New Age International Publishers, 2005.
[2] Yanping Chen, Robert L. Probert, Kyle Robenson, "Effective Test Metrics for Test Strategy Evolution", Proceedings of the 2004 Conference of the Centre for Advanced Studies on Collaborative Research (CASCON'04), 2004.
[3] P. Dhavachelvan, G. V. Uma, V. S. K. Venkatachalapathy, "A New Approach in Development of Distributed Framework for Automated Software Testing Using Agents".
[4] N. E. Fenton and S. L. Pfleeger, "Software Metrics: A Rigorous and Practical Approach", Second Edition, Revised, Boston: PWS Publishing, 1997.
[5] L. Finkelstein, "Theory and Philosophy of Measurement", in Theoretical Fundamentals, Vol. 1, Handbook of Measurement Science, P. H. Sydenham, Ed., Chichester: John Wiley & Sons, 1982, pp. 1-30.
[6] Paul C. Jorgensen, "Software Testing: A Craftsman's Approach", Second Edition, CRC Press, 2002.
[7] S. H. Kan, J. Parrish and D. Manlove, "In-process Metrics for Software Testing", IBM Systems Journal, Vol. 40, No. 1, 2001.
[8] Stephen H. Kan, "Metrics and Models in Software Quality Engineering", Second Edition, 2002.
[9] Cem Kaner, "Software Engineering Metrics: What Do They Measure and How Do We Know?", 10th International Software Metrics Symposium (METRICS 2004), 2004.
[10] N. Nagappan, "Toward a Software Testing and Reliability Early Warning Metric Suite", Proceedings of the 26th International Conference on Software Engineering (ICSE'04), 2004.
[11] Hideto Ogasawara, Atsushi Yamada, Michiko Kojo, "Experiences of Software Quality Management Using Metrics through the Life Cycle", Proceedings of ICSE-18, 1996.
[12] L. Osterweil, "Strategic Directions in Software Quality", ACM Computing Surveys, Vol. 28, No. 4, 1996, pp. 738-750.
[13] Ramesh Pusala, "Operational Excellence through Efficient Software Testing Metrics", Infosys, 2006.
[14] Linda H. Rosenberg, Theodore F. Hammer, Lenore L. Huffman, "Requirements, Testing, and Metrics", NASA GSFC, 1998.
[15] Norman F. Schneidewind, "Measuring and Evaluating Maintenance Process Using Reliability, Risk, and Test Metrics", IEEE Transactions on Software Engineering, Vol. 25, No. 6, 1999.
[16] G. Gordon Schulmeyer, James I. McManus, "Handbook of Software Quality Assurance", 3rd Edition, Prentice Hall PTR, Upper Saddle River, NJ, 1998.
[17] George E. Stark, Robert C. Durst, Tammy M. Pelnik, "An Evaluation of Software Testing Metrics for NASA's Mission Control Center", 1992.
[18] W. S. Torgerson, "Theory and Methods of Scaling", New York: John Wiley & Sons, 1958.