Software Qual J (2006) 14: 135–157 DOI 10.1007/s11219-006-7599-x
Utilization of statistical process control (SPC) in emergent software organizations: pitfalls and suggestions

K. U. Sargut · O. Demirörs

K. U. Sargut: Computer & Information Sciences & Engineering Department, University of Florida, Gainesville, FL, US; e-mail: [email protected]
O. Demirörs: Informatics Institute, Middle East Technical University, Ankara, Turkey; e-mail: [email protected]
© Springer Science + Business Media, Inc. 2006
Abstract  Common wisdom in the domain of software engineering tells us that companies should be mature enough to apply Statistical Process Control (SPC) techniques. Since reaching high maturity levels (in CMM or similar models such as ISO 15504) usually takes 5–10 years, should software companies wait years to utilize Statistical Process Control techniques? To answer this question, we performed a case study of the application of SPC techniques using existing measurement data in an emergent software organization. Specifically, the defect density, rework percentage and inspection performance metrics are analyzed. This paper provides practical insight into the usability of SPC for the selected metrics in the specific processes and describes our observations on the difficulties and the benefits of applying SPC to an emergent software organization.

Keywords  Statistical process control · Control chart · Defect density · Rework percentage · Inspection performance
1. Introduction

SPC (Statistical Process Control) has been widely used in manufacturing industries to control and improve processes (Sutherland et al., 1992). While the benefits of SPC are well founded for manufacturing companies, there have been many debates (Card, 1994; Kan, 1995; Lantzy, 1992) about its application in the software industry. Some of the inherent characteristics of software such as invisibility and complexity (Brooks, 1987) usually entail subjective judgment in collecting and interpreting process and product measures, even when these measures
are well defined. This causes variation in the data and makes it difficult to apply SPC techniques in software. Nevertheless, the interest in applying SPC techniques in the software industry has grown as more organizations improve their processes by utilizing models such as the Capability Maturity Model (CMM) (Paulk et al., 1993), Capability Maturity Model Integration (CMMI) (CMMI product team, 2001) and SPICE (ISO/IEC 15504-4, 1998). Process improvement models implicitly direct software companies to implement SPC as a crucial step for project level process control and organizational level process improvement purposes. In CMM, the Quantitative Process Management key process area at level 4 requires establishing goals for the performance of the project's defined software process, taking measurements of the process performance, analyzing these measurements, and making adjustments to maintain process performance within acceptable limits. Similarly, in CMMI, the Organizational Process Performance and Quantitative Project Management process areas require establishing process performance baselines for the organization's standard software processes and quantitatively managing the project's defined process to achieve the project's established quality and process performance objectives. The SPICE Measurement attribute requires a software firm to analyze trends in process performance and maintain process capability within defined limits across the organization. Likewise, a level 4 process attribute called Process Control necessitates control of process performance to maintain control and implement improvement. Although these models direct software organizations to apply SPC techniques, the existence of SPC related requirements at high maturity levels (level 4 and above) and the common-sense belief that SPC can only be performed after achieving level 4 almost prohibit organizations from implementing SPC techniques earlier.

The critical issues for successful SPC implementation are process stability, measurement capability and data reliability. In other words, if the process is performed consistently, the right metrics are selected and a reliable data collection mechanism is established, it may be possible to benefit from implementing SPC techniques. In this study we focused on the question "Can emergent software organizations (organizations which are small to medium size and have CMM maturity levels three and below) utilize SPC techniques and benefit from the results?" To answer this question we performed a case study in an emergent organization (Sargut, 2003). During the case study we collected the existing metric data related to company-specific procedures. By utilizing control charts, we observed practical evidence of the benefits and difficulties of applying SPC. In this paper we present the findings of this case study.

In the second section, we review the studies related to the utilization of SPC in the software industry. In the third section, we describe major problems in implementing SPC and provide possible solution methods related to the specific metrics that we analyzed during the case study. In the fourth section, we give details of the case study and demonstrate our practices on the defect density, rework percentage and review performance metrics. In the fifth and sixth sections, we discuss our findings and present our conclusions.
2. Utilization of SPC in software development

The utilization of SPC in software development has been studied by a number of researchers. Some of these studies are based on process improvement models and focus on SPC as a way of achieving high process maturities (Burr and Owen, 1996; Florac and Carleton, 1999; Carleton and Florac, September 1999; Florac et al., 2000; Florac et al., 1997; Jakolte and Saxena, 2002; Humphrey, 1989). These model-based studies mostly represent the CMM understanding of SPC for software process improvement. One of these studies belongs to
Humphrey, who describes a framework for software process management, outlines the actions needed to reach higher maturity levels and provides a basic guide for improving processes in a software organization (Humphrey, 1989). In his work, SPC appears as a means of data analysis for level 4 organizations. Florac and Carleton describe the utilization of SPC in the context of CMM for software process improvement (Florac and Carleton, 1999). They provide detailed technical information about SPC and a roadmap for SPC implementation. They mostly focus on the principles and the methods for evaluating and controlling process performance based on Shewhart's principles (Shewhart, 1939). They also discuss issues related to the application of control charts in software development and incorporate the experience gained from the manufacturing industries into software processes. With all these various dimensions, the study can be regarded as a CMM-based SPC guideline for software development. SPC was one of the important themes of the 2001 High Maturity Workshop (Paulk and Chrissis, 2002). Thirty-five high maturity (CMM level 4 and 5) organizations took part in the workshop, which was held to investigate the practices characterizing CMM for Software level 4 and 5 organizations. One of the workshops in this CMM-based study demonstrated that level three measures are not appropriate for implementing SPC. It also showed that the processes for establishing the process performance baselines should be maintained at low levels so that the organization-wide baselines can be useful in different projects. Some of the studies act as guidelines for using SPC by translating general SPC principles specifically to software development (Burr and Owen, 1996; Florac and Carleton, 1999). Burr and Owen provide such a guideline by describing the statistical techniques currently available for managing and controlling the quality of software during specification, design, production and maintenance (Burr and Owen, 1996). They focus on control charts as beneficial SPC tools and provide guidelines for measurement, process improvement and process management within the software domain. Lantzy presents one of the earliest studies in the debate on applying SPC to software processes (Lantzy, 1992). He outlines a seven-step guideline for successful SPC implementation in a software organization. This study reveals some important points for the application of SPC to software processes:
- Metrics should correlate to the quality characteristics of the products that are defined by the customer.
- Metrics should be selected for the activities that produce tangible items.
- SPC should be applied only to the critical processes.
- The processes should be capable of producing the desired software product.

Kan provides a comprehensive study of the theory and application of metrics in software development (Kan, 1995). It emphasizes the difficulty of achieving process capability in the software domain and is cautious about SPC implementation. He mentions that the use of control charts can be helpful for a software organization, especially as a supplementary tool to quality engineering models such as defect models and reliability models. However, it is not possible to provide control as in manufacturing, since the parameters being charted are usually in-process measures that do not represent final product quality. Unlike in the manufacturing industry, final product quality can only be measured at the end of a project, so on-time control of processes becomes impossible. He also underlines the necessity of maturity for achieving process stability in software development. Finally, he offers a more relaxed interpretation by stating that the processes can be regarded as in control when the project meets in-process targets and achieves end-product quality goals.

Radice describes SPC techniques constrained within the software domain and gives a detailed tutorial by supporting his theoretical knowledge with practical experiences
(Radice, 1998). He states that not all SPC techniques may be applicable to software processes and gives XmR (X and moving range) and u-charts as possible techniques. He also explains the relevance of SPC for CMM level 4 and regards backing off from control charts at level 4 as a mistake. He states five problems with control charts: too much variation, unnecessary use of control charts, lack of enough data, lack of specification limits from the clients, and the idea that control charts cannot be used with software processes.

The literature also includes studies that present practical experiences with SPC in the software domain (Card, 1994; Florac et al., 2000; Jakolte and Saxena, 2002; Weller, 2000). Weller provides one of the rare practical examples by presenting the SPC implementation details from a software organization (Weller, 2000). He uses XmR charts for the lines of code inspected per hour for each inspection and achieves a stable inspection process after removing the outliers from the dataset. Then he draws the u-chart for the defect density data for each inspection. By using these findings, he makes reliable estimations of inspection effectiveness and gains insight into when to stop testing. Florac et al. present their analysis findings from a collaborative effort between the SEI (Software Engineering Institute) and the Space Shuttle Onboard Project (Florac et al., 2000). They first emphasize the importance of selecting key processes, providing operational definitions, addressing the issues of data homogeneity and rational sub-grouping, using the correct control charts, understanding multiple-cause and mixed-cause systems, finding and testing the trial limits and recalculating the limits. Then they perform an SPC study on the package review data by implementing each of these factors. Finally, they summarize the benefits of applying SPC to software processes. Card discusses the utilization of SPC for software by also considering some of the objections and mentioning possible implementation problems (Card, 1994). He gives a control chart example for the testing effectiveness measure and concludes that SPC principles can be beneficial for a software organization even if formal statistical control techniques are not used. Jakolte and Saxena move beyond the idea of 3-sigma control limits and propose a model for calculating control limits that minimizes the cost of type 1 and type 2 errors (Jakolte and Saxena, 2002). This is a pioneering study as it questions an accepted practice for control charts, and the results of the sample studies are encouraging. However, the study is far from being practical as it includes too many parameters and assumptions.

The usability of SPC in software development was also discussed in a panel named "Can Statistical Process Control be Usefully Applied to Software" at the European SEPG conference in 1999. Keller mentions the importance of SPC for management decision making and forecasting (Keller, 1999). He also outlines the observations and results for a specific SPC implementation. Barnard and Carleton emphasize the mixed behavior of processes (Barnard and Carleton, 1999), referencing their experiences on the Space Shuttle Onboard Project (Florac et al., 2000). Hirsch states that SPC charts should be used to gather the desired benefits and that managers should be trained to ensure the use of SPC (Hirsch, 1999).
She also mentions the necessity of embracing the value of metrics, having a defined and repeatable process, having a defined and repeatable metric program and having curiosity about metrics in order to usefully apply SPC to software. Meade presents a synopsis of the SPC utilization carried out as part of the level 4 implementation at Lockheed Martin Corp. (Meade, 1999). She specifies the importance of understanding the data and reveals that it is not possible to apply SPC to all software metrics. She finally portrays the results of the SPC study in the company, showing that the smaller programs are better able to perform SPC. Wigle questions SPC implementation
in software organizations and stresses the importance of applying SPC at a level where decision making occurs (Wigle, 1999). Heijstek shares his experiences with statistical studies at Ericsson and identifies the lack of data quality as the main analysis problem (Heijstek, 1999). As reported by Paulk and Carleton (Paulk and Carleton, 1999), Card emphasizes the importance of process stability and identifies the lack of well-defined business objectives to guide collection and analysis as the major problem.

We observe that the difficulties of using SPC techniques in the software domain are frequently discussed by researchers. Nevertheless, most of these studies contain very little information about implementation details, and examples are usually restricted to defect density and inspection effectiveness measurements. Moreover, the existing practical evidence represents only the state of high maturity organizations. SPC utilization in emergent organizations is discussed by very few researchers, but without practical evidence. Florac and Carleton state that it is possible to benefit from SPC at low levels, but that it is advisable to have basic measurement practices in place before using SPC (Carleton and Florac, September 1999). On the other hand, Radice supports utilization of SPC at lower levels once the process is executed consistently within its definition and sufficient data are provided (Radice, 1998). Although the current resources do not provide practical evidence relevant to our case study, we established a good understanding of SPC in a software organization by reviewing the available literature. In this regard, our survey revealed that:
- SPC might not be applicable to all software processes.
- SPC should only be applied to the critical processes in a software organization.
- Not all SPC techniques are applicable to software processes.
- The processes should be well defined and stable so that we can apply SPC techniques successfully.
3. Selected metrics

At the beginning of the case study, we decided on the metrics for which we were going to perform an SPC analysis. We first analyzed the company status and selected the rework effort, defect density and review performance metrics after discussions with the software engineering process group. Then we carried out theoretical research to establish a firm understanding of these measures. We outlined the general characteristics of each measure including its definition, its importance for software development, collection principles, various implications of the metric information, difficulties in the measurement processes, data analysis guidelines, and possible uses of the control charts.

3.1. Defect density

The number of defects in a work product is an important measure as it provides intuition about how well the customer will be satisfied (from post-release defects), how much rework will be performed, how efficient our inspection processes are, which processes need improvement, and which components of the system are error prone. Therefore, defect counts provide evidence not only on the quality of the product, but also on the quality of the related processes.
In order to provide a basis for comparison between different software products, defect data are normalized by adding a size dimension to the analysis. The resulting metric is defect density, which is defined as:

    defect density = number of defects / product size
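As a worked illustration (a minimal Python sketch with invented numbers, not data or tooling from the case study), the normalization is a simple ratio per work product, with the size measure chosen according to the product type:

```python
def defect_density(defect_count: int, product_size: float) -> float:
    """Defects found per unit of product size (e.g. per requirement or per page)."""
    return defect_count / product_size

# Hypothetical figures for two document types; the size measures mirror the
# choices made later in the case study (requirements counted for an SRS,
# pages counted for an SDD).
print(defect_density(12, 150))   # SRS: defects per requirement
print(defect_density(30, 220))   # SDD: defects per page
```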
The analysis and interpretation of the defect density metric rely on the assumption that, on average, we expect a certain number of defects per unit size of a software artifact. As we want a successful inspection to find most of the existing defects, this measure is also an indicator of the effectiveness of the related inspection processes. Before starting to measure defect density, we established the necessary background to provide consistency among measurements. We found the following issues to be important concerns for a successful defect density analysis:
- A precise definition should be documented for each type of defect.
- The critical products for SPC analysis should be determined.
- The defect data should be categorized into specific groups (in terms of defect type, priority, severity, etc. (IEEE Std 1044-1993, IEEE Std 1044.1-1995)). The level of detail should be neither so high that the analysis becomes too difficult, nor so low that the results become meaningless.
- The defect origins (cause codes) should be recorded for each defect.
- The size measures should be precisely defined and measured separately for different work products.
- The charting technique should be determined very carefully. In most of the resources (Sutherland, 1992; Florac and Carleton, 1999; Radice, 1998; Weller, 2000), the u-chart is suggested for tracking defect density data. However, the u-chart depends on the assumption that the defect data follow a Poisson distribution. Therefore, it is very important to check the validity of this assumption by analyzing the metric data. Furthermore, the width of the control limits for the u-chart is inversely proportional to the square root of the sample size (product size). As a consequence, having 10 defects in a sample of size 20 is not treated as equivalent to having 100 defects in a sample of size 200, which is not reasonable for code and document defects. Although the chart may be useful for code defects if SLOC (software lines of code) is counted for software components with similar sizes, it is usually more appropriate to use XmR charts in the other cases.

Once the chart (u-chart or XmR chart) is drawn, we can comment on the reasons for unusual behaviors. Observations that exceed the upper limit may indicate a highly defective component, or a very successful inspection process. Likewise, a low defect density measure may be due to a high quality product with very few defects, or a poor inspection process. The determination of causes requires a detailed root-cause analysis, which should be carried out apart from the SPC analysis.
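To make the choice between the two chart types concrete, the following sketch (Python; the per-document defect counts and sizes are invented, and this is not the tooling used in the study) computes XmR (Individuals) limits from the moving range of the defect density values, and u-chart limits whose width varies with the square root of each item's size:

```python
from statistics import mean

# Hypothetical (defects, size) pairs for a set of documents, in time order.
documents = [(6, 40), (11, 55), (4, 32), (15, 70), (9, 48), (7, 35)]
densities = [d / s for d, s in documents]

# XmR (Individuals) chart: constant limits derived from the average moving range.
moving_ranges = [abs(a - b) for a, b in zip(densities, densities[1:])]
x_bar, mr_bar = mean(densities), mean(moving_ranges)
print(f"XmR: CL={x_bar:.3f}, UCL={x_bar + 2.66 * mr_bar:.3f}, "
      f"LCL={max(0.0, x_bar - 2.66 * mr_bar):.3f}")

# u-chart: one centre line, but limits whose width shrinks as the size grows.
u_bar = sum(d for d, _ in documents) / sum(s for _, s in documents)
for defects, size in documents:
    sigma = (u_bar / size) ** 0.5          # width is proportional to 1/sqrt(size)
    ucl = u_bar + 3 * sigma
    lcl = max(0.0, u_bar - 3 * sigma)
    status = "out of limits" if not (lcl <= defects / size <= ucl) else "in limits"
    print(f"size={size:3d}: u={defects / size:.3f}, UCL={ucl:.3f}, LCL={lcl:.3f} ({status})")
```

With the same defect density, a larger document gets tighter u-chart limits, which is exactly the behavior questioned above; the XmR limits treat every observation alike.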
3.2. Rework percentage

Rework is the total number of hours spent as a result of unplanned changes or errors (Pajerski and Sova, 1995). The amount of rework is a good indicator of the quality of processes as it shows the magnitude of effort spent due to previous errors. Rework increases project costs without adding any value to the product. Thus, it is possible to produce a product without any rework by doing the processes right the first time (Crosby, 1980), which is described as
"Cost of Quality." Even though eliminating all the rework is not quite possible in real life, this theoretical limit provides a threshold for assessing the processes. In order to provide a baseline for comparing the rework efforts of different process instances, a normalized measure, such as rework percentage, may be utilized for the related process:

    Rework Percentage = rework effort / total effort
Rework percentage provides an understanding of the relative amount of rework with respect to the total effort. Thus, it enables us to analyze any group of tasks representing a specific domain. For instance, the performance in a software domain comprising particular tasks (e.g. quality assurance or configuration management), the quality of activities in a certain project phase or the processes during the preparation of a work product can all be examined by looking at rework percentage values. However, a successful and meaningful analysis requires the consideration of some preliminary issues:
- Rework should be defined precisely.
- Metric data, including the rework data, should not be used for purposes other than the intended goal, such as judging individual performance.
- Enhancement effort should not be regarded and measured as part of the rework.
- For each rework process instance, the related process/project phase that causes the rework should be identified.

As soon as the mentioned issues are resolved and the relevant measurement data are collected, XmR charts can be drawn for SPC analysis. The analysis can be performed by comparing the rework percentages among different projects/project components at the end of each project phase. Thus, the charts will indicate the performance of a project component at a particular project phase. It is also possible to analyze the rework from a different perspective by finding the amount of rework related to a certain process or project phase. However, such an analysis necessitates the determination of defect causes. Alternatively, the rework percentage may be calculated on a periodic (e.g. monthly) basis independent of the project phases. Depending on the data on hand, the analysis may be carried out separately for different CSCIs (Computer Software Configuration Items), projects, and/or builds. However, this analysis is based on the assumption that the expected rework percentage is the same in the different periods within which the measurement is performed. In this analysis, values exceeding the upper limit may indicate deficiencies in project planning, poor performance of processes, high inspection performance, or a long period of testing and inspection processes. Points under the lower limit might be due to low inspection performance, superior process performance, or a very long project phase (we know that most of the rework is done towards the end of project phases, after inspections; if a project phase lasts too long and the inspections have not yet been performed within the measured period, the rework will accordingly be low). On the other hand, an increasing trend in the rework percentage may be due to problems in the earlier project phases or the previous inspection processes. In this case, more emphasis should be given to the ongoing inspections as an aid to avoiding future deficiencies. A decreasing trend may be the result of poor performance in current inspections, or might be indicative of high product quality. Obviously, the rework percentage and the defect density charts should be analyzed concurrently so that more reliable interpretations will be possible.
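As a sketch of the phase-end comparison described above (Python; the component names, effort figures and derived limits are all invented, and the XmR constants are the same as in the defect density example):

```python
from statistics import mean

# Hypothetical (rework effort, total effort) in person-hours at the end of
# one project phase, one observation per project component, in time order.
phase_effort = {
    "Component A": (40, 610),
    "Component B": (95, 720),
    "Component C": (55, 830),
    "Component D": (180, 540),   # deliberately high to trigger a signal
    "Component E": (60, 700),
}
rework_pct = {name: rw / total for name, (rw, total) in phase_effort.items()}

values = list(rework_pct.values())
mr_bar = mean(abs(a - b) for a, b in zip(values, values[1:]))
centre = mean(values)
ucl, lcl = centre + 2.66 * mr_bar, max(0.0, centre - 2.66 * mr_bar)

for name, value in rework_pct.items():
    if value > ucl:
        note = "above UCL: possible planning deficiency, poor process performance, or a long test/inspection period"
    elif value < lcl:
        note = "below LCL: possible low inspection performance, superior process, or a very long phase"
    else:
        note = "within limits"
    print(f"{name}: {value:.1%} ({note})")
```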
3.3. Inspection performance

Inspection is the formal review of a product to identify defects. It enables software specialists to find and correct errors before the product is released. Humphrey (Humphrey, 1989) shows that inspection is a powerful tool for improving product quality and for reducing rework. It is also used to reach agreement among the appropriate parties, to verify the product acceptance criteria, and to complete a formal task (Humphrey, 1989). The importance of inspection for product quality and project costs necessitates tracking the performance of the inspection process in a software company. One useful measure reflecting the effectiveness of the inspection process is the inspection performance:

    Inspection Performance = number of defects found / inspection effort
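As an illustration of the computation (a Python sketch; the record fields and values are hypothetical, not the company's actual forms), each review yields one performance value, and keeping a separate series per review type and product group anticipates the categorization discussed below:

```python
from collections import defaultdict

# Hypothetical review records, in time order; field names are illustrative only.
reviews = [
    {"type": "draft", "product": "SRS",  "defects": 14, "effort_minutes": 180},
    {"type": "draft", "product": "SDD",  "defects": 9,  "effort_minutes": 150},
    {"type": "final", "product": "SRS",  "defects": 3,  "effort_minutes": 120},
    {"type": "draft", "product": "code", "defects": 21, "effort_minutes": 240},
]

# One time-ordered series per (review type, product) combination, since draft
# and final reviews, and different product groups, behave differently.
series = defaultdict(list)
for review in reviews:
    performance = review["defects"] / review["effort_minutes"]  # defects per minute
    series[(review["type"], review["product"])].append(performance)

for key, values in sorted(series.items()):
    print(key, [round(v, 4) for v in values])
```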
The interpretation of the data depends on the idea that the number of defects found during an inspection is directly proportional to the inspection effort. Thus, we expect to find more errors if the inspection lasts longer. In order to have a correct interpretation of the inspection performance metric, some preliminary issues should be considered.
- An effective defect measurement mechanism should be constructed (see section 3.1).
- The inspections should be categorized considering their place in the life cycle and their purpose (such as draft reviews, final reviews, change reviews, joint reviews).
- The inspection type, the reviewers, the inspection effort (including assessment and meeting times), the number of defects found by each reviewer, the type of defects, the inspected artifact, the inspection date, and the related trouble report should be recorded for each inspection.

As soon as the above issues are addressed and the required metric data are collected, SPC analysis can be carried out by drawing XmR charts for the individual inspection effectiveness measures. As the inspection outcomes depend on the product, it is best to perform separate analyses for different products or product groups (e.g. design documents, requirements documents, code). In the XmR chart, points above the upper control limit show instances in which the number of defects found per unit effort exceeds the process performance limits. This may be because of high inspection effectiveness or low product quality. On the other hand, points below the lower control limit may indicate low inspection process performance, where many defects are left undetected in the product. It is also possible that the product actually contains very few defects. In either case, the defect density measure may be used in parallel to reach the correct understanding while interpreting the results. It is also possible to use u-charts by tracking effort in terms of the minutes spent on the inspection. However, the applicability and utility of such a sensitive analysis should be investigated by using the existing metric data. We left the implementation of u-charts on inspection performance data as a future study item.

4. Utilization of SPC

The goal of the case study was to investigate the applicability of SPC by using an emergent company's existing measurement data and to identify the difficulties and the benefits of the technique. We performed the case study in an organization which was established in 1998
and was certified CMM level three in August 2002. In the organization, the existing process descriptions are mostly based on the definitions from the CMM and various IEEE standards (including IEEE Std 1044-1993, IEEE Std 1044.1-1995, IEEE Std 730-1998, IEEE Std 730.1-1995, IEEE Std 828-1998 and IEEE Std 830-1998). All the processes are standardized by documented procedures, and quality assurance is provided extensively via periodic and milestone-based audits. Standard metric datasheets are used to collect more than 20 metrics including effort, rework effort, defects, requirement stability and SLOC. The project level metric data are periodically collected by the assigned project individuals, and the company level metric data are collected by the assigned individuals from the relevant departments. The raw data are presented by means of bar and Pareto charts, which are analyzed by the project managers and the relevant technical staff in the project. The analysis is performed by detecting any irregular behavior through subjective judgment. Finally, the results are discussed in the periodic project meetings. Organizational level metric data are analyzed in a similar way by the quality assurance department, and the results are discussed in the periodic managerial level meetings.

All the metrics used in the case study had been collected in the projects previously (before and after achieving CMM level three). Therefore, the data represent the company status while the processes were in transition to CMM level three. However, the selected metrics had been collected for years, so we presume that the measurement processes and metric data are precise enough to enable SPC analysis. Moreover, such change can be regarded as natural, since continuous improvement is already part of being a high maturity organization.

Before starting to work, an agreement was made with the company to outline the boundaries of our study. A statement of work was prepared and signed by both parties. Moreover, a proposal was prepared to document the general objectives of the study. It was decided that the name of the company would not be mentioned in any part of the study, considering the confidentiality of the utilized data. Similarly, the actual data would not be presented on the charts. Before drawing Individuals (XmR) charts, the data would be multiplied by a constant factor. The same trend would still be observed and the same outliers would be detected after this modification, despite the changes in the mean and the variance values. As the limits of a u-chart depend on the size measure, a similar change would cause errors while identifying the outliers. For this reason actual data would be used to draw u-charts, but the numbers on the y-axes would be hidden. At the end of the study, the company would be given a copy of all research documentation including process descriptions and analysis results. Therefore, this study would also serve as an appealing experience for the company for future improvement programs aiming at higher maturity levels.

Based on this foundation we started our study in two dimensions. On the one hand, we prepared a list of metrics collected in the company and worked on each measure considering its meaningfulness and feasibility in terms of performing SPC analysis. By using such a bottom-up approach, we selected the metrics among the ones that were already being collected in the company. On the other hand, we identified the most significant issues for process improvement in the company.
Then we mapped the collected metrics to the selected processes. As a result of this mapping, we constructed a list of measures as the initial candidates for our analysis (see Table 1). Although some of these measures were not directly collected, we had the relevant data to derive them. After an initial evaluation, we eliminated some of the candidate measures which were not deemed critical enough to necessitate a thorough SPC analysis. Then we prioritized the remaining metrics with respect to their significance and started to work on the high priority metrics (metrics 1 through 6 in Table 1). After a more detailed analysis, we decided to disregard the Test Performance metric as we could not gather the effort data for different tests separately. Similarly, we decided to exclude the Requirements Stability and Productivity metrics since the number of data points was not sufficient for a thorough analysis.
Table 1 Candidate Measures

 1. Rework percentage – Percentage of rework effort to the total effort
 2. Review performance – How effective is the review process (in terms of the defects found per review time)
 3. Defect density – Relative number of defects in a product
 4. Productivity – Production amount (product size) per effort
 5. Test performance – Number of defects found per testing effort
 6. Requirements stability – Percentage of added, deleted and changed requirements
 7. Customer support time – Time passed from receiving an accepted customer problem until corrected code is deployed to the target environment
 8. Backlog management index (BMI) – Trouble report creation rate relative to the closure rate
 9. Audit effort percentage – The amount of audit effort relative to the total effort
10. Audit performance – The number of nonconformities detected during audits relative to the total effort
11. Peer review effort index – The amount of the peer review effort relative to the work product size
12. Total defect containment effectiveness (TDCE) – Number of pre-release defects relative to all defects found in a product
13. Trouble report aging – Time elapsed between the initiation and the resolution of a trouble report
In the end, our scope was narrowed down to the Rework Percentage, Defect Density and Review Performance metrics.

For each selected metric, we first performed a detailed analysis to establish a good understanding of the metric basics and the data analysis mechanisms. We then collected actual metric data from seven projects with various characteristics regarding their sponsors (internal or external), application domains (management information systems, embedded systems, command and control systems), market places (commercial or military) and other factors. By working on the metric data, we determined the company-specific parameters, carried out the relevant normalization procedures, utilized the relevant SPC techniques and interpreted the analysis results. It was not appropriate to use the collected raw metric data directly for our SPC study. Therefore, we had to make some derivations by using additional data from different sources. After organizing the data in the relevant format, we drew Individuals and u-charts, detected outliers and investigated them to understand the variation. However, we restricted our analysis to tests of the upper and lower control limits instead of investigating trends and the other tests for detecting outliers. Our study revealed various difficulties of using metric data that were not specifically defined for SPC analysis. Moreover, we provided practical evidence on the utilization of Statistical Process Control in the software domain. We now go over the results of the case study for each metric in detail.

4.1. Defect density

In the company, the data on all defects found during a review, test or audit have been collected and tracked through Problem Reports (PR, for code defects) and Document Change Requests (DCR, for document defects) since the foundation of the company. Although the trouble reports have evolved over time, the basic defect information such as the subject work product,
the related project phase, the defect priority, and the initiation and closure dates are recorded for all projects. An individual priority (low, medium, high, very high or other) is assigned to each defect on a trouble report. However, after collecting the data, we realized that there were not enough data in each priority category for a successful SPC analysis. Thus we combined the 5 priority categories into 3 groups:

Group 1: high and very high
Group 2: medium
Group 3: low and other

While making this categorization, we assumed that defects with priorities low and other could be given similar attention. The code size is collected for each CSCI in terms of non-comment, non-blank Source Lines of Code. Unfortunately, however, the data from the seven projects were not sufficient to perform SPC analysis for code defects. Consequently, we decided to restrict our analysis to the requirements and the design documents, and defined the size measures as follows:

1. Requirements documents (Software Requirements Specification (SRS) and Interface Requirements Specification (IRS)): the number of requirements is used to compute size.
2. Design documents (Software Design Description (SDD) and Interface Design Description (IDD)): the number of pages is used to compute size.

The cumulative number of defects is computed for each document. As the size of a document remains almost the same throughout a software project, the defect density value gradually increases as more defects are detected. For this reason, we decided to perform separate analyses by comparing the defect density values at the end of each project phase for the two document groups. However, the division of data among different priorities, project phases and document groups greatly reduced the effectiveness of the SPC analysis as the number of samples became insufficient. Therefore, we restricted our analysis to the implementation and the maintenance phases. The document size is gathered for each version, and the size of the last version released by the end of the project phase is used for the defect density measurement. However, in some of the projects the components are further divided into builds, and different phases occur at the same time because of evolutionary design approaches. In such cases, the size of the artifact that is active at the initiation date of the latest DCR written for the related phase is used. We also realized that different builds of a CSCI share the same documentation, with different versions. For instance, an SRS document prepared for build 1 is updated for build 2 with the required changes. Therefore, we decided to use the latest version and to count all the defects of the previous and the current versions. For instance, assume that 30 defects are found for 30 requirements (the size of an SRS) at the end of the requirements analysis phase of build 1. On an ongoing project, these data are used to calculate the defect density for the requirements analysis phase. Suppose 17 more defects are found by the end of build 2 and the size becomes 28 (i.e. 3 requirements have been deleted and 1 added). Now the initial defect density value is updated and the updated information is used for the analysis.

We drew Individuals charts for each project phase-priority combination for the requirements and the design documents after putting the data into time order. The Individuals chart for the group 2 defects at the end of the implementation phase for the requirements documents is shown in Figure 1.
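The build-to-build bookkeeping in the example above can be summarized in a few lines (a Python sketch; the data structure is ours, only the numbers follow the worked example):

```python
# Defects are accumulated across document versions, while the size of the
# latest released version is used as the denominator (numbers follow the
# SRS example above; the structure itself is illustrative only).
builds = [
    {"build": 1, "new_defects": 30, "requirements": 30},   # end of build 1
    {"build": 2, "new_defects": 17, "requirements": 28},   # 3 deleted, 1 added
]

cumulative_defects = 0
for b in builds:
    cumulative_defects += b["new_defects"]
    density = cumulative_defects / b["requirements"]
    print(f"Build {b['build']}: {cumulative_defects} defects / "
          f"{b['requirements']} requirements = {density:.2f}")
# Build 1: 1.00 defects per requirement; build 2: 47 / 28 ≈ 1.68, and this
# updated value replaces the earlier one in the phase-end Individuals chart.
```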
We observed that the system level IRS of Project 4 (point 4) is very close to the upper control limit on the chart. After talking to the project manager and looking at the IRS document, we understood that the IRS requirements included interface details regarding
Fig. 1 Defect density group 2 implementation (requirements documents). [Individuals chart of defect density per observation; UCL = 2.631, mean = 0.634, LB = 0.]
all the CSCIs. Consequently, any change to one of the CSCIs affected the IRS document and caused updates. Moreover, being an internal research-and-development project, the project was open to any proposed changes. Thus a high defect rate was observed in the IRS document. Treating it as an outlier, we removed the point from the dataset and calculated the control limits again. This time the Project 5 SRS (point 3) appeared to be an out-of-limit instance (Figure 2). During the rework percentage analysis, an abnormal situation had already been recognized for this project (section 4.2). The customer was a partner organization and the company was more tolerant of changes and additional requirements. The relaxed behavior of the customer during the requirements analysis phase resulted in later updates to the requirements document. The detection of the same outlier by two independent control charts was a good sign of the efficacy and the reliability of our analysis.

The control charts revealed some important concerns about defect density analysis. A common observation for requirements and design documents was the scarcity of data points for the group 1 defects. As it was not possible to draw separate control charts for the group 1 defects, a separate analysis was performed, in addition to the group 2 and group 3 priority analyses, by combining all the defects without considering their priorities. This finding showed that SPC cannot be applied to all categorized metric data. Moreover, a very high variability is observed among the defect density values of different documents. Although the graphs are not depicted in this paper, the data show a similar trend for the code defects. Considering that the processes are defined and the defects are correctly recorded, this indicates that the nature of some processes could make them very difficult to stabilize. The high variability also caused the control limits to be too wide, and reduced the sensitivity of the control charts. Although the variability can be reduced by further dividing the document groups (e.g. separate analyses for SRS and IRS), this may cause data insufficiency as the number of data points per group decreases. Thus, there is a trade-off between the number of data points and the depth of analysis, which makes SPC more difficult to apply.
Fig. 2 Defect density group 2 implementation updated (requirements documents). [Individuals chart; UCL = 2.571, mean = 0.510, LB = 0.]
Nevertheless, the control charts gave us an understanding of the status of different products, allowing us to make a comparison with respect to the control limits. We also captured extreme points that demonstrated poor performance and, as a result, poor product quality. By working on the reasons for these outliers, we prepared a list of lessons learned which provided a baseline for future improvement studies in the organization. In order to gain a second perspective on the process, we also drew u-charts. However, the u-charts were oversensitive for the defect data and flagged too many data points outside the limits. The u-chart showing priority group 2 defect density values for the design documents at the end of the implementation phase is given in Figure 3. Although it shows the variable behavior of the process, it proved to be useless under the current conditions.

4.2. Rework percentage

In the company, rework is defined as any modification to configuration items after the final peer review of the first release and changes to the internal/external baselines.¹ Any change before the product is put under configuration control is regarded as part of the development, but not as rework. Such nonconformities are recorded on and tracked through Action Item Lists. On the other hand, Problem Reports and Document Change Requests are used to document defects that are detected after an internal or formal baseline is established. For this reason, any defect recorded on a PR or DCR form is assumed to cause rework. Moreover, the total problem resolution time, which corresponds to the rework effort in terms of person-hours, is recorded on these forms. Therefore, it is possible to calculate the total rework effort by summing up the efforts on trouble reports within the dates corresponding to the related project phases.
¹ A document/code becomes part of an internal baseline when it is put under configuration control. If the document/code is also submitted to the customer, it becomes part of an external baseline.
Fig. 3 u-chart (defect density for design documents) (DD: defect density, CL: central line, UCL: upper control limit, LCL: lower control limit)
The timesheet data are also collected daily for each individual, and thus the effort amounts can be obtained at the project level. As the rework effort has been calculated for more than three years in the company and the results are utilized for process improvement studies, we can assume that there is a firm understanding of the meaning of the measurement. However, as the related project phase is not recorded on the trouble reports, the study was limited to measuring the rework percentages within specific time intervals. A trouble report related to a document includes many defect items on the same form. Nevertheless, the defect fixing effort (including analysis, correction and the relevant reviews) is recorded as one measure for all the defects. As different defects on a form may have different origins, we were unable to make an analysis of the defect origins. In the company, the project phases overlap for the different CSCIs and builds. As the existing data collection mechanism is not sufficient to obtain CSCI- and build-based effort amounts, the exact rework percentages for different project phases cannot be derived. For this reason, the rework effort and the total effort data are divided into weekly units for the analysis. For the trouble reports that stayed open for more than one week (7 days), we assumed that the effort is uniformly distributed among the different work days. Thus, we calculated a weighted effort for the weeks, where the weight is determined by the number of days that the trouble report stayed open in the corresponding week. For instance, if the initiation date of a trouble report is the 22nd of May (a Thursday) and the closure date is the 27th of May (a Tuesday), 66.6 percent of the total effort is counted for week 1 and 33.3 percent for week 2 (4 days from Thursday to Sunday; 2 days from Monday to Tuesday).

As the rework amounts increase during the inspection and the testing periods, the variation with respect to different points in the life cycle can be considered natural. In order to smooth out the effect of this natural variation, we performed the SPC analysis on four-week periods, through which the rework percentage is assumed to stay within certain limits. Therefore, four consecutive weekly rework and total effort amounts are summed up separately and the rework percentage is calculated for each of these four-week periods.
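A small sketch of this apportionment (Python; the dates follow the example above with the year assumed to be 2003, when 22 May falls on a Thursday, and the uniform distribution over open days is the assumption stated in the text):

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_shares(opened: date, closed: date, effort_hours: float) -> dict:
    """Spread a trouble report's rework effort uniformly over the days it was
    open and sum the shares per ISO week."""
    days = [opened + timedelta(n) for n in range((closed - opened).days + 1)]
    per_day = effort_hours / len(days)
    shares = defaultdict(float)
    for day in days:
        shares[day.isocalendar()[:2]] += per_day    # key: (ISO year, ISO week)
    return dict(shares)

# Opened Thursday 22 May, closed Tuesday 27 May: 4 of the 6 open days fall in
# the first week (66.7% of the effort) and 2 in the next (33.3%).
print(weekly_shares(date(2003, 5, 22), date(2003, 5, 27), effort_hours=12.0))
```

Weekly rework and total effort figures built up this way would then be summed over four consecutive weeks before the rework percentage is computed for each period.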
Fig. 4 Project 5 coding rework. [Individuals chart of rework percentage per four-week period; UCL = 0.1883, mean = 0.0553, LB = 0; with points above the UCL.]
The analysis is performed separately for the documentation and the coding rework, as they may exhibit different trends. In order to find the organizational level control limits, company-wide monthly rework percentages are calculated by combining data from all projects. Moreover, the coding rework data for the periods before the implementation phase are disregarded, although some coding had been done during throw-away prototyping in some of the projects. Finally, Individuals charts are drawn to analyze each project separately against the organizational upper and lower control limits. The Individuals chart for Project 5 coding rework is shown in Figure 4.

Points 11, 12 and 14 demonstrate high rework amounts, covering the periods from 27.05.2002 to 29.07.2002 and from 26.08.2002 to 30.09.2002. The chart peaks at point 14 and then drops down to normal values. From our project-specific analysis we learned that the customer started to use the product actively after May 2002. The high rework trend after the 27th of May is a result of the problems detected by the customer during product usage. The project manager relates this occurrence to the nature of the project and to the behavior of the customer. The customer was a partner firm of the company, which increased its power to act on the requirements. This flexibility made the customer less concerned about the stability of the requirements, and modifications became unavoidable after the product installation. Moreover, the programmers were not experienced with the programming language, and this caused more rework than expected during the maintenance phase.

The Individuals chart for the documentation rework in the same project does not show any out-of-control situation (Figure 5). However, point 12 makes a peak between the dates 27.05.2002 and 24.06.2002, paralleling the coding rework in the same period. In fact, the occurrence of high documentation and coding rework is a natural result of the defects in the IRS document, as mentioned in section 4.1.
Fig. 5 Project 5 documentation rework. [Individuals chart of rework percentage per four-week period; UCL = 0.1424, mean = 0.0545, LB = 0.]
The SPC analysis for the rework percentage demonstrated that control charts act as a means of viewing process performance and detecting various abnormal situations in the projects. As the study relied on historical data and past events, it was usually difficult to comment on the reasons for the process deviations without having a chance to take on-time corrective actions. Nevertheless, this study provided an opportunity to observe issues within the processes which lead the company to further improvement actions. Although the findings were not hidden facts or surprising deductions, the results demonstrate the success of control charts in detecting out-of-control situations that cause high rework in the projects. Being able to obtain this outcome despite the limitations of the case study is promising for the value of SPC implementation in a software organization.

4.3. Inspection performance

In the company, the inspection process is carried out by independent reviewers and a summary report is prepared at the end of each review. Critical artifacts such as the SRS, SDD, IRS, IDD and code are inspected during the peer reviews. The summary report for a peer review includes details such as the names of the reviewers, their review and preparation times, the number of problems found and the review results. There are three types of peer reviews: draft, final and change. A draft peer review is performed as a first check before the product is ready for the final peer review. The aim of performing this early review is to minimize the number of defects before the product is submitted to the customer. A final peer review is performed with the customer just before the release to verify the appropriateness of the product. If no defects are found during this review, the product is released. A change peer review is performed to verify the appropriateness of the proposed changes to a previously released product.

As some of the defects are fixed during a draft peer review, the product is expected to have fewer defects before entering the final peer review. Therefore, the reviewers will most
probably find fewer defects during a final peer review than during a draft peer review, and the review effectiveness will appear to be worse. In addition, different trouble reports are used to record the defects found during draft and final peer reviews. As the defects found during a draft review are not categorized with respect to their priorities, the analysis is separated for the draft and the final peer reviews. Similarly, when the review is performed to verify changes, the reviewers are inclined to focus on the changed sections rather than the whole product. Although the effort will also be low, there is not enough evidence to assume that the decrease in the number of defects found is proportional to the decrease in effort. Moreover, the reviewers will not find even a single defect in most of the change peer reviews. Therefore, it would not be rational to judge a change peer review as ineffective just because no defect has been found. As the aim of using this measure is to evaluate the review process rather than the product, change peer reviews are left out of our analysis.

The reviews are performed by independent individuals within a project. As long as this independence is provided, any available and skilled personnel may be included in the review team. This structure makes it irrelevant to restrict the analysis to the CSCI level, since the performance of a specific review will not be indicative of the CSCI for which the review is performed. Thus we worked on the individual review data within the whole project. As the structures and the review orientations are different for a code component and a document, it may be natural to observe different distributions of review effectiveness for these products. For this reason, the reviews for code and for the different document types are analyzed separately.

In the light of these decisions, the review data are put into time order for each review type-product combination. Then the review effectiveness values are calculated by dividing the number of defects by the review time (in minutes) for each review. In order to investigate whether the metric is statistically related to the number of reviewers, a regression analysis is performed. However, most of the reviews were performed by only one reviewer, and the remaining data could not capture any relationship. Finally, the Individuals charts are drawn for the SDD, SRS, UITD (unit integration test description) and UTD (unit test description) documents and for code. The chart for SDD draft peer reviews is shown in Figure 6.

As observed in the figure, point 20, which represents the SDD document of CSCI 3 in Project 1, is above the upper limit. The investigation of the corresponding review summary report and the DCR form revealed some critical issues about this CSCI. First of all, two CSCIs had been combined into one within CSCI 3 a few months earlier. This major change caused many updates in the requirements as well as in the design. The defect density chart also shows a similar peak for this CSCI and supports our deduction (point 14 in Figure 7). Moreover, this update made the CSCI quite large, which led to critical modifications during the implementation of the subsequent builds. Finally, the design is found to be inferior to that of the other CSCIs, and many changes are regarded as significant defect removals. A number of causes are proposed for this trend, and one of the most important is believed to be the high staff turnover rate.
As the data were historical, the reviews could not be investigated right after they took place. For this reason, it was difficult to understand the causes of the outliers in all cases. Nevertheless, the application of SPC to the peer review performance measure can be regarded as beneficial, as the charts enabled us to notice outliers that represent special causes. The charts demonstrated that the draft and the final peer reviews have different trends and behaviors in the organization. In parallel with our assumption, the number of defects found during the draft peer reviews turned out to be higher than the number detected during the final peer reviews. This finding provided good feedback for future studies.
Fig. 6 SDD draft peer review individuals chart. [Peer review performance per observation; UCL = 0.0001613, mean = 0.0000648, LB = 0; one point above the UCL.]
Fig. 7 Defect density group 2 implementation (design documents). [Individuals chart; UCL = 8.81, mean = 1.93, LB = 0; two points above the UCL.]
5. Discussion

This case study gave us insight into the difficulties and the effort required to implement SPC in a software organization. One of our initial observations was the importance of pre-implementation activities. A considerable amount of effort was spent on constructing a company-specific foundation before actually starting to use SPC. The basic work items in this phase include diagnosing the current state of the organization, planning the improvement program, selecting the metrics, defining the new processes, organizing the metric database to determine the initial process performance baselines, and normalizing the metric data. We saw that the existence of process documentation and periodic status reports eased our work in diagnosing the company status, identifying the processes that need close control and selecting the right metrics for our analysis. As the metric data had been collected and stored for more than 4 years, we also had a convenient database. Nevertheless, having well-defined process descriptions and a collection of metric data was not sufficient to apply SPC techniques successfully. Not all the data were stored in a computerized system, and some of the related data items were stored in independent sheets. For this reason, we had to reconfigure the database and manually collect some information from the datasheets. Moreover, we needed some other measures to normalize the metric data. Therefore, we had to put additional effort into collecting those measures. Once this preliminary study was finished and a foundation was established, the rest of the work was to draw control charts and analyze the results. Considering the fact that the company already collected and analyzed the metric data, the additional effort of implementing SPC, apart from the initial work, was insignificant.

During the case study we also observed the benefits of SPC implementation. Most significantly, the use of control limits enabled us to detect specific occurrences of outliers. Previously, the data were put into tabular form and visualized by bar/Pareto charts. The interpretations were based on the subjective criteria of the managers, who determined their actions based on their intuition. As a result, metrics were frequently used to validate decisions rather than to determine them. After SPC implementation, metrics became more supportive and assistive for decision making. Moreover, the preliminary study helped to reveal problems in the related processes and metrics. Therefore, processes like measurement, project management, problem resolution, and peer review were refined. Similarly, the existing metric data were reconfigured, the existing shortcomings were eliminated and the data became more meaningful for interpretation.

This study showed that the quantity (the number of data items) and the quality (how precise and detailed they are) of the metric data are critical for the reliability of the control limits. That is, the definition of the metrics should have considered the intent to apply SPC techniques, and the number of projects to which these metrics are applied should be large enough. Although industrial averages for different metrics are available, it is not possible to calculate control limits without the data. Even if the data are available, the process capability values of other organizations cannot constitute a baseline for a specific organization, as the baseline depends on various factors such as the procedures, the metric definitions and the analysis approaches.
The measurement data hide inherent characteristics of the processes that are unique to each software organization. For this reason, the quantity of the metric data becomes one of the major problems for small organizations and can limit the utilization of SPC techniques to a few metrics.

We also observed that each metric has particular characteristics and complexities related to its definition, collection and interpretation. The difficulty increases further when existing metric data are used for SPC implementation. Most of these difficulties arise from the unsuitability of the metric definitions for detailed statistical analysis. Although the processes
are stable and various metrics are collected, a lack of awareness of the specific metric attributes that are vital for detailed comparisons and statistical analyses creates difficulties for SPC utilization.

Another finding of this study is that it might be difficult to construct a general approach that can serve as a standard guideline for implementing SPC for software measures. Each metric has its own dynamics, inherent characteristics and normalization techniques (a small illustration follows at the end of this section). The data are collected at various time intervals, analyses are performed at different points in the life cycle, and different charts should be used for depiction. Therefore, a separate approach might be required for each metric to be analyzed.

During the study, we encountered various difficulties while trying to use the existing metric data. Some data that would have been useful for SPC analysis had not been collected in the way we assumed. The collection periods of some data differed from our needs, so we had to classify and reorganize the data manually. The level of detail of some metrics was not sufficient to classify the data as expected and to perform an in-depth analysis. This shows how important it is to define the metrics and the measurement processes with future needs in mind.

Finally, we observed that the complexity of the processes and the large number of external parameters affecting the metric values inflate variability and make it difficult to apply SPC to software production. If the analysis is performed for specific processes, it takes more time to collect, normalize and analyze the data, and the data set may not be meaningful because of the small sample size. On the other hand, if SPC is applied to generic process data, the data turn out to be insufficiently specific for the analysis, the variability increases and it becomes difficult to detect the outliers. Thus, the level of detail should be determined very carefully based on the amount and the quality of the metric data on hand, and the criticality of the process.
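As a concrete illustration of the point that each metric needs its own normalization before it can be placed on a control chart, the simple derivations below turn raw counts into the defect density and rework percentage figures used in this study; the field names and numbers are hypothetical, not the company's actual data.

    def defect_density(defects_found, size_ksloc):
        """Defects per thousand source lines of code (KSLOC)."""
        return defects_found / size_ksloc

    def rework_percentage(rework_effort_hours, total_effort_hours):
        """Share of total engineering effort spent on rework, in percent."""
        return 100.0 * rework_effort_hours / total_effort_hours

    # Hypothetical per-project records gathered from separate data sheets
    projects = [
        {"defects": 48, "ksloc": 21.0, "rework_h": 130, "total_h": 2100},
        {"defects": 35, "ksloc": 12.5, "rework_h": 95,  "total_h": 1450},
        {"defects": 77, "ksloc": 30.2, "rework_h": 310, "total_h": 2600},
    ]

    density_series = [defect_density(p["defects"], p["ksloc"]) for p in projects]
    rework_series = [rework_percentage(p["rework_h"], p["total_h"]) for p in projects]
    # Each normalized series would then be charted separately, for example with
    # the individuals-chart limits sketched earlier.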
6. Conclusions

Process improvement initiatives usually face resistance until management is convinced that the projected benefits outweigh the possible costs. For this reason, it is very important to perform a trade-off analysis before starting an improvement program. Our case study has been a good reference for showing the required effort and the short-term benefits of SPC in an emergent software organization.

The study reveals many preliminary issues that should be resolved before actually utilizing SPC. Having documented procedures, maintaining a large metric database, and using well-defined measurement processes are positive aspects that reduce the amount of work during this period. In contrast, missing data items, inappropriate organization of data, and insufficient detail and quantity of data cause additional work to establish the required setting. Thus, we can conclude that the level of process maturity influences the amount of effort spent before using SPC. However, this effort not only enables SPC utilization but also improves all related processes. On the other hand, the cost of using SPC can be regarded as negligible after this initial effort; the only additional effort is for drawing and analyzing the control charts. In effect, the benefits are better process control, more effective detection of outliers, and the ability to track process improvement actions. Thus, the payoff is high, especially in the long term.

After this case study, the company management decided to standardize SPC analysis for technical and managerial decision making. Accordingly, the process definitions for the measurement, problem resolution, preventive action and project management processes were updated
to incorporate SPC implementation for the rework percentage, defect density and inspection performance metrics. The related data collection forms were also refined to collect additional data for use in SPC. Moreover, new process definitions were written to describe the details of SPC (the normalization of data, the creation and interpretation of control charts, etc.) and the actions to take in various situations. As a further step, a new study has been initiated to expand the SPC analysis to include other metrics (these are kept confidential by the organization).

To sum up, we see that SPC implementation is not a straightforward task in an emergent software organization. Relatively low-maturity processes may require some additional effort before the actual implementation. Nevertheless, these costs can be justified by the associated improvements in the processes. Moreover, with proper implementation, a control chart acts as an auditor by exposing existing nonconformities in a process or product, provided that the necessary preliminary actions are taken. It gives an opportunity to detect problems and improve the software processes, and it is an effective tool in that it provides a visual interface with a scientific foundation. Therefore, we have evidence that it is possible to implement and benefit from SPC in an emergent software organization.

Our study had some limitations due to time and data constraints. Nevertheless, it has been a step towards understanding the practical implications of SPC in a software organization. One possible extension is to investigate the effect of different parameters (such as programming language, defect priority and SLOC definition for defect density; peer review type, defect type and number of reviewers for review performance) on SPC outcomes. Classifying data points with respect to the relevant parameters would provide more sensitive and effective results; at the same time, avoiding excessive classification makes it easier to collect a sufficient number of data points. Another direction for future work is to try variations in SPC implementation regarding the measures used, the charts drawn, and the control limits applied. Better results may be obtained by using different metric definitions (e.g., for product size, SLOC), drawing different statistical charts, and applying limits other than 3-sigma for certain data. Further studies are also needed to compare the implementation of SPC for various metrics among software organizations with different characteristics in terms of maturity and size. This will help build more expertise on SPC implementation in the software domain.
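The variations just described can be sketched in code as well. The fragment below is offered only as an assumption about how such variations might look, not as the follow-up study itself: it parameterizes the sigma multiplier of an individuals chart and adds u-chart style limits whose width depends on the size of each inspected work product.

    import math

    def individuals_limits(values, sigma_multiplier=3.0):
        """Individuals-chart limits with an adjustable multiplier (3-sigma by default)."""
        center = sum(values) / len(values)
        mr_bar = sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)
        spread = sigma_multiplier * (mr_bar / 1.128)  # sigma estimated as mean moving range / d2
        return center, max(center - spread, 0.0), center + spread

    def u_chart_limits(defect_counts, sizes_ksloc):
        """Size-dependent 3-sigma limits for defects per KSLOC (u chart)."""
        u_bar = sum(defect_counts) / sum(sizes_ksloc)
        return [
            (u_bar,
             max(u_bar - 3 * math.sqrt(u_bar / n), 0.0),
             u_bar + 3 * math.sqrt(u_bar / n))
            for n in sizes_ksloc
        ]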
References

Barnard, J. and Carleton, A.D. 1999. Analyzing Mature Software Inspection Process Using Statistical Process Control (SPC), The European SEPG Conference.
Brooks, F.P. 1987. No Silver Bullet: Essence and Accidents of Software Engineering, IEEE Computer Magazine.
Burr, A. and Owen, M. 1996. Statistical Methods for Software Quality. Thomson Publishing Company. ISBN 1-85032-171-X.
Card, D. 1994. Statistical Process Control for Software?, IEEE Software, pp. 95–97.
Carleton, A.D. and Florac, A.W. 1999. Statistically Controlling the Software Process, The 99 SEI Software Engineering Symposium. Software Engineering Institute, Carnegie Mellon University.
CMMI Product Team 2001. CMMI(SM) for Systems Engineering, Software Engineering, and Integrated Product and Process Development (CMMI-SE/SW/IPPD, V1.1), Continuous Representation, Carnegie Mellon University.
Crosby, P.B. 1980. Quality is Free: The Art of Making Quality Certain. Penguin Books USA Inc. ISBN 0-451-62585-4.
Florac, A.W. and Carleton, A.D. 1999. Measuring the Software Process: Statistical Process Control for Software Process Improvement. Pearson Education. ISBN 0-201-60444-2.
Florac, A.W., Carleton, A.D., and Barnard, J.R. 2000. Statistical Process Control: Analyzing a Space Shuttle Onboard Software Process, IEEE Software, pp. 97–106.
Florac, A.W., Park, E.R., and Carleton, A.D. 1997. Practical Software Measurement: Measuring for Process Management and Improvement (CMU/SEI-97-HB-003). Software Engineering Institute, Carnegie Mellon University.
Heijstek, A. 1999. SPC in Ericsson, The European SEPG Conference.
Hirsch, B. 1999. Can Statistical Process Control be Usefully Applied to Software?, The European SEPG Conference.
Humphrey, W. 1989. Managing the Software Process. Reading, Mass.: Addison-Wesley Publishing Company. ISBN 0-201-18095-2.
IEEE Standard Classification for Software Anomalies, Std 1044-1993.
IEEE Guide to Classification for Software Anomalies, Std 1044.1-1995.
ISO/IEC 15504-4:1998(E), Information Technology—Software Process Assessment—Part 4: Guide to Performing Assessments.
Jakolte, P. and Saxena, A. 2002. Optimum Control Limits for Employing Statistical Process Control in Software Process, IEEE Transactions on Software Engineering 28(12): 1126–1134.
Kan, S.H. 1995. Metrics and Models in Software Quality Engineering. Addison-Wesley Publishing Company. ISBN 0-201-63339-6.
Keller, T. 1999. Applying SPC Techniques to Software Development: A Management Perspective, The European SEPG Conference.
Lantzy, M.A. 1992. Application of Statistical Process Control to Software Processes, WADAS '92: Proceedings of the Ninth Washington Ada Symposium on Empowering Software Users and Developers, pp. 113–123.
Meade, S. 1999. Lockheed Martin Mission Systems, The European SEPG Conference.
Pajerski, R. and Sova, D. 1995. Software Measurement Guidebook, NASA GB-001-94. Software Engineering Program.
Paulk, M.C. and Carleton, A.D. 1999. Can Statistical Process Control be Usefully Applied to Software?, The 11th Software Engineering Process Group (SEPG) Conference.
Paulk, M.C. and Chrissis, M.B. 2002. The 2001 High Maturity Workshop (CMU/SEI-2001-SR-014), Carnegie Mellon University.
Paulk, M.C., Weber, C.V., Garcia, S.M., Chrissis, M.B., and Bush, M. 1993. Key Practices of the Capability Maturity Model, Version 1.1. Software Engineering Institute, Carnegie Mellon University.
Radice, R. 1998. Statistical Process Control for Software Projects, 10th Software Engineering Process Group Conference.
Sargut, K.U. 2003. Application of Statistical Process Control to Software Development Processes via Control Charts, Master's Thesis, Middle East Technical University.
Shewhart, W.A. 1939. Statistical Method: From the Viewpoint of Quality Control, Lancaster Press Inc.
Sutherland, J., Devor, R., and Chang, T. 1992. Statistical Quality Design and Control. Prentice Hall Publishing Company. ISBN 002329180X.
Weller, E. 2000. Practical Applications of Statistical Process Control, IEEE Software, pp. 48–55.
Wigle, G.B. 1999. Quantitative Management in Software Engineering, The European SEPG Conference.
Umut Sargut received his B.S. degree from the Bilkent University Industrial Engineering Department. After graduation, he started to work as a software engineer at Milsoft Software A.S. During his 2-year work experience, he participated in process improvement studies for the measurement and problem resolution processes and witnessed a successful CMM Level 3 assessment. He received his master's degree in Information Systems from Middle East Technical University. He is currently a Ph.D. student in the Computer and Information Sciences and Engineering Department of the University of Florida.
Onur Demirörs has Ph.D. and M.Sc. degrees in Computer Science from Southern Methodist University and a B.Sc. degree in Computer Engineering from Middle East Technical University. He has been working in the domain of software engineering as an academician, researcher and consultant for the last 15 years. His work focuses on software process improvement, software project management, software engineering education, software engineering standards, and organizational change management. He has managed a number of research and development projects on software process improvement, business process modeling and large-scale software-intensive system specification/acquisition. He has over 50 papers published in various books, journals and conferences, and over 20 students have completed their graduate degrees under his supervision. He has worked as a consultant for a number of software development companies to improve their processes based on ISO 9001, ISO 15504 and CMM. He is currently working for Middle East Technical University as the head of the Department of Information Systems – www.ii.metu.edu.tr.