Proceedings: Pacific Northwest Software Quality Conference, 2006

The Challenge of Productivity Measurement

David N. Card
Q-Labs, Inc
[email protected]

Biography

David N. Card is a fellow of Q-Labs, a subsidiary of Det Norske Veritas. Previous employers include the Software Productivity Consortium, Computer Sciences Corporation, Lockheed Martin, and Litton Bionetics. He spent one year as a Resident Affiliate at the Software Engineering Institute and seven years as a member of the NASA Software Engineering Laboratory research team. Mr. Card is the author of Measuring Software Design Quality (Prentice Hall, 1990), co-author of Practical Software Measurement (Addison Wesley, 2002), and co-editor of ISO/IEC Standard 15939: Software Measurement Process (International Organization for Standardization, 2002). Mr. Card also serves as Editor-in-Chief of the Journal of Systems and Software. He is a Senior Member of the American Society for Quality.

Abstract

In an era of tight budgets and increased outsourcing, getting a good measure of an organization’s productivity is a persistent management concern. Unfortunately, experience shows that no single productivity measure applies in all situations for all purposes. Instead, organizations must craft productivity measures appropriate to their processes and information needs. This article discusses the key considerations for defining an effective productivity measure. It also explores the relationship between quality and productivity. It does not advocate any specific productivity measure as a general solution.

Introduction

A productivity measure commonly is understood as a ratio of outputs produced to resources consumed. However, the observer has many different choices with respect to the scope and nature of both the outputs and resources considered. For example, outputs might be measured in terms of delivered product or functionality, while resources might be measured in terms of effort or monetary cost. Productivity numbers may be used in many different ways, e.g., for project estimation and process evaluation. An effective productivity measure enables the establishment of a baseline against which performance improvement can be measured. It helps an organization make better decisions about investments in processes, methods, tools, and outsourcing.

In addition to the wide range of possible inputs and outputs to be measured, the interpretation of the resulting productivity measures may be affected by other factors such as requirements changes and quality at delivery. Much of the debate about productivity measurement has focused narrowly on a simplistic choice between function points and lines of code as size measures, ignoring other options as well as many other equally important factors. Despite the complexity of the software engineering environment, some people believe that a single productivity measure can be defined that will work in all circumstances and satisfy all measurement users’ needs. This article suggests that productivity must be viewed and measured from multiple perspectives in order to gain a true understanding of it.

International Standards

One might hope to look to the international standards community for guidance on a common industry problem such as productivity measurement. While some help is available from this direction, it is limited. The most relevant resources are as follows:
• IEEE Standard 1045, Software Productivity Measurement [2], describes the calculation of productivity in terms of effort combined with counts of lines of code or function points. It recommends variations to address software reuse and maintenance scenarios. It provides a project characterization form, but does not discuss how different characteristics might lead to different productivity measures.
• ISO/IEC Standard 15939, Software Measurement Process [1], is the basis for the Measurement and Analysis Process Area of the Capability Maturity Model – Integration [4]. ISO/IEC Standard 15939 contains two key elements: a process model and an information model. The process model identifies the principal activities required for planning and performing measurement. The information model defines three levels of measures: base measures, derived measures, and indicators. Figure 1 illustrates these levels of measurement. The counts of inputs and outputs used to compute productivity are base measures in this terminology. Each base measure quantifies a single measurable attribute of an entity (process, product, resource, etc.). Multiple values of base measures are combined mathematically to form derived measures. A base or derived measure with an associated analysis model and decision criteria forms an indicator.
• SEI technical reports discuss how to define effort [12] and size measures [13], but give little guidance on how they can be combined to compute things such as productivity.
Thus, the SEI reports discuss considerations in defining base measures (using the ISO/IEC Standard 15939 terminology), while IEEE Standard 1045 suggests methods of combining base measures to form derived measures of productivity. Note that none of these standards systematically addresses the factors that should be considered in choosing appropriate base measures and constructing indicators of productivity for specific purposes. That is the topic of the rest of this article.
[Figure 1: a layered measurement construct rising from Attribute (characteristic of a process or product), through Base Measure (quantification of a single attribute), Derived Measure (function of two or more base measures), and Indicator (base or derived measure with decision criteria), to Information Product (combination of indicators and interpretations). The lower levels correspond to data collection and standardization; the upper levels to analysis and flexibility.]

Figure 1. Levels of a Measurement Construct
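To make the information model concrete, the following is a minimal sketch of the measurement construct as Python data types. The class names mirror the levels in Figure 1, but the defect-density example and the "investigate/acceptable" decision criterion are illustrative assumptions, not taken from the standard.

```python
from dataclasses import dataclass

@dataclass
class BaseMeasure:
    """Quantifies a single attribute of an entity (process, product, resource)."""
    attribute: str
    value: float
    unit: str

@dataclass
class DerivedMeasure:
    """A mathematical combination of two or more base measure values."""
    name: str
    value: float

@dataclass
class Indicator:
    """A base or derived measure with an analysis model and decision criteria."""
    measure: DerivedMeasure
    threshold: float  # the decision criterion

    def interpret(self) -> str:
        return "investigate" if self.measure.value > self.threshold else "acceptable"

# Illustrative use: defect density derived from two base measures.
defects = BaseMeasure("defects found", 42, "defects")
size = BaseMeasure("product size", 12.5, "KSLOC")
density = DerivedMeasure("defect density", defects.value / size.value)
print(Indicator(density, threshold=5.0).interpret())  # "acceptable" (3.36 <= 5.0)
```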
The Concept of Productivity

Many different approaches to measuring productivity have been adopted by industry for different purposes. This section discusses the common approaches and makes recommendations for their application. The basic equation for productivity is as follows:
productivity = output produced / resources consumed
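As a trivial illustration of the ratio (the units below are assumptions; they are whatever the measure designer chooses):

```python
def productivity(output_produced: float, resources_consumed: float) -> float:
    """Basic productivity ratio. The units are the measure designer's choice,
    e.g., lines of code per staff-hour or function points per staff-month."""
    if resources_consumed <= 0:
        raise ValueError("resources_consumed must be positive")
    return output_produced / resources_consumed

print(productivity(12_000, 2_400))  # 5.0, e.g., SLOC per staff-hour
```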
The simple model of Figure 2 illustrates the principal entities related to the measurement and estimation of productivity. A process converts input into output, consuming resources to do so. We can focus on the overall software process or on a subprocess (a contiguous part of the process) in defining the scope of our concern. The input may be, for example, the requirements statement (for the overall software process or the requirements verification subprocess) or the detailed design (for the coding subprocess). Thus, input may consist of initial product requirements or of previous work products provided to a subprocess; in this model the “requirements” are relative to the process or subprocess under consideration. The output may be software or another work product (e.g., documentation). Resources typically have a cost (in the local currency) associated with them; usually, effort is the primary resource of concern for software development. The output has a value associated with it (typically the price to the customer), and that value is a function of capability, timeliness, quality, and price.
[Figure 2: "Requirements (Input)" flow into a "Process or Subprocess" that produces "Product (Software)"; the "Resources (Effort)" consumed by the process have an associated Cost, and the product has an associated Value.]

Figure 2. Simple Model of Productivity

Using this model, the numerator of productivity may be the amount of product, the volume of requirements, or the value of the product (that is, things that flow into or out of the process or subprocess). The denominator of productivity may be the amount or cost of the resources expended. The designer of a productivity measure must define each of the elements of the model in a way that suits the intended use and the environment in which the measurement is made.

The product of software development is complex. In addition to code, other artifacts such as data, documentation, and training may be produced. If the resources expended in the production of each artifact can be distinguished, then separate productivity numbers may be computed for each. However, the most common approach is to use a broad size measure (such as lines of code or function points) with a consolidated measure of resources.

Figure 2 is a generic model. The designer of a productivity measure must address the following issues in defining a precise productivity indicator:

• Scope of outputs (product) – which products get counted?
• Scope of resources – which resources get counted?
• Requirements (or other input) churn – what if the target changes during development?
• Quality at delivery – how are differences in quality accounted for?
These issues are discussed in the following sections.

Size Measurement

This section describes the two most common methods for measuring size – the numerator of the productivity equation: Function Points and Lines of Code. Function Points is a functional (input) size measure, while Lines of Code is a physical (output) size measure. The amount of functionality (requirements or input) satisfied usually corresponds to the amount of product delivered. However, the value of a product from the customer perspective often does not track closely to its size.

Reuse and code generation tools affect the effort needed to produce a given quantity of software. That is, more requirements can be satisfied and more output produced with less effort. Consequently, the effects of these technologies must be considered in determining productivity, either by weighting the size measures or by defining multiple productivity measures for different development scenarios. More than one size measure may be needed to capture all of the information needed about the quantity of product delivered. That is, software produced by different methods may need to be counted separately.

Software size, itself, has an effect on productivity. The phenomenon of “Diseconomy of Scale” has long been recognized in software development [9]. This means that the productivity of larger software projects is lower than the productivity of smaller projects, all other factors being equal. Thus, comparisons of projects of different sizes must take this effect into account. The effect is non-linear, so as the range of software sizes increases, the differences in productivity become larger.

Function Points

The Function Point Analysis (FPA) approach has been widely accepted for estimating human-computer interface, transaction processing, and management information systems. FPA involves a detailed examination of the project’s interface description documents and/or prototypes of user interfaces. Usually, these are developed early in the project’s life cycle. When they are not available, similar materials from previous projects may be analyzed to obtain analogous data. The FPA approach rates the complexity of interface elements in a systematic way. Nevertheless, the approach still exhibits a significant element of subjectivity.

Many different FPA counting algorithms have been developed. The following discussion is based on the Function Point Counting Practices Manual [7]. Different, but similar, measures are required for each of the five types of interface elements that are counted. The complexity of each interface element is quantified by considering the presence of three specific attributes, and separate counting rules are applied to each interface type.

FPA was originally developed and promoted as an estimation technique. Because it is based on information that is available relatively early in the life cycle, it can be quantified with greater confidence than physical size measures (such as Lines of Code) during project planning. However, one disadvantage of FPA is that even after the project is complete, the measurement of Function Points remains subjective. Key contributors to the accuracy of this estimation approach are the skill of the measurement analyst conducting the function point count, as well as the completeness and accuracy of the descriptive materials on which the count is based.
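As a sketch of the counting arithmetic only, the following uses the complexity weights commonly published for the five IFPUG element types; consult the Counting Practices Manual [7] for the authoritative rules, and note that the example counts below are invented for illustration.

```python
# Commonly published weights for the five IFPUG element types
# (the Counting Practices Manual [7] is the authoritative source).
WEIGHTS = {
    "EI":  {"low": 3, "average": 4,  "high": 6},   # external inputs
    "EO":  {"low": 4, "average": 5,  "high": 7},   # external outputs
    "EQ":  {"low": 3, "average": 4,  "high": 6},   # external inquiries
    "ILF": {"low": 7, "average": 10, "high": 15},  # internal logical files
    "EIF": {"low": 5, "average": 7,  "high": 10},  # external interface files
}

def unadjusted_function_points(counts: dict) -> int:
    """counts maps element type -> {complexity: number of elements}."""
    return sum(WEIGHTS[etype][cplx] * n
               for etype, by_cplx in counts.items()
               for cplx, n in by_cplx.items())

# Illustrative count for a small transaction-processing system.
example = {"EI": {"low": 5, "average": 3}, "EO": {"average": 4},
           "EQ": {"low": 2}, "ILF": {"average": 2}, "EIF": {"low": 1}}
print(unadjusted_function_points(example))  # 15 + 12 + 20 + 6 + 20 + 5 = 78
```

The subjectivity discussed above enters through the assignment of each element to a complexity level, which the code simply takes as given.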
Lines of Code

Perhaps the most widely used measure of software size is Lines of Code. One of its major weaknesses is that it can be determined with confidence only at project completion; by the same token, that makes it a good choice for measuring productivity after the fact. The first decision that must be made in measuring Lines of Code is what to count. The two major choices are 1) whether or not to count commentary, and 2) whether to count lines or statements. From the productivity perspective, comments require relatively little effort and add no functionality to the product, so they are commonly excluded from consideration.
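A minimal sketch of a non-comment line counter follows. It is deliberately naive, assuming only "#" and "//" line comments, and it counts lines rather than statements; handling block comments, string literals, and logical statements would require a real parser.

```python
def count_ncsl(source: str, comment_prefixes=("//", "#")) -> int:
    """Count non-comment, non-blank source lines. Deliberately naive:
    ignores block comments and comment markers inside string literals."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefixes):
            count += 1
    return count

sample = """\
# configuration loader
import json

def load(path):
    # read and parse
    return json.load(open(path))
"""
print(count_ncsl(sample))  # 3: the import, def, and return lines
```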
The choice between lines and statements is not so clear. “Line” refers to a line of print on a source listing. A statement is a logical command interpretable by a compiler or interpreter. Some languages allow multiple logical statements to be placed on one line; others tend to produce long statements that span multiple lines. These variations can be amplified by coding practices. The most robust measure of Lines of Code is generally agreed to be “non-comment source statements”.

A major factor in understanding productivity, especially in product line development, is taking into account the sources of the software that go into a delivery. Usually, at least three categories are used:

• New – software that is developed specifically for this delivery
• Modified – software that is based on existing software, but that has been modified for this delivery
• Reused – pre-existing software that is incorporated into the delivery without change
The resources required to deliver these classes of software are different. Modified software takes advantage of the existing design, but still requires coding, peer review, and testing. Reused software usually does not require design, coding, or peer review, but does require testing with the other software. These differences can be taken into consideration by weighting the Lines of Code of each type or by computing separate productivity numbers for each type. The latter approach requires recording resource (effort) data separately for each type of software, so it tends to be less popular. The result of the weighting approach often is described as “Equivalent Source Lines of Code”. Typical values for weighting schemes are as follows:

• New – 100%
• Modified – 40 to 60%
• Reused – 20 to 40%
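A minimal sketch of the weighting calculation, using illustrative mid-range weights from the scheme above (the line counts and the 8,000 staff-hour figure are invented for the example):

```python
# Mid-range weights from the scheme above; ideally calibrated
# from the organization's own historical data.
ESLOC_WEIGHTS = {"new": 1.0, "modified": 0.5, "reused": 0.3}

def equivalent_sloc(loc_by_source: dict) -> float:
    """Collapse new/modified/reused line counts into one size number."""
    return sum(ESLOC_WEIGHTS[src] * loc for src, loc in loc_by_source.items())

delivered = {"new": 20_000, "modified": 10_000, "reused": 50_000}
esloc = equivalent_sloc(delivered)  # 20000 + 5000 + 15000 = 40000
print(esloc / 8_000)                # 5.0 ESLOC per staff-hour at 8000 hours
```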
Ideally, the weights are determined by analysis of historical data from the organization. The concept of Equivalent Source Lines of Code makes it possible to determine the productivity of projects with varying mixes of software sources. Alternatively, some general adjustment factor can be applied to the software size as a whole to account for reuse and other development strategies. However, that captures the effect less precisely than counting the lines from different sources separately.

Other Size Measures

Many additional size measures have been proposed, especially to address object-oriented development. These include counts of use cases, classes, etc. Card and Scalzo [11] provide a summary of many of them. However, none is widely accepted in practice. That does not mean they should not be considered; however, their use in productivity measures does not eliminate concern for the factors discussed here.

Resource Measurement

The denominator, resources, is widely recognized and relatively easily determined. Nevertheless, the obvious interpretation of resources (whether effort or monetary units) can be misleading. The calculation of productivity often is performed using only the development costs of software. However, the magnitude of development resources is somewhat arbitrary. The two principal considerations that must be addressed are 1) the categories of cost and effort to include and 2) the period of the project life cycle over which they are counted.

Four categories of labor may be considered in calculating productivity: engineering, testing, management, and support (e.g., controller, quality assurance, and configuration management). Limiting the number of categories of labor included increases the apparent productivity of a project. Calculations of productivity in monetary units may include the costs of labor as well as facilities, licenses, travel, etc. When comparing
productivity across organizations, it is essential to ensure that resources are measured consistently, or that appropriate adjustments are made.

Requirements Churn and Quality at Delivery

Figures 3a, 3b, and 3c illustrate the effect of the period of measurement on the magnitude of the resource measure. These figures show the resource profile (effort or cost) for a hypothetical project broken into three categories: production, rework, and requirements breakage. Requirements breakage represents work lost due to requirements changes; this may be 10 to 20 percent of the project cost. Rework represents the resources expended by the project in repairing mistakes made by the staff. Rework has been shown to account for 30 to 50 percent of the costs of a typical software project [5]. Usually, rework effort expended prior to delivery of the product is included in the calculation of productivity, while rework after delivery is considered “maintenance”. However, this latter rework is necessary to satisfy the customer.

A comparison of Figures 3a and 3b shows the effect of delivery date on productivity. The overall resources required for the two projects to deliver a product that eventually satisfies the customer are assumed to be identical. However, the project in Figure 3a delivers earlier than the project in Figure 3b. Consequently, the development effort of the project in Figure 3a seems to be smaller, resulting in apparently higher productivity. However, this project will require more rework during maintenance than the project in Figure 3b. The project in Figure 3b is similar in every other respect, except that it delivered later and had more time to fix the identified problems. This latter project would be judged to have “lower” development productivity, although the total life cycle costs of ownership would be very similar for the two projects. Thus, development cost (and consequently apparent productivity) is affected by the decision on when and under what conditions the software is to be delivered. The true productivity of the two projects is essentially identical.

Comparing Figures 3b and 3c shows the impact of requirements churn. While the two projects deliver at the same time, and so exhibit the same apparent productivity, they may experience different amounts of requirements breakage. The project in Figure 3b, with the larger requirements breakage, actually has to produce with a higher “real” productivity to deliver the same output as the project in Figure 3c, where requirements breakage is lower.
[Figure 3a: effort profile over time, divided into production, rework, and requirements breakage; an early delivery date separates development from maintenance.]

Figure 3a – Apparent “High” Productivity Project

[Figure 3b: the same total effort profile with a later delivery date, so more of the rework falls within development.]

Figure 3b – Apparent “Low” Productivity Project

[Figure 3c: the same effort profile and delivery date as Figure 3b, but with less requirements breakage.]
Figure 3c – Highest Effective Effort (Lowest Productivity)

In summary, when determining what value to put in the denominator of the productivity equation, the resources required for rework after delivery should be added to the development cost, while the resources associated with requirements breakage should be subtracted. Tracking the data necessary to do this is admittedly difficult, but such data helps to develop a more precise understanding of productivity.
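A minimal sketch of that adjustment follows; the function and effort figures are assumptions chosen to mirror Figures 3a and 3b, and the point is only the sign of each term.

```python
def effective_development_effort(development: float,
                                 post_delivery_rework: float,
                                 requirements_breakage: float) -> float:
    """Adjust recorded development effort for comparison purposes:
    rework after delivery is added back in (the product was not really
    'done' at delivery), and effort lost to requirements breakage is
    subtracted (it produced no delivered output)."""
    return development + post_delivery_rework - requirements_breakage

# Project A delivers early (Figure 3a): less recorded development effort,
# more post-delivery rework. Project B delivers later (Figure 3b).
a = effective_development_effort(8_000, post_delivery_rework=2_500,
                                 requirements_breakage=500)   # 10000
b = effective_development_effort(10_000, post_delivery_rework=500,
                                 requirements_breakage=500)   # 10000
print(40_000 / a == 40_000 / b)  # True: same output, same real productivity
```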
Typical Productivity Calculations

Measures of size and resources may be combined in many different ways. The three common approaches to defining productivity based on the model of Figure 2 are referred to as physical, functional, and economic productivity. Regardless of the approach selected, adjustments may be needed for the factors of diseconomy of scale, reuse, requirements churn, and quality at delivery.

Physical Productivity

This is a ratio of the amount of product to the resources consumed (usually effort). Product may be measured in lines of code, classes, screens, or any other unit of product. Typically, effort is measured in terms of staff hours, days, or months. The physical size also may be used to estimate software performance factors (e.g., memory utilization as a function of lines of code).

Functional Productivity
This is a ratio of the amount of functionality delivered to the resources consumed (usually effort). Functionality may be measured in terms of use cases, requirements, features, or function points (as appropriate to the nature of the software and the development method). Typically, effort is measured in terms of staff hours, days, or months. Traditional measures of Function Points work best with information processing systems. The effort involved in embedded and scientific software is likely to be underestimated with these measures, although several variations of Function Points have been developed that attempt to deal with this issue [14].

Economic Productivity

This is a ratio of the value of the product produced to the cost of the resources used to produce it. Economic productivity helps to evaluate the economic efficiency of an organization. Economic productivity usually is not used to predict project cost, because the outcome can be affected by many factors outside the control of the project, such as sales volume, inflation, interest rates, and substitutions in resources or materials, as well as all the other factors that affect physical and functional measures of productivity. However, understanding economic productivity is essential to making good decisions about outsourcing and subcontracting. The basic calculation of economic productivity is as follows:

Economic Productivity = Value / Cost

Cost is relatively easy to determine. The numerator, value, usually is recognized as a combination of price and functionality: more functionality means a higher price. Isolating the economic contribution of the software component of a system can be difficult; often, that can be accomplished by comparison with the price of similar software available commercially.

Ideally, the revenue stream resulting from a software product represents its value to the customer; that is, the amount that the customer is willing to pay represents its value. Unfortunately, the amount of revenue can be known only when the product has finished its useful life. Thus, the value must be estimated in order to compute economic productivity, taking into consideration all the factors affecting the customer’s decision to buy:

Value = f(Price, Time, Quality, Functionality)

Poor quality may result in warranty and liability costs that neutralize revenue. Similarly, time must be considered when determining the economic value of a product: a product that is delivered late to a market will miss sales opportunities, and the amount of revenue returned by it will be adversely affected. Consequently, the calculation of value for economic productivity must include timeliness and quality, as well as price and functionality.

Note that this definition of economic productivity does not take into consideration the “cost to the developer” of producing the product. Whether or not a product can be produced for a cost less than its value (expected sales) is another important, but different, topic.
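A minimal sketch of the calculation follows. The value model (multiplicative penalties for lateness and defects, including the 5%-per-month lateness discount) is purely an illustrative assumption; the article only asserts that value depends on price, time, quality, and functionality.

```python
def estimated_value(price: float, functionality_units: float,
                    months_late: float, defect_penalty: float) -> float:
    """Toy value model: price scaled by delivered functionality, discounted
    for late market entry, less warranty/liability exposure. The 5%-per-month
    lateness discount is an illustrative assumption, not a published figure."""
    lateness_discount = max(0.0, 1.0 - 0.05 * months_late)
    return price * functionality_units * lateness_discount - defect_penalty

def economic_productivity(value: float, cost: float) -> float:
    return value / cost

value = estimated_value(price=1_500.0, functionality_units=1_000,
                        months_late=3, defect_penalty=50_000)
print(economic_productivity(value, cost=900_000))  # about 1.36
```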
Comparing Productivity Numbers

Having chosen a productivity calculation along with appropriate definitions of resource and size measures, productivity numbers can be produced. Comparing productivity numbers from a series of closely related projects (e.g., members of a product line) is straightforward. However, making comparisons across different projects or organizations requires greater care. Many factors affect the productivity achieved by a project, and most estimation models provide adjustment factors to account for them. Generally, these influencing factors fall into two categories: controllable and inherent. The two categories must be handled differently when comparing productivity across projects or organizations.
Controllable factors can be changed by management. They are the result of choices, although not always desirable ones. Examples of controllable factors include personnel experience, development environment, and development methods. Depending on the purpose of the productivity comparison, adjustments may be made to account for these factors, especially when using productivity to estimate project effort. However, productivity comparisons of completed projects often are made specifically to evaluate the choices made by an organization, so usually no adjustments are made for the controllable factors when comparing productivity results from different projects.

Inherent factors are those that are built into the problem that the software developers are trying to solve. Examples of inherent factors (beyond the control of the software development team) include:

• Amount of software developed (diseconomy of scale)
• Application domain (e.g., embedded versus information systems)
• Customer-driven requirements changes
The software development team cannot choose to develop less software than necessary to do the job, build a different application than the customer ordered, or ignore customer change requests in the pursuit of higher productivity. Consequently, adjustments must be made for the inherent factors when comparing productivity results from different projects. While quality at delivery is a controllable factor, the eventual quality required by the customer is not, so adjustments also should be made for post-delivery repair.

Summary

Measures of product size and resources must be carefully selected in deciding upon the construction of a productivity indicator. It is not simply a choice between Function Points, Lines of Code, or another size measure; many other factors also must be considered. Table 1 summarizes the reported impact of some of the factors previously discussed, as a percent of the project’s effort.

Table 1. Factors in Productivity Measurement

Factor                  Typical Impact (%)    References
Requirements Changes    10 to 40              8, 9, 10
Diseconomy of Scale     10 to 20              8
Post Delivery Repair    20 to 40              5, 9
Software Reuse          20 to 60              8, 9
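To illustrate how one of these effects might be modeled, the diseconomy of scale is commonly represented as a power law, effort = a × size^b with b > 1, as in COCOMO-style models [8]. The coefficients below are illustrative placeholders, not calibrated values:

```python
def estimated_effort(ksloc: float, a: float = 3.0, b: float = 1.12) -> float:
    """Power-law effort model (staff-months). The coefficients a and b are
    illustrative placeholders; a real organization would calibrate them
    from its own historical data, as in COCOMO-style models [8]."""
    return a * ksloc ** b

# With b > 1, apparent productivity (SLOC per staff-month) falls as size grows.
for size in (10, 100, 1_000):
    effort = estimated_effort(size)
    print(f"{size:>5} KSLOC -> {size * 1000 / effort:7.1f} SLOC per staff-month")
```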
Left unaccounted for, the variable effects of these factors on measured productivity can overwhelm any real differences in project performance. Even if an organization finds itself unable to measure all of these factors, they should be excluded consciously rather than simply ignored.

No single measure of productivity is likely to serve all the different needs of a complex software organization, including project estimation, tracking process performance improvement, benchmarking, and demonstrating value to the customer. Multiple measures of productivity may be needed, and each must be carefully designed. This article has attempted to identify and discuss some of the most important issues that must be considered. Deliberately, it did not attempt to define a universal measure of productivity.

References

[1] ISO/IEC Standard 15939: Software Measurement Process, International Organization for Standardization, 2002.
[2] IEEE Standard for Software Productivity Metrics, IEEE Std. 1045-1992, IEEE Standards Board, 1993.
[3] J. McGarry, D. Card, et al., Practical Software Measurement, Addison Wesley, 2002.
[4] M. B. Chrissis, M. Konrad, and S. Shrum, Capability Maturity Model – Integration, Addison Wesley, 2003.
[5] R. Dion, Process Improvement and the Corporate Balance Sheet, IEEE Software, September 1994.
[6] D. Wheeler and D. Chambers, Understanding Statistical Process Control, SPC Press, 1992.
[7] Function Point Counting Practices Manual, Version 4.0, International Function Point Users Group, 1998.
[8] B. Boehm, Software Engineering Economics, Prentice Hall, 1981.
[9] M. J. Bassman, F. McGarry, and R. Pajerski, Software Measurement Guidebook, NASA Software Engineering Laboratory, 1994.
[10] R. B. Grady, Practical Software Metrics for Project Management and Process Improvement, Prentice Hall, 1992.
[11] D. N. Card and B. Scalzo, Estimating and Tracking Object-Oriented Software Development, Software Productivity Consortium, 1999.
[12] W. Goethert, E. Bailey, and M. Busby, Software Effort & Schedule Measurement: A Framework for Counting Staff-Hours and Reporting Schedule Information, Software Engineering Institute, September 1992.
[13] R. Park, Software Size Measurement: A Framework for Counting Source Statements, Software Engineering Institute, September 1992.
[14] History of Functional Size Measurement, Common Software Measurement International Consortium, www.cosmicon.com/historycs.asp, August 2006.