Information Systems Research, Vol. 16, No. 3, September 2005, pp. 292–306
ISSN 1047-7047 | EISSN 1526-5536 | DOI 10.1287/isre.1050.0059
© 2005 INFORMS
Optimal Software Development: A Control Theoretic Approach

Yonghua Ji
School of Business, University of Alberta, Edmonton, Alberta T6G 2R6, Canada,
[email protected]
Vijay S. Mookerjee, Suresh P. Sethi
School of Management, University of Texas at Dallas, Richardson, Texas 75080,
{[email protected], [email protected]}
We study the problem of optimally allocating effort between software construction and debugging. As construction proceeds, new errors are introduced into the system. The objective is to deliver a system of the highest possible quality (fewest number of errors) subject to the constraint that N system modules are constructed in a specified duration T. If errors are not corrected during construction, then further construction can produce errors at a faster rate. To curb the growth of errors, some of the effort must be taken away from construction and assigned to testing and debugging. A key finding of this model is that the practice of alternating between pure construction and pure debugging is suboptimal. Instead, it is desirable to concurrently construct and debug the system. We extend the above model to integrate decisions traditionally considered "external," such as the time to release the product to the market, with those that are typically treated as "internal," such as the division of effort between construction and debugging. Results show that integrating these decisions can yield a significant reduction in the overall cost. Also, when competitive forces are strong, it may be better to release a product early (with more errors) than late (with fewer errors). Thus, underestimating the cost of errors in the product may be better than overestimating the cost.

Key words: optimal software development; concurrent development and debugging; optimal control theory
History: Salvatore March, Senior Editor; H. Raghav Rao, Associate Editor. This paper was received on September 14, 2004, and was with the authors 1.75 months for 2 revisions.
1. Introduction
Over the last couple of decades, software systems have been deployed in a vast number of business applications. As the criticality of these applications grows, the importance of delivering systems with high reliability cannot be overemphasized. To ensure reliability, software testing is necessary. The cost of inadequate software testing can be significant, estimated to range from $22.2 to $59.5 billion annually in the U.S. market alone (RTI 2002). Other studies have found that in a typical commercial environment, to ensure that a software product performs according to its specifications, the cost of software testing can easily range from 50% to 75% of the total development cost (Hailpern and Santhanam 2002). In the past, software systems were often constructed and tested in separate, sequential steps: development followed by testing (Myers 1976). System-level software testing occurred only after the system was completely developed (Yamada et al. 1993, Hou et al. 1997, Pham and Zhang 1999).

More recently, however, it has been recommended that software construction and system debugging and testing be viewed as concurrent activities (Blackburn et al. 2000). One benefit of testing and debugging a system (while other parts of it are still being constructed) is that defects in the system can be detected early, thus reducing the (negative) downstream effects of faulty code (Fagan 1986). In a recent Information Week survey (Hayes 2003), more than half of respondents indicated that "their organizations had changed the development processes for in-house-developed software to focus on the early detection of glitches, coding errors, and other flaws." Another benefit of concurrency is that future construction work is of better quality because debugging and testing produce relatively stable (albeit incomplete) snapshots of the system that support subsequent development work (Cusumano and Selby 1997).
1.1. Problem Description
Our goal is to build a simple, structured model of concurrent construction and debugging with a view to gaining insights. The basic setting of our model is as follows. We consider a system that consists of N modules. The decision maker (e.g., the project manager) can allocate effort between construction (writing and unit testing modules) and testing and debugging (system testing and fixing system-level errors). We denote by u(t) the proportion of the total effort allocated to constructing modules at time t. The debugging effort is therefore 1 − u(t).

An important variable in the above model is the time T available for project completion. In some situations, project time may be fixed. For example, in the case of in-house development, the delivery date for a system may have been decided for business reasons. Another situation is when an outside development team is committed to delivering a high-quality system in the amount of time specified by contract. In both cases, the development team is required to produce a good-quality product in a specified time period. We develop an exogenous time model that corresponds to the above situations. Specifically, the time available for completing N system modules is fixed, and the objective is to find u(t) such that the number of defects in the completed product is minimized.

The trade-off in this model is as follows. Because system debugging exhibits diminishing returns, too much testing and debugging reduces construction effort and impedes the project's progress. The benefit from these activities (identification and removal of errors) may not be sufficient to compensate for the reduction in construction effort. On the other hand, too little effort allocated to testing and debugging can allow errors to proliferate, resulting in a system of relatively poor reliability at the end of the period T.

For the exogenous time model, we find that it is optimal to start a large project with all effort allocated to construction and continue with this (full) allocation for a certain period of time. Then the allocation to debugging should be continuously increased for another period of time to curb the growth of errors. Depending on the time available for development, the final activity can be either full debugging (to further reduce errors) or full construction (to finish the project on time). A key finding here is that the practice of alternating between pure construction and pure debugging and testing is only optimal for a small project. Instead, it is often desirable to simultaneously construct and debug the system.

In other situations, the project duration T may itself be considered a decision variable. This is most appropriate for a commercial software firm, where the release date for a product is impacted not only by the competitive pressures of delaying its release, but also by the goodwill lost by releasing a faulty product to customers. We model such situations in an endogenous time model. In this model, we choose the project duration T and the construction effort u(t) to minimize the cost of errors in the completed system plus the opportunity cost of delivering the system at time T. Of particular interest in the endogenous time model is the interplay between external market factors and internal practices used to develop the system. The model is novel because external factors (such as opportunity costs, goodwill, etc.) are rarely considered in making internal software development decisions—in this case, the proportion of construction effort u(t) during the course of system construction. For the endogenous time model, the choice of the project completion time depends on external factors as well as internal factors. If the project completion time is chosen suboptimally, a significant increase in cost can occur. However, a finding of our study is that the increase in cost is more significant when the project duration is chosen to be higher than the optimal value than when it is chosen to be lower.

1.2. Contributions of Study
Previous work on software development has addressed issues related to the timing and frequency of integration activities (system testing and debugging) during the course of system construction. Most development approaches prescribe that system testing and debugging activities be performed when a certain amount of work has been completed or when a certain period of time has elapsed. Chiang and Mookerjee (2004) analyze a development process in which system integration occurs when the number of errors in the system reaches a certain threshold. Feng et al. (2003) use dynamic programming to determine an optimal integration policy to minimize the total development cost.
A key advancement of the present work over previous studies is that we model construction and debugging as occurring in parallel; i.e., it is feasible to have new construction continuing while previously constructed parts of the system are still being tested and debugged. Another important difference is that we account for imperfect error removal, whereas previous studies have typically assumed perfect error removal (i.e., after testing and debugging, the system is assumed to be in a completely error-free state).

At a more conceptual level, the area of concurrent design in manufacturing has characteristics similar to the problem being studied here. Product development in manufacturing consists of two major activities: product design and process design. Process design clearly depends on product design; however, delaying process design until after the entire product has been designed entails a very long development cycle. To shorten the development cycle, it is necessary to overlap process design and product design; i.e., process design is begun even though product design has not been completed (Ha and Porteus 1995). While software construction and debugging can be considered analogous to product design and process design, there are several important differences. Unlike product development, where process design depends on product design, in software development construction and debugging are usually viewed as reciprocally dependent activities. Construction affects debugging because debugging cannot occur without some construction. However, debugging also affects construction because debugging can improve the quality of future construction work. Many studies have found that the rate of error generation is higher when there are more existing errors in the system. That is, adding new code to a system with many faults causes errors to occur at a high rate because the new code may be written to conform to the existing (faulty) code. Thus, debugging affects the quality of future construction work, and hence construction and debugging are reciprocally dependent. Another factor that separates software development from traditional product development is that requirements are often more unstable in software development, and all development methods must contend with the fact that requirements frequently change during development
(Grunbacher and Hofer 2002). One effect of evolving requirements is that, despite debugging effort, bugs may continue to persist in the system, in part because the requirements have changed and what was originally a correctly working feature is now considered a defect.

A brief comment on the methodology used in this study is in order here. Optimal control theory has been widely used as a technique to solve dynamic optimization problems in many disciplines (Sethi and Thompson 2000). This technique is suited to problems in which the system outcome depends not only on the current action but also on previous ones. The "control" (here, the proportion of development effort, u) manages the evolution of a system through certain state variables (here, the number of modules and the number of bugs) in such a way that an optimal outcome (here, minimum bugs or minimum cost) is achieved by the end of the time horizon. Optimal control theory decouples a dynamic problem over time into a simpler static optimization problem at each time instant t, and economic insights can be gained from the analytical results. For such dynamic problems, traditional mathematical programming would use a multistage dynamic programming formulation and solve the problem numerically. Other benefits of using optimal control theory to solve our problem become clear when we consider the two time problems studied in this paper. With respect to the exogenous time model (T is given), a discrete dynamic programming formulation could conceivably have been attempted by choosing the control at a fixed number of stages between 0 and T. The difficulty in using dynamic programming arises because there is no known upper bound for the number of bugs at a given stage. In the endogenous time problem (where the project schedule T is itself a decision variable), the traditional dynamic programming approach is further complicated by the fact that the number of stages would also be unknown.

In the next section, we present the exogenous time model. Section 3 derives the optimal control for this model. Section 4 provides some numerical results and compares three different policies. Section 5 presents the endogenous time model that combines software development and release decisions. Section 6 summarizes and concludes the paper.
2. Exogenous Time Model
In this section we present a model that is composed of two main components: module growth and error growth. The trade-off between accelerating the project's progress (by increasing module growth) and improving system reliability (by reducing error growth) is at the heart of the model.

2.1. Module Growth
When developing a new module, a programmer needs to ensure that this new module is consistent with previously written modules. Thus, it becomes more difficult to write a new module when there are more existing modules in the system. Formally,

ṅ(t) = u(t) / (k0 + k1·n(t)), where k0 and k1 are positive constants. (1)
In (1), n(t) is the number of modules at time t and ṅ(t) = dn(t)/dt is the rate of increase of modules at time t. As n(t) increases, the system becomes more complex and ṅ(t) decreases. A lower value of the constant k0 leads to a higher module growth rate for the same proportion of construction effort u(t). This constant can be thought of as a value that influences team productivity at the beginning of the project, i.e., before the productivity-reducing effect of system size begins to take effect. The constant k1 (module growth coefficient) captures the impact of intermodule connectivity in the system on the effort required to write new modules. Intermodule connectivity affects module growth because, when writing a new module, programmers need to ensure that it is consistent with the modules that already exist in the system (Mookerjee and Chiang 2002). Thus, as system size increases (n(t) increases) or the module growth coefficient increases, it becomes more difficult to write new modules; i.e., ṅ(t) decreases.

If we let u(t) = 1 and solve the differential equation in (1), we obtain a solution of the form n(t) = √(αt + β). The interpretation of this solution is that if all effort is allocated to construction, then the appearance of modules over time should follow this concave growth pattern. To provide a real example, we use the project data collected by the Software Engineering Laboratory on NASA's GOES Attitude Ground Support System project (Heller et al. 1993). In this project, the
team first did full construction until all the system modules were written (u = 1) and then debugged the system for the rest of the project (u = 0). Thus, if we set u = 1, these data can be used to check the validity of Equation (1). The data fit the theoretical model very well (adjusted R² = 0.96) and also provide us with reasonable values of the coefficients k0 and k1 that we will use in our numerical experiments in §4.

2.2. Error Growth
Two factors affect error growth. First, fresh errors are created when new modules are introduced into the system. Thus, the error growth rate increases with the module growth rate. The second factor is the result of testing and debugging; as more effort is allocated to testing and debugging at time t, more errors are eliminated. The error growth rate ṁ(t)¹ is modeled as

ṁ = c1(n, m)·ṅ − c2(n, m)·g(u). (2)
In (2), c1(n, m) and c2(n, m) are positive. The function g(u) = 1 − u is the debugging and testing effort. The terms c1(n, m) and c2(n, m) are given by

c1(n, m) = k2 + k3·m + k4·n, where k2, k3, and k4 are positive constants, (3)

c2(n, m) = k5·m/n, k5 > 0. (4)
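To make the model concrete, the following minimal sketch (ours, not part of the paper) integrates the dynamics (1)–(4) with a forward Euler scheme for a given effort profile u(·). The parameter values are the baseline values introduced in §4.1, and the constant effort profiles are purely illustrative.

```python
# Minimal sketch (ours): forward-Euler integration of the dynamics in
# Equations (1)-(4). Parameter values are the baseline values of Section 4.1;
# the constant effort profiles below are illustrative, not optimal policies.

k0, k1 = 4e-2, 6e-5                  # team productivity; module growth coefficient
k2, k3, k4 = 1e-5, 1.5e-2, 1e-4      # programming quality; system quality; error growth
k5, N = 50.0, 500.0                  # error identification coefficient; system size

def simulate(u_of_t, T, dt=1e-3):
    """Integrate n' = u/(k0 + k1*n) and m' = c1*n' - c2*(1 - u),
    with c1 = k2 + k3*m + k4*n and c2 = k5*m/n, from n(0) = m(0) = 0."""
    n, m, t = 0.0, 0.0, 0.0
    while t < T:
        u = u_of_t(t)
        n_dot = u / (k0 + k1 * n)
        c1 = k2 + k3 * m + k4 * n
        c2 = k5 * m / n if n > 0 else 0.0   # nothing built yet: nothing to debug
        n, m = n + dt * n_dot, m + dt * (c1 * n_dot - c2 * (1.0 - u))
        t += dt
    return n, m

# Pure construction: n should reach N = 500 at about T = 27.5 weeks, with
# roughly 800 accumulated errors (cf. footnote 4 in Section 4.1).
print(simulate(lambda t: 1.0, 27.5))
# Fixed allocation with u0 = 0.55 over 50 weeks (u0 chosen so that n(50) = 500).
print(simulate(lambda t: 0.55, 50.0))
```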
We have checked Equation (2) in our model and found it to be consistent with previous empirical research on software defects. When the construction and debugging phases are arranged sequentially, as is traditionally done, the number of bugs at the end of the construction phase Tc can be solved from Equation (2). The solution form is given below (see A.4 in the online appendices²):

m(Tc) = −k4·N/k3 + ((k4 + k2·k3)/k3²)·(e^(k3·N) − 1).
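This closed form follows from Equation (2): during pure construction u = 1, so g(u) = 0 and the error dynamics reduce to a linear ODE in m with n as the independent variable. A brief derivation (ours):

```latex
% Derivation sketch: with u = 1 we have g(u) = 0, so (2) becomes
% dm/dn = c_1(n, m) = k_2 + k_3 m + k_4 n, a linear ODE with m(0) = 0.
\begin{align*}
m(n) &= C e^{k_3 n} - \frac{k_4}{k_3}\,n - \frac{k_4 + k_2 k_3}{k_3^{2}},
\qquad m(0) = 0 \;\Rightarrow\; C = \frac{k_4 + k_2 k_3}{k_3^{2}},\\
m(T_c) &= -\frac{k_4 N}{k_3}
  + \frac{k_4 + k_2 k_3}{k_3^{2}}\bigl(e^{k_3 N} - 1\bigr),
\qquad\text{since } n(T_c) = N.
\end{align*}
```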
¹ We sometimes suppress the variable t in n(t), m(t), u(t), etc., for brevity. This is consistent with the notation used in most control theory literature, and ẋ ≡ dx/dt.
² The online appendices for this article can be found at http://www.informs.org/Pubs/Supplements/1047-7047-2005-16-03-Debug.pdf.
From the above, it is easy to verify that if N modules have to be developed by time T, the ratio N/m(Tc) decreases with N. Therefore, if system modules are of similar size (measured in terms of thousands of lines of code, or KLOC) and most defects in the system are found by testing, our model predicts that product quality, measured by the ratio of size to the number of defects found in the system, should decrease with product size. This outcome of Equation (2) is consistent with previous empirical research (Harter and Slaughter 2003), which finds a negative correlation between product quality and product size.

Equation (3) represents the effect of programming quality, system quality, and system size on the growth rate of errors. As programming quality increases (lower k2), fewer errors are added for the same number of new modules. When new modules are added to a system of poor quality (i.e., with more existing errors), the creation of fresh errors occurs at a faster rate. The rate depends on the value of the system quality coefficient k3, which can be interpreted as the rate (per unit system quality) at which errors multiply if left uncorrected. Therefore, the system can become more expensive to fix if the errors in the system are not corrected (Boehm et al. 1975). Finally, the constant k4 (error growth coefficient) captures the impact of intermodule connectivity on the growth of errors in the system. Also, for the same value of the error growth coefficient, more errors are likely to be generated in larger systems; hence the product term k4·n. Everything else remaining equal, system quality concerns should lead to early testing and debugging so that the growth of errors can be curbed.

Equation (4) is drawn from the well-known software reliability growth model (SRGM) (Goel and Okumoto 1979). This equation represents the effect of error density in the system (number of errors divided by system size, m/n) on the effectiveness of error removal. When there are more modules in the system, it is harder to find errors than when there are fewer modules, everything else being the same. Also, everything else being the same, it is easier to find errors when there are more errors in the system—an effect that should tend to delay the testing and debugging effort.

Equations (1) and (2) depict the trade-off in our model. As construction effort is increased, more new modules are created. However, more errors are also introduced. Thus, one must strike a balance
between error and module growth so that a system of the highest possible quality is completed within the allotted time.

2.3. Exogenous Time Optimal Control Problem
The problem consists of choosing the optimal amount of construction u over the project duration T to minimize the number of errors at T, with the requirement that N modules be completed by time T. Specifically, we have the following optimal control problem:

min_{u(t)} m(T)

subject to

ṅ = u/(k0 + k1·n), n(0) = 0, n(T) = N, (5)
ṁ = c1(n, m)·u/(k0 + k1·n) − c2(n, m)·g(u), m(0) = 0, (6)
0 ≤ u ≤ 1. (7)

The functions c1 and c2 are defined in (3) and (4). Equations (5) and (6) represent the module growth rate and the error growth rate as described in (1) and (2). In this control problem, u is the control variable, and n and m are state variables.
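Before turning to the analytical solution, note that this program can also be attacked by direct transcription: discretize u(·) on a grid, integrate the dynamics, and penalize violations of n(T) = N. The sketch below (ours; it assumes NumPy and SciPy are available) is a numerical cross-check only; §3 derives the exact structure of the solution.

```python
# Direct-transcription sketch (ours): pick u on K grid points to minimize
# m(T) plus a quadratic penalty enforcing n(T) = N, with 0 <= u <= 1.
# A numerical cross-check only; Section 3 derives the exact solution.
import numpy as np
from scipy.optimize import minimize

k0, k1, k2, k3, k4, k5 = 4e-2, 6e-5, 1e-5, 1.5e-2, 1e-4, 50.0
N, T, K = 500.0, 50.0, 100           # modules required, horizon (weeks), grid size
dt = T / K

def cost(u):
    n = m = 0.0
    for uk in u:                     # one Euler step per grid cell
        n_dot = uk / (k0 + k1 * n)
        c1 = k2 + k3 * m + k4 * n
        c2 = k5 * m / n if n > 0 else 0.0
        m += dt * (c1 * n_dot - c2 * (1.0 - uk))
        n += dt * n_dot
    return m + (n - N) ** 2          # penalty keeps n(T) close to N

res = minimize(cost, x0=np.full(K, 0.6), bounds=[(0.0, 1.0)] * K,
               method="L-BFGS-B")
print(res.fun)
print(res.x.round(2))                # expect the 1 -> interior -> 0/1 shape of Theorem 1
```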
3. Solution
In this section, we present an analytical solution to the model in §2.3 and discuss its properties. The Hamiltonian³ of the problem in §2.3 is

H(n, m, λ1, λ2, u) = λ1·u/(k0 + k1·n) + λ2·[c1(n, m)·u/(k0 + k1·n) − c2(n, m)·g(u)]. (8)
The adjoint variables λ1 and λ2 satisfy the following equations:

λ̇1 = −∂H(n, m, λ1, λ2, u)/∂n, λ1(T) = a constant to be determined, (9)
λ̇2 = −∂H(n, m, λ1, λ2, u)/∂m, λ2(T) = −1. (10)

³ The nonnegativity constraint for m does not need to be explicitly imposed. Given the error growth equation in (2), m can never become negative.
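For concreteness, carrying out the differentiations in (9) and (10), with c1 and c2 from (3) and (4) and g(u) = 1 − u, yields the explicit adjoint dynamics below (our expansion; it is implied by, but not written out in, the text):

```latex
% Explicit adjoint dynamics (our expansion of (9)-(10); g(u) = 1 - u):
\begin{align*}
\dot{\lambda}_1 &= \frac{k_1 u}{(k_0 + k_1 n)^2}
   \bigl(\lambda_1 + \lambda_2\, c_1(n,m)\bigr)
   - \lambda_2\!\left[\frac{k_4 u}{k_0 + k_1 n}
   + \frac{k_5 m}{n^2}(1-u)\right], \\
\dot{\lambda}_2 &= -\lambda_2\!\left[\frac{k_3 u}{k_0 + k_1 n}
   - \frac{k_5}{n}(1-u)\right],
\end{align*}
% with lambda_1(T) a constant determined by n(T) = N and lambda_2(T) = -1,
% as in (9) and (10).
```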
We can provide economic interpretations of λ1 and λ2 in the software development context. The term λ1(t) can be interpreted as the marginal value of modules at time t, which should be positive because increasing the number of modules is beneficial to the development of the system. Similarly, λ2(t)—the marginal value of errors at time t—should be negative. In Appendix B of the online supplement, we show that the optimal solution leads to a positive value for λ1 and a negative value for λ2.

Because the Hamiltonian (8) is linear in the control variable u, the u that maximizes the Hamiltonian has the following bang-bang and singular solution form:

u(t) = 1 if Hu > 0; u(t) = to be determined if Hu = 0; u(t) = 0 if Hu < 0, (11)
where

Hu = ∂H(n, m, λ1, λ2, u)/∂u = λ1/(k0 + k1·n) + λ2·[c1(n, m)/(k0 + k1·n) + c2(n, m)].

[Figure 1. Project classification: the (T, N) plane is partitioned into small projects (N ≤ N0), large relaxed projects (N > N0, T > T0), and large tight projects (N > N0, T ≤ T0).]

[Figure 2. Small project: u(t) = 1 up to T1′, then u(t) = 0 until T.]
The above solution form has the following economic interpretation. The Hamiltonian (8) is the sum of the positive value accruing from the module increase and the negative value caused by the error increase. If the marginal value of the construction effort, Hu, is positive, then the maximum possible construction effort should be expended (u = 1) because the Hamiltonian H is linear in the control variable u. If Hu is negative, then the minimum construction effort should be expended (u = 0); i.e., all effort should be allocated to debugging. If Hu = 0, then the construction effort u needs to be determined from the condition Hu = 0.

Figure 1 shows a classification of projects according to size and duration. The three forms of optimal solutions outlined in Theorem 1 are depicted in Figures 2, 3, and 4.

Theorem 1. (See Appendix A in the online supplement for proof.) There exist N0, T0, T1, T1′, and T2 such that the general form of the optimal solution u(t), 0 ≤ t ≤ T, depends on both the size of the project, N, and the time available to complete the project, T:
• For small projects, i.e., those with N ≤ N0:
(a) u(t) = 1, 0 < t ≤ T1′;
(b) u(t) = 0, T1′ ≤ t ≤ T.
[Figure 3. Large relaxed project: u(t) = 1 up to T1, a declining singular arc on (T1, T0), and u(t) = 0 from T0 to T.]

[Figure 4. Large tight project: u(t) = 1 up to T1, a declining singular arc on (T1, T2), and u(t) = 1 from T2 to T.]
• For large projects, i.e., those with N > N0, the optimal solution falls into two regions:
(1) For relaxed projects, i.e., those with available time T > T0:
1(a) u(t) = 1, 0 < t ≤ T1;
1(b) u(t) = −(k0 + k1·n)·k5·(m/n) / [k4·n − k3·m − (k0 + k1·n)·k5·(m/n)], T1 < t < T0;
1(c) u(t) = 0, T0 ≤ t ≤ T.
(2) For tight projects, i.e., those with available time T ≤ T0:
2(a) u(t) = 1, 0 < t ≤ T1;
2(b) u(t) = −(k0 + k1·n)·k5·(m/n) / [k4·n − k3·m − (k0 + k1·n)·k5·(m/n)], T1 < t < T2;
2(c) u(t) = 1, T2 ≤ t ≤ T.
The expressions for N0, T0, T1, T1′, and T2 are presented in Appendix A of the online supplement.

The relative size of a project can be characterized by the size threshold N0. When the number of modules N is less than N0, it is optimal to start with all effort allocated to development. After time T1′, all the effort should switch to debugging for the rest of the available time (Figure 2). This simple, sequential effort allocation policy is sometimes referred to as the "big-bang" development approach in the software engineering literature. Our analysis shows that big-bang development is only optimal for a small project.

For large projects (N > N0), it is not optimal to finish constructing all the modules before debugging is begun—the errors begin to grow too fast beyond a certain point in time (i.e., after the system has attained a certain size). Regardless of the time available to complete the project, the optimal strategy is to spend all effort on construction for the first T1 units of time. The full construction period is followed by a period of concurrent debugging. The form of the optimal construction effort u(t) is the same for a relaxed project (Case 1(b)) and a tight one (Case 2(b)). For the concurrent debugging period, we can show that u(t) decreases with time, implying that the effort spent on concurrent debugging should increase during this period. For a relaxed project (T > T0), the value of u(t) drops to zero at time T0 and stays at this level till the end of the project; i.e., a relaxed project ends with full debugging (Case 1(c)). Figure 3 shows the optimal construction effort for a large, relaxed project. As the project duration shortens, the interval (T0, T)
shrinks; i.e., there is less time for full debugging at the end. If the project duration further shortens to a value below the time threshold T0, then there is no time for full debugging at the end. Such a project is referred to here as a tight project (Figure 4). In tight projects, the period of concurrent debugging in the interval T1 ≤ t ≤ T2 is followed by full construction till the end of the project. The role of the concurrent debugging period is to curb the growth of errors. If the concurrent debugging period were pushed to the end, then the ending project quality would be lower (more errors at the end). As the project duration T further decreases, the concurrent debugging period T1 ≤ t ≤ T2 is squeezed until it disappears. In this case (T = 0.5·(k1·N² + 2k0·N)), the project is so tight that there is only enough time for construction and no time for debugging.

Corollary 1. Switching more than once from pure construction to pure debugging is never optimal.

The situation pertaining to the above corollary is depicted in Figure 5 (for a proof, see Appendix C in the online supplement). Switching once from pure construction to pure debugging is optimal only for a small project (see Figure 2). Within a time period when there are still modules to be developed (for example, Ti < t < Ti+1 in Figure 5), at least some effort should continue to be spent on writing modules, with the remaining capacity allocated to debugging to control the growth of bugs. Otherwise, if all the effort is spent on debugging, the economic benefit of bug removal diminishes and becomes less than the economic benefit of module construction. The situation depicted in Figure 5 is therefore not optimal.
[Figure 5. Alternating between pure construction (u = 1) and pure debugging (u = 0) at switch times Ti, Ti+1, ... over the horizon T; such a policy is not optimal.]
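The three-phase structure of Theorem 1 can be simulated directly. The sketch below (ours) applies full construction, then the singular-arc expression as transcribed in Theorem 1, then full debugging; the switch times T1 and T0 are illustrative guesses, not the optimal values from the online appendix.

```python
# Sketch (ours): simulate the full-construction -> singular-arc -> full-debugging
# structure of Theorem 1 for a large relaxed project. The switch times T1 and T0
# are illustrative guesses, not the optimal values from the online appendix.
k0, k1, k2, k3, k4, k5 = 4e-2, 6e-5, 1e-5, 1.5e-2, 1e-4, 50.0
N, T = 500.0, 50.0
T1, T0 = 10.0, 45.0                  # assumed switch times (illustration only)

def singular_u(n, m):
    """Singular-arc control of cases 1(b)/2(b), as transcribed above."""
    a = (k0 + k1 * n) * k5 * m / n
    return min(1.0, max(0.0, -a / (k4 * n - k3 * m - a)))

n, m, t, dt = 0.0, 0.0, 0.0, 1e-3
while t < T:
    u = 1.0 if t <= T1 else (singular_u(n, m) if t < T0 else 0.0)
    n_dot = u / (k0 + k1 * n)
    c2 = k5 * m / n if n > 0 else 0.0
    n, m = n + dt * n_dot, m + dt * ((k2 + k3 * m + k4 * n) * n_dot - c2 * (1 - u))
    t += dt

print(f"n(T) = {n:.1f}, m(T) = {m:.1f}")  # tune T1 and T0 so that n(T) reaches N
```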
4. Numerical Results

In this section, we numerically study how different project parameters affect the size and time thresholds. Furthermore, we study the evolution of a project under three different policies: the big-bang policy, the fixed allocation policy, and the optimal policy. These policies differ in how construction effort is allocated during the project. The performance measure of interest is the final number of errors at the end of the project.

4.1. Project Parameter Values
We use the following reasonable, albeit hypothetical, parameter values in our numerical experiments. The values provided below are baseline (or default) values; we vary one parameter at a time in the experiments, holding the other parameters at baseline levels.

Team productivity constant k0: 4 · 10⁻²
Module growth coefficient k1: 6 · 10⁻⁵
Programming quality coefficient k2: 10⁻⁵
System size N: 500
System quality coefficient k3: 1.5 · 10⁻²
Error growth coefficient k4: 10⁻⁴
Error identification coefficient k5: 50

The project team has to finish 500 modules (N = 500). At the start of the project (i.e., n(0) = 0), the highest possible rate of module release—corresponding to effort u = 1 allocated to construction—is 25 modules/week (1/k0 = 25). When all modules have been constructed (n(t) = N), the highest possible release rate (u = 1) is 1/(k0 + k1·N) = 14.3 modules/week. It would take about half a year (27.5 weeks) to finish the whole project if all programming effort were spent on writing modules (u = 1). Thus, 27.5 weeks represents the minimum project duration T to finish N modules, with maximum errors.⁴ We adjust k5 = 50 such that, with a project duration of 60 weeks, the final number of errors using the big-bang policy would be about 30.

⁴ The values chosen for k2, k3, and k4 result in about 800 errors if the project is constructed with no debugging. A sampling of the literature shows this number to be quite representative of real projects: Pham and Zhang (1999) report a value of 142.32 for the data set used in their paper, Yamada et al. (1993) provide an estimate of 1,397.6 in their data set, and Yamada and Osaki (1985) report a lower bound of 2,657.

4.2. Size Threshold N0
The size threshold N0 provides a nice separation for the optimal manner in which construction effort should be allocated in a project. For a small project (N < N0), the big-bang policy is optimal; no concurrent debugging is needed. For a large project (N > N0), the big-bang policy is no longer optimal, and concurrent debugging is needed after the number of completed modules crosses the size threshold. The size threshold is affected only by the parameters k2, k3, and k4 introduced in Equation (3). The impact of these parameters on the size threshold is shown in Table 1 below (for a proof, see Appendix D in the online supplement).

Table 1. Impact of Changes in k2, k3, k4 on the Size Threshold N0

        k2    k3    k4
N0      −     −     +

The parameter k2 measures the effect of programming quality. As programming quality decreases (k2 increases), errors grow faster, and concurrent debugging is needed sooner. Parameter k3 measures the effect of system quality (number of errors) on the growth of errors. As the effect of system quality increases (k3 increases), the system deteriorates faster and requires early debugging. The effect of k3 on N0, shown in Figure 6(a), is quite significant because a higher value of k3 generates more errors, and the rate of fresh errors depends on k3·m—the product of the system quality coefficient and the current number of errors. At first glance, the result in Figure 6(b) appears counterintuitive: Why does the size threshold increase with the error growth coefficient k4? That is, it may be reasonable to expect earlier debugging when the error growth coefficient is higher. The explanation is as follows. Everything else being equal, as the size of the system increases, the impact of the error growth coefficient is stronger (note the product term k4·n in Equation (3)). Thus, the optimal policy favors allocating more construction effort when the project size is still relatively small. As k4 increases, the size
threshold N0 increases so as to postpone debugging, and thereby exploit the slow growth of errors that occurs when the system is relatively small. However, as k4 increases beyond a point, errors grow rapidly, leading to the saturation effect depicted in Figure 6(b). This saturation effect can easily be shown to occur when k4·N0 is much greater than k2, and k4 is much greater than k2·k3 (see Appendix D online).

[Figure 6. Size threshold N0 as a function of (a) the system quality coefficient k3 and (b) the error growth coefficient k4.]

4.3. Time Threshold T0
A large project can be further classified as relaxed or tight depending on the time available for construction and the time threshold T0. The impact of the various project parameters on the time threshold is summarized in Table 2 below (for a proof, see Appendix E in the online supplement).

Table 2. Impact of Changes in ki (i = 0, ..., 5) on the Time Threshold T0

        k0    k1    k2    k3    k4    k5
T0      +     +     +     +     −     −

The two parameters that influence the efficiency of construction effort (k0 and k1 in Equation (1)) also affect the time threshold. As k0 and k1 increase, construction efficiency decreases, leading to a higher value of the time threshold. With regard to the programming quality parameter k2, the impact on the time threshold is similar to its impact on the size threshold: Increasing the programming quality decreases the time threshold, but the impact is small. Thus, this parameter does not require accurate measurement. On the other hand, the system quality coefficient k3 has a more significant impact on the time threshold. As shown in Figure 7(a), the time threshold increases almost linearly with the system quality coefficient. As the system quality coefficient increases, the allocation of effort to debugging should be increased. As a result, the time T0 needed to finish the construction work should increase. Figure 7(b) also shows that the time threshold quickly flattens out as the error growth coefficient increases. This (saturation) property is also displayed in Figure 6(b). We have shown in Appendix E that the programming effort u(t) for T1 < t < T0 becomes independent of k4 when k4·n is much larger than k2 (which is true in the parameter space we have used). The time to finish the remaining modules after N0 modules have been constructed is then independent of k4, which is why the curve flattens out. Finally, as the effectiveness of the debugging effort (measured by k5) increases, more programming effort can be devoted to construction, and the entire construction time T0 is shortened.

[Figure 7. Time threshold T0 as a function of (a) the system quality coefficient k3 and (b) the error growth coefficient k4.]
4.4. Policy Comparison
In addition to the optimal and big-bang policies, we introduce the fixed allocation policy in this section. In the fixed allocation policy, the percentage of coding effort is constant (u(t) = u0) over the entire project duration. The fixed allocation approach is administratively convenient because the sizes of the development and testing teams can be kept constant during the project. The approach may also be appropriate when development and testing are regarded as separate departments within the organization. The number of bugs at the end of T under this policy is given in Appendix F in the online supplement. In the big-bang policy, all the effort is allocated to construction until the time needed to construct N modules, Tc = 0.5·(k1·N² + 2k0·N). For the rest of the project duration (T − Tc), all the effort is allocated to debugging.

We show the performance of the three policies (optimal, big-bang, and fixed) over a range of values of the project duration T in Figures 8 and 9. The x-axis in Figures 8 and 9 begins at 27.5 weeks, which, for the parameter values chosen in this example, represents the minimum time to complete the project. Figure 8 shows that the difference in the number of errors between the optimal policy and the other two policies first increases as the project duration T increases. The difference becomes large in the middle of the graph but becomes less prominent as T increases beyond one year (50 weeks). However, the percentage difference remains very large, as shown in Figure 9. For example, when the project duration is 50 weeks, the number of errors under the big-bang policy is approximately 84, but only 42 under the optimal policy—a 50% improvement. The number of errors corresponding to the fixed allocation policy is 58.5 for T = 50. From Figure 8 we can see that the performance of the fixed allocation policy lies between the other two policies.

[Figure 8. Number of errors at the end of the project as a function of the project duration T (27.5 to 57.5 weeks), under the big-bang, fixed allocation, and optimal policies.]

An important result from our model is that testing effort is needed in the middle of both tight and relaxed projects. By allocating effort to debugging in the middle of the project, the growth of errors can be effectively reduced owing to the smaller value of the k3·m term in the error growth Equation (2). Thus, our model provides a theoretical explanation for the growing emphasis on early testing and debugging in software development (Hayes 2003).
[Figure 9. Percentage increase in ending errors over the optimal policy, for project durations T from 27.5 to 57.5 weeks, under the big-bang and fixed allocation policies.]
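The big-bang numbers quoted above can be reproduced from closed forms already given: m(Tc) from §2.2, followed by the pure-debugging decay m(T) = m(Tc)·e^(−k5·(T−Tc)/N) (the u = 0 case of Equation (2) with n = N). A quick check (ours) with the baseline parameter values:

```python
# Check (ours) of the big-bang numbers quoted in Section 4, from closed forms:
# build for Tc = 0.5*(k1*N^2 + 2*k0*N) weeks at u = 1, reaching m(Tc) errors,
# then debug for T - Tc weeks, with m decaying as exp(-k5*(T - Tc)/N).
from math import exp

k0, k1, k2, k3, k4, k5, N = 4e-2, 6e-5, 1e-5, 1.5e-2, 1e-4, 50.0, 500.0

Tc = 0.5 * (k1 * N**2 + 2 * k0 * N)              # = 27.5 weeks
m_Tc = -k4 * N / k3 + (k4 + k2 * k3) / k3**2 * (exp(k3 * N) - 1)

def bigbang_errors(T):
    return m_Tc * exp(-k5 * (T - Tc) / N)

print(f"Tc = {Tc:.1f} weeks, m(Tc) = {m_Tc:.0f}")          # ~27.5 weeks, ~800 errors
print(f"T = 50 weeks: {bigbang_errors(50.0):.0f} errors")  # ~84, as in the text
print(f"T = 60 weeks: {bigbang_errors(60.0):.0f} errors")  # ~30, as in Section 4.1
```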
5. Endogenous Time Model
So far, we have been silent on how the project duration T is determined. In this section we consider situations where external factors can influence the choice of the project duration. The most compelling case is when the software is being sold to external customers. If a software product is released to the market with many errors, then not only will it be expensive to fix the errors discovered by customers, but the software company may also suffer a loss of goodwill and an eventual loss in future business. In addition, there is the threat of legal action due to the economic damage (e.g., loss in the customer's business operations) caused by software failures that can be attributed to insufficient testing. On the other hand, software companies incur significant testing costs—ranging from 30% to 90% of the total labor cost of a software product (Beizer 1990). Software debugging is often found to exhibit diminishing returns; i.e., beyond a point it becomes increasingly difficult and expensive to identify and fix new errors. In addition, continuing to debug a product can delay its release to the market, which can result in lost opportunities (e.g., a competitor may release a product earlier). Previous research on software reliability has studied the optimal time to release a software product by balancing the cost of software testing before release against the cost of fixing errors after release and the potential damage of software failures arising from inadequate debugging (Hou et al. 1997, Pham and Zhang 1999, Kimura et al. 1999, Dohi et al. 1999). However, previous research has typically assumed that the software product is already in existence; i.e., the optimal development problem has been solved separately from the optimal testing and release problem. We next present a model that jointly optimizes the development problem and the testing and release problem.

5.1. Endogenous Time Optimal Control Problem
In this problem, the total cost of software development consists of:
(1) a duration cost G(T) that comprises the opportunity cost due to lost sales and the staffing cost (personnel and other resources tied to the project), with G′(T) > 0; and
(2) a software error cost h(m(T)) that includes the error removal cost and the potential damage cost due to software failures caused by errors, with h′(m) > 0.
We obtain the following endogenous time optimal control problem, in which the objective is to minimize the total cost:

min_{T, u(t)} [G(T) + h(m(T))]

subject to Equations (5), (6), and (7), as before for the exogenous time problem.

The solution process for the endogenous time problem is not complicated. Note that for a fixed value of T, the above objective function reduces to the objective function of the exogenous time problem in §2.3. To understand why, observe that for fixed T, G(T) is a constant, and because h(m) is increasing in m, the problem reduces to minimizing m(T), which is the objective of the exogenous time problem. The problem of finding the optimal control u(t) for a given T has already been solved (see Theorem 1). To find the optimal value of T in the endogenous time problem, we state the following result.

Theorem 2. (See Appendix G in the online supplement for proof.) The optimal project duration T in the endogenous problem satisfies the following necessary condition:

G′(T) = Hu(T)·u(T) + (k5·m(T)/N)·h′(m(T)). (12)
A simple economic interpretation of Equation (12) is the following. If the project duration is increased from T to T + ΔT, then G′(T)·ΔT is the increased opportunity cost, and [Hu(T)·u(T) + (k5·m(T)/N)·h′(m(T))]·ΔT is the increased benefit of fewer errors in the final system.⁵ At the optimal T, these two terms should be equal. A simple search algorithm can be used to find the optimal T. Note that if the search yields several values of T that satisfy (12), we should choose the one that minimizes the total cost G(T) + h(m(T)).

⁵ The second term on the right-hand side of Equation (12) is the direct benefit of increasing the project duration (i.e., fewer errors). For large, tight projects (u(T) = 1), the first term captures the indirect benefit to the objective: A longer project duration leads to a longer debugging period in the middle (see Figure 4), which leads to fewer errors in the final system.

An alternative approach to completing the software project is to divide it into two phases (development and debugging) and have two teams separately
responsible for developing the system during time [0, T′] and testing it during [T′, T]. The development team needs to finish N modules within time T′ and allocate effort between development and debugging so as to minimize the number of errors m(T′) at time T′. After the system has been developed, the release time T is chosen to minimize the total cost G(T) + h(m(T)). Therefore, development and testing can be formulated as two separate subproblems:

(1) The development problem, which is similar to that in §2:

min_{u(t)} m(T′), with the end condition n(T′) = N; and

(2) The testing problem, which is the simple minimization problem:

min_T [G(T) + h(m(T))], where m(T) = m(T′)·e^(−k5·(T−T′)/N).
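The exponential form of m(T) here is Equation (2) specialized to the pure-debugging phase; a one-line check (ours):

```latex
% With u = 0 and n = N on [T', T], Equation (2) reduces to pure error removal:
\dot{m} \;=\; -\,c_2(N, m)\,g(0) \;=\; -\frac{k_5}{N}\,m
\quad\Longrightarrow\quad
m(T) \;=\; m(T')\,e^{-k_5 (T - T')/N}.
```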
5.2. Numerical Results
Here we numerically illustrate the difference in cost between the endogenous time optimal control problem and the problem in which the development and testing decisions are separated. For the numerical illustrations, the form of G(T) is chosen to consist of an opportunity cost term C1·(1 − e^(−aT))·(b·e^(−aT) + 1)^(−1) due to lost sales (or lost market share), and a staffing cost term C2·T. The opportunity cost C1·(1 − e^(−aT))·(b·e^(−aT) + 1)^(−1) is a simple diffusion form widely used in the marketing literature to predict the sales of a product over time (Bass 1969). The software error cost h(m(T)) includes the error removal cost C3·m(T) and the potential damage cost due to errors C4·(1 − R(x|T)), where R(x|T) = e^(−(m(T) − m(T+x))) is the reliability function based on a nonhomogeneous Poisson process (Goel and Okumoto 1979). The variable x represents the mission time for the software product, i.e., the length of time for which the system will be used. The number of errors m(T+x) at the end of the mission time x depends on the error detection process, in which both consumers and the software company can detect errors. We consider the form m(T+x) = m(T)·e^(−k6·x/N) because consumers will detect errors only during the use of the software product. In addition to the parameter values used in the previous section, we use the following values in the numerical illustration:

Market potential C1: 5 · 10⁷
Diffusion model coefficient a: 0.1
Diffusion model coefficient b: 10³
Staffing cost coefficient C2: 10⁵
Error removal cost coefficient C3: 10⁵
Damage cost coefficient C4: 10⁶
Mission time x: 50
Customer error detection coefficient k6: 1

The total potential market value of the product is taken to be 50 million dollars (C1 = 5 · 10⁷). The values of a and b are chosen such that the company will lose about 10% of market share if the product is released in one year (t = 50 weeks) and 95% if it is released in two years (t = 100 weeks). While planning the construction and release schedule of a new software product, a project manager can estimate such parameter values using survey data and historical demand data on products similar to the new product.⁶ The weekly staffing cost is $2,000 per person for a group of 50 persons (C2 = 10⁵). Each error discovered by customers requires 50 person-weeks to fix (C3 = 10⁵), and the damage cost of an error in the released product is 10 times the cost of fixing the error (C4 = 10⁶). The mission time x is taken to be one year (50 weeks), and the (customer) error detection coefficient is k6 = 1 (k5 = 50 in our experiments, implying that internal error removal processes are 50 times more efficient). Using the numerical search procedure described in §5.1, we find the optimal project duration T to be 47.6 weeks.

⁶ More recently, Bass et al. (2001) have provided an example of how the diffusion model was used to forecast the demand for DIRECTV services prior to its launch in 1992. The forecasted demand for DIRECTV in 1999 was 9.4 million subscribers, very close to the actual demand of 9.9 million subscribers in the same year.
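For the separated two-phase variant, the testing subproblem of §5.1 is one-dimensional and easy to search numerically. The sketch below (ours) uses the §5.2 cost forms; the development subproblem's output m(T′) is represented by the big-bang closed form purely as an illustrative stand-in, so the minimizer found here will not match the jointly optimal 47.6 weeks.

```python
# Sketch (ours) of the release-time search for the separated two-phase approach,
# using the Section 5.2 cost forms. m(T') is taken from the big-bang closed form
# purely as a stand-in for the development subproblem's output, so the minimizer
# found here is illustrative; the paper's jointly optimal duration is 47.6 weeks.
from math import exp

C1, a, b = 5e7, 0.1, 1e3             # market potential, diffusion coefficients
C2, C3, C4 = 1e5, 1e5, 1e6           # staffing, error removal, damage costs
x, k6, k5, N = 50.0, 1.0, 50.0, 500.0

T_prime, m_Tp = 27.5, 801.0          # separation point and m(T') (stand-in values)

def total_cost(T):
    m_T = m_Tp * exp(-k5 * (T - T_prime) / N)    # errors left at release
    m_Tx = m_T * exp(-k6 * x / N)                # errors left after mission time x
    G = C1 * (1 - exp(-a * T)) / (b * exp(-a * T) + 1) + C2 * T
    R = exp(-(m_T - m_Tx))                       # reliability over the mission
    return G + C3 * m_T + C4 * (1 - R)

# Grid search over candidate release times T >= T'.
best_T = min((T_prime + 0.1 * i for i in range(1000)), key=total_cost)
print(f"T = {best_T:.1f} weeks, cost = {total_cost(best_T):.3e}")
```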
[Figure 10. The percentage difference in total cost as a function of the separation point T′ (20 to 100 weeks).]
Figure 10 shows how the choice of the separation point T′ affects the total project cost. Finishing the N modules too early or too late (by choosing different suboptimal separation points T′) will increase the total cost. If the separation point T′ is chosen to be larger than the optimal duration T, there is more time left for debugging. This increases the quality of the final software product and reduces the cost due to software errors h(m(T)). However, the staffing cost and lost-sales cost could be significant, and the total project cost could be higher than the optimal cost. For example, if the project is released about half a year late, the total cost is double the optimal cost, mainly owing to the high cost of lost sales. Figure 10 shows that in a competitive market (high cost of lost sales), it is better to err toward releasing the product early, although its quality at the time of release may be lower than the optimal quality.

Figure 11 shows how the total market value of the product under development affects the optimal project duration. A higher market value C1 reduces the optimal project duration. That is understandable: when C1 is higher, the opportunity cost of lost sales is very high, and hence the product should be released early. Project managers need to balance the quality of released software products against competitive market pressures. If the impact of market forces is high, then
planning the development and release of the software product should factor in this impact on the development process as well as on the quality of the product at the time of release. Our study shows that market forces affect not only external decisions (the time to release), but also internal decisions (the allocation of resources between construction and debugging).

[Figure 11. The optimal project duration T as a function of the total market value C1 (0 to 600 million dollars).]
6. Summary and Conclusions
This study examined the following question in software development: To deliver a system of the highest possible quality within a specified time period, how should debugging effort be distributed during the course of system construction? We have approached this question in an incremental development setting where construction and debugging can take place concurrently. Our findings are that it is optimal to start with all effort allocated to construction activities. After some time has elapsed, construction and debugging must be performed concurrently. Debugging curbs the growth of errors in the system and improves the quality of future construction work. In some situations, debugging should continue concurrently with construction until the point where all the system modules have been constructed. If more time is left, then all the effort must be devoted to debugging till the end of the project. In other situations, namely, when the project time is tight, after a certain period of concurrent debugging and construction, all the effort should be devoted to construction to finish the project.

Our model shows that switching from pure construction to pure debugging is only optimal for a small project, and that for large projects it is suboptimal to perform all the debugging at the end. Instead, a large project calls for concurrent debugging in the middle to control the growth of errors. Our model also shows that it is not optimal to alternate between pure construction and pure debugging; rather, these activities must be undertaken simultaneously.

We also examined the trade-off between system quality (number of errors at the end of the project) and project duration to find the optimal duration of a software development project. Our finding here is that, in a strongly competitive market, managers are better off underestimating the cost of errors left in the
system, resulting in a project duration choice that is below the optimum value. When determining the length of software projects, project managers need to consider more than software quality. They should pay attention to external economic factors such as market competition and consumer waiting costs. Our work provides a simple model that demonstrates the need to incorporate external factors in making internal decisions concerning the management of a software development project.

One limitation of the current work is that our model applies only to software projects where the activities of the developers are planned. Our work does not apply to ad hoc development regimes such as prototyping, where development and debugging activities are difficult to separate, or to agile and extreme programming methods, where many of the formal system development processes are bypassed. Another limitation is that we considered deterministic models for module growth, error growth, and error removal; stochastic versions of these relationships may be useful to consider. At this stage of the research, we have assumed that the various model coefficients (k0 through k5) can be empirically estimated and provided to the model. These coefficients may be influenced by factors such as the size of the development team and the development method. Exploring such relationships is an empirical question that could be pursued in the future. There is also potential for empirical work that demonstrates the application of the model to a real software project. It may also be useful to extend the model to a situation involving competition—i.e., two or more firms compete to release a product—and to study the effect this competition has on the optimal development practice within these firms. Such research would further enhance the utility of considering "external" factors (market demand, competition, etc.) in making internal development decisions (when and how much debugging should be done during the project).

References

Bass, F. M. 1969. A new product growth for model consumer durables. Management Sci. 15(5) 215–227.
Bass, F. M., K. Gordon, T. L. Ferguson, M. L. Githens. 2001. DIRECTV: Forecasting diffusion of a new technology prior to product launch. Interfaces 31(3, Part 2 of 2) S82–S93.
Beizer, B. 1990. Software Testing Techniques. International Thomson Computer Press, Boston, MA.
Blackburn, J. D., G. D. Scudder, L. N. Van Wassenhove. 2000. Concurrent software development. Comm. ACM 43(11) 200–214.
Boehm, B. W., R. McClean, D. Urfrig. 1975. Some experiments with automated aids to the design of large scale reliable software. IEEE Trans. Software Engrg. 1(1) 125–133.
Chiang, I. R., V. S. Mookerjee. 2004. A fault threshold policy to manage software development projects. Inform. Systems Res. 15(1) 3–19.
Cusumano, M. A., R. W. Selby. 1997. How Microsoft builds software. Comm. ACM 40(6) 53–61.
Dohi, T., Y. Nishio, S. Osaki. 1999. Optimal software release scheduling based on artificial neural networks. Ann. Software Engrg.: Software Reliability, Testing Maturity 8 167–185.
Fagan, M. E. 1986. Advances in software inspections. IEEE Trans. Software Engrg. 12(7) 744–751.
Feng, Q., V. S. Mookerjee, S. P. Sethi. 2003. Application development using modifiable off-the-shelf software. Working paper, School of Management, University of Texas at Dallas, Richardson, TX.
Goel, A. L., K. Okumoto. 1979. A time dependent error detection rate model for software reliability and other performance measures. IEEE Trans. Reliability 28(3) 206–211.
Grunbacher, P., C. Hofer. 2002. Complementing XP with requirements negotiation. Third Internat. Conf. Extreme Programming Agile Processes Software Engrg., Sardinia, Italy.
Ha, A. Y., E. L. Porteus. 1995. Optimal timing of reviews in concurrent design for manufacturability. Management Sci. 41(9) 1431–1447.
Hailpern, B., P. Santhanam. 2002. Software debugging, testing, and verification. IBM Systems J. 41(1) 4–12.
Harter, D. E., S. A. Slaughter. 2003. Quality improvement and infrastructure activity cost in software development: A longitudinal analysis. Management Sci. 49(6) 784–800.
Hayes, M. 2003. Quest for quality. Inform. Week (May 26). http://www.informationweek.com/story/showArticle.jhtml?articleID=101001.51.
Heller, G., J. Valett, M. Wild. 1993. Data Collection Procedures for the Software Engineering Laboratory (SEL) Database. Software Engineering Laboratory (SEL), NASA/Goddard Space Flight Center, MD.
Hou, R. H., S. Y. Kuo, Y. P. Chang. 1997. Optimal release times for software systems with scheduled delivery time based on the HGDM. IEEE Trans. Comput. 46(2) 216–221.
Kimura, M., T. Toyota, S. Yamada. 1999. Economic analysis of software release problems with warranty cost and reliability requirement. Reliability Engrg. System Safety 66(1) 49–55.
Korn, G. A., T. M. Korn. 2000. Mathematical Handbook for Scientists and Engineers. Dover Publications, Mineola, NY.
Mookerjee, V. S., I. R. Chiang. 2002. A dynamic coordination policy for software system construction. IEEE Trans. Software Engrg. 28(7) 684–694.
Myers, G. J. 1976. Software Reliability: Principles and Practices. John Wiley & Sons, New York.
Pham, H., X. Zhang. 1999. A software cost model with warranty and risk costs. IEEE Trans. Comput. 48(1) 71–75.
Research Triangle Institute (RTI). 2002. The economic impacts of inadequate infrastructure for software testing. RTI Project 7007.011, National Institute of Standards and Technology, MD.
Sethi, S. P., G. Thompson. 2000. Optimal Control Theory: Applications to Management Science and Economics, 2nd ed. Kluwer Academic Publishers, Boston, MA.
Yamada, S., S. Osaki. 1985. Software-reliability growth modeling: Models and applications. IEEE Trans. Software Engrg. 11(12) 1431–1437.
Yamada, S., J. Hishitani, S. Osaki. 1993. Software-reliability growth with a Weibull test-effort: A model and application. IEEE Trans. Reliability 42(1) 100–106.