
A Time-Structure Based Software Reliability Model

Swapna S. Gokhale (1) and Kishor S. Trivedi (2)

(1) Bourns College of Engineering, University of California, Riverside, CA 92521. [email protected]

(2) Center for Advanced Computing and Communication, Dept. of Electrical and Computer Engineering, Duke University, Durham, NC 27708

Abstract

The past 20 years have seen the formulation of numerous analytical software reliability models for estimating the reliability growth of a software product. The predictions obtained by applying these models tend to be optimistic due to the inaccuracies in the operational profile and the saturation effect of testing. Incorporating knowledge gained about some structural attribute of the code, such as test coverage, into the time-domain models can help alleviate this optimistic trend. In this paper we present an Enhanced non-homogeneous Poisson process (ENHPP) model which incorporates the time-varying test-coverage function explicitly in its analytical formulation, and provides for defective fault detection and test coverage during the testing and operational phases. It also allows for a time-varying fault detection rate. The ENHPP model offers a unifying framework for all the previously reported finite failure NHPP models via test coverage. We also propose the log-logistic coverage function, which can capture an increasing/decreasing failure detection rate per fault that cannot be accounted for by the previously reported finite failure NHPP models. We present a methodology based on the ENHPP model for reliability prediction earlier in the testing phase. Expressions for predictions in the operational phase of the software, software availability, and optimal software release times subject to various constraints such as cost, reliability, and availability are developed based on the ENHPP model. We also validate the ENHPP model based on four different coverage functions using five failure data sets.


1 Introduction

Conventional approaches to software reliability growth modeling are black-box based, i.e., the software is treated as a black box and its interactions with the external world are modeled. Tests are generated from the specified functional properties of the program [Howden 1980; Howden 1985], based on its operational profile [Musa 1993]. The internal structure of the program is not taken into account while generating the test cases. A stochastic model is calibrated using the failure data collected during the functional testing of the software, and this model is then used to predict the reliability of the software and to determine when to stop testing. Thus the black-box approach relies on the assumption that the software is tested as per its operational profile. Development of an operational profile is in general an involved task [Musa 1993; Musa et al. 1996], and the available operational profile could be inaccurate for several reasons [Horgan et al. 1995]. Additionally, in some situations, if the software being developed is the first of its kind, the operational profile may simply be unavailable. Pasquini et al. [Pasquini et al. 1996] conduct a study to investigate the sensitivity of the predictions of the reliability growth models to errors in the operational profile. Due to the inaccuracies in the estimated operational profile, or in some situations even the lack of it, the reliability predictions obtained using software reliability growth models tend to be optimistic.

The other reason for optimistic estimates is the saturation effect of testing [Chen et al. 1994; Wong et al. 1995]. As testing proceeds, it is easier for a test case to increase coverage in the earlier part of testing than in the later phases. Thus it becomes increasingly more difficult to design test cases which will execute unexercised parts of the code and detect faults in a program. As a result, the time between failures increases as testing time increases. However, the reliability of the software will increase only if the number of residual faults in the program is reduced. Redundant test cases do not cover unexecuted parts of the code, nor do they expose any new faults. However, they do increase interfailure times, resulting in overestimates of reliability [Chen et al. 1997]. Incorporating knowledge gained about a structural attribute of the code, such as test coverage, into otherwise black-box time-domain models can help reduce the sensitivity of the reliability estimates obtained using these models to errors in the operational profile and to the saturation of testing.

Intuition and empirical evidence suggest that the higher the test coverage, the higher the reliability of the software product. Various experiments have been conducted and models have been proposed to explore the relationship between code coverage, fault detection effectiveness, and software reliability. Wong et al. [Wong et al. 1994] conducted an experiment to determine the effect of block coverage on fault detection effectiveness. Vouk [Vouk 1993] investigated the relation between test coverage and fault detection rate, and discovered that the relation between structural coverage and fault coverage is a variant of the Rayleigh distribution. He assumed that the fault detection rate during testing is proportional to the number of faults present in the software, considering block, branch, data-flow, and functional group coverage. Chen et al. [Chen et al. 1996] utilize coverage measures to determine effective testing effort, and hypothesize that a testing effort is effective if and only if it increases some type of coverage or reveals some type of fault. However, they do not incorporate coverage measurements explicitly into a reliability model. Chen et al. [Chen et al. 1995] present empirical evidence that the testing method does affect software reliability estimates and suggest that reliability models need to consider some form of code coverage to obtain accurate estimates. Piwowarski et al. [Piwowarski et al. 1993] analyzed block coverage growth during functional testing of a software product, and derived an exponential model relating the number of test cases to block coverage. Their model is equivalent to the GO model attempted in [Ramsey and Basili 1985]. They also derived an exponential model relating coverage frequency to the error removal ratio. However, the utility of their model relies on prior knowledge of the error distribution over different functional groups in a product. Horgan et al. [Horgan and Mathur 1996] emphasize the importance of coverage measures in order to obtain realistic estimates from software reliability growth models. Malaiya [Malaiya 1994] modeled the relation among testing effort, coverage, and reliability, and proposed a coverage-based logarithmic model that relates a test coverage measure with fault detection. Frate et al. [Frate et al. 1995] report a strong correlation between coverage and reliability measures, but do not incorporate coverage into a reliability model. Jacoby et al. [Jacoby and Masuzawa 1992] discuss test coverage based reliability estimation based on the hyper-geometric distribution model.

Most of the research so far either focuses on confirming the intuitive relationship between coverage and reliability, or on enhancing a particular model to incorporate code coverage for a specific experiment. It does not present a framework which will allow coverage measures to be accounted for in a generalized fashion. The enhanced non-homogeneous Poisson process (ENHPP) model proposed in this paper explicitly incorporates time-varying test coverage into the finite failure non-homogeneous Poisson process (NHPP) models. This is an extension of the existing modeling approach based on functional testing, and does not result in any change in the development of the test set. However, it can accommodate test coverage measurements such as block, decision, or data-flow coverage obtained during the functional testing of the software product. These coverage measurements can be obtained using coverage measurement tools such as ATAC [Horgan and London 1992], Hindsight, etc. The ENHPP model is capable of handling any general coverage function and provides a methodology to integrate test coverage measures in the black-box modeling approach. In particular, the realistic situation of defective coverage, even after sufficient testing is carried out, can be captured. As a result, residual faults at the end of exhaustive or complete testing are predictable, which enables us to compute various operational phase parameters of the software product. The ENHPP model relates a class of mathematical models for software reliability estimation and prediction, namely the finite failure NHPP models, to the measurements that can be obtained from the code during functional testing. In fact, it establishes a framework which enables the parameters of these models to be estimated from coverage measurements obtained during functional testing.


We assert that the previously reported finite failure non-homogeneous Poisson process (NHPP) models are special cases of the ENHPP model. The ENHPP model is thus an important step towards unifying the finite failure NHPP models. We offer a new decomposition of the mean value function of the finite failure NHPP models, which enables us to attribute the nature of the failure intensity of the software to the behavior of the failure occurrence rate per fault. The failure occurrence rate per fault is then related to the coverage function. We highlight the fact that the existing finite failure NHPP models cannot capture the increasing/decreasing nature of the failure occurrence rate per fault, which can be captured by the ENHPP model with the log-logistic coverage function proposed here. We propose a methodology based on the ENHPP model for reliability prediction in the early phases of testing, when the conventional maximum likelihood techniques are unstable or may not even exist due to insufficient failure data. We then discuss the parameterization of the ENHPP model, and its role in the unification of the various black-box approaches to software reliability assessment. We develop expressions for various operational phase parameters and software release criteria based on various constraints such as reliability, cost, availability, and coverage requirements, using the ENHPP model. We validate the ENHPP model using failure data sets from the Center for Software Reliability, available on the CD-ROM accompanying the Handbook of Software Reliability Engineering [Lyu 1996], and from the Charles Stark Draper Laboratory, based on trend analyses, goodness-of-fit, bias, and bias trend criteria [Brocklehurst and Littlewood 1996]. Based on the failure data sets and the ENHPP model, we also provide estimates of the code coverage, failure intensity, residual number of faults at the time of the last failure, and conditional reliability beyond the time of the last failure.

The rest of the paper is organized as follows: Section 2 presents an overview of the commonly used notions of test coverage, Section 3 presents the existing finite failure NHPP models, Section 4 establishes a framework to incorporate test coverage into the finite failure NHPP reliability growth models to give the ENHPP model, Section 5 shows how the previously reported finite failure NHPP models can be regarded as special cases of the ENHPP model for different coverage functions, Section 6 presents the log-logistic coverage function, Section 7 discusses the parameterization of the ENHPP model, Section 8 presents a methodology for early reliability prediction, Section 9 develops expressions for the operational phase parameters based on the ENHPP model, Section 10 derives expressions for software availability, Section 11 discusses software release criteria subject to various constraints like reliability, availability, coverage, and cost, Section 12 presents equations to estimate the parameters of the coverage functions based on interfailure data, Section 13 discusses various validation criteria, Section 14 validates the ENHPP model based on the criteria discussed in Section 13 using five failure data sets, and Section 15 summarizes the paper.

2 Test coverage

Test coverage in software is measured in terms of the structural or data-flow units that have been covered. The various data-flow coverage criteria are defined below [Horgan and London 1991]:

• Statement (or block) coverage: It is the fraction of the total number of blocks that have been executed by the test cases. A basic block, or simply a block, is a sequence of instructions that, except for the last instruction, is free of branches and function calls. The instructions in any basic block are either executed all together, or not at all. A block may contain more than one statement if no branching occurs between statements; a statement may contain multiple blocks if branching occurs inside the statement; an expression may contain multiple blocks if branching is implied within the expression.

• Branch (or decision) coverage: It is the fraction of the total number of branches that have been executed by the test cases.

• C-use coverage: It is the fraction of the total number of c-uses that have been covered by the test cases. A c-use is defined as a path through a program from each point where the value of a variable is defined to its computation use, without the variable being modified along the path.

• P-use coverage: It is the fraction of the total number of p-uses that have been covered by the test cases. A p-use is a path from each point where the value of a variable is defined to its use in a predicate or decision, without modifications to the variable along the path.

Each coverage criterion discussed above captures some important aspect of a program's structure. In general, test coverage is a measure of how well a test covers all the potential fault-sites in a software product under test, where a potential fault-site is defined very broadly to mean any structurally or functionally described program element whose integrity may require verification and validation via an appropriately designed test. Thus a potential fault-site could be a statement, a branch, a c-use, etc. We now offer a unifying definition for test coverage which accommodates all the proposed specializations of the concept (i.e., statement, block, decision, condition/decision coverage, etc.) without the burden of the specificity they impose on the modeling process used to estimate or predict the reliability of software products. For a given software product and its companion test set, we define test coverage to be the ratio of the number of potential fault-sites covered by the test cases to the total number of potential fault-sites under consideration. Based on this definition, block coverage for a test case can be defined as:

$$\text{Block coverage} = \frac{\text{Number of blocks covered by a test case}}{\text{Total number of blocks}} \qquad (1)$$

As mentioned earlier, there are several definitions of test coverage, but the one offered here is the most general and is easily adaptable to situations which may benefit from a specialized application of the concept. The definition also allows for the possibility of having a subset of non-sensitizable potential fault-sites, and hence defective coverage.

3 Finite failure NHPP models

This is a class of time-domain [Gokhale et al. 1996; Ohba 1984; Ramamoorthy and Bastani 1982; Shooman 1984] software reliability models which assume that software failures display the behavior of a non-homogeneous Poisson process (NHPP). The parameter of the stochastic process, λ(t), which denotes the failure intensity of the software at time t, is time-dependent. Let N(t) denote the cumulative number of faults detected by time t, and let m(t) denote its expectation. Then m(t) = E[N(t)], and m(t) and the failure intensity λ(t) are related as follows:

$$m(t) = \int_0^t \lambda(s)\, ds \qquad (2)$$

and

$$\frac{dm(t)}{dt} = \lambda(t) \qquad (3)$$

N(t) is known to have a Poisson pmf with parameter m(t), that is:

$$P\{N(t) = n\} = \frac{[m(t)]^n}{n!}\, e^{-m(t)}, \qquad n = 0, 1, 2, \ldots \qquad (4)$$

Various time-domain models have appeared in the literature which describe the stochastic failure process by an NHPP. These models differ in their failure intensity function λ(t), and hence in m(t). The NHPP models can be further classified into finite failure and infinite failure categories. Finite failure NHPP models assume that the expected number of faults detected given an infinite amount of testing time is finite, whereas the infinite failure models assume that an infinite number of faults would be detected in infinite testing time [Farr 1996]. Let a denote the expected number of faults that would be detected given infinite testing time in the case of the finite failure NHPP models. The instantaneous failure intensity λ(t) in the case of the finite failure NHPP models can also be written as:

$$\lambda(t) = [a - m(t)]\, h(t) \qquad (5)$$

where h(t) is the failure occurrence rate per fault. This is the rate at which the individual faults manifest themselves as failures during testing. The quantity [a − m(t)] denotes the expected number of faults remaining in the software at time t. Since [a − m(t)] is a monotonically non-increasing function of time (in fact, [a − m(t)] decreases as more and more faults are detected and removed), the nature of the overall failure intensity λ(t) is governed by the nature of the failure occurrence rate per fault h(t).

4 Incorporating test coverage into reliability model

In this section we develop a framework to incorporate test coverage into finite failure NHPP models. The Enhanced non-homogeneous Poisson process (ENHPP) model thus developed states that the rate at which faults are detected is proportional to the product of the rate at which potential fault-sites are covered and the expected number of remaining faults. The ENHPP model is based on the following basic assumptions:

• Faults are uniformly distributed over all potential fault-sites.

• When a potential fault-site is covered, any fault present at that site is detected with probability K(t).

• Repairs are effected instantly and without the introduction of new faults. This assumption is the same as in the case of finite failure NHPP models.

• Coverage is a continuous, monotonic, non-decreasing function of testing time.

Analytically, the model is based on the expression:

$$\frac{dm(t)}{dt} = \tilde{a}\, K(t)\, \frac{dc(t)}{dt} \qquad (6)$$

or

$$m(t) = \tilde{a} \int_0^t K(\tau)\, c'(\tau)\, d\tau \qquad (7)$$

where ã is defined as the expected number of faults that would be detected given infinite testing time, perfect fault detection (K(t) = 1), and complete test coverage (lim_{t→∞} c(t) = 1). If K(t) is assumed to be a constant value K, then the expected number of faults detected by time t, denoted by m(t), is given by:

$$m(t) = \tilde{a}\, K\, c(t) \qquad (8)$$

Equation (8) is intuitively simple: the expected number of faults detected by time t is equal to the expected total number of faults in the product, times the probability of detecting a fault given that it is sensitized (covered), times the fraction of potential fault-sites covered by time t. Substituting a = ãK, Equation (8) can be written as:

$$m(t) = a\, c(t) \qquad (9)$$

This results in a failure intensity function λ(t), given by:

$$\lambda(t) = \frac{dm(t)}{dt} = a\, c'(t) \qquad (10)$$

The failure intensity function can also be rewritten as:

$$\lambda(t) = [a - m(t)]\, \frac{c'(t)}{1 - c(t)} \qquad (11)$$

Equation (11) clearly shows that the failure intensity depends not only on the number of remaining faults, but also on the rate at which the remaining potential fault-sites are covered, divided by the current fractional population of uncovered potential fault-sites. From Equations (5) and (11), the failure occurrence rate per fault, or the hazard function h(t), is given by:

$$h(t) = \frac{c'(t)}{1 - c(t)} \qquad (12)$$

The ENHPP model allows for a time-dependent failure occurrence rate per fault, i.e., the rate at which an individual fault will surface can vary with testing time. The failure occurrence rate per fault, or hazard function, can be a constant, increasing, decreasing, or increasing/decreasing function of the testing time. The coverage functions corresponding to each of these hazard functions are discussed in the sequel. In addition, the ENHPP model postulates that the "distribution" corresponding to this hazard function is the coverage function evaluated at time t. The ENHPP model also permits "defective or imperfect coverage", which is defined as the inability to cover all potential fault-sites. Potential fault-sites may be left uncovered due to insufficient testing, and/or it may simply be impossible to cover all the potential fault-sites, as in the case of a program with unexecutable paths [Rapps and Weyuker 1985]. Figure 1 shows a defective coverage function, c_defct(t) [Sahner et al. 1996]. Note that:

$$\lim_{t \to \infty} c_{defct}(t) = 1 - c_\infty \qquad (13)$$

where c_∞ is the probability that some potential fault-sites will never be covered.

Figure 1. A defective coverage function: c_defct(t) = (1 − c_∞) c(t), where c(t) is non-defective.

When the coverage is defective, the mean value function, assuming detection probability K = 1, is given by:

$$m_{defct}(t) = a\, c_{defct}(t) = a (1 - c_\infty)\, c(t) \qquad (14)$$

From Equation (14), the failure intensity function in the case of defective coverage is given by:

$$\lambda_{defct}(t) = \frac{dm_{defct}(t)}{dt} = a (1 - c_\infty)\, c'(t) \qquad (15)$$

The conditional reliability R(t|s), based on the ENHPP model, is:

$$R(t \mid s) = e^{-\int_s^{s+t} \lambda(\tau)\, d\tau} = e^{-\int_s^{s+t} a\, c'(\tau)\, d\tau} = e^{-a[c(s+t) - c(s)]} \qquad (16)$$

where s is the time of the last failure and t is the time measured from the last failure. Figure 2 explains the phenomenon of reliability growth, using the notion of conditional reliability presented in Equation (16). The conditional reliability is plotted against the testing time. Just after the point of detection and subsequent instantaneous repair, the conditional reliability jumps up to 1.0, and then decays slowly until the next detection. The lower envelope or the reliability trenches in each interval decrease initially and then increase. Thus initially there is reliability decay followed by reliability growth.
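To make Equation (16) concrete, here is a minimal Python sketch (not code from the paper; the parameter values a, g, and s are hypothetical) that evaluates the conditional reliability for an exponential coverage function:

```python
import math

def coverage_exp(t, g):
    """Exponential coverage function c(t) = 1 - exp(-g*t)."""
    return 1.0 - math.exp(-g * t)

def conditional_reliability(t, s, a, g):
    """Equation (16): R(t|s) = exp(-a * [c(s+t) - c(s)]) with exponential coverage."""
    return math.exp(-a * (coverage_exp(s + t, g) - coverage_exp(s, g)))

# Hypothetical values: a = 26 expected faults, g = 0.01, last failure at s = 100.
for t in (0, 10, 50, 100):
    print(t, round(conditional_reliability(t, s=100.0, a=26.0, g=0.01), 4))
```

Right after the (instantaneous) repair at time s, R(0|s) = 1, and the reliability then decays with t, exactly the saw-tooth behavior described above.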

Figure 2. Phenomenon of reliability growth (conditional reliability vs. time for the log-logistic model).

5 Existing NHPP models and coverage functions

In this section we describe some of the well-known forms that the coverage function c(t) can assume, in the event that coverage is not measured during the functional testing of the software. The coverage functions discussed in this section have previously been reported as separate finite failure NHPP models.

5.1 Exponential coverage function

The exponential coverage function (which appeared in the literature as the Goel-Okumoto (GO) model [Goel and Okumoto 1979]) has had a strong influence on software reliability modeling. The failure occurrence rate per fault is constant in the case of the exponential coverage function. Table 1 gives the expressions for m(t), λ(t), c(t), and h(t) for the exponential coverage function.

5.2 Weibull coverage function

In the case of the exponential coverage function, the failure occurrence rate per fault is time independent; however, since the expected number of remaining faults decreases with time, the overall software failure intensity decreases with time, and the software quality continues to improve as testing progresses. However, in most real-life testing scenarios, the software failure intensity increases initially and then decreases. The Weibull coverage function (which appeared in the literature as the generalized Goel-Okumoto (GO) model [Goel 1985]) was proposed to capture this increasing/decreasing nature of the failure intensity. The nature of the failure occurrence rate per fault is determined by the parameter γ; it is increasing for γ > 1 and decreasing for γ < 1. The exponential coverage function is a special case of the Weibull coverage function for γ = 1. Refer to Table 1 for the expressions for m(t), λ(t), c(t), and h(t) for the Weibull coverage function.

Table 1. Coverage functions: m(t), λ(t), c(t), and h(t).

Exponential:  m(t) = a(1 − e^{−gt});  λ(t) = a g e^{−gt};  c(t) = 1 − e^{−gt};  h(t) = g

Weibull:  m(t) = a(1 − e^{−g t^γ});  λ(t) = a g γ t^{γ−1} e^{−g t^γ};  c(t) = 1 − e^{−g t^γ};  h(t) = g γ t^{γ−1}

S-shaped:  m(t) = a[1 − (1 + gt) e^{−gt}];  λ(t) = a g^2 t e^{−gt};  c(t) = 1 − (1 + gt) e^{−gt};  h(t) = g^2 t / (1 + gt)

5.3 S-shaped coverage function

The S-shaped coverage function (which appeared in the literature as the S-shaped reliability growth model [Yamada et al. 1983]) captures the software error removal phenomenon in which there is a time delay between the actual detection of a fault and its reporting. The testing process in this case can be seen as consisting of two phases: fault detection and fault isolation. The S-shaped coverage function has an increasing failure occurrence rate per fault. The expressions for m(t), λ(t), c(t), and h(t) for the S-shaped coverage function are presented in Table 1.

6 Log-logistic coverage function

The coverage functions discussed in Section 5 exhibit either a constant, a monotonically increasing, or a monotonically decreasing failure occurrence rate per fault. The rate at which individual faults manifest themselves as testing progresses can also exhibit an increasing/decreasing behavior. This increasing/decreasing behavior of the failure occurrence rate per fault can be captured by the hazard of the log-logistic distribution [Leemis 1995]. The log-logistic coverage function is given by Equation (17):

$$c(t) = \frac{(\lambda t)^\kappa}{1 + (\lambda t)^\kappa} \qquad (17)$$

The failure intensity λ(t), the cumulative expected number of faults m(t), and the hazard function h(t) are given by Equations (18), (19), and (20), respectively:

$$\lambda(t) = \frac{a \lambda \kappa (\lambda t)^{\kappa - 1}}{\left[1 + (\lambda t)^\kappa\right]^2} \qquad (18)$$

$$m(t) = a\, \frac{(\lambda t)^\kappa}{1 + (\lambda t)^\kappa} \qquad (19)$$

$$h(t) = \frac{c'(t)}{1 - c(t)} = \frac{\lambda \kappa (\lambda t)^{\kappa - 1}}{1 + (\lambda t)^\kappa} \qquad (20)$$

Figure 3. Failure occurrence rate per fault for various coverage functions.

Figure 4. Failure intensity for various coverage functions.

Figure 3 shows the failure occurrence rate per fault and Figure 4 shows the failure intensity for the exponential, Weibull, S-shaped, and log-logistic coverage functions. The parameters of the coverage functions and a are set in such a way that there are about 26 failures in 250 time units. These values are chosen for the sake of illustration only.
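The coverage functions of Table 1 and Equation (17), and the hazard relation of Equation (12), are straightforward to evaluate numerically. The sketch below is illustrative only; the parameter values are hypothetical and merely chosen to produce hazard curves of the four qualitative shapes discussed above.

```python
import numpy as np

def c_exponential(t, g):
    return 1.0 - np.exp(-g * t)

def c_weibull(t, g, gamma):
    return 1.0 - np.exp(-g * t**gamma)

def c_s_shaped(t, g):
    return 1.0 - (1.0 + g * t) * np.exp(-g * t)

def c_log_logistic(t, lam, kappa):
    x = (lam * t) ** kappa
    return x / (1.0 + x)

def hazard(c, t, eps=1e-6):
    """Numerical h(t) = c'(t) / (1 - c(t)), Equation (12)."""
    dc = (c(t + eps) - c(t - eps)) / (2 * eps)
    return dc / (1.0 - c(t))

t = np.linspace(1.0, 250.0, 5)
print(hazard(lambda x: c_exponential(x, g=0.01), t))                 # constant hazard g
print(hazard(lambda x: c_weibull(x, g=0.05, gamma=0.9), t))          # decreasing hazard
print(hazard(lambda x: c_s_shaped(x, g=0.02), t))                    # increasing hazard
print(hazard(lambda x: c_log_logistic(x, lam=0.02, kappa=2.0), t))   # increasing/decreasing hazard
```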

7 Parameterization of the ENHPP model

The parameters of the ENHPP model are:

• the expected number of faults initially residing in the software, denoted by a, and

• coverage as a function of testing time, c(t).

Figure 5 shows the different ways in which the ENHPP model can be parameterized in the context of the different phases of the software life-cycle as given by the waterfall model [Boehm 1986]. A conventional way is to drive the ENHPP model using the failure data collected during testing to estimate a and the parameters

of c(t). A second way is to estimate a using the failure data, and to measure coverage during the functional testing of the software product using a coverage measurement tool such as ATAC [Lyu et al. 1994]. Yet another approach is to estimate a from software metrics using techniques such as the regression tree model described in the previous paper, and to measure coverage during the functional testing of the software product using a coverage measurement tool such as ATAC [Horgan and Mathur 1992]. By providing a way to combine the estimates obtained from software metrics with coverage measurements that can be obtained during the functional testing, the ENHPP model offers a framework to combine three important methods of predicting the reliability of a software product (test coverage, complexity metrics, and failure data based), as shown in Figure 6.

Figure 5. Different ways of parameterizing the ENHPP model in the context of the waterfall model (requirements specification, design and verification, implementation, testing and validation, deployment and maintenance), using historical data, static complexity metrics, measured test coverage, and failure data to predict R(t), c(t), m(t), and λ(t).

8 Early prediction of software reliability

The traditional approach to reliability prediction using software reliability growth models requires the estimation of parameters from either the interfailure times or grouped data, using maximum likelihood or least squares approaches. The more commonly used maximum likelihood estimates, which possess many desirable properties such as minimum variance and unbiasedness [Wood 1996], may not even exist, or may not converge to a reasonable value, early in the testing phase, due to the unavailability of sufficient failure data. It is useful to be able to estimate the reliability growth earlier in the testing phase, so that adequate

resources can be allocated so as to release the software at a desired time.

Figure 6. Combination of three approaches to reliability prediction: static complexity metrics, test coverage, and failure data feed the ENHPP model for the quantification of software quality.

This problem has been addressed by Knafl et al. [Knafl and Morgan 1996], Hossain et al. [Hossain and Dahiya 1993], and Joe [Joe 1989]. However, no practical solution to the problem is presented, except that the users are warned of the possibility of the non-existence of the maximum likelihood estimates, and are advised against estimating the reliability until a sufficient number of failure data are accumulated. Xie et al. [Xie et al. 1997] present a methodology to circumvent this problem, which relies on the use of information from the testing phases of similar past projects. The disadvantage of the approach discussed by Xie lies in the assumption that the current software product and the past project are developed and tested in similar environments. This assumption needs to be verified and may not hold in general. The ENHPP model provides an alternative approach for early prediction of software reliability, which does not require the use of any past information. The ENHPP model interprets the parameters of the finite failure NHPP models as the expected number of faults that would be detected in infinite testing time and coverage as a function of testing time. Thus, substituting m(t) in Equation (9) by N, the number of failures observed up to time t, and c(t) by the coverage measured up to time t, we can obtain an estimate of the parameter a, the expected number of faults that would be detected in infinite time. From Equation (9), a is given by:

$$a = \frac{\text{Number of failures observed by time } t\ (N)}{\text{Coverage measured by time } t} \qquad (21)$$

The parameters of the coverage function can also be obtained from coverage measurements taken at various points during the testing phase, using the non-linear least squares estimation method [Bates and Chambers 1993]. Once the parameters have been estimated, predictions about the reliability, failure intensity, etc., can be obtained. These predictions can be used in determining the release time and in allocating resources to ensure timely release of the software.
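One possible realization of this early-prediction procedure is sketched below in Python. It is not code from the paper: the coverage measurements and failure count are made up, and scipy is assumed to be available for the non-linear least squares fit of the coverage-function parameter; the estimate of a follows Equation (21).

```python
import numpy as np
from scipy.optimize import curve_fit

def c_exponential(t, g):
    """Exponential coverage function c(t) = 1 - exp(-g*t)."""
    return 1.0 - np.exp(-g * t)

# Hypothetical measurements taken early in the test phase.
t_meas = np.array([10.0, 20.0, 40.0, 60.0, 80.0])   # testing time
c_meas = np.array([0.18, 0.33, 0.55, 0.69, 0.78])   # measured block coverage
n_failures = 21                                      # failures observed by t = 80

# Fit the coverage-function parameter g by non-linear least squares.
(g_hat,), _ = curve_fit(c_exponential, t_meas, c_meas, p0=[0.01])

# Equation (21): a = (failures observed by time t) / (coverage measured by time t).
a_hat = n_failures / c_meas[-1]

print("g =", round(g_hat, 4), " a =", round(a_hat, 2))
print("predicted m(200) =", round(a_hat * c_exponential(200.0, g_hat), 2))
```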

9 Predictions in the operational phase

The ENHPP model can be used to make predictions about the operational behavior of a software product. As the software product transitions from the testing phase into the operational phase, it becomes essential to provide estimates of the key parameters in the operational phase in order to make accurate predictions about various metrics of interest. The fundamental difference between test-phase execution and operational execution can be captured through coverage and the fault detection probability. Towards this end, we first compute the expected number of faults remaining at the end of the testing phase. In this section, the test-phase parameters are identified by the subscript T, and the operational-phase parameters are identified by the subscript L. For instance, K_T(t) denotes the fault detection probability in the test phase, while K_L(t) denotes the fault detection probability in the operational phase. Let m_T(t_r) denote the expected number of faults detected by the end of the testing phase, which ends at time t_r. Thus the expected number of faults remaining at the end of the test phase is given by [ã − m_T(t_r)]. The test coverage c_T(t) and the fault detection probability K_T(t) must be adjusted to reflect the operational usage of the software product. Let these modified values be denoted by c_L(t) and K_L(t), respectively; they reflect the way the product is used in the field. Assuming constant fault detection probabilities in both the testing and the operational phases, i.e., K_T(t) = K_T and K_L(t) = K_L, the mean value function in the operational phase can be written as:

$$m_L(t) = [\tilde{a} - m_T(t_r)] \cdot K_L \cdot c_L(t) = \tilde{a} \cdot \{1 - [K_T \cdot c_T(t_r)]\} \cdot K_L \cdot c_L(t) \qquad (22)$$

The resulting failure intensity function for the operational phase is:

$$\lambda_L(t) = [\tilde{a} - m_T(t_r)] \cdot K_L \cdot c_L'(t) = \tilde{a} \cdot \{1 - [K_T \cdot c_T(t_r)]\} \cdot K_L \cdot c_L'(t) \qquad (23)$$

It is intuitive that during the test phase K_T and c_T(t) should be as high as possible, whereas during the operational phase K_L and c_L(t) should have low values. The failure rate per fault, or the hazard function, in the operational phase is given by:

$$h_L(t) = \frac{c_L'(t)}{1 - c_L(t)} \qquad (24)$$

The NHPP model proposed by Yamada [Yamada and Osaki 1985] describing software error detection in the operational phase is a special case of the ENHPP model if the failure rate per fault is assumed to be constant, or equivalently, if the coverage function in the operational phase is assumed to be exponential. It is important to note that the ENHPP model does not impose any restriction on the nature of the coverage function in the operational phase of the software product.

If the coverage function c_T(t) or the fault detection K_T is defective, then the expected number of faults remaining at the end of the test phase, given infinite testing time, denoted by a_H, is given by:

$$a_H = \tilde{a} - m_T(\infty) = \tilde{a} - \tilde{a}\, K_T\, (1 - p_{0_T}) = \tilde{a}\left\{1 - [K_T\, (1 - p_{0_T})]\right\} \qquad (25)$$

where m_T(∞) is the expected number of faults detected given infinite test time, K_T is the constant fault detection probability, and p_{0_T} is the defect in test coverage during the test phase. The quantity a_H can be thought of as the expected number of Heisenbugs [Gray 1986], or faults that will never be found even after infinite testing time. Using Equations (8) and (25), the mean number of faults detected by time t from the beginning of the operational phase, given that the testing was carried out for infinite time, is given by:

$$m_H(t) = \tilde{a} \cdot [1 - K_T \cdot (1 - p_{0_T})] \cdot K_L \cdot c_L(t) \qquad (26)$$

From Equation (26), the failure intensity due to Heisenbugs is given by:

$$\lambda_H(t) = \frac{dm_H(t)}{dt} = \tilde{a} \cdot [1 - K_T \cdot (1 - p_{0_T})] \cdot K_L \cdot c_L'(t) \qquad (27)$$

The quantity [ã − m_T(t_r) − m_L(∞)] denotes the expected number of faults which remain in the software product after t_r units of testing time and infinite units of operational use. These faults may be due to defective test coverage and/or imperfect fault detection in the operational and/or testing phase. Conditioning on the faults that manifest during system operation, the corresponding reliability function [Sahner et al. 1996] for the operational phase is given by:

$$R_{L_c}(t) = \frac{e^{-m_L(t)} - e^{-m_L(\infty)}}{1 - e^{-m_L(\infty)}} \qquad (28)$$

Predictions about the reliability, failure intensity, and availability during the operational phase can now consider the operational usage of the software as well as the effect of the testing process on the software. Equation (28) will be used to compute software availability as described in the sequel. In a similar manner, we can extend the above ideas to any two sequential test phases by making minor modifications to Equations (22) and (23).
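The sketch below illustrates Equations (22) and (28) under simplifying assumptions (exponential coverage in both phases, constant detection probabilities, and hypothetical parameter values); it is meant only as a numerical illustration, not as the paper's procedure.

```python
import math

def c_exp(t, g):
    return 1.0 - math.exp(-g * t)

# Hypothetical parameters.
a_tilde, K_T, K_L = 30.0, 1.0, 0.8        # total faults, detection probabilities
g_T, g_L, t_r = 0.02, 0.002, 150.0        # coverage rates and release time

m_T_release = a_tilde * K_T * c_exp(t_r, g_T)   # faults found in the test phase
a_L = a_tilde - m_T_release                     # faults left for the field

def m_L(t):
    """Equation (22): operational-phase mean value function."""
    return a_L * K_L * c_exp(t, g_L)

def R_Lc(t):
    """Equation (28): reliability conditioned on faults that manifest in operation."""
    m_inf = a_L * K_L                           # m_L(infinity) for exponential c_L
    return (math.exp(-m_L(t)) - math.exp(-m_inf)) / (1.0 - math.exp(-m_inf))

for t in (0.0, 100.0, 500.0):
    print(t, round(m_L(t), 2), round(R_Lc(t), 4))
```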

10 Software availability

In many real-time applications, such as telecommunications software, software availability is a more critical system metric than reliability [DeMillo 1995]. The expression for the software availability A during the operational phase is as follows:

$$A = \frac{MTTF}{MTTF + MTTR} \qquad (29)$$

where MTTF is the mean time to failure and MTTR is the mean time to repair. In this case, faults occurring during operation are not being fixed, either because it is too expensive to do so or because they are not easily reproducible enough to initiate an effective repair. In other words, we assume that the faults remaining after release are elusive Heisenbugs. Faults continue to reside in the software, and there is no opportunity for availability growth. The system is rebooted in the event of a failure, and MTTR is then the mean time to reboot the system. From Equation (28), MTTF is given by:

$$MTTF = \int_0^\infty R_{L_c}(t)\, dt = \int_0^\infty \frac{e^{-m_L(t)} - e^{-m_L(\infty)}}{1 - e^{-m_L(\infty)}}\, dt \qquad (30)$$

If the coverage function is exponential, as given by:

$$c_L(t) = 1 - e^{-g_L t} \qquad (31)$$

we can derive a closed-form expression for MTTF as follows:

$$MTTF = \frac{e^{-a_L K_L}}{g_L \left(1 - e^{-a_L K_L}\right)} \sum_{i=1}^{\infty} \frac{(a_L K_L)^i}{i \cdot i!} \qquad (32)$$

Equation (29) can then be re-written as:

$$A = \frac{\dfrac{e^{-a_L K_L}}{g_L \left(1 - e^{-a_L K_L}\right)} \displaystyle\sum_{i=1}^{\infty} \frac{(a_L K_L)^i}{i \cdot i!}}{\dfrac{e^{-a_L K_L}}{g_L \left(1 - e^{-a_L K_L}\right)} \displaystyle\sum_{i=1}^{\infty} \frac{(a_L K_L)^i}{i \cdot i!} + MTTR} \qquad (33)$$

MTTF as given by Equation (30) in general may not have a closed-form solution for other coverage functions, and a numerical solution may be required.
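Because the series in Equation (32) converges rapidly, MTTF and the availability of Equation (33) can be evaluated by direct summation. The following sketch does this for hypothetical values of a_L, K_L, g_L, and MTTR.

```python
import math

def mttf_exponential(a_L, K_L, g_L, n_terms=100):
    """Equation (32): MTTF for an exponential operational coverage function."""
    x = a_L * K_L
    series = sum(x**i / (i * math.factorial(i)) for i in range(1, n_terms + 1))
    return math.exp(-x) / (g_L * (1.0 - math.exp(-x))) * series

def availability(a_L, K_L, g_L, mttr):
    """Equation (33): A = MTTF / (MTTF + MTTR)."""
    mttf = mttf_exponential(a_L, K_L, g_L)
    return mttf / (mttf + mttr)

print(round(mttf_exponential(a_L=5.0, K_L=1.0, g_L=0.001), 1))
print(round(availability(a_L=5.0, K_L=1.0, g_L=0.001, mttr=2.0), 4))
```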

11 Software release criteria

The basic question while testing a software product is to decide when to stop testing. The optimal release time of the software is one of the most important issues, and it has been addressed extensively in the literature [Dalal and Mallows 1988; Dalal and Mallows 1992; Ross 1985]. In this section we derive expressions for the release times of the software, subject to various release criteria, based on the ENHPP framework. Most of the expressions derived in this section can also be used to determine the minimum required coverage (we will refer to this as the product-release coverage) for the release of a software product, without explicitly computing the release time. The coverage function can then be inverted to obtain the release time. The various stopping criteria which can influence the product-release time, or the product-release coverage, are as follows:

11.1 Number of remaining faults

If testing is to stop when a fraction ρ of the expected total number of faults has been detected, then the minimum required coverage can be determined from Equation (8):

$$c_T(t_\rho) = \frac{\rho}{K_T} \qquad (34)$$

where ρ = m_T(t_ρ)/ã, K_T is the constant fault detection probability during the testing phase, and ã is the expected number of faults that would be detected in infinite testing time assuming perfect detection as well as perfect test coverage. If the coverage function is exponential, the release time t_ρ is given by:

$$t_\rho = -\frac{1}{g} \ln\left(1 - \frac{\rho}{K_T}\right) \qquad (35)$$

If the coverage function is Weibull, the release time t_ρ is given by:

$$t_\rho = \left(-\frac{1}{g} \ln\left(1 - \frac{\rho}{K_T}\right)\right)^{1/\gamma} \qquad (36)$$

In the case of the S-shaped and log-logistic coverage functions, a numerical solution is required to obtain the release time t_ρ.

11.2 Failure intensity requirements

If testing is to stop when the failure intensity measured at the end of the test phase reaches a specified value λ_f, then the release time t_f can be determined from Equation (6), which yields:

$$c_T(t_f) = \frac{\lambda_f\, t_f}{\tilde{a} K_T} + C_0 \qquad (37)$$

where C_0 = c_T(0). It should be noted that the failure intensity function λ(t) in general may not be monotonic (refer to Figure 4), and hence there can be multiple values of t_f; we must choose the value associated with the region of decreasing failure intensity. An alternative approach to obtaining t_f could require that the failure intensity at the start of the operational phase (which is the failure intensity at the end of the test phase) be lower than λ_L(0). Using Equation (23), the minimum required coverage is given by:

$$c_T(t_f) = \frac{1}{K_T} - \frac{\lambda_L(0)}{\tilde{a} K_T K_L c_L'(0)} \qquad (38)$$

where λ_L(0) denotes the failure intensity at the start of the operational phase. If the coverage function is exponential, then the release time t_f is given by:

$$t_f = -\frac{1}{g} \ln\left(1 - \left(\frac{1}{K_T} - \frac{\lambda_L(0)}{\tilde{a} K_T K_L c_L'(0)}\right)\right) \qquad (39)$$

If the coverage function is Weibull, the release time t_f is given by:

$$t_f = \left(-\frac{1}{g} \ln\left(1 - \left(\frac{1}{K_T} - \frac{\lambda_L(0)}{\tilde{a} K_T K_L c_L'(0)}\right)\right)\right)^{1/\gamma} \qquad (40)$$

A numerical solution is necessary to obtain the release time t_f in the case of the S-shaped and log-logistic coverage functions.

11.3 Reliability requirements

This rule is used in the test phase and is based on the operational-phase parameters. If the required conditional reliability in the operational phase is R_r at time t_0 after product release, then the minimum required coverage c_T(t_r) can be determined using Equation (28) as:

$$c_T(t_r) = \frac{\ln\left[1 - \left(1 - e^{-m_L(\infty)}\right)(1 - R_r)\right] + \tilde{a} K_L c_L(t_0)}{\tilde{a} K_T K_L c_L(t_0)} \qquad (41)$$

If the coverage function is exponential, the release time t_r is given by:

$$t_r = -\frac{1}{g} \ln\left(1 - \frac{\ln\left[1 - \left(1 - e^{-m_L(\infty)}\right)(1 - R_r)\right] + \tilde{a} K_L c_L(t_0)}{\tilde{a} K_T K_L c_L(t_0)}\right) \qquad (42)$$

If the coverage function is Weibull, the release time t_r is given by:

$$t_r = \left(-\frac{1}{g} \ln\left(1 - \frac{\ln\left[1 - \left(1 - e^{-m_L(\infty)}\right)(1 - R_r)\right] + \tilde{a} K_L c_L(t_0)}{\tilde{a} K_T K_L c_L(t_0)}\right)\right)^{1/\gamma} \qquad (43)$$

The release time t_r in the case of the S-shaped and log-logistic coverage functions can be obtained numerically.

11.4 Cost requirements

Following the cost model given in GO [Goel and Okumoto 1979], we derive the optimal release time t_r from the parameters d_1, d_2, and d_3, where d_1 is the expected cost of removing a fault during testing, d_2 is the expected cost of removing a fault during operation (d_2 is at least two orders of magnitude higher than d_1), and d_3 is the expected cost of software testing per unit time. The total cost TC of testing up to time t_r is given by:

$$TC = d_1 \cdot m_T(t_r) + d_2 \cdot m_L(\infty) + d_3 \cdot t_r \qquad (44)$$

which can be expressed as:

$$TC = d_1 \left[\tilde{a} \cdot K_T \cdot c_T(t_r)\right] + d_2 \left\{\tilde{a} \cdot \left[1 - K_T \cdot c_T(t_r)\right] \cdot K_L \cdot (1 - p_0)\right\} + d_3 \cdot t_r \qquad (45)$$

From Equation (45), we can obtain the coverage c_T(t_r) which minimizes the cost of testing, TC; it is given by Equation (46):

$$c_T(t_r) = \frac{d_3\, t_r}{\tilde{a} \cdot K_T \cdot \left[d_2 K_L (1 - p_0) - d_1\right]} + C_0 \qquad (46)$$

where C_0 = c_T(0). If the coverage function is exponential, the release time t_r is given by:

$$t_r = -\frac{1}{g} \ln\left(1 - \frac{d_3\, t_r}{\tilde{a} \cdot K_T \cdot \left[d_2 K_L (1 - p_0) - d_1\right]} - C_0\right) \qquad (47)$$

If the coverage function is Weibull, the release time t_r is given by:

$$t_r = \left(-\frac{1}{g} \ln\left(1 - \frac{d_3\, t_r}{\tilde{a} \cdot K_T \cdot \left[d_2 K_L (1 - p_0) - d_1\right]} - C_0\right)\right)^{1/\gamma} \qquad (48)$$

This is a very basic cost model; it can be enhanced to include more sophisticated costs, like the cost of the software failure to customer operations in the field [Ehrlich et al. 1993], the penalty cost incurred due to the delay from the scheduled delivery time [Hou et al. 1997], coverage cost, etc., to make Equation (44) more realistic. However, since the development of the stopping rules is not the emphasis here, we use a basic cost model to illustrate the utility of the ENHPP framework for determining software release times.

11.5 Availability requirements

Assuming that MTTR is the time needed to reboot and that there is no availability growth, the basic equation which needs to be solved for the release time t_a, based on an operational availability requirement of A_r, from Equation (29), is given by:

$$\frac{A_r}{1 - A_r}\, MTTR = MTTF \qquad (49)$$

For an exponential coverage function, from Equation (30), we have:

$$\frac{A_r}{1 - A_r}\, MTTR = \frac{e^{-a_L K_L}}{g_L \left(1 - e^{-a_L K_L}\right)} \sum_{i=1}^{\infty} \frac{(a_L K_L)^i}{i \cdot i!} \qquad (50)$$

This gives us Equation (51), which can be solved numerically for a_L. Substituting the value of a_L into Equation (52), we can determine the minimum required coverage, which can then be inverted to give t_a.

$$\frac{e^{-a_L K_L}}{1 - e^{-a_L K_L}} \sum_{i=1}^{\infty} \frac{(a_L K_L)^i}{i \cdot i!} = \frac{A_r \cdot MTTR \cdot g_L}{1 - A_r} \qquad (51)$$

$$c_T(t_a) = \frac{\tilde{a} - a_L}{\tilde{a} K_T} \qquad (52)$$

The release criterion actually used in a particular software project will depend on the metric which is of greatest importance in that context. For example, in the case of safety-critical applications, maximizing reliability is of paramount importance, whereas in the case of telecommunications software, availability is an essential metric to maximize.

Estimation of parameters

In this section we develop expressions to estimate the parameters of the coverage functions presented in Section 5 and Section 6 and a based on time between failures data. Let ftk ; k = 1; 2; : : :g denote the sequence of times between successive software failures. Then tk denotes the time between (k ? 1)st and kth failure. Let sk denote the time to failure k, so that:

sk

k X = t:

(53)

k

i

=1

The joint density, or the likelihood function, of S_1, S_2, ..., S_n can be written as [Trivedi 1982]:

$$f_{S_1, S_2, \ldots, S_n}(s_1, s_2, \ldots, s_n) = e^{-m(s_n)} \prod_{i=1}^{n} \lambda(s_i) \qquad (54)$$

For a given sequence of software failure times s = (s_1, s_2, ..., s_n), which are realizations of the random variables S_1, S_2, ..., S_n, the parameters of the ENHPP model for the different coverage functions are estimated using the maximum likelihood method. The log likelihood in the case of exponential coverage is given by [Goel and Okumoto 1979]:

$$L(a, g \mid s) = n \log a + n \log g - a\left(1 - e^{-g s_n}\right) - g \sum_{i=1}^{n} s_i \qquad (55)$$

Maximizing Equation (55) with respect to a and g, we have:

$$\frac{n}{a} = 1 - e^{-g s_n} \qquad (56)$$

and

$$\frac{n}{g} = \sum_{i=1}^{n} s_i + a s_n e^{-g s_n} \qquad (57)$$

Solving these two simultaneous non-linear equations, we obtain the point estimates of a and g. Similarly, the log likelihood in the case of S-shaped coverage can be written as:

$$L(a, g \mid s) = -a\left(1 - (1 + g s_n) e^{-g s_n}\right) + n \log a + 2n \log g + \sum_{i=1}^{n} \log s_i - g \sum_{i=1}^{n} s_i \qquad (58)$$

Maximizing Equation (58) with respect to a and g, we have:

$$\frac{n}{a} = 1 - (1 + g s_n) e^{-g s_n} \qquad (59)$$

and

$$\frac{2n}{g} = a g s_n^2 e^{-g s_n} + \sum_{i=1}^{n} s_i \qquad (60)$$

Solving the above two coupled non-linear equations, we obtain the point estimates of the parameters a and g. The log likelihood in the case of Weibull coverage can be written as:

$$L(a, g, \gamma \mid s) = -a\left(1 - e^{-g s_n^\gamma}\right) + n \log a + n \log g + n \log \gamma - g \sum_{i=1}^{n} s_i^\gamma + (\gamma - 1) \sum_{i=1}^{n} \log s_i \qquad (61)$$

For a fixed value of γ, Equation (61) can be maximized with respect to a and g to give:

$$\frac{n}{a} = 1 - e^{-g s_n^\gamma} \qquad (62)$$

and

$$\frac{n}{g} = \sum_{i=1}^{n} s_i^\gamma + a s_n^\gamma e^{-g s_n^\gamma} \qquad (63)$$

Simultaneously solving the above two equations for a fixed value of γ gives the point estimates of a and g. Different values of γ give a family of Weibull coverage functions, and we choose the best among these. The log likelihood in the case of log-logistic coverage is:

$$L(a, \lambda, \kappa \mid s) = -a\, \frac{(\lambda s_n)^\kappa}{1 + (\lambda s_n)^\kappa} + n \log a + n \log(\lambda^\kappa) + n \log \kappa + (\kappa - 1) \sum_{i=1}^{n} \log s_i - 2 \sum_{i=1}^{n} \log\left(1 + (\lambda s_i)^\kappa\right) \qquad (64)$$

Maximizing Equation (64) with respect to a, λ, and κ gives:

$$a = \frac{n\left(1 + (\lambda s_n)^\kappa\right)}{(\lambda s_n)^\kappa} \qquad (65)$$

$$\frac{n\, (\lambda s_n)^\kappa}{1 + (\lambda s_n)^\kappa} = 2 \sum_{i=1}^{n} \frac{(\lambda s_i)^\kappa}{1 + (\lambda s_i)^\kappa} \qquad (66)$$

$$\frac{n}{\kappa} = \frac{n \log(\lambda s_n)}{1 + (\lambda s_n)^\kappa} - n \log \lambda - \sum_{i=1}^{n} \log s_i + 2 \sum_{i=1}^{n} \frac{(\lambda s_i)^\kappa \log(\lambda s_i)}{1 + (\lambda s_i)^\kappa} \qquad (67)$$

Solving the above three equations simultaneously yields the point estimates of the parameters a, λ, and κ.
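A minimal sketch of the estimation step for the exponential coverage function is given below; it solves the coupled likelihood Equations (56) and (57) numerically. The interfailure times are made up, and scipy's general-purpose root finder is assumed to be an acceptable substitute for whatever numerical procedure was actually used.

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical cumulative failure times s_1 < s_2 < ... < s_n.
s = np.cumsum([12.0, 18.0, 9.0, 30.0, 25.0, 41.0, 17.0, 52.0, 66.0, 80.0])
n, s_n, s_sum = len(s), s[-1], s.sum()

def likelihood_equations(params):
    """Equations (56) and (57) for the exponential (Goel-Okumoto) coverage function."""
    a, g = params
    eq56 = n / a - (1.0 - np.exp(-g * s_n))
    eq57 = n / g - (s_sum + a * s_n * np.exp(-g * s_n))
    return [eq56, eq57]

a_hat, g_hat = fsolve(likelihood_equations, x0=[2.0 * n, 1.0 / s_n])
print("a =", round(a_hat, 2), " g =", round(g_hat, 5))
```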

13 Validation of the ENHPP model

The ENHPP model was validated using the exponential, Weibull, S-shaped, and log-logistic coverage functions with five different interfailure data sets. The following tests were carried out for validation.

13.1 Trend analysis

Software reliability studies are usually based on the application of reliability growth models to obtain various measures of interest. Reliability growth can be analyzed by trend tests. In this section we discuss the most frequently used trend tests to analyze reliability growth for failure data collected in the form of interfailure times. The two trend tests that are commonly carried out are [Kanoun and Laprie 1994]:

• Arithmetic mean test: This test consists of computing the arithmetic mean τ(i) of the observed interfailure times t_j, j = 1, 2, ..., i:

$$\tau(i) = \frac{1}{i} \sum_{j=1}^{i} t_j \qquad (68)$$

An increasing sequence of τ(i) indicates reliability growth and a decreasing sequence indicates reliability decay.

• Laplace test: The Laplace test is superior from an optimality point of view and is recommended for use when the NHPP assumption is made [Gauodin 1992]. Let N(t) denote the cumulative number of faults detected over the period (0, t). The failure intensity λ(t) determines reliability growth, reliability decay, or stable reliability: a decreasing λ(t) implies reliability growth, an increasing λ(t) implies reliability decay, and a constant λ(t) implies stable reliability. The test procedure is to compute the Laplace factor l(t), given by [Cox and Lewis 1978]:

$$l(t) = \frac{\dfrac{1}{N(t)} \displaystyle\sum_{j=1}^{N(t)} \sum_{n=1}^{j} t_n - \dfrac{t}{2}}{t \sqrt{\dfrac{1}{12\, N(t)}}} \qquad (69)$$

The Laplace factor is evaluated step by step, after every failure occurrence. Here t is then taken equal to the time of occurrence of the ith failure, and the failure at time t is excluded. Equation (69) can then be modified as:

$$l(i) = \frac{\dfrac{1}{i-1} \displaystyle\sum_{j=1}^{i-1} \sum_{n=1}^{j} t_n - \dfrac{1}{2} \displaystyle\sum_{j=1}^{i} t_j}{\displaystyle\sum_{j=1}^{i} t_j \sqrt{\dfrac{1}{12(i-1)}}} \qquad (70)$$

Intuitively, the Laplace test can be interpreted as follows [Kanoun et al. 1997]:

- $\frac{1}{2}\sum_{j=1}^{i} t_j$ is the midpoint of the observation interval.

- $\frac{1}{i-1}\sum_{j=1}^{i-1}\sum_{n=1}^{j} t_n$ is the statistical center of the interfailure times.

Under the assumption of decreasing (increasing) failure intensity, the failures will tend to occur before (after) the midpoint of the observation interval; hence the statistical center tends to be smaller (larger) than the mid-interval. The Laplace factor can be interpreted as follows:

- Negative values indicate a decreasing failure intensity, and thus reliability growth.

- Positive values indicate an increasing failure intensity, and thus a decrease in the reliability.

- Values between −2 and +2 indicate stable reliability.

Trend analysis can significantly help in choosing the appropriate coverage function for a given sequence of interfailure times, so that coverage functions are applied to data displaying trends in accordance with their assumptions rather than blindly. Using a coverage function for the analysis of a failure data set without taking into consideration the trend displayed by the data can lead to unrealistic results when the trend displayed is different from that assumed by the coverage function [Kanoun and Laprie 1996]. The classification of the failure data according to the trend, and the corresponding coverage functions, are summarized in Table 2.
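Both trend tests can be computed directly from the interfailure times. The sketch below implements Equations (68) and (70); the data are made-up interfailure times, used only for illustration.

```python
import numpy as np

def arithmetic_mean(t):
    """Equation (68): tau(i) = (1/i) * sum_{j=1}^{i} t_j for i = 1..n."""
    t = np.asarray(t, dtype=float)
    return np.cumsum(t) / np.arange(1, len(t) + 1)

def laplace_factor(t, i):
    """Equation (70): Laplace factor evaluated at the i-th failure (i >= 2)."""
    t = np.asarray(t, dtype=float)
    s = np.cumsum(t)                       # occurrence times s_1, ..., s_n
    center = s[: i - 1].sum() / (i - 1)    # statistical center of the first i-1 failures
    half_interval = s[i - 1] / 2.0         # half of the observation interval
    return (center - half_interval) / (s[i - 1] * np.sqrt(1.0 / (12.0 * (i - 1))))

t = [12.0, 18.0, 9.0, 30.0, 25.0, 41.0, 17.0, 52.0]   # hypothetical interfailure times
print(arithmetic_mean(t))
print([round(laplace_factor(t, i), 2) for i in range(2, len(t) + 1)])
```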

Table 2. Trend and corresponding coverage functions.

Reliability growth: Exponential and Weibull (γ < 1)
Reliability decay followed by growth: Log-logistic / S-shaped
Stable reliability: Homogeneous Poisson process

Table 3. Conditional reliability expressions.

Exponential: R(t|s) = exp(−a[e^{−gs} − e^{−g(s+t)}])
Weibull: R(t|s) = exp(−a[e^{−g s^γ} − e^{−g(s+t)^γ}])
S-shaped: R(t|s) = exp(−a[(1 + gs) e^{−gs} − (1 + g(s+t)) e^{−g(s+t)}])
Log-logistic: R(t|s) = exp(−a[(λ(s+t))^κ / (1 + (λ(s+t))^κ) − (λs)^κ / (1 + (λs)^κ)])

13.2 Goodness-of-fit

The ability of a coverage function to reproduce the observed failure behavior of the software, also known as its retrodictive capability [Kanoun et al. 1991], is examined by the goodness-of-fit test. The observed failure data is used to estimate a and the parameters of the four coverage functions. The estimated mean value function is computed and plotted along with the observed mean value function. The error sum of squares is then calculated to evaluate the goodness-of-fit; the lower the error sum of squares, the better the fit.

13.3 Bias

The extent to which a coverage function, based on the observed data up to a given time, correctly predicts the failure behavior in the future is determined by its bias. We focus on the simplest form of prediction, i.e., the use of the data to predict the current reliability of the software under consideration. This is accomplished through the distribution function given in terms of the conditional reliability as follows:

$$F(S_i \mid S_{i-1}) = 1 - R(S_i \mid S_{i-1}) = P(S_i < s_i \mid S_{i-1} = s_{i-1}) \qquad (71)$$

where S_{i−1} is the time of the (i − 1)st failure and S_i is the time of the ith failure. The conditional reliability expressions for the exponential, Weibull, S-shaped, and log-logistic coverage functions are given in Table 3. The objective then is to compare the relative merits of the prediction systems corresponding to the various coverage functions. The raw data available are in the form of times between successive failures, i.e., t_1, t_2, ..., t_i, or equivalently the times of occurrence of the failures, i.e., S_1, S_2, ..., S_i. A prediction system,

which should be able to predict future failure times from the observed past, is comprised of the following steps [Abdel-Ghally et al. 1989]:

• A probabilistic model which specifies the joint distribution of any subset of the S_j's, conditional on an (unknown) parameter.

• A statistical inference procedure for the parameter that uses the available data, which are realizations of the S_j's.

• A prediction procedure combining the above two steps, which allows us to make probabilistic statements about the future failure times.

Thus, one needs a good estimator of F(S_i | S_{i−1}) and the prediction system described above to calculate a predictor F̃(S_i | S_{i−1}). The actual failure time s_i is then observed and transformed using the probability integral transform corresponding to the predicted distribution:

$$u_i = \tilde{F}(S_i \mid S_{i-1}) \qquad (72)$$

Ideally, F̃(S_i | S_{i−1}) is supposed to be identical to the true F(S_i | S_{i−1}); in that case, the u_i would be uniformly distributed [Rosenblatt 1952]. Thus, we develop a set of independent u_i's to be tested. Intuitively, if the coverage function is optimistically biased, then the estimated next time to failure is higher than what is actually observed, while if the coverage function is pessimistically biased, the estimated next time to failure is lower than what is actually observed [Nikora and Lyu 1995]. Various techniques exist to analyze the set of independent u_i's that are generated. We analyze the quality of this sequence using the u-plot [Abdel-Ghally et al. 1989]. This is equivalent to comparing the empirical cumulative distribution function (cdf) of the u_i's with the cdf of the uniform distribution, which is a line of unit slope through the origin. The "distance" between the empirical cdf and the uniform cdf is summarized using the Kolmogorov distance, which is the maximum absolute vertical difference.

13.4 Bias trend

The u-plot detects consistent bias of the predictions from reality. However, since the temporal ordering of the u_i's is not taken into consideration while computing the bias, some kinds of departures average out and cannot be detected. It is thus necessary to examine the u_i's for trend. One of the techniques to examine the trend in the bias, known as the y-plot, is described here [Abdel-Ghally et al. 1989]. This technique gives us a plot which has an interpretation similar to that of the u-plot. As discussed in the previous section, the sequence of u_i's in an ideal situation should look like a sequence of independent identically distributed random variables on (0, 1). A sequence of u_i's which is on the constant range (0, 1) looks very regular, and any trend is difficult to detect. The transformation x_i = −log(1 − u_i) produces a sequence of numbers which should look like realizations of independent identically distributed unit exponential random variables. Thus this sequence should look like the realizations of successive interfailure times of a homogeneous Poisson process. Any trend in the u_i's will manifest as a nonconstant rate for this process. The transformed sequence is then normalized onto (0, 1). Thus for a sequence of predictions from stage s to i, we have:

$$y_k = \frac{\sum_{j=s}^{k} x_j}{\sum_{j=s}^{i} x_j}, \qquad k = s, \ldots, i - 1 \qquad (73)$$

A step function with steps of size 1/(i − s + 1) at the points y_s, y_{s+1}, ..., y_{i−1} is drawn from the left on the interval (0, 1), as in the case of the u-plot. The distance is again summarized by the Kolmogorov distance; a large Kolmogorov distance indicates nonstationarity in the prediction errors.

14 Software failure data analyses

The validation techniques described in the previous section were used to analyze five failure data sets. It is very important to note that the two crucial parameters p_0 and K are assumed to be equal to zero and unity, respectively, in the following analysis, where p_0 is the probability that some fault-sites will never be covered (and hence the faults present at those sites will never be detected), and K is the probability of detecting a fault when the potential fault-site is covered by a test case. The current techniques, to the best of our knowledge, cannot estimate these two parameters based on failure data. Coverage measurement using tools like ATAC [Horgan et al. 1994] or Hindsight is necessary during the functional testing of the software for the estimation of p_0. Alternatively stated, a thorough knowledge of the software and the test cases is necessary to estimate p_0 and K. In the absence of this information, the faults that we estimate are only those that are present at the potential fault-sites that can be covered, and among those, only the ones that can be detected.

Data Set 1, referred to hereafter as DS1, contains 36 months of defect discovery times for a release of Controller Software consisting of about 500,000 lines of code installed on over 100,000 controllers. The defects are those that were present in the code of a particular release of the software, and were discovered as a result of failures reported by the users of that release, or possibly of the follow-on release of the product [Kenney 1993]. Data Set 2, referred to hereafter as DS2, was collected from a single-user workstation at the Center for Software Reliability and represents 129 failures that are known to be due to software faults. It is a subset of another data set collected at the Center for Software Reliability, which consists of 397 user-perceived failures and includes genuine software failures, plus failures caused by usability problems, inadequate documentation, and so on [Brocklehurst and Littlewood 1992]. Data Set 3, referred to as DS3, is also from the Center for Software Reliability and consists of 104 interfailure times. Data Set 4, referred to as DS4, is collected from a desktop workstation-based Flight Dynamics System developed at the Charles Stark Draper Labs. The Flight Dynamics System includes about 190,000 lines of FORTRAN code and operates under the VAX/VMS operating system [Cefola et al. 1994]. The data set is in failure count form, and it is converted to time-between-failures form using CASRE [Lyu and Nikora 1992]. The data set comprises 111 failures. Data Set 5, referred to as DS5, is collected from Project A developed at the Charles Stark Draper Labs. The data set consists of 161 failures.

Figure 7 shows the results of the trend tests, goodness-of-fit, bias, and bias trend for DS1. In addition, it also shows the coverage and failure intensity during testing estimated from the failure data, and the conditional reliability beyond the time of the last failure. The trend tests indicate that the reliability initially decays and then grows, and, consistent with this trend, the S-shaped coverage function has the best retrodictive and predictive capability, as seen in the goodness-of-fit and u-plot. The S-shaped coverage function also exhibits the least non-stationarity in the prediction errors, as seen in the y-plot. The estimated coverage at the time of the last failure is 0.99, the failure intensity at the time of the last failure is 0.0052, and the residual number of faults is 0.93. The low failure intensity at the time of the last failure is consistent with the fact that the expected number of faults remaining is also very low, and thus the conditional reliability beyond the time of the last failure decays slowly and goes to zero after about 600 time units beyond the last failure.

Figure 8 shows the results of the trend tests, goodness-of-fit, bias, and bias trend for DS2, along with the coverage and failure intensity estimated from the failure data during testing, and the conditional reliability beyond the time of the last failure. Reliability growth is indicated by the trend tests, which is confirmed by the fact that the Weibull coverage function with γ = 0.55 (γ < 1) has the best retrodictive and predictive capability, as seen in the goodness-of-fit and u-plot respectively. The same coverage function has the least non-stationarity in the prediction errors, as seen in the y-plot. The estimated coverage at the time of the last failure is 0.65, the expected number of faults remaining at the time of the last failure is 69.0, and the failure intensity is 0.00045. Moderate coverage is achieved during testing; as a result, the expected number of faults remaining is quite high. The conditional reliability decreases to zero at about 6000 time units beyond the last failure.

Figure 9 shows the results of the trend tests, goodness-of-fit, bias, and bias trend for DS3. In addition, it also shows the coverage and failure intensity during testing estimated from the failure data, and the conditional reliability beyond the time of the last failure. The trend tests indicate that reliability grows, which is

Figure 7. Validation results for DS1 (panels: arithmetic mean test, Laplace test, expected number of faults during testing and estimated coverage during testing for the field data and the fitted exponential, S-shaped, and Weibull coverage functions, u-plot, y-plot, estimated failure intensity during testing, and conditional reliability beyond the last failure).

Figure 8. Validation results for DS2 (panels: arithmetic mean test, Laplace test, expected number of faults during testing and estimated coverage during testing for the field data and the fitted log-logistic, S-shaped, and Weibull coverage functions, u-plot, y-plot, failure intensity during testing, and conditional reliability beyond the last failure).

Figure 9 shows the results of the trend tests, goodness-of-fit, bias, and bias trend for DS3. In addition, it also shows the coverage and failure intensity during testing estimated from the failure data, and the conditional reliability beyond the time of the last failure. The trend tests indicate that reliability grows, which is confirmed by the fact that the Weibull coverage function with shape parameter 0.9 (less than 1), which has a decreasing failure occurrence rate per fault, has the best retrodictive and predictive capabilities as well as the least nonstationarity in the prediction errors, as shown in the goodness-of-fit results, the u-plot, and the y-plot, respectively. The estimated coverage at the time of the last failure is 0.83, the failure intensity is 0.22, and the expected number of faults remaining is 21.2. Although the coverage achieved is not very low, the absolute value of the failure intensity at the time of the last failure is quite high; as a result, the conditional reliability beyond the time of the last failure decays rapidly and decreases to zero about 30 time units after the last failure.

Figure 10 shows the results of the trend tests, goodness-of-fit, bias, and bias trend for DS4, along with the coverage and failure intensity during testing estimated from the failure data, and the conditional reliability beyond the time of the last failure. The trend tests indicate that the reliability grows, which is captured by the exponential coverage function; it also has the best retrodictive and predictive capabilities, as shown in the goodness-of-fit results and the u-plot, respectively. The Weibull coverage function with shape parameter 1.05 exhibits the least nonstationarity in the prediction errors, followed by the exponential coverage function, as indicated by the y-plot; the difference in the Kolmogorov distance for bias trend between the Weibull and exponential coverage functions is 0.01. The estimated coverage at the time of the last failure is 0.5, the failure intensity is 0.098, and the expected number of faults remaining is 98.2. Low coverage is achieved during testing, as a result of which a significant number of faults remain in the software, and the conditional reliability beyond the time of the last failure decays quite rapidly.

Figure 11 shows the results of the trend tests, goodness-of-fit, bias, and bias trend for DS5, along with the coverage and failure intensity during testing estimated from the failure data, and the conditional reliability beyond the time of the last failure. The trend tests indicate reliability decay followed by reliability growth, which suggests the use of either the S-shaped or the log-logistic coverage function. The log-logistic coverage function provides the best retrodictive and predictive capability, while the S-shaped coverage function exhibits the least nonstationarity in the predictions. The estimated coverage at the time of the last failure is 0.96, and the residual number of faults is 5.8.

The trends exhibited by the data sets, the results of the Kolmogorov-Smirnov goodness-of-fit test, and the minimum Kolmogorov distances for bias and bias trend are summarized in Tables 4, 5, 6, and 7, respectively.
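For readers who wish to reproduce the per-data-set quantities quoted above (coverage at the last failure, failure intensity, residual faults, and conditional reliability), the sketch below shows how they follow from a fitted coverage function, taking the ENHPP mean value function to be of the form m(t) = a*c(t) with p_0 and K set to unity as in this analysis, where a is the expected number of faults. The Weibull coverage form, the parameter values, and the Python/NumPy code are illustrative assumptions, not the fitted values behind Figures 7 through 11.

```python
import numpy as np

def weibull_coverage(t, g, gamma):
    """Illustrative Weibull coverage function c(t) = 1 - exp(-g * t**gamma)."""
    return 1.0 - np.exp(-g * np.power(t, gamma))

def weibull_coverage_rate(t, g, gamma):
    """Derivative c'(t) of the illustrative Weibull coverage function."""
    return g * gamma * np.power(t, gamma - 1.0) * np.exp(-g * np.power(t, gamma))

def enhpp_summary(a, g, gamma, t_last, horizon):
    """Assuming m(t) = a * c(t), report coverage, failure intensity, and residual
    faults at the last failure, plus conditional reliability beyond it."""
    cov = weibull_coverage(t_last, g, gamma)
    intensity = a * weibull_coverage_rate(t_last, g, gamma)  # lambda(t) = m'(t)
    residual = a * (1.0 - cov)                               # expected remaining faults
    x = np.linspace(0.0, horizon, 5)
    # Conditional reliability: R(x | t_last) = exp(-[m(t_last + x) - m(t_last)])
    rel = np.exp(-a * (weibull_coverage(t_last + x, g, gamma) - cov))
    return cov, intensity, residual, list(zip(x, rel))

# Hypothetical parameters, chosen only to exercise the code.
print(enhpp_summary(a=120.0, g=0.05, gamma=0.9, t_last=500.0, horizon=600.0))
```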

Figure 9. Validation results for DS3 (panels: arithmetic mean test, Laplace test, expected number of faults during testing and estimated coverage during testing for the field data and the fitted log-logistic, exponential, and Weibull coverage functions, u-plot, y-plot, estimated failure intensity during testing, and conditional reliability beyond the last failure).

Figure 10. Validation results for DS4 (panels: arithmetic mean test, Laplace test, expected number of faults during testing and estimated coverage during testing for the field data and the fitted exponential, S-shaped, and Weibull coverage functions, u-plot, y-plot, estimated failure intensity during testing, and conditional reliability beyond the last failure).

Figure 11. Validation results for DS5 (panels: arithmetic mean test, Laplace test, expected number of faults during testing and estimated coverage during testing for the field data and the fitted log-logistic and S-shaped coverage functions, with SSE (S-shaped) = 3.5126e+04 and SSE (log-logistic) = 9.6939e+03 reported for the expected-number-of-faults fit, u-plot, y-plot, failure intensity during testing, and conditional reliability beyond release).

Table 4. Trends in the data.

Data Set   Trend
DS1        Reliability decay followed by growth
DS2        Reliability growth
DS3        Reliability growth
DS4        Reliability growth
DS5        Reliability decay followed by growth

Table 5. Goodness-of-fit - Kolmogorov-Smirnov test (entries listed for the data sets, DS1 through DS5, to which each coverage function was fitted).

Exponential: 7586.5, 6661.9, 1670.3, 62635
S-shaped: 1076.3, 49124.0, 22010, 2897, 35126
Log-logistic: 2746.5, 10665, 9129, 22440, 9693.9
Weibull (0.55): 9510.5
Weibull (0.60): 8284
Weibull (0.65): 7443
Weibull (0.80): 12286
Weibull (0.85): 11081
Weibull (0.90): 9892.9, 4419.6
Weibull (0.95): 5469.6, 64656
Weibull (1.05): 7989.7, 1795.1, 60148
Weibull (1.10): 10970, 1783.8, 57883
Weibull (1.15): 1791.3, 55650

Best fit is indicated in boldface.

Table 6. Coverage bias - Kolmogorov distance (entries listed for the data sets, DS1 through DS5, to which each coverage function was fitted).

Exponential: 0.1807, 0.0813, 0.3771
S-shaped: 0.1569, 0.5402, 0.5351, 0.2384, 0.3770
Log-logistic: 0.2582, 0.4621, 0.4558, 0.4673, 0.3770
Weibull (0.55): 0.1876
Weibull (0.60): 0.1794
Weibull (0.65): 0.1909
Weibull (0.80): 0.1702
Weibull (0.85): 0.1788
Weibull (0.90): 0.1827, 0.2033
Weibull (0.95): 0.2815
Weibull (1.05): 0.1094, 0.3771
Weibull (1.10): 0.1112
Weibull (1.15): 0.1175

Smallest Kolmogorov distance is indicated in boldface.

Table 7. Coverage bias trend - Kolmogorov distance (entries listed for the data sets, DS1 through DS5, to which each coverage function was fitted).

Exponential: 0.2352, 0.1465, 0.2645
S-shaped: 0.0926, 0.5237, 0.1319, 0.1587, 0.1672
Log-logistic: 0.1969, 0.2601, 0.1222, 0.1617, 0.1703
Weibull (0.55): 0.1119
Weibull (0.60): 0.1484
Weibull (0.65): 0.1918
Weibull (0.80): 0.3207
Weibull (0.85): 0.3023
Weibull (0.90): 0.2825, 0.2843
Weibull (0.95): 0.3293
Weibull (1.05): 0.1455, 0.2581
Weibull (1.10): 0.1476
Weibull (1.15): 0.1488

Smallest Kolmogorov distance is indicated in boldface.
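The trend classifications in Table 4 are based on the arithmetic mean and Laplace tests whose evolution is plotted in Figures 7 through 11. The sketch below uses the standard textbook forms of these statistics for interfailure-time data; the formulas, function names, and Python code are stated here as an assumption for illustration and may differ in detail from the definitions used earlier in the paper.

```python
import numpy as np

def laplace_factor(theta, k):
    """Laplace factor from the first k interfailure times theta_1..theta_k.
    Markedly negative values suggest reliability growth (decreasing failure
    intensity); markedly positive values suggest reliability decay."""
    T = np.cumsum(np.asarray(theta[:k], dtype=float))   # failure epochs
    return (np.mean(T[:-1]) - T[-1] / 2.0) / (T[-1] * np.sqrt(1.0 / (12.0 * (k - 1))))

def arithmetic_mean(theta, k):
    """Arithmetic mean of the first k interfailure times; a sequence of these
    means that increases with k indicates reliability growth."""
    return float(np.mean(theta[:k]))

# Hypothetical interfailure times that slowly lengthen (reliability growth),
# for which the Laplace factor becomes increasingly negative.
times = np.linspace(1.0, 20.0, 60)
print([round(laplace_factor(times, k), 2) for k in (20, 40, 60)])
```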

15 Conclusions

In this paper, we have presented an enhanced non-homogeneous Poisson process (ENHPP) software reliability model which allows the explicit incorporation of test coverage measurements into black-box, finite failure NHPP models. The test coverage can either be measured during the functional testing of the software product, or estimated from the failure data collected during testing. Furthermore, the model allows for imperfect detection coverage and provides a new decomposition of the mean value function. This decomposition allows us to attribute the nature of the failure intensity function to the failure occurrence rate per fault. We have shown that the previously reported NHPP models are instances of the ENHPP model for different coverage functions. These existing NHPP models and their corresponding coverage functions can capture constant, monotonically increasing, and monotonically decreasing failure occurrence rates per fault. The log-logistic coverage function proposed here can capture an increasing/decreasing failure occurrence rate per fault, and hence may be suitable for application to data sets which exhibit initial reliability decay followed by reliability growth. We have also presented a methodology for early reliability prediction based on the ENHPP framework. Our discussion of the parameterization of the ENHPP model unifies complexity-metrics, failure-data, and test-coverage based approaches to achieving reliable software. Expressions for predictions into the operational phase, optimal software release times, and software availability are developed using the ENHPP model. Finally, we have validated the ENHPP model for four coverage functions against five failure data sets, using goodness-of-fit, bias, and bias trend criteria.

REFERENCES

Abdel-Ghally, A. A., P. Y. Chan, and B. Littlewood (1989), "Evaluation of Competing Software Reliability Predictions," IEEE Trans. on Software Engineering SE-12, 9, 538-546.
Bates, D. M. and J. M. Chambers (1993), Statistical Models in S, J. M. Chambers and T. J. Hastie, Eds., chapter Nonlinear Models, Chapman & Hall, New York, pp. 421-453.
Boehm, B. W. (1986), "A Spiral Model of Software Development and Enhancement," ACM SIGSOFT Software Engineering Notes 11, 4, 14-24.
Brocklehurst, S. and B. Littlewood (1992), "New Ways to Get Accurate Reliability Measures," IEEE Software 9, 4, 34-42.
Brocklehurst, S. and B. Littlewood (1996), Handbook of Software Reliability Engineering, M. R. Lyu, Editor, chapter Techniques for Prediction Analysis and Recalibration, McGraw-Hill, New York, NY, pp. 119-166.
Cefola, P., R. Proulx, R. Metzinger, M. Cohen, and D. Carter (1994), "The RADARSAT Flight Dynamics System: An Extensible, Portable, Workstation-based Mission Support System," In Proc. AIAA/AAS Astrodynamics Conference.
Chen, M., M. R. Lyu, and E. Wong (1996), "An Empirical Study of the Correlation between Code Coverage and Reliability Estimation," In Proc. of METRICS'96, Berlin, Germany, pp. 133-141.
Chen, M., M. R. Lyu, and W. E. Wong (1997), "Incorporating Code Coverage in the Reliability Estimation for Fault-Tolerant Software," In Proc. of Seventeenth Intl. Symposium on Reliable and Distributed Systems, Durham, North Carolina, pp. 45-52.
Chen, M., A. P. Mathur, and V. J. Rego (1994), "A Case Study to Investigate Sensitivity of Reliability Estimates to Errors in Operational Profiles," In Proc. of the Fifth International Symposium on Software Reliability Engineering, Monterey, CA.
Chen, M., A. P. Mathur, and V. J. Rego (1995), "Effect of Testing Techniques on Software Reliability Estimates Obtained Using a Time-Domain Model," IEEE Trans. on Reliability 44, 1, 97-103.
Cox, D. R. and P. A. W. Lewis (1978), The Statistical Analysis of a Series of Events, Chapman and Hall, London.
Dalal, S. R. and C. L. Mallows (1988), "When Should One Stop Testing Software?," Journal of the American Statistical Association 83, 403, 872-879.
Dalal, S. R. and C. L. Mallows (1992), "Buying with Exact Confidence," The Annals of Applied Probability 2, 3, 752-765.
DeMillo, R. (1995), "Keynote Address," In Fourth Bellcore/KPN/Purdue Workshop on Issues in Software Reliability, Leidschendam, The Netherlands.
Ehrlich, W., B. Prasanna, J. Stampfel, and J. Wu (1993), "Determining the Cost of a Stop-Test Decision," IEEE Software 10, 2, 33-42.
Farr, W. (1996), Handbook of Software Reliability Engineering, M. R. Lyu, Editor, chapter Software Reliability Modeling Survey, McGraw-Hill, New York, NY, pp. 71-117.
Frate, F. D., P. Garg, A. Mathur, and A. Pasquini (1995), "On the Correlation Between Code Coverage and Software Reliability," In Proc. Sixth Intl. Symposium on Software Reliability Engineering, Toulouse, France, pp. 124-132.
Gaudoin, O. (1992), "Optimal Properties of the Laplace Trend Test for Software-Reliability Models," IEEE Trans. on Reliability 41, 4, 525-532.
Goel, A. L. (1985), "Software Reliability Models: Assumptions, Limitations and Applicability," IEEE Trans. on Software Engineering SE-11, 12, 1411-1423.
Goel, A. L. and K. Okumoto (1979), "Time-Dependent Error-Detection Rate Models for Software Reliability and Other Performance Measures," IEEE Trans. on Reliability R-28, 3, 206-211.
Gokhale, S., P. N. Marinos, and K. S. Trivedi (1996), "Important Milestones in Software Reliability Modeling," In Proc. 8th Intl. Conference on Software Engineering and Knowledge Engineering (SEKE '96), Lake Tahoe, pp. 345-352.
Gray, J. (1986), "Why Do Computers Stop and What Can Be Done About It?," In Proc. Fifth Symposium on Reliability in Distributed Software and Database Systems, pp. 3-12.

Horgan, J. R. and S. London (1991), "Data Flow Coverage and the C Language," In Proc. of the Symposium on Testing, Analysis, and Verification, Victoria, British Columbia, pp. 87-97.
Horgan, J. R. and S. London (1992), "ATAC: A Data Flow Coverage Testing Tool for C," In Proc. of Second Symposium on Assessment of Quality Software Development Tools, New Orleans, Louisiana, pp. 2-10.
Horgan, J. R., S. A. London, and M. R. Lyu (1994), "Achieving Software Quality with Testing Coverage Measures," IEEE Computer 27, 9, 60-69.
Horgan, J. R. and A. P. Mathur (1992), "Assessing Testing Tools in Research and Education," IEEE Software 9, 3, 61-69.
Horgan, J. R. and A. P. Mathur (1996), Handbook of Software Reliability Engineering, M. R. Lyu, Editor, chapter Software Testing and Reliability, McGraw-Hill, New York, NY, pp. 531-566.
Horgan, J. R., A. P. Mathur, A. Pasquini, and V. J. Rego (1995), "Perils of Software Reliability Modeling," Technical Report SERC-TR-160-P, Dept. of Computer Sciences, Purdue University, West Lafayette, IN.
Hossain, S. A. and R. C. Dahiya (1993), "Estimating the Parameters of a Non-Homogeneous Poisson-Process Model for Software Reliability," IEEE Trans. on Reliability 42, 4, 604-612.
Hou, R., S. Kuo, and Y. Chang (1997), "Optimal Release Times for Software Systems with Scheduled Delivery Time Based on the HGDM," IEEE Trans. on Software Engineering 46, 2, 216-221.
Howden, W. E. (1980), "Functional Program Testing," IEEE Trans. on Software Engineering SE-6, 2, 162-169.
Howden, W. E. (1985), "The Theory and Practice of Functional Testing," IEEE Software 2, 5, 6-17.
Jacoby, R. and K. Masuzawa (1992), "Test Coverage Dependent Software Reliability Estimation by the HGD Model," In Proc. Third Intl. Symposium on Software Reliability Engineering, Texas.
Joe, H. (1989), "Statistical Inference for General-Order-Statistics and Nonhomogeneous-Poisson-Process Software Reliability Models," IEEE Trans. on Software Engineering 15, 11, 1485-1490.
Kanoun, K., M. R. de Bastos Martini, and J. M. de Souza (1991), "A Method for Software Reliability Analysis and Prediction: Application to the TROPICO-R Switching System," IEEE Trans. on Software Engineering 17, 4, 334-344.
Kanoun, K., M. Kaâniche, and J. C. Laprie (1997), "Qualitative and Quantitative Reliability Assessment," IEEE Software 14, 2, 77-86.
Kanoun, K. and J. C. Laprie (1994), "Software Reliability Trend Analysis from Theoretical to Practical Considerations," IEEE Trans. on Software Engineering 20, 9, 740-747.
Kanoun, K. and J. C. Laprie (1996), Handbook of Software Reliability Engineering, M. R. Lyu, Editor, chapter Trend Analysis, McGraw-Hill, New York, NY, pp. 401-437.

Kenney, G. Q. (1993), "Estimating Defects in Commercial Software During Operational Use," IEEE Trans. on Reliability 42, 1, 107-115.
Knafl, G. and J. Morgan (1996), "Solving ML Equations for 2-Parameter Poisson-Process Models for Ungrouped Software-Failure Data," IEEE Trans. on Reliability 45, 1, 43-53.
Leemis, L. M. (1995), Reliability - Probabilistic Models and Statistical Methods, Prentice-Hall, Englewood Cliffs, New Jersey.
Lyu, M. R. (1996), Handbook of Software Reliability Engineering, McGraw-Hill, New York.
Lyu, M. R., J. R. Horgan, and S. London (1994), "A Coverage Analysis Tool for the Effectiveness of Software Testing," IEEE Trans. on Reliability 43, 4, 527-534.
Lyu, M. R. and A. P. Nikora (1992), "CASRE - A Computer-Aided Software Reliability Estimation Tool," In CASE '92 Proceedings, Montreal, Canada, pp. 264-275.
Malaiya, Y. K. (1994), "The Relationship Between Test Coverage and Reliability," Technical Report CS-94-110, Dept. of Computer Science, Colorado State University, Colorado.
Musa, J. D. (1993), "Operational Profiles in Software-Reliability Engineering," IEEE Software 10, 2, 14-32.
Musa, J. D., G. Fuoco, N. Irving, D. Kropfl, and B. Juhlin (1996), Handbook of Software Reliability Engineering, M. R. Lyu, Editor, chapter The Operational Profile, McGraw-Hill, New York, NY, pp. 167-215.
Nikora, A. P. and M. R. Lyu (1995), "An Experiment in Determining Software Reliability Model Applicability," In Proc. Sixth Intl. Symposium on Software Reliability Engineering, Toulouse, France, pp. 305-313.
Ohba, M. (1984), "Software Reliability Analysis Models," IBM Journal Res. Develop. 28, 4, 428-442.
Pasquini, A., A. N. Crespo, and P. Matrella (1996), "Sensitivity of Reliability-Growth Models to Operational Profile Errors vs. Testing Accuracy," IEEE Trans. on Reliability 45, 4, 531-540.
Piwowarski, P., M. Ohba, and J. Caruso (1993), "Coverage Measurement Experience During Function Testing," In Proc. 15th Intl. Conf. on Software Engineering, Baltimore, MD, pp. 287-300.
Ramamoorthy, C. V. and F. B. Bastani (1982), "Software Reliability: Status and Perspectives," IEEE Trans. on Software Engineering SE-8, 4, 354-371.
Ramsey, J. and V. R. Basili (1985), "Analyzing the Test Process Using Structural Coverage," In Proc. 8th Intl. Conf. on Software Engineering, London, UK, pp. 306-312.
Rapps, S. and E. J. Weyuker (1985), "Selecting Software Test Data Using Data Flow Information," IEEE Trans. on Software Engineering SE-11, 4, 367-375.
Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," Ann. Math. Statist. 23, 470-472.
Ross, S. M. (1985), "Software Reliability: The Stopping Rule Problem," IEEE Trans. on Software Engineering SE-11, 12, 1472-1476.
Sahner, R. A., K. S. Trivedi, and A. Puliafito (1996), Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package, Kluwer Academic Publishers, Boston.
Shooman, M. L. (1984), "Software Reliability: A Historical Perspective," IEEE Trans. on Reliability R-33, 1, 48-54.
Trivedi, K. S. (1982), Probability and Statistics with Reliability, Queuing and Computer Science Applications, Prentice-Hall, Englewood Cliffs, New Jersey.
Vouk, M. A. (1993), "Using Reliability Models during Testing with Non-Operational Profile," In Proc. of the Second Bellcore/Purdue Symposium on Issues in Software Reliability Estimation, pp. 103-110.
Wong, W. E., J. R. Horgan, S. London, and A. P. Mathur (1994), "Effect of Test Set Size and Block Coverage on Fault Detection Effectiveness," In Proc. of 5th Intl. Symposium on Software Reliability Engineering, Monterey, CA, pp. 230-238.
Wong, W. E., J. R. Horgan, S. London, and A. P. Mathur (1995), "Effect of Test Set Minimization on Fault Detection Effectiveness," In Proc. of the 17th IEEE Intl. Conference on Software Engineering, Seattle, WA, pp. 41-50.
Wood, A. (1996), "Predicting Software Reliability," IEEE Computer 29, 11, 69-77.
Xie, M., G. Y. Hong, and C. Wohlin (1997), "A Practical Method for the Estimation of Software Reliability Growth in the Early Stage of Testing," In Proc. of Eighth Intl. Symposium on Software Reliability Engineering, Albuquerque, NM, pp. 116-123.
Yamada, S., M. Ohba, and S. Osaki (1983), "S-Shaped Reliability Growth Modeling for Software Error Detection," IEEE Trans. on Reliability R-32, 5, 475-485.
Yamada, S. and S. Osaki (1985), "Software Reliability Growth Modeling: Models and Applications," IEEE Trans. on Software Engineering SE-11, 12, 1431-1437.
