Confidence Interval Estimation of NHPP-Based Software Reliability Models

Liang Yin and Kishor S. Trivedi
Center for Advanced Computing and Communications
Department of Electrical and Computer Engineering
Duke University, Durham, NC 27708-0291, USA
E-mail: liang, [email protected]
Phone: (919) 660-5269   Fax: (919) 660-5293

Abstract

Software reliability growth models (such as NHPP models) are frequently used in software reliability prediction. The parameters of these models are often estimated by point estimation. However, numerical problems can arise that make the actual computation hard, especially for automated reliability prediction tools. Here, confidence interval computation for the Goel-Okumoto model and the S-shaped model is studied, and upper and lower bounds on the parameters are obtained. For reliability prediction, we implement a simplified Bayesian approach, which delivers improved results, and the bounds on the predicted reliability are also computed. Furthermore, this approach removes the numerical problems encountered in earlier point estimation methods. Our results can thus be used as an important part of the assessment of software quality.

1 Introduction

Computing technology and the complexity of software systems have grown rapidly, and the increasing pace of change makes the assurance of software quality a critical concern. Software reliability measurement and prediction during the development process are essential for producing high-quality, reliable software, and quantitative measures are required to assess software reliability. Various approaches have been proposed for reliability measurement and prediction. Nonhomogeneous Poisson process (NHPP) models, a class of software reliability growth models [10], are used extensively; the Goel-Okumoto model [2] and the S-shaped model [7] are among them. However, several issues remain unsolved. First, the parameters are often estimated by point estimation methods, such as MLE, and confidence intervals are either ignored or obtained by simple approximations [6] [10]. Second, the reliability prediction also lacks confidence interval analysis. Thus, the accuracy of these models is not guaranteed. To make software reliability assessment more realistic and trustworthy, this paper studies these issues and presents a Bayesian approach.

The rest of the paper is organized as follows. Section 2 presents background on NHPP models and the existing problems. In Section 3, we present our approach to parameter estimation and reliability prediction with confidence intervals. In Section 4, the method is illustrated with a numerical example. Section 5 concludes the paper.

This research was supported in part by a Motorola fellowship, and by the National Science Foundation and Bellcore under a core project to CACC.

2 Background and motivation

2.1 NHPP model

A software reliability growth model (SRGM) represents the relationship between the time spent in software testing and the number of detected errors as a process of growth in software reliability. This relationship can be described by a counting process, and SRGMs based on the nonhomogeneous Poisson process (NHPP) are the most commonly used. Here we discuss two popular NHPP models. Both assume that the initial number of faults in the software product under consideration is a random variable with mean a.

Goel-Okumoto NHPP Model [2]
The Goel-Okumoto model has had a strong influence on software reliability modeling. Its mean value function m(t) is

$$ m(t) = a\left(1 - e^{-bt}\right) \tag{1} $$

and the failure intensity function λ(t), which is the derivative of m(t), is

$$ \lambda(t) = a b\, e^{-bt} \tag{2} $$

where b is the failure occurrence rate per fault.

Delayed S-shaped NHPP Model [7][9]
The S-shaped NHPP model is designed to capture the software error removal phenomenon. In this case two phases of the testing process can be seen: fault detection and fault isolation. There is a time delay between the actual detection of a fault and its reporting. The mean value function is

$$ m(t) = a\left[1 - (1 + bt)\,e^{-bt}\right] \tag{3} $$

and the failure intensity function λ(t), again the derivative of m(t), is

$$ \lambda(t) = a b^2 t\, e^{-bt} \tag{4} $$

2.2 Motivation

We have seen that the models involve unknown parameters (a and b). These parameters must be estimated to complete the reliability assessment, and software failure data are assumed to provide the information for this estimation. Much of the inference work has been carried out using maximum likelihood estimation (MLE). However, as we will see in the next section, the properties of the numerical solution have not been thoroughly investigated. In the early stage of the testing phase, the MLE may not converge to a reasonable value [4] [5]; Hossain and Dahiya [4] proved that under certain conditions the MLE for the Goel-Okumoto model does not exist. Another problem arises in obtaining the confidence bounds. For example, by exploiting the nature of the Poisson distribution and assuming that the number of faults discovered, n, is very large, the confidence bounds for the MLE of the mean value function are approximately [10]:

$$ m(t) = \hat{m}(t) \pm K_{\alpha}\sqrt{\hat{m}(t)} \tag{5} $$

where K_α is the 100(1 + α)/2 percentile of the standard normal distribution. However, the large-sample assumption does not always hold. We therefore propose a Bayesian approach to estimate the parameters and to obtain confidence bounds for the parameters and, even more importantly, for the reliability prediction.

3 Parameter estimation and prediction

The models we introduced involve unknown parameters, which need to be estimated from the observed software failure data. Here we assume that the data are in the form of interfailure times; the treatment of the other form of data, grouped data, is similar. We study the Goel-Okumoto model first and the S-shaped model next. Let {T_k; k = 1, 2, ...} denote the sequence of times between successive software failures. Then t_k denotes the observed time between the (k-1)st and kth failures. Let S_k denote the time to failure k. Thus S_k is given by:

$$ S_k = \sum_{i=1}^{k} T_i \tag{6} $$

The joint density, or likelihood function, of S_1, S_2, ..., S_n can be written as [8]:

$$ f_{S_1, S_2, \ldots, S_n}(s_1, s_2, \ldots, s_n) = e^{-m(s_n)} \prod_{i=1}^{n} \lambda(s_i) \tag{7} $$

where the software failure times s = (s_1, s_2, ..., s_n) are realizations of the random variables S_1, S_2, ..., S_n.
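The definitions in Equations (1) to (4), (6) and (7) translate directly into code. The following Python sketch is our own illustration, not the authors' implementation; the function names (`m_go`, `lam_ss`, `failure_times`, `nhpp_loglik`) are hypothetical. It builds the failure times from interfailure times and evaluates the log of the likelihood (7) for either model.

```python
import math
from itertools import accumulate


# Goel-Okumoto model: Equations (1) and (2)
def m_go(t, a, b):
    return a * (1.0 - math.exp(-b * t))


def lam_go(t, a, b):
    return a * b * math.exp(-b * t)


# Delayed S-shaped model: Equations (3) and (4)
def m_ss(t, a, b):
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def lam_ss(t, a, b):
    return a * b * b * t * math.exp(-b * t)


def failure_times(interfailure):
    """Equation (6): S_k = T_1 + ... + T_k for k = 1..n."""
    return list(accumulate(interfailure))


def nhpp_loglik(m, lam, s, a, b):
    """Log of Equation (7): -m(s_n) + sum_i log lambda(s_i), with s sorted ascending."""
    return -m(s[-1], a, b) + sum(math.log(lam(si, a, b)) for si in s)


# Example with made-up interfailure times (not the Draper Lab data)
s = failure_times([3.0, 2.0, 5.0])            # [3.0, 5.0, 10.0]
print(nhpp_loglik(m_go, lam_go, s, 20.0, 0.3))
print(nhpp_loglik(m_ss, lam_ss, s, 20.0, 0.3))
```

Passing m and λ as functions keeps one likelihood routine for both models, which mirrors the way (7) is stated independently of the model.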

3.1 Goel-Okumoto model

For the Goel-Okumoto model, the log-likelihood function is

$$ L(s \mid a, b) = n\log a + n\log b - a\left(1 - e^{-b s_n}\right) - b\sum_{i=1}^{n} s_i \tag{8} $$
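As a check on Equation (8), the closed form can be compared with a term-by-term evaluation of the log of (7) using λ(t) = ab e^{-bt}. This is a minimal Python sketch with hypothetical function names. It uses `math.expm1` so that a(1 - e^{-b s_n}) stays accurate when b·s_n is small, the region whose behavior is examined below.

```python
import math


def go_loglik(a, b, s):
    """Equation (8) for ordered failure times s = [s_1, ..., s_n]."""
    n, s_n = len(s), s[-1]
    # -a(1 - e^{-b s_n}) equals a * expm1(-b s_n); expm1 avoids cancellation for small b*s_n
    return n * math.log(a) + n * math.log(b) + a * math.expm1(-b * s_n) - b * sum(s)


def go_loglik_direct(a, b, s):
    """Same value computed from Equation (7): -m(s_n) + sum_i log(a b e^{-b s_i})."""
    m_sn = a * (1.0 - math.exp(-b * s[-1]))
    return -m_sn + sum(math.log(a * b * math.exp(-b * si)) for si in s)


s = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0, 89.0]   # illustrative failure times
print(go_loglik(12.0, 0.04, s), go_loglik_direct(12.0, 0.04, s))
```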

To obtain the MLE, we maximize Equation (8) with respect to a and b; setting the partial derivatives to zero gives

$$ \frac{n}{a} = 1 - e^{-b s_n} \tag{9} $$

and

$$ \frac{n}{b} = \sum_{i=1}^{n} s_i + a s_n e^{-b s_n} \tag{10} $$

which are solved numerically to give the estimates of a and b.

The necessary and sufficient condition for Equations (9) and (10) to have finite positive roots is [4]:

$$ s_n > \frac{2}{n-2}\sum_{i=1}^{n-1} s_i \tag{11} $$
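Equations (9) to (11) suggest a simple solver. Writing x = b·s_n and substituting (9) into (10) leaves one equation, n·s_n·φ(x) = Σ s_i, where φ(x) = 1/x - 1/(e^x - 1) decreases from 1/2 towards 0. The left side tends to n·s_n/2 as x → 0, and n·s_n/2 > Σ_{i=1}^{n} s_i is an equivalent form of condition (11). The Python sketch below is our own construction (bisection and all names are assumptions, not taken from [4] or [5]): it checks (11) first and then bisects for x. It also evaluates the profile log-likelihood, i.e. (8) with a eliminated through (9), whose small-b behavior is examined next.

```python
import math


def go_mle_exists(s):
    """Condition (11): s_n > 2/(n-2) * sum_{i=1}^{n-1} s_i, which needs n > 2."""
    n, s_n = len(s), s[-1]
    return n > 2 and s_n > 2.0 * sum(s[:-1]) / (n - 2)


def _phi(x):
    """1/x - 1/(e^x - 1): decreases from 1/2 (as x -> 0) towards 0."""
    if x < 1e-6:
        return 0.5 - x / 12.0          # series expansion avoids cancellation near 0
    return 1.0 / x - 1.0 / math.expm1(x)


def go_mle(s, iters=200):
    """Solve Equations (9)-(10); return (a_hat, b_hat), or None if (11) fails."""
    if not go_mle_exists(s):
        return None
    n, s_n, total = len(s), s[-1], sum(s)

    def g(x):
        # (10) after eliminating a with (9), written in x = b * s_n
        return n * s_n * _phi(x) - total

    # g(0) = n*s_n/2 - total > 0 under (11); g(n) < s_n - total <= 0
    lo, hi = 0.0, float(n)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return n / -math.expm1(-x), x / s_n   # a from (9): a = n / (1 - e^{-b s_n})


def go_profile_loglik(b, s):
    """Equation (8) with a replaced by its maximizer from Equation (9)."""
    n, s_n = len(s), s[-1]
    return (n * math.log(n) - n * math.log(-math.expm1(-b * s_n))
            + n * math.log(b) - n - b * sum(s))


s = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0, 89.0]   # illustrative failure times
print(go_mle(s))                               # condition (11) holds for these times
print(go_mle([10.0, 20.0, 30.0, 40.0, 50.0]))  # None: condition (11) fails
```

Checking (11) before iterating lets an automated tool report "no finite MLE" instead of letting a general-purpose root finder drift towards a → ∞.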

This means that the MLE is not always guaranteed to exist. Another problem arises when the behavior of the log-likelihood function (8) is studied. An example with a data set from Charles Stark Draper Lab, which has 97 data points (interfailure times), is shown in Figure 1. As b goes to zero, the value of a that maximizes the log-likelihood function goes to infinity according to Equation (9), and the maximum of the log-likelihood function over a, for the given b, decreases very slowly. Substituting Equation (9) into (8) and taking the limit as b approaches zero, we have:

$$
\begin{aligned}
L(s \mid a, b) &= n\left(\log n - \log\left(1 - e^{-b s_n}\right)\right) + n\log b - n - b\sum_{i=1}^{n} s_i \\
&= n\log n - n - b\sum_{i=1}^{n} s_i + n\log\frac{b}{1 - e^{-b s_n}} \\
&= n\log n - n - b\sum_{i=1}^{n} s_i + n\log\frac{b}{1 - \left(1 - b s_n + O(b^2)\right)} \\
&= n\log n - n - b\sum_{i=1}^{n} s_i - n\log\left(s_n + O(b)\right) \\
&\to n\log n - n - n\log s_n > -\infty \qquad (b \to 0)
\end{aligned}
\tag{12}
$$

Note that here L = log p(s|a,b). This shows that the likelihood function p(s|a,b), which has the form of f_{S_1,...,S_n} in Equation (7), does not decrease to zero along this ridge (b → 0, with a given by (9)). This phenomenon sometimes makes the numerical solution of Equations (9) and (10) unstable, especially in automated tools.

Figure 1. Relative likelihood function, Goel-Okumoto model

Figure 2. Contour plot of relative likelihood function, Goel-Okumoto model (horizontal axis a, 50 to 600; vertical axis b, 0 to 2.5 × 10^-3)

To solve the above problems, we propose a Bayesian method that simply assigns a prior probability to the parameters a and b, describing our uncertainty about them. The posterior probability p(a,b|s) can be derived from

$$ p(a, b \mid s) \propto p(a, b)\, p(s \mid a, b) = p(a, b)\, e^{-a\left(1 - e^{-b s_n}\right)} \prod_{i=1}^{n} a b\, e^{-b s_i} = p(a, b)\, e^{-a\left(1 - e^{-b s_n}\right)}\, a^n b^n e^{-b \sum_{i=1}^{n} s_i} \tag{13} $$

where p(a,b) is the prior probability and p(s|a,b) is the likelihood function. We cannot assign a uniform prior to a and b because of the behavior of the likelihood function discussed above. In order to reduce the 'long tail' observed in p(s|a,b), a prior probability that decreases to 0 as a goes to ∞ should be assigned, so that as a → ∞

$$ p(a, b \mid s) \propto p(a, b)\, p(s \mid a, b) \to 0 \cdot \text{constant} = 0 \tag{14} $$

The contour plots of the likelihood function and of the posterior density for the previous example are shown in Figures 2 and 3. In that example, we choose the prior p(a,b) ∝ 1/a. Other choices are possible, as long as the posterior converges, since the prior distribution essentially reflects our knowledge about the uncertainty of the parameters. It would be even better if an early prediction of software quality could be obtained and used as the prior; such an early prediction may come from software complexity metrics through a regression tree model or a fault density model [3].

With the posterior probability of the parameters, the reliability prediction is easily obtained by simulation. The reliability of the software is given by

$$ R(t \mid s_n) = e^{-\left(m(t + s_n) - m(s_n)\right)} \tag{15} $$

where s_n is the time of the last failure and t is the time measured from the last failure. At any time t, the posterior mean of R is

$$ E[R \mid s] = \int_{a,b} e^{-\left(m(t + s_n) - m(s_n)\right)}\, p(a, b \mid s)\, da\, db \tag{16} $$

and the posterior distribution of R, from which its bounds are read, follows by mapping p(a,b|s) through (15). This integration can be done by the Monte Carlo method [1]. The confidence interval can thus be obtained without the normal approximation (5) and the large-sample assumption of Section 2.
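The Monte Carlo step can be sketched as follows. This is a crude stand-in assumed for illustration, not the sampler used in the paper or in [1]: the posterior (13) under the prior p(a,b) ∝ 1/a is evaluated on a rectangular grid, grid cells are drawn with probability proportional to their posterior weight, and each draw is pushed through Equation (15). The grid ranges, the number of draws and all function names are hypothetical, and restricting to a finite grid also truncates the posterior; a production tool would more likely use a Markov chain Monte Carlo sampler.

```python
import math
import random


def linspace(lo, hi, num):
    """Evenly spaced grid points, endpoints included (num >= 2)."""
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def log_posterior_go(a, b, s):
    """Log of Equation (13) under the prior p(a, b) proportional to 1/a, up to a constant."""
    n, s_n = len(s), s[-1]
    loglik = n * math.log(a) + n * math.log(b) + a * math.expm1(-b * s_n) - b * sum(s)
    return loglik - math.log(a)


def reliability_go(t, a, b, s_n):
    """Equation (15) for the Goel-Okumoto mean value function (1)."""
    # m(t + s_n) - m(s_n) = a (e^{-b s_n} - e^{-b (t + s_n)})
    return math.exp(-a * (math.exp(-b * s_n) - math.exp(-b * (t + s_n))))


def posterior_reliability(s, t, a_grid, b_grid, draws=20000, level=0.90, seed=0):
    """Posterior mean of R(t|s_n) and an equal-tailed interval, via grid sampling."""
    s_n = s[-1]
    cells = [(a, b) for a in a_grid for b in b_grid]
    logp = [log_posterior_go(a, b, s) for a, b in cells]
    top = max(logp)
    weights = [math.exp(lp - top) for lp in logp]   # shift by the max to avoid underflow
    rng = random.Random(seed)
    samples = rng.choices(cells, weights=weights, k=draws)
    r = sorted(reliability_go(t, a, b, s_n) for a, b in samples)
    mean = sum(r) / len(r)                          # Monte Carlo estimate of the integral (16)
    lo = r[int((1.0 - level) / 2.0 * (len(r) - 1))]
    hi = r[int((1.0 + level) / 2.0 * (len(r) - 1))]
    return mean, lo, hi


s = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0, 89.0]   # illustrative failure times
a_grid = linspace(5.0, 60.0, 56)     # hypothetical ranges; they must cover the posterior mass
b_grid = linspace(0.002, 0.12, 60)
for t in (5.0, 20.0, 50.0):
    print(t, posterior_reliability(s, t, a_grid, b_grid))
```

Because every draw of (a, b) yields a whole reliability value, the same samples give both the posterior mean and the percentile bounds, with no normal approximation involved.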

Figure 3. Contour plot of posterior density function p(a,b|s) under 1/a prior, Goel-Okumoto model (horizontal axis a, 50 to 600; vertical axis b, 0 to 2.5 × 10^-3)

Figure 4. Relative likelihood function, S-shaped model

3.2 S-shaped model

The treatment of the S-shaped model is similar to that of the Goel-Okumoto model. For the S-shaped model, the log-likelihood function is

$$ L(s \mid a, b) = -a\left(1 - (1 + b s_n)\, e^{-b s_n}\right) + 2n\log b + \sum_{i=1}^{n}\log s_i + n\log a - b\sum_{i=1}^{n} s_i \tag{17} $$

To obtain the MLE, we maximize Equation (17) with respect to a and b, which gives

$$ \frac{n}{a} = 1 - (1 + b s_n)\, e^{-b s_n} \tag{18} $$

and

$$ \frac{2n}{b} = a b s_n^2 e^{-b s_n} + \sum_{i=1}^{n} s_i \tag{19} $$

which are solved numerically to give the estimates of a and b.

The difference between the S-shaped model and the Goel-Okumoto model is that there is no limit condition for the convergence of the numerical computation based on Equations (18) and (19). We also studied the behavior of the log-likelihood function (17). The same example with 97 data points (interfailure times) is shown in Figure 4. We can see that it is well behaved: the 'long tail' does not appear here. We can still apply the Bayesian method to obtain confidence intervals for the parameters and the reliability prediction; the procedure is similar to that for the Goel-Okumoto model.

4 Numerical application

A numerical example is illustrated here. The data set is from Charles Stark Draper Lab. The number of faults discovered in the testing phase is 97, and the input data consist of the 97 interfailure times. Figure 1 shows the relative likelihood function of the Goel-Okumoto model. The MLEs of a and b are â = 150 and b̂ = 0.001. However, the 'long tail', which does not decrease to 0, can be seen in the contour plot in Figure 2. This phenomenon makes it hard to compute the MLE in an automated tool and to obtain the confidence interval under a uniform prior. As seen in Figures 5 and 6, a large portion of the marginal probability is distributed over large a and small b, far from the MLEs of a and b. The prior density p(a,b) ∝ 1/a is suitable in this case. The contour plot of the posterior density function p(a,b|s) is shown in Figure 3; compared with Figure 2, this posterior is well behaved. The marginal probabilities of a and b are plotted in Figures 7 and 8. The 90% confidence intervals for a and b are (122, 378) and (0.00041, 0.0019), respectively. The reliability predictions from the MLE and from the posterior mean are compared in Figure 9, which also shows the 90% confidence interval of the reliability prediction. The data are then fed into the S-shaped model, whose well-behaved relative likelihood function is shown in Figures 4 and 10. The Bayesian approach is the same as for the Goel-Okumoto model. For consistency, we still use the prior p(a,b) ∝ 1/a, as in the Goel-Okumoto model, although the uniform prior works well for this model. The marginal posterior probabilities are plotted in Figures 11 and 12. The confidence intervals are (92, 134) and (0.0043, 0.0061) for a and b, respectively. The reliability prediction and the 90% confidence interval are shown in Figure 13.
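The S-shaped equations (17) to (19) of Section 3.2 can be handled in the same way as the Goel-Okumoto case. In the Python sketch below (names and the bisection scheme are our own assumptions, and the data are made up, not the Draper Lab set), a is eliminated with (18) and the remaining equation (19) is solved for x = b·s_n by bisection. The function returns None if the bracket contains no sign change; this is a safeguard we add for automated use, not a step described in the paper.

```python
import math


def _d(x):
    # 1 - (1 + x) e^{-x}, written with expm1 to limit cancellation for small x
    return -math.expm1(-x) - x * math.exp(-x)


def s_shaped_loglik(a, b, s):
    """Equation (17) for ordered failure times s = [s_1, ..., s_n]."""
    n, s_n = len(s), s[-1]
    return (-a * _d(b * s_n) + 2 * n * math.log(b) + sum(math.log(si) for si in s)
            + n * math.log(a) - b * sum(s))


def s_shaped_mle(s, x_lo=1e-4, iters=200):
    """Solve Equations (18)-(19); return (a_hat, b_hat), or None if no root is bracketed."""
    n, s_n, total = len(s), s[-1], sum(s)

    def g(x):
        b = x / s_n
        a = n / _d(x)                                                   # (18) solved for a
        return 2.0 * n / b - a * b * s_n ** 2 * math.exp(-x) - total    # residual of (19)

    # For x > 2n we have 2n/b < s_n <= sum(s), so g(hi) < 0
    lo, hi = x_lo, 2.0 * n + 1.0
    if g(lo) <= 0.0:
        return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return n / _d(x), x / s_n


s = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0, 89.0]   # illustrative failure times
a_hat, b_hat = s_shaped_mle(s)
print(a_hat, b_hat, s_shaped_loglik(a_hat, b_hat, s))
```

The resulting point estimates could seed or be compared with the posterior summaries; the Bayesian step itself only needs (17) plus the prior, exactly as in the Goel-Okumoto case.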

Figure 5. Marginal posterior probability density function of a under uniform prior assumption, Goel-Okumoto model (horizontal axis a, 0 to 600)

Figure 6. Marginal posterior probability density function of b under uniform prior assumption, Goel-Okumoto model (horizontal axis b, 0 to 2.5 × 10^-3)

Figure 7. Marginal posterior probability density function of a under 1/a prior, Goel-Okumoto model (horizontal axis a, 0 to 600)

Figure 8. Marginal posterior probability density function of b under 1/a prior, Goel-Okumoto model (horizontal axis b, 0 to 2.5 × 10^-3)

Figure 9. Reliability prediction R(t|s_n) with 90% interval, Goel-Okumoto model (curves: by MLE, by posterior mean, upper and lower 90% bounds; horizontal axis t, 0 to 100)

Figure 10. Contour plot of relative likelihood function, S-shaped model (horizontal axis a, 0 to 400; vertical axis b, 0 to 0.01)

Figure 11. Marginal posterior probability density function of a under 1/a prior, S-shaped model (horizontal axis a, 0 to 400)

Figure 12. Marginal posterior probability density function of b under 1/a prior, S-shaped model (horizontal axis b, 0 to 0.01)

Figure 13. Reliability prediction R(t|s_n) with 90% interval, S-shaped model (curves: by MLE, by posterior mean, upper and lower 90% bounds; horizontal axis t, 0 to 100)

5 Conclusion

In this paper we investigated several problems in the estimation of parameters for two NHPP models. One problem concerns the numerical computation of the MLE in the Goel-Okumoto model. A second concerns the confidence intervals of the parameter estimates and of the reliability prediction. With a Bayesian method, we could solve both problems. We can obtain confidence bounds for the reliability prediction at any future time t, which is essential to software reliability assessment.

References

[1] B. P. Carlin and T. A. Louis. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall, 1996.
[2] A. L. Goel and K. Okumoto. Time-dependent error-detection rate model for software reliability and other performance measures. IEEE Trans. on Reliability, R-28(3):206–211, August 1979.
[3] S. Gokhale and M. R. Lyu. Regression tree modeling for the prediction of software quality. In Proc. of ISSAT'97, pages 31–36, Anaheim, CA, March 1997.
[4] S. A. Hossain and R. C. Dahiya. Estimating the parameters of a non-homogeneous Poisson-process model for software reliability. IEEE Trans. on Reliability, 42(4):604–612, 1993.
[5] G. Knafl and J. Morgan. Solving ML equations for 2-parameter Poisson-process models for ungrouped software-failure data. IEEE Trans. on Reliability, 45(1):42–53, March 1996.
[6] W. Nelson. Applied Life Data Analysis. John Wiley & Sons, 1992.
[7] M. Ohba. Software reliability analysis models. IBM J. Res. Develop., 28(4):428–442, July 1984.
[8] K. S. Trivedi. Probability and Statistics with Reliability, Queuing and Computer Science Applications. Prentice Hall, 1982.
[9] S. Yamada, M. Ohba, and S. Osaki. S-shaped reliability growth modeling for software error detection. IEEE Trans. on Reliability, R-32(5):475–485, December 1983.
[10] S. Yamada and S. Osaki. Software reliability growth modeling: Models and applications. IEEE Trans. on Software Engineering, SE-11(12):1431–1437, December 1985.
