Software Reliability: Statistical Modeling & Estimation. Tapan Kumar Nayak. George Washington University, Washington DC. Key Words-Decreasing failure rate, ...
IEEE TRANSACTIONS ON RELIABILITY, VOL. R-35, NO. 5, 1986 DECEMBER
Key Words-Decreasing failure rate, Exchangeability, Maximum likelihood, Multivariate Lomax distribution

Reader Aids-
Purpose: Widen state of the art
Special math needed for explanations: Probability, statistics
Special math needed to use results: Same
Results useful to: Reliability theoreticians, Software analysts

Abstract-This paper identifies some essential features of any statistical model for analyzing software failure data. A model for incorporating dependence among the detection times of the errors is proposed and several reliability measures are calculated. The maximum likelihood estimator of the initial number of errors in a software is derived and inherent difficulties in estimating the correlation from standard debugging data are discussed.

1. INTRODUCTION

Let $N$ be the number of errors in a computer program and $T_i$, $i = 1, \ldots, N$, be the random detection times of these errors. In software reliability, we are concerned with estimating $N$ and suitable reliability measures from the failure data obtained in the testing period. Selection of an appropriate statistical model is important for this purpose and has received some attention in the literature.

The most commonly used model for analyzing software failure data, originally introduced by Jelinski & Moranda [5] (henceforth J-M), assumes that: 1) when the software fails, the error causing the failure can be detected and removed without introducing additional errors, i.e., the debugging procedure is perfect, and 2) the interfailure times, $V_i = T_{(i)} - T_{(i-1)}$, $i = 1, \ldots, N$, are s-independently exponentially distributed with parameters $\lambda_i = (N-i+1)\lambda$, where $0 = T_{(0)} \le T_{(1)} \le \cdots \le T_{(N)}$ are the ordered failure times and $\lambda$ is unknown. Spreij [15] and Joe & Reid [7] observed that the likelihood under this model is the same as the likelihood under the assumption that $T_1, \ldots, T_N$ are s-independently exponentially distributed with a common parameter $\lambda$. Estimation of $N$ and $\lambda$ has been discussed in [1, 4, 6].

Littlewood [12] modified the J-M model using a Bayes argument, suggesting that $\lambda$ is a random variable with gamma distribution. Joe & Reid [7] noted that the same likelihood can be obtained by assuming that $T_1, \ldots, T_N$ are s-independently and identically distributed with Pareto type 2 density. This paper pushes the findings of [7] further by arguing that, because of the nature of the uncertainty in the problem, the joint distribution of $V_1, \ldots, V_N$ (or equivalently the joint distribution of $\{T_{(i)}\}$) uniquely determines the joint distribution of $T_1, \ldots, T_N$. Then the difference between the J-M and Littlewood models is in the marginal distributions of the $T_i$.

Both the J-M and Littlewood models share the assumptions of: 1) perfect debugging, 2) s-independence of $T_1, \ldots, T_N$, and 3) the marginal distributions of $T_1, \ldots, T_N$ are identical. Since these assumptions are not appropriate in some applications, other models have been proposed in [3, 9, 10]. Langberg & Singpurwalla [11] unified several commonly used models and found that the J-M model is central to many of them. This paper investigates the possibility of relaxing assumptions 2 and 3 and shows that assumption 3 is a consequence of assumption 1. However, the investigation by Crow & Singpurwalla [2] indicates that the assumption of s-independence is not appropriate in some applications.

Section 2 proposes a model that incorporates s-dependence among $T_1, \ldots, T_N$, discusses its usefulness and properties, and describes the type of s-dependence that can be incorporated in a model that assumes perfect debugging. Several reliability measures are calculated in section 3. Section 4 treats the estimation problem.

Some standard notation and nomenclature are given in "Information for Readers & Authors" at rear of each issue.

2. MODEL SELECTION

Traditionally, most of the statistical models in software reliability are developed by specifying the probability distribution of the interfailure times $V_1, \ldots, V_N$. Denote the detection times of the errors by $T_1, \ldots, T_N$. Since the errors are unknown, labeling of the errors is arbitrary; hence any joint distribution of $T_1, \ldots, T_N$ must be exchangeable (permutation symmetric). Then, although the transformation $(T_1, \ldots, T_N) \to (T_{(1)}, \ldots, T_{(N)})$ is not one-to-one, the joint distribution of $T_{(1)}, \ldots, T_{(N)}$ and the exchangeability uniquely determine the joint distribution of $T_1, \ldots, T_N$:

$$f_{T_1,\ldots,T_N}(t_1,\ldots,t_N) = \frac{1}{N!}\, f_{T_{(1)},\ldots,T_{(N)}}(t_{(1)},\ldots,t_{(N)}). \tag{2.1}$$

Since the transformation $\{T_{(i)}\} \to \{V_i\}$ is one-to-one, specifying the joint distribution of $\{V_i\}$ is equivalent to specifying the joint distribution of $\{T_i\}$. Thus, from a statistical viewpoint, a software reliability model (assuming perfect debugging) is a specification of the joint distribution of $\{T_i\}$.

A theorem of de Finetti states that [14], a necessary and sufficient condition for $\{T_i\}$ to be embedded in an infinite sequence of exchangeable random variables is that the joint Cdf of $\{T_i\}$ can be written as:

$$F(t_1,\ldots,t_N) = \int \prod_{i=1}^{N} F_\theta(t_i)\, d\mu(\theta). \tag{2.2}$$
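As an illustration of the mixture representation, the following minimal Python sketch draws exchangeable detection times by first drawing a shared rate and then drawing the times independently given that rate. The gamma mixing distribution (shape `alpha`, scale `theta`), the function names, and the parameter values are assumptions chosen for the example, not something fixed by the representation itself. For this particular choice the pairwise correlation is $1/\alpha$ when $\alpha > 2$, so the simulated value should be positive and close to that.

```python
import random


def sample_detection_times(n_errors, alpha, theta, rng):
    """Draw one exchangeable vector (T_1, ..., T_N) by mixing.

    Assumption for this sketch: the mixing distribution is gamma with
    shape alpha and scale theta; given the drawn rate, the T_i are
    independent exponential variables with that common rate.
    """
    rate = rng.gammavariate(alpha, theta)  # shared, unobserved rate
    return [rng.expovariate(rate) for _ in range(n_errors)]


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_pair_correlation(alpha, theta, n_rep=200_000, seed=0):
    """Estimate Corr{T_1, T_2} from n_rep independent draws of a pair."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_rep):
        t1, t2 = sample_detection_times(2, alpha, theta, rng)
        first.append(t1)
        second.append(t2)
    return pearson(first, second)


if __name__ == "__main__":
    alpha = 10.0  # hypothetical value; alpha > 2 keeps the variance finite
    print("simulated correlation:", simulate_pair_correlation(alpha, 1.0))
    print("1 / alpha            :", 1.0 / alpha)
```

The times are independent only conditionally on the shared rate; marginally they move together, which is the positive s-dependence by mixture that the representation produces.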
0018-9529/86/1200-0566$01.00 © 1986 IEEE
So any model for this situation should have the representation (2.2), where $\{T_i\}$ are positively s-dependent by mixture [14]. Due to exchangeability: i) the marginal distributions of the $T_i$ must be identical, and ii) $\mathrm{Corr}\{T_i, T_j\} = \rho$, for all $i \ne j$. If the common linear correlation $\rho$ is independent of $N$ then $\rho \ge 0$, so that the covariance matrix of $\{T_i\}$ is non-negative definite for all $N \ge 1$. The practical testing environment tends to increase or decrease the detection rates of all the errors simultaneously, indicating that the common correlation should be nonnegative. Littlewood [12] justifies that: 1) the common marginal distribution should be DFR (decreasing failure rate), and 2) the Pareto type 2 distribution (also known as Lomax [8]) is a reasonable and mathematically tractable choice.

2.1 A Model for Incorporating s-Dependence

A multivariate generalization of the univariate Lomax distribution would be a useful software reliability model for incorporating s-dependence among the $\{T_i\}$.

Model assumptions:
1. The debugging process is perfect.
2. The joint pdf of $\{T_i\}$ is:

$$\mathrm{MLN}(\alpha,\theta):\quad f(t_1,\ldots,t_N) = \theta^N \alpha(\alpha+1)\cdots(\alpha+N-1)\Big[1+\theta\sum_{i=1}^{N} t_i\Big]^{-(\alpha+N)},\quad t_i > 0,\ \theta > 0,\ \alpha > 0. \tag{2.3}$$

The pdf in (2.3) is exchangeable and a special case of the multivariate Lomax distribution of Nayak [13], $\mathrm{MLN}(\alpha,\theta_1,\ldots,\theta_N)$, with pdf:

$$f(t_1,\ldots,t_N) = \Big[\prod_{i=1}^{N}\theta_i\Big]\alpha(\alpha+1)\cdots(\alpha+N-1)\Big[1+\sum_{i=1}^{N}\theta_i t_i\Big]^{-(\alpha+N)},\quad t_i > 0.$$

Theorem 2.2. $\mathrm{MLN}(\alpha,\theta_1,\ldots,\theta_N)$ is a multivariate decreasing failure rate distribution.

Theorem 2.3. For $\mathrm{MLN}(\alpha,\theta_1,\ldots,\theta_N)$, the moment $E\{T_1^{r_1}\cdots T_N^{r_N}\}$ is:

$$E\{T_1^{r_1}\cdots T_N^{r_N}\} = \begin{cases}\dfrac{\Gamma\big(\alpha-\sum_{i=1}^{N} r_i\big)}{\Gamma(\alpha)}\displaystyle\prod_{i=1}^{N}\big[\Gamma(r_i+1)/\theta_i^{r_i}\big], & \text{for } \sum_{i=1}^{N} r_i < \alpha,\\[2mm] \infty, & \text{otherwise.}\end{cases}$$

From theorems 2.1 - 2.3, it follows that for model (2.3), the common marginal distribution of the detection times is univariate Lomax with pdf:

$$\frac{\alpha\theta}{(1+\theta t)^{\alpha+1}},\quad t > 0, \tag{2.6}$$

(same as the Littlewood [12] marginals) and if $\alpha > 2$,

$E\{T_i\} = 1/[\theta(\alpha-1)]$, $\mathrm{Var}\{T_i\} = \alpha/[\theta^2(\alpha-1)^2(\alpha-2)]$, and $\mathrm{Corr}\{T_i,T_j\} = 1/\alpha$, for all $i \ne j$.

Theorem 2.4 shows the joint distribution of the interfailure times $\{V_i\}$.

Theorem 2.4. Let the joint pdf of $T_1, \ldots, T_N$ be $\mathrm{MLN}(\alpha,\theta)$, and $\delta_i = (N-i+1)\theta$, $i = 1, \ldots, N$. The joint pdf of $V_1, \ldots, V_N$ is $\mathrm{MLN}(\alpha,\delta_1,\ldots,\delta_N)$.

Proof. The joint pdf of $T_{(1)}, \ldots, T_{(N)}$ is:

$$f(t_{(1)},\ldots,t_{(N)}) = N!\,\theta^N \alpha(\alpha+1)\cdots(\alpha+N-1)\Big[1+\theta\sum_{i=1}^{N} t_{(i)}\Big]^{-(\alpha+N)},\quad 0 \le t_{(1)} \le \cdots \le t_{(N)}. \tag{2.7}$$
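One way the remaining step of this proof can be carried out is sketched below. It is a worked derivation under the assumption that the $\mathrm{MLN}(\alpha,\delta_1,\ldots,\delta_N)$ density has the product form of the multivariate Lomax distribution of [13]; it changes variables from the ordered detection times to the interfailure times and is offered as an illustration, not as the author's own wording.

```latex
% Change of variables from ordered times to interfailure times.
% t_{(i)} = v_1 + ... + v_i has unit Jacobian, and each v_i > 0.
\begin{align*}
\sum_{i=1}^{N} t_{(i)}
  &= \sum_{i=1}^{N}\sum_{j=1}^{i} v_j
   = \sum_{j=1}^{N} (N-j+1)\, v_j ,\\
\theta \sum_{i=1}^{N} t_{(i)}
  &= \sum_{i=1}^{N} \delta_i v_i ,
  \qquad \delta_i = (N-i+1)\theta ,\\
N!\,\theta^{N}
  &= \prod_{i=1}^{N} (N-i+1)\theta
   = \prod_{i=1}^{N} \delta_i ,\\
f_{V_1,\ldots,V_N}(v_1,\ldots,v_N)
  &= \Big[\prod_{i=1}^{N}\delta_i\Big]\,
     \alpha(\alpha+1)\cdots(\alpha+N-1)\,
     \Big[1+\sum_{i=1}^{N}\delta_i v_i\Big]^{-(\alpha+N)},
  \qquad v_i > 0 .
\end{align*}
```

The last line has the multivariate Lomax form with scale parameters $\delta_1, \ldots, \delta_N$, which is the statement of the theorem.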
The conditional pdf of $V_{r+1}$, given $V_1 = v_1, \ldots, V_r = v_r$, is:

$$f(v \mid v_1,\ldots,v_r) = \frac{(\alpha+r)\,\delta^*}{(1+\delta^* v)^{\alpha+r+1}},\quad v > 0, \tag{4.2}$$

where $\delta^* = (N-r)\theta\big/\big(1+\sum_{i=1}^{r}\delta_i v_i\big)$.
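The conditional density of the next interfailure time can be turned into a small calculator. The Python sketch below assumes $\delta_i = (N-i+1)\theta$ and $\delta^* = (N-r)\theta/(1+\sum_{i=1}^{r}\delta_i v_i)$; the function names and the numbers in the usage part are hypothetical. Integrating the density from $t$ to infinity gives the survival probability $(1+\delta^* t)^{-(\alpha+r)}$, and the sketch checks that relation by numerical differentiation.

```python
def delta_star(v, n_total, theta):
    """delta* = (N - r) * theta / (1 + sum_i delta_i * v_i), with
    delta_i = (N - i + 1) * theta for the r observed interfailure times v."""
    r = len(v)
    # enumerate starts at 0, so (n_total - i) equals N - (i + 1) + 1
    weighted = sum((n_total - i) * theta * v_i for i, v_i in enumerate(v))
    return (n_total - r) * theta / (1.0 + weighted)


def conditional_pdf(t, v, n_total, alpha, theta):
    """Density of the next interfailure time at t, given the observed v."""
    r = len(v)
    d = delta_star(v, n_total, theta)
    return (alpha + r) * d / (1.0 + d * t) ** (alpha + r + 1)


def conditional_reliability(t, v, n_total, alpha, theta):
    """Probability that the next failure takes longer than t, given v."""
    r = len(v)
    d = delta_star(v, n_total, theta)
    return (1.0 + d * t) ** (-(alpha + r))


if __name__ == "__main__":
    v_obs = [1.0, 2.0, 0.5]            # hypothetical interfailure times
    N, alpha, theta = 10, 3.0, 0.1     # hypothetical parameter values
    t, h = 4.0, 1e-5
    # The density should equal minus the slope of the reliability curve.
    slope = (conditional_reliability(t + h, v_obs, N, alpha, theta)
             - conditional_reliability(t - h, v_obs, N, alpha, theta)) / (2 * h)
    print("pdf at t     :", conditional_pdf(t, v_obs, N, alpha, theta))
    print("minus slope  :", -slope)
```

Keeping `delta_star` as a separate function makes it easy to see that larger observed interfailure times shrink $\delta^*$ and therefore raise the predicted reliability.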
$$\Pr\{V_{r+1} > t \mid V_1,\ldots,V_r\} = (1+\delta^* t)^{-(\alpha+r)} = \left[\frac{1+\sum_{i=1}^{r}\delta_i v_i}{1+\sum_{i=1}^{r}\delta_i v_i+\theta(N-r)t}\right]^{\alpha+r}. \tag{4.3}$$

Combine (4.1) and (4.3). The likelihood of the data is:

$$L(N,\alpha,\theta) = \frac{N!}{(N-r)!}\,\theta^r \alpha(\alpha+1)\cdots(\alpha+r-1)\Big[1+\theta\Big\{(N-r)\tau+\sum_{i=1}^{r} t_{(i)}\Big\}\Big]^{-(\alpha+r)}. \tag{4.4}$$

The likelihood of the data obtained under censored sampling (where the software is tested until a fixed number ($r$) of errors are removed) is (4.1), which is similar to (4.4) except that $\tau$ is replaced by $t_{(r)}$. The maximum likelihood (ML) estimators of $N$, $\alpha$, $\theta$ are found by maximizing (4.4). Such estimators for censored sampling can be derived similarly. For each $N$ and $\alpha$, $L(N,\alpha,\theta)$ is maximized by:

$$\hat\theta = \frac{r}{\alpha\Big[(N-r)\tau+\sum_{i=1}^{r} t_{(i)}\Big]}. \tag{4.5}$$

Substitute $\hat\theta$ in (4.4):

$$L(N,\alpha,\hat\theta) = \frac{N!\,r^r\,\alpha(\alpha+1)\cdots(\alpha+r-1)\,\alpha^{\alpha}}{(N-r)!\,(\alpha+r)^{\alpha+r}\Big[(N-r)\tau+\sum_{i=1}^{r} t_{(i)}\Big]^{r}}. \tag{4.6}$$

In order to find the ML estimators of $N$ and $\alpha$, we need to maximize:

$$L_1(N) = \frac{N(N-1)\cdots(N-r+1)}{\Big[(N-r)\tau+\sum_{i=1}^{r} t_{(i)}\Big]^{r}} \tag{4.7}$$

and

$$L_2(\alpha) = \frac{\alpha(\alpha+1)\cdots(\alpha+r-1)\,\alpha^{\alpha}}{(\alpha+r)^{\alpha+r}} \tag{4.8}$$

with respect to $N$ and $\alpha$ respectively. Since the ML estimator of $N$ under the J-M model is also obtained by maximizing (4.7) [1, 7], the ML estimates of $N$ under my model and the J-M model are identical. However, $L_2(\alpha)$ is an increasing function of $\alpha$, thus implying $\hat\alpha = \infty$, and hence $\hat\theta = 0$. Thus, given $N$, $L(N,\alpha,\theta)$ is maximized as $\alpha \to \infty$, $\theta \to 0$, so that

$$\alpha\theta \to \frac{r}{(N-r)\tau+\sum_{i=1}^{r} t_{(i)}}, \tag{4.9}$$

which is $\hat\lambda$ under the J-M model. The close connection between the ML estimators of my model and the J-M model lies in the fact that:

$$L(N,\alpha,\theta) = \int_0^\infty L(N \mid \lambda)\,\frac{\lambda^{\alpha-1}e^{-\lambda/\theta}}{\theta^{\alpha}\,\Gamma(\alpha)}\,d\lambda, \tag{4.10}$$

where

$$L(N \mid \lambda) = \frac{N!}{(N-r)!}\,\lambda^r \exp\Big\{-\lambda\Big[(N-r)\tau+\sum_{i=1}^{r} t_{(i)}\Big]\Big\}, \tag{4.11}$$

which is the likelihood under the J-M model. Because of (4.10), given $\alpha$, $\theta$, $L(N,\alpha,\theta) = L(N \mid \lambda^*)$ for some $\lambda^*$. Maximizing (4.10) results in finding $\lambda^*$ rather than $\alpha$ and $\theta$.

There is an apparent drawback of my model. The common correlation among $T_1, \ldots, T_N$ is $1/\alpha$, which is also the common correlation among $V_1, \ldots, V_N$ (by theorems 4.3 and 4.4). Since the underlying random (vector) variable is $(V_1, \ldots, V_N)$, one observation represents a realization of $V_1, \ldots, V_N$. In this application, the data consist of just one realization of $V_1, \ldots, V_r$ ($r < N$), which is not even one complete observation. So I am trying to estimate the common correlation from less than one observation. Hence, we are not surprised that the ML method fails; rather this is anticipated for any model that incorporates s-dependence among the failure times. To estimate the common correlation, we need data from more than one debugging experiment or some extra information. If one of the parameters $\alpha$ and $\theta$ is known, the other parameter can be estimated from standard debugging data. It might be possible to estimate the parameters from ordinary debugging data in a Bayes framework.

ACKNOWLEDGMENT

I thank the Editors and referees for careful readings of an earlier version of the manuscript and some helpful suggestions for improvement.

REFERENCES

[1] S. Blumenthal, R. Marcus, "Estimating population size with exponential failures," J. Amer. Statist. Assoc., vol 70, 1975, pp 913-922.
[2] L. H. Crow, N. D. Singpurwalla, "An empirically developed Fourier series model for describing software failures," IEEE Trans. Reliability, vol R-33, 1984 Jun, pp 176-183.
[3] A. L. Goel, K. Okumoto, "Time dependent error detection rate model for software reliability and other performance measures," IEEE Trans. Reliability, vol R-28, 1979 Aug, pp 206-211.
[4] I. B. J. Goudie, C. M. Goldie, "Initial size estimation for the linear pure death process," Biometrika, vol 68, 1981, pp 543-550.
[5] Z. Jelinski, P. M. Moranda, "Software reliability research," in Statistical Computer Performance Evaluation, ed. W. Freiberger, Academic Press, 1972, pp 465-484.
[6] H. Joe, N. Reid, "Estimating the number of faults in a system," J. Amer. Statist. Assoc., vol 80, 1985, pp 222-226.
[7] H. Joe, N. Reid, "On the software reliability models of Jelinski-Moranda and Littlewood," IEEE Trans. Reliability, vol R-34, 1985 Aug, pp 216-218.
[8] N. L. Johnson, S. Kotz, Continuous Univariate Distributions, I: Distributions in Statistics, Houghton Mifflin Co., 1970.
[9] G. Koch, P. J. C. Spreij, "Software reliability as an application of martingale and filtering theory," IEEE Trans. Reliability, vol R-32, 1983 Oct, pp 342-345.
[10] W. Kremer, "Birth-death and bug counting," IEEE Trans. Reliability, vol R-32, 1983 Apr, pp 37-47.
[11] N. Langberg, N. D. Singpurwalla, "Unification of some software reliability models," SIAM J. Sci. Stat. Computing, vol 6, 1985, pp 781-790.
[12] B. Littlewood, "Stochastic reliability-growth: A model for fault-removal in computer-programs and hardware-designs," IEEE Trans. Reliability, vol R-30, 1981 Oct, pp 313-320.
[13] T. K. Nayak, "Multivariate Lomax distribution: Properties and usefulness in reliability theory," to appear in J. Appl. Prob., 1987.
[14] M. Shaked, "A concept of positive dependence for exchangeable random variables," Ann. Statist., vol 5, 1977, pp 505-515.
[15] P. J. C. Spreij, "Parameter estimation for a specific software reliability model," IEEE Trans. Reliability, vol R-34, 1985 Oct, pp 323-328.

AUTHOR

Dr. Tapan K. Nayak; Dept. of Statistics/C&IS; George Washington University; Washington, DC 20052 USA

Tapan K. Nayak was born in Kalera, West Bengal, India on 1957 February 21. He received a BSc (Stat) from the University of Calcutta in 1976, an MStat from the Indian Statistical Institute, Calcutta in 1979 and a PhD (Stat) from the University of Pittsburgh, Pittsburgh in 1983. He is an Assistant Professor in Statistics at The George Washington University, Washington, DC. His research interests are in software reliability, categorical data analysis, and statistical modeling.

Manuscript TR 86-030 received 1986 April 4; revised 1986 September 15.

EXTENDED ABSTRACT

A Stochastic Model for Availability Measure with Common-Cause Failures

A. Ananda Raja Chari
Sri Venkateswara University Post-Graduate Centre, Kurnool

Key Words-Steady-state availability, Non birth-death process, Common-cause failures, Single failures, Operating ratio

This paper presents a model for steady-state availability of a system with two components. The system has individual as well as common-cause failures which result ultimately in total failure in an infinitesimal time interval. Such situations commonly occur. Total failure in an infinitesimal interval with positive probability represents the model as non birth-death but rather a Markov process. A single service facility results in a non birth-death Markov process with single server. Thus the following assumptions are made for this model:

1. The components in the system are identically distributed and can fail singly or at the same instant (common-cause).
2. The arrival stream of common-cause failures (CCF) forms a Poisson process with arrival rate $c_2\lambda_a$; $0 \le c_2 \le 1$.
3. The arrival stream of single failures forms a Poisson process with arrival rate $c_1\lambda_a$; $c_1 + c_2 = 1$; $0 \le c_1 \le 1$.
4. The service takes place singly (ie, single server) and service times of the components are exponentially distributed with repair (hazard) rate $\mu_s$.
5. The system has two components.

Following the above assumptions, the paper studies the steady-state availability of the system for both "series" and "parallel" configurations. The steady-state availability is a function of operating ratio values $\eta = \lambda_a/\mu_s$ and chance of occurrence of common-cause failures. The behaviour of steady-state availability of both configurations is studied and established as a function of $\eta$ and $c_1$, $c_2$. Steady-state availability is a decreasing and convex function of operating ratio values such that $\eta < 1$, and chance of occurrence of common-cause failures $c_2$. However, for a "parallel" system, the steady-state availability is a decreasing, concave function for all $\eta < 1$, and a decreasing, convex function for all $c_2$.

This paper studies the effect of common-causes hitting the system for both "series" and "parallel" configurations. Thus the percentage of decrease in the availability due to the occurrence of a common-cause failure compared to that of single failures occurring in the system is obtained for both "series" and "parallel" configurations. There is at most a 20% decrease in availability due to common-cause failures for a "series" system and about a 33% decrease for a "parallel" system, for all $\eta < 1$.

The full analysis and derivation are in a separately available Supplement [1].

REFERENCE

[1] Supplement: NAPS document No. 04412-C; 19 pages in this Supplement. For current ordering information, see "Information for Readers & Authors" in a current issue. Order NAPS document No. 04412, 84 pages. ASIS-NAPS; Micro-Publications; POBox 3513, Grand Central Station; New York, NY 10163 USA.

AUTHOR

A. Ananda Raja Chari; Department of Operations and Statistical Quality Control; Sri Venkateswara University Post-Graduate Centre; Kurnool - 518 001 (A.P.), INDIA.

Manuscript TR84-138 received 1984 December 31; revised 1986 August 28.