How Much Testing is Enough? Applying Stopping Rules to Behavioral Model Testing

T. Chen (1), M. Sahinoglu (2), A. von Mayrhauser (3), A. Hajjar (1), Ch. Anderson (3)

(1) Department of Electrical Engineering, Colorado State University, Fort Collins, CO 80523; 970-491-6574 (ph), 970-491-2249 (fax); [email protected]

(2) Department of Statistics, Case Western Reserve University, Cleveland, OH 44106; 216-368-6013 (ph), 216-368-0252 (fax); [email protected]

(3) Department of Computer Science, Colorado State University, Fort Collins, CO 80523; 970-491-7016 (ph), 970-491-2466 (fax); [email protected]

Contact author: A. von Mayrhauser

Key Words: behavioral model testing, Compound Poisson, effort-domain, empirical Bayesian analysis, negative binomial distribution (NBD), Poisson^LSD, stopping rule, testing strategy
How Much Testing is Enough? Applying Stopping Rules to Behavioral Model Testing

Abstract
Testing behavioral models before they are released to the synthesis and logic design phase is a tedious process, to say the least. A common practice is the test-it-to-death approach, in which millions or even billions of vectors are applied and the results are checked for possible bugs. The vectors applied to behavioral models include functional vectors, but a significant portion of the vectors is random in nature, including random combinations of instructions. In this paper, we present and evaluate a stopping rule that can be used to determine when to stop the current testing phase using a given testing technique and move on to the next phase using a different testing technique. We demonstrate the use of the stopping rule on two complex VHDL models that were tested for branch coverage with four different testing phases. We compare the savings and the quality of testing both with and without the stopping rule.
1 Introduction

With the advent of VLSI technology and the increasing complexity of VLSI chips, ensuring the correctness of behavioral models before they are released to the synthesis and logic design phase is essential to achieving a high quality product within a tight time-to-market target. The current state of testing complex behavioral models for bugs is a tedious process. Millions or even billions of vectors are applied to a behavioral model in the hope of finding bugs before the shipping deadline arrives. This test-it-to-death approach is wasteful and does not direct limited resources effectively to the potentially buggy portions of a model.

Typically, the vectors applied to behavioral models include functional vectors, but a significant portion of the vectors is random in nature, including random combinations of instructions. Therefore, a typical test process steps through several testing phases using different types of vectors. For example, a test flow may use functional vectors followed by random vectors, and repetitions of such a sequence using different functional vectors and different seeds for random number generation. Stopping rules can be used in each of these testing phases to save testing time and costs while still maintaining the expected quality of the model.

Stopping rules are guided by a set of testing criteria. Well-known testing criteria commonly used in software testing include statement coverage and branch coverage. Since testing can only show the presence of bugs, not their absence, testing criteria are one way of stating how much needs to be tested. The idea is that when a behavioral model has been tested against a set of given testing criteria, it has been adequately tested. Unfortunately, it is not always clear whether it will be possible to fulfill the testing criteria. For example, when testing against coverage criteria, it may not be obvious whether or not further coverage can be achieved with a given test generation technique. In practice, designers testing behavioral models for bugs switch test strategies when the testing yield saturates. The question they must answer is how to determine the right point at which to stop the current technique and switch to a new technique, or to stop the overall testing altogether. Beyond that is the issue of the order in which different testing techniques are applied. Two factors influence this: (1) cost, which consists of the cost of generating tests and the cost of validation, and (2) yield, which consists of coverage elements or bugs found during a test.

This paper presents a stopping rule for determining the viability of continuing a testing technique for increasing coverage. Section 2 summarizes existing approaches to this problem, which range from hypothesis testing to Markov processes. Section 3 explains the use of a compound Poisson process as a Bayesian stopping rule. In Section 4 we evaluate the proposed stopping rule on two complex VHDL models. Concluding remarks are given in Section 5.
2 Existing Approaches

A common approach to testing a behavioral model for bugs involves a simple strategy of running the same test suite on the behavioral model and on a reference model, which is often written in C or C++. A set of functional patterns (diagnostics) and random combinations of machine instructions are applied to both models. The outputs from both models are then compared to find bugs. This process is continued until, or even beyond, the point at which the behavioral model is released for synthesis and logic design. Testing using this approach is typically indiscriminate in nature, although limited coverage measures such as statement coverage have been used to monitor the effectiveness of the test. One example of such an approach is the verification of the UltraSPARC microprocessor [1], where more than 5 billion tests were applied. Using stopping rules in behavioral model testing can save many wasteful simulation cycles while still maintaining the quality goals of the test process.

Deriving stopping rules for testing software has been an active research topic in software engineering. Howden [6, 7] used a simple binomial distribution to determine the probability of finding another coverage element and an associated confidence interval. It is assumed that test runs are independent of each other. Testing stops when the probability of finding another coverage element is low enough and the associated confidence level is high enough. The cost of testing is not part of the model. Other approaches similar to [6, 7] include work done by Hamlet [5] and by van Schouwen [16]. Various statistical testing techniques have been used to determine the number of test cases needed to achieve a particular test objective [17, 18]. Stopping rules have also been used to attain a given reliability criterion [8, 21]. Poore et al. [9, 20] use statistical testing based on a usage model. Their methods model the usage process and the testing process as Markov chains, assuming that the current state (usage as well as failure behavior) completely determines the next state. The model is more accurate than others because it explicitly models states in the software. On the other hand, it assumes memoryless behavior, which may not always be the case, thus limiting the model's usability.

Lastly, through appropriate replacement of failure events in the models by 'new coverage' events, one could apply software reliability growth models [19]. The interpretation of failure versus 'new coverage' requires some thought. If one maps, for example, each branch in the coverage model to a fault in the reliability sense, the total number of "faults" is known (i.e., the number of branches). Covering a branch in the resulting coverage model is then analogous to a failure in the reliability model. However, many models assume one failure at a time, while it is quite common to cover more than one branch at a time. Thus, coverage shows a certain clumping effect [10]. This limits the applicability of some of the reliability growth models.
3 Statistical Methodology for a Bayesian Stopping Rule

3.1 Background and Motivation
An efficient and economical stopping rule using empirical-Bayesian principles for the Poisson counting process compounded with the logarithmic series distribution (LSD) was derived and satisfactorily applied to time-domain software testing in [10, 18]. The resulting compound distribution is also known as a negative binomial distribution (NBD), provided certain underlying assumptions hold (cf. Equations (4) and (5) below). This empirical-Bayesian estimator is here further applied to the problem of testing software with a series of testing techniques or strategies. It is assumed that software coverage items cluster when tests are executed and that they are true-contagious, in that the occurrence of one coverage item adversely affects the occurrence of some others. This phenomenon is often observed in software testing practice. For example, the control structure of the program may, upon execution of some branches, prevent the execution of others. This is why the clump size distribution is assumed to be LSD, while the distribution of the count of testing incidents (or test cases) is Poisson. Then, the distribution of the total number of clumped coverage items is compound Poisson [11, 12, 13, 14]. In our case, it is a Poisson^LSD, i.e., an NBD given the assumptions of Equations (4) and (5). We also have to replace failure events (finding bugs) in the model by coverage events.

A binary Markov process model [2] for random testing is able to consider relationships between consecutive test cases in terms of a correlation factor. Again, if we want to use it for coverage rather than failure modeling, we have to reinterpret failures as coverage events. Then one can use the model to determine an upper confidence bound for not obtaining any more coverage elements.

For simplicity, we use branch coverage as the coverage event. In the analysis, we group sets of test inputs into a test suite. This grouping is likely to be problem specific (e.g., execution of a feature) or code specific (e.g., the algorithm needs to process a certain number of inputs in each step). We check for a coverage increase at the end of each group of inputs. For each checkpoint in time, either the model satisfies a desired level of coverage, or model testing with the current strategy is permitted to continue. Finally, one reaches a point in time when testing with the current strategy is stopped because coverage has saturated. The question then is: how do we recognize the point at which we should not use the current testing strategy any longer? Although there may still be a number of uncovered branches that might be executed by the current strategy, the chances of finding them within a reasonable time may be so small that it is not economically feasible to continue testing [10] with the current strategy. Thus, the objective is to find the optimal stopping point for the current testing strategy. The answer is based on the quality goal and the cost constraints. Costs should include the costs of test generation, execution, and validation. So, for example, a functional test may be more expensive to generate, but its results might be less expensive to validate than those of random test generation.

In our stopping rule we assume that the number of interruptions over time or effort is distributed as Poisson, and that the number of coverage items that occur as a clump at an epoch or incident is distributed according to a true-contagious logarithmic series distribution, where the coverage items within a clump adversely affect and correlate with each other. A Poisson distribution compounded by a logarithmic series distribution will be denoted as a Poisson^LSD, namely a negative binomial distribution (NBD) given the assumptions of Equations (4) and (5).
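To make the clumping assumption concrete, the short sketch below (our illustration, not part of the original formulation) simulates per-period coverage counts from the compound Poisson^LSD process defined in Section 3.2: the number of clumps per period is Poisson, each clump size follows the logarithmic series distribution, and their sum is the period's coverage count. All parameter values are invented for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_poisson_lsd(lam, theta, n_periods):
        """Simulate per-period Poisson^LSD coverage counts: N ~ Poisson(lam)
        clumps per period, each covering W ~ LSD(theta) items; the period's
        count is the sum of the W's. Illustrative sketch only."""
        counts = []
        for _ in range(n_periods):
            n = rng.poisson(lam)
            # rng.logseries draws from the logarithmic series distribution
            # of Equation (2) in Section 3.2
            counts.append(int(rng.logseries(theta, size=n).sum()) if n else 0)
        return counts

    print(sample_poisson_lsd(lam=3.0, theta=0.7, n_periods=5))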
3.2 Notation and Formulation
Given a compound Poisson process [11, 12, 13],

    \{X(t);\ t \geq 0\}, \qquad X(t) = \sum_{i=1}^{N(t)} W_i    (1)

where W_1, W_2, \ldots are independent, identically distributed random variables with f(W = w) \sim LSD (logarithmic series distribution),

    f(W = w) = \frac{\alpha \theta^w}{w}, \qquad w = 1, 2, \ldots    (2)

where

    \alpha = -\frac{1}{\ln(1 - \theta)}    (3)

Then \{X(t);\ t \geq 0\} is a Poisson^LSD process when N(t) \sim Poisson(\lambda) and W_i \sim LSD for i = 1, 2, \ldots. However, if we let

    \lambda = -k \ln(1 - \theta) = k \ln q    (4)

where

    q = \frac{1}{1 - \theta}    (5)

and k > 0, then the Poisson^LSD process is a negative binomial distribution (NBD). Since 0 < \theta < 1, we let the prior p.d.f. of the r.v. \theta be a Beta(\alpha, \beta) p.d.f. with

    h(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \qquad 0 < \theta < 1;\ \alpha, \beta > 0    (6)

Since the NBD likelihood is proportional to \theta^x (1 - \theta)^k, the Beta prior is conjugate and, after a series of algebraic derivations, the posterior distribution of (\theta \mid x) is found to be

    h(\theta \mid x) = \frac{\Gamma(\alpha + \beta + x + k)}{\Gamma(\alpha + x)\,\Gamma(\beta + k)}\, \theta^{\alpha + x - 1} (1 - \theta)^{\beta + k - 1}    (7)

This is the well-known Beta distribution of Equation (6),

    h(\theta \mid x) = \text{Beta}(\alpha + x, \beta + k)    (8)

With respect to the squared error loss function L(\theta, t) = (\theta - t)^2, the Bayes estimate of \theta is given by the expected value of the posterior distribution of \theta:

    E(\theta \mid x) = \frac{\alpha + x}{\alpha + \beta + x + k}    (9)

We know that the expected value of the r.v. X, which is an NBD (negative binomial distribution), is given by

    E(X) = kp    (10)

Using Equation (5) with p = q - 1,

    E(X) = \frac{k\theta}{1 - \theta}    (11)

If one substitutes the Bayes estimate of the r.v. \theta from Equation (9) into Equation (11),

    E(X) = k\, \frac{\alpha + x}{\beta + k}    (12)

Therefore \lambda = -k \ln(1 - \theta) = k \ln q, and thus k = \lambda / \ln q from Equation (4), can be approximated recursively by the new Equation (13) below, when the Bayes estimate of Equation (9) is entered for \theta in Equation (4):

    e^{\lambda / k} = \frac{\alpha + \beta + x + k}{\beta + k}    (13)

Note that in Equation (13), k can be calculated recursively until convergence by applying a nonlinear solution algorithm, such as Newton-Raphson; \alpha and \beta are given constants. At each step i, we measure the following data: w_i represents the new coverage items discovered in step i, and x_i is the measured cumulative coverage at step i (associated with the random variable X). The associated distributions are f(W = w) \sim LSD (logarithmic series distribution) and f(X) \sim Poisson^LSD, i.e., NBD when Equation (4) is assumed.
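For illustration, the sketch below solves Equation (13) for k numerically. The paper applies Newton-Raphson; we substitute plain bisection for robustness. The function name, the tolerances, and the assumption that x > \lambda (average clump size above one, so a positive root exists) are ours, not the authors'.

    import math

    def solve_k(lam, x, alpha, beta, tol=1e-12):
        """Solve Equation (13), exp(lam/k) = (alpha+beta+x+k)/(beta+k),
        for k > 0 by bisection (a robust stand-in for Newton-Raphson).
        Assumes x > lam so that a positive root exists."""
        def g(k):
            # g(k) = lam/k - ln(RHS); g > 0 for small k, and g behaves like
            # (lam - x)/k < 0 for large k when x > lam, so the root brackets.
            return lam / k - math.log((alpha + beta + x + k) / (beta + k))
        lo, hi = 1e-9, 1.0
        while g(hi) > 0:
            hi *= 2.0
            if hi > 1e15:
                raise ValueError("no positive root; check that x > lam")
        while hi - lo > tol * max(hi, 1.0):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
        return 0.5 * (lo + hi)

For example, with \alpha = 8 and \beta = 2 (the Table 4 values of Section 4) and an observed pair (\lambda, x), solve_k() returns the k that feeds Equations (12) and (15).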
3.3 An Algorithm for a Stopping Rule
For sequential steps i = 1, 2, 3, \ldots, i denotes the testing interval in terms of groups of test inputs applied. If the expected incremental difference between sequential steps exceeds a given economic criterion d, we continue testing with the current strategy; otherwise we switch to another testing strategy. Below is the one-step-ahead formula, using Equation (12), that represents this decision:

    e(x) = E(X_{i+1}) - E(X_i) \geq d    (14)

Equation (14) can be rearranged into Equation (15) by utilizing Equation (12):

    e(x) = k_{i+1}\, \frac{\alpha + x_{i+1}}{\beta + k_{i+1}} - k_i\, \frac{\alpha + x_i}{\beta + k_i} \geq d    (15)

where the k_i are approximated at each step by Equation (13) given \alpha and \beta, and x_i, from the input data set for each group (interval) of test inputs i, is the accumulated amount of coverage. Also,

    d = \frac{c}{a - b}    (16)
If for the i-th unit interval beginning at time t, or for test-group i, the expected cost of stopping is greater than or equal to the expected cost of continuing, i.e., if

    a\, E(X_{i+1}) \geq b\, E(X_i) + c    (17)

then it is economical to continue testing for the next group of test inputs. On the other hand, if the expected cost of stopping is less than the expected cost of continuing (when the inequality sign is reversed), it is more economical to stop testing with the current strategy:

    a\, E(X_{i+1}) < b\, E(X_i) + c    (18)
If we were to stop at interval or test-group i, we assume that the cost of coverage items as yet uncovered is "a" per coverage item. Thus, there is an expected cost over the interval \{i, i+1\} of a E(X_i) for stopping at a discrete time t = i or test-group i. If we continue testing over the interval, we assume that there is a fixed cost "c" for testing and a variable cost "b" related to the uncovered coverage elements. Note that usually "a" is larger than "b" (we are assuming that subsequent testing strategies will be more expensive than the current one; this may be for two reasons: (1) increasing coverage becomes progressively harder regardless of the technique, and (2) the technique itself may be more costly). Thus, the expected cost of continuing testing for the next time interval or test-case is b E(X_i) + c. This cost model is similar to that expressed in [3]. However, such estimates depend on the history of testing, which implies the use of empirical-Bayes decision procedures as described above and in the "statistician's reward" or "secretary" problem of the optimal stopping chapter, where a fixed cost "c" per observation is considered [4]. A cross section of stopping rules is also given in the literature by Yang [21].
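Putting Equations (12) through (16) together, a minimal sketch of the decision loop might look as follows, reusing the solve_k() helper sketched in Section 3.2. Estimating \lambda as the running count of test groups that produced at least one new coverage item is our assumption (the paper does not spell out this estimate), and all names are hypothetical.

    def stopping_rule(cum_coverage, alpha, beta, a, b, c):
        """Decide when to abandon the current testing strategy.
        cum_coverage holds the cumulative coverage counts x_1, x_2, ...
        observed after each group of test inputs. Returns the index of the
        group at which to stop, or None if the strategy still pays off.
        A sketch under the assumptions stated above, not the authors' code."""
        d = c / (a - b)                   # economic criterion, Equation (16)
        lam = 0
        prev_expected = None
        for i, x in enumerate(cum_coverage):
            if x > (cum_coverage[i - 1] if i else 0):
                lam += 1                  # a clump of new coverage occurred
            if x <= lam:
                continue                  # too little data for Equation (13)
            k = solve_k(lam, x, alpha, beta)
            expected = k * (alpha + x) / (beta + k)   # E(X_i), Equation (12)
            if prev_expected is not None and expected - prev_expected < d:
                return i                  # expected gain fell below d: stop
            prev_expected = expected
        return None

In use, one would call stopping_rule() once per testing strategy, on coverage data grouped as described above, and switch strategies as soon as it returns an index.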
4 Experimental Results

We used two VHDL models to examine the effectiveness of the proposed stopping rule. One model is an image processing chip, referred to as the sys7 model. It contains 3785 lines of code with 591 branches. The other model is an Intel 8251 microcontroller chip, referred to as the 8251 model. It contains 3113 lines of code with 207 branches. Both models are highly sequential in nature, making it more difficult to test them thoroughly. Table 1 summarizes the characteristics of both models.

                     sys7 model    8251 model
    Lines of Code       3785          3113
    Branches             591           207
    Control Bits           7            11
    Data Bits             62             8
    Process Blocks        62             3
    Hierarchy              5             1

Table 1: Characteristics of the sys7 and 8251 models.

In our experiments, we followed the common practice of applying functional vectors (diagnostic routines) first, followed by a large number of random vectors. Due to the models' highly sequential nature, random vectors were applied and held for a varying number of clock cycles, emulating loading data through a chain of storage elements. The goal of the experiments is to determine whether we can achieve high branch coverage with a minimum number of test vectors using the stopping rule, and to compare the branch coverage to that achieved by simply applying a substantially larger number of vectors without using any stopping rules.

For the sys7 model, the original (extended) test procedure without any stopping rules consists, in order of execution, of

1. a set of 283 functional vectors which correspond to loading and comparing a truck image with three different templates already stored in the chip,
2. 5000 random vectors, each held for 7 clock cycles considering the sequentiality of the circuit,
3. 1000 random vectors, each held for 4 clock cycles, and
4. 5000 random vectors, each held for 2 clock cycles.

Each set listed here is treated as a separate testing strategy (technique). Similarly, for the 8251 model, the original test procedure consists of

1. a set of 1500 functional vectors which correspond to testing the asynchronous receive mode with 5 bits per character, using even, odd, and no parity at a 16x baud rate,
2. 10000 random vectors, each held for 1 clock cycle,
3. 10000 random vectors, each held for 2 clock cycles,
4. 5000 random vectors, each held for 4 clock cycles, and
5. 5000 random vectors, each held for 6 clock cycles.

    Test Case       # patterns   Branch Coverage
    Functional          283         88.66 %
    Random H=7cc       5000         92.55 %
    Random H=4cc       1000         92.55 %
    Random H=2cc       5000         92.72 %

Table 2: Test Coverage Results for sys7 without Stopping.

    Test Case       # patterns   Branch Coverage
    Functional         1500         20.28 %
    Random H=1cc      10000         39.13 %
    Random H=2cc      10000         68.60 %
    Random H=4cc       5000         78.74 %
    Random H=6cc       5000         80.19 %

Table 3: Test Coverage Results for 8251 without Stopping.

Tables 2 and 3 show the number of test patterns applied and the cumulative branch coverage at the end of applying the original test sequence for both the sys7 and 8251 models. Figures 1 and 2 show the progress in increasing branch coverage on a log scale for both models (for the number of tests applied in the original test sequence).
[Figure 1: Branch Coverage Progress for sys7 without Stopping Rule — % branch coverage vs. number of patterns (log scale).]

We then applied the proposed stopping rule to each testing strategy to see whether we could reduce the number of tests without sacrificing coverage. The stopping rule parameters used in our experiments for both models are listed in Table 4. This amounts to a conservative set-up; that is, the stopping rule is trying not to miss any branches and is likely to stop later rather than sooner.
[Figure 2: Branch Coverage Progress for 8251 without Stopping Rule — % branch coverage vs. number of patterns (log scale).]

    Parameter   Value
    d           0.2E-6
    c           1E-6
    a           6
    b           1
    \alpha      8
    \beta       2

Table 4: Stopping rule parameters used in the experiments.

The test coverage data represent a situation where the contribution of each group of input patterns to increasing coverage is uneven, leading to a left-skewed Beta distribution. This is represented by our choice of \alpha and \beta. Tables 5 and 6 summarize the results for the test strategies for both models when the stopping rule is applied. Figures 3 and 4 show the progress in increasing branch coverage on a log scale for both models (for the number of tests applied).
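As a quick consistency check (our observation, not a claim from the paper), the Table 4 values satisfy Equation (16) exactly: d = c/(a - b) = 1E-6 / (6 - 1) = 0.2E-6, which is the listed value of d.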
    Test Case       # patterns   Branch Coverage
    Functional          283         88.66 %
    Random H=7cc        229         89.68 %
    Random H=4cc        432         90.52 %
    Random H=2cc        448         91.03 %

Table 5: Coverage Results for sys7 using Stopping Rule.

    Test Case       # patterns   Branch Coverage
    Functional         1500         20.38 %
    Random H=1cc       8400         39.13 %
    Random H=2cc       3600         52.62 %
    Random H=4cc       1875         69.57 %
    Random H=6cc       1750         71.98 %

Table 6: Coverage Results for 8251 using Stopping Rule.
[Figure 3: Branch Coverage Progress for sys7 with Stopping Rule — % branch coverage vs. number of patterns (log scale).]
[Figure 4: Branch Coverage Progress for 8251 with Stopping Rule — % branch coverage vs. number of patterns (log scale).]

Several interesting results emerged.

- For sys7, the stopping rule clearly identified that the functional testing strategy should have been continued and was abandoned too quickly. This is expected because the first test strategy often exercises many branches with few vectors. Moving to the second strategy, the stopping rule suggested after only 229 patterns that random patterns held for 7 clock cycles were no longer a promising strategy for increasing coverage, thus saving 4,771 inputs compared to the initial strategy. Coverage with the stopping rule is lower, missing 17 branches. The third strategy was considered finished after 432 inputs, saving 568 inputs. At this point, we still have less coverage using the stopping rule, but are only missing 12 branches. In a sense, we have "caught up" a little. Note that the 1,000 test patterns in phase 3 of the case without the stopping rule were not able to achieve a coverage increase, while phase 3 with the stopping rule did, and now includes some of the coverage that phase 2 failed to achieve compared to the case not using a stopping rule. The last test strategy ends with 10 fewer branches covered. In all, applying the stopping rule saved 9,891 test inputs. Thus we achieved slightly lower coverage using only 13% of the original test cases (cycles).
- For 8251, the advantage of using the stopping rule is not as great as for sys7. The stopping rule again identified that the functional testing should have continued. During the second test strategy, the stopping rule suggested a saving of 16% of the vectors with no coverage loss. In the third and fourth test strategies, the stopping rule was able to save more than 60% of the vectors with a loss of 19 branches (11%). The coverage using the stopping rule was partially recovered during the last test strategy. The final result was a saving of 57% of the test cases (cycles) while missing about 8% of the branches. We do not suggest that the number of vectors and clock cycles used for this experiment is the complete set of tests applied to the 8251 model; the actual number of vectors used for a complete test is larger than what we used here. Our purpose in cutting it short was to make the simulation more manageable while still demonstrating that the stopping rule is useful in determining the right point at which to end a test phase and switch to a different test strategy.
While this did not happen in our experiment, the possibility of achieving higher coverage with stopping than without deserves some explanation. Consider that coverage is a function not only of the input values but also of the internal state of the program. Thus, stopping as we did, before all inputs were executed, and switching to the next strategy may find the behavioral model in a different state and cause it to execute different parts of the model for the inputs of the new test strategy than if all tests of the prior strategy had been executed. In our case, stopping was beneficial. However, the internal state of the model did not completely make up for the coverage lost due to stopping.

While the results are encouraging, several issues must be considered for this technique to show good results in practice. First, the results might depend on the particular seed used to generate the random data (although we do not expect to see large differences). Second, how one groups the inputs for the stopping rule analysis matters. We selected group sizes that made sense from a model perspective. Other models will likely need different group sizes. We plan to investigate the issue of determining appropriate group sizes for the stopping rule application, both model-driven and coverage-data-driven. Third, the choice of parameters for the stopping rule is important. The smaller d is, the more conservative the decision. For example, in phase 4 of testing for sys7 with the stopping rule, if we had used d = 0.2E-3, the stopping rule would have told us to stop at 272 test patterns instead of 448. On the other hand, the more stringent stopping rule might miss branches.
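A sensitivity check of this kind could be scripted around the stopping_rule() sketch from Section 3.3. The snippet below is hypothetical: load_grouped_coverage() stands in for the grouped coverage measurements, which we do not reproduce here.

    # Sweep the economic criterion d via c, holding a = 6 and b = 1 from
    # Table 4 fixed, to compare a conservative and a lenient rule.
    coverage_per_group = load_grouped_coverage()   # hypothetical data loader
    for c in (1e-6, 1e-3):                         # d = c/5: 0.2E-6 and 0.2E-3
        stop = stopping_rule(coverage_per_group, alpha=8, beta=2, a=6, b=1, c=c)
        print(f"d = {c / 5:.1e}: stop after group {stop}")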
5 Conclusion

This paper presented a statistical stopping rule that can be used to determine when to switch testing strategies for increased code coverage, or when to stop testing behavioral models altogether. We applied the technique to two complex VHDL models and found that using a stopping rule yields large improvements in testing efficiency in one case and good improvements in the other. In the future, we would like to investigate the viability of this technique for more behavioral models with a variety of structures and characteristics, and for a variety of other testing techniques, including automated test generation. Further, how tests are grouped for the analysis is an important part of using the stopping rule. We plan to investigate ways to systematically determine a good grouping size.
Acknowledgement
This work was supported in part by the National Science Foundation through grant MIP-9628770.
References

[1] J. Gateley, "Verifying a Million Gate Processor", Integrated System Design, pp. 18-24, Oct. 1997.
[2] S. Chen, S. Mills, "A binary Markov process model for random testing", IEEE Transactions on Software Engineering, vol. 22, no. 3, pp. 218-223, March 1996.
[3] S. R. Dallal, C. L. Mallows, "When Should One Stop Testing Software", J. Am. Stat. Assn., vol. 83, pp. 872-879, 1988.
[4] M. H. DeGroot, Optimal Statistical Decisions, McGraw-Hill, New York, 1970.
[5] D. Hamlet, J. Voas, "Faults on its sleeve: amplifying software reliability testing", Procs. International Symposium on Software Testing and Analysis, Cambridge, MA, August 1993, pp. 89-98.
[6] W. E. Howden, "Systems testing and statistical test data coverage", Procs. COMPSAC 97, Washington, D.C., August 1997, pp. 500-505.
[7] W. E. Howden, "Confidence-based reliability and statistical coverage estimation", Procs. International Symposium on Software Reliability Engineering, Computer Society Press, Nov. 1997, pp. 283-291.
[8] B. Littlewood, D. Wright, "Some Conservative Stopping Rules for the Operational Testing of Safety-Critical Software", IEEE Transactions on Software Engineering, vol. 23, no. 11, Nov. 1997.
[9] J. H. Poore, G. H. Walton, C. J. Trammell, "Statistical testing based on a software usage model", Software Practice and Experience, January 1995.
[10] P. Randolph, M. Sahinoglu, "A Stopping Rule for a Compound Poisson Random Variable", Applied Stochastic Models & Data Analysis, vol. 11, pp. 135-143, June 1995.
[11] M. Sahinoglu, "The Limit of Sum of Markov Bernoulli Variables in System Reliability Evaluation", IEEE Transactions on Reliability, vol. 39, pp. 46-50, April 1990.
[12] M. Sahinoglu, "Negative Binomial Density of the Software Failure Count", Proc. Fifth Int. Symp. on Computer and Information Sciences (ISCIS), vol. 1, pp. 231-239, October 1990.
[13] M. Sahinoglu, "Compound Poisson Software Reliability Model", IEEE Transactions on Software Engineering, vol. 18, pp. 624-630, July 1992.
[14] M. Sahinoglu, U. Can, "Alternative Parameter Estimation Methods for the Compound Poisson Software Reliability Model with Clustered Failure Data", Software Testing, Verification and Reliability, vol. 7, no. 1, pp. 35-57, March 1997.
[15] M. Sahinoglu, A. K. Alkhalidi, "A Compound Poisson LSD Stopping Rule for Software Reliability", 5th World Meeting of ISBA, Satellite Meeting to ISI-97, Istanbul, August 1997.
[16] A. J. van Schouwen, D. L. Parnas, S. P. Kwan, "Evaluation of safety-critical software", Communications of the ACM, vol. 33, no. 6, pp. 636-648, June 1990.
[17] P. Thevenod-Fosse, H. Waeselynck, "An investigation of statistical software testing", Journal of Software Testing, Verification, and Reliability, vol. 1, no. 2, pp. 5-25, July-Sept. 1991.
[18] P. Thevenod-Fosse, H. Waeselynck, Y. Crouzet, "Software statistical testing", in Predictably Dependable Computing Systems, eds. B. Randell, J.-C. Laprie, H. Kopetz, B. Littlewood, Springer Verlag, 1995, pp. 253-272.
[19] M. Trachtenberg, "A general theory of software reliability modeling", IEEE Transactions on Reliability, vol. 39, no. 1, pp. 92-95, April 1990.
[20] J. A. Whittaker, M. G. Thomason, "A Markov chain model for statistical software testing", IEEE Transactions on Software Engineering, vol. 20, no. 10, pp. 812-824, October 1994.
[21] M. C. K. Yang, "Comparisons of Stopping Rules and Reliability Estimation in Software Testing", SERC-TR-58-F, Purdue University, 1992.