Efficiently Generating Test Vectors with State Pruning - Semantic Scholar

3 downloads 0 Views 284KB Size Report
Using an IBM Power4 multiprocessor system with the Berkeley. Active Message library, we show that this new method of state pruning is efficient and produces ...
PIII-15

Efficiently Generating Test Vectors with State Pruning Ying Chen†

Dennis Abts*

David J. Lilja†

† Electrical and Computer Engineering *Cray Inc. University of Minnesota P.O. Box 5000 Minneapolis, Minnesota 55455 Chippewa Falls, Wisconsin 54729 {wildfire, lilja}@ece.umn.edu [email protected]

Abstract - This paper extends the depth first search (DFS) used in the previously proposed witness string method for generating efficient test vectors. A state pruning method is added that exploits different search heuristics in simultaneous searches. Using an IBM Power4 multiprocessor system with the Berkeley Active Message library, we show that this new method of state pruning is efficient and produces quantitatively better witness strings compared to both pure and guided DFS.

I. Introduction State space traversal is a well-understood technique for verifying the abstract specification of a complex system with cooperating finite-state machines (FSMs). Unfortunately, traversal of a large state space is mired in computational complexity admitting solutions that are exponential to the number of variables that form the state space. This “state explosion problem” has motivated techniques to make model checking more practical in an industrial setting. One approach is to apply heuristics to the search process. System verification in the pre-silicon state of development is crucial for controlling the budget and the time to market. System verification of safety critical systems are not trivial to verify because they consist of multiple cooperating FSMs which yield a very large state space. In order to alleviate this problem, the models are usually “down-scaled.” Verification of these reduced models cannot guarantee the correctness of the designs, but it is useful as a debugging tool [1]. The witness string approach [2] is a method that captures bug traces from the depth-first-search (DFS) of a state space to help debug the RTL design of a cache coherence protocol. The efficiency of a witness string is measured by its length [3], with shorter strings providing more efficient inputs to drive simulations of the RTL model. To enhance the bug-finding capability of a model checker, several heuristic algorithms have been proposed [4] to improve buy discovery. But the efficiency is not consistent across different bugs. To generate efficient witness strings, we propose a method of pruning out redundant states using multiple simultaneous DFSs each of which exploits a different heuristic. With this state pruning method, each DFS targets a different type of bug in the state space. The state pruning method proposed in this study is applicable to general verification including both hardware and software. To implement the multiple DFSs in the state pruning method, we use a parallel algorithm executed on an IBM Power4 multiprocessor system. Similar to the parallel BFS [5], the hash table for the whole state space is distributed among the processors with each state sent to its owner processor according to a hash function. First, a master DFS uses a breadth fist search (BFS) to generate the frontier states, similar to [6]. Then each DFS uses the received frontier state as the start state to begin its

0-7803-8736-8/05/$20.00 ©2005 IEEE.

search. To reduce redundant state exploration, we use communication among the multiple DFSs to coordinate the searches. We tested the state pruning method on the Stanford DASH cache coherence protocol model [7] and a benchmark from the Counter benchmark suite [8] that models an abstract state space, which can be applied to any application that can be abstracted into a finite state space. In the remainder of this paper, Section 2 discusses the problems in test vector generation. In Section 3, we explain the new state pruning method. The experimental setup and benchmarks are described in Section 4 with the results shown in Section 5. Section 6 discusses related work. Finally, Section 7 summarizes and concludes.

II. Test Vector Generation To verify the implementation of a digital system, manually generated test vectors are typically simulated on a hardware language description of the design. However, neither manually nor automatically generated test vectors can guarantee equivalence between the logic design and the specification. To provide a solid link between the specification and the implementation, the witness string method [2, 4] has been proposed. This method records the events, which are called witness strings, during a DFS of the state space of a formal model. A trace recorded from the start state S0 to a leaf state Sn is a witness string (Figure 1), which is later executed as a test vector on the RTL implementation simulation. In this manner, high quality test vectors are automatically generated by avoiding a large fraction of the redundant states that would be visited during random simulations. a witness string

Sn

S0

Figure 1. A witness string generated by a DFS of the cache coherence protocol state space.

As mentioned above, in real systems, the number and sizes of the generated witness strings could be very large. We tried to improve the efficiency of the generated witness strings by exploring several DFS-based heuristics to guide the DFS to generate shorter witness strings [4]. However, a single heuristic is insufficient for generating good test vectors across a broad spectrum of error types.

III. State Pruning To solve the “one heuristic does not fit all” problem, we propose a state pruning method that uses multiple heuristics simultaneously in several separate DFSs. Each simultaneous

1196

ASP-DAC 2005

search is targeted towards different portions of the state space with different types of bugs. Each search maintains its own hash table and uses a different DFS-based heuristic. Communication among the multiple searches prunes out redundant state exploration to thereby choose shorter witness strings than those obtained with both straight DFS and guided DFS [3]. We study this new approach for generating witness strings in the context of verifying the implementation of a multiprocessor cache coherence protocol and a Counter benchmark for the model checking of general applications A. DFS-based heuristics Previous work [4] explored the benefit of applying a heuristic to guide the depth-first search (DFS) of a model checker verifying a cache coherence protocol. This work compared four different heuristics [4] with the goal of improving the efficiency of the final test vectors by reducing the number of searched states before discovering the error. B. Basic algorithm We use multiple DFSs concurrently. Each DFS uses a different one of the four heuristics described above to guide its search. The hash table for the whole state space is distributed among the DFSs using a hash function. Each DFS is the owner of the states distributed in its hash table. Step 1

s0

Initial BFS

Step 2 s1

s2

s3

DFS1

DFS0

s4

DFS3

DFS2

s14 s8

s5

s12

Step 3 s9

s15

s6

s13

s7

s10 s16 s11

Step 1: A BFS on a master search (DFS0) generates enough initial frontier states. Step 2: The frontier states are distributed among the DFSs (DFS0,1,2,3) using a hash function. Each DFS receives one start state. Step 3: Upon receiving its start state, each search begins a DFS. There is communication among the FS searches to avoid redundant state exploration. Step 3.1: In each current state a DFS chooses its next state according to its local heuristic algorithm. Step 3.2: Each DFS sends its chosen next state to its owner according to a hash function. Step 3.3: On receiving a state sent by a sender to an owner, each DFS checks whether the state is in its hash table. If it is not, the state is added into the owner’s hash table. Then the owner sends a message to the sender telling whether the state has already been explored. Step 3.4: Each DFS receives a message from the next state’s host. If the next state has not been explored, th e DFS continues and sets the next state to be the current state. Step 3.5: Whenever a DFS reaches a bug in the design being tested, or finishes searching, a final report for this search is printed.

Figure 2. The basic state pruning algorithm using multiple DFSs (e.g., DFS0,1,2,3) for generating witness strings.

The basic algorithm for state pruning is shown in Figure 2. For example, as DFS1 traverses to s8, it discovers that s5 has already been explored by DFS0. Thus, DFS1 goes from s8 to s9 instead of to s5. It turns out the path s8 à s9 à s10 is a shorter path than s8 à s5 à s6 à s7 à s10. In this manner, the number states is pruned. A similar situation happens when DFS1 goes s10 à s11 instead of s10 à s14 à s15 à s16 à s11 since DFS1 finds out s14 has already been explored by DFS3 when DFS1 is in s10. Since the types of bugs that may exist in an actual system are unknown, this new algorithm solves the problem of not knowing which search heuristic to use in a specific situation by using several different heuristics simultaneously.

IV. Experimental Setup A.

Model checker Our model checker is built on our previous work [4] using Murphi [11], the parallel Murphi model-checker [5] written for

the Berkeley Active Messages library, and the parallel RW algorithm [6]. We add new functions in Murphi to implement the heuristics and the state pruning algorithm and new message handlers for the communications among the DFS processes. B.

Benchmarks

We the used both Stanford DASH multiprocessor model [7] and a Counter benchmark [8] for this study. The Stanford DASH multiprocessor implemented a directorybased coherence protocol that uses NAKing. It is the predecessor to the SGI Origin2000 [10] cache coherence protocol. The formal model of DASH has three types of requesters: the local or remote cluster, and the home cluster. It also has two classes of memory references to local or remote memory. The total state space is 10,466 states. We implanted bugs in the system by inspecting state diagrams of the protocol given in [9]. The specific bugs implanted are described below. BUG1 Handle read exclusive request to home No invalidation is sent to the master copy requesting cluster, which has already invalidated the cache. Invariant violated: Consistency of data BUG2 Send reply message instead of NAK Instead of a negative acknowledgement (NAK) message, an acknowledgement (ACK) message is sent in reply. Invariant violated: Explicit writeback for non-dirty remote BUG3 Handle read request to remote cluster Instead of changing the cache block in the remote cluster to be locally shared after sending the data block to the requesting remote cluster, the cache block remains locally exclusive. Invariant violated: Consistency of data BUG4 Handle read exclusive request to remote cluster Instead of changing the cache block in the remote cluster to be not locally cached, the cache block remains locally exclusive. Invariant violated: Only a single master copy exists

The Counter benchmark suite is a set of benchmarks to test and compare different model checking tools. It simulates different shapes of state spaces and, for each shape, different locations and distributions of errors [8]. We choose the bidirectional deep -sparse benchmark [8] because it models the state space shape of a general application. The space size is 132,651 states. C.

Experimental environment

There are many ways to implement this state pruning method. It can be multithread on either a single processor or on a multiprocessor, or it can be implemented as a coarse-grained parallel algorithm on a multiprocessor system using messagepassing. We use one pSeries 690+ node with 8 processors on an IBM Power4 system. The Berkeley Active Message library [10] is used to communicate among the DFS processors.

V. Performance Evaluation Our experimental results for the DASH system and the bidirectional deep -sparse benchmark are shown in Tables 1 and 2 and in Figure 3. For the DASH system we experiment with four unique DFSs in four separate processes where each DFS uses one of the following heuristics: max Hamming distance, min Hamming distance, max cache_score, min cache_score [4]. We also experiment with 8 DFSs in 8 processes where each of two DFSs use the same heuristic. We use one -step BFS, which generates 5 frontier states, for four DFSs and two-step BFS, which generates 19 frontier states, for 8 DFSs. For the bi-directional deep -sparse Counter benchmark we experiment with four unique DFSs in four separate processes where each of two DFSs use one of either the max Hamming distance or min Hamming distance heuristics. We also experiment with 8 DFSs in 8 processes where each of four DFSs use the same heuristic. We use a two-step BFS, which generates

1197

A.

ii.

State pruning vs. multiple independent DFSs

The results for the state pruning method are generated from 20 separate runs and are shown in the form of (mean ±standard deviation) in Table 1.b. Table 1. States explored using the witness string method for the Stanford DASH coherence protocol and a Counter benchmark. (a) The lengths of the witness strings generated using one DFS -based heuristic in a single DFS DFS DASH BUG1 DASH BUG2 DASH BUG3 DASH BUG4

4,719 421 610 533

Counter Benchmark

79,221

Hamming Distance MIN MAX 9,093 4,958 25 437 3,825 755 10,265 504 1,611

2,412

Cache Score MIN MAX 932 8,553 54 409 1,027 413 924 78 N/A

N/A

Min-Max Predict 2,411 124 593 410 N/A

iii.

1,131±378 3 3 ±14 449±136

960±244 1 8 ±7 429±125

386±135

1,221 38 791 362

1,336±190

57,774

1,288±118

226±104

Effect of the number of searches in state pruning

In order to further explain the effect of the number of searches in the state pruning method, we generate the probability diagram shown in Figure 3 because each result for the state pruning experiments is generated from 20 separate runs. The diagram shows the probabilities of finding a bug (e.g., bug1) using different numbers of searches (i.e., 4, 8) in the state pruning method. (Due to space limitations, please refer to [13] for the results for the other bugs.) The probabilities are calculated using the following formula: N x : total number of runs that find the bug before reaching the x th states N: the number of total runs, which is 20 in this study

Nx N

p(x ) =

The number of states where the probability is 0.5 is the median case for finding a bug. The number of states where the probability is 1 is the worst case for finding a bug. Probability of Finding DASH Bug1 4 searches

8 searches

1 0.8 0.6 0.4 0.2

30 00 .00

28 00 .00

26 00 .00

24 00 .00

22 00 .00

20 00 .00

18 00 .00

16 00 .00

14 00 .00

12 00 .00

80 0.0 0

10 00 .00

60 0.0 0

0

0 40 0.0 0

As we can see in Table 1.a, different heuristics are better at finding different types of bugs for the DASH system. The minmax-predict method [4] was proposed as a compromise among the different heuristics. As shown in [4], min-max-predict is the most efficient DFS-based heuristic for a single exhaustive search. For the bi-directional deep-sparse benchmark, both min and max Hamming distance are more efficient than the DFS. By exploiting multiple heuristics in multiple simultaneous DFSs, we can choose the shortest trace (witness string) from the traces produced by the different heuristics. Thus, the final trace is close to the one that would be generated by the most efficient heuristic for each bug. As a result, this method produces good traces for all of the various types of bugs. When there is no communication among the multiple DFSs, Table 1.b shows that the four DFSs provide an improvement for most of the DASH bugs even when compared with the min-maxpredict approach, which combines the four heuristics. When we use eight DFSs with no communication, some results improve compared with the results of four DFSs and some do not. This inconsistent behavior occurs because some of the start states for the DFSs are different from the ones used by the four DFSs. Some start states may be randomly closer to the bug, while others may be further. When there is no communication among the multiple DFSs for the bi-directional deep -sparse Counter benchmark, more efficient traces are produced than with the original DFS, but these traces are less efficient than the min/man Hamming distance. This difference is because the start states of each DFS in the multiple DFSs are generated by the initial BFS search and thus differ from the start state of the original DFS. When there is communication among the multiple DFSs, the resulting traces are more efficient than the min/man Hamming results and the multiple independent DFSs results. The state pruning approach with eight DFSs generates shorter witness strings than the four DFSs approach.

8 searches Indepen-dent State Pruning

In the state pruning method using four DFSs, most of the results are better than the case of multiple DFS with no communication. For DASH bug 2 and bug 3, even the upper bounds of the results for state pruning are better than the results of multiple DFSs with no communication, which shows the efficiency of the state pruning method for witness string generation. In the case of eight DFSs, we show further improvements even when compared to the experiment with no communication, which showed no improvement. For all of the bugs, the upper bounds of the results for state pruning are better than the results with no communication. This occurs because with more DFSs, there is more communication among the searches, which helps to prune out more redundant states.

0.00

Multiple DFSs vs. single DFS

4 searches State Pruning

57,702

Counter Benchmark

20

i.

Independent 1,426 70 720 363

DASH BUG1 DASH BUG2 DASH BUG3 DASH BUG4

probability p(x)

Efficiency of the generated witness strings The efficiency of a generated witness string is measured by its length, which is the number of states it contains. The shorter a witness string, the more efficient it is.

(b) The lengths of the witness strings generated using four different DFS -based heuristics (min Hamming, max Hamming, min cache_score , max cache_score) in four and eight DFSs. Each DFS uses a different heuristic. “Independent” means there is no communication among the multiple DFSs, while “State Pruning” allows communication among the multiple DFSs.

0.0

6 frontier states, for four DFSs and a three-step BFS, which generates 10 frontier states, for 8 DFSs. The communication among the multiple DFSs generate nondeterministic results each time the algorithm is executed since the communication timing and ordering is different each time. Consequently, the results shown are the mean and standard deviation of 20 separate runs.

number of states explored

Figure 3. The probabilities of finding a bug (e.g., bug1, due to page limitations please refer to [13] for the diagrams of the other bugs) in the Stanford DASH cache coherence protocol using different numbers of searches (i.e., 4, 8) in the stat e pruning method.

As shown in Figure 3, the curve for the 8 searches case is always above the curve for the 4 searches case. This relationship means that state pruning with 8 searches is more efficient than state pruning with only 4 searches. The improv ement can be up to 100% [13]. Furthermore, in the worst case, state pruning with 8 searches finds a DASH bug faster than with only 4 searches. Therefore, in the median cases, state pruning with 8 searches always finds a DASH bug faster than with 4 searches. The improvement is up to 30% [13].

1198

B.

Efficiency of the state pruning method To provide a more direct comparison of the state pruning method to the previous methods, we keep the system resources constant as a uniprocessor. We compare the state pruning method of multiple interleaved searches with the method of multiple independent searches and the method of executing multiple searches sequentially with the four different heuristics. The main difference between these methods is that the first two methods stop whenever a search reaches a bug first and thus the generated witness string is the shortest while the third method needs to finish all of the searches in sequence and then choose the shortest witness string generated. The efficiency of a method is proportional to the length of the generated witness string and the time required to generate it. The time to generate the witness string is proportional to the number of states explored. So the efficiency of a method is quantified as the product of the length of the witness string and the number of states explored during the generation. The smaller this product, the more efficient the method. As shown in Table 2, the method of multiple independent searches improves the efficiency compared to the sequential multiple searches for most of the cases because it stops whenever the shortest witness string is generated rather than exhaustively finishing each search. But this improvement is not always very significant because in the independent method, each search uses an independent hash table, which is the same as in the sequential multiple search method. The state pruning method, however, uses a distributed hash table structure and thus avoids state re-exploration through interaction. Therefore, the state pruning method significantly improves the efficiency of the witness string generation. Table 2. The efficiency of the state pruning method compared to previous approaches for the Stanford DASH coherence protocol and a Counter benchmark. 4 searches

Sequential multiple searches

Independent

and the memory requirements, this approach is not suitable for witness string generation because RW does not use a hash table and thus generates huge witness strings. Unlike previous studies, we exploit multiple heuristics to choose the most efficient witness strings by pruning out the redundant states through communications among the multiple DFSs.

VII. Conclusions The state pruning method presented in this study gives the flexibility to choose the most efficient witness strings from all of the witness strings generated by the multiple DFSs. Our results show a significant improvement when using this multiple DFS approach compared to the previously proposed min-max-predict approach for cache coherency protocols, which combines multiple heuristics in a single exhaustive DFS [4]. Furthermore, this state pruning method is efficient for general verification. As the number of simultaneous searches increases, the state pruning benefits tend to increase compared to using multiple DFSs with no communication. In summary, instead of a single heuristic, the state pruning method exploits multiple heuristics in different regions of the global state space to choose the most efficient overall witness strings. By pruning out redundant states in a DFS, this method enhances the efficiency of the witness strings generated and also improves the generation method itself making it a very efficient approach for automatically generating good test vectors for model checking.

References [1]

[2]

State Pruning

DASH BUG1 DASH BUG2

7

2.194 *10 2.313*10 4

7

2.012 *10 2.249*10 4

(1.684±0.343)*10 7 (1.930±0.423)*10 4

DASH BUG3 DASH BUG4

2.486*10 6 9.180*10 5

2.357*10 6 8.979*10 5

(1.812±0.245)*10 6 (8.122±0.982)*10 5

Counter Benchmark

1.296*10 7

1.364*10 7

(9.671 ± 2.533)*10 6

[3] [4]

VI. Related Work

[5]

The previous work most closely related to this study can be categorized into two classes: search heuristics and parallel algorithms. Yang and Dill [7] examined the efficiency of several BFS based heuristics. They concluded that Minimum Hamming distance is easy to implement but has inconsistent efficiency. Tracks and Guidepost are more robust, while requiring more manual labor. Our previous study [4] examined several DFSbased heuristics. By exploiting the benefits of different heuristics, the proposed min-max-predict heuristic quickly discovered the embedded bugs in several different coherence protocols. All of these search heuristics were studied using a single exhaustive search that targeted a single embedded bug at a time. Some parallel algorithms are related to our state pruning method. Stern and Dill [5] studied a parallel BFS that generates short paths to states. However, this approach requires a large amount of memory per node to maintain the queue while not being able to get quickly into the deeper state space as can DFS or a random walk (RW). Sivaraj et al. [6] combined BFS with RW. Although they vastly reduced the message exchange rate

[6]

[7] [8] [9] [10] [11] [12]

[13]

1199

U. Stern and D. L. Dill, “ Automatic Verification of the SCI Cache Coherence Protocol”, In Correct Hardware Design and Verification Methods: IFIP WG10.5 Advanced Research Working Conference Proceedings, 1995 D. Abts and M. Roberts, “Verifying large -scale multiprocessors using an abstract verification environment”, In Proceedings of the 36 th Design Automation Conference (DAC 99), pages 163-68, June 1999. D. Abts, S. Scott, and D. J. Lilja, "So Many States, So Little Time: Verifying Memory Coherence in the Cray X1", International Parallel and Distributed Processing Symposium (IPDPS), April, 2003 D. Abts, Y. Chen and David J. Lilja, “Heuristics for Complexity -Effective Verification of a Cache Coherence Protocol Implementation”, Laboratory for Advanced Research in Computing Technology and Compilers Technical Report No. ARCTiC 03-04, November 2003. U. Stern and D.L. Dill, “Parallelizing the Murphi Verifier”, Formal Methods in System Design, vol. 18, no. 2, pp. 117 -129, 2001, (Journal version of their CAV 1997 paper). H. Sivaraj and G. Gopalakrishnan, “Random Walk Based Heuristic Algorithms for Distributed Memory Model Checking”, 2n d International Workshop on Parallel and Distributed Model Checking (PDMC’03), Boulder, Colorado, USA, July 2003 Lenoski, D., J. Laudon, K. Gharachorloo, W. -D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam., “The Standford DASH Multiprocessor”, IEEE Computer (Mar 1992), 63-79. http://vv.cs.byu.edu/~tonga/ C. H. Yang and D. L. Dill, “Validation with Guided Search of the State Space”, In Proceedings of the 35th Annual Design Automation Conference (DAC98), June 1998. T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser, “Active messages: a mechanism for integrated communication and computation”, In 19 th Annual International Symposium on Computer Architectures, 1992. D. L. Dill, “The Murphi verification system”, in Computer Aided Verification, Lecture Notes in Computer Science, pp. 390 -393, 1996. Springer-Verlag. James Laudon and Daniel Lenoski, “The SGI origin: A ccNUMA highly scalable server”, In Proceedings o f the 24 th Annual International Symposium on Computer Architecture (ISCA -97), volume 25,2 of Computer Architecture News, page 241-251, New York, 2-4 1997. ACM Press. Y. Chen, D. Abts, D. J. Lilja, “State Pruning for Generating Efficient Test Vectors”, Laboratory for Advanced Research in Computing Technology and Compilers Technical Report No. ARCTiC 04-05, July 2004

Suggest Documents