A Scalable BIST Architecture for Delay Faults

Martin Keim

Ilia Polian

Harry Hengster

Bernd Becker

Albert-Ludwigs-University, Institute of Computer Science, Am Flughafen 17, 79110 Freiburg im Breisgau, Germany
email: {keim, polian, hengster, becker}@informatik.uni-freiburg.de

Abstract

We present a scalable BIST (Built-In Self Test) architecture that provides a tunable trade-off between on-chip area demand and test execution time for delay fault testing. The architecture can therefore meet test execution time requirements, area requirements, or any target in between. Experiments show the scalability of our approach: considerably shorter test execution times can be achieved by storing only a few additional input vectors of the BIST architecture. The reduction in test execution time achieved with the proposed method ranges from a factor of 2 up to a factor of more than 800000.

1 Introduction

Delay fault testing is likely to become industrially accepted in the near future. However, there is no single delay fault model; several models compete for acceptance. The advantages and disadvantages of these fault models are discussed, e.g., in [12]. Despite their differences, all these models have in common that a test for a fault consists of two successive test patterns (a test pair) that have to be applied to the circuit-under-test (CUT) at speed. The first pattern is called the initialization vector; it is followed by the propagation vector. Computing such two-pattern tests is known to be NP-hard [4]. Nevertheless, test pattern generators (TPGs) exist, e.g. for the path delay fault model [7].

Unfortunately, applying all precomputed test patterns to the CUT by Automatic Test Equipment (ATE) is no longer manageable. The first problem is the cost of the high-speed ATE that today's chips require. Furthermore, chips no longer consist of a single circuit but host many different modules. In general, not all I/O pins of a module are accessible, so the precomputed test patterns cannot be applied directly to the target module.

BIST is an accepted method to solve these problems. Two approaches for delay fault BIST are possible. One is to apply all possible transitions at the inputs of the CUT. This results in a very long test execution time, since there are $2^n (2^n - 1)$ different pairs of n-bit vectors for an n-input CUT (exhaustive testing). However, in [13] it was shown that it is sufficient to test robustly for path delay faults with test pairs that differ only at a single position (adjacency testing). The number of these patterns is $n 2^n$. BIST architectures that generate such patterns are presented in [15, 8, 5]. The test execution time of this approach is $n 2^{n+1}$ for an n-input CUT, with an area overhead of $3n$ measured in the number of memory elements used. By adding n AND gates and considering so-called test cones, the actual test execution time can be reduced [8]. Accordingly, the number of memory elements is reduced to $2n$.

The other approach starts with a predetermined set of test pairs and tries to build hardware that generates sequences in which the test pairs are embedded, i.e. the initialization vector and the propagation vector are generated successively. For example, in [6] the test pairs are embedded in the sequences generated by an LFSR, and in [16] in the sequences generated by a Multi Input Signature Register (MISR), shown in Figure 1. For an n-input CUT and k test pairs, [16] needs n memory elements for the MISR and $k' (2^n - 1)$ clock cycles to generate all test pairs, with $k' \le k$, as explained later. Additionally, it needs to store $k'$ n-bit input vectors (IVs) for the inputs of the MISR. The disadvantage of all these methods is that the test execution time is fixed and rises exponentially with the number of inputs.

Based on [16], we present a method that offers a trade-off between the number of clock cycles available for testing and the need to store additional IVs for the MISR. The underlying idea is to store a very small set of IVs on chip or off chip and let the BIST hardware expand it so that all given test pairs are embedded in the generated sequence. The method relies on the fact that the test pairs contain many don't-care values, so many fully specified instances of each test pair exist. Some of them might be easier to embed in a sequence than others. Furthermore, we do not let the generator produce, for each IV, a sequence of maximal length $2^n - 1$ but one of a specified length g. Then at least one instance of each of the k test pairs must occur in the first g vectors of one of the sequences. For a given upper limit $k g \le k (2^n - 1)$ of testing time, the proposed method computes a (minimum) number $k'$ of IVs needed to cover all given test pairs, resulting in a test execution time of $k' g$. Conversely, for a given limit $k' \le k$ of IVs, e.g. imposed by an area restriction for storing the IVs, the proposed method finds a (minimum) number of clock cycles such that all k test pairs are covered by the generated test sequences. Since the task of our method is to cover all given test pairs, it depends neither on a specific delay fault model nor on the CUT itself. Furthermore, we impose no requirements on the given test patterns.
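As a plausibility check of the counts used in the introduction, the following Python sketch evaluates the number of exhaustive test pairs, $2^n (2^n - 1)$, the number of adjacency test pairs, $n 2^n$, and the length $k' g$ of the sequences produced by $k'$ stored IVs with g steps each. The function names are illustrative only and do not come from the paper; the example values (n = 7, 6 IVs, g = 64) are those reported for circuit s27 in the result tables.

```python
def exhaustive_pairs(n: int) -> int:
    """Number of ordered pairs of distinct n-bit vectors: 2^n * (2^n - 1)."""
    return 2**n * (2**n - 1)


def adjacency_pairs(n: int) -> int:
    """Number of ordered pairs of n-bit vectors differing in exactly one bit: n * 2^n."""
    return n * 2**n


def embedded_length(num_ivs: int, g: int) -> int:
    """Overall sequence length when each of num_ivs stored IVs runs for g steps."""
    return num_ivs * g


if __name__ == "__main__":
    n = 7  # number of inputs of circuit s27
    print(exhaustive_pairs(n))     # 16256
    print(adjacency_pairs(n))      # 896
    print(embedded_length(6, 64))  # 384
```

Even for this very small circuit, the bounded scheme needs fewer vectors than adjacency testing, and the gap widens quickly because the first two quantities grow exponentially in n.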
The proposed method runs as follows: Given a set of test pairs and a description of a MISR, it first finds all input vectors of the MISR for which at least one test pair is embedded in the full-length sequence generated by the MISR. Then the method tries to reduce the number of input vectors and/or the overall test execution time. Compared with other methods, the proposed method reduces the actual test execution time by up to several orders of magnitude, e.g. for circuit s510 by a factor of $8.4 \cdot 10^{11}$ (compared to exhaustive testing), by a factor of $8.0 \cdot 10^{5}$ (compared to our reimplementation of [16]), and by a factor of $6.2 \cdot 10^{5}$ with respect to adjacency testing.

The rest of the paper is structured as follows: In the next section, we briefly introduce the MISR and some of its properties. Section 3 presents the proposed method. Experimental results are provided in Section 4, followed by the conclusions.

2 Multi Input Signature Register

We use a MISR as the generator structure for the test sequences that are applied to the CUT. Figure 1 shows a MISR of dimension n: it has n inputs, n outputs, and n memory elements (stages) numbered from right to left.

Figure 1: A MISR of dimension n, with inputs $d_0, \ldots, d_{n-1}$, outputs $y_0, \ldots, y_{n-1}$, and feedback coefficients $h_0, \ldots, h_{n-1}$.

The Boolean values $h_0, \ldots, h_{n-1}$ define its feedback connections, i.e. $h_i = 1$ iff the $(n - i - 1)$th stage is selected. The next state is a function of the current state and the IV, i.e. the vector d applied to the inputs. Thus, the state transition relation M can be defined as $y' = M(y, d)$, where $y'$ is the next state (vector), y is the current state (vector), and d is the input (vector):

$$y'_i = (M(y, d))_i = \begin{cases} \bigoplus_{j=0}^{n-1} h_j \, y_{n-j-1} \oplus d_0 & : i = 0 \\ y_{i-1} \oplus d_i & : 0 < i \le n - 1 \end{cases} \qquad (1)$$

Given an input vector d and a starting vector (seed) $y^0$, the MISR produces the sequence $M^0(y^0, d) := y^0$, $M^1(y^0, d) := M(y^0, d)$, $M^2(y^0, d) := M(M(y^0, d), d)$, ... on its outputs. We denote the whole sequence by $M^*(y^0, d)$.

As for LFSRs, the properties of a MISR can be described by a characteristic polynomial. If the characteristic polynomial is primitive, it has some interesting attributes. Since we take advantage of some of these in our work, we choose the primitive polynomials from a list in [1]. An important property of a MISR of dimension n is the following: given an IV d and a starting state $y^0$, the sequence $M^*(y^0, d)$ has period $2^n - 1$. Furthermore, given two states y and $y'$, there is one and only one IV d with $y' = M(y, d)$. The first property ensures that essentially all output combinations are generated exactly once if the MISR runs through all of its $2^n - 1$ states for each IV. In particular, if a fully specified test pair is embedded in a sequence $M^*(y^0, d)$, this pair occurs exactly once. This does not mean that every test pair is embedded in all sequences (since there are $2^n (2^n - 1)$ pairs, but the sequence length is only $2^n - 1$). However, the second property ensures that for every test pair there is (exactly one) sequence.

3 Solution Scheme

Since we let the MISR start with the seed $y^0 = (0, \ldots, 0)$ whenever possible (this seed may be forbidden due to a fully specified test pair that needs the IV $d = (0, \ldots, 0)$), we obtain an ordering of the vectors generated by the MISR, i.e. we can speak of the ith vector produced by the MISR.

The proposed method takes advantage of the fact that, in general, the test pairs computed by a TPG contain a large number of don't-care values. Thus, different fully specified instances of a test pair may need different IVs. Moreover, some instances of the test pair may occur earlier than others in the sequences, and it is very unlikely that all of them are at the end of the corresponding sequences. This means that it is not necessary to let the MISR run sequences of full length. In fact, the experimental results show that nearly all of the test pairs are produced in very few steps.

Definition 1

1. A fully specified test pair $(v, w)$ is produced by the IV d and the starting state $y^0$ iff there is an $i > 0$ with $v = M^{i-1}(y^0, d)$ and $w = M^i(y^0, d)$. We say $(v, w)$ is produced in g steps iff $i \le g$.

2. A fully specified test pair is produced in exactly g steps iff it is produced in g steps, but not in $g - 1$ steps.

3. A not fully specified test pair is produced (in g steps) iff at least one of its fully specified instances is.
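The MISR transition of equation (1) and the notion of a test pair being produced in g steps can be illustrated with a small two-valued simulation. The following Python sketch is not the authors' implementation: it represents states as lists of bits (index i is stage i) rather than machine words, encodes don't-care positions as `None`, and the feedback vector `h` used in the example is a hypothetical choice made only for illustration.

```python
def misr_step(y, d, h):
    """One transition y' = M(y, d) following equation (1)."""
    n = len(y)
    feedback = 0
    for j in range(n):
        feedback ^= h[j] & y[n - j - 1]   # XOR over h_j * y_{n-j-1}
    nxt = [feedback ^ d[0]]               # stage 0: feedback XOR d_0
    for i in range(1, n):
        nxt.append(y[i - 1] ^ d[i])       # stage i: y_{i-1} XOR d_i
    return nxt


def iv_for_transition(y, y_next, h):
    """The unique IV d with M(y, d) == y_next, using M(y, d) = M(y, 0) XOR d."""
    base = misr_step(y, [0] * len(y), h)
    return [b ^ t for b, t in zip(base, y_next)]


def matches(vec, pattern):
    """True if vec is an instance of pattern (entries 0, 1 or None for don't care)."""
    return all(p is None or p == b for b, p in zip(vec, pattern))


def produced_in_steps(V, W, d, h, g, seed=None):
    """Smallest i <= g such that the (i-1)th and ith vectors form an instance
    of the test pair (V, W); None if the pair is not produced in g steps."""
    y = list(seed) if seed is not None else [0] * len(d)
    for i in range(1, g + 1):
        nxt = misr_step(y, d, h)
        if matches(y, V) and matches(nxt, W):
            return i
        y = nxt
    return None


if __name__ == "__main__":
    h = [1, 0, 1]                         # hypothetical feedback coefficients
    V, W = [1, None, 0], [0, 1, None]     # test pair with don't-care positions
    print(produced_in_steps(V, W, [1, 0, 0], h, g=4))   # 2
    print(produced_in_steps(V, W, [1, 0, 0], h, g=1))   # None
```

`iv_for_transition` reflects the second MISR property stated above: since the IV enters every stage through an XOR, exactly one d maps a given state y to a given next state.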

From now on, we do not mention $y^0$ explicitly and write 'produced by d' instead of 'produced by d and $y^0$'. It is important to note that for a not fully specified test pair $(y, y')$ the IVs producing its instances comprise at least two different vectors, as can be proven easily. To solve the test pair embedding task, we implemented two methods: Unbounded IV Minimization (UIM) and Bounded IV Minimization (BIM(g)) for a limit g of steps. Both methods run in two phases:

1. Determine for each test pair $TP_j$ the set $\mathcal{D}_j$ of all IVs
   (a) that produce it (method UIM);
   (b) that produce it in g steps (method BIM(g)).

2. Determine a subset $\mathcal{D}$ of IVs such that every test pair is produced by at least one IV from $\mathcal{D}$. Since we want to minimize $|\mathcal{D}|$, we are solving a set covering problem. This algorithm is the same for UIM and BIM(g) and is discussed in Section 3.2.

Figure 2 illustrates the methods. Three test pairs ($TP_1$, $TP_2$, $TP_3$) are denoted by a triangle, a square, and a diamond. For the five IVs $d_1, \ldots, d_5$, the step in which an instance of a test pair is produced is shown horizontally. Three different limits $g_i$ are drawn. The figure shows that for UIM all test pairs can be produced with IV $d_4$. BIM($g_3$) needs only one more IV, namely $d_3$ and $d_4$. A different solution for BIM($g_3$) is $d_2$, $d_4$. For BIM with limit $g_2$, IV $d_5$ is needed as well, since the test pair 'diamond' is produced in $g_2$ steps neither by $d_3$ nor by $d_4$. Finally, the figure shows that a solution does not exist for every limit $g_i$. Here, $g_1$ is too small to produce all test pairs. (The 'diamond' is not produced.)

Figure 2: Example of different solutions for UIM and BIM(g). (Rows: IVs $d_1, \ldots, d_5$; horizontal axis: steps up to $2^n - 1$, with the limits $g_1 < g_2 < g_3$ marked.)
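The situation of Figure 2 can be replayed on a toy instance. In the Python sketch below, the step numbers in `STEPS` are invented so that they reproduce the qualitative behaviour described for the figure; they are not taken from the paper. The brute-force search `minimum_cover` is used only because the example is tiny and is an assumption of this sketch, not the covering algorithm of Section 3.2.

```python
from itertools import combinations

# Hypothetical data: STEPS[iv][pair] is the step in which `iv` produces `pair`.
FULL_LENGTH = 31  # 2^5 - 1 for an assumed 5-bit MISR
STEPS = {
    "d1": {"triangle": 22},
    "d2": {"square": 28, "diamond": 18},
    "d3": {"square": 7, "diamond": 15},
    "d4": {"triangle": 5, "square": 12, "diamond": 25},
    "d5": {"diamond": 8},
}
PAIRS = {"triangle", "square", "diamond"}


def minimum_cover(steps, pairs, g):
    """Smallest set of IVs producing every pair within g steps, or None."""
    usable = {iv: {p for p, s in produced.items() if s <= g}
              for iv, produced in steps.items()}
    for size in range(1, len(usable) + 1):
        for subset in combinations(sorted(usable), size):
            covered = set()
            for iv in subset:
                covered |= usable[iv]
            if pairs <= covered:
                return subset
    return None


if __name__ == "__main__":
    print(minimum_cover(STEPS, PAIRS, FULL_LENGTH))  # UIM: ('d4',)
    print(minimum_cover(STEPS, PAIRS, 20))           # g3:  ('d2', 'd4')
    print(minimum_cover(STEPS, PAIRS, 10))           # g2:  ('d3', 'd4', 'd5')
    print(minimum_cover(STEPS, PAIRS, 4))            # g1:  None
```

Lowering the limit from 31 to 10 triples the number of stored IVs here, while the overall length drops from 31 to 30 steps; with realistic sequence lengths the reduction is far larger.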

3.1 Determine the IVs

3.1.1 IVs for UIM

All IVs that produce the instances of a given test pair can easily be computed by symbolic methods based on BDDs [3], on which the necessary Boolean operations are executed efficiently. First, we describe a given MISR functionally [16]: Let M be the state transition relation of the MISR. Then the characteristic function $\chi_{MISR} : \mathbb{B}^{3n} \to \mathbb{B}$ of the state transition relation M is defined as:

$$\chi_{MISR}(y', y, d) = 1 \iff M(y, d) = y' \qquad (2)$$

$\chi_{MISR}$ can be constructed component-wise according to equation (1) [16]. $\chi_{MISR}(y', y, d)$ describes all possible transitions from state y to the next state $y'$. For the jth test pair $(V, W)$, however, we only want to know the set $\mathcal{D}_j$ of all IVs d with $M(v, d) = w$, where v ranges over the instances of V and w over the instances of W. The characteristic function $\chi_j : \mathbb{B}^n \to \mathbb{B}$ (with $\chi_j(d) = 1 \iff d \in \mathcal{D}_j$) can be obtained by restricting the $y'$ and y variables in $\chi_{MISR}$ to W and V, respectively:

$$\chi_j(d) = \exists y', y \; \chi_{MISR}(y'|_W, y|_V, d) \qquad (3)$$

The set $\mathcal{D}_j$ contains all IVs producing any instance of the jth test pair $(V, W)$. For UIM, the method continues by combining the $\chi_j$, i.e. the subset $\mathcal{D}$ of IVs that covers all test pairs is determined. For BIM(g), the method first performs some simulations, as described next, before constructing $\mathcal{D}$.

3.1.2 IVs for BIM(g)

For the IVs computed above, no information is available about which IV produces the test pair in how many steps. The only certainty is that at least one instance of each of the k test pairs occurs somewhere in the sequences $M^*(y^0, d)$, $d \in \mathcal{D}_j$, $j \in \{1, \ldots, k\}$, resulting in a test application time of $|\mathcal{D}| (2^n - 1)$. In BIM(g), only those IVs are of interest that produce the test pair in g steps. We find these IVs by simulation. Two simulation algorithms have been implemented: the first is based on a two-valued forward simulation, the second uses implicit backward simulation.

The two-valued forward simulation algorithm runs as follows: For the jth test pair, get the set $\mathcal{D}_j$ of IVs producing the test pair. Then, for each IV $d \in \mathcal{D}_j$, initialize the MISR with $y^0$ and keep computing the next state until either the current and the next state form an instance of the test pair or the limit g is reached. In the latter case, the test pair has not been produced in g steps. The next state $M(y, d)$ of the MISR is implemented by bit operations on the machine word: First, a left shift operation on y is performed. Then the bit representing $y'_0$ is set to $\bigoplus_{j=0}^{n-1} h_j \, y_{n-j-1}$. Finally, the XOR operation with the IV d is executed. Due to this implementation, the two-valued simulation runs very efficiently.

The algorithm of implicit backward simulation is shown in Figure 3. It works as follows: In the ith iteration, all IVs that produce the test pair in exactly i steps are determined. For this purpose, the IV-current-state relation $ICS_i : \mathbb{B}^{2n} \to \mathbb{B}$ is defined. $ICS_i$ represents the (IV, state) pairs producing the test pair in exactly i steps. In $\chi_{j,i}(d)$, all IVs are collected that produce the jth test pair in i steps.

3.2 Set Covering

So far, for both UIM and BIM(g), we have obtained for each test pair $TP_j$ the set $\mathcal{D}_j$ of all IVs producing it (in g steps). Now we want to combine and reduce the $\mathcal{D}_j$ to a set of IVs $\mathcal{D}$ of minimal size. Obviously, a set covering problem has to be solved. The matrix can be pictured with k rows $TP_1, \ldots, TP_k$, each $2^n$ columns wide (for the $2^n$ possible combinations of the n-bit IVs d). An element of the matrix is marked iff that test pair is produced (in g steps) by the IV d. We use a greedy heuristic to solve the covering problem. (Besides this heuristic, an exact algorithm has also been implemented. Since the matrix to cover is very large, an exact solution cannot be obtained for the considered circuits, except for the smallest one, for which the result did not differ.) For the two-valued simulation algorithm, a standard greedy covering heuristic has been used: in each iteration, take the IV (column) that has the most marked entries, i.e. that produces the most test pairs. Delete that column and the marked rows, and start over until the matrix is empty. However, for large n the matrix size of $k \times 2^n$ becomes unmanageable. Thus, a symbolic set covering algorithm has been implemented. Furthermore, the number of test pairs to consider can then be reduced efficiently by row dominance.
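The standard greedy covering heuristic of Section 3.2 (repeatedly pick the IV that produces the most still-uncovered test pairs) can be sketched with explicit sets as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary representation of the covering matrix, and the alphabetical tie-breaking rule are assumptions of this sketch.

```python
def greedy_cover(iv_sets):
    """iv_sets maps each test pair to the set of IVs that produce it.
    Returns the list of chosen IVs in the order of selection."""
    if any(not ivs for ivs in iv_sets.values()):
        raise ValueError("some test pair is produced by no IV")

    # Column view of the covering matrix: IV -> test pairs it produces.
    produces = {}
    for pair, ivs in iv_sets.items():
        for iv in ivs:
            produces.setdefault(iv, set()).add(pair)

    uncovered = set(iv_sets)
    chosen = []
    while uncovered:
        # Column with the most marked entries among the remaining rows;
        # ties are broken by the alphabetically first IV name.
        best = max(sorted(produces), key=lambda iv: len(produces[iv] & uncovered))
        chosen.append(best)
        uncovered -= produces[best]       # delete the covered rows
    return chosen


if __name__ == "__main__":
    example = {
        "TP1": {"d1", "d4"},
        "TP2": {"d3", "d4"},
        "TP3": {"d3", "d5"},
    }
    print(greedy_cover(example))  # ['d3', 'd1']
```

The explicit representation needs one entry per marked matrix element, which is exactly what becomes infeasible for large n and motivates the symbolic variant.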

3.2.1 Applying Row Dominance

The idea of row dominance (RD) is to exclude rows that can be considered irrelevant because they are covered by another row. They can thus be deleted without sacrificing any good solution. Consider $\mathcal{D}_i$ and $\mathcal{D}_j$ with $\mathcal{D}_j \subseteq \mathcal{D}_i$, i.e. every IV producing the jth test pair also produces the ith one. It then makes no sense to choose an IV from $\mathcal{D}_i \setminus \mathcal{D}_j$ for producing the ith test pair, since we have to choose an IV from $\mathcal{D}_j$ anyway, which is also an element of $\mathcal{D}_i$ and thus produces the ith test pair, too. We say '$\mathcal{D}_i$ is dominated by $\mathcal{D}_j$'. In this case, the row of the ith test pair can be deleted from the covering matrix. The dominance relationship can also be computed symbolically: $\mathcal{D}_i$ is dominated by $\mathcal{D}_j$ iff $\neg \chi_j \vee \chi_i = 1$.
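With explicit sets, the row dominance check reduces to a subset test. The following Python sketch is only an explicit-set analogue of the symbolic computation; the function name and the rule of keeping the alphabetically first of several identical rows are assumptions made for illustration.

```python
def apply_row_dominance(iv_sets):
    """Remove every test pair i whose IV set D_i contains the IV set D_j of
    another test pair j (D_j is a subset of D_i). Returns the reduced mapping."""
    pairs = sorted(iv_sets)
    kept = {}
    for i in pairs:
        dominated = False
        for j in pairs:
            if i == j:
                continue
            if iv_sets[j] < iv_sets[i]:
                dominated = True          # strict subset: row i is redundant
            elif iv_sets[j] == iv_sets[i] and j < i:
                dominated = True          # identical rows: keep only the first
        if not dominated:
            kept[i] = iv_sets[i]
    return kept


if __name__ == "__main__":
    rows = {
        "TP1": {"d1", "d4"},
        "TP2": {"d4"},
        "TP3": {"d3", "d5"},
        "TP4": {"d3", "d5"},
    }
    print(sorted(apply_row_dominance(rows)))  # ['TP2', 'TP3']
```

Any cover of the remaining rows also covers the deleted ones, because each deleted row contains the IV set of some row that is kept.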










Algorithm: Implicit backward simulation
Input: jth test pair $(V, W)$, $g \in \mathbb{N}$
Output: All IVs that produce $(V, W)$ in g steps.

$ICS_1(y, d) := \exists y' \; \chi_{MISR}(y'|_W, y|_V, d)$    // All (y, d) with y in V and y' in W as the next state.
$\chi_{j,1}(d) := \exists y \; ICS_1(y|_{y^0}, d)$    // All d for which $y^0$ is among the states y.
for (int i = 2; i <= g; i++) {    // Doing the time steps:
    $tmp := ICS_{i-1}(y, d)|_{y \leftarrow y'}$    // Make the current state the new next state ...
    $ICS_i(y, d) := \exists y' \; (tmp \wedge \chi_{MISR}(y', y, d))$    // ... and get the new current state.
    $\chi_{j,i}(d) := \exists y \; ICS_i(y|_{y^0}, d) \vee \chi_{j,i-1}(d)$    // All d for which $y^0$ is among the states y, joined with all former d.
}
return all IVs d with $\chi_{j,g}(d) = 1$

Figure 3: Algorithm of the implicit backward simulation.

3.2.2 Greedy Symbolic Set Covering

In the following, the greedy algorithm that symbolically solves the covering problem is described. The algorithm is similar to the one in [11] and has also been used in [16].

Let m be the number of rows to consider. If RD has not been applied, m is the number k of test pairs to produce; otherwise m can be smaller. First, the number j of test pair $TP_j$ is encoded by the monomial $c_j \in \mathbb{B}^{\lceil \log_2 m \rceil}$. Then $c_j$ is combined with the corresponding $\chi_j$:

$$\chi_{set\_cov}(c, d) = \bigvee_{j=1}^{m} (c_j \wedge \chi_j) \qquad (4)$$

$\chi_{set\_cov}(c, d)$ describes the covering matrix. It can be represented symbolically by a single BDD. Let the Boolean variables used in the BDD be ordered as follows: the n variables $d_0, \ldots, d_{n-1}$ representing the IVs d are ordered before the $\lceil \log_2 m \rceil$ variables of the $c_j$ monomials. After the BDD has been built, a cut is placed between the d nodes and the c nodes of the BDD. The idea of the covering algorithm is as follows: A path $(d_i, c_j)$ in the BDD of $\chi_{set\_cov}(c, d)$ consists of two parts: an IV component $d_i$ in the d variables and a test pair component $c_j$ in the c variables. A one-path $(d_i, c_j)$ in the BDD means that the ith of the $2^n$ IVs produces the jth test pair. Since the on-sets of the BDDs rooted at c variables directly after the cut (i.e. an incoming edge of c crosses the cut) are in general larger than 1, any $d_i$ path leading to such a root node produces all test pairs in the on-set of that BDD. The algorithm therefore proceeds as follows:

• Calculate the size of the on-set for all BDDs that have a root node directly after the cut. Find the maximal on-set size, and let the corresponding BDD be rooted at $c_{max}$.

• Select any path $d_j$ to $c_{max}$, i.e. an IV producing the most test pairs.

After the IV has been chosen, it is added to $\mathcal{D}$. Finally, the produced test pairs are removed from the matrix by symbolic operations. After that, the sizes of the on-sets and a new $c_{max}$ must be recalculated. The algorithm stops when there are no more test pairs to produce, i.e. $\chi_{set\_cov} = 0$. In this case, $\mathcal{D}$ contains a number of IVs that produce at least one instance of every test pair.

3.2.3 Comparing the Different Methods

If BIM(g) requires $k' := |\mathcal{D}_{BIM(g)}|$ IVs, the test application time is $g k'$. Obviously, the smaller g is, the more IVs tend to be needed. However, the experimental results will show that $k'$ rises more slowly than g falls, i.e. lowering g also lowers the total test execution time $g k'$. There is, however, no guarantee that BIM(g) has a solution at all for a given g, as explained earlier. UIM, on the other hand, always has a test execution time of $k' (2^n - 1)$, with $k' := |\mathcal{D}_{UIM}|$.

Figure 4 compares the proposed method with other BIST methods. The area overhead is approximated by the number of memory elements used or the number of bits to store, respectively. While the other methods are fixed in the resources they consume, the figure shows that the proposed method BIM(g) can be adjusted to a given limit. The experiments presented next underline this scalability.

Figure 4: Test application comparison of different BIST schemes. (Axes: applied test vectors, i.e. test execution time, and area overhead. Schemes shown: exhaustive testing, e.g. by a (2n)-LFSR; adjacency testing; UIM; BIM(g) with small g; BIM(g) with large g; storing in ROM.)

4 Experimental Results

We applied both methods, UIM and BIM(g), to the combinational parts of the ISCAS-89 [2] benchmark circuits. For generating the two-pattern tests we used the tool TIP [9, 14], the successor of DYNAMITE [7], which was used in [16]. It computes robust test pairs for the path delay fault model. Note that, due to TIP's advances, the numbers of test pairs here are smaller than the ones used in [16]. Therefore, we compare only with our reimplementation UIM of [16]. The computations were made on a 300 MHz Sun Ultra workstation with 512 MB of main memory. The computations for circuits with 23 or more inputs were made on a comparable workstation with 1 GB of main memory. All execution times are CPU times measured in seconds, except for some values given in hours. Furthermore, we use a BDD package with improved techniques for the synthesis of the relational product operator, which is fundamental for the image computation during state traversals [10].

Table 1 shows the results for the two-valued forward simulation and a standard covering heuristic. Table 2 shows the experimental results obtained by using the implicit backward simulation, row dominance, and symbolic covering. The first column of the tables contains the name of the circuit, followed by the number #In of its inputs and the number of test pairs, #TP. The next column gives the execution time of the (optimal) adjacency testing, i.e. $\#In \cdot 2^{\#In}$. The first column of the Result section names the applied method. The column 'Left' contains the number of test pairs still to be considered by the set covering algorithm after RD. In Table 1, this column is omitted, since RD has not been applied. The next column gives the number of IVs needed to generate all test pairs, #IVs. These #IVs vectors must be stored on the chip or in the ATE. Thus, that value describes the hardware demand of the approach besides the MISR itself. The following column contains the Overall Length (OvL) of all test sequences combined. OvL is directly proportional to the test execution time and is determined for UIM by $2^{\#In} \cdot \#IVs$ and for BIM(g) by $g \cdot \#IVs$. Finally, the gain is computed by dividing the OvL of UIM by the OvL of BIM(g). The Time section of the tables contains the execution times for getting the appropriate MISR IVs (column GetIV), the time for RD, the time for building up the data structure needed for set covering (column Adapt), and the time for the set covering itself (column Cover). Since for the two-valued case simulation and filling the matrix run interleaved, the GetIV times of Table 1 and of Table 2 cannot be compared directly.

Circuit #In  #TP  Adjac. Testing  Method      #IVs  OvL         Gain       GetIV [s]  Cover [s]
s27     7    32   896             UIM         6     768                    0.01       0.01
                                  BIM(64)     6     384         2.00       0.06       0.01
s386    13   232  106496          UIM         51    417792                 0.68       14.71
                                  BIM(4096)   51    208896      2.00       291.98     7.61
                                  BIM(2048)   49    100352      4.16       246.81     3.85
s1488   14   733  229376          UIM         58    950272                 3.30       108.98
                                  BIM(8192)   59    483328      1.97       1419.64    56.23
                                  BIM(4096)   62    253952      3.74       1220.88    30.70
                                  BIM(2048)   69    141312      6.72       935.62     18.12
                                  BIM(1024)   98    100352      9.47       644.12     12.75
s1494   14   725  229376          UIM         60    983040                 3.23       112.15
                                  BIM(8192)   63    516096      1.90       1267.48    58.59
                                  BIM(4096)   65    266240      3.69       1079.48    31.67
                                  BIM(2048)   77    157696      6.23       825.95     19.11
                                  BIM(1024)   93    95232       10.32      570.25     12.49
s298    17   177  2228224         UIM         16    2097152                13.93      56.74
                                  BIM(256)    16    4096        512.00     1047.89    3.85
                                  BIM(128)    17    2176        963.76     719.93     3.75
                                  BIM(64)     20    1280        1638.40    465.05     3.72
                                  BIM(32)     21    672         3120.76    281.91     3.69
                                  BIM(16)     29    464         4519.72    162.77     3.65
                                  BIM(8)      37    296         7084.97    92.29      3.65
s208    18   209  4718592         UIM         32    8388608                18.80      249.68
                                  BIM(32768)  32    1048576     8.00       14233.12   38.04
                                  BIM(16384)  31    507904      16.52      11106.64   23.54
                                  BIM(8192)   35    286720      29.26      8387.87    16.97
                                  BIM(4096)   35    143360      58.51      6280.60    12.74
                                  BIM(2048)   38    77824       107.79     4741.74    10.91
                                  BIM(1024)   40    40960       204.80     3548.18    9.81
                                  BIM(512)    46    23552       356.17     2541.79    9.25
                                  BIM(256)    52    13312       630.15     1718.24    9.05
s832    23   488  1.9 · 10^8      UIM         64    536870912              797.92     627.51
                                  BIM(64)     111   7104        75573.05   10.3 hrs   664.02
s820    23   475  1.9 · 10^8      UIM         66    553648128              771.15     607.31
                                  BIM(128)    93    11904       46509.42   17.1 hrs   612.69
                                  BIM(64)     111   7104        77934.70   9.2 hrs    619.00
s526    24   356  4.0 · 10^8      UIM         40    671088640              7856.17
                                  BIM(64)     65    4160        161319.38  7.3 hrs
s444    24   303  4.0 · 10^8      UIM         30    503316480              9369.46    1032.12
                                  BIM(16)     81    1296        388361.48  10.5 hrs   1059.52
s510    25   369  8.4 · 10^8      UIM         34    1140850688             2943.07    1962.71
                                  BIM(16)     88    1408        810263.27  11.2 hrs   2001.03

Table 1: Results for the two-valued based simulation.

Table 1 presents the results for the two-valued simulation. The table shows the scalability of the approach very well. In particular, with BIM(16384) for circuit s208 one IV less needs to be stored than with UIM, for a test sequence that is 16.5 times shorter. Another interesting result is obtained for circuit s298 and BIM(256): for the very same area demand as for UIM, the test sequence length of BIM(256) is shorter by a factor of 512. Similar gains are also shown for the symbolic simulation in Table 2. Moreover, it has been proven that there is no solution of BIM(16) for s420.

To judge the application of row dominance, we repeated the experiments presented in Table 2 with row dominance disabled. It turned out that the numbers of IVs without RD are about the same as in Table 1, i.e. RD saves one to three IVs on average, see Table 2. Concerning the execution time of RD, it turned out that computing the cover is accelerated by RD, since fewer rows have to be considered. However, the computing time for RD itself mostly consumes the seconds saved. This means that applying RD reduces the number of IVs and has no negative impact on the overall execution time of the method.

Comparing the results in Table 1 with the results in Table 2 shows clearly that, for a small number of inputs, the two-valued based method is faster. However, as the number of inputs increases, the symbolic method outperforms the two-valued based method, see s510. Furthermore, RD can only be applied efficiently in the symbolic case. Moreover, due to the size of the covering matrix, circuits with more than 25 inputs can hardly be processed by the two-valued based method. The symbolic method does not have this restriction. Finally, the method UIM performs well only in the symbolic case.

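| Circuit | #In | #TP | Adjac. Testing | Method | Left | #IVs | OvL | Gain | GetIV [s] | RD [s] | Adapt [s] | Cover [s] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| s27 | 7 | 32 | 896 | UIM | 12 | 6 | 768 | | 0 | 0.01 | 0 | 0 |
| | | | | BIM(64) | 15 | 6 | 384 | 2.0 | 3.90 | 0.01 | 0.01 | 0 |
| s386 | 13 | 232 | 106496 | UIM | 77 | 48 | 393216 | | 0.14 | 0.09 | 0.04 | 0.10 |
| | | | | BIM(4096) | 104 | 48 | 196608 | 2.0 | 19.1 hrs | 3.81 | 0.54 | 0.40 |
| | | | | BIM(2048) | 133 | 49 | 100352 | 3.9 | 10.1 hrs | 9.18 | 1.10 | 0.79 |
| s1488 | 14 | 733 | 229376 | UIM | 185 | 56 | 917504 | | 0.47 | 0.81 | 0.78 | 1.53 |
| | | | | BIM(1024) | 562 | 93 | 95232 | 9.6 | 17.96 hrs | 274.34 | 14.02 | 12.78 |
| s1494 | 14 | 725 | 229376 | UIM | 198 | 58 | 950272 | | 0.40 | 0.92 | 0.96 | 1.75 |
| | | | | BIM(1024) | 554 | 100 | 102400 | 9.3 | 16.3 hrs | 223.18 | 13.13 | 11.82 |
| s298 | 17 | 177 | 2228224 | UIM | 25 | 15 | 1966080 | | 0.12 | 0.01 | 0 | 0.01 |
| | | | | BIM(256) | 154 | 17 | 4352 | 451.8 | 15.1 hrs | 95.56 | 51.60 | 12.64 |
| | | | | BIM(128) | 171 | 18 | 2304 | 853.3 | 17401.24 | 165.16 | 64.73 | 17.45 |
| | | | | BIM(64) | 171 | 19 | 1216 | 1616.8 | 3846.47 | 215.18 | 72.41 | 15.38 |
| | | | | BIM(32) | 171 | 22 | 704 | 2792.7 | 584.47 | 184.38 | 59.20 | 20.05 |
| | | | | BIM(16) | 171 | 27 | 432 | 4551.1 | 58.95 | 90.63 | 33.66 | 15.74 |
| | | | | BIM(8) | 169 | 36 | 288 | 6826.7 | 7.12 | 14.84 | 7.79 | 10.05 |
| s208 | 18 | 209 | 4718592 | UIM | 77 | 32 | 8388608 | | 0.12 | 0.10 | 0.15 | 0.20 |
| | | | | BIM(256) | 128 | 50 | 12800 | 655.4 | 17.7 hrs | 102.62 | 12.80 | 9.88 |
| s832 | 23 | 488 | 1.9 × 10^8 | UIM | 161 | 65 | 5.5 × 10^8 | | 0.61 | 0.86 | 11.15 | 6.30 |
| s820 | 23 | 475 | 1.9 × 10^8 | UIM | 158 | 62 | 5.2 × 10^8 | | 0.65 | 0.71 | 9.76 | 5.18 |
| s526 | 24 | 356 | 4.0 × 10^8 | UIM | 69 | 40 | 6.7 × 10^8 | | 0.35 | 0.16 | 0.07 | 0.12 |
| s444 | 24 | 303 | 4.0 × 10^8 | UIM | 109 | 26 | 4.4 × 10^8 | | 0.30 | 0.14 | 0.39 | 0.59 |
| s510 | 25 | 369 | 8.4 × 10^8 | UIM | 85 | 32 | 1.1 × 10^9 | | 0.66 | 0.22 | 0.39 | 0.40 |
| | | | | BIM(16) | 345 | 84 | 1344 | 798915.1 | 193.45 | 997.96 | 3984.74 | 661.88 |
| s420 | 34 | 641 | 5.8 × 10^12 | UIM | 231 | 96 | 1.6 × 10^13 | | 1.00 | 1.53 | 29.89 | 25.19 |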

Table 2: Results using Row Dominance and symbolic simulation.

The overall results for BIM(g) show the following: given BIM($g$) requiring $k'$ IVs and BIM($g/2$) requiring $k''$ IVs, then $k'' < 2k'$. That means that lowering g also lowers the overall length of the sequence. So, a practical approach is to divide the total number of clock cycles available by the number of test pairs. The result is the time each test pair may run, i.e. g. However, since many test pairs are eliminated by RD, the actual test application will take less time than predicted. So, g may be increased in a second iteration to further reduce the number of IVs.

The overall results for UIM show that the number of IVs, #IVs, is usually not much smaller than, or is even greater than, the number of circuit inputs, #In, which is the key number for adjacency testing. Since UIM results in $2^{\#In} \cdot \#IVs$ test vectors generated by the MISR, it is not significantly better, or is even worse, than an adjacency testing scheme generating $\#In \cdot 2^{\#In}$ vectors. Since there is usually an upper bound for OvL, determined by the clock rate and the maximal test execution time, some values are unacceptable. For example, when testing a 20 MHz device for at most 0.1 s, OvL must be $\le 2{,}000{,}000$, which is not fulfilled by most UIM results. Furthermore, there is often a small g such that BIM(g) requires about the same number of IVs as UIM, but with far fewer test vectors, i.e. with a much shorter test execution time.

5 Conclusions

In this paper, a scalable BIST architecture for delay faults has been presented. A MISR is used to generate sequences in which a given set of test pairs is embedded. These sequences have the property that at least one instance of every test pair is contained within the first g vectors of the sequences. In many cases this results in a test execution time that is several orders of magnitude shorter than that of adjacency testing and of UIM. Shorter sequences can be obtained in exchange for additional IVs of the MISR. Thus, the presented method offers a trade-off between the area for storing the IVs and the test execution time. A large number of experimental results demonstrate this scalability. Finally, if the IVs are not stored on the chip but are received from Automatic Test Equipment, at a reasonably slower speed than delay fault testing would demand, the improved delay fault test execution time comes nearly for free.
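The trade-off between stored IVs and test length can be made concrete with a short calculation. The sketch below assumes that OvL is $2^{\#In} \cdot \#IVs$ for UIM and $g \cdot \#IVs$ for BIM(g), which reproduces the OvL entries of Table 2 (for example 768 and 384 for s27), and it reuses the 20 MHz, 0.1 s example from the discussion of the results. All function names are hypothetical.

```python
def ovl_uim(num_inputs: int, num_ivs: int) -> int:
    """Overall length for UIM: 2**num_inputs vectors per stored IV."""
    return (2 ** num_inputs) * num_ivs


def ovl_bim(g: int, num_ivs: int) -> int:
    """Overall length for BIM(g): g vectors per stored IV."""
    return g * num_ivs


def adjacency_length(num_inputs: int) -> int:
    """Number of vectors for adjacency testing: #In * 2**#In."""
    return num_inputs * 2 ** num_inputs


def cycle_budget(clock_hz: int, max_test_ms: int) -> int:
    """Clock cycles available within the test time (integer arithmetic)."""
    return clock_hz * max_test_ms // 1000


def initial_limit(budget: int, num_test_pairs: int) -> int:
    """First guess for g: cycles available per test pair."""
    return budget // num_test_pairs


budget = cycle_budget(20_000_000, 100)   # 20 MHz for 0.1 s
# s298 in Table 2: 17 inputs, 177 test pairs, 15 IVs (UIM), 17 IVs (BIM(256))
uim = ovl_uim(17, 15)
bim = ovl_bim(256, 17)
print(budget, uim, bim, round(uim / bim, 1))  # 2000000 1966080 4352 451.8
print(initial_limit(budget, 177))             # 11299
print(ovl_uim(18, 32) <= budget)              # s208 with UIM: False
```

The first guess for g is only a starting point: as noted in the discussion of Table 2, RD removes many test pairs, so g can be raised in a second iteration while the budget is still met.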

References

[1] P.H. Bardell, W.H. McAnney, and J. Savir. Built-In Test for VLSI: Pseudorandom Techniques. John Wiley & Sons, 1987.
[2] F. Brglez, D. Bryan, and K. Kozminski. Combinational profiles of sequential benchmark circuits. In Int'l Symp. Circ. and Systems, pages 1929–1934, 1989.
[3] R.E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Trans. on Comp., 35(8):677–691, 1986.
[4] S. Chakravarty. On the complexity of computing test sets for complex CMOS gates. IEEE Trans. on CAD, 8:973–980, 1986.
[5] D.K. Das, I. Chaudhuri, and B.B. Bhattacharya. Design of an optimal test pattern generator for built-in self testing of path delay faults. In VLSI Design, pages 205–210, 1998.
[6] C. Dufaza and Y. Zorian. On the generation of pseudo-deterministic two-patterns test sequence with LFSRs. In European Design & Test Conf., pages 69–76, 1997.
[7] K. Fuchs, F. Fink, and M.H. Schulz. DYNAMITE: An efficient automatic test pattern generation system for path delay faults. IEEE Trans. on CAD, 10(10):1323–1335, 1991.
[8] P. Girard, C. Landrault, V. Moréda, and S. Pravossoudovitch. An optimized BIST test pattern generator for delay testing. In VLSI Test Symp., pages 94–100, 1997.
[9] M. Henftling and H. Wittmann. Bit parallel test pattern generation for path delay faults. In European Design & Test Conf., pages 521–525, 1995.
[10] A. Hett, C. Scholl, and B. Becker. A.MORE - a multi-operand BDD package. To be published, 1999.
[11] T. Kam, T. Villa, R. Brayton, and A. Sangiovanni-Vincentelli. A fully implicit algorithm for exact state minimization. In Design Automation Conf., pages 684–690, 1994.
[12] A.K. Majhi and V.D. Agrawal. Tutorial: Delay fault models and coverage. In VLSI Design, pages 364–369, 1998.
[13] G.L. Smith. Model for delay faults based upon paths. In Int'l Test Conf., pages 342–349, 1985.
[14] P. Tafertshofer, A. Ganz, and M. Henftling. A SAT-based implication engine for efficient ATPG, equivalence checking, and optimization of netlists. In Int'l Conf. on CAD, pages 648–655, 1997.
[15] W. Wang and S.K. Gupta. Weighted random robust path delay testing of synthesized multilevel circuits. In VLSI Test Symp., pages 291–297, 1994.
[16] B. Wurth and K. Fuchs. A BIST approach to delay fault testing with reduced test length. In European Design & Test Conf., pages 418–423, 1995.