Progressive Bridge Identification

Progressive Bridge Identification

Thomas J. Vogels, Wojciech Maly, and R.D. (Shawn) Blanton
Center for Silicon System Implementation (Test Group), Carnegie Mellon University, Pittsburgh, PA 15213
[email protected]

Abstract

We present an efficient algorithm for identification of two-line bridges in combinational CMOS logic that narrows down the two-line bridge candidates based on tester responses for voltage tests. Due to the implicit enumeration of bridge sites, no layout extraction or precomputed stuck-at fault dictionaries are required. The bridge identification is easily refined using additional test pattern results when necessary. We present results for benchmark circuits and four common fault models (wired-AND, wired-OR, dominant, and composite), evaluate the diagnosis against other possible fault types, and summarize the quality of our results.

1. Introduction

Classical logic diagnosis has focused on locating the physical defect in a defective circuit. Modern IC technologies, however, force us to go beyond localization by asking about the "nature" of the detected defect [1]. For instance, one may ask about the nature of the logic misbehavior, which may be needed to improve the diagnostic resolution and the conjectures about the source of the defect causing the circuit to malfunction [2]. In this paper, we focus on circuit deformations that lead to an unwanted connection between two signal lines. We address the diagnosis, i.e., the detection and the identification of the behavior, of two-line bridges in combinational or full-scan sequential CMOS logic and evaluate the resulting diagnosis w.r.t. its accuracy. This evaluation becomes important as the likelihood of multi-line bridges increases [1].

1.1. Prior relevant work

There is little published work on defect diagnosis that targets identification as defined in this paper. We summarize below the relevant research, which has mostly targeted localization alone. In [3], the authors present a dictionary of composite stuck-at signatures to diagnose two-line bridge faults in CMOS logic. Since there are N∗(N-1)/2 possible pairs of N signal lines, a dictionary that has to list all possible bridges is impractical to use for large N. A large test set only worsens the problem with dictionary size, since

dictionaries include information for all test vectors. Using layout information helps to keep the dictionary size manageable [3]. Improvements to this method were presented, for example, in [4] and [5]. Layout extraction is used to find all neighboring lines and, obviously, requires a layout of the circuit. Two approaches that use dictionary-based diagnosis if layout information is not available are described in [4]. Poirot ([6], [7]) similarly uses aggregates of stuck-at faults and extends the method to sequential logic. Its authors introduce a multi-phase diagnosis that has an initial phase involving an analysis of the layout or the input cones. Unlike [5], they point out that a two-line bridge usually results in a dominant bridge where the logic value on one line overrides the value on the other. In [8], the authors add primitive-fault types and composite bridge faults. Instead of explicitly listing all possible or "likely" bridge faults, [9], [10], and [11] describe ways to implicitly enumerate bridge faults, thereby avoiding the huge size of fault dictionaries. The fault list is implicitly enumerated using sets of ordered pairs of sets of nodes. A path-tracing algorithm is used in [9], which means that no information about the fault behavior is available in the end. In [10], the diagnosis always includes both nodes. The authors use a voting model ([13], [14]) for a more accurate fault simulation, which requires an analysis of the design or cell library, and they build a modified stuck-at fault dictionary dynamically. The dictionary is built before inactive bridge faults (whose nodes have the same logic value) are dropped. They do not determine bridge fault types from the information gained during simulation.

1.2. Our approach

We propose a data structure that is similar to [10], but is more accessible, allows efficient updating, and includes fault-type information. The list of all nodes in a circuit is organized into sets. Two sets form a pair such that any node from one set may be bridged to any node in the other set. These sets are split and pruned based on tester results. The algorithm presented here particularly focuses on the linkage between site and type as a diagnosis goal. (For the purposes of this paper, we will consider the wired-AND, wired-OR, dominant, and composite bridge fault models.) The diagnosis algorithm is built such that it can



be started without large setup costs from layout extraction or dictionary computation. Additional tester responses can be easily used to increase diagnostic resolution. The diagnosis result is evaluated in the context of structural and test limitations for diagnostic resolution and in the context of diagnosis accuracy. After definitions and conditions concerning bridge faults and their detection in Section 2, we will present the diagnosis algorithm in Section 3. In that section, the pseudo-code defining the algorithm, the operations on the data structure used and examples of the operations are presented. Section 4 provides results obtained by applying our algorithm to benchmark circuits. In Section 5, we draw our conclusions from these results and outline some future work.

2. Preliminaries

This section briefly defines and explains terms used in this paper. It states the conditions upon which our diagnosis is built. We also state our assumptions about the information available for diagnosis. In general, we assume the circuit is combinational or full-scan sequential CMOS logic. Also, we assume that if the bridge fault has feedback, then the fault has no state. If the fault introduces sequential behavior, the algorithm cannot be guaranteed to perform correctly.

2.1. Test patterns and responses

Every test pattern where the values on the primary outputs do not match the expected results (from good-state simulations) is a failing pattern. The primary outputs with the mismatches are the failing primary outputs. The collection of failing patterns and the list of failing primary outputs for each of them is the tester response. The number of failing test patterns used in the diagnosis of a particular device is Nfp. We assume that we have full test results for failing test patterns. That means that for every failing test pattern, we know all the primary outputs having a wrong value. The diagnosis, as presented here, targets static information and not delay faults possibly caused by bridges. Thus the test patterns need not have been applied "at-speed." The test set used is a "production" ATPG test set and not a diagnostic test set. We will show how the diagnosis can be improved if a test set with random patterns is additionally available.
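For concreteness, the tester response can be stored as a mapping from each failing pattern to its failing primary outputs. The following Python sketch (field and class names are hypothetical, chosen only for this illustration) captures the data assumed available for diagnosis:

  from dataclasses import dataclass, field
  from typing import Dict, Set

  @dataclass
  class TesterResponse:
      """Hypothetical container for the tester data assumed available here."""
      # Maps each failing test pattern (e.g., a string of primary-input values)
      # to the set of primary outputs that mismatched the good-state simulation.
      failing: Dict[str, Set[str]] = field(default_factory=dict)

      @property
      def n_fp(self) -> int:
          # Nfp: the number of failing test patterns used in the diagnosis.
          return len(self.failing)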

2.2. Bridge fault models

Defects are circuit deformations—undesired and unexpected circuit structures—that can lead to logic misbehavior. For example, a resistance between two lines is a defect model for an extra metal particle physically located between these two lines. A fault model attempts to capture the misbehavior at the logic level and abstracts the

defect's behavior. If the fault simulation of a failing pattern causes a circuit to fail with the same failing primary outputs as the defective device, the fault is said to explain that failing pattern. We will refer to the two nodes involved in the bridge fault as the two bridge participants. When examining one node, we will refer to the other as its counterpart. A bridge fault is activated if the two bridge participants have different logic values, as dictated by a good-state simulation. In this paper, we will consider bridges between fanout-free or stem lines only and assume that all fanout lines from both bridge participants, x and y, will carry the same logic value—we can then say that all downstream gates see the value of a function Z(x, y) [15]. We distinguish four types of (two-line) bridge faults. In a wired-AND bridge fault, the node driven to logic value 0 dominates and all fanout lines are 0 if the bridge is activated: Z(x, y) = AND(x, y). In a wired-OR bridge fault, the node driven to logic value 1 dominates and all fanout lines are 1 if the bridge is activated: Z(x, y) = OR(x, y). In a dominant bridge fault, one of the two lines always dominates the other line by imposing its value. So we either have Z(x, y) = x or Z(x, y) = y. The two bridge participants for a dominant bridge fault are usually referred to as the aggressor (the dominating node) and the victim (the dominated node). Note that in this case, all failing patterns must be explained by the victim line failing stuck-at v, with v being the aggressor's logic value. The classical stuck-at fault model, which assumes that a node is stuck-at 0 (or stuck-at 1) for all patterns, can be seen as a special case of the dominant bridge fault, where node x stuck-at-0 is the same as Z(x, GND) ≡ 0 and x stuck-at-1 is the same as Z(x, VDD) ≡ 1. The composite bridge fault is used when Z(x, y) is a function of the test pattern and the fault behavior does not match any of the other three models described above. This case occurs, for example, when the short behaves like a wired-AND bridge fault for some test patterns and like a wired-OR bridge fault for the remaining test patterns.
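The four models can be summarized as simple encodings of Z(x, y); the Python sketch below is only an illustration of the definitions above, not the fault simulator used in this work. x and y are the good-state logic values driven onto the two bridge participants.

  # Illustrative encodings of Z(x, y) for the four bridge fault models.
  def z_wired_and(x: int, y: int) -> int:
      return x & y          # the node driven to 0 dominates

  def z_wired_or(x: int, y: int) -> int:
      return x | y          # the node driven to 1 dominates

  def z_x_dominant(x: int, y: int) -> int:
      return x              # x is the aggressor, y the victim

  def z_y_dominant(x: int, y: int) -> int:
      return y              # y is the aggressor, x the victim

  # A composite bridge has no single closed form: Z(x, y) depends on the test
  # pattern, e.g., wired-AND for some patterns and wired-OR for others.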

2.3. Detection conditions

We assume the behavior caused by a defect connecting two lines has the following effects on the lines involved:
• Single-line fault. For a given test pattern, the fault effect will propagate only on one line, that is, the logic value on only one line will change.
• Controlling values. If a gate has a controlling value on an input ([15]), the error due to the fault is not propagated through that gate, even if the value is in the "illegal" band for CMOS voltages.

Any pair of nodes (x, y) must fulfill the following three conditions to be considered a potential bridge site:
1. Activation condition. The fault is activated iff the two nodes of the bridge are driven to opposite values. If we suspect node x to be a bridge participant and it is driven to v, then a node y must be driven to the opposite value to be considered a counterpart.
2. Back-cone condition. At least one node of a potential bridge site must be in every back-cone of the failing primary outputs. This intersection of all back-cones is performed separately for every failing pattern, and each time one node must (and both nodes may) lie in it. (The back-cone from a primary output includes all nodes encountered during a reverse topological traversal of the netlist towards the primary inputs; a sketch of this traversal is given after this list.)
3. Observation condition. For every failing pattern, one node stuck-at 0 or 1 must explain that test pattern. Note that sometimes a fault on either node may cause the same failing primary outputs and thus sometimes both nodes may explain the pattern.

The conditions for the different bridge types can be derived from the definitions of the fault models and are summarized in Table 2.
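As an illustration of the back-cone computation used by condition 2, the following sketch performs the reverse traversal over a netlist represented as a mapping from each node to its driving nodes; this representation is an assumption made for the sketch, not the data structure of our implementation.

  from typing import Dict, List, Set

  def back_cone(po: str, drivers: Dict[str, List[str]]) -> Set[str]:
      """Return all nodes reached by tracing backwards from primary output po.

      `drivers` maps every node to the list of nodes feeding the gate that
      drives it (empty for primary inputs).
      """
      cone, stack = set(), [po]
      while stack:
          node = stack.pop()
          if node in cone:
              continue
          cone.add(node)
          stack.extend(drivers.get(node, []))
      return cone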

2.4. Circuit node data structures

The set of all circuit nodes S has N nodes: |S| = N. We consider primary inputs and gate outputs as nodes, and include the power lines, GND and VDD, in S. The number of nodes considered to be possible bridge participants at diagnosis step i is N(i) (starting at N(0) = N). The node data structure introduced below is built iteratively by performing logic simulations of subsequent failing test patterns. After logic simulation of the first failing test pattern, the set of all nodes can be divided into the zero-subset S0 and the one-subset S1. All nodes with logic value 0 (1) become elements of S0 (S1). Each set can be further divided into its respective subsets, S00, S01 and S10, S11, by using the next failing test pattern. If each set is again divided, one arrives at eight sets, and so on. The data structure introduced in this work is a list of node sets:

  L(i) = [ S1(i), S2(i), ..., S2j-1(i), S2j(i), ... ]

In particular, we arrive at L(1) = [ S0, S1 ] and L(2) = [ S00, S11, S10, S01 ] based on two failing-test-pattern simulations. Note that the ordering of the node sets is such that it is guaranteed by construction that for every j, with 1 ≤ j ≤ |L(i)|/2, all nodes in S2j-1 have the opposite value of the nodes in S2j for every failing test pattern. The algorithm is defined such that no empty sets are kept. The sizes of all sets at step i of the algorithm must sum to

  N(i) = Σk |Sk|.

The number of possible two-line bridge sites described in L(i) can be easily computed from the set sizes as

  sites(i) = Σ (1 ≤ j ≤ |L(i)|/2) |S2j-1| ∗ |S2j|.

Consider any two sets of nodes S0(i) = S00(i+1) ∪ S01(i+1) and S1(i) = S10(i+1) ∪ S11(i+1) in L(i). Then their contribution to the number of sites is:

  sites(i) = ... + |S00(i+1) ∪ S01(i+1)| ∗ |S10(i+1) ∪ S11(i+1)| + ...
           = ... + |S00(i+1)| ∗ |S11(i+1)| + |S01(i+1)| ∗ |S10(i+1)|
                 + |S00(i+1)| ∗ |S10(i+1)| + |S01(i+1)| ∗ |S11(i+1)| + ... .

Because of the third and fourth terms, which are non-negative and not part of sites(i+1) due to the activation condition, we arrive at sites(i) ≥ sites(i+1). Every time it is possible to split a set in L(i), the number of possible two-line bridge sites is reduced.
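A small sketch of this site count; the list L is assumed, for illustration only, to be stored as Python pairs of node sets:

  from typing import List, Set, Tuple

  def count_sites(L: List[Tuple[Set[str], Set[str]]]) -> int:
      """sites(i) = sum over all pairs (S_2j-1, S_2j) of |S_2j-1| * |S_2j|."""
      return sum(len(s_a) * len(s_b) for (s_a, s_b) in L)

  # e.g. [({'a'}, {'b', 'c'}), ({'d', 'e'}, {'f'})] encodes 1*2 + 2*1 = 4 candidate sites.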

2.5. Diagnosis callout

A diagnosis callout is the list of suspect faults obtained from diagnosis performed on the tester response. In our case, the callout is a list of suspect two-line bridge sites and types. A diagnosis is said to be exact if the only entry in the callout is the actual fault, i.e., the site and fault type (of all types considered in the diagnosis) that best represents the defect behavior at the logic level. The diagnosis is partial if there is more than one entry but the actual fault is still among the entries in the callout. The diagnosis is empty or, worse, misleading, if the list is empty or the list of faults does not contain the actual fault. Note that, unlike previous work, a diagnosis is here still considered to be partial with respect to two-line bridge identification if the two bridge participants are the only nodes in the callout but more than one fault type can account for the tester response. Also, the diagnosis is limited by the circuit structure and the test set. The "best possible" diagnosis thus provides a minimal callout under the given test set and circuit structure.

3. Diagnosis algorithm

Figure 1 summarizes pictorially our multi-phase approach to finding the bridge site and type, while Figure 2 provides the pseudo-code for the top-level diagnosis routine. In each phase, the number of potential two-line bridges is reduced based on the conditions posited in Section 2.3. The later subsections provide details for the subroutines in this procedure. It must be noted that, before the procedure diagnose can be called, the initial list of sets of nodes must be constructed. If more tester responses are available, the resulting list of the initial diagnosis serves as the starting point for subsequent invocations. The procedure and its subroutines are illustrated by diagnosing an example tester response.


(Figure 1 shows the diagnosis flow: read circuit netlist and tester response, initialize data structures; good-state simulation of failing test patterns ⇒ activated bridges; back-coning of failing primary outputs ⇒ possibly observed bridges; fault-simulation of nodes ⇒ signature-matching 2-line bridges; type attribution ⇒ identified bridges; validation with passing patterns ⇒ callout. Repeat if more tester responses are available.)

Figure 1. The diagnosis reduces the number of candidates in multiple steps.

3.1. Diagnosis of C17

We will use an example throughout Section 3 to illustrate the data structures and operations used. We use the venerable circuit C17 from the ISCAS '85 benchmark suite [16]. We assume that the device failed three test patterns. Both the netlist and the good-state simulation of the circuit are shown in Figure 3.

  procedure diagnose:
    /* (1) Use activation condition */
    for every failing pattern p:
      simulate_good_circuit(p)
      split_sets()
    /* (2) Use back-cone condition */
    for every failing pattern:
      B = {node x | x ∈ L(i)}
      for all failing primary outputs po:
        B = B ∩ back_cone(po)
      sweep_with(B)
    /* (3) Use observation condition */
    for every failing pattern:
      M = {}
      for every node n in L(i):  /* and in back-cone */
        simulate_sa_vbar(n)
        if failing_POs(sim) match failing_POs(DUT):
          M = M ∪ {n}
          /* Update counters nr_sa_0 and nr_sa_1 */
      sweep_with(M)
    /* (4) Determine bridge type */
    enumerate()
    /* (5) Validate */

Figure 2. The routine diagnose operates on L, reducing the number of candidates with every step.

(Figure 3 shows the C17 netlist annotated with the good-state values of every node for the three failing test patterns: a 101, b 100, c 110, d 101, e 001, f 011, g 011, h 111, i 110, j 100, k 001.)

Figure 3. This netlist of C17 is annotated with good-state simulation values for three failing test patterns.

The underlined values indicate that the primary output failed the corresponding test pattern with the opposite value. Primary output k fails the first test pattern, and j fails the second and third. The task is to identify the possible two-line bridge (by site and type) that may have caused this tester response. We first examine the activation condition. For any bridge-participant candidate, there must exist a counterpart that has the opposite value for each failing pattern. In this simple example, it is possible to find nodes that always have opposing values just by inspecting the annotated netlist. Table 1 holds this information: it lists the sets of nodes with the same good-state value sequence and, for each of them, the set of possible counterparts. Based on this table, one can find that node e could only be bridged to either node c or node i. We can drop nodes a and d from further consideration of two-line bridges because neither has a counterpart. While the entire circuit has 13 lines (5 primary inputs, 6 gate outputs, GND, and VDD) and thus has 13∗12/2 = 78 potential two-line bridge sites, only 9 of these remain after the activation condition has been checked. (The bridge between GND and VDD obviously need not be considered.)

Table 1. Candidate sets from the example in Figure 3.

  Circuit nodes with same           Counterpart         Number of possible
  good-state value sequence         nodes               two-line bridges
  S000 = {GND}                      S111 = {h, VDD}     1
  S001 = {e, k}                     S110 = {c, i}       4
  S011 = {f, g}                     S100 = {b, j}       4
  S010 = {}                         S101 = {a, d}       0
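To illustrate how Table 1 can be obtained mechanically, a short sketch in Python (the language of our later prototype, although this snippet and its variable names are only an illustration) groups the nodes of Figure 3 by their good-state value sequence and pairs complementary groups:

  # Good-state values of the C17 nodes for the three failing patterns (Figure 3).
  values = {'GND': '000', 'a': '101', 'b': '100', 'c': '110', 'd': '101',
            'e': '001', 'f': '011', 'g': '011', 'h': '111', 'i': '110',
            'j': '100', 'k': '001', 'VDD': '111'}

  # Group nodes that share the same value sequence.
  groups = {}
  for node, seq in values.items():
      groups.setdefault(seq, set()).add(node)

  # Pair each group with the group holding the complementary sequence.
  def complement(seq):
      return ''.join('1' if b == '0' else '0' for b in seq)

  pairs = []
  for seq, nodes in groups.items():
      other = groups.get(complement(seq), set())
      if seq < complement(seq):          # avoid listing each pair twice
          pairs.append((nodes, other))

  # pairs now reproduces Table 1, e.g. ({'e', 'k'}, {'c', 'i'}) and ({'f', 'g'}, {'b', 'j'}).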

3.2. Pruning based on activation condition

This section shows how we can obtain the information from Table 1 algorithmically. The goal is to keep together sets of nodes that have opposite values for every failing test pattern, so that sets of nodes and their sets of counterpart nodes are easily accessible. We could use a binary tree where each leaf would hold either the zero-subset or the one-subset of the nodes in its parent. But it is not necessary to store the whole tree, since the last level has all the relevant information. Figure 4 depicts how the splitting of the node sets using the activation condition works. In addition, note that the actual value sequence need not be stored, since we only need to know the counterparts of each node.

(Figure 4 shows the successive splits for the example of Figure 3:
  S    = {GND, a, b, c, d, e, f, g, h, i, j, k, VDD}
  L(1) = [{GND, e, f, g, k}, {a, b, c, d, h, i, j, VDD}]
  L(2) = [{GND, e, k}, {c, h, i, VDD}, {a, b, d, j}, {f, g}]
  L(3) = [{GND}, {h, VDD}, {c, i}, {e, k}, {b, j}, {f, g}]; the pair ({}, {a, d}) is pruned.)

Figure 4. Splitting sets produces smaller sets that encode a smaller number of possible pairs.

We can define splitting based on the activation condition using the pseudo-code in Figure 5. Note that this procedure prunes the number of bridge sites as well as the number of nodes. It uses the value of each node from the good-state simulation. The result of this code is a list of node sets. Each set, and thus each node, is visited only once, and the time of this operation is linear in N(i) and Nfp.

  procedure split_sets:
    L(i+1) = {}
    for each index j from 1 to |L(i)|/2:
      /* Retrieve the zero set and one set */
      S0 = S(i)2j-1, S1 = S(i)2j
      /* Split each set by value */
      S00 = { x ∈ S0 | x = 0 }, S01 = { x ∈ S0 | x = 1 }
      S10 = { x ∈ S1 | x = 0 }, S11 = { x ∈ S1 | x = 1 }
      /* Prune empty sets */
      if S00 ≠ {} and S11 ≠ {} then:
        append(L(i+1), S00, S11)
      if S10 ≠ {} and S01 ≠ {} then:
        append(L(i+1), S10, S01)

Figure 5. Pseudo-code for the split operation.
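A runnable Python rendering of the split operation of Figure 5, assuming (for this sketch only) that L is kept as a list of (zero-set, one-set) pairs and that `value` gives each node's good-state value for the current failing pattern:

  from typing import Dict, List, Set, Tuple

  Pair = Tuple[Set[str], Set[str]]

  def split_sets(L: List[Pair], value: Dict[str, int]) -> List[Pair]:
      """Split every set pair by the good-state values of the current failing pattern."""
      L_next: List[Pair] = []
      for s0, s1 in L:
          # Split each set by value.
          s00 = {x for x in s0 if value[x] == 0}
          s01 = s0 - s00
          s10 = {x for x in s1 if value[x] == 0}
          s11 = s1 - s10
          # Keep only pairs whose two sets are both non-empty (activation condition).
          if s00 and s11:
              L_next.append((s00, s11))
          if s10 and s01:
              L_next.append((s10, s01))
      return L_next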

3.3. Pruning based on back-cone condition

For every failing pattern, at least one of the two bridge participants must be in the back-cone of all failing primary outputs. If, for two sets of nodes, no node of one set is in the back-cone of every failing primary output, then all nodes in the set of counterparts that are also not in all back-cones can be removed. For example, if only node c of the two sets {c, i} and {e, k} is in the back-cone of every failing primary output, then node i can be removed. For one test pattern, let us define a set B which contains all nodes of L(i) that are in the back-cone of every failing primary output, and the sets T0 = S0 ∩ B, T1 = S1 ∩ B. It is trivial to see that if T0 and T1 are empty, i.e., no node from either set is in all back-cones, both sets can be removed from L(i), since no two-line bridge between nodes in S0 and S1 can have failed the same primary outputs that the DUT failed. The back-cone condition states: (x, y) ∈ (S0×S1) is a potential two-line bridge site ⇔ x∈T0 ∨ y∈T1. Thus, for a potential site, x∉T0 → y∈T1, and if T0 is empty we can drop all nodes from S1 that are not in T1; that is, S1 is replaced with T1. A similar argument holds if T1 is empty, in which case we can replace S0 with T0. Figure 6 demonstrates how the data structure is updated for the example in Figure 3. For each failing pattern, only the nodes listed in B ∩ L(i) are in (all) back-cones of the failing primary outputs. Note that test pattern 3 is not processed since its set of failing primary outputs has already been checked with test pattern 2. The pseudo-code in Figure 7 provides the remaining details for this operation. A reverse traversal of the circuit topology needs to be performed for every failing primary output and test pattern. We expect the number of failing primary outputs to be much smaller than the circuit size. Since every node is checked only once during this procedure, the back-cone operation is linear in the number of nodes N(i) and the number of failing test patterns Nfp.

  L(3) = [{GND}, {h, VDD}, {c, i}, {e, k}, {b, j}, {f, g}]
    Sweep nodes not in back-cone of k: B ∩ L(i) = {b, c, e, g, h, i, k}
  L(4) = [{GND}, {h}, {c, i}, {e, k}, {b, j}, {f, g}]
    Sweep nodes not in back-cone of j: B ∩ L(i) = {b, c, f, g, h, j}
  L(5) = [{GND}, {h}, {c}, {e, k}, {b, j}, {f, g}]

Figure 6. Nodes VDD and i can be removed since neither they nor their counterparts are in the back-cone of the failing primary outputs.

  procedure sweep_with(P):
    L(i+1) = {}
    for each index j from 1 to |L(i)|/2:
      /* Retrieve set and counter set */
      S0 = S(i)2j-1, S1 = S(i)2j
      /* Filter each set using argument set */
      T0 = S0 ∩ P, T1 = S1 ∩ P
      /* Reduce sets */
      if T0 ≠ {} and T1 = {} then:
        append(L(i+1), T0, S1)
      else if T0 = {} and T1 ≠ {} then:
        append(L(i+1), S0, T1)
      /* Keep sets complete */
      else if T0 ≠ {} and T1 ≠ {} then:
        append(L(i+1), S0, S1)
      /* Else: prune sets */

Figure 7. The sweep operation removes nodes from sets that have no possible counterpart.
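The corresponding Python sketch of the sweep operation, under the same list-of-pairs representation assumed earlier; P is the filter set (the back-cone intersection, or later the set of explaining nodes):

  def sweep_with(L, P):
      """Drop nodes that cannot be part of a bridge explaining the current pattern."""
      L_next = []
      for s0, s1 in L:
          t0, t1 = s0 & P, s1 & P
          if t0 and not t1:
              L_next.append((t0, s1))      # only s0 has candidates; keep the full counterpart set
          elif t1 and not t0:
              L_next.append((s0, t1))
          elif t0 and t1:
              L_next.append((s0, s1))      # keep sets complete
          # else: both empty -> prune the pair
      return L_next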

3.4. Pruning based on observation condition

The number of possible bridge sites (and nodes) can be further pruned using the observation condition. This step re-uses the sweep operation, where the set P now contains all nodes in L(i) that explain the failing pattern. For every node, a stuck-at-v̄ fault simulation is performed (where v is the good-state value). If the node stuck-at-v̄ explains the failing pattern, then we increase one of two counters kept for each node. Specifically, if the good-state value of the node is 0, then the fault simulated was sa-1 and its counter nr_sa_1 is incremented. Otherwise, the good-state value was 1 and we increment nr_sa_0. Figure 8 shows the results of a stuck-at-v̄ fault simulation for the second pattern. Node f sa-v̄ cannot explain the first failing pattern and, since neither node b nor j explains this pattern, it is dropped from the list. None of the nodes h, c, e, and k explains the second test pattern, and all nodes but g and j are dropped. Figure 9 shows how the data structure is updated by the observation condition. The table in that figure also shows the final counter values for nodes g and j, which will be used to determine a possible fault type in the next section.

The observation condition requires a fault-simulation for all nodes at this point. But since one of our assumptions is that the fault will only propagate on one line of the bridge, the bridge pairs need not be enumerated. This step is thus also linear in N(i) and in Nfp.

3.5. Determining bridge types

Based on the stuck-at counters from above, we determine the fault type as shown in Table 2. (Recall that Nfp is the number of failing test patterns.) In our example, only the third row matches and we conclude that a wired-OR bridge between nodes g and j causes the same tester response as was observed. At this point, the list sets must be enumerated before moving to the next phase. Figure 10 defines how this is accomplished. Note that the size of the callout can be calculated before enumeration as shown in Section 2.4. Also, a composite bridge will only be part of the callout if none of the other fault types satisfies the conditions discussed in Table 2.

3.6. Validation using passing patterns

The last step is to validate the faults (two-line bridge sites and types) against the passing patterns. We use FATSIM ([17]) for this step and simulate the passing patterns against all two-line bridge faults in the callout. We remove those faults that in simulation fail test patterns passed by the DUT.

(Figure 8 lists, for each node of C17, the simulated values under the three candidate faults g sa-0 | b sa-1 | j sa-1 for test pattern 2: a 0|0|0, b 0|1|0, c 1|1|1, d 0|0|0, e 0|0|0, f 1|1|1, g 0|1|1, h 1|0|1, i 1|1|1, j 0|1|1, k 0|1|0.)

Figure 8. Stuck-at-vbar simulation of test pattern 2 on nodes g, b, and j.

  L(5) = [{GND}, {h}, {c}, {e, k}, {b, j}, {f, g}]
    Sweep nodes not explaining failing primary outputs:
  L(6) = [{GND}, {h}, {c}, {e, k}, {b, j}, {g}]
    Sweep nodes not explaining failing primary outputs:
  L(7) = L(8) = [{j}, {g}], with nr_sa_0(j) = 0, nr_sa_1(j) = 2 and nr_sa_0(g) = 0, nr_sa_1(g) = 1.

Figure 9. The final result of the diagnosis leads to a wired-OR bridge fault between nodes g and j.

Table 2. Determination of Bridge Types.

  Bridge between net A and net B of type:   Condition:
  wired-AND                                 nr_sa_0(A) + nr_sa_0(B) = Nfp
  wired-OR                                  nr_sa_1(A) + nr_sa_1(B) = Nfp
  A-dominant                                nr_sa_0(B) + nr_sa_1(B) = Nfp
  B-dominant                                nr_sa_0(A) + nr_sa_1(A) = Nfp
  composite                                 nr_sa_0(A) + nr_sa_1(A) + nr_sa_0(B) + nr_sa_1(B) ≥ Nfp
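The conditions of Table 2 translate directly into a check on the two per-node counters. The Python sketch below (an illustration, not our implementation) returns every non-composite type whose condition holds, and composite only when none of the others matches, as described above:

  def bridge_types(nr_sa_0, nr_sa_1, a, b, n_fp):
      """Return the bridge fault types between nets a and b consistent with Table 2."""
      types = []
      if nr_sa_0[a] + nr_sa_0[b] == n_fp:
          types.append('wired-AND')
      if nr_sa_1[a] + nr_sa_1[b] == n_fp:
          types.append('wired-OR')
      if nr_sa_0[b] + nr_sa_1[b] == n_fp:
          types.append(f'{a}-dominant')    # a is the aggressor, b the victim
      if nr_sa_0[a] + nr_sa_1[a] == n_fp:
          types.append(f'{b}-dominant')
      if not types and nr_sa_0[a] + nr_sa_1[a] + nr_sa_0[b] + nr_sa_1[b] >= n_fp:
          types.append('composite')        # only if no other model matches
      return types

  # For the C17 example: nr_sa_0 = {'g': 0, 'j': 0}, nr_sa_1 = {'g': 1, 'j': 2}, Nfp = 3
  # gives ['wired-OR'] for the pair (g, j).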

3.7. Complexity analysis

The sets in L(i) and the nodes in them can be stored as linked lists. Each set contains one pointer to its node list. Therefore, 2∗S(i)+N(i) pointers are required to store L(i), where S(i) is the number of sets. In the worst case, the memory requirement is maximal if no node is dropped and each set contains only one node. Then 2∗S(i)+N(i) = 2∗N+N pointers will be needed to store L(i). In the best case, the minimum storage requirement is six pointers. This case occurs when there are just two sets with one node each. In either case, two integer variables are required for each node to enable type mapping. The number of sets can vary between the number of nodes in the list and 2^Nfp. (The latter is true because every failing pattern can potentially lead to splits in every set and thus double the number of sets.) The number of nodes can vary between N and 2. Since no nodes are ever added to L, but are only dropped, either directly or when the set of counterparts becomes empty, the number of nodes in L can only decrease. At the beginning, all nodes must be considered: N(0) = N. But during diagnosis, this number may be reduced: N(i) ≥ N(i+1). As has been pointed out for each diagnosis step, all operations to prune the number of bridge sites are linear in N(i) and Nfp. The important point is that the runtime of this algorithm does not increase quadratically with circuit size N, while the number of bridge sites that are implicitly enumerated does.

  procedure enumerate:
    callout = {}
    for each index j from 1 to |L(i)|/2:
      /* Retrieve set and counter set */
      S0 = S(i)2j-1, S1 = S(i)2j
      /* Pair every node in S0 with one in S1 */
      for each x in S0:
        for each y in S1:
          /* Examine nr_sa_1 and nr_sa_0 (see Table 2) to determine bridge type */
          if fault model matches:
            append(callout, (x, y, fault_model))

Figure 10. The enumeration must be constrained by the existence of a matching fault model.
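Putting the pieces together, a sketch of the enumeration step of Figure 10, reusing the bridge_types helper sketched after Table 2 (again an illustration of the technique rather than our implementation):

  def enumerate_callout(L, nr_sa_0, nr_sa_1, n_fp):
      """Expand the implicit site list into explicit (x, y, type) callout entries."""
      callout = []
      for s0, s1 in L:
          for x in s0:
              for y in s1:
                  for t in bridge_types(nr_sa_0, nr_sa_1, x, y, n_fp):
                      callout.append((x, y, t))
      return callout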

3.8. Evaluation of diagnosis

Based on our objective of identifying a two-line bridge, we boldly claimed in the example that a wired-OR bridge fault between nodes g and j matches the faulty behavior observed on the DUT in every failing pattern. But if we look beyond this objective, we must acknowledge that this bridge is not the only possible result. At least three more scenarios must be considered. It is important to revisit Figure 4, where we observe that node b had the same logic values as node j, and node f was in lockstep with node g. This has two consequences: (1) based on failing patterns alone, we cannot rule out that a second bridge fault is present in the circuit, e.g., between nodes b and j; (2) we cannot rule out that there is actually a three-line bridge, e.g., among nodes b, j, and g or among nodes j, f, and g. For the third scenario, consider nodes a and d, which were dropped early on but could also have been shorted, since they had the same logic values for all failing test patterns. Note that if we had run an exhaustive test set for this example, nodes f and g would still have been at the same logic value for all failing test patterns, and the first two scenarios above would nevertheless apply. Even if we stay with the assumption of a single defect in the circuit, we are still left with the possibility of missing the existence of a three-line bridge. A validation of the faults in the callout against passing patterns can help to resolve some ambiguities. Furthermore, the possibility of additional interpretations of the diagnosis result is a function of the failing test patterns and the test set from which those patterns are drawn. If we had used more test patterns, then the set sizes might have been much reduced, and if, e.g., nodes b and j had been driven to opposing values, we could have ruled out some of these scenarios. Making our algorithm more robust, ensuring that a three-line (or n-line) bridge is not diagnosed as a two-line bridge, and diagnosing three-line bridges directly are part of our ongoing research effort.

The procedure split_sets can provide some information about the number of undetectable bridges under the given test set. If we run this procedure on all test patterns but prune only pairs of sets where both are empty (|S2j| = |S2j-1| = 0), we again arrive at a list of node sets. In any set, all nodes have the same logic value for every test pattern. Any bridges between these nodes could not be detected with this test set. A limit for diagnostic resolution can then be defined as the ratio of the number of sets and the number of nodes.
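The limit on diagnostic resolution described above can be estimated with a short sketch that groups nodes by their good-state value sequence over all test patterns (the sets that survive the modified splitting are exactly these groups); good_value(node, pattern) is an assumed interface to a good-state simulator:

  def resolution_limit(nodes, patterns, good_value):
      """Ratio of value-sequence groups to nodes: bridges inside a group cannot be activated."""
      signatures = {}
      for n in nodes:
          sig = tuple(good_value(n, p) for p in patterns)
          signatures.setdefault(sig, []).append(n)
      return len(signatures) / len(nodes)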

4. Results

This section presents results using benchmark circuits based on an implementation of our algorithm.

4.1. Setup

To understand the efficiency, complexity and accuracy of the diagnosis algorithm, we run the following experiment on the ISCAS '85 benchmark circuits [16]. For each circuit, we created 100 'runs' or test cases where we randomly selected one of the fault types in Table 2 and two of the nodes in the netlist. (We used 100 random bridge faults that did not cause oscillation for any test pattern and resulted in at least one failing test pattern.) We used FATSIM ([17], [18]) to create a response file that was then analyzed. We mapped the original netlists to a library consisting of inverters, 2- and 3-input NAND and NOR gates, and 2-input XOR and XNOR gates. This ensures more realistic results since, e.g., the 8-input AND gates contained in the original netlists cannot be found in modern libraries as simple gates. Also, since we use only inverting gates, we expose more internal nodes, such as the node in an AND gate between the NAND gate and the inverter. Table 3 summarizes for the ten benchmark circuits the resulting circuit size, the number of ATPG patterns in the test set (created using ATALANTA, [19]), the number of random test patterns, the single-stuck-line coverage of that test set, and the limit for the diagnosis resolution discussed in Section 3.8. For the detection of a bridge fault it is necessary that the two bridge participants are driven to different values, and this limit is a measure for this. A value of less than one implies that not all bridges can be detected.

Table 3. Tech-mapped ISCAS '85 benchmark circuits.

  Name    PI    Gates   ATPG      RND       SSL Fault   Structure and Test Set Limit for Diagnosis
                        patterns  patterns  Coverage    ATPG     ATPG+RND
  C432    36    161     45        180       100%        1        1
  C499    41    292     43        172       100%        0.994    0.994
  C880    60    341     51        204       100%        1        1
  C1355   41    351     65        260       99.8%       0.992    0.992
  C1908   33    400     64        256       99.8%       0.984    0.993
  C2670   233   646     111       444       99.0%       0.994    0.994
  C3540   50    1056    132       528       98.0%       0.973    0.976
  C5315   178   1518    86        344       100%        0.994    0.996
  C6288   32    2656    58        232       99.9%       0.964    0.966
  C7552   207   2089    156       624       99.0%       0.957    0.957

4.2. Example – C7552

We have argued that by simply finding the nodes that have opposite values in the circuit during good-state simulation of the failing test patterns, the number of bridges that must be considered during diagnosis can be greatly reduced. As evidence of this claim, consider results based on circuit C7552. The number of nodes (gate outputs and primary inputs) is 2296. We fault-simulated 156 ATPG test patterns on the circuit to find the failing patterns. The number of possible bridges that must theoretically be considered is the number of possible pairs between 2296 nodes and is over 2.6 million. Figure 11 shows how this number drops precipitously for most cases as the diagnosis proceeds to prune bridges that were not activated by a failing pattern. The gray shaded area represents the cover of 90% of the 100 runs. In other words, in 9 out of 10 runs, the number of possible bridge sites lies inside the gray area. The three solid lines represent the minimum, median and maximum, respectively, for the distribution of two-line bridge sites. For at least half the runs in this example, the number of possible bridge sites is 1504 when using failing ATPG test patterns alone. This is accomplished without first extracting "likely" bridges from layout. For a few cases, we are not able to reduce the number of bridge sites by much. We did not use a special diagnosis test set, and these cases include hard-to-detect faults that fail only a few or just one of the ATPG test patterns.

(Plot: Possible Bridge Sites, 0.0E+00 to 3.0E+06, versus Test Pattern Index from Start to 140; curves for Min, Median, 90th Percentile, and Max.)

Figure 11. Number of bridge candidates based on activation condition alone (C7552, 100 runs).

4.3. Results for benchmark suite

This section provides results for all ten ISCAS benchmark circuits. Table 4 shows the number of candidate sites for two-line bridges after using (1) the activation condition, (2) the back-cone condition, (3) the observation condition, and (4) the condition that at least one fault type from Table 2 must match. For each distribution of two-line bridge sites in the 100 runs, we provide the minimum, the median, the 90th percentile, and the maximum. Table 5 has the same structure but is based on results of first using tester responses from ATPG patterns and then additional tester responses from the random pattern set. Table 6 shows the callout sizes without validation using passing patterns and the callout sizes after using passing patterns. (There is currently no validation of composite bridge faults.) Note that the callout includes bridge site and type. Only if the callout size is one is the diagnosis exact; otherwise, the callout is partial. Due to the setup of the experiment, which used logic simulation to provide tester responses, and the implicit enumeration, which does not force ranking or dropping of candidates, we had no misleading or empty diagnosis.

Figure 12 shows the CPU time vs. circuit size. We measured the CPU time for a diagnosis based on failing patterns only. The CPU times shown here are indicative of how the algorithm scales with circuit size. They show that the problem scales linearly with the circuit size. (Of course, the number of failing patterns increases runtimes as well.) We believe that the CPU times themselves can be vastly improved by moving the prototype software from the (object-oriented) scripting language Python to either C++ or Java.

Table 4. Number of possible two-line bridge sites using only ATPG patterns.
(Each column group gives Min / Median / 90th Perc. / Max over the 100 runs.)

  Name    Activation Condition    Back-cone condition    Observation Condition    Possible Sites
  C432    65/1126/4956/9900       24/1079/4956/9900      1/167/2979/9900          1/74/2358/9900
  C499    74/141/633/4851         49/115/622/4851        1/6/127/3159             1/5/127/3143
  C880    78/409/8639/41k         1/160/8639/41k         1/7/4731/41k             1/6/4323/41k
  C1355   136/307/1013/39k        70/97/948/38k          1/4/178/39k              1/4/111/39k
  C1908   233/891/8140/47k        68/824/8140/47k        1/12/4865/47k            1/9/1929/47k
  C2670   185/482/29k/194k        1/166/29k/194k         1/11/16k/194k            1/7/9258/194k
  C3540   350/11k/76k/307k        3/11k/76k/307k         1/102/67k/307k           1/62/51k/307k
  C5315   333/3276/43k/717k       1/990/27k/717k         1/7/8737/717k            1/6/1722/717k
  C6288   803/2378/29k/125k       4/1803/26k/125k        1/3/369/78k              1/3/245/76k
  C7552   541/1504/40k/680k       1/810/35k/680k         1/7/14k/680k             1/6/2005/240k

Table 5. Number of possible bridge sites using ATPG and random test patterns.
(Same structure as Table 4: Min / Median / 90th Perc. / Max per condition.)

  Name    Activation Condition    Back-cone condition    Observation Condition    Possible Sites
  C432    1/21/317/8038           1/18/313/8038          1/9/240/7954             1/9/240/7954
  C499    1/4/62/478              1/4/56/476             1/3/14/400               1/3/14/400
  C880    1/4/404/41k             1/3/271/41k            1/2/156/41k              1/2/143/41k
  C1355   1/4/57/1721             1/4/25/1655            1/2/16/1346              1/2/16/1346
  C1908   1/7/1580/47k            1/6/1579/47k           1/6/1245/47k             1/6/1245/47k
  C2670   1/6/1917/56k            1/6/1876/56k           1/5/1332/56k             1/5/1332/56k
  C3540   1/15/15k/158k           1/14/16k/158k          1/9/15k/157k             1/8/15k/157k
  C5315   1/4/82/79k              1/2/57/53k             1/2/29/18k               1/2/25/18k
  C6288   1/2/18/1934             1/2/16/1922            1/2/9/1748               1/2/9/1481
  C7552   1/5/218/240k            1/4/180/240k           1/4/133/240k             1/4/133/240k

Table 6. Size of the callout (by site and type) before and after using passing patterns.
(Min / Median / 90th Perc. / Max over the 100 runs.)

  Name    Only using failing patterns    Using also passing patterns
  C432    1/8/106/3446                   1/2/12/157
  C499    1/5/62/278                     1/2/6/30
  C880    1/3/128/9302                   1/1/10/149
  C1355   1/3/28/685                     1/2/5/40
  C1908   1/9/779/20k                    1/1/21/559
  C2670   1/3/244/10k                    1/1/34/532
  C3540   1/9/446/13k                    1/2/12/834
  C5315   1/2/22/1222                    1/1/10/45
  C6288   1/2/13/148                     1/1/5/15
  C7552   1/3/260/15k                    1/1/8/84

(Plot: CPU Time [s], roughly 0 to 60 s, versus Circuit Size [Nodes], 0 to 3000; Median and 90th Percentile points with linear fits.)

Figure 12. CPU times increase (linearly) with circuit size.

5. Conclusions

In this paper we presented an efficient algorithm to identify two-line bridge participants and to distinguish different bridge fault types. Such information could be useful in determining the root cause of the circuit malfunction detected on the tester. An important attribute of the proposed approach is that it does not require a "neighbor" list from layout. This means that the diagnosis can be performed without a costly layout extraction. Furthermore, no time- or storage-intensive dictionary computation is required. For most demonstrated cases, the problem of bridge identification was quite tractable when just the netlist and tester responses were available. We have provided the details for a diagnosis algorithm that aims at (1) identifying both nodes involved in the two-line bridge and (2) identifying the type of bridge fault that captures the logic-level misbehavior caused by a defect. It also provides insights into the accuracy of the callout. We are exploring the capabilities of the core algorithm and data structure to include more accurate fault simulation, additional fault types, and validation strategies.

Acknowledgements

The authors would like to gratefully acknowledge Amit Goel and the members of the CMU test group, Rao Desineni, Sunil Motaparti, Sounil Biswas, Jason Brown, and especially Kumar Dwarakanath and Thomas Zanon, for their valuable contributions and exciting discussions. This research was funded by the GSRC.


References

[1] W. Maly et al., "Deformations of IC Structure in Test and Yield Learning," Proc. of IEEE International Test Conf., 2003.
[2] R. Desineni et al., "A Multi-Stage Approach to Fault Identification Using Fault Tuples," ISTFA, 2003.
[3] S.D. Millman, E.J. McCluskey, and J.M. Acken, "Diagnosing CMOS Bridging Faults with Stuck-at Fault Dictionaries," Proc. of IEEE International Test Conference, 1990, pp. 860-870.
[4] D.B. Lavo et al., "Bridging Fault Diagnosis in the Absence of Physical Information," Proc. of IEEE International Test Conference, 1997, pp. 887-893.
[5] B. Chess, D.B. Lavo, and F.J. Ferguson, "Diagnosis of Realistic Bridging Faults with Single Stuck-at Information," Proc. of IEEE/ACM International Conf. on CAD, 1995, pp. 185-192.
[6] S. Venkataraman and S.B. Drummonds, "POIROT: A Logic Fault Diagnosis Tool and Its Applications," Proc. of IEEE International Test Conference, 2000, pp. 253-262.
[7] S. Venkataraman and S.B. Drummonds, "POIROT: Applications of a Logic Fault Diagnosis Tool," IEEE Design & Test of Computers, Jan.-Feb. 2001, pp. 19-30.
[8] S.B. Drummonds et al., "Bridging the Gap Between Logical Diagnosis and Physical Analysis," IEEE International Workshop on Defect Based Testing (DBT'2002), Apr. 2002.
[9] S. Venkataraman and W.K. Fuchs, "A Deductive Technique for Diagnosis of Bridging Faults," Proc. of IEEE/ACM International Conf. on Computer-Aided Design, 1997, pp. 562-567.
[10] Y. Gong and S. Chakravarty, "Locating Bridging Faults Using Dynamically Computed Stuck-At Fault Dictionaries," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 17, No. 9, Sept. 1998, pp. 876-887.
[11] S. Chakravarty and Y. Gong, "An Algorithm for Diagnosing Two-Line Bridging Faults in CMOS Combinational Circuits," Proc. of the Design Automation Conf., June 1993, pp. 520-524.
[12] S. Chakravarty and M. Liu, "IDDQ Measurement based Diagnosis of Bridging Faults in Combinational Circuits," Journal of Electronic Testing: Theory and Applications, Vol. 3, Dec. 1992, pp. 377-385.
[13] S.D. Millman and J.M. Acken, "Special Applications of the Voting Model for Bridging Faults," IEEE Journal of Solid-State Circuits, Vol. 29, No. 3, March 1994, pp. 263-270.
[14] P. Maxwell and R. Aitken, "Biased Voting: A Method for Simulating CMOS Bridging Faults in the Presence of Variable Gate Logic Thresholds," Proc. of International Test Conference, 1993, pp. 63-72.
[15] M. Abramovici, M.A. Breuer, and A.D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, Piscataway, NJ, 1990.
[16] F. Brglez and H. Fujiwara, "A Neutral Netlist of 10 Combinational Benchmark Designs and a Special Translator in Fortran," International Symposium on Circuits and Systems, June 1985, pp. 695-698.
[17] K.N. Dwarakanath and R.D. Blanton, "Universal Fault Simulation Using Fault Tuples," Proc. of the 37th ACM/IEEE Conf. on Design Automation, June 2000, pp. 786-789.
[18] K.N. Dwarakanath, Fault Tuples: Theory and Applications, Ph.D. Thesis, Carnegie Mellon University, 2003.
[19] H.K. Lee and D.S. Ha, On the Generation of Test Patterns for Combinational Circuits, Technical Report No. 12_93, Dept. of Electrical Eng., Virginia Polytechnic Institute and State University.