
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 32, NO. 1, JANUARY 2013

Counter-Based Output Selection for Test Response Compaction

Wei-Cheng Lien, Student Member, IEEE, Kuen-Jong Lee, Member, IEEE, Tong-Yu Hsieh, Member, IEEE, Krishnendu Chakrabarty, Fellow, IEEE, and Yu-Hua Wu

Abstract—Output selection is a recently proposed test response compaction method, where only a subset of output response bits is selected for observation. It can achieve zero aliasing, full X-tolerance, and high diagnosability. One critical issue for output selection is how to implement the selection hardware. In this paper, we present a counter-based output selection scheme that employs only a counter and a multiplexer, hence involving very small area overhead and simple test control. The proposed scheme is ATPG-independent and thus can easily be incorporated into a typical design flow. Two efficient output selection algorithms are presented to determine the desired output responses, one using a single counter operation for simpler test control and the other using more counter operations for achieving a better test-response reduction ratio. Experimental results show that for stuck-at faults in large ISCAS’89 and ITC’99 benchmark circuits, 48%∼90% reduction ratios on test responses can be achieved with only one counter and one multiplexer employed. Even better results, i.e., 76%∼95% reductions, can be obtained for transition faults. It is also shown that the diagnostic resolution of this method is almost the same as that achieved by observing all output responses.

Index Terms—Fault diagnosis, output selection, test compression, test response compaction.

I. Introduction

Test response compaction techniques are widely used to reduce test data volume for scan-based designs [1]. These techniques can generally be categorized as being either space compaction, time compaction, or a combination of both [1]. It is common practice to use XOR-gate-based networks for space compaction and linear feedback shift register-based multiple input signature registers (MISRs) for time compaction [1]. However, these schemes suffer from the problems of

Manuscript received December 28, 2011; revised March 26, 2012 and June 8, 2012; accepted July 24, 2012. Date of current version December 19, 2012. This work was supported in part by the National Science Council of Taiwan under Contracts NSC100-2221-E-006-058-MY2 and NSC 100-2218E-110-001, and the MediaTek Fellowship. This paper was recommended by Associate Editor L.-C. Wang. W.-C. Lien and K.-J. Lee are with the Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Taiwan (e-mail: [email protected]; [email protected]). T.-Y. Hsieh is with the Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan (e-mail: [email protected]). K. Chakrabarty is with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27710 USA (e-mail: [email protected]). Y.-H. Wu is with MStar Semiconductor, Inc., Hsinshu 302, Taiwan (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCAD.2012.2214479

aliasing, unknown response (X) values, and poor diagnosability [1]. Many innovative techniques, such as zero aliasing [2]–[8], X-blocking [9], [10], X-masking [11]–[15], X-impact [16], and X-compact/X-filter [8], [17], have been proposed to address these problems. These techniques are mainly focused on masking out the effects of unknown values. Thus, when the number of unknowns increases, more control logic or control data may be needed. Also, since these techniques still require XOR operations, most diagnosis information cannot be preserved. Recently, an alternative method called output bit selection, or simply output selection, has been proposed for test response compaction [18]. Rather than trying to mask out undesired response bits, this method attempts to identify a subset of desired output response bits such that by observing only these bits, the same fault coverage as observing all output bits can still be achieved. Since all desired output bits will be observed directly without any XOR operations, no aliasing of errors can occur. Also, since one can always select deterministic output bits and ignore all unknown bits, this method can tolerate any number of unknown values as long as the selected output bits can provide sufficient fault coverage. In [18], efficient output selection algorithms are proposed to determine the desired output bits for a given test set. Based on the assumption that one can select any set of desired bits and discard any undesired bits, the achievable compaction ratio is analyzed. It is shown that for most ISCAS benchmark circuits, less than 10% of the output response bits of an already highly compacted test set can achieve 100% single stuck-at fault coverage. Even better results are obtained for the large ITC’99 circuits, as less than 3% of the output bits are sufficient to achieve complete fault coverage for all these circuits. 
It is also shown that the increased ratio of selected bits to cover other types of faults is quite small if these faults are taken into account during test generation. Another major advantage is that most diagnosis information can be preserved. In general, the diagnosis resolution loss is less than 0.02%. Despite its numerous benefits, a drawback of [18] is that it does not address hardware implementation issues for the selection logic. The implementation complexity of the output selection logic depends on the test architectures employed. For the random access scan (RAS) test architecture [19]–[22] that allows one to select any desired bits, very little additional hardware is required since the controller and the row or column decoders in an RAS architecture can be reused as the selection logic.

0278-0070/$31.00 © 2012 IEEE

LIEN et al.: COUNTER-BASED OUTPUT SELECTION FOR TEST RESPONSE COMPACTION

The RAS test architecture itself, however, requires very high area overhead. For the serial scan designs that are commonly used in the industry, some additional selection-specific logic is needed. In [23] and [24], a scan-chain-based output selection method for serial scan designs is proposed. A subset of scan chains is selected for observation using a scalable selector that consists of a multiplexer-based device and a sophisticated control circuit. When a scan chain is selected for observation, all bits in this scan chain will be scanned out. A custom test generation flow is, therefore, used to generate test patterns for which fault effects are propagated to the preselected scan chains that will be observed. This technique can, therefore, be considered a special implementation instance of the output selection method.

In this paper, we present a counter-based output selection scheme for serial scan designs. Unlike the chain-based method in [23] and [24], we deal with output selection at the bit level, i.e., we allow the selection of output bits from different scan chains during the scan-out operation of each test vector. Because of this freedom, we are able to develop efficient algorithms to select all required bits to achieve full fault coverage without modifying the ATPG algorithm. On the other hand, unlike the algorithms developed in [18] that are allowed to select any required bits, our algorithms here have to consider the selection limitation based on counter operations. We have developed two new selection algorithms that are suitable for the counter-based selection logic.

In summary, compared to [18], [23], and [24], the contributions of the proposed output selection scheme are as follows.
1) Only one counter and one multiplexer are employed for output selection. Thus, the hardware overhead is very small and the control complexity is quite low.
2) The proposed scheme can be applied to any pregenerated test set, and thus there is no need to modify the ATPG tool.
3) Two efficient output selection algorithms are developed to determine the output response bits to be observed and their associated counter operations. These two algorithms are considerably different from those in [18]. Experimental results show that significant output data reduction can be achieved even if only a simple counter is used for the output selection.
4) The diagnosis resolution using the counter-based schemes is even better than that of [18].
5) In addition to stuck-at faults, in this paper we also consider transition faults. Experimental results show that our method can deal with these time-related faults even more effectively than the stuck-at faults in terms of the reduction ratio of output bits that need to be observed.
6) When both transition and stuck-at faults need to be considered at the same time, we show that the relative increase in response bits required to detect those transition faults that cannot be detected by a stuck-at test set is smaller than the relative increase in test patterns required to detect these faults.

The first output selection algorithm is a static method by which the output bits to be observed are determined based on only a single counter operation and thus requires very small


area overhead and very simple control. The second one is a dynamic method that employs multiple counter operations for output selection, and it can achieve a better output reduction ratio. Both algorithms start with a given set of test patterns that were generated without any output selection consideration. We then try to identify as few output bits as possible that can still achieve the same fault coverage as the original test method. Note that the output bits that can be observed include those at the primary outputs as well as those at the pseudoprimary outputs (scan cells).

Experimental results for ISCAS’89 and ITC’99 benchmark circuits show that 48%∼81% reduction ratios on test responses of an already very compact test set for all detectable stuck-at faults can be achieved by the static method. The test control for this method is quite simple, as only one counter operation is employed for output selection. On average, the required area overhead is 0.31%∼1.08%. By using the dynamic method the reduction ratios can be further increased to 77%∼90% at the expense of more complicated test control and thus a larger area overhead. Nevertheless, the overhead is still quite small, ranging between 0.39% and 1.29%. We also apply our output bit selection scheme to transition faults and find that even better results can be obtained. On average, over 92% test response reduction can be achieved by using the dynamic method. As for diagnosis effectiveness, it is shown that the difference between the diagnosis resolution of the dynamic method and that achieved by observing all output bits is only 0.00097% on average.

The remainder of this paper is organized as follows. The proposed counter-based output selection logic is described in Section II. Sections III and IV then, respectively, present the static and dynamic output selection algorithms. Experimental results for stuck-at faults are given in Section V.
In Section VI, fault diagnosis results using only the selected bits are compared with the case where all bits are observed. Section VII further examines the effectiveness of our methods when they are applied to transition faults. This paper is concluded in Section VIII.

II. Counter-Based Output Bit Selection

Fig. 1 illustrates a circuit under test (CUT) equipped with the proposed counter-based output selection scheme. The proposed scheme comprises a counter and an n-to-1 multiplexer, where n is the total number of scan chains to be selected for the CUT. The input control sequence is used to initialize the counter and provide the required counter operations. The input control data for the counter can be provided by an external tester or by a simple on-chip device, as described later. In Fig. 1, the CUT contains three scan chains and a total of 12 scan flip-flops. At each scan cycle, the counter generates a control signal for the multiplexer to select an output response bit to observe. This leads to a compaction of an n-bit-wide output response pattern to a 1-bit-wide one.

Under counter-based test control, the key optimization problem in the proposed scheme is how to select as few response bits as possible for observation such that all the testable faults are detected. In this paper, we first present a static output selection algorithm to determine the desired output bits based

Fig. 1. Example of the proposed counter-based output selection logic.

on a single counter operation, such as +1, 0, −1, +2, −2. We use Fig. 1 for illustration. Assume that the selection process uses the counter operation of +1 and starts by selecting the response stored in the scan flip-flop SFF1. The response bits will then be selected in the order of SFF1, SFF5, SFF9, and SFF10 during test application time. Similarly, SFF2, SFF5, SFF8, and SFF11 will be successively selected for observation if the counter holds its value (i.e., count by 0) during the scan operation, and the initial selection bit is in SFF2. Throughout this paper, we call such an ordered set of scan flip-flops an observation path. Formally, an observation path is defined as a set of scan flip-flops whose captured responses for an applied test pattern are observed under the control of a counter sequence during the scan-out process. In particular, we call an observation path a static path if the observation path is formed under a single counter operation. In Fig. 1, three possible static paths exist under the counting-up-by-plus-1 sequence for a pattern applied to the CUT, namely, (SFF1, SFF5, SFF9, SFF10), (SFF2, SFF6, SFF7, SFF11), and (SFF3, SFF4, SFF8, SFF12). If we allow four different counter sequences, say +1, −1, 0, +2, then we can select 12 different static paths from the multiplexer. It is clear that to cover all scan flip-flops for each test pattern, at least n static paths need to be observed, where n is the number of scan chains. With the proposed static output selection algorithm, we will show that the number of static paths to be observed can be significantly reduced. Note also that the chain-based selection method [23], [24] is a special case of our static method corresponding to count by 0, i.e., a 0-value counter sequence.

We next present a dynamic output selection algorithm that employs multiple counter operations, such as +1/−1, +2/−2, +1/0/−1.
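The static path examples for Fig. 1 can be reproduced with a short sketch. It assumes a flip-flop numbering inferred from the examples above (scan slice s, counted from 0, holds SFF(3s+1) to SFF(3s+3), one per chain) and a counter that wraps modulo the number of chains; the function names are illustrative and do not come from the paper. The helper accepts one counter operation per scan cycle, so a static path is simply the case where the same operation is repeated.

```python
def observation_path(start_chain, ops, num_chains=3):
    """Return the 1-based SFF indices observed over len(ops) + 1 scan cycles.

    Assumed layout (inferred from the Fig. 1 examples, not stated explicitly):
    slice s (0-based) holds SFF(s*num_chains + 1) .. SFF(s*num_chains + num_chains),
    and the counter value (0-based chain index) wraps modulo num_chains.
    """
    chain = start_chain % num_chains
    path = [chain + 1]  # bit observed in slice 0
    for slice_idx, op in enumerate(ops, start=1):
        chain = (chain + op) % num_chains  # apply this cycle's counter operation
        path.append(slice_idx * num_chains + chain + 1)
    return path


def static_path(start_chain, op, length=4, num_chains=3):
    """A static path repeats a single counter operation in every scan cycle."""
    return observation_path(start_chain, [op] * (length - 1), num_chains)


if __name__ == "__main__":
    for start in range(3):
        print(static_path(start, +1))  # [1, 5, 9, 10], [2, 6, 7, 11], [3, 4, 8, 12]
    print(static_path(1, 0))           # [2, 5, 8, 11]
```

With three chains and four slices, each counter operation yields three static paths, one per initial counter value.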
Note that in this algorithm the counter operation employed is allowed to be changed at each scan cycle. Since much higher flexibility exists in this method, far fewer observation paths are needed, as will be shown in Section V. In this paper, we refer to an observation path formed by multiple counter operations as a dynamic path. Let us use Fig. 1 for illustration again. Assume that the bits stored in SFF2, SFF6, SFF8, and SFF12 are to be observed. Then, only one dynamic path is needed by using the dynamic algorithm. We can first set the counter to 1 to select SFF2, and then apply a sequence of +1, −1, and +1 to select SFF6, SFF8, and SFF12.

III. Static Output Selection Algorithm

The static output selection algorithm identifies a set of observation paths based on a pregenerated test set and a specific

operation of a counter. This algorithm attempts to select as few static paths as possible from all possible ones such that 100% fault coverage for all detectable faults can be achieved. Before detailing the algorithm, we first define the terms essential fault, essential bit, and essential static path. A fault is called an essential fault if it is detected by only one output bit with the pregenerated test set. An output bit that detects at least one essential fault is called an essential bit, and a static path that contains at least one essential bit is called an essential static path. In order to achieve 100% fault coverage, the static algorithm must select all the essential static paths. Thus, the basic strategy of the static output selection algorithm is to first select all essential static paths and drop all faults detected by these paths, and then select more static paths for all the remaining faults. Note that throughout this paper when a fault is said to be detected by a static path, it means that the path contains at least one output bit that detects the fault.

To facilitate efficient selection of the nonessential static paths, the following attributes will be used to quantify the fitness of these static paths.
1) PDCi: Static path detection count of a fault fi, i.e., the total number of static paths that detect fi.
2) PDCmin,j: The minimum value of PDCi associated with the faults detected by a static path pj.
3) #Fj,min: The total number of faults detected by a static path pj with the static path detection count of PDCmin,j.
4) wi: Weight of a fault fi, which is calculated by wi = 1/PDCi. This definition implies that a higher weight will be assigned to a fault that is harder to detect.
5) W(pj): Weight of a static path pj, which is calculated by summing wi of each fault fi detected by pj. Note that a static path having a larger weight can detect more hard-to-detect faults.
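The attributes listed above follow directly from a table that maps each static path to the faults it detects. The sketch below is a minimal illustration on a made-up four-path table; the names `path_fitness` and `EXAMPLE_PDT`, and the fault labels, are hypothetical and not taken from the paper.

```python
def path_fitness(pdt):
    """Compute PDC_i per fault and (PDC_min_j, #F_j_min, W(p_j)) per static path.

    pdt: dict mapping a path name to the set of faults that path detects
         (a hypothetical, in-memory stand-in for a path detection table).
    """
    # PDC_i: number of static paths that detect fault f_i
    pdc = {}
    for faults in pdt.values():
        for f in faults:
            pdc[f] = pdc.get(f, 0) + 1

    fitness = {}
    for path, faults in pdt.items():
        if not faults:
            continue  # a path detecting nothing has no fitness
        pdc_min = min(pdc[f] for f in faults)                # PDC_min,j
        n_min = sum(1 for f in faults if pdc[f] == pdc_min)  # #F_j,min
        weight = sum(1.0 / pdc[f] for f in faults)           # W(p_j) = sum of w_i
        fitness[path] = (pdc_min, n_min, weight)
    return pdc, fitness


# Illustrative data only: every fault is detected by exactly two paths.
EXAMPLE_PDT = {
    "p1": {"f1", "f2"},
    "p2": {"f2", "f3"},
    "p3": {"f1", "f3", "f4"},
    "p4": {"f4"},
}

if __name__ == "__main__":
    pdc, fitness = path_fitness(EXAMPLE_PDT)
    for path, attrs in fitness.items():
        print(path, attrs)
```

In this example all paths share PDCmin = 2, so p3 would rank first because it detects three such faults.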
Prior to selection, all nonessential static paths will be first sorted in the increasing order of their values of PDCmin,j. This facilitates selection of the static path that detects the hardest-to-detect faults, denoted by Fhd, i.e., the faults having the smallest value of static path detection count. Note that because the essential faults have already been dropped, multiple static paths can detect a fault in Fhd. To reduce the number of observed static paths, we select the static path that detects the largest number of faults in Fhd, i.e., the path having the largest value of #Fj,min. We thus sort the static paths that have the same values of PDCmin,j in the decreasing order of their values of #Fj,min. Static paths having the same values of both PDCmin,j and #Fj,min are sorted in the decreasing order of their W(pj) values, and the one with the largest W(pj) value will be selected. Since a static path having a larger weight is likely to detect harder-to-detect faults, this sorting criterion can lead to the reduction of the total number of selected static paths. Also note that the static paths detecting no new undetected faults will not be taken into account.

The static algorithm is described as follows with its pseudocode shown in Algorithm 1.
1) (Lines 1, 2) A fault dictionary is generated based on a list of testable faults F and a pregenerated test set T.


Algorithm 1 Static output selection

Input:  1. A test set T
        2. The set of all testable faults F detected by T
        3. A specified counter operation to be employed
Output: A set of selected observation static paths, PS
 1 Generate a fault dictionary based on F and T, and form the set of all possible static paths, P, based on the specified counter operation
 2 Use the fault dictionary to build a static path detection table based on P
 3 Identify all essential static paths in P, and move them from P to PS
 4 Drop all faults detected by the essential static paths from F
 5 if F = ∅ then Exit
 6 else
 7   Sort remaining static paths in P
 8   while F ≠ ∅ do
 9     pbest ← the first sorted static path
10     Add pbest to PS and remove pbest from P
11     Drop all faults detected by pbest from F
12     Update the order of static paths in P
   endwhile
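Algorithm 1 can be sketched in a few lines once the path detection table is available. The version below is an illustration under stated assumptions, not the authors' implementation: the table is a plain dictionary, detection counts are recomputed over the remaining paths and the still-undetected faults in every iteration, and ties are broken by dictionary order. The name `select_static_paths` and the example data are hypothetical.

```python
def select_static_paths(pdt):
    """Greedy sketch of Algorithm 1.

    pdt: dict mapping a static path name to the set of faults it detects.
    Returns the list of selected paths (essential paths first).
    """
    faults = set().union(*pdt.values())              # all faults to cover
    pool = {p: set(fs) for p, fs in pdt.items()}     # candidate set P

    # Lines 3-4: a fault seen by exactly one path makes that path essential.
    pdc = {f: sum(f in fs for fs in pool.values()) for f in faults}
    selected = [p for p, fs in pool.items() if any(pdc[f] == 1 for f in fs)]
    for p in selected:
        faults -= pool.pop(p)

    # Lines 6-12: repeatedly take the best-ranked remaining path.
    while faults:
        # Only paths that still detect an undetected fault are considered.
        live = {p: fs & faults for p, fs in pool.items() if fs & faults}
        pdc = {f: sum(f in fs for fs in live.values()) for f in faults}

        def rank(p):
            fs = live[p]
            pdc_min = min(pdc[f] for f in fs)
            n_min = sum(pdc[f] == pdc_min for f in fs)
            weight = sum(1.0 / pdc[f] for f in fs)
            # increasing PDC_min, then decreasing #F_min, then decreasing W
            return (pdc_min, -n_min, -weight)

        best = min(live, key=rank)
        selected.append(best)
        faults -= pool.pop(best)
    return selected


# Illustrative table: f1 is detected only by p1, so p1 is essential.
EXAMPLE_TABLE = {
    "p1": {"f1", "f2"},
    "p2": {"f2", "f3"},
    "p3": {"f3", "f4"},
    "p4": {"f4", "f5"},
    "p5": {"f3", "f5"},
}

if __name__ == "__main__":
    print(select_static_paths(EXAMPLE_TABLE))
```

Recomputing the ranking from scratch in each iteration keeps the sketch short; an incremental update of only the affected paths would be the natural optimization.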

A path detection table (PDT), which records the fault(s) detected by each static path, is then built from the fault dictionary. The set of all possible static paths for T, referred to as P, can be determined based on the initial state of the counter and the specified counter operation for each applied pattern in T.
2) (Lines 3–5) According to the constructed PDT, all essential static paths are identified and selected. After dropping all faults detected by the essential static paths, we check if any undetected fault exists. If not, the algorithm is complete. Otherwise, more static paths need to be selected.
3) (Lines 6–12) The remaining static paths in P are sorted based on their values of PDCmin,j, #Fj,min, and W(pj). The first static path is then selected as the best path pbest, added to the final set of the selected observation paths PS, and removed from P. All faults detected by pbest are dropped from F, and the order of the remaining static paths in P is then updated. This procedure is repeated until all faults are detected.

The required runtime for the static algorithm is dominated by the processes for constructing the needed fault dictionary (Lines 1 and 2) and for updating the static path set (Lines 8–12). Let #OL and #SC be the total numbers of output lines and scan chains, respectively. The time complexity of the fault dictionary construction process is O(|T| × #OL × |F|). The total number of paths to be updated (Line 12) in each iteration is |P| × (|P|−1)/2 in the worst case, where |P| = |T| × #SC, and the time complexity to update the fitness for one path in P is O(|F|). Thus, the worst case time complexity of the static algorithm is O(|P|^2 × |F| + |T| × #OL × |F|). Although this time complexity appears to be high, in practice the runtime

Fig. 2. Construction of a static path detection table PDT.

required for the static algorithm is quite small. In fact, a large portion of the runtime for our algorithm is consumed in the initialization step, i.e., constructing and accessing the fault dictionary. The runtime for selecting the static paths is relatively small. Experimental results on large ISCAS’89 and ITC’99 benchmark circuits, presented later, show that only seconds or minutes are required to determine the observation paths of these circuits.

The construction of an example PDT is illustrated in Fig. 2, where we assume that three static paths (p1, p2, and p3) exist for a pattern. The left table shows a fault dictionary, based on which a PDT can be constructed as shown in the right table. For simplicity, we only show the data for one pattern (TP1). Clearly, the memory usage required by the proposed algorithm is dominated by the size of the constructed static path detection table PDT. In our implementation, we only keep a list of detected faults for each output bit [18]. Thus, the required storage space for the PDT is very small compared to the fault dictionary. The worst-case space complexity is O(|P| × |F|), or equivalently O(|T| × #SC × |F|). For the largest circuit under consideration, i.e., b19, which contains 1 374 217 uncollapsed or 528 548 collapsed testable stuck-at faults, the memory requirement is about 630 MB.

After executing Algorithm 1, we perform a reverse compaction routine to further reduce the number of selected observation paths in PS, i.e., all the generated paths in PS are simulated in the reverse order that they were generated, and paths detecting no new faults are dropped.

To demonstrate the effectiveness of the static output selection algorithm, the 12 largest ISCAS’89 and ITC’99 benchmark circuits are employed in our experiments. For each circuit, a full scan design is implemented using 8, 16, 32, and 64 scan chains (#SC = 8, 16, 32, and 64), respectively.
The numbers of selected static paths and the resulting output compaction ratios for each scan design of each circuit are shown in Table I. In these experiments, the counter operation of +1 is employed for output selection. The notations #IN, #OL, and #TP in Table I represent the total numbers of input lines, output lines, and test patterns for 100% fault coverage, respectively. The total number of testable single stuck-at faults in a circuit is shown in the parentheses below the circuit name. Column |PS | shows the total number of selected static paths along with the number of essential static paths in parentheses. As indicated, the number of static paths to be observed is much smaller than the total number of static paths, i.e., #TP ×#SC. The resulting reduction ratio of the total number of observed static paths, i.e., (1−|PS |/(#TP ×#SC))×100%, is shown in column RD%. On average, the reduction ratio is 48.70%, 61.14%, 72.87%,


TABLE I Output Compaction Results of the Proposed Static Output Selection Algorithm Using +1 Counter Operation

and 81.93% for #SC = 8, 16, 32, and 64, respectively. It can be seen that the reduction ratio becomes more significant as the number of scan chains increases. We find that the s35932 circuit has a much lower reduction ratio than other circuits. This is due to the fact that s35932 is composed of multiple identical submodules, which makes the effects of essential faults quite likely to be propagated to the scan flip-flops of the same scan slice. As a result, more static paths are needed in order to observe these output bits. When we exclude the results of s35932, the average reduction ratios range from 51.25% to 85.04% for various #SC. For the same reason, we also find that both b18 and b19 contain relatively large fractions of essential static paths for the generated test set, and thus have lower reduction ratios.

The required area overhead (counter and multiplexer) for the static output selection scheme is also reported in Table I (see Column A%). The area of a counter is estimated by using a commercial synthesis tool based on the 130-nm TSMC technology library. As for the multiplexer, a cost-effective multiplexer architecture in [31] is employed. As indicated, the overhead for the proposed architecture is quite small. On average, only 0.31%, 0.47%, 0.71%, and 1.08% area overheads are needed when #SC = 8, 16, 32, and 64, respectively.

Since only one counter operation is employed for output selection, control bits are required only for initializing the counter for each static path to be observed. The total number of required control bits to carry out the output selection process can thus be calculated by |PS| × ⌈log2 #SC⌉. The column C% in Table I indicates the ratio of the control bits over that of the test set, which is calculated by

C% = (|PS| × ⌈log2 #SC⌉) / (#TP × #IN) × 100%.
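As a worked illustration of the control-bit formula, the sketch below computes the number of counter-initialization bits and the resulting C% for one made-up configuration. All four input values are hypothetical and are not taken from Table I; the function name is illustrative, and rounding log2 #SC up to a whole number of bits is an assumption that makes no difference for the power-of-two chain counts used in the experiments.

```python
def control_bit_overhead(num_selected_paths, num_scan_chains, num_patterns, num_inputs):
    """Return (control bits, C%) for the static scheme.

    Each selected static path needs one counter initialization of
    ceil(log2(#SC)) bits; C% relates that total to the #TP x #IN input bits.
    """
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding
    bits_per_path = (num_scan_chains - 1).bit_length()
    control_bits = num_selected_paths * bits_per_path
    ratio = 100.0 * control_bits / (num_patterns * num_inputs)
    return control_bits, ratio


if __name__ == "__main__":
    # Hypothetical numbers: 500 selected paths, 8 chains, 250 patterns, 700 inputs
    bits, c_percent = control_bit_overhead(500, 8, 250, 700)
    print(bits, f"{c_percent:.2f}%")  # 1500 control bits, 0.86%
```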

TABLE II
Numbers of Observed Static Paths per Pattern for the Static Output Selection Algorithm Using +1 Counter Operation

                              #Static Paths per Pattern
CKT                  #TP    #SC=8         #SC=16        #SC=32        #SC=64
s13207               248    1.98          2.50          3.06          3.73
s15850               126    2.88          4.09          5.60          7.32
s35932                32    6.34          11.53         19.94         33.47
s38417               101    4.75          7.05          9.93          14.06
s38584               146    5.01          7.66          11.41         15.67
b14                  685    2.48          3.38          2.98          3.16
b15                  448    2.93          3.91          4.54          4.38
b17                 1053    3.31          4.57          5.81          7.39
b18                 1142    5.82          8.91          12.71         16.50
b19                 1333    6.61          10.88         16.32         22.27
b20                  726    3.54          4.99          5.86          5.29
b21                  734    3.60          5.12          6.00          5.54
Avg. (w/o s35932)           4.10 (3.90)   6.22 (5.73)   8.68 (7.66)   11.57 (9.57)
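Since RD% = (1 − |PS|/(#TP × #SC)) × 100% and |PS|/#TP is the per-pattern path count listed in Table II, the average reduction ratios quoted for the static method can be cross-checked from Table II alone. The sketch below does this arithmetic; the variable and function names are illustrative, and small deviations are expected because the table entries are rounded to two decimals.

```python
# Per-pattern static path counts from Table II, one list per #SC,
# in the circuit order s13207 ... b21.
PATHS_PER_PATTERN = {
    8:  [1.98, 2.88, 6.34, 4.75, 5.01, 2.48, 2.93, 3.31, 5.82, 6.61, 3.54, 3.60],
    16: [2.50, 4.09, 11.53, 7.05, 7.66, 3.38, 3.91, 4.57, 8.91, 10.88, 4.99, 5.12],
    32: [3.06, 5.60, 19.94, 9.93, 11.41, 2.98, 4.54, 5.81, 12.71, 16.32, 5.86, 6.00],
    64: [3.73, 7.32, 33.47, 14.06, 15.67, 3.16, 4.38, 7.39, 16.50, 22.27, 5.29, 5.54],
}


def average_reduction(num_chains):
    """Average RD% over the circuits: mean of (1 - paths_per_pattern / #SC) * 100."""
    column = PATHS_PER_PATTERN[num_chains]
    mean_paths = sum(column) / len(column)
    return 100.0 * (1.0 - mean_paths / num_chains)


if __name__ == "__main__":
    for sc in (8, 16, 32, 64):
        # Should land close to the reported 48.70, 61.14, 72.87, and 81.93
        print(sc, f"{average_reduction(sc):.2f}%")
```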

It can be seen that the average increases on the input data volume due to the additional control bits are 1.28%, 2.45%, 3.82%, and 5.50% when #SC = 8, 16, 32, and 64, respectively. Table I also shows the total CPU time required for the proposed static selection algorithm in column T (with the unit of seconds) and the actual run-time for determining the static paths to be observed (in parentheses), i.e., the time excluding the initialization time. As can be seen, the proposed method is quite efficient. For b19, a total of only 161 s are required when #SC=64, among which only 69 s are consumed by the static path-selection process. In Table II, we show the average number of static paths to be observed for a test pattern. On average, the numbers are 4.10,


TABLE III Output Compaction Results of the Proposed Static Output Selection Algorithm Using Various Single Counter Operations

6.22, 8.68, and 11.57 for #SC = 8, 16, 32, and 64, respectively. These results show that observing multiple static paths for a pattern is generally required in order to achieve 100% fault coverage. This is inevitable when two output bits on the same scan slice need to be selected simultaneously. Later, we will show that the dynamic selection algorithm can greatly reduce these numbers.

In Table III, we compare the results of different single counter operations, including +1, 0, −1, +2, and −2 for #SC = 32. The best results for each circuit are highlighted. As indicated, using the 0-counting counter sequence can lead to the minimal number of selected static paths for most circuits. This is because, in general, the fault effects for an applied pattern are likely to be captured by only a few chains of a circuit, which coincides with the analysis results of [25]. However, in many cases, we still observe that not all response bits in the selected chains are the desired ones. This implies that by allowing higher flexibility for output selection, it is possible to use fewer paths to observe the desired bits. In the next section, we present the dynamic output selection algorithm that can further (and significantly) reduce the number of observed paths.

IV. Dynamic Output Selection Algorithm

For the static selection algorithm, only one specific operation of the counter is used throughout the whole selection procedure, thus each output bit is covered by only one static path. In this section, we describe the dynamic output selection algorithm that employs multiple counter operations for output selection. Because multiple dynamic paths can be selected for each output bit, a more efficient set of dynamic paths can be determined to cover all the desired output bits, i.e., the total number of required dynamic paths can be made even smaller.

The basic idea of the dynamic selection algorithm is to first select the desired output bits and then to determine appropriate dynamic paths to cover these bits. All the essential bits are first identified, and appropriate dynamic paths are determined to cover these bits. Then, all faults detected by essential bits are dropped. If no faults remain undetected at this point, the algorithm is complete. Otherwise, more output bits are selected using a mechanism similar to that employed in the static algorithm based on the following attributes.
1) BDCi: Bit detection count of a fault fi, i.e., the total number of output response bits that detect fi.
2) BDCmin,j: The minimum value of BDCi associated with the faults detected by an output response bit bj.
3) #Fj,min: The total number of faults detected by an output response bit bj with the bit detection count of BDCmin,j.
4) wi: Weight of a fault, which is calculated by wi = 1/BDCi.
5) W(bj): Weight of an output response bit bj, which is calculated by summing wi of each fault detected by bj.

All the unselected bits are first sorted using a procedure similar to that used in the static algorithm, with BDCmin,j being the first sorting criterion, #Fj,min the second, and W(bj) the third. Then, we select the first bit from the sorted list, determine an appropriate dynamic path to cover this bit, drop the faults detected by this bit, and update the order of the remaining bits. This procedure continues until all faults are detected.

The pseudocode of the dynamic output selection algorithm is given in Algorithm 2.
1) (Lines 1–3) A fault dictionary based on the testable fault list F and a pregenerated test set T is built. We then create an output-bit detection table (ODT) that records the fault(s) detected by each output response bit for T. We denote the set of all output bits by B. According to the ODT, all essential bits are identified, dropped from B, and added to a set EB.
2) (Lines 4–13) We next determine appropriate dynamic paths to cover all identified essential bits. Initially, no dynamic paths exist, and thus we create a new dynamic path that contains one essential bit b in EB, and drop this bit from EB. Since this dynamic path contains only one output bit so far, it still has much flexibility to include more output bits. We refer to a dynamic path whose number of included bits is less than the scan chain length as a partial dynamic path. For example, in the circuit shown in Fig. 1, {SFF1, SFF11} is a partial dynamic path where two more flip-flops can still be included or added. For each remaining bit in EB, we then check its compatibility with the existing partial dynamic paths so far and

158

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 32, NO. 1, JANUARY 2013

Algorithm 2 Dynamic output selection

Input: 1. A test set T 2. A set of all testable faults, F detected by T 3. Specified counter operations to be employed Output: a set of selected observation dynamic paths, PS 1 Generate a fault dictionary based on F and T 2 Use the fault dictionary to build an output-bit detection table based on the set of all output response bits, B 3 Identify all essential bits in B, drop them from B and add them to a set EB 4 while EB⫽ then 5 b ← an essential bit in EB. 6 if b is compatible to at least one dynamic path in PS then 7 Select an appropriate dynamic path p among the compatible ones 8 Add b to p 9 else 10 Create a new dynamic path pnew and add b to pnew 11 Add pnew to PS 12 Remove b from EB 13 end while 14 Drop all faults detected by the essential bits from F 15 if F =  then Exit 16 else 17 Sort the output bits in B 18 while F ⫽ then 19 bbest ← the first sorted output bit in B 20 if bbest is compatible to at least one dynamic path in PS then 21 Select an appropriate dynamic path p among the compatible ones 22 Add bbest to p 23 else 24 Create a new dynamic path pnew and add bbest to pnew 25 Add pnew to PS 26 Drop all faults detected by bbest from F 27 Remove bbest from B 28 Update the order of the remaining output bits in B endwhile

determine an appropriate dynamic path for the bit to be added to, or create a new dynamic path to cover the bit. 3) (Lines 14 and 15) After determining the dynamic paths for all essential bits in EB, we drop all faults detected by essential bits from F and check if any undetected fault still exists. If no, the algorithm is complete. Otherwise, more bits are selected. 4) (Lines 16–28) The remaining output bits in B are sorted based on their values of BDCmin,j , #Fj,min , and W (bj ). Then, we select the first sorted bit bbest , and determine an appropriate dynamic path in PS or create a new dynamic path to cover this bit. We next drop the faults detected by the selected bit from F, remove the selected bit from B, and update the order of the remaining bits in B. The above procedure is repeated until all faults are detected.
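The bit-selection part of Algorithm 2 (essential bits first, then repeated sorting by BDC_{min,j}, #F_{j,min}, and W(b_j)) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it ignores dynamic-path assignment entirely, the function name `select_output_bits` and the toy table are hypothetical, and two points are assumptions rather than statements from the text: a bit is treated as essential when it is the only detector of some fault, and the sort prefers a small BDC_{min,j}, then a large #F_{j,min}, then a large W(b_j).

```python
from collections import defaultdict


def select_output_bits(odt):
    """Pick output bits until every fault is detected.

    odt: dict mapping an output-bit name to the set of faults it detects
         (a toy output-bit detection table).
    Returns the selected bits in selection order.
    """
    # BDC_i: how many output bits detect fault i.
    bdc = defaultdict(int)
    for faults in odt.values():
        for f in faults:
            bdc[f] += 1

    undetected = set(bdc)

    # Assumed definition: a bit is essential if it is the only detector
    # of at least one fault (BDC_i == 1).
    selected = [b for b, faults in odt.items()
                if any(bdc[f] == 1 for f in faults)]
    for b in selected:
        undetected -= odt[b]
    candidates = set(odt) - set(selected)

    def sort_key(b):
        live = odt[b] & undetected           # faults this bit would still add
        bdc_min = min(bdc[f] for f in live)
        f_min = sum(1 for f in live if bdc[f] == bdc_min)
        weight = sum(1.0 / bdc[f] for f in live)
        # Assumed ordering: small BDC_min first, then large #F_min,
        # then large W(b); the bit name breaks any remaining tie.
        return (bdc_min, -f_min, -weight, b)

    while undetected:
        useful = [b for b in candidates if odt[b] & undetected]
        best = min(useful, key=sort_key)     # order is recomputed after each drop
        selected.append(best)
        candidates.remove(best)
        undetected -= odt[best]
    return selected


# Hypothetical toy table: b0 and b3 are the sole detectors of f1 and f5.
toy_odt = {
    "b0": {"f1", "f2"},
    "b1": {"f2", "f3"},
    "b2": {"f3", "f4"},
    "b3": {"f4", "f5"},
}
print(select_output_bits(toy_odt))  # ['b0', 'b3', 'b1']
```

In the toy table, the two essential bits leave only f3 undetected, and the name-based tie-break picks b1 over b2. Recomputing the key inside the loop mirrors Line 28 of Algorithm 2 in the simplest possible way; an incremental update would be needed for tables of realistic size.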

Fig. 3. Example of depth-first-search algorithm for the partial static path {SFF1, SFF11} with the counter operations of +1/−1.

Finally, all selected dynamic paths are contained in PS.

Similar to the static algorithm, the time complexity of the dynamic algorithm is dominated by the processes of constructing the needed database and updating the response bit set (Lines 18–28). The bit-set updating process requires a total of |B|×(|B|−1)/2 bit updates when all remaining bits of B are updated in each iteration, where |B| = |T|×#OL, and updating the fitness of one bit in B takes O(|F|) time. Thus, the time complexity is O(|B|^2×|F|) = O(|T|^2×(#OL)^2×|F|). Again, the number of output bits to be updated is actually much smaller than |B|×(|B|−1)/2. As will be shown later, our dynamic procedure can also deal efficiently with the considered benchmark circuits.

In the dynamic algorithm, the required memory usage is dominated by the size of the constructed output-bit detection table. Again, we keep only a list of detected faults for each output bit, and thus the required storage space is very small compared to the fault dictionary. The worst-case space complexity is O(|B|×|F|), or equivalently, O(|T|×#OL×|F|). For b19, the memory space is about 1.38 GB.

The dynamic path filling process in the dynamic output selection algorithm, i.e., adding an output bit to an existing partial dynamic path, involves two steps: 1) checking the compatibility between an output bit and a dynamic path, and 2) determining an appropriate dynamic path among the compatible ones to which the bit is added, as explained in the following.

A bit is said to be compatible with a partial dynamic path if the response bit is observable along the path under the allowed counter operations. Consider Fig. 1 for illustration. Assume the current partial dynamic path contains {SFF1, SFF11}. It can be seen that SFF5, SFF6, SFF7, and SFF9 are compatible with the partial path if the counter operations of −1 and +1 are available.
On the other hand, SFF8 cannot be added to the partial path because SFF8 is incompatible with SFF11 under the counter operations of −1 and +1. In the dynamic output selection algorithm, we carry out an efficient compatibility checking process based on the depth-first-search (DFS) algorithm [26]. We use the DFS algorithm to trace and record every observable output bit along a partial dynamic path based on the existing output bits in the path and the available counter operations. In Fig. 3, we illustrate the application of the DFS algorithm to an existing partial dynamic path {SFF1, SFF11}. The result shows that the output bits SFF5, SFF6, SFF7, and SFF9 (shaded) are observable using a sequence of +1/−1 counter operations and thus are compatible with the current dynamic path. On the other hand, SFF8 is incompatible with the partial path and cannot be added. By keeping a list that records every observable output bit along a partial dynamic path, the compatibility of an output bit with the path can easily be determined.

TABLE IV Output Compaction Results of the Proposed Dynamic Output Selection Algorithm Using +1/−1 Counter Operations

TABLE V Numbers of Observed Dynamic Paths per Pattern

TABLE VI Output Compaction Results of the Proposed Dynamic Output Selection Algorithm Using Triple Counter Operations

As for determining the dynamic path to which an output bit is added, a particular attribute called partial dynamic path flexibility is employed. The flexibility of a partial dynamic path is quantified by the total number of scan flip-flops that are compatible with the path but have not been selected so far. Consider Fig. 1 again. The flexibility of the dynamic path {SFF1, SFF11} is 4 for the counter operations of +1/−1, since four compatible scan flip-flops (SFF5, SFF6, SFF7, and SFF9) exist. In our algorithm, we add an output bit to the compatible partial path that results in the minimal decrease in path flexibility.
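The compatibility check and the flexibility measure described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' code: an output bit is modeled as a hypothetical (shift cycle, scan chain) cell, the counter value selects one chain per cycle, the initial counter value is free, and the counter is assumed to wrap around modulo the number of chains. The cell coordinates do not reproduce the flip-flop numbering of Fig. 1, and the names `observable_cells`, `flexibility`, and `best_path` are invented for this sketch.

```python
def observable_cells(fixed, num_chains, chain_len, ops):
    """Cells (cycle, chain) lying on at least one legal counter sequence
    that passes through every cell already fixed in the partial path.

    fixed: dict {cycle: chain} holding the bits already in the path.
    """
    memo = {}
    cells = set()

    def dfs(cycle, chain):
        if (cycle, chain) in memo:
            return memo[(cycle, chain)]
        if fixed.get(cycle, chain) != chain:
            ok = False                       # a fixed bit pins another chain here
        elif cycle == chain_len - 1:
            ok = True                        # reached the last shift cycle
        else:
            # Explore every operation (no short-circuit) so that all
            # observable cells get recorded.
            results = [dfs(cycle + 1, (chain + op) % num_chains) for op in ops]
            ok = any(results)
        if ok:
            cells.add((cycle, chain))
        memo[(cycle, chain)] = ok
        return ok

    for start in range(num_chains):          # any initial counter value is allowed
        dfs(0, start)
    return cells


def flexibility(fixed, num_chains, chain_len, ops):
    """Number of compatible cells that are not yet part of the path."""
    cells = observable_cells(fixed, num_chains, chain_len, ops)
    return len(cells - set(fixed.items()))


def best_path(paths, bit, num_chains, chain_len, ops):
    """Index of the compatible path whose flexibility drops least when
    `bit` is added, or None if no path is compatible."""
    cycle, chain = bit
    best_index, best_drop = None, None
    for i, fixed in enumerate(paths):
        if cycle in fixed:
            continue                         # this cycle is already occupied
        if bit not in observable_cells(fixed, num_chains, chain_len, ops):
            continue                         # incompatible with this path
        grown = dict(fixed)
        grown[cycle] = chain
        drop = (flexibility(fixed, num_chains, chain_len, ops)
                - flexibility(grown, num_chains, chain_len, ops))
        if best_drop is None or drop < best_drop:
            best_index, best_drop = i, drop
    return best_index


# Hypothetical 4-chain, 4-cycle grid with two bits already in the path.
partial = {0: 0, 2: 2}
print(flexibility(partial, 4, 4, (+1, -1)))  # 4
```

Memoizing on (cycle, chain) keeps the search linear in the number of cells times the number of operations, because whether a cell can be completed to the last cycle does not depend on how it was reached. In this toy grid the flexibility also comes out as 4, but that agreement with the Fig. 1 example should not be read as a reconstruction of the figure.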

At the end of the dynamic selection procedure, some of the generated paths in PS may still be partial, especially the most recently generated ones. Two postprocessing routines are then executed to exploit the remaining flexibility of these dynamic paths and further reduce the number of paths to be observed. The first routine examines the dynamic paths in the reverse order of their generation. For a dynamic path p under examination, we try to add to p as many output bits as possible that are contained in other dynamic paths. During this process, if all output bits of a dynamic path have been added to other paths, that path is dropped. The second routine checks all the selected dynamic paths in increasing order of the number of output bits they contain. For each dynamic path p being checked, we identify each fault f detected only by p and try to fill an unselected alternative detector (output response bit) of f into another partial path. If every fault detected by p can also be detected by other dynamic paths, the path p can be dropped. Note that if augmenting a partial dynamic path is required, we add the output bits that result in the minimal flexibility decrease.

V. Experimental Results

The experimental results of the dynamic output selection algorithm for the 12 ISCAS'89 and ITC'99 benchmark circuits are shown in Tables IV–VII. In Table IV, we show the numbers of selected dynamic paths, with the number of essential paths in parentheses, and the resulting reduction ratios. In these experiments, the counter operations of +1/−1 are used. It can be seen that the selected dynamic paths are efficient. Far fewer paths need to be observed with the dynamic algorithm than with the static one. This leads to much higher reduction ratios (see column RD%). On average, the reduction ratios are 77.01%, 85.23%, 88.71%, and 90.82% for #SC = 8, 16, 32, and 64, respectively. Although the test control becomes more complicated and thus a larger area overhead is required, the overhead is still small, as shown in column A% of Table IV. On average, only 0.39%, 0.58%, 0.88%, and 1.29% overheads are needed when #SC = 8, 16, 32, and 64, respectively.

The total number of control bits required to initialize the counter and to select the output bits for all the dynamic paths can be calculated by

$$|PS| \times \left( \lceil \log_2 \#SC \rceil + \lceil \log_2 \#OP \rceil \times (\mathit{max\_scan\_length} - 1) \right)$$

where #OP is the total number of counter operations employed. The ratio of the number of required control bits to the number of input pattern bits is shown in column C%, which is calculated by

$$\frac{|PS| \times \left( \lceil \log_2 \#SC \rceil + \lceil \log_2 \#OP \rceil \times (\mathit{max\_scan\_length} - 1) \right)}{\#TP \times \#IN} \times 100\%.$$

As indicated, the average increases in the input data volume due to the additional control bits are 25.02%, 16.73%, 13.86%, and 13.04% when #SC = 8, 16, 32, and 64, respectively. As can be seen from Tables I and IV, on average, the dynamic selection scheme achieves 28.31%, 24.09%, 15.84%, and 8.89% more reduction in the output response data volume (RD%), while requiring an additional 23.74%, 14.28%, 10.04%, and 7.54% of control bits (C%), compared with the static selection scheme when #SC = 8, 16, 32, and 64, respectively. This is a tradeoff between test response data volume and input control data volume. However, since only one copy of the input test data is required during the entire test procedure while each circuit tested generates a copy of the test response data, the output data reduction appears to be much more significant when all test response data need to be stored and analyzed (e.g., for diagnosis purposes).

The total required runtime for each circuit is shown in column T of Table IV, in seconds. The actual runtime of the dynamic path selection procedure is indicated in parentheses. Although the dynamic method needs a longer runtime than the static one, it is still quite efficient. For the most complicated circuit under consideration, b19, 298 s are required, of which 210 s are consumed by the dynamic path determination procedure.

In Table V, we show the average number of dynamic paths to be observed per test pattern. On average, the numbers are 1.84, 2.36, 3.61, and 5.87 for #SC = 8, 16, 32, and 64, respectively. These results show that the increase in test time can be greatly reduced by the dynamic algorithm. The speedup ratios, defined as the ratio of the number of paths per pattern for the static algorithm (see Table II) to that for the dynamic algorithm, are 2.17, 2.61, 2.42, and 1.91 for #SC = 8, 16, 32, and 64, respectively.

In Table VII, we compare the results for different double counter operations, including +1/−1, +1/0, 0/−1, +2/−2, +2/0, and 0/−2, when #SC = 32, where the best results are highlighted. As indicated, the results are quite diverse. No specific pair of counter operations leads to overwhelmingly better results. However, for each individual circuit, the reduction ratios for the various counter operations lie in a small range. For example, the reduction ratios for circuits s13207 and s15850 are in the ranges of 94.22%∼94.80% and 90.05%∼90.35%, respectively. This implies that, in general, using any two counter operations yields significantly better compaction than the static method.

TABLE VII Output Compaction Results of the Proposed Dynamic Output Selection Algorithm Using Various Double Counter Operations (#SC = 32)

CKT      #TP    ops = (+1 −1)             ops = (+1 0)              ops = (0 −1)
                |PS|   RD%    |PS|/#TP    |PS|   RD%    |PS|/#TP    |PS|   RD%    |PS|/#TP
s13207   248    434    94.53  1.75        433    94.54  1.75        438    94.48  1.77
s15850   126    399    90.10  3.17        395    90.20  3.13        395    90.20  3.13
s35932   32     334    67.38  10.44       307    70.02  9.59        310    69.73  9.69
s38417   101    472    85.40  4.67        492    84.78  4.87        490    84.84  4.85
s38584   146    704    84.93  4.82        714    84.72  4.89        706    84.89  4.84
b14      685    1311   94.02  1.91        1412   93.56  2.06        1292   94.11  1.89
b15      448    940    93.44  2.10        959    93.31  2.14        876    93.89  1.96
b17      1053   2236   93.36  2.12        1902   94.36  1.81        1868   94.46  1.77
b18      1142   4034   88.96  3.53        3260   91.08  2.85        3272   91.05  2.87
b19      1333   4969   88.35  3.73        3840   91.00  2.88        4008   90.60  3.01
b20      726    1822   92.16  2.51        1984   91.46  2.73        1770   92.38  2.44
b21      734    1914   91.85  2.61        2118   90.98  2.89        1914   91.85  2.61
Avg.                   88.71  3.61               89.17  3.47               89.37  3.40

CKT      #TP    ops = (+2 −2)             ops = (+2 0)              ops = (0 −2)
                |PS|   RD%    |PS|/#TP    |PS|   RD%    |PS|/#TP    |PS|   RD%    |PS|/#TP
s13207   248    459    94.22  1.85        427    94.62  1.72        413    94.80  1.67
s15850   126    401    90.05  3.18        389    90.35  3.09        389    90.35  3.09
s35932   32     305    70.21  9.53        300    70.70  9.38        305    70.21  9.53
s38417   101    471    85.43  4.66        477    85.24  4.72        486    84.96  4.81
s38584   146    746    84.03  5.11        709    84.82  4.86        706    84.89  4.84
b14      685    1402   93.60  2.05        1420   93.52  2.07        1340   93.89  1.96
b15      448    1071   92.53  2.39        929    93.52  2.07        906    93.68  2.02
b17      1053   2459   92.70  2.34        1921   94.30  1.82        1960   94.18  1.86
b18      1142   4747   87.01  4.16        3603   90.14  3.15        3615   90.11  3.17
b19      1333   6184   85.50  4.64        4420   89.64  3.32        4495   89.46  3.37
b20      726    2215   90.47  3.05        1820   92.17  2.51        1702   92.67  2.34
b21      734    2354   89.98  3.21        1872   92.03  2.55        1788   92.39  2.44
Avg.                   87.98  3.85               89.25  3.44               89.30  3.42

We also show the output selection results using triple counter operations, including +1/0/−1 and +2/0/−2, in Table VI. As expected, higher compaction ratios and shorter test application time can be achieved at the cost of more complicated test control (larger area overhead).
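The control-bit count and the C% ratio defined earlier in this section reduce to a few lines of integer arithmetic, sketched below in Python. All function and parameter names are hypothetical, and the values used for `max_scan_length` and `num_inputs` in the usage lines are made up for illustration. The helper `reduction_ratio_percent` rests on an assumption: RD% is taken to be (1 − |PS|/(#TP × #SC)) × 100, which is not stated in this section but is consistent with the s13207 row of Table VII (|PS| = 434, #TP = 248, #SC = 32, RD% = 94.53).

```python
def ceil_log2(n):
    """Smallest k with 2**k >= n, for n >= 1 (integer-only ceiling of log2)."""
    return (n - 1).bit_length()


def control_bits(num_paths, num_chains, num_ops, max_scan_length):
    """|PS| x (ceil(log2 #SC) + ceil(log2 #OP) x (max_scan_length - 1))."""
    init_bits = ceil_log2(num_chains)                       # initial counter value
    step_bits = ceil_log2(num_ops) * (max_scan_length - 1)  # one choice per later cycle
    return num_paths * (init_bits + step_bits)


def control_ratio_percent(num_paths, num_chains, num_ops, max_scan_length,
                          num_patterns, num_inputs):
    """C%: control bits relative to the input pattern bits (#TP x #IN)."""
    bits = control_bits(num_paths, num_chains, num_ops, max_scan_length)
    return 100.0 * bits / (num_patterns * num_inputs)


def reduction_ratio_percent(num_paths, num_patterns, num_chains):
    """RD% under the assumed definition (1 - |PS| / (#TP x #SC)) x 100."""
    return 100.0 * (1.0 - num_paths / (num_patterns * num_chains))


# s13207 row of Table VII with ops = (+1 -1); scan length and #IN are made up.
print(control_bits(434, 32, 2, max_scan_length=10))      # 6076
print(round(reduction_ratio_percent(434, 248, 32), 2))   # 94.53
print(round(434 / 248, 2))                               # 1.75 paths per pattern
```

Using `(n - 1).bit_length()` avoids floating-point logarithms, which matters for three counter operations, where the ceiling of log2(3) must come out as exactly 2.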


TABLE VIII Comparisons of the Proposed Static and Dynamic Output Selection Algorithms on b19 Under 32 Scan Chains

Methods                      |PS|     Average RD%   Best RD%   |PS|/#TP   A%      C%      T
Static                       21 208   50.28         56.35      15.91      ∼0.05   1.19    140 (50)
Dynamic (two operations)     7073     83.42         90.79      5.31       ∼0.07   16.95   321 (219)
Dynamic (three operations)   4839     88.66         93.26      3.63       ∼0.08   22.95   406 (307)

Fig. 4. Statistical results of reduction ratio (RD%) on test response data volume using (a) static selection algorithm for 63 cases, (b) dynamic selection algorithm with two different counter operations for 200 cases, and (c) dynamic selection algorithm with three different counter operations for 200 cases.

To further understand the effects of using different counter operations, we have done extensive experiments on b19, the largest circuit we can obtain. We carry out 63 cases of different counter operations (−31 to 31 with a 6-b counter) for the static method, 200 cases for the dynamic method with two counter operations, and another 200 cases for the dynamic method with three counter operations. The number of scan chains is set to 32. The results on the reduction ratios are shown in Fig. 4, where we can see that for the static method all the reduction ratios are in the range of 45%∼60%. For the two-operation cases, there are eight cases below 70%, one case higher than 90% (actually 90.79%), and all other 191 cases are in the range of 75%∼90%. For the three-operation cases, all are within 75%∼95%. The results of these experiments are summarized in Table VIII, which shows that the average reduction ratios for the static, two-operation, and three-operation cases are 50.28%, 83.42%, and 88.66%, respectively. Clearly, more counter operations lead to more reduction and less test time, but also require more area overhead and control data.

The above experiments show that a tradeoff among test time, reduction ratio, area overhead, and control data does exist when different numbers of counter operations are used. However, when the number of operations is the same (1, 2, or 3), the reduction ratios lie in a relatively small range. Nevertheless, because our algorithms are very efficient, it is possible to run a large number of cases to obtain near-optimum solutions. For example, the 200 cases for either two or three counter operations can be completed in one day, and the results show that the best reduction ratios of the 200 cases with two and three counter operations are 90.79% and 93.26%, respectively.

To evaluate the efficiency and effectiveness of our algorithms, we also compare our results with those obtained from the tool lp_solver, which solves the minimum set covering problem by a linear-programming-based branch-and-bound algorithm [32]. For simplicity, and considering the high time complexity of lp_solver, we compare only the results of static selection. The results are shown in Table IX. As shown, our method can be carried out in less than 3 min for all cases, while lp_solver cannot solve the problem in many cases (the entries marked NC) within the time limit of 48 h. For those cases in which lp_solver can obtain results, the differences between our method and lp_solver are less than 1% in almost all cases. Only in a few cases are the differences higher than 1%, and they are all still less than 2%.

VI. Diagnosability

Diagnosis is the process of identifying the location of a fault in a CUT. During manufacturing testing, the test response data are often used to help this process. For output compaction methods that use MISRs or XOR gates, almost all diagnosis information is lost because information compressed by a MISR or an XOR network is very difficult to recover. For the output selection method, the test response data that are selected for observation are stored with their original values, and thus much information on fault effects is retained.

The diagnosability of the test response data obtained by the proposed output selection scheme is shown in Table X. Here, we quantify the diagnosability using a measure commonly adopted in the diagnosis field called "diagnosis resolution," which is defined as the number of distinguished fault pairs divided by the total number of all possible fault pairs [27]–[30], where a fault pair is distinguished by a test pattern if the two faults in the pair result in different output responses when the pattern is applied. Clearly, the more fault pairs that can be distinguished, the higher the diagnosis resolution is. Since our purpose is only to determine how good the diagnosability of the selected bits is, we do not develop a new diagnosis algorithm for this problem. Instead, we simply employ one that has been shown to be very time or space effective in the literature [28]. We respectively apply this algorithm to all the


TABLE IX Comparisons Between the Proposed Static Output Selection and Linear Programming Solver [32] Using +1 Counter Operation

CKT      #SC = 8                        #SC
         Ours         lp_solver         Ours
         |PS|   T     |PS|   T          |PS|   T
s13207   492    1     488    167        620    1
s15850   363    1     357    94         515
s35932
s38417
s38584
b14
b15
b17
b18
b19
b20
b21
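The diagnosis resolution measure defined in Section VI (distinguished fault pairs divided by all possible fault pairs) can be computed with a short Python sketch. The data layout is an assumption made for illustration: each fault maps to a hypothetical signature, the set of (pattern, output bit) positions at which its response differs from the fault-free response, and two faults count as distinguished when their signatures differ. Restricting the signatures to the observed bits then gives the resolution under output selection. The names `diagnosis_resolution` and `restrict` are invented here, and this is not the diagnosis algorithm of [28].

```python
from itertools import combinations


def diagnosis_resolution(signatures):
    """Fraction of fault pairs whose response signatures differ.

    signatures: dict mapping a fault name to a frozenset of
                (pattern, output_bit) positions where the fault is observed.
    """
    faults = list(signatures)
    if len(faults) < 2:
        raise ValueError("at least two faults are needed to form a pair")
    total_pairs = len(faults) * (len(faults) - 1) // 2
    distinguished = sum(
        1 for a, b in combinations(faults, 2)
        if signatures[a] != signatures[b]
    )
    return distinguished / total_pairs


def restrict(signatures, observed):
    """Keep only the positions that are actually observed."""
    return {fault: sig & observed for fault, sig in signatures.items()}


# Hypothetical toy data: f3 and f4 are equivalent even with full observation.
full = {
    "f1": frozenset({("t1", "o1")}),
    "f2": frozenset({("t1", "o2")}),
    "f3": frozenset({("t1", "o1"), ("t2", "o3")}),
    "f4": frozenset({("t1", "o1"), ("t2", "o3")}),
}
observed = frozenset({("t1", "o1"), ("t1", "o2")})

print(diagnosis_resolution(full))                      # 5 of 6 pairs
print(diagnosis_resolution(restrict(full, observed)))  # 0.5
```

In the toy data, dropping the unobserved position ("t2", "o3") merges f1 with f3 and f4, so the resolution falls from 5/6 to 3/6. Comparing the two numbers in this way corresponds to the comparison between observing all responses and observing only the selected ones.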
