IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 3, MARCH 2011
Using Launch-on-Capture for Testing Scan Designs Containing Synchronous and Asynchronous Clock Domains Shianling Wu, Member, IEEE, Laung-Terng Wang, Fellow, IEEE, Xiaoqing Wen, Senior Member, IEEE, Zhigang Jiang, Lang Tan, Yu Zhang, Yu Hu, Member, IEEE, Wen-Ben Jone, Senior Member, IEEE, Michael S. Hsiao, Senior Member, IEEE, James Chien-Mo Li, Member, IEEE, Jiun-Lang Huang, Member, IEEE, and Lizhen Yu, Member, IEEE
Abstract—This paper presents a hybrid automatic test pattern generation (ATPG) technique using the staggered launch-on-capture (LOC) scheme followed by the one-hot LOC scheme for testing delay faults in a scan design containing asynchronous clock domains. Typically, the staggered scheme produces small test sets but needs long ATPG runtime, whereas the one-hot scheme takes short ATPG runtime but yields large test sets. The proposed hybrid technique is intended to reduce test pattern count with acceptable ATPG runtime for multi-million-gate scan designs. In case the scan design contains multiple synchronous clock domains, each group of synchronous clock domains is treated as a clock group and tested using a launch aligned or a capture aligned LOC scheme. By combining these schemes, we found that the pattern counts for two large industrial designs were reduced by approximately 1.7X to 2.1X, while the ATPG runtime increased by 10% to 50%, compared to the one-hot clocking scheme alone.
Manuscript received September 11, 2009; revised September 7, 2010; accepted September 15, 2010. Date of current version February 11, 2011. This work was supported in part by the National Science Foundation of America, under Grant CCF-0541103, and by the Japan Society for the Promotion of Science Grant-in-Aid for Scientific Research (B) 22300017. This paper was recommended by Associate Editor F. Lombardi. S. Wu is with SynTest Technologies, Inc., Princeton, NJ 08550 USA, and also with the Department of Creative Informatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan (e-mail:
[email protected]). L.-T. Wang is with SynTest Technologies, Inc., Sunnyvale, CA 94086 USA, and also with the Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan (e-mail:
[email protected]). X. Wen is with the Department of Creative Informatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan (e-mail:
[email protected]). Z. Jiang is with the ATPG Research and Development Group, SynTest Technologies, Inc., Sunnyvale, CA 94086 USA (e-mail:
[email protected]). L. Tan, Y. Zhang, and L. Yu are with SynTest Technologies, Inc., Shanghai 201200, China (e-mail:
[email protected];
[email protected];
[email protected]). Y. Hu is with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (e-mail:
[email protected]). W.-B. Jone is with the Department of Electrical and Computer Engineering, University of Cincinnati, Cincinnati, OH 45221 USA (e-mail:
[email protected]). M. S. Hsiao is with the Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA (e-mail:
[email protected]). J. C.-M. Li and J.-L. Huang are with the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan (e-mail:
[email protected];
[email protected]). Digital Object Identifier 10.1109/TCAD.2010.2092510
Index Terms—Aligned launch-on-capture, at-speed scan testing, double-capture, hybrid launch-on-capture, launch-on-capture, one-hot launch-on-capture, staggered launch-on-capture.
I. Introduction
Scan design is a design-for-testability (DFT) technique in which the storage elements in a sequential circuit are converted into scan cells, and these scan cells are then stitched together to form scan chains during scan testing [1]–[5]. By reconfiguring all storage elements into scan cells, the complexity of automatic test pattern generation (ATPG) for sequential circuits is reduced to that of the more manageable ATPG for combinational circuits. Since the late 1990s, scan design has become the most widely used DFT technique.
In recent years, with shrinking device geometry due to advances in design and manufacturing technologies, circuits containing millions or tens of millions of logic gates have become common. While the scan design technique has offered many benefits, it is now becoming a bottleneck for such large designs due to the associated explosive increase in test data volume. The amount of scan test data needed to fully detect defects in manufactured chips can easily exceed the storage capacity of automatic test equipment (ATE). All of this contributes to an increase in test cost.
Traditionally, one of the most popular capture-clocking schemes is one-hot clocking, in which the clock domains are tested one at a time. This scheme often saves ATPG runtime but results in many more test patterns than expected. Another scheme is simultaneous clocking, in which all clock domains are tested in parallel as long as data propagating across clock domains are marked with unknown (X) values whenever needed during ATPG. This scheme can result in a small pattern count but may lead to significant fault coverage loss caused by the Xs.
In this paper, we first propose two capture-clocking schemes, namely aligned clocking and staggered clocking, which can be used to remedy the problems found in one-hot clocking and simultaneous clocking. For ease of explanation, we consider only delay faults, such as transition and path-delay faults.
Aligned clocking is mainly used for testing synchronous
© 2011 IEEE 0278-0070/$26.00
Fig. 1. Basic at-speed test schemes. (a) Launch-on-shift (a.k.a. skewed-load). (b) Launch-on-capture (a.k.a. broad-side or double-capture).
clock domains; whereas staggered clocking is for testing asynchronous clock domains. Next, we propose to partition clock domains into clock groups, each of which contains a group of synchronous clock domains or noninteracting asynchronous clock domains. Synchronous clock domains are a group of clock domains whose frequencies are related by integer multiples, e.g., 25, 50, and 100 MHz. Asynchronous clock domains are a group of clock domains whose frequencies are unrelated, e.g., 30, 48, and 100 MHz. By clock grouping, we can effectively reduce the number of clock controls during ATPG. We then analyze why using the staggered clocking scheme alone achieves a smaller pattern count but can require much longer ATPG runtime. Lastly, we demonstrate that a hybrid scheme, which combines staggered clocking and one-hot clocking, reduces pattern counts by 1.7X to 2.1X for two large industrial designs while increasing ATPG runtime by 10% to 50%, compared to using the one-hot clocking scheme alone.
The rest of this paper is organized as follows. Section II discusses two basic test timing control diagrams for detecting delay faults. Section III presents the proposed hybrid launch-on-capture (LOC) schemes. Section IV discusses the hybrid ATPG techniques. Section V shows results on two industrial designs, and Section VI concludes this paper.
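The frequency-based distinction between synchronous and asynchronous clock domains can be illustrated with a small check. The sketch below is not part of the paper's tooling: the function name `is_synchronous_group` is hypothetical, and it assumes integer frequencies in MHz and interprets "integer multiple relations" as holding between every pair of clocks in the group.

```python
from itertools import combinations


def is_synchronous_group(freqs_mhz):
    """Return True if every pair of frequencies is related by an integer multiple.

    Assumption: frequencies are positive integers (MHz).
    """
    for low, high in combinations(sorted(freqs_mhz), 2):
        # After sorting, 'high' >= 'low'; the pair is related only if
        # the higher frequency is an exact multiple of the lower one.
        if high % low != 0:
            return False
    return True


if __name__ == "__main__":
    print(is_synchronous_group([25, 50, 100]))  # the synchronous example
    print(is_synchronous_group([30, 48, 100]))  # the asynchronous example
```

With the two examples quoted in the text, the first group (25, 50, and 100 MHz) passes the check and the second (30, 48, and 100 MHz) does not.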
Fig. 2. One-hot LOC.
Fig. 3. Simultaneous LOC.
II. Background
There are two basic capture-clocking schemes for testing multiple clock domains at-speed: 1) launch-on-capture (LOC), and 2) launch-on-shift (LOS). LOC was referred to as broad-side in [6] or double-capture in [4]. LOS was referred to as skewed-load in [7]. Both schemes are helpful in detecting structural faults and delay faults within each clock domain (called intra-clock-domain faults) or across clock domains (called inter-clock-domain faults). Delay faults include transition faults and path-delay faults.
Unlike the LOS technique, which uses the last shift clock pulse to launch a transition, LOC uses a capture clock pulse to launch the transition. Fig. 1 shows the two basic at-speed test schemes. Typically, using LOS for at-speed delay fault testing of a scan-based BIST design can achieve higher fault coverage with a smaller test pattern count [8]–[14]. The problems are that LOS can cause unwanted overtesting because more false paths may be exercised, and that it incurs higher implementation costs because the scan enable signal SE must be operated at-speed for each clock domain. This is in sharp contrast to LOC, in which only a slow-speed, global scan enable signal GSE for all clock domains is needed.
A. One-Hot Launch-on-Capture
Using the one-hot LOC approach, a launch clock pulse followed by a capture clock pulse is applied to only one clock domain during each capture window, while all other clocks are held inactive. An example timing diagram is shown in Fig. 2. It applies two capture pulses (C1-followed-by-C2 or C3-followed-by-C4) at their respective clock domains' frequencies (of period d1 or d2) to detect intra-clock-domain delay faults, and uses a single, slow-speed GSE to drive both clock domains. Thus, this approach can be used for at-speed testing of intra-clock-domain delay faults. The major disadvantage of one-hot LOC is long test time.
B. Simultaneous Launch-on-Capture
The long test time problem of one-hot LOC can be resolved by using the simultaneous LOC scheme illustrated in Fig. 3. The simultaneous LOC scheme allows testing to be performed on all clock domains in parallel, which is quite helpful when clock domains do not interact with each other. For clock domains where data may propagate from one clock domain to another, the values of source scan cells in the originating clock domains have to be forced to Xs during ATPG in order to avoid any pattern mismatch. This could cause significant fault coverage loss.
WU et al.: USING LAUNCH-ON-CAPTURE FOR TESTING SCAN DESIGNS CONTAINING SYNCHRONOUS AND ASYNCHRONOUS CLOCK DOMAINS
III. Proposed Test Timing Control
This section proposes test timing control methods for capture-clocking. Techniques to improve fault coverage and reduce pattern count and ATPG runtime are discussed in Section IV.
A. Aligned Launch-on-Capture for Synchronous Domains
The X-masking problem in simultaneous LOC can be mitigated by using the proposed aligned LOC scheme for synchronous clock domains, where, for any two clock domains, the clock frequency of one is an integer multiple of the other. Compared with the staggered scheme (described later), the aligned LOC scheme reduces the sequential depth of the capture clocks in the capture window. Hence, test time can become shorter. Also, the scheme can detect inter-clock-domain faults among all synchronous clock domains simultaneously. There are two possible ways to implement the aligned LOC scheme, namely, capture aligned LOC and launch aligned LOC.
Fig. 4. Capture aligned LOC.
Fig. 4 shows the timing of the capture aligned LOC scheme. The major advantage of this approach is that all intra-clock-domain and inter-clock-domain faults can be tested. An arrow from Ci to C in Fig. 4 represents a set of delay faults that can be detected by a pair of clocks. For example, there are three arrows from C1 to C. The horizontal arrow from C1 to C represents the intra-clock-domain delay faults within the clock domain CK1. The other two arrows represent the inter-clock-domain delay faults from CK1 to CK2 and from CK1 to CK3, respectively. The remaining six arrows can be interpreted in the same manner. Since the active edges (rising edges) of the three capture pulses (see dashed line C) must be aligned precisely, the circuit must contain one reference clock, and the frequency of all remaining test clocks must be derived from the reference clock. In the example given here, CK1 is the reference clock operating at the highest frequency, and CK2 and CK3 are derived from CK1 and designed to operate at 1/2 and 1/4 the frequency of CK1, respectively. Therefore, this approach is only applicable for at-speed testing of intra-clock-domain and inter-clock-domain delay faults in synchronous clock domains.
Fig. 5. Launch aligned LOC.
A similar aligned LOC approach, shown in Fig. 5, aligns all first capture edges (i.e., the launch edges) rather than the second capture edges. This approach is referred to as launch aligned LOC. Similar to capture aligned LOC, it is also only applicable for at-speed testing of intra-clock-domain and inter-clock-domain delay faults in synchronous clock domains. Consider again the three clock domains driven by CK1, CK2, and CK3. The eight arrows between the dashed line C and the three capture pulses, C1, C2, and C3, represent the intra-clock-domain and inter-clock-domain delay faults detected by the corresponding clocks. Note that in order to detect the inter-clock-domain delay faults from CK1 to CK3, a special capture pulse C4 is required. Because this method requires a much more complex timing-control diagram, a clock suppression circuit similar to those proposed in [15]–[19] is needed to enable or disable the selected capture pulses. The dotted clock pulses shown in the figure indicate the suppressed capture pulses.
The main advantages of both aligned LOC approaches are that: 1) all intra-clock-domain faults and inter-clock-domain faults can be detected, and 2) a single, slow-speed GSE is used. Hence, both approaches can be used for true at-speed testing of synchronous clock domains. However, one major drawback is that precise alignment of the capture pulses is still required.
B. Staggered Launch-on-Capture for Asynchronous Domains
The staggered LOC scheme relaxes the capture alignment requirement of the aligned LOC approaches [20], [21]. A test timing control example is shown in Fig. 6. In this figure, capture pulses C1-followed-by-C2 and C3-followed-by-C4 are applied in a sequential or staggered order in the capture window to test all intra-clock-domain faults and inter-clock-domain structural faults in the two clock domains. A daisy-chain clock-triggering or token-ring clock-enabling technique similar to that described in [22] can be employed to generate the ordered sequence of capture clock pulses. Although this figure only shows the case of C1-followed-by-C2 occurring before C3-followed-by-C4, the reversed order is also feasible.
The selection of clock order is explained in Section IV-B. The two capture pulses C1 and C3 are used to launch transitions at the outputs of some scan cells, and the output responses to these transitions are captured by the following two capture pulses, C2 and C4, respectively. Both delays d2 and d4 are set according to the operating frequency of their respective clock domains. Since d1, d3, and d5 do not affect the detection of delay faults, we can simply use a single, slow-speed GSE for driving all clock domains. Hence, this scheme can be used to test all intra-clock-domain faults and inter-clock-domain structural faults in asynchronous clock domains.
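The staggered ordering of launch and capture pulses can be sketched as bit strings, one character per clock phase, in the style of the %CK1 and %GSE strings used in Section IV-C. The code below is illustrative only: the function name `staggered_spec` and the fixed slot layout (four phases per clock followed by one idle phase) are assumptions chosen to reproduce the Fig. 8(a) strings, and the sketch does not model the actual delays d1 through d5.

```python
def staggered_spec(clock_order):
    """Build a bit-string capture specification for staggered LOC.

    Each clock in 'clock_order' receives a launch pulse and a capture
    pulse in its own slot, so pulses from different clocks never overlap.
    Assumption: a slot is four phases wide ("0101"), plus one trailing
    idle phase for the whole window.
    """
    length = 4 * len(clock_order) + 1
    spec = {}
    for slot, name in enumerate(clock_order):
        bits = ["0"] * length
        bits[4 * slot + 1] = "1"  # launch pulse
        bits[4 * slot + 3] = "1"  # capture pulse
        spec[name] = "".join(bits)
    # The global scan enable stays low during the whole capture window.
    spec["GSE"] = "0" * length
    return spec


if __name__ == "__main__":
    for name, bits in staggered_spec(["CK1", "CK2"]).items():
        print(f"%{name} = \"{bits}\";")
```

Reversing the list passed to `staggered_spec` gives the reversed clock order mentioned in the text.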
Fig. 6. Staggered LOC.
When the logic crossing synchronous clock domains is tested, d3 must satisfy the specified timing relation between the two clock domains. However, there can be some delay fault coverage loss among clock domains if a fixed order of capture clocks is used across all capture windows. This fault coverage loss is mostly related to sequentially redundant faults that can be detected when one-hot clocking is employed.
IV. Hybrid ATPG Techniques
This section discusses the hybrid ATPG techniques that can be used to reduce pattern count with reasonable ATPG runtime. First, clock grouping and clock ordering along with clock specification are performed. Then, the proposed hybrid staggered-followed-by-one-hot ATPG scheme, which combines the staggered LOC and one-hot LOC approaches based on the clock grouping results, is discussed.
A. Clock Grouping
The first step in reducing the ATPG runtime of testing a scan design is to identify all clock domains that do not interact with each other or that run in a synchronous manner. For data paths that originate and terminate at different asynchronous clock domains, additional care must be taken in the way the clocks are applied, in order to guarantee the success of the capture operation. This is mainly because the clock skew between different clock domains is typically nondeterministic. A data path originating in one clock domain and terminating in another might result in erroneous captured values when both clocks are pulsed simultaneously and the clock skew between the two clocks is larger than the data path delay from the originating clock domain to the terminating clock domain. In order to avoid the mismatch, the following timing relationship for such a data path must hold:
Clock-skew < Data-path-delay + Originating-Clock-to-Q delay.
If the above relation does not hold, a mismatch may occur during the capture operation.
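The timing relation above can be expressed as a one-line check. This is a minimal sketch: the function name `capture_is_safe`, the use of nanoseconds, and the numeric values are hypothetical and serve only to show how the inequality is applied to a single cross-domain data path.

```python
def capture_is_safe(clock_skew_ns, data_path_delay_ns, clk_to_q_ns):
    """Check Clock-skew < Data-path-delay + Originating-Clock-to-Q delay.

    Returns True when both clocks of a cross-domain path can be pulsed
    simultaneously without risking a capture mismatch.
    """
    return clock_skew_ns < data_path_delay_ns + clk_to_q_ns


if __name__ == "__main__":
    # Hypothetical numbers: a short path with a large skew fails the check.
    print(capture_is_safe(clock_skew_ns=0.8, data_path_delay_ns=0.3, clk_to_q_ns=0.2))
    # A longer path tolerates the same skew.
    print(capture_is_safe(clock_skew_ns=0.8, data_path_delay_ns=1.5, clk_to_q_ns=0.2))
```

Because the skew between asynchronous domains is typically nondeterministic, such a check cannot be relied on in practice for those domains, which is why the text turns to sequential clock application.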
In order to prevent this problem, clocks belonging to different clock domains can be applied sequentially (using the staggered clocking scheme), as opposed to simultaneously, such that any clock skew which exists between the clock domains can be ignored during the test generation process. It is also possible to apply only one clock during each capture operation using the one-hot clocking scheme. However, almost all designs have noninteracting clock domains that can be applied simultaneously to reduce the complexity and final pattern count of the pattern generation
Fig. 7. Clock grouping example.
and fault simulation processes. Clock grouping is a process used to analyze all data paths in the circuit in order to determine all independent or noninteracting clocks, which can be grouped and applied simultaneously. Note that we can still group two noninteracting asynchronous domains together when their operating frequencies are different. This is because an on-chip clock is typically used for controlling each asynchronous clock domain.
An example of the clock grouping process is shown in Fig. 7. This example shows the results of performing a circuit analysis operation on a scan design to identify all clock domain interactions, where an arrow indicates a data transfer from one clock domain to a different clock domain. As shown in Fig. 7, the circuit in this example has seven clock domains, CD1–CD7, and five crossing-clock-domain data paths, CCD1–CCD5. From this example it can be seen that CD2 and CD3 are independent from each other, and hence their related clocks can be applied simultaneously during test of CK2. Similarly, clock domains CD4 through CD7 can also be applied simultaneously during test of CK3. Therefore, in this example, three grouped clocks instead of seven individual clocks can be used to test the circuit during the capture operation.
B. Clock Ordering
Each clock group thus consists of a group of noninteracting asynchronous clock domains or a group of synchronous clock domains running at integer-multiple frequencies. As each clock group varies in gate count (circuit size), the order of these clock groups plays an important role in the fault coverage that can be obtained. There are n! different ways to order n clock groups when staggered clocking is employed. Using the gate count of each clock group as an ordering criterion, a common approach is to place the clock groups either in descending order or in ascending order. Although it is difficult to predict which of the n!
clock orders would give the best result [23], [24], it is reasonable to expect that the descending order yields better results than the ascending order. In the staggered approach, the clock groups that receive their capture clock pairs later are at a disadvantage due to higher sequential depth during ATPG. This is because their generated patterns must traverse a larger number of clock cycles to justify values through the storage elements of other clock groups that received their clock pairs earlier. Thus, larger sized clock groups should receive their clock pairs earlier, so justification can be done earlier, resulting in better fault coverage, pattern count, and ATPG runtime overall. Assuming the gate count of grouped clock domain GCD1 is larger than that of grouped clock domain GCD2, Fig. 8(a) and (b) show the clock order of CK1 (controlling GCD1) and CK2 (controlling GCD2) in descending order and ascending order, respectively.
Fig. 8. Clock order when GCD1 is larger than GCD2. (a) In descending order. (b) In ascending order.
C. Clock Specification
The identified ordered clock groups can now be used for capture-clocking using the basic LOC schemes described in the previous two sections. During ATPG, we specify the clock pulses in the capture window according to the given clock order. For instance, in Fig. 8(a), we can specify the two clocks and GSE as
%CK1 = “010100000”;
%CK2 = “000001010”;
%GSE = “000000000”.
Similarly, in Fig. 5, we can specify the clock order as
%CK1 = “01010001000000”;
%CK2 = “01100110000000”;
%CK3 = “01111000011110”;
%GSE = “00000000000000”.
The idea behind specifying the clocks in the above format is to allow the ATPG program to properly perform circuit expansion (time-frame expansion) prior to ATPG, depending on whether some clocks have overlapping or nonoverlapping clock pulses. For example, 10 rather than 14 time frames will be expanded for the ATPG process of Fig. 5 shown above, because two or more consecutive identical clock phases (columns), like “0100” or “0010,” are expanded only once. This is particularly helpful when an aligned LOC approach is used for testing synchronous clock domains.
Because all of the above-mentioned clocking schemes can cause fault coverage loss, large pattern count, or long ATPG runtime, we propose a hybrid scheme below to reduce pattern count with reasonable ATPG runtime while maintaining the same fault coverage as the one-hot clocking scheme.
D. Staggered-Followed-by-One-Hot ATPG
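The time-frame count quoted for the clock specification strings of Section IV-C can be reproduced with a short sketch. The Python below is illustrative only: the function name `count_time_frames` and the dictionary layout are assumptions, not part of the authors' ATPG program. It treats each character position as one clock phase (column) and merges runs of identical consecutive columns.

```python
from itertools import groupby


def count_time_frames(spec):
    """Count time frames after merging identical consecutive clock phases.

    'spec' maps a signal name to its bit string; all strings are assumed
    to have the same length. A column is the tuple of all signals' bits
    at one phase.
    """
    columns = list(zip(*spec.values()))
    # groupby yields one group per run of equal consecutive columns.
    return sum(1 for _ in groupby(columns))


FIG5_SPEC = {
    "CK1": "01010001000000",
    "CK2": "01100110000000",
    "CK3": "01111000011110",
    "GSE": "00000000000000",
}

FIG8A_SPEC = {
    "CK1": "010100000",
    "CK2": "000001010",
    "GSE": "000000000",
}

if __name__ == "__main__":
    print(count_time_frames(FIG5_SPEC))
    print(count_time_frames(FIG8A_SPEC))
```

For the Fig. 5 specification this gives 10 frames instead of 14, matching the count stated in the text (one repeated "0100" column and three repeated "0010" columns are merged). For the Fig. 8(a) specification no column repeats, so all 9 phases remain.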
The hybrid approach is to apply the staggered LOC scheme in the first phase and the one-hot LOC scheme in the second phase. During the first phase, all clock groups are specified in a predetermined, sequential, or staggered order. ATPG is then performed based on the given staggered order. In order to reduce ATPG runtime, circuit expansion based on the ordered sequences of clock groups is done on the scan design during a preprocessing step. Since the staggered clocking scheme specifically deploys physically disjoint capture clock pulses from different clock domains (in our case, different clock groups), there is no need to insert Xs at the fanout branches of each originating flip–flop in any originating clock domain. Therefore, this staggered approach will not create unnecessary Xs, complicating response compaction in a compression design. Since staggered clocking can cause the ATPG program to mark hard-detected faults as untestable or undetected due to the ordered sequence of clock groups, the second phase running one-hot ATPG is required to detect those missed faults. During ATPG, fault coverage, pattern count, and ATPG runtime are closely monitored in the program to determine the timing to switch over from staggered ATPG to one-hot ATPG. The switch-over criteria of this two-phase capture-clocking scheme can be made more intelligent, e.g., by monitoring the increment in fault coverage and runtime versus pattern count or a percentage of faults already processed. All such rule sets are used in the program to automatically determine a switchover point that achieves a balance between ATPG runtime and pattern count.
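One possible switch-over rule of the kind described above can be sketched as follows. This is a hypothetical criterion, not the rule set used in the authors' program: the function name `find_switch_point`, the sliding-window test, and the default thresholds are all assumptions. The sketch watches the fault coverage reached after each staggered pattern and reports the first pattern index at which the gain over the last few patterns drops below a threshold.

```python
def find_switch_point(coverage_after_pattern, window=3, min_gain=0.05):
    """Return the pattern index at which to switch to one-hot ATPG.

    coverage_after_pattern[i] is the cumulative fault coverage (in %)
    after i + 1 staggered patterns. The staggered phase ends at the
    first index where coverage grew by less than 'min_gain' percentage
    points over the previous 'window' patterns (hypothetical rule).
    If that never happens, the length of the list is returned.
    """
    for i in range(window, len(coverage_after_pattern)):
        gain = coverage_after_pattern[i] - coverage_after_pattern[i - window]
        if gain < min_gain:
            return i
    return len(coverage_after_pattern)


if __name__ == "__main__":
    # Made-up coverage curve that saturates after a few patterns.
    history = [10.0, 30.0, 50.0, 60.0, 65.0, 65.02, 65.03, 65.04]
    print(find_switch_point(history))
```

A real tool would presumably also weigh runtime per pattern and the fraction of faults already processed, as the text indicates; those terms are omitted here to keep the sketch short.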
V. Experimental Results
The proposed hybrid staggered-followed-by-one-hot LOC scheme has been applied to many industrial designs. We present two large designs in the range of 1–5 million primitives to illustrate the effectiveness of the proposed scheme.
A. Design Statistics
Table I summarizes the statistics of the two designs. We developed a program to identify all independent clock groups. A clock group consists of clocks that either do not interact with each other or control a group of synchronous clock domains. This allows all clocks in the clock group to be activated simultaneously during capture without suffering from any clock skew issue. In the experiments, we then performed ATPG based on the number of clock groups identified by the program. As an LOC scheme is employed for testing scan designs and an internal PLL-triggered on-chip test clock is often used to control a clock domain, the proposed hybrid LOC
scheme presented here would not have any additional impact on the design and its implementation. One only has to ensure that in the staggered approach, enough delay is inserted between interacting clock domains so data can propagate from an originating domain to all receiving domains.
The one-hot clocking and staggered clocking schemes were first applied independently to the two industrial designs listed in Table I. Only intra-clock-domain transition faults are considered. The computer used was a 64-bit based PC operating at 2.5 GHz under the Linux operating system.
B. Results
Tables II and III summarize the test application results. We consider only transition faults existing between flip–flops within each clock domain. The results show that one-hot clocking leads to shorter ATPG runtime and higher fault coverage but larger pattern count than staggered clocking. This is expected as staggered clocking can result in sequentially untestable faults.
The results using the proposed hybrid clocking scheme on Designs A and B are listed in Table IV. In our experiments, staggered clocking is automatically switched to one-hot clocking after the switch-over criteria are met. The first column shows the circuit name. In the next three columns, fault coverage, pattern count, and ATPG runtime are associated with three numbers each. The first number is the result using staggered clocking, the second using one-hot clocking, and the third is the sum of the two steps. For Design A, the pattern count using one-hot clocking alone given in Table II is 1.77 (= 8309/4697) times the pattern count using the hybrid approach given in Table IV.

TABLE I
Design Statistics
                          Design A     Design B
No. of primitives         1.1 M        4.7 M
No. of faults             3 109 012    8 879 940
No. of flip–flops         102 K        281 K
No. of clock domains      33           8
No. of clock groups       5            8

TABLE II
Application Results on Design A
                                     One-Hot      Staggered
Hard-detected faults                 2 556 801    2 490 101
Fault coverage (%)                   84.24%       80.09%
Pattern count (one-hot/staggered)    8309         7705 (1.08X)
ATPG runtime                         9:09:06      21:20:40

TABLE III
Application Results on Design B
                                     One-Hot      Staggered
Hard-detected faults                 7 667 136    7 063 030
Fault coverage (%)                   86.34%       82.93%
Pattern count (one-hot/staggered)    39 099       12 401 (3.15X)
ATPG runtime                         41:45:53     66:19:39

TABLE IV
Hybrid Clocking Results on Two Industrial Designs
Circuit     Fault Coverage              Pattern Count                     ATPG Runtime
Design A    78.84% + 5.40% (84.24%)     1505 + 3192 (4697) (1.77X)        6:18:16 + 7:05:58 (13:24:14)
Design B    76.34% + 10.00% (86.34%)    1792 + 16 857 (18 649) (2.10X)    08:04:52 + 39:09:43 (47:14:35)
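The pattern-count ratios and runtime increases implied by Tables II through IV can be rechecked with a few lines of arithmetic. The sketch below only restates numbers from those tables; the helper names `to_seconds` and `summarize` and the dictionary layout are assumptions made for illustration.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' runtime string into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


# One-hot figures come from Tables II and III, hybrid totals from Table IV.
RESULTS = {
    "Design A": {"one_hot_patterns": 8309, "hybrid_patterns": 4697,
                 "one_hot_runtime": "9:09:06", "hybrid_runtime": "13:24:14"},
    "Design B": {"one_hot_patterns": 39099, "hybrid_patterns": 18649,
                 "one_hot_runtime": "41:45:53", "hybrid_runtime": "47:14:35"},
}


def summarize(entry):
    """Return (pattern reduction factor, fractional runtime increase)."""
    reduction = entry["one_hot_patterns"] / entry["hybrid_patterns"]
    increase = (to_seconds(entry["hybrid_runtime"])
                / to_seconds(entry["one_hot_runtime"])) - 1.0
    return reduction, increase


if __name__ == "__main__":
    for design, entry in RESULTS.items():
        reduction, increase = summarize(entry)
        print(f"{design}: {reduction:.2f}X fewer patterns, "
              f"{increase:.1%} longer ATPG runtime")
```

The reduction factors come out as 1.77X and 2.10X, as listed in Table IV. The runtime increases evaluate to roughly 46% for Design A and 13% for Design B, which the text rounds to approximately 50% and 10%.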
TABLE V
Results on Design A in Descending and Ascending Orders
Design A               Fault Coverage             Pattern Count                 ATPG Runtime
In descending order    78.84% + 5.40% (84.24%)    1505 + 3192 (4697) (1.77X)    6:18:16 + 7:05:58 (13:24:14)
In ascending order     78.60% + 5.64% (84.24%)    1428 + 3522 (4950) (1.68X)    6:06:12 + 7:45:33 (13:51:45)
The ATPG runtime, on the other hand, was increased by approximately 50%. Similarly, for Design B, the pattern count using one-hot clocking alone given in Table III is 2.10 (= 39099/18649) times the pattern count using the hybrid approach given in Table IV. The ATPG runtime, on the other hand, was increased by approximately 10%.
C. Comparison
Table V shows the ATPG results using hybrid clocking with the five clock groups of Design A processed in descending and ascending order of gate count. ATPG in descending order means that the clock group with the largest gate count is captured first. The result indicates that performing ATPG based on the descending order of the gate counts of all clock groups yields a smaller pattern count and shorter ATPG runtime than the ascending order. This is mainly due to reduced sequential depth, as explained in Section IV-B. By making larger sized clock groups receive their clock pairs earlier, faults inside these clock groups do not have to be justified through other smaller sized clock groups. This leads to higher fault coverage, smaller pattern count, and shorter ATPG runtime.
D. Summary
In summary, the application results show that: 1) proper clock grouping and clock ordering help reduce pattern count, and 2) the proposed hybrid scheme yields a 1.7X to 2.1X reduction in pattern count compared to using the one-hot scheme alone. One-hot clocking, however, has the benefit of shorter ATPG runtime. Hence, we recommend simply using one-hot ATPG at the early development stage to get a feel for the fault coverage and pattern count of the design. When the design is being taped out, the hybrid scheme is then run to reduce pattern count. Since the same maximum fault coverage can be reached as with one-hot clocking, the proposed hybrid scheme provides an ideal solution to reduce manufacturing test cost.
VI. Conclusion
Modern scan designs can contain tens of millions of logic gates and dozens of clock domains. When a scan design contains a mix of synchronous and asynchronous clock domains, the conventional one-hot LOC scheme for capture-clocking results in very large test sets. This can substantially increase both test application time and scan test cost. While the simultaneous LOC scheme could be used to reduce pattern count, the loss of fault coverage and of applicability to test compression could be unacceptable.
In this paper, we first presented an aligned LOC scheme, either launch aligned or capture aligned, for testing synchronous domains in a scan design. We next presented a staggered LOC scheme for testing asynchronous domains. After clock grouping, we then presented a hybrid ATPG technique that combines the staggered LOC and one-hot LOC clocking schemes. The hybrid staggered-followed-by-one-hot scheme resulted in a 1.7X to 2.1X reduction in pattern count with an ATPG runtime increase of approximately 10% to 50%, compared to the one-hot scheme alone, on two large industrial scan designs that contain asynchronous clock groups. Because one-hot clocking is always used after staggered clocking, the hybrid scheme causes no loss in fault coverage. It should be noted that novel, commercial ATPG approaches, such as those proposed in [25] and [26] for detecting stuck-at faults, may also be applicable for testing delay faults.
Although we are unable to compare the results, we predict that the proposed hybrid scheme could result in smaller pattern count, because the patented staggered clocking scheme is applied to all asynchronous clock domains in the first phase. Acknowledgment The authors are grateful to the anonymous referees for pointing out unclear descriptions of this paper and giving constructive suggestions. References [1] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design. Piscataway, NJ: IEEE Press, 1990. [2] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Boston, MA: Springer, 2000. [3] N. K. Jha and S. K. Gupta, Testing of Digital Systems. London, U.K.: Cambridge University Press, 2003. [4] L.-T. Wang, C.-W. Wu, and X. Wen, Eds., VLSI Test Principles and Architectures: Design for Testability. San Francisco, CA: Morgan Kaufmann, 2006.
[5] L.-T. Wang, C. E. Stroud, and N. A. Touba, Eds., System-on-Chip Test Architectures: Nanometer Design for Testability. San Francisco, CA: Morgan Kaufmann, 2007.
[6] J. Savir and S. Patil, “Broad-side delay test,” IEEE Trans. Comput.-Aided Design, vol. 13, no. 8, pp. 1057–1064, Aug. 1994.
[7] J. Savir and S. Patil, “Scan-based transition test,” IEEE Trans. Comput.-Aided Design, vol. 12, no. 8, pp. 1232–1241, Aug. 1993.
[8] S. Wang, X. Liu, and S. T. Chakradhar, “Hybrid delay scan: A low hardware overhead scan-based delay test technique for high fault coverage and compact test sets,” in Proc. IEEE/ACM Design, Autom. Test Eur. Conf., Feb. 2004, pp. 1296–1301.
[9] J. Abraham, U. Goel, and A. Kumar, “Multi-cycle sensitizable transition delay faults,” in Proc. IEEE VLSI Test Symp., Apr.–May 2006, pp. 306–313.
[10] Z. Zhang, S. M. Reddy, I. Pomeranz, X. Lin, and J. Rajski, “Scan tests with multiple fault activation cycles for delay faults,” in Proc. IEEE VLSI Test Symp., Apr.–May 2006, pp. 343–348.
[11] N. Ahmed and M. Tehranipoor, “Improving transition delay test using a hybrid method,” IEEE Design Test Comput., vol. 23, no. 5, pp. 402–412, Sep.–Oct. 2006.
[12] G. Xu and A. D. Singh, “Delay test scan flip-flop: DFT for high coverage delay testing,” in Proc. Int. Conf. VLSI Des., Jan. 2007, pp. 763–768.
[13] G. Xu and A. D. Singh, “Achieving high transition delay fault coverage with partial DTSFF scan chains,” in Proc. IEEE Int. Test Conf., Oct. 2007, pp. 1–9.
[14] I. Park and E. J. McCluskey, “Launch-on-shift-capture transition tests,” in Proc. IEEE Int. Test Conf., Oct. 2008, pp. 1–9.
[15] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan, and J. Rajski, “Logic BIST for large industrial designs: Real issues and case studies,” in Proc. IEEE Int. Test Conf., Sep. 1999, pp. 358–367.
[16] M. Beck, O. Barondeau, M. Kaibel, F. Poehl, X. Lin, and R. Press, “Logic design for on-chip test clock generation: Implementation details and impact on delay test quality,” in Proc. IEEE/ACM Design, Autom. Test Eur. Conf., Mar. 2005, pp. 56–61.
[17] H. Furukawa, X. Wen, L.-T. Wang, B. Sheu, Z. Jiang, and S. Wu, “A novel and practical control scheme for inter-clock at-speed testing,” in Proc. IEEE Int. Test Conf., Oct. 2006, pp. 1–10.
[18] X.-X. Fan, Y. Hu, and L.-T. Wang, “An on-chip test clock control scheme for multi-clock at-speed testing,” in Proc. IEEE Asian Test Symp., Oct. 2007, pp. 341–348.
[19] B. Keller, A. Uzzaman, B. Li, and T. Snethen, “Using programmable on-product clock generation (OPCG) for delay test,” in Proc. IEEE Asian Test Symp., Oct. 2007, pp. 69–72.
[20] L.-T. Wang, P.-C. Hsu, S.-C. Kao, M.-C. Lin, H.-P. Wang, H.-J. Chao, and X. Wen, “Multiple-capture DFT system for detecting or locating crossing clock-domain faults during scan-test,” U.S. Patent 7 260 756, Aug. 21, 2007.
[21] L.-T. Wang, P.-C. Hsu, S.-C. Kao, M.-C. Lin, H.-P. Wang, H.-J. Chao, and X. Wen, “Multiple-capture DFT system for detecting or locating crossing clock-domain faults during self-test or scan-test,” European Patent 1 360 513, Apr. 2, 2008.
[22] L.-T. Wang, X. Wen, S. Wu, H. Furukawa, H.-J. Chao, B. Sheu, J. Guo, and W.-B. Jone, “Using launch-on-capture for testing BIST designs containing synchronous and asynchronous clock domains,” IEEE Trans. Comput.-Aided Design, vol. 29, no. 2, pp. 299–312, Feb. 2010.
[23] L.-T. Wang, K. S. Abdel-Hafez, X. Wen, B. Sheu, and S.-M. Wang, “Smart capture for ATPG (automatic test pattern generation) and fault simulation of scan-based integrated circuits,” U.S. Patent 7 124 342, Oct. 17, 2006.
[24] K. S. Abdel-Hafez, L.-T. Wang, B. Sheu, Z. Wang, and Z. Jiang, “Method for performing ATPG and fault simulation in a scan-based integrated circuit,” U.S. Patent 7 210 082, Apr. 24, 2007.
[25] V. Jain and J. Waicukauski, “Scan test data volume reduction in multi-clocked designs with safe capture technique,” in Proc. IEEE Int. Test Conf., Oct. 2002, pp. 148–153.
[26] X. Lin and R. Thompson, “Test generation for designs with multiple clocks,” in Proc. IEEE/ACM Design Autom. Conf., Jun. 2003, pp. 662–667.
Shianling Wu (S’88–M’09) received the M.S. degree in computer science from Columbia University, New York, NY. She joined SynTest Technologies, Inc., Princeton, NJ, in 2003, and is currently the Vice President of Engineering focusing on advanced very large scale integration design-for-testability (DFT) research and development. Prior to SynTest, she was with Bell Laboratories, Madison, WI, for over 23 years. In 2008, she was with the Department of Creative Informatics at Kyushu Institute of Technology, Iizuka, Fukuoka, Japan, where she is now a Ph.D. candidate. She currently holds five U.S. patents and has three pending U.S. patents. She has published over 15 DFT papers and contributed chapters to two DFT textbooks: VLSI Test Principles and Architectures in 2006 and Electronic Design Automation in 2009. Ms. Wu has served as a Program Committee Member for the IEEE International Test Conference, the Asian Test Symposium, and the North Atlantic Test Workshop. She won numerous AT&T and Lucent Awards and received the Best Panel Award with her panelists in the 2005 IEEE International Test Conference. She was a member of SEMATECH, SRC, GSRC, STARC-International, VSIA, and the IEEE1500 Standard Committee.
Laung-Terng Wang (M’87–SM’04–F’08) received the B.S.E.E. and M.S.E.E. degrees from National Taiwan University, Taipei, Taiwan, in 1975 and 1977, respectively, and the M.S.E.E. and E.E.Ph.D. degrees under the Honors Cooperative Program from Stanford University, Stanford, CA, in 1982 and 1987, respectively. He has been the Chairman and Chief Executive Officer with SynTest Technologies, Inc., Sunnyvale, CA, since January 1990, and a Visiting Professor with the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University since July 2009. Prior to founding SynTest in 1990, he held several positions in industry, including Intel, Santa Clara, CA, from 1980 to 1983, and Daisy Systems, Mountain View, CA, from 1983 to 1986, and was with the Department of Electrical Engineering, Stanford University, as a Research Associate and Lecturer from 1987 to 1991. He currently holds 28 U.S. patents, 15 European patents, one Japanese patent, and one Chinese patent in the areas of scan synthesis, test generation, at-speed scan testing, test compression, logic built-in self-test, and design for debug-and-diagnosis. The design-for-testability technologies developed by him have been successfully implemented in thousands of application-specific integrated circuit designs worldwide. He has also co-authored and co-edited three internationally used DFT/EDA textbooks: VLSI Test Principles and Architectures in 2006, System-on-Chip Test Architectures in 2007, and Electronic Design Automation in 2009. Dr. Wang is a member of Sigma Xi. He received the 2007 Meritorious Service Award from the IEEE Computer Society and was a co-recipient of the 2008 IEICE Information and Systems Society Excellent Paper Award for an excellent series of papers that appeared in the IEICE Transactions on Information and Systems during a period of 5 years. He is a Golden Core Member of the IEEE Computer Society, and is a member of the 2010 IEEE Computer Society Fellow Evaluation Committee.
Xiaoqing Wen (S’89–M’93–SM’08) received the B.E. degree in computer science and technology from Tsinghua University, Beijing, China, in 1986, the M.E. degree in information engineering from Hiroshima University, Hiroshima, Japan, in 1990, and the Ph.D. degree in applied physics from Osaka University, Osaka, Japan, in 1993. From 1993 to 1997, he was an Assistant Professor with Akita University, Akita, Japan. He was a Visiting Researcher with the University of Wisconsin, Madison, from October 1995 to March 1996. He joined SynTest Technologies, Inc., Sunnyvale, CA, in 1998, and served as its Chief Technology Officer until 2003. In 2004, he joined the Department of Creative Informatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan, where he is currently a Professor. He co-authored and co-edited two books: VLSI Test Principles and Architectures: Design for Testability (San Francisco, CA: Morgan Kaufmann, 2006) and Power-Aware Testing and Test Strategies for Low Power Devices (New York: Springer, 2009). He currently
holds 23 U.S. patents and five Japanese patents in logic built-in self-test, test compression, and low-capture-power (LCP) test generation. His current research interests include design, test, and diagnosis of integrated circuits. Dr. Wen is a member of the IEICE, IPSJ, and REAJ. He received the 2008 IEICE-ISS Best Paper Award for LCP X-filling/test generation.
Zhigang Jiang received the B.S. degree from the Department of Electrical Engineering, Tsinghua University, Beijing, China, in 1995, the M.S. degree from the Department of Electrical Engineering, San Jose State University, San Jose, CA, in 1997, and the Ph.D. degree from the Department of Electrical Engineering, University of Southern California, Los Angeles, in 2005. He currently manages the ATPG Research and Development Group, SynTest Technologies, Sunnyvale, CA. His current research interests include design for testability, built-in self-test, fault diagnosis, and design of high-performance computer-aided design tools.
Lang Tan received the B.S. degree in computer science from Central South University, Changsha, China, in 2004, and the M.S. degree from the Department of Computer Science, Shanghai Jiaotong University, Shanghai, China, in 2007. He is currently a Research and Development Engineer with SynTest Technologies, Inc., Shanghai. His current research interests include design for testability, test compression, low-power testing, and fault diagnosis.
Yu Zhang received the B.S. degree from the Department of Computer Science, Anhui University, Hefei, China, in 2005, and the M.S. degree from the Department of Computer Science, University of Science and Technology of China, Hefei, in 2008. He is currently a Research and Development Engineer with the ATPG Group, SynTest Technologies, Inc., Shanghai, China. His primary research interests include design for testability, fault modeling, test generation, test compression, and low-power testing.
Yu Hu (M’06) received the B.S., M.S., and Ph.D. degrees, all in electrical engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 1997, 1999, and 2003, respectively. She is currently an Associate Professor with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. Her current research interests include reliable design, fault diagnosis, and testing. Dr. Hu is a member of ACM, IEICE, and CCF.
Wen-Ben Jone (M’84–SM’02) was born in Taipei, Taiwan. He received the B.S. degree in computer science and the M.S. degree in computer engineering, both from National Chiao-Tung University, Hsinchu, Taiwan, in 1979 and 1981, respectively, and the Ph.D. degree in computer engineering and science from Case Western Reserve University, Cleveland, OH, in 1987. In 1987, he joined the Department of Computer Science, New Mexico Institute of Mining and Technology, Socorro, where he was promoted to Associate Professor in 1992. From 1993 to 2000, he was a Full Professor with the Department of Computer Engineering and Information Science, National Chung-Cheng University, Chiayi, Taiwan. Since 2001, he has been an Associate Professor with the Department of Electrical and Computer
Engineering, University of Cincinnati, Cincinnati, OH. He was a Visiting Scholar with the Institute of Information Science, Academia Sinica, Taipei, Taiwan, and with the Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, Hong Kong. His current research interests include very large scale integration design for testability and reliability, low-power circuit design and test, and computer architecture and parallel processing. Dr. Jone was a co-recipient of the 2003 IEEE Donald G. Fink Prize Paper Award. He was also a co-recipient of the Best Paper Award of the 2008 International Symposium on Low-Power Electronics and Design.
Michael S. Hsiao (S’95–M’97–SM’04) received the B.S. degree in computer engineering (highest honors), and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana-Champaign, Urbana, in 1992, 1993, and 1997, respectively. He was a Visiting Scientist with NEC America, Inc., Princeton, NJ, in 1997, and in 2002 he was a Visiting Professor with Intel, Santa Clara, CA. He was an Assistant Professor with the Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, Piscataway, between 1997 and 2001. From 2001 to 2006, he was an Associate Professor with the Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg. Since 2006, he has been a Professor with the same department. He and his research group have published more than 180 refereed journal and conference papers. His current research interests include very large scale integration testing, design verification, diagnosis, and power management. Dr. Hsiao was a recipient of the Digital Equipment Corporation Fellowship, the McDonnell Douglas Scholarship, the National Science Foundation CAREER Award, and is recognized for most influential papers in the first 10 years (1998–2007) of the Design Automation and Test Conference in Europe. He has served on the program committees for more than 40 IEEE international conferences and workshops, in addition to serving as an Associate Editor on ACM Transactions on Design Automation of Electronic Systems, as well as on editorial boards of several journals.
James Chien-Mo Li (S’93–M’02) received the B.S.E.E. degree from National Taiwan University, Taipei, Taiwan, in 1993, and the M.S.E.E. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1997 and 2002, respectively. He is currently an Associate Professor with the Graduate Institute of Electronics Engineering, National Taiwan University. His current research interests include design for testability, built-in self-testing, low-power testing, and fault diagnosis.
Jiun-Lang Huang (S’96–M’99) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1992, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of California, Santa Barbara (UCSB), in 1995 and 1999, respectively. From 2000 to 2001, he was an Assistant Research Engineer with the ECE Department, UCSB. In 2001, he joined National Taiwan University and is currently an Associate Professor with the Graduate Institute of Electronics Engineering and the Department of Electrical Engineering. His current research interests include design for testability, built-in self-test, and calibration for mixed-signal systems.
Lizhen Yu (M’10) received the B.S. degree in computer science and applications and the M.S. degree in nuclear technologies and applications, both from the University of Science and Technology of China, Hefei, China, in 2001 and 2005, respectively. She is currently a Research and Development Manager with SynTest Technologies, Inc., Shanghai, China. Her current research interests include design for testability, design methodology, simulation, and so on, for static random access memories and digital logic. She has published three papers in these areas.