Scan Test Cost and Power Reduction through Systematic Scan Reconfiguration

Ahmad Al-Yamani, Member, IEEE, Narendra Devta-Prasanna, Member, IEEE, Erik Chmelar, Member, IEEE, Mikhail Grinchuk, Member, IEEE, and Arun Gunda, Member, IEEE
Abstract—This paper presents segmented addressable scan (SAS), a test architecture that addresses test data volume, test application time, test power consumption, and tester channel requirements at a hardware overhead of a few gates per scan chain. Building on SAS, the paper also presents systematic scan reconfiguration (SSR), a test data compression algorithm that achieves 10x to 40x compression ratios without requiring any information from the ATPG tool about the unspecified bits. The architecture and the algorithm were applied to both single-stuck and transition fault test sets.

Index Terms—Design for testability, Integrated circuit testing, Self-testing, Test set compression.
Fig. 1 Multiple scan chains architecture.

I. INTRODUCTION
The quality of structural testing for digital circuits is a function of the accessibility of the internal nodes of the circuit. The most widely used design-for-testability (DFT) technique for improving accessibility is scan path, which is based on serialization of test data [1]. In scan-based testing, the flip-flops in the circuit under test are connected together to form one or multiple scan chains, as shown in Fig. 1. Through these scan chains, arbitrary test patterns can be shifted into the flip-flops and applied to the circuit. The main advantage of scan testing is improving the controllability and observability of the circuit under test by providing direct access to the flip-flops.

In scan-path methods, the circuit is designed to have two modes of operation: normal functional mode and scan mode. In scan mode, the flip-flops in the circuit are connected as one or more shift registers so that arbitrary test patterns can be shifted into them. The test pattern is applied by returning the circuit to normal mode for one clock cycle, during which the contents of the flip-flops are applied to the combinational circuitry and the outputs of the combinational circuitry are stored back in the flip-flops. The circuit can then be placed in scan mode again to shift out the contents of the flip-flops and compare them with the correct response. Shifting the response out is done concurrently with shifting in the next test pattern.
Manuscript received January 11, 2006. This work was done at LSI Logic.
A. A. Al-Yamani is with King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia, and with the Center for Reliable Computing, Stanford University, Stanford, CA, USA. Phone: +966-3-860-3186; fax: +966-3-860-3059; e-mail: [email protected].
N. Devta-Prasanna is with the Electrical and Computer Engineering Department at the University of Iowa (e-mail: [email protected]).
E. Chmelar (e-mail: [email protected]), M. Grinchuk (e-mail: [email protected]), and A. Gunda ([email protected]) are with LSI Logic, Milpitas, CA 95035 USA.
Copyright © 2006 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected].
While scan-based testing improves test quality significantly, it introduces challenges that significantly increase test cost:

(1) Test time and pin count trade-off: every test pattern needs to be shifted into the shift registers before being applied. For example, a circuit with 128K flip-flops organized into 32 balanced scan chains has a chain length of 4,000 flip-flops, so 4,000 clock cycles are spent loading each pattern into the scan chains. Increasing the number of scan chains to reduce the loading time increases another costly parameter: the number of tester pins needed for loading and unloading the scan chains.

(2) Test power consumption and shift speed trade-off: because all flip-flops are clocked while patterns are shifted in and out of the scan chains, the power consumption of the circuit is much higher during test than during normal operation. Since the circuit is designed to work within the functional power budget, power consumption during shift operations raises major test validity concerns. One solution is to reduce the frequency at which patterns are shifted in and out, but that aggravates the previous problem.

Another fundamental problem with test today is test data volume. The major cause of the problem is accessibility limitations. When scan-based testing is used, the size of every pattern is equivalent to the number of flip-flops and primary inputs in the circuit, a very large and ever-increasing parameter. When the circuit is tested only through the primary inputs (sequential testing), many more patterns are required to get the circuit to the desired state; the resulting pattern sets are, again, very large and ever increasing.
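To make the first trade-off concrete, the short Python sketch below (our illustration, not part of the paper's tool flow; the single-pattern count is a placeholder) computes the shift-cycle and scan-pin cost of a balanced multiple-scan-chain configuration, reproducing the 128K flip-flop example above.

    from math import ceil

    def scan_cost(num_flops, num_chains, num_patterns):
        """Shift cycles and scan pins for balanced scan chains (illustrative model)."""
        chain_len = ceil(num_flops / num_chains)    # flip-flops per balanced chain
        shift_cycles = num_patterns * chain_len     # unload of pattern i overlaps load of i+1
        scan_pins = 2 * num_chains                  # one scan-in and one scan-out per chain
        return shift_cycles, scan_pins

    # The example in the text: 128K flip-flops in 32 balanced chains.
    print(scan_cost(128_000, 32, num_patterns=1))   # (4000, 64)

Doubling the chain count halves the cycles per pattern but doubles the pin count, which is exactly the tension the compression techniques below try to break.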
Existing solutions in the industry often address some but not all of the above challenges simultaneously. The most popular class of solutions comprises compression techniques used to reduce the data volume and the tester channel requirements. A generic architecture for such techniques is shown in Fig. 2. In such techniques, a compressed vector is loaded from the tester into the decompression circuitry, which expands the vector into a test pattern in the scan chains. The test response is also compressed into a smaller vector using the output compression circuitry. To name a few, [2], [3], [4], and [5] discuss such compression techniques. By using shadow registers for the decompressor, the number of tester channels required is reduced.

Fig. 2 A generic architecture for input and output compression.

The Illinois Scan Architecture (ISA), shown in Fig. 3, is another class of solutions; it was introduced in [6] to reduce data volume and test application time by splitting the scan chain into multiple segments and broadcasting the data to all of them as long as the segment data are compatible.

Fig. 3 Illinois scan architecture.

Very recently [7], we presented a new architecture and circuitry for significantly reducing test data volume, test application time, test power consumption, and tester channel requirements. The new architecture, called segmented addressable scan (SAS), is based on ISA, but it enables much more aggressive segmentation of the scan chains by enabling many different compatibility configurations among multiple segments.

Section II of this paper gives appropriate credit to previous work. Section III presents segmented addressable scan, and Sec. IV explains the multiple-hot decoder associated with SAS. Section V explains systematic scan reconfiguration. Section VI discusses optimizing SSR based on power consumption. Section VII shows experimental results, and Sec. VIII concludes the paper.

II. RELATED WORK

The Illinois Scan Architecture (ISA) was introduced to reduce data volume and test application time. The basic architecture for Illinois scan is shown in Fig. 3 [6]. A given scan chain is split into multiple segments. Since a majority of the bits in ATPG patterns are don't care bits, there is a good chance that these segments will have compatible vectors (vectors not having opposite care bits in any one location). In this case, all segments of a given chain are configured in broadcast mode to read the same vector. This speeds up the test vector loading and reduces the data volume by a factor equal to the number of segments. If the segments within a given scan chain are incompatible, the test vector needs to be loaded serially by reconfiguring the segments into a single long scan chain. The fact that a majority of the ATPG bits (95-99% [8]) are don't care bits makes ISA an attractive solution for data volume and test time.

Several enhancements to the Illinois scan architecture have been proposed and discussed in the literature, for multiple reasons. Lee et al. presented a broadcasting scheme where ATPG patterns are broadcast to multiple scan chains within a core or across multiple cores; the broadcast mode is used when the vectors going into multiple chains are compatible [9]. [10] introduced a token scan architecture to gate the clock to different scan segments while taking advantage of the regularity and periodicity of scan chains. Another scheme for selective triggering of scan segments was proposed in [11]. A novel scheme was presented in [12] to reduce test power consumption by freezing scan segments that don't have care bits in the next test stimulus. By only loading the segments that have care bits, data volume, application time, and test power consumption are all reduced at once; only one segment of the scan chain is controlled and observed at a time. [13] presented a scheme for resolving conflicts and dependencies between care bits in different segments of an ISA architecture to improve the compression ratio.

A reconfigurable scheme was introduced in [14] that uses mapping logic to control the connection of multiple scan chains. This increases the chances of compatibility between multiple chains and hence makes room for additional compaction. The scheme we present here achieves a similar effect while eliminating the need for the mapping logic and reducing power consumption significantly. SAS can also deal with engineering change orders (ECOs) more efficiently due to its built-in flexibility in selecting compatibility classes.

Recently, [15] proposed a broadcast technique whose concept is to encode only the minority specified bits (either 1 or 0) in scan slices for compression. The X-pand scheme presented in [16] also provides a mapping scheme for ISA-based compression; the paper discussed compression using don't care bits and using ATPG configurations. X-pand, which was a major first step in the right direction for compression, differs from SSR in two major ways: (1) it doesn't offer any power reduction; (2) it is a combinational compactor, so shadow registers cannot be used for further reduction in tester channel requirements.

A new scan architecture was proposed in [17] to order the scan cells and connect them based on their functional interaction. The technique was enhanced in [18] for routability and fault aliasing. While the technique produces promising results, its complexity seems prohibitive for industrial designs. A circular scan scheme was presented in [19] to reduce test data volume. The basic concept is to use a decoder to address different scan chains at different times, which increases the number of possible scan chains (2^N - 1 for an N-input decoder). Also, the output of each scan chain is reconnected to its input, which enables reusing the contents of the response captured in the chain as a new test stimulus if they are compatible. The previous schemes are either limited in how much they can benefit from compatibility between only some of the segments, or they don't address the issue of power consumption during scan, or both.

Another attempt at using decoder-based segmentation is available in [20]. In this scheme, the authors control the clocks to the various segments through a regular decoder. The main advantage of the scheme is power reduction during scan and capture. The solution doesn't address data volume or test application time. SAS hardware enhances the benefit of all scan segmentation schemes by avoiding the limitation of having to have all segments compatible in order to benefit from the segmentation. In other words, any combination of segments can be compatible and lead to a reduction in the test stimuli loaded. This is done with minimal overhead thanks to the multiple-hot decoder explained in Sec. IV. The scheme simultaneously addresses data volume, test time, power, and tester channel requirements.

Recently, a scan chain segmentation technique was presented in [21]. The technique is a BIST solution that selectively inserts inversions at some locations in the scan path based on the ATPG patterns, to minimize the number of weights required for weighted random patterns to achieve the desired coverage. The technique in [22] is a recent attempt at test cost reduction through scan reconfiguration. It is based on finding the matches between the test response of pattern n and the bits of pattern n+1. This technique requires very high routing overhead due to the individual addressing of flip-flops, just like random access scan, presented in [23] and enhanced in [24]. Also, the matching could equally well be done with a pseudorandom sequence generated from an LFSR, with a match being equally probable. It should be clear to the reader that, although the titles are close to each other, these two recent solutions are in essence very different from SSR.
III. SEGMENTED ADDRESSABLE SCAN

The segmented addressable scan (SAS) architecture incorporates some of the basic concepts from Illinois scan [6] and from scan segment decoding [19], [20]. It has the following distinguishing features: (1) Multiple segments can be loaded simultaneously while maintaining the freedom to change which segments are grouped together within a given test pattern. Unlike the technique in [14], this is done without additional mapping logic. Such reconfiguration of compatibility allows for significant additional compaction, leading to gains in data volume reduction. It also enables aggressive parallelism of the scan chains, leading to a reduction in test application time. (2) The compatible segments are loaded in parallel using a multiple-hot decoder (explained in Sec. IV) instead of being loaded serially, and this further reduces the time required to load the scan chains. For previous schemes, if only a subset of the segments satisfies compatibility, that subset cannot be loaded in parallel. (3) The segments that are not loaded within a given round are not clocked, leading to power savings, which in turn allows for faster clocking of the test patterns within the same power budget. This freezing leads to an increase in the scan time that is compensated for by the aggressive parallelization and the extra power budget, as discussed in the results section.

The basic blocks of the SAS architecture are shown in Fig. 4.

Fig. 4 Segmented addressable scan (SAS): a multi-hot decoder gates the clocks of the scan segments, which are fed from a tester channel or input decompressor and observed through an output compressor.

The architecture shown can be implemented with a single decoder for the entire design or multiple decoders for multiple scan chains. The multiple-hot decoder (MHD) is used to activate all compatible segments within a given compatibility class. A given address is loaded into the MHD to refer to a single segment or to multiple segments. The multiple activation of compatible segments allows the technique to benefit from the compatibility between the segments even when not all segments are compatible. This gives a significant advantage over previous segmentation techniques. Take for example the following 30-bit test pattern:

Pattern:   0X11XXX0110X0XX101X1X11X0X011X
Segment 1: 0X11X
Segment 2: XX011
Segment 3: 0X0XX
Segment 4: 101X1
Segment 5: X11X0
Segment 6: X011X

Assume that the scan cells are segmented into six 5-bit segments as shown above. If a regular segmentation scheme (like ISA) is applied to the pattern, we cannot take advantage of partial compatibilities between segments; in other words, we have to store and scan in the entire 30 bits. However, with the segmentation scheme presented here, we can split the above pattern into 3 compatibility classes as follows:

Class 1 = {Seg 1, Seg 5} = 01110
Class 2 = {Seg 2, Seg 3} = 0X011
Class 3 = {Seg 4, Seg 6} = 10111

A regular decoder scheme like the ones in [19] and [20] would take advantage of the above compatibility for data volume reduction only. Because of the MHD used in the SAS architecture, test time can also be optimized based on this compatibility, since the compatibility classes will be loaded in parallel.
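The compatibility check and merge behind these classes can be captured in a few lines. The following Python sketch is our illustration (not the authors' tooling) of the operations used to form the three classes above; 'X' denotes a don't care bit:

    def compatible(a, b):
        """Two segment vectors are compatible if no bit position carries
        opposite care bits ('X' is a don't care)."""
        return all(x == y or 'X' in (x, y) for x, y in zip(a, b))

    def merge(a, b):
        """Merge two compatible vectors: a care bit overrides an 'X'."""
        assert compatible(a, b)
        return ''.join(y if x == 'X' else x for x, y in zip(a, b))

    segments = ["0X11X", "XX011", "0X0XX", "101X1", "X11X0", "X011X"]
    print(merge(segments[0], segments[4]))  # Class 1 = {Seg 1, Seg 5} -> 01110
    print(merge(segments[1], segments[2]))  # Class 2 = {Seg 2, Seg 3} -> 0X011
    print(merge(segments[3], segments[5]))  # Class 3 = {Seg 4, Seg 6} -> 10111

A greedy grouping of pairwise-compatible segments along these lines yields compatibility classes like the three shown, each storable and loadable as a single merged vector.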
IV. SAS MULTIPLE-HOT DECODER

The multiple-hot decoder takes the address of the segment(s) to be enabled and activates the clocks to those segments through the clock-gating AND gates shown in Fig. 4. For regular one-hot decoders, the input is the address of the single selected output. For the SAS multiple-hot decoder, the address can include don't care bits (d's), allowing multiple outputs to be activated. For the example shown above, compatibility class 1 can be loaded in parallel by combining the addresses of Segment 1 and Segment 5 {001, 101}, which means that the address for this class is d01. The address for class 2 is 01d, and the address for class 3 is 1d0.

For the multiple-hot decoder, don't care bits in the address need to be encoded too. We use positional cube notation [25] to encode 0s, 1s, and don't cares, as shown in Table I.

TABLE I
POSITIONAL CUBE NOTATION
Code    Value
10      0
01      1
11      d
00      Unused

The positional cube encoding scheme results in an implementation of the multiple-hot decoder that requires the same hardware as a regular one-hot address decoder. Table II shows an example implementation of a 2-input 4-output multiple-hot decoder.

TABLE II
2-INPUT 4-OUTPUT MULTIPLE-HOT DECODER
(inputs I1 and I0 are each positional-cube encoded as two bits;
rows containing the unused code 00 are don't cares)
I1    I0    O3 O2 O1 O0
00    00    X  X  X  X
00    01    X  X  X  X
00    10    X  X  X  X
00    11    X  X  X  X
01    00    X  X  X  X
01    01    1  0  0  0
01    10    0  1  0  0
01    11    1  1  0  0
10    00    X  X  X  X
10    01    0  0  1  0
10    10    0  0  0  1
10    11    0  0  1  1
11    00    X  X  X  X
11    01    1  0  1  0
11    10    0  1  0  1
11    11    1  1  1  1

The function in Table II can be implemented with four 2-input AND gates, as shown in Fig. 5. This is exactly the same hardware needed for a 2-input one-hot decoder. Similarly, for 8 segments, we need eight 3-input AND gates. In general, if we have S segments, we need S AND gates, each with ⌈log2 S⌉ inputs, for the multiple-hot decoder. For clock gating, we need an additional S 2-input AND gates.

Fig. 5 2-to-4 multiple-hot decoder.
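As an illustration of the decoder function (our sketch, not the paper's gate-level implementation), the following Python fragment evaluates a positional-cube-encoded address exactly as the AND-gate network of Fig. 5 would: output O_i is active when, at every address position, the code bit permitting the corresponding bit of i is set.

    # Table I codes as (allow-0, allow-1) pairs.
    CODE = {'0': (1, 0), '1': (0, 1), 'd': (1, 1)}

    def multiple_hot(address):
        """Return the active outputs for an address such as 'd01'.
        O_i = AND over address positions j of CODE[address[j]][bit j of i],
        the same per-output AND a one-hot decoder uses."""
        n = len(address)
        active = []
        for i in range(2 ** n):
            bits = [(i >> (n - 1 - j)) & 1 for j in range(n)]  # MSB first
            if all(CODE[address[j]][bits[j]] for j in range(n)):
                active.append(i)
        return active

    print(multiple_hot('0d'))   # [0, 1]   -> segments 0 and 1 (2-to-4 example)
    print(multiple_hot('d01'))  # [1, 5]   -> compatibility class 1 above
    print(multiple_hot('ddd'))  # [0, 1, 2, 3, 4, 5, 6, 7] -> all eight segments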
The address for a given compatibility class can be loaded in parallel from the tester, in which case the number of tester channels needed is 2⌈log2 S⌉ + 1 (the multiplication by 2 is a result of encoding the 3 values 0, 1, and d). It can also be loaded serially. While shifting in a test pattern for one compatibility class, the address for the next compatibility class can be loaded into a shadow register. The total number of flip-flops for the shadow register is 2⌈log2 S⌉.

The block diagram for a SAS architecture with 8 segments and serial loading of the address register is shown in Fig. 6. The decoder shown in the diagram was optimized using standard logic optimization schemes for decoders.

Fig. 6 8-segment SAS block diagram: a serially loaded shadow address register feeds the address register, whose decoded outputs gate the clock tree to segments 0 through 7; scan data comes from a tester channel or input decompressor.

The decoder and register hardware overhead is minimal even with very aggressive segmentation. We calculated the size of the multiple-hot decoder, the clock gating circuitry, the address register, and the shadow registers for different segmentation options. The resulting worst-case number of transistors is shown in Fig. 7. For 128 segments, we need around 1,500 transistors, i.e., fewer than 500 gates.

Fig. 7 Number of transistors needed for the SAS architecture (with serial load) for segmentation options from 2 to 512 segments (worst-case estimate).

The delay overhead is also not significant, because 1,024 segments need 10-input AND gates, which can be implemented using 2 or 3 levels of AND gates that are very unlikely to exceed the clock width. Furthermore, the shift speed is normally much lower than the actual speed of the circuit due to tester channel speed limitations. During the normal operation of the circuit, the entire decoder is bypassed.

V. SYSTEMATIC SCAN RECONFIGURATION

As is obvious from the previous section, we need information about the don't care bits to generate the compatibility classes needed for SAS decoder address generation. There are two issues with this requirement: (1) Some ATPG vendors don't provide don't care bit information because they consider it confidential. (2) A fault can be detected by multiple patterns; with the ATPG unaware of the SAS architecture, the selection of which patterns to generate will be driven not by higher compatibility but by ease of generation.

As a result of these two issues, we were not only forced to come up with an algorithm that doesn't require don't care bits, but we were also convinced that we could drive the ATPG tool to generate more highly compatible patterns that would require the fewest number of addresses or configurations with SAS.

The SSR algorithm is based on the SAS hardware presented in Sec. III. It works by configuring the scan chains in the circuit such that they appear tied together to the ATPG tool, under multiple configurations. The selection of which segments to tie together is done such that the number of addresses required to be loaded into the multiple-hot decoder is minimized. Basically, an address corresponds to a subset of the segments. For example, for a 2-to-4 multiple-hot decoder, the address 00 corresponds to segment 0, and so on. Also, the address 0d (d = don't care) corresponds to segments 0 and 1. Finally, the address dd corresponds to all 4 segments.

Without the SAS architecture, we could choose a multiplicity of configurations and generate patterns with segments tied together. However, this would require multiple multiplexers at the clock inputs and outputs of the scan segments to reconfigure them. It would also either cause problems with engineering changes or require these multiplexers to be highly reconfigurable, which translates into high hardware overhead. The high flexibility and simplicity of the SAS architecture allows for a very large number of configurations (3^⌈log2 S⌉, where S is the number of scan segments) with very simple hardware that doesn't need to be changed with engineering changes.

Physically, all segments in the SAS architecture are tied together. The decoder controls which segments to load together by activating a subset of the clocks to these segments based on the address loaded. Our SSR algorithm selects a set of configurations for combining scan segments together and then pretends to the ATPG tool that these segments are tied together, so that it generates compatible patterns for them. It continues with these configurations until complete fault coverage is achieved (by complete, we mean the same coverage that can be achieved without tying any segments together).
The algorithm is best explained by an example. Take a SAS architecture with 8 segments (the addresses for the individual segments are 000 through 111). First, we tie all segments together and we call this Category 0. There is only one configuration in this category, which corresponds to the address ddd. We run the ATPG tool with this configuration to detect as many faults as it can. Notice that during test application, all we need to do is load the address ddd in the decoder and then start loading the patterns in Category 0. Also note that every pattern generated with this configuration is 1/8th (generally 1/S) of the size of the regular pattern (assuming segments are balanced).

Most of the time, there will be undetected faults with this configuration, so we switch to Category 1. In Category 1, only one of the address bits is specified and the remaining bits are all d's. Notice that there are 3 (generally ⌈log2 S⌉) configurations where only one bit is specified. We start with the configuration cdd, where c stands for a care bit. The care bit takes the values 0 and 1, which means that we use the addresses 0dd and 1dd. These two addresses correspond to tying segments 0, 1, 2, and 3 together and segments 4, 5, 6, and 7 together. We invoke the ATPG tool to generate patterns, loading only the faults that were not detected with Category 0 patterns. The next configuration within Category 1 is dcd, which corresponds to segments 0, 1, 4, and 5 tied together and segments 2, 3, 6, and 7 tied together. We again invoke the ATPG tool with the undetected faults. After the last configuration in Category 1, we go to Category 2, where we have two care bits instead of one. The first configuration is ccd, which corresponds to tying the segments in four groups (0 with 1, 2 with 3, 4 with 5, and 6 with 7). We continue with these categories and configurations until all detectable faults are detected. The general algorithm for SSR is shown in Fig. 8. Experiments show that we normally don't need to go beyond Category 1. A flow chart outlining the test pattern generation process of SSR is shown in Fig. 9.

1.  Classify all detectable faults as undetected
2.  Start with the configuration dd…d
3.  While (there are undetected faults)
4.      Generate ATPG patterns
5.      If the address care bit(s) are not the least significant
6.          Move address care bits to lower significance
7.      Else
8.          Increase the number of care bits in address
9.          Make the care bits the most significant
10.     Endif
11. Endwhile
12. End

Fig. 8 Systematic Scan Reconfiguration algorithm.

Fig. 9 Flow chart for SSR test generation: add all detectable faults to the set F; start with the SAS decoder configuration dd…d (all segments tied) or with a configuration cc…cdd…d (most significant address bits are care bits); generate ATPG patterns for the faults in F given the selected decoder configuration; if F = ∅, exit and report a set of patterns Pi for each configuration i ∈ C, where C is the set of selected configurations; otherwise, if the address care bits are not the least significant, move them to lower significance, else increase the number of care bits in the address and make them the most significant.

From the example above, it seems that the ATPG runtime will be very long, and that is true. However, there are multiple solutions that can be applied to this problem. Here are some of them: (1) The first solution is not to try all configurations but to cut the process in the middle and jump to the configuration cc…c, which separates all segments from each other. This configuration will detect all remaining detectable faults at any step. (2) Another solution is not to start with the configuration dd…d but rather with cd…d or ccd…d. This cuts the runtime significantly because the first configuration is the hardest for the ATPG tool. (3) A third solution is to reduce the effort level for the first few configurations to the minimum, such that the ATPG tool only picks the faults that can be easily detected with compatible patterns.

Not surprisingly, the price for all of the above solutions is a reduction in the compression ratio. The compression ratios we achieved are more than twice what we were looking for, so the task we leave to the test designer is to choose the right mix between compression ratio and runtime. It should be clear to the reader by now that the SSR ATPG runtime is a one-time cost, while the SSR compression ratio is a recurrent saving because it cuts the test time by the same ratio.
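The configuration schedule of Fig. 8 can be made concrete with a short sketch. The Python code below is our reading of that schedule (the care bits slide from most to least significant before their count grows); run_atpg stands in for a commercial ATPG invocation and is hypothetical, not a real tool API:

    from itertools import product

    def configurations(num_addr_bits):
        """Yield care-bit templates in Fig. 8 order: Category 0 (dd...d) first,
        then for each category k the k care bits slide from the most to the
        least significant address positions."""
        yield 'd' * num_addr_bits
        for k in range(1, num_addr_bits + 1):            # category k
            for start in range(num_addr_bits - k + 1):   # slide the care-bit window
                yield 'd' * start + 'c' * k + 'd' * (num_addr_bits - start - k)

    def addresses(template):
        """Expand a template such as 'cdd' into the decoder addresses it implies
        (0dd and 1dd), i.e., the segment groups tied together for the ATPG run."""
        care = [i for i, s in enumerate(template) if s == 'c']
        for values in product('01', repeat=len(care)):
            addr = list(template)
            for i, v in zip(care, values):
                addr[i] = v
            yield ''.join(addr)

    def ssr(num_addr_bits, faults, run_atpg):
        """Fig. 8 loop: invoke the (stubbed) ATPG per configuration until no
        detectable faults remain. run_atpg(conf, faults) is assumed to return
        the generated patterns and to remove detected faults from `faults`."""
        patterns = {}
        for conf in configurations(num_addr_bits):
            if not faults:
                break
            patterns[conf] = run_atpg(conf, faults)
        return patterns

    print(list(configurations(3)))  # ['ddd', 'cdd', 'dcd', 'ddc', 'ccd', 'dcc', 'ccc']
    print(list(addresses('cdd')))   # ['0dd', '1dd']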
Proper credit should be given to [14], in which the idea of using multiple configurations of Illinois scan was presented. However, the idea we present here has the following distinguishing features: (1) The architecture in [14] is based on mapping logic: multiplexer-based added hardware that combines multiple subsets together. That hardware is designed around reducing the number of compatibilities required, because more compatibilities require more multiplexers and more scan inputs. In addition to the processing time required for these compatibilities, the information about which faults are detectable with which patterns is only available to ATPG vendors. Our SSR hardware does not require any such information and does not need such extensive processing time. Furthermore, it allows 3^⌈log2 S⌉ different configurations without any additional overhead. For example, an SSR configuration of 256 segments automatically allows more than 6,500 configurations. For such flexibility, the technique in [14] would require 256 6500-input multiplexers; SSR requires 256 8-input AND gates and 256 2-input AND gates. (2) For the same example, the number of tester pins required for SSR is 17. For their technique to allow similar flexibility, the number of tester channels is more than 6,500 tester pins. It can be argued that not all such configurations are needed to achieve an acceptable compression ratio; however, these configurations can be used to reduce runtime too. (3) Engineering change orders may alter the compatibilities based on which the hardware in [14] was synthesized. With SSR, all we need is to select a different set of compatibilities; no hardware changes are needed. (4) SSR inherently offers power reduction by selective activation. (5) The technique in [14] is heavily based on broadcasting mode, which, as will be shown in the results section, is very time-consuming for the ATPG tool, and it gets worse with more aggressive parallelization. Their results show up to a 50x increase in ATPG runtime. As shown in the experimental results, we found that it is very helpful in terms of runtime to use configurations with fewer chains in broadcast mode, which is something SSR automatically allows.

SynTest Corporation has a product called VirtualScan™ [26], which reduces data cost by up to 50x. The technique is based on the Illinois Scan Architecture enhanced with the tool's ability to generate highly compatible patterns for broadcasting to many short scan chains. Such a tool is ideal for integration with SSR, since it capitalizes on the strength of the ATPG tool in producing highly compatible patterns, of which SSR can take extreme advantage.

Mentor Graphics has another compression tool, TestKompress™ [27], which is based on linear compression and decompression concepts. The technical details of the tool are discussed in [5]. Linear decompression is an older category of compression techniques that is not as widely used as compatibility-based techniques due to its cost and complexity.

VI. POWER-OPTIMIZED SSR

A significant concern with scan-based testing is power consumption. The circuit is designed for a certain power budget in the normal functional mode, during which only a fraction of the flip-flops is active. During scan mode, patterns are shifted into the scan chains, which causes many more flip-flops to switch. This switching increases the dynamic power consumption, which is modeled by

    P = (1/2) Σ_n c_n ΔV V_dd α_n f

where n ranges over the nodes in the circuit, α_n is the average number of times node n switches per cycle, c_n is the capacitance at node n, ΔV is the voltage swing, V_dd is the supply voltage, and f is the switching frequency. Reducing the scan-in and scan-out frequency is a very popular solution to power problems, but it adds to test time. Another solution is to reduce the activity factor by filling don't care bits with the repeated value of the last care bit; this solution may impact the defect detection quality of the test patterns. A third class of solutions gates the clock to some of the scan chains or segments and activates only a subset of the flip-flops at a time. SSR belongs to the third class in its power saving capability, and it saves scan power, not capture power, since capture power can be controlled using the right test patterns. For more details about power consumption in scan test and its solutions, we refer the reader to [28].

Assume that in normal scan-based testing the maximum shift frequency due to power constraints is f MHz. Since Category 0 patterns (address dd…d) activate all segments, the maximum frequency for these patterns is f. For that category, time saving is achieved through loading all segments together, which is a factor-of-S saving, where S is the number of segments. For Category 1, with every pattern we load half of the segments at a time. This means that we are reducing the power consumption by half, so the loading frequency can be safely doubled (all components are linear in f), and Category 1 patterns can be loaded at a 2f frequency. The speedup for Category 1 is therefore 2 × S/2 = S, and a similar relation scales with all categories. So, as long as the tester speed allows it, categories that activate smaller subsets of segments can shift patterns at higher speeds. Looking at test time independently of data volume, the optimal segmentation is the one that increases the frequency to the maximum, since the ATPG runtime for that category of patterns is smallest due to the reduced constraints. However, it is well known that data volume is a significant test cost factor because tester memory is limited, which makes the decision one of relative significance.
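The frequency argument above can be summarized in a few lines. This sketch (ours; it assumes the linear power model stated above and ignores tester speed limits) computes the per-category activity, the allowed shift frequency, and the resulting speedup:

    def category_schedule(S, k, f):
        """For Category k with S segments: fraction of segments active per load,
        shift frequency allowed by the same power budget, and the net speedup
        (2^k faster shifting times S/2^k segments per load)."""
        active_fraction = 1 / (2 ** k)
        shift_freq = f * (2 ** k)
        speedup = (2 ** k) * (S / (2 ** k))   # = S for every category
        return active_fraction, shift_freq, speedup

    print(category_schedule(S=8, k=0, f=100e6))  # (1.0, 100000000.0, 8.0)
    print(category_schedule(S=8, k=1, f=100e6))  # (0.5, 200000000.0, 8.0)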
VII. EXPERIMENTS AND RESULTS

SSR experiments were performed on the circuits in Table III, both of which are 180 nm designs. It has been evident to us that SSR achieves better results with bigger designs.

TABLE III
CIRCUIT CHARACTERISTICS
        Flip-flops   Gates   Clk domains   Patterns
Ckt1    29K          350K    10            1.5K
Ckt2    35.5K        450K    26            3.4K
A. Test Data Volume

Although the actual number of patterns is almost doubled or tripled with SSR, the storage required for those patterns is more than an order of magnitude less than the storage required for the regular patterns. Table IV shows the compression ratio achieved by SSR for stuck-at test patterns using different numbers of segments.

TABLE IV
STUCK-AT TESTS DATA VOLUME COMPRESSION
Ckt1 (total data volume: 40 Mb)
SSR data volume:   32 segments    3.3 Mb   12x
                   64 segments    2.4 Mb   16x
                   128 segments   2.0 Mb   19x
                   256 segments   1.9 Mb   21x
Ckt2 (total data volume: 120 Mb)
SSR data volume:   32 segments    7.5 Mb   16x
                   64 segments    5.8 Mb   20x
                   128 segments   4.8 Mb   25x
                   256 segments   3.7 Mb   32x

Similar data for transition fault patterns is shown in Table V; the results are slightly better. The compression ratio increases with the number of segments for both single-stuck and transition patterns. The price for increasing the number of segments is the ATPG runtime, which we discuss below.

TABLE V
TRANSITION TESTS DATA VOLUME COMPRESSION
Ckt1 (total data volume: 98 Mb)
SSR data volume:   32 segments    7.7 Mb   12x
                   64 segments    5.3 Mb   18x
                   128 segments   4.5 Mb   22x
                   256 segments   3.6 Mb   27x
Ckt2 (total data volume: 300 Mb)
SSR data volume:   32 segments    21.7 Mb  14x
                   64 segments    14.1 Mb  21x
                   128 segments   11.8 Mb  25x
                   256 segments   7.7 Mb   39x

Similar reduction ratios are achieved for test time. Furthermore, the fact that the cost of additional scan chains is minimal (just a few gates per chain) promises significant reduction in test time. With only 20 decoder inputs and 1 scan input pin, our technique can support 1,024 scan chains; for Ckt1, that means 29 flip-flops per scan chain. The 20 decoder inputs can be loaded serially using a shadow shift register. The above parallelization considers parallel loading into the decoder without any shadow registers; using shadow registers allows for more parallelization.

B. Test Time

Since SSR requires only one tester channel (see Fig. 6) that is shared between all segments, it is fair to compare it (in terms of test time) with a single scan chain loaded one flip-flop at a time. If multiple channels are available on the tester, then SSR could also be enhanced with several decoders and a hierarchical system of segments, where every group of segments has a dedicated decoder and a dedicated tester channel. The details of the advantages and disadvantages of such a hierarchical system are available in [29].

Connecting segments together places constraints on the ATPG tool to generate compatible vectors for segments that are tied together. Because of that, the tool ends up generating many more patterns than in the case where the segments are not tied. If this were not the case, the speedup factor would always be the number of segments when tying all of them together, half the number of segments when tying half of them together, and so on. Table VI shows the scan time reduction for SSR using different numbers of segments.

TABLE VI
STUCK-AT SCAN TIME COMPARISON
Ckt1 (regular scan time: 40 M cycles)
SSR scan time:     32 segments    3.2 Mc   12x
                   64 segments    2.3 Mc   17x
                   128 segments   1.7 Mc   23x
                   256 segments   1.4 Mc   28x
Ckt2 (regular scan time: 120 M cycles)
SSR scan time:     32 segments    7.1 Mc   17x
                   64 segments    5.7 Mc   21x
                   128 segments   4.6 Mc   26x
                   256 segments   3.3 Mc   36x

As explained in Sec. VI, SSR can be enhanced by exploiting the power reduction from freezing some of the segments, which allows increasing the scan speed within the same power budget. Using this power-optimized SSR, the scan time reduction ratios are shown in Table VII. Notice that the table expresses scan time in cycle-equivalents, not actual numbers of cycles, since the cycles are now shorter for some patterns.

TABLE VII
POWER-OPTIMIZED SSR SCAN TIME
Ckt1 (regular scan time: 40 M cycles)
SSR scan time:     32 segments    2.7 Mc   14x
                   64 segments    1.9 Mc   21x
                   128 segments   1.5 Mc   27x
                   256 segments   1.1 Mc   37x
Ckt2 (regular scan time: 120 M cycles)
SSR scan time:     32 segments    6.8 Mc   18x
                   64 segments    5.5 Mc   22x
                   128 segments   3.4 Mc   35x
                   256 segments   1.7 Mc   66x
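To illustrate the cycle-equivalent accounting used in Table VII, the following sketch (ours, with hypothetical pattern counts rather than the measured data) charges a Category-k load at 1/2^k of its cycle count, reflecting the 2^k shift-frequency scaling of Sec. VI:

    def cycle_equivalents(seg_len, patterns_per_category):
        """Scan time in cycle-equivalents at the base frequency f: a Category-k
        load of seg_len bits shifted at 2^k * f costs seg_len / 2^k
        cycle-equivalents."""
        return sum(n * seg_len / (2 ** k)
                   for k, n in patterns_per_category.items())

    # Hypothetical example: 1,000-bit segments, 2,000 Category-0 patterns and
    # 1,500 Category-1 patterns.
    print(cycle_equivalents(1000, {0: 2000, 1: 1500}))  # 2750000.0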
C. Scan Power

Using Category 0 means that all segments are active. Although this category achieves the highest compression ratio, it does not offer any power reduction. The other categories achieve a lower compression ratio and lower power consumption. For example, Category 1 activates half of the segments simultaneously, which reduces scan power consumption by 50%. Fig. 10 shows the trade-off between data volume reduction and normalized power consumption for the single-stuck test set as a function of the starting category; Fig. 11 shows similar data for the transition test set. By starting with a category other than 0, the scan power consumption is significantly reduced. As mentioned earlier, SSR does not deal with capture power consumption, which can be manipulated using the right test patterns.

Fig. 10 SSR single-stuck data volume and power reduction.

Fig. 11 SSR transition data volume and power reduction.

D. Progressive Fault Coverage

To give the reader an idea of how much fault coverage can be achieved while tying multiple segments together, we show the progressive fault coverage improvement of SSR together with the normal fault coverage achieved with basic ATPG without SSR. Fig. 12 and Fig. 13 show the fault coverage vs. the categories and configurations used with 32 segments (the other segmentations behaved in a similar manner). The figures deliver two significant messages: (1) the first category (all segments tied together) achieved more than 99% of the achievable coverage (achievable = 97.3, achieved = 96.7); (2) we do not need more than the first two categories to reach the achievable coverage; in fact, we even slightly exceeded it. Fig. 14 and Fig. 15 show results similar to those in Fig. 12 and Fig. 13 but for the transition fault model instead of the single-stuck model. The observations for transition patterns are consistent with those for single-stuck patterns.

Fig. 12 Progressive SSR coverage with 2 categories of single-stuck patterns (Ckt1).

Fig. 13 Progressive SSR coverage with 2 categories of single-stuck patterns (Ckt2).
Fig. 14 Progressive SSR coverage with 2 categories of transition patterns (Ckt1).

Fig. 15 Progressive SSR coverage with 2 categories of transition patterns (Ckt2).

E. ATPG Runtime

We tried some of the solutions proposed earlier for reducing the ATPG runtime of SSR. The first was to reduce the effort level (abort limit) of the ATPG tool. Since the technique is based on constraining the way the ATPG tool works, the tool spends a significant amount of time trying to satisfy the constraints and detect the faults. By reducing the abort limit, the tool does not try as hard in a given step. This scheme works because the undetected faults can be detected with one of the next configurations or categories. Fig. 16 shows the effect of reducing the abort limit on the SSR runtime. It is clear that this saves some runtime, but not significantly enough, especially given that the increase in runtime with SSR is very large. Notice that this experiment was tried only with the first category, i.e., with all segments tied together. We did not go through all steps of the algorithm, as this step was the one that contributed the most to runtime, pattern count, and fault coverage; changes in the other steps would not be significant. The number of patterns decreases as the abort limit is reduced because the tool gives up on some of the faults and so does not generate any patterns for them.

Fig. 16 SSR single-stuck runtime reduction as a function of abort limit.
The most promising and best-performing runtime reduction scheme was the second one. Recall that in this solution we do not start with the configuration dd…d but rather with cd…d or ccd…d; that is, we start with Category 1 or Category 2, etc. The higher the starting category, the higher the number of care bits in the initial address. This translates into relaxed constraints on the ATPG tool for finding matching patterns for the tied segments. On the other hand, since fewer segments are tied together, the compression ratio will be lower. So, which category to select as the starting category for SSR is a trade-off between runtime and compression ratio.

Fig. 17 shows the reduction in runtime and the increase in stuck-at data volume as a function of the starting category for SSR. The figure also shows the reference data volume and runtime. We arbitrarily chose the 128-segment configuration of Ckt1. As expected, starting with higher categories reduces the runtime as well as the compression ratio. Starting with Category 0, the compression ratio is 19x and the runtime is 4.5 hours; starting with Category 3, the compression ratio is 9x and the runtime is 25 minutes. Similar fault coverage was achieved by all starting categories as well as by the basic ATPG.

Fig. 17 SSR single-stuck data volume increase and runtime reduction as a function of the starting category.

Fig. 18 shows similar data for transition test patterns, also for the 128-segment configuration of Ckt1.
Starting with Category 0, the compression ratio is 22x and the runtime is 60 hours; starting with Category 3, the compression ratio is 9x and the runtime is 6 hours. Similar fault coverage was achieved by all starting categories as well as by the basic ATPG.

Fig. 18 SSR transition data volume increase and runtime reduction as a function of the starting category.

The selection of the starting category is a trade-off between test data and test time reduction on one side, and scan power and ATPG runtime on the other. These parameters depend largely on the size of the circuit. Depending on the available resources, guidelines can be set for the right combination of parameters when running the algorithm, given the range of the circuit size.

VIII. CONCLUSIONS

This paper presented a scan architecture and a corresponding compression algorithm. Together, the architecture and the algorithm provide very flexible tools to control the trade-off between the many factors that affect scan test cost: test data volume, test application time, test power consumption, tester channel requirements, and test generation time. The technique satisfies the test data volume and test time requirements of all of our designs without requiring any information about the unspecified bits in the test patterns. It also reduces tester pin requirements while requiring minimal hardware overhead. The paper also presented a power-optimized version of the technique that can be efficiently utilized to improve scan speed.

ACKNOWLEDGMENT

The authors would like to acknowledge the support provided by King Fahd University of Petroleum and Minerals and by the University of Iowa.

REFERENCES

[1] E.J. McCluskey, Logic Design Principles with Emphasis on Testable Semicustom Circuits, Prentice-Hall, Englewood Cliffs, NJ, USA, 1986.
[2] B. Koenemann, "LFSR-Coded Test Patterns for Scan Designs," European Test Conference (ETC'91), pp. 237-242, 1991.
[3] E.J. McCluskey, D. Burek, B. Koenemann, S. Mitra, J. Patel, J. Rajski, and J. Waicukauski, "Test Data Compression," IEEE Design & Test of Computers, Vol. 20, No. 2, pp. 76-87, March-April 2003.
[4] A. Al-Yamani and E.J. McCluskey, "Seed Encoding for LFSRs and Cellular Automata," 40th Design Automation Conference (DAC'03), June 2003.
[5] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, "Embedded Deterministic Test," IEEE Transactions on Computer-Aided Design (TCAD), Vol. 23, No. 5, pp. 776-792, May 2004.
[6] I. Hamzaoglu and J. Patel, "Reducing Test Application Time for Full Scan Embedded Cores," IEEE International Symposium on Fault Tolerant Computing (FTC'99), pp. 260-267, 1999.
[7] A. Al-Yamani, E. Chmelar, and M. Grinchuk, "Segmented Addressable Scan Architecture," VLSI Test Symposium (VTS'05), May 2005.
[8] T. Hiraide, K.O. Boateng, H. Konishi, K. Itaya, M. Emori, and H. Yamanaka, "BIST-Aided Scan Test – A New Method for Test Cost Reduction," VLSI Test Symposium (VTS'03), pp. 359-364, Apr. 2003.
[9] K-J. Lee, J-J. Chen, and C-H. Huang, "Using a Single Input to Support Multiple Scan Chains," IEEE International Conference on Computer-Aided Design (ICCAD'98), pp. 74-78, Nov. 1998.
[10] T-C. Huang and K-J. Lee, "A Token Scan Architecture for Low Power Testing," International Test Conference (ITC'01), pp. 660-669, Oct. 2001.
[11] S. Sharifi, M. Hosseinabadi, P. Riahi, and Z. Navabi, "Reducing Test Power, Time and Data Volume in SoC Testing Using Selective Trigger Scan Architecture," International Symposium on Defect and Fault Tolerance (DFT'03), 2003.
[12] O. Sinanoglu and A. Orailoglu, "A Novel Scan Architecture for Power-Efficient, Rapid Test," International Conference on Computer-Aided Design (ICCAD'02), pp. 299-303, Nov. 2002.
[13] N. Oh, R. Kapur, T. Williams, and J. Sproch, "Test Pattern Compression Using Prelude Vectors in Fan-out Scan Chain with Feedback Architecture," Design, Automation, and Test in Europe Conference (DATE'03), pp. 110-115, 2003.
[14] S. Samaranayake, E. Gizdarski, N. Sitchinava, F. Neuveux, R. Kapur, and T. Williams, "A Reconfigurable Shared Scan-in Architecture," VLSI Test Symposium, Apr. 2003.
[15] Y. Shi, N. Togawa, S. Kimura, M. Yanagisawa, and T. Ohtsuki, "FCSCAN: An Efficient Multiscan-based Test Compression Technique for Test Cost Reduction," Asia and South Pacific Design Automation Conference (ASPDAC'06), pp. 653-658, Jan. 2006.
[16] S. Mitra and K. Kim, "XMAX: X-Tolerant Architecture for MAXimal Test Compression," International Conference on Computer Design (ICCD'03), pp. 326-330, Oct. 2003.
[17] D. Xiang, J. Sun, M. Chen, and S. Gu, "Cost-Effective Scan Architecture and a Test Application Scheme for Scan Testing with Non-scan Test Power and Test Application Cost," US Patent 6,959,426 B2.
[18] D. Xiang, K. Li, and H. Fujiwara, "Design for Cost-Effective Scan Testing by Reconstructing Scan Flip-flops," 14th IEEE Asian Test Symposium (ATS'05), pp. 318-321, 2005.
[19] A. Arslan and A. Orailoglu, "CircularScan: A Scan Architecture for Test Cost Reduction," Design, Automation and Test in Europe Conference and Exhibition (DATE'04), Vol. 2, pp. 1290-1295, Feb. 2004.
[20] P. Rosinger, B.M. Al-Hashimi, and N. Nicolici, "Scan Architecture With Mutually Exclusive Scan Segment Activation for Shift- and Capture-Power Reduction," IEEE Transactions on Computer-Aided Design (TCAD), Vol. 23, No. 7, pp. 1142-1153, July 2004.
[21] L. Lay, J. Patel, T. Rinderknecht, and W-T. Cheng, "Logic BIST with Scan Chain Segmentation," International Test Conference (ITC'04), pp. 57-66, Nov. 2004.
[22] B. Arslan and A. Orailoglu, "Test Cost Reduction Through a Reconfigurable Scan Architecture," International Test Conference (ITC'04), pp. 945-952, Nov. 2004.
[23] H. Ando, "Testing VLSI with Random Access Scan," IEEE Computer Society Conference (COMPCON'80), pp. 50-52, Feb. 1980.
[24] D.H. Baik, K.K. Saluja, and S. Kajihara, "Random Access Scan: A Solution to Test Power, Test Data Volume, and Test Time," International Conference on VLSI Design (VLSID'04), pp. 883-888, Jan. 2004.
[25] G. De Micheli, Synthesis and Optimization of Digital Circuits, McGraw-Hill, 1994.
[26] VirtualScan™ Product Data Sheet, http://www.syntest.com/ProdDataSheet/VirtualScan.pdf.
[27] TestKompress™ Product Data Sheet, http://www.mentor.com/products/dft/atpg_compression/testkompress/upload/TestKompress.pdf.
[28] K. Butler, J. Saxena, T. Fryars, G. Hetherington, A. Jain, and J. Lewis, "Minimizing Power Consumption in Scan Testing: Pattern Generation and DFT Techniques," International Test Conference (ITC'04), pp. 355-364, Nov. 2004.
[29] A. Al-Yamani, N. Devta-Prasanna, and A. Gunda, "Should Illinois-Scan Based Architectures Be Centralized or Distributed?," IEEE International Symposium on Defect and Fault Tolerance (DFT'05), Oct. 2005.
Ahmad Al-Yamani is an assistant professor in the Computer Engineering Department at King Fahd University of Petroleum and Minerals and a consulting assistant professor in the Electrical Engineering Department at Stanford University. He received a PhD in Electrical Engineering and an MSc in Management Science and Engineering from Stanford. Before that, he received an MSc and a BSc in Computer Engineering from KFUPM. In 2002 he was appointed assistant director of the Stanford Center for Reliable Computing. Ahmad served as an adjunct faculty member at Santa Clara University and worked for the advanced development labs of LSI Logic between 2004 and 2005. Ahmad's research interests include VLSI design and test, design-for-testability, built-in self-test, computer-aided design automation, iterative heuristics and their parallelization for VLSI design, and reliable computing.

Narendra Devta-Prasanna obtained a bachelor's degree in electrical engineering from the Indian Institute of Technology, Roorkee, in 2000. He is presently a candidate in the PhD program at the University of Iowa. His research interests include defect-based testing, delay testing, and test data compression.

Erik Chmelar received his Bachelor's degree in Electrical Engineering from Michigan Technological University and his Master's and PhD degrees in Electrical Engineering from Stanford University. Dr. Chmelar is currently working at LSI Logic in Milpitas, CA, and is a Consulting Assistant Professor of Electrical Engineering at Stanford University.

Mikhail Grinchuk obtained an MSc in mathematics from Lomonosov Moscow State University (MSU) in 1986 and a PhD in physico-mathematical sciences in 1989, also from MSU. In 1989, he started working in the mathematics department of MSU. In 1996, he joined LSI Logic. His current research interests include various aspects of netlist synthesis, optimization, and testing.

Arun Gunda is a Senior Manager at LSI Logic, where he manages the Test Automation Group, part of the Silicon Methodology organization. Arun received his MS degree from the University of Iowa in 1990 and has been working at LSI Logic, California, since then. Arun's interests are in the area of digital testing; he is currently working on improving the effectiveness of stuck-at fault and delay fault testing so as to catch more kinds of defects in 65nm technology.