On System-on-Chip Testing Using Hybrid Test Vector Compression
Satyendra N. Biswas, Member, IEEE, Sunil R. Das, Life Fellow, IEEE, and Emil M. Petriu, Fellow, IEEE

Abstract—This paper presents a comprehensive hybrid test vector compression method for very large scale integration (VLSI) circuit testing, targeting specifically embedded cores-based systems-on-chip (SoCs). In the proposed approach, a software program is loaded into the on-chip processor memory along with the compressed test data sets. To minimize on-chip storage as well as testing time, the test data volume is first reduced by compaction in a hybrid manner before being downloaded into the processor. The method uses a set of adaptive coding techniques to realize lossless compression. The compaction program need not be loaded into the embedded processor, since only decompression of the test data supplied by the automatic test equipment (ATE) is required on-chip. The developed scheme requires minimal hardware overhead, and the on-chip embedded processor can be reused for normal operation on completion of testing. This paper reports results of studies on the problem and demonstrates the feasibility of the suggested methodology with simulation runs on the International Symposium on Circuits and Systems (ISCAS) 85 combinational and ISCAS 89 full-scan sequential benchmark circuits.

Index Terms—Automatic test equipment (ATE), Burrows–Wheeler transformation (BWT), design-for-testability (DFT), frequency-directed run-length coding, intellectual property (IP) core, system-on-chip (SoC) test.

I. INTRODUCTION

An important objective to realize through elaborate testing of very large scale integration (VLSI) circuits and systems is to ensure that the manufactured products are free from defects and, at the same time, to guarantee that they meet their intended specifications. In addition, the information collected during the test process may help increase the product yield by improving the process technology, with a consequent lowering of the production cost. The integrated circuit (IC) fabrication process involves various steps, viz., photolithography, printing, etching, and doping.

Manuscript received August 17, 2013; revised January 20, 2014; accepted March 13, 2014. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant A 4750 and in part by the Department of Computer Science, College of Arts and Sciences, Troy University, Montgomery, AL 36103 USA. The Associate Editor coordinating the review process was Dr. Serge Demidenko. S. N. Biswas is with the Department of Electrical Engineering, Kaziranga University, Jorhat 785006, Assam, India. S. R. Das is with the Department of Computer Science, College of Arts and Sciences, Troy University, Montgomery, AL 36103 USA, and also with the School of Information Technology and Engineering, Faculty of Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada (e-mail: [email protected]). E. M. Petriu is with the School of Information Technology and Engineering, Faculty of Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIM.2014.2313431

In a real-life environment, none of these steps is absolutely flawless, and the unresolved imperfections may cause individual ICs to fail in operation. The introduction of VLSI technology added further complexity to the testing of ICs, with a resultant increase in the cost of electronic components. The problem of testing system-on-chip (SoC) ICs has also grown enormously because of the large number of intellectual property (IP) cores now placed on a single piece of silicon. With the shrinking of the overall circuit geometry, sensitivity to performance variations has greatly increased, yet the individual components of the ICs must still be rigorously tested before being shipped to customers. Testing undeniably improves the overall quality of the final product, although it has no direct influence on the manufacturing excellence of the ICs, and it can guard against product imperfections only if it is implemented during the key phases of the development cycle. It can further serve as a strategy for validating the design and checking the processes involved.

The various IP cores in an SoC are not readily accessible because of the complexity of the SoC and the limited number of test pins. However, the controllability and observability of internal circuit nodes can be increased using design-for-testability (DFT) strategies. Applying DFT reduces the test cost, enhances product quality, and makes design characterization and test program implementation considerably easier. To test these systems effectively, every IP core must be duly exercised with a set of predetermined test patterns provided by the core vendor (Fig. 1). For VLSI systems, the customary test processes become highly expensive because of the higher storage requirements for the fault-free responses; therefore, alternative approaches are sought that minimize the amount of needed storage, that is, the test data volume. Built-in self-testing (BIST) [1]–[3] is a design methodology capable of solving many of the problems otherwise encountered in testing digital systems.

For testing an SoC, the test patterns are first generated and stored in a high-end computer. However, the increasing variety of SoCs requires larger numbers of test patterns and frequent downloads of these patterns into the automatic test equipment (ATE). The test pattern sets can be on the order of several gigabytes, so downloading them into the ATE takes a significant amount of time, and a dedicated high-speed bus is needed for the transfer. Even then, transferring the data takes an enormous amount of time, and the ATE remains idle during this period, wasting valuable system resources. The overall performance of the ATE is thus limited by the transfer time of the test vectors.


Fig. 1. Conventional architecture of SoC testing.

To improve the throughput of the ATE, it is crucial to reduce the amount of data that must be transferred, and a cost-effective way to this end is to shrink the data using some kind of data compression technique. A higher level of circuit integration results in a larger volume of test data and increased testing time, since the entire set of test vectors for all the components in an SoC must be stored and applied. This complexity is mainly due to the reduction in the ratio of externally accessible points (the limited SoC primary inputs and outputs) to internal, inaccessible points in these chips. Moreover, the input–output channel capacity, speed, and data memory of traditional ATEs are quite limited, leading to higher cost. Therefore, any newly developed technique, if robust enough, is widely expected to reduce both the testing time and the required storage capacity.

Most recently, new techniques for test vector compression [4], [5] and for data compression in SoCs that contain an embedded processor core have emerged [6], [7]. The use of an embedded processor has expanded SoC design flexibility, reduced design risk, and lengthened SoC product lifetimes significantly by allowing devices to adapt to changing standards and extra features to be added over time [8]. Another major advantage of this type of SoC is that the embedded processor can run software for decompression of the precompressed test data rather than employing special decompression circuitry. The test data are compacted by a program before the compressed data are downloaded into the processor memory along with the decompression software. The processor executes the small decompression program, and the test vectors are then applied to the individual cores of the SoC. The compression process itself has no direct bearing on testing time, since it can be done in advance, before the data are downloaded into the embedded processor for testing. Hence, the compression technique must not only be efficient enough to reduce the total test data volume, but also be simple and fast in its decompression. Several techniques have been proposed in the literature for cost-effective compression and decompression of test data.

A general technique for statistically encoding test vectors for full-scan circuits using selective Huffman coding is presented in [9]. A deterministic test vector compression technique for SoCs using block matching is described in [10]. The same authors also proposed a scheme for compression and decompression of test data using cyclical scan chains [11]; it relies on careful ordering of the test sets and formation of cyclical scan chains to achieve compression with run-length codes. An effective approach for compressing test data using run-length coding and Burrows–Wheeler transformation (BWT) was presented in [12]. Iyengar et al. [13] developed the idea of statistical encoding of test data and described a BIST scheme for nonscan circuits based on statistical coding using comma codes (very similar to Huffman codes) and run-length coding. Basu and Mishra [14] proposed a test data compression technique using dictionary-based and bitmask selection criteria. The compression efficiency of the dictionary-based technique is limited because it depends on the number of bits allowed to mismatch; the bitmask-based compression method [14] addresses this mismatch problem by generating more matching patterns, but it is unable to handle data sets containing don't-care values. Recently, another technique for test data compression/decompression was proposed in [15]. Although this method achieves a higher level of compression, it needs extra on-chip hardware, and it suffers not only from this hardware overhead but also from excessive power consumption and longer processing time. Saravanan et al. [16] have lately proposed a test vector compression technique based on weighted bit positions, in which unspecified individual bits are assigned specific values and then partitioned into weighted values; the weighted test patterns are then compressed to achieve about 5% to 16% test pattern reduction, as claimed in [16]. El-Maleh et al. [17] have proposed an algorithm that reorders test vectors to compress test patterns: the algorithm is based on geometric primitives, with the test vectors optimally reordered and divided into blocks, and the blocks then encoded according to geometric shapes. However, that algorithm takes a long time for decoding. The block matching algorithm reported in [9], and a modified version used in [18], performed better.

In this paper, a new hybrid technique is presented for efficient implementation of test data compression and decompression based on the use of an embedded processor core. In the developed compression technique, a block-matching algorithm [9], [18] is executed first to differentiate between the low-frequency and high-frequency data sets of the test vectors. The block matching algorithm finds and rearranges the blocks of test vectors in such a way that the matching blocks in subsequent test patterns are placed in an optimal order. This procedure generates successive test vectors with a minimum number of differing blocks; thus, the amount of information required to store the test vectors is minimized, since two consecutive test vectors then differ in fewer blocks than in the original ordering.


Fig. 2. Block diagram of the proposed method.

Next, BWT [19], [20] is applied only to the high-frequency data sets. The transformed high-frequency data, along with the low-frequency data sets, are then compressed using several coding techniques, namely Huffman coding, Golomb coding [21], Lempel–Ziv–Welch (LZW) coding [22], and the associative coder of Buyanovsky (ACB) [23]. The decompression program is very simple and compact and performs decompression quickly, so that both the test data volume and the testing time are significantly reduced.

This paper is organized as follows. Section II presents the proposed technique in detail, with brief discussions of BWT and the other compression techniques needed to understand the basics of the developed methodology. Section III reports simulation results on International Symposium on Circuits and Systems (ISCAS) 85 combinational and ISCAS 89 full-scan sequential benchmark circuits, while Section IV provides the concluding remarks.

II. PROPOSED METHODOLOGY

Fig. 2 shows the block diagram of the developed technique. All the test vectors required for testing an SoC are first compressed in software. The compressed test vectors and an efficient decompression program are then loaded into the embedded processor core of the SoC. The processor executes the decompression program and then applies all the uncompressed original test vectors to each core of the SoC for generating and analyzing the output responses. The proposed technique involves the following four steps.

A. Division of Test Data Into Blocks

All the test vectors are divided into several blocks of equal size; the block size depends on the total number of bits in each vector. One of the test vectors is taken as a reference, and each subsequent test vector is represented by storing only those blocks that differ from the previous vector. As shown in Fig. 3, test vector-1 has seven blocks of four bits each. Test vector-2 also has seven blocks of the same size, but all the bits in its second, third, and sixth blocks are identical to those in test vector-1. Similarly, the third, fourth, and sixth blocks of test vector-3 are the same as in test vector-2. Given the first, fourth, fifth, and seventh blocks of test vector-2 along with the reference test vector-1, the entire test vector-2 can easily be reconstructed.

Fig. 3. Original test vector divided into several blocks.

In the same way, only the first, second, fifth, and seventh blocks of test vector-3 are required to generate the complete test vector. Each consecutive test vector is thus built from the previous one by replacing the blocks in which they differ: if the blocks in which test vector (n + 1) differs from test vector n are shaded, then test vector (n + 1) can be built from the nth vector by replacing only the shaded blocks. The block matching algorithm [9], [11] treats a test data set S as an A × B matrix. Each row of data is divided into blocks of equal size (four bits in Fig. 3), except for the last block, which may be smaller than the others. Because of the structural relationship among the faults in a circuit, there are many similarities between the test vectors. The test vectors are ordered in such a fashion that two successive vectors differ in relatively few blocks; hence, the amount of information required to store these differences is less than that required for storing all the test vectors. The block size is selected so that packing and decoding of the data in memory are most efficient. A minimal sketch of this difference encoding is given below.
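For illustration only, the following C sketch shows one way such block-level difference encoding could be implemented; the example vectors, the diff_block structure, and the function name encode_diff are hypothetical choices and are not taken from the authors' implementation (bits are held one per char purely for readability).

/*
 * Illustrative sketch (not the authors' code): difference encoding of
 * consecutive test vectors at block granularity.
 */
#include <stdio.h>
#include <string.h>

#define VEC_BITS   28   /* bits per test vector (7 blocks of 4, as in Fig. 3) */
#define BLOCK_BITS 4

typedef struct {
    int  index;                 /* which block differs        */
    char bits[BLOCK_BITS + 1];  /* replacement block contents */
} diff_block;

/* Store only the blocks of 'cur' that differ from 'prev'.
 * Returns the number of differing blocks written to 'out'. */
static int encode_diff(const char *prev, const char *cur, diff_block *out)
{
    int n = 0;
    for (int b = 0; b < VEC_BITS / BLOCK_BITS; ++b) {
        const char *p = prev + b * BLOCK_BITS;
        const char *c = cur  + b * BLOCK_BITS;
        if (memcmp(p, c, BLOCK_BITS) != 0) {        /* block changed */
            out[n].index = b;
            memcpy(out[n].bits, c, BLOCK_BITS);
            out[n].bits[BLOCK_BITS] = '\0';
            ++n;
        }
    }
    return n;
}

int main(void)
{
    const char *v1 = "0110100101011001010010101001";  /* reference vector (arbitrary) */
    const char *v2 = "1010100101010110110010100110";  /* next test vector (arbitrary) */
    diff_block d[VEC_BITS / BLOCK_BITS];
    int n = encode_diff(v1, v2, d);
    printf("%d differing blocks\n", n);
    for (int i = 0; i < n; ++i)
        printf("block %d -> %s\n", d[i].index, d[i].bits);
    return 0;
}

Only the records produced by encode_diff, together with the reference vector, would need to be stored; the decompressor rebuilds each vector by copying the previous one and overwriting the listed blocks.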

B. Frequency Computation of Data Blocks

On completion of this block matching process, the high-frequency and low-frequency blocks are separated for further computation (viz., high-frequency groups such as the data sets in columns 1, 5, and 7 and low-frequency groups such as the data sets in columns 2, 3, 4, and 6 of Fig. 3). In practical situations, a large amount of unspecified data (represented by x for don't care) appears in the test sets. These data are redefined in such a way that the frequency of the data blocks is maximized. As shown in Fig. 4, there are 28 test data sets for three test vectors that are divided into seven blocks each. Since there are only four bits in a block of data, there are 16 possible combinations (viz., minterms) in each block. The frequency of a block is determined by analyzing the unspecified data in all the blocks. For example, the minterm 1001 is contained in blocks 2, 4, and 7 of test vector-1; the same minterm also appears in block 4 of test vector-2 and in block 2 of test vector-3. The frequency of 1001 in this set of test vectors is therefore determined to be 5.

Fig. 4. Original test vector divided into several blocks.

Once the unspecified data in a block have been specified for calculating the frequency of that block, all the blocks containing those minterms must be removed from the list of data blocks before the frequency of another block is calculated. Several algorithms exist for filling out unspecified data fields with 1s or 0s. In an efficient algorithm described in [9], the most frequently occurring unspecified block is determined first. It is then compared with the next most frequently occurring unspecified block to see whether there is a conflict in any bit position. If there is no conflict, the two blocks are merged by specifying all bit positions in which either block has a specified value. For example, if the block x10x is merged with the block 110x, the resulting block is 110x. Note that the merger of blocks can only increase the number of specified bits. The most frequently occurring unspecified block is compared with all the other unspecified blocks in decreasing order of frequency, and whenever a merger is possible, it is carried out. This continues until no more merging can be done with the most frequently occurring unspecified block. The process is then repeated for the second most frequently occurring unspecified block, and so on, until no more blocks can be merged. At that point, all the remaining blocks are unique and cannot share any minterm, and any remaining xs can be filled randomly with 0s and 1s, as this has no impact on the amount of compression.

In this technique, the original test vectors are divided into blocks, and the blocks are rearranged so that the maximum number of blocks is common to two successive vectors, with a minimum number of differing blocks. Since the amount of compression depends on this ordering of the matching blocks and on the number of uncommon blocks, the actual data (1s or 0s) in the completely uncommon blocks play no role in the compression technique. The algorithm fills the xs by greedily merging unspecified blocks based on their frequency of occurrence. This is a fast procedure, since the number of operations is much smaller when each merger is carried out immediately, reducing the set of blocks. A sketch of this greedy merging step is given below.
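The following C sketch illustrates the greedy, frequency-ordered merging of unspecified blocks just described; the block values and the helper name try_merge are hypothetical examples and not the authors' code, although the run does reproduce the x10x + 110x = 110x merger from the text.

/*
 * Illustrative sketch (not the authors' code) of greedily merging
 * unspecified blocks.  Blocks are 4-bit strings over {'0','1','x'}.
 */
#include <stdio.h>

#define BLOCK_BITS 4

/* Merge b into a if no bit position conflicts; returns 1 on success. */
static int try_merge(char a[BLOCK_BITS + 1], const char b[BLOCK_BITS + 1])
{
    char merged[BLOCK_BITS + 1];
    for (int i = 0; i < BLOCK_BITS; ++i) {
        if (a[i] != 'x' && b[i] != 'x' && a[i] != b[i])
            return 0;                            /* conflict: cannot merge */
        merged[i] = (a[i] != 'x') ? a[i] : b[i]; /* keep any specified bit */
    }
    merged[BLOCK_BITS] = '\0';
    for (int i = 0; i <= BLOCK_BITS; ++i)
        a[i] = merged[i];
    return 1;
}

int main(void)
{
    /* Blocks assumed already sorted by decreasing frequency of occurrence. */
    char blocks[][BLOCK_BITS + 1] = { "x10x", "110x", "1xx1", "0x00" };
    int n = sizeof blocks / sizeof blocks[0];
    int alive[4] = { 1, 1, 1, 1 };

    for (int i = 0; i < n; ++i) {
        if (!alive[i]) continue;
        for (int j = i + 1; j < n; ++j)          /* greedy merging pass   */
            if (alive[j] && try_merge(blocks[i], blocks[j]))
                alive[j] = 0;                    /* j absorbed into i     */
    }
    for (int i = 0; i < n; ++i)
        if (alive[i]) printf("%s\n", blocks[i]);
    return 0;
}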

C. Preprocessing of High Frequency Data Blocks

The BWT algorithm [19], [20] is executed on all the high-frequency blocks of data. Burrows and Wheeler published the details of a transformation that opened the door to a family of new data compression techniques. BWT converts a block of data into a format that is extremely well suited to data compression. The transform is performed on an entire block of data at once: it takes the block and rearranges it using a sorting algorithm known as lexicographic sorting. The resulting output block contains exactly the same data elements that it started with, differing only in their ordering. The transformation is reversible, meaning that the original ordering of the data elements can be restored with no loss of fidelity, and a block of data transformed by BWT can be compressed using any standard technique or a combination of such techniques.

Fig. 5. Original set of strings (S0) associated with the buffer.

Fig. 6. Set of strings after sorting.

The sorting process is explained briefly here using the sample string DRDOBBS, which contains seven bytes of data. To perform BWT, a string S of length N is treated as if it actually contains N different strings, with each character of the original string being the start of a specific string that is N bytes long. The string is also treated as a circular buffer, as if the last character wraps around back to the first. As shown in Fig. 5, a matrix is formed by rotating the original sequence of the string. The matrix of Fig. 5 is then sorted lexicographically; after sorting, the set of strings is arranged as shown in Fig. 6.


Algorithm 1 (Forward BWT)
Step 1: Create a list of all possible rotations of the string.
Step 2: Put each rotation in a row of a large, square table.
Step 3: Sort the rows of the table alphabetically, treating each row as a string.
Step 4: Return the last (rightmost) column of the table.
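A compact C sketch of Algorithm 1 is given below for illustration only; it avoids building the full N x N table by comparing rotations through modular indexing, and the function name bwt_forward is a hypothetical choice, not the authors' implementation. Running it on DRDOBBS reproduces the output L = OBRSDDB and the primary index 5 discussed below.

/*
 * Illustrative forward BWT following Algorithm 1 (not the authors' code).
 * Output is the last column L plus the primary index (the row holding S1).
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *g_s;   /* input string */
static size_t g_n;        /* its length   */

/* Compare two rotations identified by their starting offsets. */
static int rot_cmp(const void *a, const void *b)
{
    size_t i = *(const size_t *)a, j = *(const size_t *)b;
    for (size_t k = 0; k < g_n; ++k) {
        char ci = g_s[(i + k) % g_n], cj = g_s[(j + k) % g_n];
        if (ci != cj)
            return (unsigned char)ci - (unsigned char)cj;
    }
    return 0;
}

/* Writes the last column into out[] and returns the primary index. */
static size_t bwt_forward(const char *s, char *out)
{
    g_s = s;
    g_n = strlen(s);
    size_t *rot = malloc(g_n * sizeof *rot);
    for (size_t i = 0; i < g_n; ++i) rot[i] = i;
    qsort(rot, g_n, sizeof *rot, rot_cmp);

    size_t primary = 0;
    for (size_t r = 0; r < g_n; ++r) {
        out[r] = g_s[(rot[r] + g_n - 1) % g_n];   /* last char of this rotation */
        if (rot[r] == 1) primary = r;             /* row holding S1 is the index */
    }
    out[g_n] = '\0';
    free(rot);
    return primary;
}

int main(void)
{
    char out[8];
    size_t idx = bwt_forward("DRDOBBS", out);
    printf("L = %s, primary index = %zu\n", out, idx);   /* OBRSDDB, 5 */
    return 0;
}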

The inverse BWT algorithm is described as follows.

Algorithm 2 (Inverse BWT)
Step 1: Create an empty table with no rows or columns.
Step 2: Repeat length(s) times:
Step 3: Insert s as a new column down the left side of the table.
Step 4: Sort the rows of the table alphabetically.
Step 5: Return the row that ends with the EOF character.

Two important points should be recognized in the above figures. First, although the strings have been sorted, it is important to keep track of which string occupies which position in the original set; thus, string 0 (S0), the original unsorted string, has moved down to row 4 of the array in Fig. 6. Second, the first and last columns of the resulting matrix are marked with the special designations F and L in Fig. 6. Column F contains all the characters of the original string in sorted order, so the original string DRDOBBS is represented in F as BBDDORS. The characters in column L do not appear to be in any particular order, but in fact they have an interesting property: each character in L is the prefix character of the string that starts in the same row in column F. The actual output of BWT, oddly enough, consists of two things: a copy of column L and the primary index, an integer indicating which row contains the original first character of the buffer B. Performing BWT on the original string thus generates the output string L, which contains OBRSDDB, and a primary index of 5. The integer 5 is found easily enough, since the original first character of the buffer is always found in column L in the row that contains S1: because S1 is simply S0 rotated left by a single character position, the very first character of the buffer is rotated into the last column of the matrix, and locating S1 is therefore equivalent to locating the buffer's first character position in L. The forward and inverse transforms are summarized in Algorithms 1 and 2, and a small sketch of the inverse transform is given below.
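For illustration only, the following C sketch inverts the transform without rebuilding the full table of Algorithm 2; it uses the standard last-to-first (LF) mapping instead, and the function name bwt_inverse is hypothetical. The primary-index convention matches the one above (the row of the sorted matrix that holds S1), so decoding OBRSDDB with index 5 recovers DRDOBBS.

/*
 * Illustrative inverse BWT (not the authors' code) using the LF mapping
 * rather than the table reconstruction of Algorithm 2.
 */
#include <stdio.h>

#define MAXN 256

static void bwt_inverse(const char *L, size_t n, size_t primary, char *out)
{
    size_t count[256] = { 0 }, C[256] = { 0 }, seen[256] = { 0 }, lf[MAXN];

    for (size_t i = 0; i < n; ++i)
        count[(unsigned char)L[i]]++;
    for (int c = 1; c < 256; ++c)               /* C[c] = #chars smaller than c */
        C[c] = C[c - 1] + count[c - 1];
    for (size_t i = 0; i < n; ++i) {            /* LF mapping                   */
        unsigned char c = (unsigned char)L[i];
        lf[i] = C[c] + seen[c]++;
    }

    size_t row = lf[primary];                   /* row of the original string S0 */
    for (size_t k = n; k-- > 0; ) {             /* recover S backwards           */
        out[k] = L[row];
        row = lf[row];
    }
    out[n] = '\0';
}

int main(void)
{
    char out[8];
    bwt_inverse("OBRSDDB", 7, 5, out);
    printf("%s\n", out);                        /* prints DRDOBBS */
    return 0;
}

The simplicity and speed of this reverse step is what makes BWT attractive for on-chip decompression, as noted in the conclusions.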

D. Application of Various Coding Techniques

Finally, the individual coding techniques, namely Huffman coding, Golomb coding, LZW coding, and ACB, are employed, one by one, on all the preprocessed data sets for efficient compression. We start with Huffman coding.

1) Huffman Coding: Huffman coding uses a specific method for choosing the representation of each symbol, resulting in a prefix-free code that expresses the most common characters using shorter strings of bits than are used for less common source symbols.

Algorithm 3 (Huffman Coding)
Step 1: Sort the source outputs in decreasing order of their probabilities.
Step 2: Merge the two least probable outputs into a single output whose probability is the sum of the corresponding probabilities.
Step 3: If the number of remaining outputs is more than two, go to Step 1.
Step 4: Arbitrarily assign 0 and 1 as codewords for the two remaining outputs.
Step 5: If an output is the result of a merger of two outputs in a preceding step, append the current codeword with a 0 and a 1 to obtain the codewords of the preceding outputs and repeat Step 5. If no output is preceded by another output in a previous step, stop.
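The following C sketch builds a Huffman code tree in the spirit of Algorithm 3 using a simple O(n^2) merge loop; the four example block values and their counts are arbitrary, and none of the names are taken from the authors' implementation.

/*
 * Illustrative Huffman code construction (not the authors' code).
 */
#include <stdio.h>
#include <string.h>

#define NSYM 4
#define MAXNODE (2 * NSYM)

typedef struct { double w; int left, right; } node;   /* children -1 => leaf */

static node tree[MAXNODE];
static int  nnodes;

static void assign(int id, char *prefix, char codes[][16])
{
    if (tree[id].left < 0) {                  /* leaf: record its codeword */
        strcpy(codes[id], *prefix ? prefix : "0");
        return;
    }
    size_t len = strlen(prefix);
    prefix[len] = '0'; prefix[len + 1] = '\0';
    assign(tree[id].left, prefix, codes);
    prefix[len] = '1';
    assign(tree[id].right, prefix, codes);
    prefix[len] = '\0';
}

int main(void)
{
    const char *sym[NSYM] = { "1001", "0110", "1100", "0000" }; /* block values   */
    double freq[NSYM]     = { 5, 3, 1, 1 };                     /* example counts */
    int alive[MAXNODE] = { 0 };
    char codes[MAXNODE][16], prefix[16] = "";

    for (int i = 0; i < NSYM; ++i) {
        tree[i] = (node){ freq[i], -1, -1 };
        alive[i] = 1;
    }
    nnodes = NSYM;

    for (int step = 0; step < NSYM - 1; ++step) {   /* merge two least probable */
        int a = -1, b = -1;
        for (int i = 0; i < nnodes; ++i) {
            if (!alive[i]) continue;
            if (a < 0 || tree[i].w < tree[a].w) { b = a; a = i; }
            else if (b < 0 || tree[i].w < tree[b].w) b = i;
        }
        tree[nnodes] = (node){ tree[a].w + tree[b].w, a, b };
        alive[a] = alive[b] = 0;
        alive[nnodes++] = 1;
    }

    assign(nnodes - 1, prefix, codes);              /* root is the last node */
    for (int i = 0; i < NSYM; ++i)
        printf("%s -> %s\n", sym[i], codes[i]);
    return 0;
}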

Huffman coding approximates the probability of each character as a power of 1/2 to avoid the complications associated with using a nonintegral number of bits to encode characters according to their actual probabilities. Huffman coding works on a list of weights {w_i} by building an extended binary tree with minimum weighted path length: it finds the two smallest weights, w_1 and w_2, viewed as external nodes, and replaces them with an internal node of weight w_1 + w_2. The procedure is then repeated stepwise until the root node is reached. An individual external node can then be encoded by a binary string of 0s (for left branches) and 1s (for right branches). The Huffman coding procedure is summarized in Algorithm 3. The other coding algorithms can also be readily implemented and are not detailed in this paper.

2) Golomb Coding: This coding technique is a form of entropy encoding introduced in [21] that is optimal for alphabets following geometric distributions, that is, when small values are vastly more common than large values. If negative values are present in the input stream, an overlap-and-interleave scheme is used, in which all nonnegative inputs are mapped to even numbers (x' = 2x) and all negative inputs are mapped to odd numbers (x' = 2|x| − 1). The code uses a tunable parameter b to divide an input value into two parts: q, the result of a division by b, and r, the remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding. When b = 1, Golomb coding is equivalent to unary coding. Formally, the two parts are given by the following expressions, where x is the number being encoded:

q = ⌊(x − 1)/b⌋    (1)

r = x − qb − 1.    (2)

The final codeword is the concatenation

(q + 1) r    (3)

that is, the quotient written in unary (q + 1 bits) followed by the remainder r in truncated binary.
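As an illustration of this quotient/remainder construction, the C sketch below implements the Rice special case discussed later in this subsection, in which the divisor is a power of two (m = 2^k); the function name rice_encode is hypothetical. It reproduces the worked example given below (n = 18 with k = 3 yields 001010).

/*
 * Illustrative Rice-Golomb encoder for m = 2^k (not the authors' code).
 * The quotient is written in unary as q zeros followed by a terminating 1,
 * and the remainder as the low k bits of n.
 */
#include <stdio.h>

static void rice_encode(unsigned n, unsigned k, char *out)
{
    unsigned q = n >> k;                 /* quotient  = n / 2^k   */
    unsigned r = n & ((1u << k) - 1);    /* remainder = n mod 2^k */
    unsigned pos = 0;

    for (unsigned i = 0; i < q; ++i)     /* unary part: q zeros ...        */
        out[pos++] = '0';
    out[pos++] = '1';                    /* ... terminated by a single 1   */

    for (int b = (int)k - 1; b >= 0; --b)/* k-bit binary remainder         */
        out[pos++] = ((r >> b) & 1u) ? '1' : '0';
    out[pos] = '\0';
}

int main(void)
{
    char code[64];
    rice_encode(18, 3, code);
    printf("%s\n", code);                /* prints 001010 */
    return 0;
}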


Note that r can occupy a varying number of bits: it is exactly log2(b) bits for a Rice code (b a power of 2), whereas it switches between ⌊log2 b⌋ and ⌈log2 b⌉ bits for a general Golomb code (b not a power of 2). When 0 ≤ r < 2^⌈log2 b⌉ − b, r takes a ⌊log2 b⌋-bit representation, and when 2^⌈log2 b⌉ − b ≤ r < b, a ⌈log2 b⌉-bit representation is used. The parameter b is a function of the corresponding Bernoulli process, which is formulated by p = P(X = 0), the probability of success in a given Bernoulli trial, b and p being related by the inequality

(1 − p)^b + (1 − p)^(b+1) ≤ 1 < (1 − p)^(b−1) + (1 − p)^b.    (4)

The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it were feasible to compute that Huffman code. Golomb codes are a kind of structured Huffman code with the useful property that the entire codeword table can be constructed from the knowledge of a single parameter. Moreover, the codeword construction process is simple enough that the codeword for each symbol can easily be generated on the fly. Golomb codes are optimal for a source symbol set that follows a geometric probability distribution; that is, if the symbol set is mapped to integer values n = 1, 2, . . . , N, the probability of a given value of n is P(n) = (1 − w)w^n, 0 < w < 1, where w is related to the width of the distribution. Golomb developed a simple coding scheme for encoding the symbols generated by such a source: given the value of w, a corresponding parameter m can be computed that generates the optimal codeword assignments for the source, which means that the entire code table can be specified with the single parameter m. Later, Rice rediscovered a special case of Golomb codes where m = 2^k. For this subset of codes, known as Rice–Golomb codes, the encoding procedure becomes particularly simple: the binary representation of n is shifted k bits to the right to form a number q, which is equivalent to finding the quotient of n/2^k, and the unary representation of q is then concatenated with the last k bits of n to form the codeword. As an example, let n = 18 be the sample value to be encoded with a code parameter k = 3 (i.e., m = 8). The binary representation of 18 is 10010. The result of shifting it to the right by three bits, which corresponds to dividing it by 8, is 10, that is, 2 in decimal. The unary representation of 2 is 001, and the least significant bits of 10010 are 010. Finally, the Rice–Golomb code of 18 using a code parameter k = 3 is the concatenation of 001 and 010, which is equal to 001010.

3) LZW Coding: This coding technique was also applied to the test data sets for compression. The technique was developed by Welch based on the Lempel–Ziv universal sequential data compression algorithm [22]. This compression tool builds a string translation table from the original data to be compressed; the translation table maps fixed-length codes to strings, and the string table is initialized with all single-character strings. As the compressor serially examines the text, it stores every unique two-character string in the table as a code-plus-character concatenation, with the code mapping to the corresponding first character. As each two-character string is stored, the first character is sent to the output. Whenever a previously encountered string is read from the input, the longest such previously encountered string is determined, and the code for this string concatenated with the extension character (the next character in the input) is stored in the table.

The code for this longest previously encountered string is sent to the output, and the extension character is used as the beginning of the next string. The decompression algorithm requires only the compressed text as input, since it can build an identical string table from the compressed text as it recreates the original text. However, an abnormal case crops up whenever the input contains a sequence of the form character–string–character–string–character (with the same character and the same string each time) and the string character–string is already stored in the string table: when the decompressor reads the code for character–string–character, it cannot resolve it immediately, because that entry has not yet been stored in its own table. A small illustrative sketch of the LZW compressor is given at the end of this subsection.

4) Associative Coder of Buyanovsky: This coder is employed at the end on all the preprocessed data sets and is a highly efficient compression technique [23]. The compression algorithm works by constructing an associative list sorted in a special manner. The technique is very useful where transmission or storage of data is critical but encoding and decoding times are not so important. The algorithm organizes the past input in the list in many ways, so that the next input can be matched better than if the past input had been stored in a simple dictionary. This can be explained using the following simple example.

Example: Let the alphabet comprise the symbols {A, B, C}, and let the string to be coded be CACBA. Four different contexts can then be obtained from this string by separating it character by character. These are as follows:

(a) C|ACBA, (b) CA|CBA, (c) CAC|BA, and (d) CACB|A.    (5)

Rearranging them lexicographically gives the following dictionary:

(1) CA|CBA, (2) CACB|A, (3) C|ACBA, and (4) CAC|BA.    (6)

The original string CACBA can be matched with entry number 3. It should be matched from right to left: the first match is A, then B, then C, then A again, and finally another C. For compression it does not matter which of several equally good matches is chosen.
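For illustration only, the following minimal C sketch of an LZW compressor accompanies the LZW description above; the flat, linearly searched dictionary and the sample input ABABABA are arbitrary simplifications and are not the authors' implementation.

/*
 * Minimal LZW compressor sketch (not the authors' code).  The dictionary
 * is a flat array searched linearly, which is adequate for the short
 * illustrative input; production coders would hash the (prefix, char) pairs.
 */
#include <stdio.h>
#include <string.h>

#define DICT_MAX 4096

typedef struct { int prefix; unsigned char ch; } entry;  /* code = prefix + ch */

static entry dict[DICT_MAX];
static int   dict_size;

/* Return the code for (prefix, ch) if present, else -1. */
static int find(int prefix, unsigned char ch)
{
    for (int i = 256; i < dict_size; ++i)
        if (dict[i].prefix == prefix && dict[i].ch == ch)
            return i;
    return -1;
}

static void lzw_compress(const unsigned char *in, size_t len)
{
    dict_size = 256;                      /* codes 0..255 = single characters */
    int w = in[0];                        /* current longest matched string   */

    for (size_t i = 1; i < len; ++i) {
        int code = find(w, in[i]);
        if (code >= 0) {
            w = code;                     /* extend the current match         */
        } else {
            printf("%d ", w);             /* emit code for the matched string */
            if (dict_size < DICT_MAX)     /* add w + in[i] to the dictionary  */
                dict[dict_size++] = (entry){ w, in[i] };
            w = in[i];                    /* restart match with current char  */
        }
    }
    printf("%d\n", w);                    /* flush the final match            */
}

int main(void)
{
    const char *s = "ABABABA";
    lzw_compress((const unsigned char *)s, strlen(s));   /* prints 65 66 256 258 */
    return 0;
}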

III. RESULTS ON SIMULATION EXPERIMENTS

To demonstrate the feasibility and comparative effectiveness of the proposed test data compaction scheme, independent simulations were conducted on various ISCAS 85 combinational and ISCAS 89 full-scan sequential benchmark circuits. The automatic test vector generation program MinTest [24] was first employed to obtain a set of test vectors providing 100% fault coverage. The compression algorithm was implemented in the C programming language on a UNIX platform. The percentage data compression was computed as

% Compression = [(Original_Bits − Compressed_Bits) / Original_Bits] × 100.    (7)

TABLE I. Test Vector Compression Results for Several ISCAS 85 Combinational Benchmark Circuits Using Both Huffman Coding and the Proposed Technique

TABLE II. Test Vector Compression Results for Several ISCAS 89 Full-Scan Sequential Benchmark Circuits Using Both LZW Coding and the Proposed Technique

Fig. 7. Test vector compression results for several ISCAS 85 benchmark circuits using Golomb coding.

The experimental results are shown in Tables I and II and in Figs. 7 and 8. Table I shows the compression results obtained for the ISCAS 85 combinational benchmark circuits. The highest compression, 80%, was obtained for circuit c499, while the lowest, 53%, was calculated for circuit c432. The results computed by employing the well-known Huffman, LZW, and ACB coding algorithms are also presented in Table I for comparison. In the proposed technique, data compression is achieved mainly through the redundancy of data blocks in successive test vectors, and for most of the circuits a higher level of compression is achieved because of this. However, in a few cases, such as the s444 circuit, only negligible differences are noticed compared with the other techniques, because of the large-scale variation in the subsequent test vectors of that circuit. The results obtained by applying the proposed method, along with several other coding techniques, to the ISCAS 89 full-scan sequential benchmark circuits are presented in Table II. For these circuits, the maximum compression, 73.5%, was obtained for circuit s510, and the minimum, about 33%, for circuit s444. Finally, Fig. 7 shows graphically the test vector compression results for several ISCAS 85 combinational benchmark circuits using Golomb coding, while Fig. 8 shows the comparative compression results obtained for the ISCAS 89 benchmark circuits using the proposed technique along with one of the most recent methods presented in [18].

Fig. 8. Test vector compression results for several ISCAS 89 benchmark circuits.

From all these comparative experimental results, it is evident that the developed compression strategy provides better compression than the well-known conventional compression methods. These results, though not exhaustive, still give a good indication of the relative performance of the proposed method.

IV. CONCLUSIONS

To increase the throughput of the ATE, an efficient technique for compressing the test vectors is proposed in this paper. Any algorithm for test vector compression must be lossless and effective, and the decompression algorithm should be simple and must not consume much time, so that the overall processing time of the ATE is reduced. The compression method presented herein is based on a hybrid technique that exploits the characteristics of block matching test data compression and BWT, together with several coding algorithms applied to the test data sequences. The BWT reduces the number of transitions in the test vector sequences and results in a high compression ratio in the first stage of the proposed technique. Although BWT involves a rather complex computation and takes a longer time for compression, the reverse process is very simple and much faster.


As the compression process is carried out before the test sequences are downloaded into the on-chip memory, it has no effect on ATE performance. The suggested approach supports testing of SoCs with embedded processor cores: it uses the computational power of an embedded processor to perform test data compression and decompression in software. In such software-based techniques, the hardware requirements and the cost of the ATE are evidently minimized. The method is completely lossless and space-time efficient, because of its higher compression ratio and rapid decompression process.

REFERENCES
[1] T. Reungpeerakul and D. Kay, "Partial-matching technique in a mixed-mode BIST environment," IEEE Trans. Instrum. Meas., vol. 59, no. 4, pp. 970–977, Apr. 2010.
[2] V. Groza, R. Abielmona, and M. H. Assaf, "A self-reconfigurable platform for built-in self-test applications," IEEE Trans. Instrum. Meas., vol. 56, no. 4, pp. 1307–1315, Aug. 2007.
[3] D. Kay, S. Chung, and S. Mourad, "Embedded test control schemes using iBIST for SoCs," IEEE Trans. Instrum. Meas., vol. 54, no. 3, pp. 956–964, Jun. 2005.
[4] S. Biswas, S. R. Das, and E. M. Petriu, "Space compactor design in VLSI circuits based on graph theoretic concepts," IEEE Trans. Instrum. Meas., vol. 55, no. 4, pp. 1106–1118, Aug. 2006.
[5] S. Biswas and S. R. Das, "A software-based method for test vector compression in testing system-on-a-chip," in Proc. IEEE Instrum. Meas. Technol. Conf., Apr. 2006, pp. 359–364.
[6] P. S. Zuchowski, C. B. Reynolds, R. J. Grupp, S. G. Davis, B. Cremen, and B. Troxel, "A hybrid ASIC and FPGA architecture," in Proc. Int. Conf. Comput.-Aided Des., 2002, pp. 187–194.
[7] M. Abramovici, C. Stroud, and M. Emmert, "Using embedded FPGAs for SoC yield improvement," in Proc. Des. Autom. Conf., 2002, pp. 713–724.
[8] S. J. E. Wilton and R. Saleh, "Programmable logic IP cores in SoC design: Opportunities and challenges," in Proc. IEEE Conf. Custom Integr. Circuits, May 2001, pp. 63–66.
[9] A. Jas, J. G. Dastidar, and N. A. Touba, "Scan vector compression/decompression using statistical coding," in Proc. VLSI Test Symp., 1999, pp. 114–120.
[10] A. Jas and N. A. Touba, "Deterministic test vector compression/decompression for system-on-a-chip using an embedded processor," J. Electron. Test., Theory Appl., vol. 18, nos. 4–5, pp. 503–514, 2002.
[11] A. Jas and N. A. Touba, "Test vector decompression via cyclical scan chains and its application to testing core-based designs," in Proc. Int. Test Conf., 1998, pp. 458–464.
[12] M. Ishida, D. S. Ha, and T. Yamaguchi, "COMPACT: A hybrid method for compressing test data," in Proc. VLSI Test Symp., 1998, pp. 62–69.
[13] V. Iyengar, K. Chakraborty, and B. T. Murray, "Built-in self testing of sequential circuits using precomputed test sets," in Proc. VLSI Test Symp., 1998, pp. 418–423.
[14] K. Basu and P. Mishra, "Test data compression using efficient bitmask and dictionary selection methods," IEEE Trans. VLSI Syst., vol. 18, no. 9, pp. 1277–1286, Sep. 2010.
[15] S. Sivanatham, M. Padmavathy, S. Divyanga, and P. V. Anitha Lincy, "System-on-a-chip test data compression and decompression with reconfigurable serial multiplier," Int. J. Eng. Technol., vol. 5, no. 2, pp. 973–978, 2013.
[16] S. Saravanan, R. V. Sai, and H. N. Upadhyay, "Higher test pattern compression for scan based test vectors using weighted bit position based method," ARPN J. Eng. Appl. Sci., vol. 7, no. 3, pp. 256–259, 2012.
[17] A. El-Maleh, S. Al Zahir, and E. Khan, "A geometric-primitives-based compression scheme for testing system-on-a-chip," in Proc. VLSI Test Symp., 2001, pp. 54–58.
[18] S. Saravanan and H. N. Upadhyay, "Adapting scan based test vector compression method based on transition technique," Proc. Eng. (Elsevier), vol. 30, pp. 435–440, 2012.
[19] M. Nelson, "Data compression with the Burrows–Wheeler transform," Dr. Dobb's J., vol. 9, pp. 46–50, Sep. 1996.
[20] M. Burrows and D. J. Wheeler, "A block-sorting lossless data compression algorithm," Digit. Syst. Res. Center, Palo Alto, CA, USA, Tech. Rep. 124, 1994.
[21] S. W. Golomb, "Run-length encodings," IEEE Trans. Inf. Theory, vol. IT-12, no. 3, pp. 399–401, Jul. 1966.
[22] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inf. Theory, vol. IT-23, no. 3, pp. 337–343, May 1977.
[23] T. Skopal, "ACB compression method and query preprocessing in text retrieval systems," in Proc. DATESO, 2002, pp. 1–8.
[24] I. Hamzaoglu and J. H. Patel, "Test set compaction algorithms for combinational circuits," in Proc. Int. Conf. Comput.-Aided Des., 1998, pp. 283–289.

Satyendra N. Biswas (M’98) received the B.Sc. degree in electrical and electronic engineering from the Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, and the M.Sc. and Ph.D. degrees in electrical and electronic engineering from the Yamaguchi University, Yamaguchi, Japan, in 1991, 1996, and 1999, respectively. He was a Research and Development Engineer with the General Cybernetics, Inc., Toronto, ON, Canada. He was a Research Assistant with the University of Ottawa, Ottawa, ON, from 2003 to 2005, and an Assistant Professor of Electrical Engineering and Technology with Georgia Southern University, Statesboro, GA, USA, from 2005 to 2009. He served as an Associate Professor with the Norfolk State University, Norfolk, VA, USA, from 2009 to 2010. He is currently a Professor with Kaziranga University, Jorhat, Assam, India. He has published more than 60 technical papers in journals and refereed conference proceedings. His current research interests include very large scale integration circuit design and testing, data compression in built-in self-testing, dynamic image/video processing, and reconfigurable computing. Dr. Biswas is a Registered Professional Engineer and a member of the Institute of Electronics, Information and Communication Engineers.

Sunil R. Das (M'70–SM'90–F'94–LF'04) received the B.Sc.(Hons.) degree in physics and the M.Sc.(Tech.) and Ph.D. degrees in radiophysics and electronics from the University of Calcutta, Kolkata, India. He previously held academic and research positions with the Department of Electrical Engineering and Computer Sciences, Computer Science Division, University of California, Berkeley, CA, USA, the Center for Reliable Computing, Computer Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, CA, USA, the Institute of Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan, and the Center of Advanced Study, Institute of Radiophysics and Electronics, University of Calcutta. He is currently an Emeritus Professor of Electrical and Computer Engineering with the School of Information Technology and Engineering, Faculty of Engineering, University of Ottawa, Ottawa, ON, Canada, and a Professor of Computer Science with the Department of Computer Science, College of Arts and Sciences, Troy University, Montgomery, AL, USA. He has published numerous papers in switching and automata theory, digital logic design, threshold logic, fault-tolerant computing, built-in self-test with emphasis on embedded cores-based system-on-chip, microprogramming and microarchitecture, microcode optimization, applied theory of graphs, and combinatorics. Dr. Das has served on the technical program committees and organizing committees of many IEEE and non-IEEE international conferences, symposia, and workshops and has acted as Session Organizer, Session Chair, and Panelist. He has also served on the Editorial Boards of many IEEE and non-IEEE publications. He is the recipient of the IEEE Computer Society's highly esteemed Technical Achievement Award, co-recipient of the IEEE's Donald G. Fink Prize Paper Award, and recipient of the C. V. Ramamoorthy Distinguished Scholar Award of the Society for Design and Process Science and of Troy University's Wallace D. Malone Distinguished Faculty Award, among others. He is listed in the Marquis Who's Who in America, Who's Who in the World, and Who's Who in Science and Engineering. Dr. Das is a Life Member and Distinguished Scientist of the ACM, an Emeritus Fellow of the CAE, and a Fellow of the EIC and of the SDPS. He is the Founding Editor-in-Chief of the International Journal of Computers, Information Technology and Engineering, published by Serials Publications, Delhi, India, since 2007.


Emil M. Petriu (M’86–SM’88–F’01) is a Professor and University Research Chair with the School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada. He has published more than 300 technical papers, authored two books, edited two other books, and received two patents. His current research interests include system-on-chip design, biology-inspired robot sensing, and soft computing.


He is a Fellow of the Canadian Academy of Engineering and the Engineering Institute of Canada. He is the co-recipient of the 2003 IEEE’s prestigious Donald G. Fink Prize Paper Award. He received the 2003 IEEE Instrumentation and Measurement Society Technical Award and the 2009 IEEE Instrumentation and Measurement Society Distinguished Service Award.
