cost-effective test sequence for Mesh NoC topologies based on XY routing. ..... of z e ro s hea der strin g. s o f ze ro s he ader strin g. s o f ze ro s h ead er s trings.
Redefining and Testing Interconnect Faults in Mesh NoCs Érika Cota
Fernanda Lima Kastensmidt Alexandre Amory
Maico Cassel Marcelo Lubaszewski
Paulo Meirelles
Universidade Federal do Rio Grande do Sul PPGC - Instituto de Informática Po Box 15064, ZIP 91501-970, Porto Alegre, RS, Brazil {erika, fglima, mcassel, prmmeirelles,amamory, luba}@inf.ufrgs.br
Abstract An extended fault model and novel strategy to tackle interconnect faults in network-on-chips are proposed. Short faults between distinct channels are considered in a cost-effective test sequence for Mesh NoC topologies based on XY routing.
1 Introduction Currently deep submicron technologies make it possible to integrate very complex applications into a single chip by simply interconnecting pre-designed IP modules. As the number of IP modules in Systems-on-Chip (SoCs) increases, bus-based interconnection architectures may prevent that these systems meet the performance required by many applications. For systems with intensive parallel communication requirements, for example, busses may not provide the required bandwidth, latency, and power consumption [1]. A solution for such a communication bottleneck is the use of an embedded switching network, called Network-on-Chip (NoC), to interconnect the IP modules in SoCs [2]. Several works have addressed different aspects of NoC design and implementation [1-3]. Recently, industrial NoCs have appeared in the market [4]. However, to become an industrial reality, this new design paradigm still depends on the definition of feasible, efficient, and plugand-play test mechanisms. Such mechanisms must tackle both, the network itself and the IPs connected to the NoC. To test the embedded IPs in a NoC-based system, the reuse of the NoC as Test Access Mechanism (TAM) has been proposed [4-7]. Thus, the test of the NoC itself becomes an even more important issue to ensure the SoC test quality. Recent works have been addressing the test of the NoC infrastructure, including routers [8-10] and interconnect channels [11-13]. The huge number of interconnects allied to the shrinking of the chip dimensions make the NoC prone to a growing number of interconnect faults. The capability of detecting interconnects faults in NoC-based systems-on-chips, such as short circuits between two channel links, is mandatory for yield improvement. Moreover, fault diagnosis of NoC interconnect links can help fault tolerance approaches to mitigate the faults and maintains the network service.
Few previous works have, to some extent, covered the problems of interconnect testing in NoC-based chips. Nevertheless, there are still some important issues to tackle in this topic. For instance, although advanced fault models, such as crosstalks, have been considered in the literature, only faults in inter-switch lines were targeted. However, considering realistic NoC layouts [14], the placement and routing of switches and channels are actually prone to even simpler faults, such as shorts between wires connecting the core to the network and between wires of distinct network channels. Figures 1(a) and 1(b) illustrate two possible floorplans for a NoC-based SoC. In both cases, as indicated by Figure 1(c), parts or the entire NoC are likely to be implemented in Standard Cells gathering different routers and channels in the same piece of layout. Interconnecting the Standard Cells makes use of different metal layers, as depicted in Figure 1(d). Thus, it is possible that short circuits or crosstalks involve interconnects implemented using metal lines belonging to the same or to different layers. To the best of our knowledge, those interconnect faults interactions involving different channels and the connections with the cores have not been addressed so far. Extending the fault model to include interaction faults that affect different channels of a NoC prevents the use of current NoC testing approaches, since they deal with interswitch links only. A global test solution, that considers at least a channel’s neighborhood, is thus needed. Additionally, to test the interconnections, the network must operate in its functional mode. However, in this mode, a number of switch ports and paths are not easily configurable or even possible. Therefore, the main challenge is to devise a test strategy that copes with the NoC routing constraints while ensuring a high quality and efficient test. Such a strategy is the main contribution of this paper. In this paper a test strategy is proposed to the detection of three types of interaction faults: shorts between lines of a single NoC channel, shorts between lines of distinct channels, and shorts between lines of a channel and lines that connect the NoC to the functional core. It is important to notice that the focus herein is in the test strategy, not in the test sequence. The well known walking sequence will be used in this work to exemplify our interconnect test solution. The proposed approach can
Paper 5.3 INTERNATIONAL TEST CONFERENCE 1-4244-1128-9/07/$25.00 © 2007 IEEE
1
be easily extended to other interaction faults, such as crosstalk, by simply adapting the test sequence. The paper is organized as follows: Section 2 reviews recent works on NoC test strategies. Section 3 explains the NoC topology and the corresponding fault model considered in this work. The proposed interconnect testing approach is presented in Section 4, and the expected fault coverage is analyzed. Section 5 contains the experimental results, while Section 6 briefly explains how the method can be applied to larger NoCs. Section 7 concludes the paper.
routers while responses can be compared within the chip. Stewart and Tragoudas [9] proposed a functional fault model and a functional test strategy for routers and network interfaces of regular NoCs with grid topology. Grecu et. al. [10] propose two schemes for testing the combinational blocks of the NoC switches, based on uniand multi-cast transmission of test-data packets through the switching fabric. NoC interconnections are also regular, but present poor observability and controlability due to its density and its central position. Thus, although all interconnects can use the same set of test vectors, ensure its application to all wires is a challenge from the test time and fault coverage point of view. Raik et. al. [12] propose an external test method for NoCs based on functional fault models. The method targets single stuck-at faults at the multiplexers and registers inside the network switches, but might also cover delay faults, opens and shorts between adjacent interconnection lines. Their approach assumes mesh-like with full duplex data transmission, deflecting switches (packet traffic does not have to be queued into input/output FIFOs), and deterministic XY routing. Fault coverage results presented in that work are close to 100%, without considering the routing logic. Although deflective switching reduces the router area and improves testability, routers of current industrial networks [4, 15] are usually based on FIFOs, which are one of the most difficult parts to be tested [8].
2 Related Work The test of a NoC-based SoC for manufacturing defects is usually divided into two parts: the test of the cores and the test of the communication infrastructure. The test of the cores is usually based on the reuse of the NoC as TAM [47], to avoid the burden of adding extra hardware for a dedicated test bus. The NoC has two particularities that influence its testing [4]: i) it is often composed of many identical (or very similar) subcores (routers and network interfaces); ii) it occupies a privileged central position in the SOC because of its interconnecting role. The first characteristic can be used to accelerate and reduce the area overhead and/or test time. For instance, Amory et. al. [8] propose a scalable methodology to test NoC routers using scan chains and a specific router configuration during test so that the same set of vectors can be broadcasted to all
core
core
core
core
core
core
(b) core
core
core
(a) core
core
core core
core
core
core
core
core
core
core
core
core
(b) (c) core
M6
core
core
core
Via5 M5 Via4 M4 Via3 M3 Via2 M2
Via1 M1 contact
(d)
.
Figure 1 – (a) and (b) Floorplans examples for NoC-based systems-on-chip, (c) NoC routers implemented as standard cell cores and (d) router interconnects across multiple metal/via levels .
Grecu et. al. [11] propose a built-in self-test methodology for testing the inter-switch links of the communication platform. The proposed methodology
Paper 5.3
targets crosstalk faults assuming the MAF fault model [16]. The test strategy is based on two BIST blocks: the test data generator (TDG) and the test error detector
INTERNATIONAL TEST CONFERENCE
2
(TED). TDG generates the eight test vectors capable of detecting all possible crosstalk faults in a inter-switch line. The test vectors are launched on the link under test from the transmitter side of the link, and then sampled and compared for logical consistency at the receiver side of the link by the TED circuit. Then, two approaches are discussed with respect to area overhead and test time. In the first one, the interconnect testing is combined to the test of the routers [10]. Thus, a single TDG block is required to generate the MAF test patterns for all interswitch links. The MAF test data is organized in test packets and a header is appended that identifies the link to be tested. When the test packet reaches the link under test, the TED circuit compares incoming test vectors to the locally generated MAF data. The second approach uses multicast if this feature is available in the NoC under test. In this case, multiple packets can be sent over nonoverlapping links, such that more than one link is tested at the same time. In another approach, Pande et. al. [13] propose to incorporate crosstalk avoidance coding and forward error correction schemes in the NoC data stream to enhance the system reliability. All related works discussed so far do not consider interconnect faults involving different channels and the connections with the cores. This limitation can compromise the efficiency of the entire testing. In the next sections, we define the extended fault model and propose a test strategy considering these faults.
3 NoC Basics and Case-Study NoCs typically use the message-passing communication model. Cores attached to the network communicate by sending request and receiving response messages. To be routed by the network, a message is typically composed by a header, a payload and a trailer. The header and the trailer frame the packet and the payload carries the data being transferred. The header also carries the information needed to establish the path between the sender and the receiver. Depending on the network implementation, messages can be split into smaller structures, so called packets, which can be individually routed. Packet-based networks present better resource utilization, since packets are shorter and reserve a smaller number of channels during transportation compared to a whole piece of message. Besides its topology, a NoC can be described by the approaches used to implement the mechanisms for flow control, routing, arbitration, switching and buffering [17]. Flow control deals with data traffic on the channels and inside the routers. Routing is the mechanism that defines the path a message takes from a sender to a receiver. The arbitration establishes priority rules when two or more messages request the same resource. Switching is the mechanism that accepts an incoming message of a router and sends it to an output port of the router. Finally, buffering is the strategy used to store messages when a requested output channel is busy. Current cores usually need to use wrappers to adapt their interfaces and protocols to the on-chip network.
Paper 5.3
In this work, we base our analysis on a packetswitching network model, so-called SoCin, introduced in [3]. It is implemented in a 2-D mesh topology. For the simplicity of router implementation, the channel between a router and its associated core is similarly defined. Each router in the SoCin network is composed of five input and five output ports. One pair of input/output ports is dedicated to the connection between the router and the core, while the remaining four pairs connect the router with the four adjacent routers. A SoCin router is implemented using from 3,000 to 6,000 gates, depending on the bit width of the network channel and depth of the input buffers [18]. The network uses credit-based flowcontrol and XY routing, where a packet is first routed on the X direction and then on the Y direction before reaching its destination. Switching is based on the wormhole approach, where a packet is broken up into multiple flits (flow control units): a header flit, a set of payload flits followed by the tail flit. Flit is the smallest unit over which the flow control is performed and its size equals the channel width. For the case-study NoC topology, the communication channels between two adjacent routers are defined to be 8-bit wide.
3.1 Fault Space Definition A short fault is a state in which a net is connected to another net, which can be placed at the same metal layer or at the top or bottom metal layers. There are three types of short faults: OR-short, AND-short and strong driver. In the case of a NoC communication, this universe of faults can reach nets of many different interconnect channels, including router-to-router interconnect links (rr_links) and router-to-core interconnect links (rc_links) in both directions. When considering short faults in the NoC, it is important to define the region where the faults may occur, i.e., which links are more likely to be short-circuited. Some authors have been concerned about faults in routerto-router links (rr_links) in a single direction inside a channel. On one hand, the assumption that short faults will occur only between these unidirectional lines is too optimistic when dealing with high density design, as explained. On the other hand, considering that all possible lines can be faulty might be too pessimistic. The problem is that the number of faults grows exponentially with the number of links considered, as shown in Equation 1 for n the number of independent links and k the size of each fault group, i.e., the number of lines that can be shorted together. C (n,k) =
n! k!(n-k)!
(1)
The affected region is actually defined during placement and routing steps. In some cases, faults can occur between any set of links of the network, so the searching region is the entire NoC. In other cases, faults may affect only the links placed in a specific part of the network, which
INTERNATIONAL TEST CONFERENCE
3
reduces the search space. The challenge is thus to find a suitable trade-off in terms of testing timing and fault coverage for the needed searching space.
3.2 A Basic BIST-based Approach The problem of testing interconnects against short faults has been extensively studied and test sequences of minimum length for 100% detection and diagnosis have been reported [19, 20]. Thus, a possible test strategy for NoC inter-switch links could simply reuse those available sequences by placing, for instance, test sequence generators and analyzers at each node of the network under test as shown in Figure 2. rr_link router
router
rr_link
(a) BIST blocks embedded in the routers
core rc_links
core rc_links rc_linkr rr_link
router
rr_link
rc_linkr
router
(b) BIST blocks embedded in the cores Figure 2 – BIST-based test technique
This basic approach assumes two BIST blocks at each node of the network. One BIST block generates a test sequence to be injected in the channel, whereas the second one verifies whether the incoming data corresponds to the same sequence. The majority of the works that aim at detecting faults in the inter-switch links has proposed the insertion of BIST blocks in the routers [22, 23], as shown in Figure 2(a). However, by placing the BIST blocks into the routers, neither faults in lines connected to the core nor faults between interconnects of neighborhood channels are detected. For this reason, we suggest that both test sequence generator and test response analyzer are placed in such a way that faults in the lines connecting the core to the router and lines between distinct channels are also exercised during test, as shown in Figure 2(b). If the core is a functional IP (processor, for instance), it can generate the sequence itself. Otherwise, a multiplexer must be added to the core wrapper. Using this approach, one might expect a fault coverage of 100% and maximum diagnosability if the universal test sequence [21] is used, for instance. However, NoC specificities actually change the statistics. In fact, one must consider that the first word sent through a link, i.e., the packet header, contains routing data. When permanent, intermittent and transient faults are assumed, such as shorts or crosstalk, the fault can actually change the routing
Paper 5.3
information in the packet, causing, for example, packet loss. Simulation results for an 8-bit 2x2 SoCin network have shown that fault coverage for this basic BIST configuration actually drops to 90% when the Interleaved True/Complement Code [22, 23] is used. This sequence was chosen for its detection capabilities (it detects static and dynamic interconnect faults and allows on-chip atspeed diagnosis) and efficiency in terms of silicon area. Thus, a more careful analysis on the network constraints is required.
4 The Proposed Approach
Interconnect
Testing
We propose a technique capable of detecting short faults between any 2 links (k = 2 in Equation 1) of any channel at any direction of a regular NoC implemented in a 2-D mesh topology. The method is scalable with the NoC size and the channels width. We consider a 2-D mesh NoC architecture for its known advantages in terms of regularity, concurrent data transmissions, and controlled electrical parameters [24]. Moreover, most industrial and academic NoC architectures presented so far are based on this topology [4, 15]. The test strategy is first proposed to a 2x2 Mesh NoC with 4 routers and 16 unidirectional links, as illustrated in Figure 3. In the worst case scenario, shorts can occur between any two links of this entire 2x2 NoC architecture. Thus, there are 128 link channels (rr_links and the rc_links and rc_linkr) that can potentially be in short with each other. From Equation 1, there are 8,128 faults to be tested in the circuit. Let us call this configuration our “basic NoC”, which is considered the minimum basic configuration topology for the detection of short faults. As it will be discussed later in the paper, the test of larger NoCs can be defined based on the test of this basic NoC configuration.
4.1 The Interconnect Testing Method The proposed method consists in detecting all pairwise short faults of a 2x2 NoC topology by sending packets through this network. The challenges are i) to define a test sequence for the packet payload to be applied to this NoC topology, so that 100% of these faults are detected in a reduced test time; and ii) to define a packet flow to maximize the NoC interconnect links usage. To achieve high fault coverage and a reduced test time, there are some important considerations that must be accounted for:
INTERNATIONAL TEST CONFERENCE
4
core 0
core 1
rc_links01
rc_linkr00 rc_links00
rr_link_00_01
RRC rr_link1_10_11
router
10
rc_links10
rc_linkr10
core 2
rc_linkr01
router
01
rr_link_01_11
rr_link_00_10
rr_ling_10_00
00
rr_link_01_00
rr_link_11_01
router
router rr_link_11_10
11
rc_links11
rc_linkr11
core 3
Figure 3 – Test strategy for the basic NoC topology
Consideration 1: in order to apply test vectors to the NoC interconnects, the communication protocol (packets organized as header, payload, and tail) must be followed. The test sequence is placed on the packet payload and the header is mandatory to apply the payload. The content of the header depends on the NoC and on the routing algorithm. Consideration 2: short faults may occur between any pair among the 128 interconnect links under test. One can think of the NoC as a large 128-bit wide bus. Consequently, the entire NoC must be completely filled with the test vectors in order to detect each fault. In other words, the packets must be sent in such a way that all links are filled up at the same time. Consideration 3: test sequence generators and test response analyzers are assumed to be connected to the routers to send and receive the test packets, similarly to the basic BIST approach discussed in Section 3.2. Consideration 4: since all packets are sent with headers, short faults may affect the header flits, thus modifying the packet routing. When this occurs, the packet can be routed to any other node of the NoC and the original target node will detect a time-out fault. This policy is part of the network communication protocol. Thus, shorts that affect the headers will be detected by the router at the target node (no receipt) and are classified as time-out faults. Shorts that do not cause packet loss are classified as payload faults, since they will be detected by the test response analyzer at the target node. Consideration 5: the packet payload must contain all the input vectors necessary to detect the faults. Without loss of generality, we will exemplify our interconnect test approach by using the Walking One Sequence [21], and evaluate its coverage to the AND-short fault model. This sequence can be easily generated by hardware or software
Paper 5.3
and presents maximum fault detectability of AND-shorts or OR-shorts. For other sequences and fault models, similar analysis can be easily performed. The number of test vectors of the Walking One Sequence is equal to the entire number of nets under test. Thus, for a 128-bit bus, the number of test vectors would be 128. However, in the 2x2 NoC, the 128 nets under test are distributed into 16 8bit channels and only four paths can be implemented using the XY routing policy (see Figure 3). Hence, the test length of the Walking One Sequence for the basic NoC is xactually divided by 4 resulting in 32 test vectors for each path, where each packet traverses a total of 32 lines. These 32 test vectors then must be organized in the packet format to be sent through the network. Figure 3 illustrates the application of the test sequence in the basic NoC topology. The proposed approach x exercises all 128 nets using the routers functional mode. Considering the XY routing strategy, this means that 4 packets can be sent across the 2x2 mesh network at the same time. Those packets are shown in Figure 3 by the four different lines, one solid, one dashed, one with lines and points, and one with triple lines. Each line pattern represents one routing path: - Packet 00 (solid lines): one step east and one step south, from core 0 to router 00, to router 01, to router 11, to core 3. - Packet 01 (lines with points): one step west and one step south: from core 1 to router 01, to router 00, to router 10, to core 2. - Packet 10 (dashed lines): one step east and one step north: from core 2 to router 10, to router 11, to router 01, to core 1. - Packet 11 (triple lines): one step west and one step north: from core 3 to router 11, to router 10, to router 00, to core 0 Figure 4 shows the organization of the four packets that are sent simultaneously by the 4 cores. Note that although all nodes use the same test sequence (32 vectors), the payload is shifted in time, so there is only one flit that contains only one bit at ‘1’ at a time in the 128 links. The detailed packet organization used in the proposed approach is shown in Figure 5. There are 32 test vectors (Walking One Sequence) that must be sent as payload information flits led by the header flit and followed by a tail in each packet. For the basic NoC topology, the header flit has at most 4 out of 8 bits that can assume the value ‘1’ or ‘0’ according to the XY routing strategy. The remaining bits of the header are always kept at zero. For the casestudy network, it takes 9 clock cycles to send a header from the source core to the target NoC node. This number (h) may vary for different NoC protocols. So, it is necessary to insert h flits with strings of zeros after the header to fill out the entire network links with zeros, preparing for the test vector sequence.
INTERNATIONAL TEST CONFERENCE
5
tail tail tail time
Figure 4 – Test sequence application in the basic NoC.
Additionally, there are l strings of zeros after each test vector in the payload. This is related to the number of clock cycles needed to send a payload flit from the source core to the target NoC node. In the case-study NoC, it takes 4 clock cycles (l=4). These strings are needed to guarantee that only one net of the network will be holding the value ‘1’ at a time, to ensure the detection capabilities
Link0
…………………….
The Walking One Sequence ensures 100% fault coverage (considering all 128 links as a single 128-bit bus). However, as explained, the application of any sequence in the network must follow a communication protocol. Thus, one must consider the possibility of packet loss due to a fault. In this case, the whole test sequence cannot be applied and test quality is not ensured. On the other hand, we can also use packets headers as test vectors to recover the fault coverage. In the following, we evaluate the detection capabilities of the headers in the SoCin network. Then, the final test sequence is defined. 4.2.1 Fault Detection Capabilities of SoCin Headers The format of an 8-bit header in the SoCin NoC is shown in Figure 6(a) and an example of a header is given in Figure 6(b). In the example, the packet leaves node 00 and it should reach the node 11. The headers that establish the four test paths shown in Figure 3 will similarly have at most 4 bits set to ‘1’, as shown in Figure 4.
Link w-1
0 0 0 0 0 0 0 0
tail
l
3w.(1+l) strings of zeros
1000000000000000000000000000000000000000 0000010000000000000000000000000000000000 0000000000100000000000000000000000000000 0000000000000001000000000000000000000000 0000000000000000000010000000000000000000 0000000000000000000000000100000000000000 0000000000000000000000000000001000000000 0000000000000000000000000000000000010000
payload
0 0 0 0 0 0 0 0
payload
h strings of zeros
header
X 0 0 X X 0 0 X
of the chosen sequence. Also, the payload must be completed with strings of zeros, as shown in Figure 4. In the end, the packet size depends on the header size, the number of h strings of zeros placed after the header, the number of vectors of the test sequence needed for a given number of links in the channel (w) and the number of l strings of zeros placed between the test vectors. For the SoCin NoC, the packet has 1[=header] + 9[=strings of zeros] + 8[=number of links in the channel] x 1+4[=test vector followed by the stings of zeros] +3[=number of o the packets in the NoC sent at the same time] x 1+4[=stings of zeros] +1[=tail] = 1 + 9 + 40 + 120 + 1 = 171 flits. Consequently, the test time is 171 clock cycles for the entire 2x2 Mesh NoC for detecting all pairwise faults.
4.2 Fault Coverage Analysis tail
payload
strings of zeros strings of zeros
strings of zeros
strings of zeros
strings of zeros payload
strings of zeros
strings of zeros
payload strings of zeros
strings of zeros
strings of zeros
strings of zeros strings of zeros
payload
1 0 0 0 1 0 0 1
strings of zeros
strings of zeros strings of zeros
strings of zeros
1 0 0 0 1 0 0 0
header
router 11
1 0 0 1 1 0 0 1
header
router 10
header
router 01
header
router 00
1 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0
l
Figure 5 – Packet organization for the Walking One Sequence
Paper 5.3
INTERNATIONAL TEST CONFERENCE
6
Direction E/W (1 bit)
# of shifts (3 bits)
Direction N/S (1 bit)
# of shifts (3 bits)
(a) Header format 0
1
2
3
4
5
6
7
0 0 0 1 1 0 0 1 (b) Header example Figure 6: Header of a 2x2 SoCin NoC
The header content is modified as it traverses the network, as shown in Figure 7. Each router that receives the header subtracts 1 from the shift field until no more shifts in one direction can be done. When the Y shift field is zero the header reached its destination. On the other hand, for the basic NoC topology, there is at most one shift in each direction. Therefore, the first two bits that indicate the number of shifts in either direction are always zero (bits 2, 3, 5, and 6). One can observe that all faults affecting bits 1, 2, 5, and 6 of Figure 6(b) will not affect the header contents, and will not cause packet loss. Similarly, some faults affecting bits 3, 4, and 7 will not be detected also. For instance, a short between lines 4 and 7 within the channel connecting nodes 00 and 01 does not change the contents of the header. Those faults must be detected by the test sequence applied through the packet payload. However, faults affecting the remaining bits can change the header by either indicating the wrong direction or the wrong number of shifts in one direction. Those faults will cause packet loss and will be detected by time-out. Let us first consider the case where the four packets are sent simultaneously over the network. For this configuration, let us consider only the headers as test vectors, i.e., the payload of all packets is a string of l zeros, so that all lines are filled up during testing. In this case, the test has four steps, as shown in Figure 7, corresponding to the traversing of the packet from the core to the router (rc_links), through channels connecting two routers (rr_links), and from the router to the target core (rc_linkr). At each step the headers may be modified as explained before. In the first step, 1,392 packet losses will be detected by the network. Steps 2, 3, and 4 detect, respectively, 960, 496, and 496 losses. Thus, a total 3,344 packet losses caused by short faults can be detected using the headers as test vectors. If every fault causes a single packet loss, the headers fault coverage would be 41,14% (3,344 out of 8,128 faults). In fact, many faults can generate more than on packet loss. That is the case, for example, of the short between bit 4 of rc_link_s00 (shown in Figure 3) and bit 0 of rr_link_01_00. Thus, 41,14% is actually an upper bound for the expected fault coverage of the headers. In the configuration above one can observe that there are more than one ‘1’ traversing the network at the same
Paper 5.3
time, which may reduce the fault coverage. Thus, let us consider a second configuration where the same headers are sent through the network but at different times, in such a way that a single header traverses the network at a time. A similar analysis indicates an expected fault coverage of 43,2% for this configuration. In fact, the number lines holding the value ‘1’ at the same time decreases as the headers advance through the network. Thus, very few additional faults are detected by the sequential application of the headers. Since this configuration takes much more time than the previous one, the test packet shown in Figure 4 can be applied simultaneously, keeping the test time short. 4.2.2 Fault Detection of the Complete Walking One Sequence The fault coverage of the Walking One Sequence is 100% as long as there is a single bit ‘1’ in each test vector. This sequence is applied to the basic NoC topology in such a way that this rule is respected. If a fault does not affect the packet header, the packet will arrive at the destination node and the payload will be compared to the expected one. If any difference is found in any flit, the fault is detected. Hence, all faults that do not cause packet loss will also be detected. In addition, we note that some faults can not only cause packet loss, but also be detected by the payload. It is the case, for example, of the short between bit 1 of rc_link_s00 and bit 1 of rc_link_s01. The packet leaving node 01 will be lost due to the fault. The header leaving node 00 is not affected, but the payload of this packet will show the fault. One may be tempted to use this information to reduce the number of test vectors in the payload. This reduction can be sought by considering all lines involved in a path that is traversed by the header in the network. For the basic NoC topology, this path corresponds to 32 lines under test. Considering that all other lines can be set to 0, the detection of a short between any of the 32 lines of a path and any other line in the network can only be ensured if a Walking One Sequence is applied. However, as we can observe in Figure 7, this is not the case for the headers. Similarly, the same fault can be detected by two payloads. That is the case, for instance, of a short between bit 0 of rc_link_s00 and bit 0 of rr_link_01_00. The flit 10000000 in the packet leaving node 00 will be modified and the fault will be detected at node 11. Since the fault is permanent, the flit 10000000 in the packet leaving node 01 will also be modified, and detected at node 10. However, both flits are the only ones capable of detecting the shorts between bit 0 and bits 1 to 7 of the same link for each channel in the path from node 00 to 11 and from 01 to 10. The other solution to detect these faults is to apply the Walking One Sequence for the remaining 7 bits of the flit. But, then, the reduction in the test sequence is minimal.
INTERNATIONAL TEST CONFERENCE
7
Step 1
10
router 10000000
10
01
00001000
00000000
00000000
00000000
00000000
router 00000000
10001000
00000000
11
00000000
00000000
router
router 00000000
00000000
00000000
00
00001000
10
10000000
01
00000000
router
00000000
00000000 00000000
router 00000000
10001000
10000000
00
11
00000000
00000000
00000000 00000000
router
router
Step 4
00000000
00000000
00000001
10000001
Step 3 00000000
01
00000000
10010001
00000000
00000000
00000000
10001001
router
11
00000000 router 00000000
00
router 00000000
router
00000000
10
01
00000000
router
00010001
00000000
00000000
00000000 00000000
00000000
00
00000000 router
00000000 00001001
00000000
router
00000000
00011001
00000000
10011001 00000000
00000000
00000000
Step 2
router 00000000
10001000
11
00001000
00000000
Figure 7 - Header traversing through the NoC.
5 Experimental Results In order to demonstrate the effectiveness of the proposed approach, a 2x2 SoCin NoC was implemented in VHDL together with a fault injection mechanism capable of inserting AND-shorts between all pair of links in the design. Test sequence generators and test response analyzers were implemented as testbenches. We note that although our experiments were performed for the Walking One Sequence, similar analysis can be performed for any test sequence generator and its correspondent analyzer, according to the defined fault model. The 8,128 faults were exhaustively injected in all 128 interconnects of the 2x2 NoC topology. The simulation was performed using the ModelSim tool, where internal signals can be connected and disconnected by the command force placed at the testbench file. The entire simulation time takes only few minutes. Table 1 presents the results of the fault simulation campaign. Faults are classified as “detected by time-out only” (column 1) or “detected by payload only” (column 2) and “detected by time-out and payload” (column 3). The first line shows the results when all headers are applied simultaneously, and line 2 represents the configuration where headers are multiplexed in time. In line 3, the complete Walking One Sequence is sent in each packet (after the header) and packets are sent simultaneously. For each fault, only one detection is accounted for. From Table 1 one can observe that the experimental results match the expected fault coverages defined in Section 4.2. The fault coverage of the headers alone is below the defined upper bound, and the increased coverage of the multiplexed header sequence is also
Paper 5.3
observed. Notice also that the walking sequence was able to detect all faults in only 171 clock cycles by exercising all the functional NoC paths.
6 Method Scalability The 2x2 basic NoC topology was used to define the minimum test configuration topology for the detection of short faults in an on-chip network. If short faults are to be detected in larger networks, as illustrated in Figure 8, a set of tests based on configurations of 2x2 sub-NoCs must be implemented. Figure 8(a) shows the first configuration that tests faults between all the middle interconnect links. Figure 8(b) shows the four test configurations that can run in parallel to test the 16x4 channels in each gray area. Figure 8(c) shows the last four test configurations that can be applied two by two in parallel to the whole network. In total, nine configurations are applied in four test rounds, covering all pairwise shorts between neighbor interconnect links. This ensures that all the neighborhood channels are tested. This configuration tests can be easily extended to any other size of mesh NoC. On the other hand, the designer can use less test configurations to test only some parts of the layout that are more susceptible to short faults. The number of signal links (w) of each interconnect channel may vary even in the minimum test configuration topology. The curve in Figure 9(a) shows the effect of w in the number of short faults that must be detected in the 2x2 NoC topology. As expected, the number of faults grows exponentially with the number of bits of each channel. On the other hand, the number of test clock cycles needed for testing grows linearly in the proposed technique, as shown in the curves of Figure 9(b). Considering the size of the network, Figure 9(c) shows the number of test
INTERNATIONAL TEST CONFERENCE
8
configurations and the number of test rounds to test 2x2, 3x3, 4x4, and 5x5 NoC topologies. One can observe that the proposed method is very efficient in terms of test time for large networks. Since some test configurations can be
applied in parallel, the number of test rounds and consequently the testing time can be reduced. In the cases presented, the number of test rounds presents a much smaller upper bound then the number of configurations.
Table 1 - Testing Fault Coverage Analysis Fault detection by # clock Payload error Payload error and cycles Time-out only only Time-out
Experiments Headers Sequence (Time shifted packets) Headers Sequence (packets sent at the same time) Walking One Sequence (packets sent at the same time)
44
2807 (34.53%)
0
0
11
2800 (34.44%)
0
0
171
92 (1.13%)
5328 (65.55%)
2708 (33.31%)
(b)
(a)
Total 2807 (34.53%) 2800 (34.44%) 8128 (100%)
(c)
Figure 8 – A set of configuration test topologies for a 4x4 mesh NoC.
(a) channel width (w) versus number of short faults
(b) channel width (w) versus number of test cycles
(c) Influence of the NoC size in the number of experiments Figure 9 – Proposed Approach in a NoC
Paper 5.3
INTERNATIONAL TEST CONFERENCE
9
7 Conclusions and Future Works Considering the current technology advance and realistic NoC layouts, we proposed an extension of the fault models used when dealing with interconnections testing in networks-on-chips. The proposed extended model is composed of shorts between lines of a single NoC channel, shorts between lines of distinct channels, and shorts between lines of a channel and lines that connect the NoC to the functional core. We then showed that available test strategies are not capable of tackling the extended fault model. A new test strategy was devised, analyzed, and implemented for a basic 2x2 mesh topology NoC. Experimental results have corroborated the analysis of the proposed method efficiency and quality. Fault injection results show that it is possible to achieve a successful atspeed testing when testing short faults between large numbers of neighborhood channel links. The reasons are the reduced number of vectors in the test sequence and its test parallelism. Furthermore, an analysis of the method scalability was also performed, showing that it can be used for larger networks with larger channels bit width channels. Although the Walking Sequence ensures high diagnosability, in the proposed approach the faults detected only by time-out are difficult to diagnose. The faults detected by payload only and by time-out and payload can be diagnosed. Current work includes the study of diagnosing procedures. Additionally, the evaluation of the fault coverage for other test sequences, and additional fault models, such as, crosstalk and delay faults is also being considered.
8 References [1] Moraes, F. G., Calazans, N., Mello, A., Möller, L., Ost, L., “HERMES: an Infrastructure for Low Area Overhead Packet-Switching Networks on Chip”. Integration, the VLSI Journal, vol. 28-1, 2004, pp. 69-93. [2] Guerrier, P.; Greiner, A.; A generic architecture for on-chip packet-switched interconnections, Design, Automation and Test in Europe, DATE, 2000, pp. 250 – 256. [3] Zeferino, C.; Susin, A. ; SoCIN: a parametric and scalable network-on-chip, 16th Symposium on Integrated Circuits and Systems Design, SBCCI, 2003, pp.169 – 174. [4] Vermeulen, B.; Dielissen, J.; Goossens, K.; Ciordas, C. Bringing Communication Networks on a Chip: Test and Verification Implications, IEEE Communications Magazine, Volume 41, Issue 9, Sept. 2003, pp. 74-81. [5] Cota, E.; Carro, L.; Lubaszewski, M., Reusing an on-chip network for the test of core-based systems, ACM Transactions on Design Automation of Electronic Systems, TODAES, v.9 n.4, 2004, pp.471-499. [6] Ahn, J.; Kang, S. Test Scheduling of NoC-Based SoCs Using Multiple Test Clocks. ETRI Journal, Volume 28, Number 4, August 2006. pp 475 - 485. [7] Kim, J.; et. al. On-chip Network Based Embedded Core Testing. IEEE International SOC Conference, 2004, pp. 223 - 226
Paper 5.3
[8] Amory, A. M.; Briao, E.; Cota, E.; Lubaszewski, M.; Moraes, F.G.; A scalable test strategy for network-on-chip routers, IEEE International Test Conference, ITC, 2005. 9 pp. [9] Stewart, K.; Tragoudas, S.; Interconnect testing for networks on chips, 24th IEEE VLSI Test Symposium, 2006. 6 pp. [10] Grecu, C.; Pande, P.; Baosheng Wang; Ivanov, A.; Saleh, R.; Methodologies and algorithms for testing switch-based NoC interconnects, 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, DFT, 2005, pp. 238 – 246. [11] Grecu, C.; Pande, P.; Ivanov, A.; Saleh, R.; BIST for network-on-chip interconnect infrastructures, 24th IEEE VLSI Test Symposium, VTS, 2006, 2006, 6 pp. [12] Raik, J.; Govind, V.; Ubar, R.; An External Test Approach for Network-on-a-Chip Switches, 15th Asian Test Symposium, 2006. ATS, 2006, pp. 437 – 442. [13] Pande, P.P.; Ganguly, A.; Feero, B.; Belzer, B.; Grecu, C.; Design of Low power & Reliable Networks on Chip through joint crosstalk avoidance and forward error correction coding, 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, DFT, 2006, pp. 466 – 476. [14] Angiolini, F.; Meloni, P.; Carta, S.; Benini, L.; Raffo, L.; Contrasting a NoC and a Traditional Interconnect Fabric with Layout Awareness, Design, Automation and Test in Europe, DATE, 2006, pp. 1 - 6 [15] Kim, J.; Nicopoulos, C.; Park, D.; Yousif, M.; Vijaykrishnan, N.; Das, C.; A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks, 33rd Annual International Symposium on Computer Architecture, ISCA, 2006, pp. 4-15. [16] Cuviello, M.; Dey, S.; Xiaoliang Bai; Yi Zhao; Fault modeling and simulation for crosstalk in system-on-chip interconnects, IEEE/ACM International Conference on Computer-Aided Design, ICCAD, 1999. pp. 297 – 303. [17] Benini, L.; De Micheli, G.; “Networks on Chips: A New SoC Paradigm, IEEE Comp. Vol.35 no.1, Jan. 2002. [18] Zeferino, C.A.; Kreutz, M.E.; Susin, A.A.; RASoC: a router soft-core for networks-on-chip, Design, Automation and Test in Europe Conference, DATE, 2004, pp.198 – 203. [19] Kautz, W., Testing for Faults in Wiring Networks, IEEE Transactions on Computers, Vol. C-23, No. 4, April 1974, 358-363. [20] Cheng, W.-T.; Lewandowski, J.L.; Wu, E.; Diagnosis for Wiring Networks, International Test Conference, ITC, 1990, 565-571. [21] Stroud, C., A Designer’s Guide to Built-In Self-Test, Frontiers in Electronic Testing, Volume 19, Springer, 2002. [22] Jutman, A.; At-speed on-chip diagnosis of board-level interconnect faults, Ninth IEEE European Test Symposium, ETS, 2004, pp. 2 – 7. [23] Bengtsson, T.; Jutman, A.; Kumar, S.; Ubar, R.; Peng, Z.; Off-Line Testing of Delay Faults in NoC Interconnects, 9th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools, DSD, 2006, pp. 677 – 680 [24] Holsmark, R.; Kumar, S.; Design issues and performance evaluation of mesh NoC with regions, NORCHIP Conference, 2005, pp. 40 – 43.
INTERNATIONAL TEST CONFERENCE
10