
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 11, NO. 1, JANUARY/FEBRUARY 2014

Fault-Tolerant Network Interfaces for Networks-on-Chip

Leandro Fiorin, Member, IEEE, and Mariagiovanna Sami, Member, IEEE

Abstract: As the complexity of designs increases and technology scales down into the deep-submicron domain, the probability of malfunctions and failures in the networks-on-chip (NoCs) components increases. In this work, we focus on the study and evaluation of techniques for increasing reliability and resilience of network interfaces (NIs) within NoC-based multiprocessor system-on-chip architectures. NIs act as interfaces between intellectual property cores and the communication infrastructure; the faulty behavior of one of them could affect, therefore, the overall system. In this work, we propose a functional fault model for the NI components by evaluating their susceptibility to faults. We present a two-level fault-tolerant solution that can be employed for mitigating the effects of both permanent and temporary faults in the NI. Experimental simulations show that with a limited overhead, we can obtain an NI reliability comparable to the one obtainable by implementing the system by using standard triple modular redundancy techniques, while saving up to 48 percent in area, as well as obtaining a significant energy reduction.

Index Terms: Networks-on-chip, network interface, fault tolerance, reliability, online fault detection, high-level error models

1 INTRODUCTION

As CMOS technology scales down into the deep-submicron domain, devices and interconnect of new complex designs are subject to new types of malfunctions and failures that are hard to predict and to avoid with the current design methodologies [1], [2]. This is particularly true for embedded systems, often composed of a high number of heterogeneous intellectual property (IP) cores (possibly offered by different vendors), and connected by means of networks-on-chip (NoCs). To deal with faults in such complex systems, new fault-tolerant approaches are needed: new methodologies and architectural solutions should be explored. Several fault-tolerant solutions have been proposed for NoCs, in particular addressing permanent and temporary faults in the links and in the router architecture. However, only a few works have addressed the fault tolerance of network interfaces (NIs). NIs are in charge of interfacing IP cores to the communication infrastructure and the overall system. They represent critical points in the design of a fault-tolerant NoC. In fact, faults in the NI can cause errors that, directly affecting the correct transmission of data and control information, could be extremely hard to detect and recover from without the appropriate support (leading, for instance, to deadlock or livelock conditions). Moreover, a faulty NI can isolate a working core (or cluster of cores) from the rest of the system, thus generating a massive and unwanted extension of the fault area. Particular attention should, therefore, be given to the design of the fault-tolerant provisions of the NI.

Usual fault-tolerant hardware implementations of sensitive components employ triple modular redundancy (TMR), in which three copies of the same component perform the same operation, and the single output result is obtained by a voting system [3]. A TMR implementation is, however, expensive in terms of the amount of resources and energy needed; in particular in the case of embedded systems, the strict design constraints make the extensive use of hardware redundancy not economically viable. This is particularly true for NIs, which often represent a significant part of the area of the overall communication subsystem [4].

While focusing on the study of the susceptibility of NIs to permanent faults, the goal of this work is to propose and evaluate architectural methodologies that could be applied to make NIs resistant to both permanent and temporary faults. In [5], we proposed and evaluated two-level architectural fault-tolerant solutions for implementing those NI components identified as most sensitive to faults, i.e., FIFOs or buffers, the lookup table (LUT), and the finite-state machines (FSMs) driving NI operations. The solutions require a limited amount of redundancy and yet are able to mitigate the effects of both permanent and temporary faults in the NI.

This paper extends and improves the work presented in [5] by presenting an analysis of the NoC susceptibility to permanent faults, as well as the proposal of a high-level functional fault model for the NI that is afterward employed for evaluating the tolerance of different NI architectures to permanent faults. Moreover, this work extends the fault-tolerant solution previously proposed by presenting and discussing online reconfiguration strategies for the fault-tolerant components, activated as a consequence of fault detection.

The remainder of this paper is organized as follows: Section 2 discusses related work. Section 3 presents the NI architecture taken as reference in this work. Moreover, it presents an extensive analysis of the NI fault susceptibility and discusses the error model considered in this work. Section 4 presents the proposed fault-tolerant architectural solutions for the NI components, while Section 5 discusses online error detection and runtime reconfiguration policies. Sections 6 and 7 describe the evaluation of the proposed NI architectures. Finally, Section 8 presents conclusions.

L. Fiorin is with ALaRI, Università della Svizzera italiana, via Buffi 13, Lugano 6900, Switzerland. E-mail: [email protected].
M. Sami is with the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, via Ponzio 34/5, Milano 20133, Italy. E-mail: [email protected].
Manuscript received 12 May 2012; revised 26 Dec. 2012; accepted 4 July 2013; published online 11 July 2013. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TDSC-2012-03-0050. Digital Object Identifier no. 10.1109/TDSC.2013.28. 1545-5971/14/$31.00 © 2014 IEEE. Published by the IEEE Computer Society.

2 RELATED WORK

The fault tolerance of NoC-based systems has been addressed by a significant amount of research effort. On the one hand, the correct communication of data and control data is studied: in [6], [7], [8], fault-tolerant solutions are proposed to mitigate transmission errors due, for instance, to crosstalk, electromagnetic radiation, or alpha particles. Discussed solutions are mainly based on the use of error detecting and correcting codes [7], [8], [9], and/or retransmission [6], [10]. On the other hand, architectural solutions have been studied for increasing fault tolerance in routers and the NoC, often exploiting the intrinsic redundancy of NoC paths for providing alternatives to faulty links or faulty components in routers [11], [12], [13]. However, these solutions make the assumption that the information inserted in packet headers is correct. Without a careful protection of NI operations and of the information stored in the NI, this assumption cannot be guaranteed, thus rendering all the previous protection mechanisms ineffective, since the information is in fact corrupted before entering the NoC.

In this work, we address the fault tolerance of NIs. Previous work focused on the definition of a functional fault model notation [14], [15], or on providing support for error detection in links [6], [7]. In [16], multiple NIs connect a core to more than one router, improving the fault tolerance of the connections between NIs and routers. However, as demonstrated in Section 3, the NoC can still suffer from errors in the communication due to faulty behaviors of NI components. In general, previous work on NIs can be considered complementary to the solution presented in this paper. With respect to it, we do not only address permanent faults in the link connecting the core to the NI; we propose a solution able to deal with both permanent and temporary faults in all the main architectural elements of the NI.

In our work, we address tiled architectures (such as, for instance, the one in [17]), in which the link between core and NI can be considered as part of the node circuits and signals, and treated accordingly with standard fault-tolerant techniques. For this type of NI architecture, a careful analysis of the NI's fault tolerance has not been performed up to now; often, a "collapsing" of NI and core is adopted as far as faults are concerned. This work provides an evaluation of possible architectural techniques to be used for increasing the fault-tolerant characteristics of the NI's main components, and, therefore, of the overall NoC.

3 THE NI: REFERENCE ARCHITECTURE AND FAULT MODEL

This section investigates NI susceptibility to permanent faults and proposes a fault model for the NI: as we prove later on, faults in NIs become increasingly critical—with respect to router faults—when the number of nodes in the NoC increases.

Fig. 1. Overview of the reference NI architecture considered in the experiments.

3.1 Overview of the Baseline NI Architecture

A fault-free network interface assumes a particular relevance in the design of a reliable multiprocessor system-on-chip (MPSoC). An NI includes a front-end and a back-end submodule [18]. The front-end module implements the communication protocol adopted by the core. The back-end module is in charge of implementing basic communication services, such as data packetization, and routing and control flow-related functions. Moreover, additional services, such as link error detection and error recovery strategies, transaction ordering, support for cache coherence, and security, can also be implemented [18]. In this paper, we focus on an NI providing basic communication services. Presented fault models and fault-tolerant techniques can be, however, extended to include additional services. Fig. 1 shows the basic functional blocks of the NI taken as reference in our evaluations. Several alternatives are available for the implementation of the basic services provided by the NI, and in particular for the packetization phase [18]. In this work, we consider the NI as an independent hardware block located between the core and the communication infrastructure. The main NI components considered in our design are:

- an Open Core Protocol (OCP) [19] adapter, which implements the OCP protocol. At cores acting as initiators, the adapter implements a Slave interface, while, at target cores, it implements a Master interface;
- the NI kernel, which receives and transmits data and control information from/to the adapter, packetizes and de-packetizes messages, schedules and inserts packets in the output FIFO buffer, retrieves them from the input FIFO buffer, and implements the control flow mechanism; and
- the output FIFO buffer, which stores packets ready to be inserted into the NoC, and the input FIFO buffer, which stores incoming packets.

Fig. 2. Overview of the reference router architecture considered in the experiments.

The NI implements a transaction-based communication on a shared-memory abstraction, in which elements on the NoC are memory-mapped [18]. The NoC implements a wormhole flow control and a source-based routing policy [20], [21]. When a new transaction is requested by the processing element, the NI looks up the memory-mapped address of the OCP transaction by employing a programmable LUT, located in the NI kernel [21]. The LUT returns a sequence of bits that codes the path used by the packets to reach the destination node in the NoC. This routing information is inserted into the packet header, and its length depends on the dimension of the NoC and on its topology. At each router encountered along the path, a few bits of the sequence are employed for requesting the desired output port. After the use of the output port is granted, such bits are discarded and the header of the packet is updated [20]. The LUT is programmable in the sense that stored information can be rewritten to support runtime modifications of the routing paths, caused, for instance, by the presence of faulty links in the NoC.
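The per-hop consumption of routing bits described above can be sketched as follows. This is only an illustration: the 3-bit port field, the port names, and the choice of placing the first hop in the least significant bits are assumptions made for the example and are not taken from the reference design.

```python
PORT_BITS = 3  # assumption: 3 bits are enough to select one of five router ports
PORTS = ["local", "north", "east", "south", "west"]  # hypothetical port encoding


def encode_path(hops):
    """Pack a list of output ports into one integer, first hop in the low bits."""
    path = 0
    for i, port in enumerate(hops):
        path |= PORTS.index(port) << (i * PORT_BITS)
    return path


def route_step(path):
    """One router: use the low bits to pick the output port, then discard them."""
    port = PORTS[path & ((1 << PORT_BITS) - 1)]
    return port, path >> PORT_BITS


def traverse(path, hop_count):
    """Follow the header through hop_count routers and list the ports taken."""
    taken = []
    for _ in range(hop_count):
        port, path = route_step(path)
        taken.append(port)
    return taken


hops = ["east", "east", "north", "local"]
header = encode_path(hops)
print(traverse(header, len(hops)))
```

A corrupted bit in the value returned by the LUT changes one of these port fields, which is why a LUT fault shows up downstream as a misrouted packet.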

3.2 NI Susceptibility and Fault Models

To understand the susceptibility of the NI and the NoC, we carried out some fault injection experiments on a VHDL model. As model for the NI, we consider the baseline NI architecture shown in Fig. 1. As for the router, we implemented an input buffered architecture supporting source-based routing [18]. A high-level view of the architecture of a five-port router is shown in Fig. 2. In the experiments, we considered a tile-based NoC in a 5 × 5 mesh topology. The NoC implements a wormhole flow control and a deterministic deadlock-free source-based routing, in which the header with the routing information is contained in the first flit of the packet. We considered a 35-bit data path (32 bits for data and 3 bits for control information) and a depth of 4 and 2 for the router input and output buffer, respectively, and of 8 for the NI input and output buffers. The buffer sizes were chosen according to common practice in NoC design, in which NI buffers are bigger than router buffers, and big enough to obtain predictable performance, for instance, in real-time systems [22], [23]. Both network interfaces and routers were implemented in VHDL and synthesized with Synopsys Design Compiler, by employing the Nangate 45-nm CSS typical open cell technology library. In our synthesis, we targeted a frequency of 500 MHz. In the case of a 5 × 5 mesh topology, the area obtained for the NI and the router is 0.0096 mm² and 0.0097 mm², respectively.

3.2.1 Simulation Infrastructure for Permanent Faults

To evaluate the impact of silicon defects in NIs and routers and characterize the high-level errors caused by the faults, we employed a simulation infrastructure similar to the one presented in [24]. As shown in Fig. 3, a simulator simulates in parallel two copies of the gate-level description of the design. While one of the two copies is kept defect-free, the second one is subject to fault injection. Defects are injected at time 0 and uniformly distributed in space in the design. We consider stuck-at-0 and stuck-at-1 faults, injected in each node of the synthesized netlist by employing Synopsys Tetramax. The input stimuli consist of a full coverage test that activates each internal circuit node of the system. We perform the simulation for each fault by considering one fault at a time. The output of the simulation of the defect-exposed model is compared against the output of the defect-free model, and we identify the high-level error caused by the injection of that specific permanent fault into the system.

Fig. 3. Simulation infrastructure employed for high-level error characterization of routers and NIs.

By employing this methodology, we are able to identify the components of NIs and routers that are most sensitive to permanent faults, the high-level error generated by each fault, as well as the percentage of occurrence of a specific high-level error at the output of the component, which is calculated by taking into account the area of the cells hit by the faults generating that specific error. This characterization is employed in Section 7 for building the high-level models used for evaluating the resistance of the system to the faults. In the case of a 5 × 5 mesh topology, the number of faults injected by the evaluation system into the NIs is 763,800, while the number of faults injected into the routers is 825,300. Table 1 reports the percentage of measured faults for each component of the NI and the router, calculated with respect to the total number of faults injected into the node.
In both cases, Other components include glue logic and some additional registers needed for the controlling system operations. In the case of a 5 × 5 mesh topology, the number of faults concerning NIs and routers is approximately similar: faults affecting routers represent around 52 percent of the total faults in the NoC, while the faults affecting NIs represent about 48 percent.

TABLE 1. Percentage of Measured Faults for Each Component of the NI and the Router, Calculated with Respect to the Total Number of Faults Injected into the Node, in the Case of a 5 × 5 Mesh

By analyzing the output of the defect-exposed model, we identified the following types of functional errors for the NI:

1. Corrupt data error. Data are corrupted during the operations of the NI, and wrong data are sent through the communication channel. This type of error can happen due to faults in the protocol adapters and in the FIFOs.
2. Corrupt protocol conversion error. At the initiator side, faults in the NI can lead to the corruption of the control signals received from the core, causing the NI kernel to generate wrong routing and control information for the packet header. At the target side, faults affecting the protocol conversion will cause wrong implementations of the core communication protocol, invalidating or disrupting the performed operation. This type of error is due to faults in the NI protocol adapters.
3. Routing path error. The routing path inserted in the packet header is calculated by looking up the address of the requested operation. Faults in the lookup table cause erroneous routing and control information to be inserted in the packet header, leading to possible communication errors such as misdirection, deadlock, or livelock. Faults in the LUT and FIFOs can cause this type of error.
4. Control flow error. Faults in registers storing control information in FIFOs and protocol adapters cause errors in the control flow of the FIFOs, by communicating corrupted information about the flits in the buffers. For instance, multiple copies of an outgoing or incoming packet could be sent to the input or output ports.

Table 2 shows errors generated in the NI by the injected faults, according to the just described high-level error model. The table shows the measured number of errors and the related percentage with respect to the total errors generated in the node. In the case of a running system, the error distribution would depend on the working condition (e.g., the packet injection rate and the traffic patterns) and on the effect of the faulty components on the packets and NI operations. As the table shows, a significant percentage of the faults (23.64 percent) will affect the LUT, leading to the corruption of the routing information stored into it and potentially to a significant number of runtime routing errors due to those packets generated with that corrupted information. The second most significant type of errors (21.78 percent) is due to faults affecting the buffers of the NI and, therefore, the information temporarily stored in it (data flits, header flits, and control information). Faults in FSMs mainly affect logic related to the data transfer during the protocol translation.

TABLE 2. Percentage of Errors (with Respect to Total NoC Errors) Measured in the NI during the Fault Injection Campaign

Table 3 shows the high-level errors generated in the routers. In the case of the router, we adapt the system-level fault models defined in [25] and [7] to our implementation. The following types of errors can be identified:

1. Corrupt data error. Transported data are corrupted during their passage through the router.
2. Routing path error. Due to corrupted routing information, packets are routed to directions different from those originally specified by the routing information in the header.
3. Switching error. Packets are sent to wrong output ports, or duplicated.
4. Control flow error. The control flow of FIFOs is corrupted, due to faults in registers storing control information in the router.

TABLE 3. Percentage of Errors (with Respect to Total NoC Errors) Measured in the Router during the Fault Injection Campaign

Errors in routers are mainly caused by faults hitting input and output buffers (32.98 percent), and therefore causing corruption of data, routing, and control information in packets. The second source of errors is the crossbar (10.59 percent), causing in particular corruption of data (8.86 percent). Faults in the switch allocator cause mainly errors in the switching of the router, potentially creating problems such as the missed delivery of flits and the wrong output port selection.

Permanent fault and error distribution changes when varying the number of nodes in the NoC. Fig. 4 shows, for the case of n × n mesh topologies, how faults affect NIs and routers when varying the value of n. As Fig. 4 shows, NIs rapidly become the most sensitive elements in the NoC, in particular when its dimension increases. The increasing influence of the NI on the number of total faults and errors is due to the fact that, while the total area of the routers is approximately directly proportional to the number of nodes n in the topology, the total area of the NIs (and in particular of their LUTs) increases with n². As a result, NIs have a higher probability of being hit by faults when the number of nodes increases, and, consequently, of leading to errors in the system. These results call for the design of appropriate fault-tolerant solutions for the NI's components.
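The fault-injection flow of Section 3.2.1 (a defect-free copy and a defect-exposed copy driven by the same stimuli, one stuck-at fault at a time) can be illustrated on a toy circuit. The three-gate netlist below is a hypothetical stand-in chosen for brevity; it is not the synthesized NI or router netlist, and the exhaustive stimuli merely play the role of the full coverage test.

```python
from itertools import product

# Hypothetical toy netlist: node -> (gate type, input nodes). Not the paper's design.
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("XOR", ("b", "c")),
    "out": ("OR", ("n1", "n2")),
}
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}
INPUTS = ("a", "b", "c")


def simulate(stimulus, fault=None):
    """Evaluate the netlist; fault is (node, stuck_value), or None for the golden copy."""
    values = dict(zip(INPUTS, stimulus))
    if fault and fault[0] in values:
        values[fault[0]] = fault[1]  # stuck-at fault on a primary input
    for node, (gate, ins) in NETLIST.items():  # insertion order is topological here
        values[node] = GATES[gate](*(values[i] for i in ins))
        if fault and fault[0] == node:
            values[node] = fault[1]  # stuck-at fault on an internal node
    return values["out"]


def campaign():
    """Inject every stuck-at fault alone and record whether any stimulus exposes it."""
    results = {}
    for node in list(INPUTS) + list(NETLIST):
        for stuck in (0, 1):
            results[(node, stuck)] = any(
                simulate(s) != simulate(s, (node, stuck))
                for s in product((0, 1), repeat=len(INPUTS))
            )
    return results


results = campaign()
print(sum(results.values()), "of", len(results), "faults change the output")
```

In the paper's infrastructure the mismatch is additionally classified into a high-level error type and weighted by cell area; the sketch stops at the golden-versus-faulty comparison.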


Fig. 4. Total faults in NIs and routers when varying the number of nodes in the NoC.

4 PROPOSED NI FAULT-TOLERANT APPROACHES

The goal of this work is to improve NI resilience by increasing the fault-tolerant capabilities of its basic blocks. As shown in Section 3, in our fault model, errors are mainly due to faults concerning the LUT, FIFOs, and FSMs. We, therefore, focus on these building blocks. These NI components are mainly composed of memory cells (SRAM or flip-flops) and they are particularly affected by both permanent and temporary faults [26]. Solutions presented here for implementing these basic components are based on the use of error correcting and detecting codes [3], in combination with limited redundancy, and a limited use of TMR. Additional logic and components of the NI, which represent a small but not negligible part of the overall area of the NI architecture (in our implementations, about 6 percent for an NI of a 25-node NoC), are implemented by using TMR.
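As a point of reference for the TMR-protected parts mentioned above, the following minimal sketch shows a bitwise two-out-of-three voter over three replicas. The function names and the way a fault is forced into one replica are illustrative assumptions, not part of the authors' design.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr(component, x, faulty_copy=None, stuck_mask=0):
    """Run three replicas of component; optionally force bits to 1 in one replica."""
    outputs = [component(x) for _ in range(3)]
    if faulty_copy is not None:
        outputs[faulty_copy] |= stuck_mask  # model stuck-at-1 bits in a single copy
    return majority_vote(*outputs)


increment = lambda v: v + 1  # stand-in for the protected logic
print(tmr(increment, 4))                                   # fault-free run
print(tmr(increment, 4, faulty_copy=1, stuck_mask=0b1000))  # one corrupted replica
```

The voter masks any fault confined to a single replica, at the price of tripling the protected logic, which is the cost the proposed architectures try to avoid for the larger storage elements.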

4.1 Lookup Table

In a source-based implementation of the NI, a LUT is employed for retrieving the routing path associated with the address specified in a transaction [21]. As previously shown in the fault injection campaign, faults in the information stored into the LUT generate mainly routing path errors. As baseline architecture, we consider a LUT implemented as a combination of a nonprogrammable content-addressable memory (CAM) [27] and either a RAM or a set of registers (labeled Configurable LUT in Fig. 5). Without loss of generality, in this work, we refer to a register-based implementation. The CAM contains the hard-coded address boundaries of the memory-mapped IP cores of the NoC. When initiating a new transaction, the most significant bits of the operation address are compared with the values coded into the lines of the CAM. The position of the CAM line matching the input address is used to select the register in which the output of the lookup operation is stored, i.e., the routing path to reach the destination node mapped to the input address. Routing path information is stored into the LUT registers at boot time or after topology reconfigurations.

Fig. 5 shows the architecture proposed for increasing the fault tolerance of the LUT. For the sake of clarity, in Fig. 5, we only show the architectural elements related to the lookup operation.

Fig. 5. Overview of the proposed LUT architecture.

We implemented a two-level approach which employs error correcting and detecting codes and a limited amount of architectural redundancy, allowing us to deal with both temporary and permanent faults in the LUT. Path information is stored by using a single error correcting and double error detecting (SECDED) Hsiao code [28] that is able to correct up to one error and detect up to two errors in each LUT register. A Hsiao encoder encodes the information when writing the register, while a decoder decodes it after lookup. The Hsiao code was chosen because it allows a uniform distribution of the XORs in the implementation of the encoder and the decoder, therefore reducing the number of logic levels and the overall delay of the modules [28]. The error-correcting code (ECC) corrects single-bit errors, be they due to transient or permanent faults. However, an error caused by a permanent fault will recur every time the bad cell of the register is used, and, as faults accumulate, the device eventually becomes unusable.

To provide architectural redundancy to the LUT, we included in the design a certain number of spare registers that are meant to substitute LUT registers in which the number of faults is higher than one, and that can, therefore, no longer be employed to store the routing information correctly. These spare registers are of critical importance because a defective LUT register will cause an entire core not to be reachable from that NI. A bit in a status register specifies whether a specific register of the LUT is working or faulty by selecting either the regular register or the spare register. Spare registers are simply addressed through the least significant bits (LSBs) of the addressing signals. We implemented the status register, as well as all the control logic, by using TMR. Spare registers can be employed for substituting faulty LUT registers both during the postmanufacturing testing phase and at runtime for online faults due to the wear-out of the device.

The use of row/column substitution is a well-known technique extensively used in industry for dealing with permanent faults in RAM memories [29]. Similarly, ECC is employed for error correction and detection in large memories. However, for relatively small storage elements such as the LUT of the NI, the use of these techniques must be carefully designed and adapted to a constrained environment to avoid the large overhead associated with them. Moreover, line substitution is mainly used for replacing elements found faulty during offline testing, while in our case, as explained in detail in Section 5, we address substitution at runtime.
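The SECDED behavior relied upon for the LUT registers (correct one error, detect two) can be illustrated with a small code. The sketch below uses a plain extended Hamming (8,4) code as a stand-in: it is not the Hsiao construction of [28], and the 4-bit payload and bit layout are assumptions chosen to keep the example short.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into 8 bits: w[0] is overall parity, w[1..7] Hamming positions."""
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8
    w[3], w[5], w[6], w[7] = d
    w[1] = w[3] ^ w[5] ^ w[7]  # parity over positions with bit 0 set
    w[2] = w[3] ^ w[6] ^ w[7]  # parity over positions with bit 1 set
    w[4] = w[5] ^ w[6] ^ w[7]  # parity over positions with bit 2 set
    w[0] = sum(w[1:]) % 2      # overall parity for double-error detection
    return w


def decode(word):
    """Return (data, status), status in {'ok', 'corrected', 'double_error'}."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos    # XOR of set positions is 0 for a valid codeword
    overall = sum(w) % 2       # 0 when the total parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        w[syndrome] ^= 1       # single error; syndrome 0 means w[0] itself flipped
        status = "corrected"
    else:
        status = "double_error"  # data returned below is not trustworthy
    data = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return data, status


codeword = encode(0b1011)
codeword[5] ^= 1                # one flipped cell, e.g. a permanent stuck-at fault
print(decode(codeword))
```

A register with one permanently faulty cell keeps returning "corrected" on every read, while a second fault in the same register moves it to "double_error", which is the condition that calls for a spare register.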


Fig. 6. Overview of the proposed FIFO architecture.

The presented LUT architecture allows the NI to be protected from routing path errors by detecting and correcting the degradation of the routing information, both in the case of permanent and temporary faults. Routing path errors are also avoided by protecting the control registers of the LUT by implementing them in TMR, reducing the probability of selecting the wrong routing path register associated with an input destination address.
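The lookup path with spare registers described in this section can be sketched behaviorally as follows. The class and method names, the use of ascending upper address bounds as CAM content, and the modulo mapping from line index to spare (one reading of "addressed through the LSBs") are assumptions made for the illustration; the ECC on each register is omitted here.

```python
class SpareLUT:
    """Behavioral sketch of a CAM-indexed LUT with spare registers (hypothetical model)."""

    def __init__(self, upper_bounds, num_spares):
        self.cam = list(upper_bounds)            # hard-coded address boundaries
        self.regs = [None] * len(self.cam)       # regular routing-path registers
        self.spares = [None] * num_spares
        self.spare_owner = [None] * num_spares
        self.use_spare = [False] * len(self.cam)  # status bits (TMR-protected in the paper)

    def _line(self, address):
        for i, upper in enumerate(self.cam):
            if address < upper:
                return i
        raise KeyError(hex(address))

    def _spare(self, i):
        return i % len(self.spares)  # assumption: LSBs of the line index pick the spare

    def write(self, i, path):
        if self.use_spare[i]:
            self.spares[self._spare(i)] = path
        else:
            self.regs[i] = path

    def lookup(self, address):
        i = self._line(address)
        return self.spares[self._spare(i)] if self.use_spare[i] else self.regs[i]

    def retire(self, i, path):
        """Switch line i to its spare; return False if that spare is already in use."""
        s = self._spare(i)
        if self.spare_owner[s] not in (None, i):
            return False
        self.spare_owner[s] = i
        self.use_spare[i] = True
        self.spares[s] = path
        return True


lut = SpareLUT([0x1000, 0x2000, 0x3000], num_spares=2)
for line, path in enumerate([5, 9, 7]):
    lut.write(line, path)
lut.retire(1, 9)
print(lut.lookup(0x1800))
```

When `retire` returns False, no spare is left for that line, which corresponds to the case in which the destination can no longer be reached from this NI.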

4.2 FIFOs

FIFOs in NIs are used for decoupling the computation performed in IP cores from the communication operations, and for allowing a separate implementation and optimization of the two system components [18]. Different FIFO implementations have been proposed in the literature [30], [13]. Our baseline architecture is a register-based synchronous FIFO circular buffer, but the same considerations hold also for FIFOs implemented using dual-port RAMs. We consider a FIFO whose data path is equal to the flit dimension (data and control signals). As shown in Fig. 6, in addition to the storage elements, logic is needed for managing the pointer to the element to be extracted (read pointer), the pointer to the first available position in the FIFO (write pointer), and for implementing control signals notifying whether the FIFO is full (Full) and whether at least one element is present in the FIFO (Exists). The read and write pointers are implemented as counters which are updated depending on the write or read operation performed on the FIFOs.

The implementation of fault-tolerant FIFOs has been recently addressed by related work on NoCs. In [30], a reconfigurable buffer is proposed, which can borrow elements from FIFOs in the neighboring router ports, at the cost of increased wiring complexity. The solution addresses, however, only permanent faults in the FIFO slots without discussing methods for detecting them online. A fault-tolerant FIFO has been proposed in [13], which employs solutions similar to the one discussed later in this section and previously presented in the conference version of the work [5]. While both solutions rely on using error correcting and detecting codes and on exploiting the intrinsic redundancy of the FIFO slots, our solution, as explained later in detail, is more suitable for runtime fault detection and reconfiguration of the component.

Fig. 6 shows the architecture of our fault-tolerant FIFO. Similarly to the solution discussed for the LUT, the presented architecture employs a two-level approach. Information in the FIFO is encoded, one flit at a time, by using a SECDED Hsiao code. To deal with permanent faults in the component, the FIFO exploits the intrinsic redundancy of its slots. By using this approach, we are able to provide also for this component a graceful degradation of the performance during its operations. As shown in Fig. 6, an offset register is associated with each slot of the FIFO. It stores the offset to be added when calculating the next value of the read and write pointers. In the case of a nonfaulty FIFO, all offsets are set to 0 (the next working slot is the one immediately following). In the case of a FIFO with faulty slots, the offset register stores the values to be added in the calculation of the next pointer values for skipping the faulty slots. The number of bits needed for encoding values stored in the offset register varies according to the number of adjacent slots that can be faulty at the same time. The offset register, as well as the logic for generating the read and write pointers, and the control signals are implemented in TMR.

Fig. 7 shows the integration of the proposed FIFO architecture in an NoC implementing error correction and detection in the links connecting NoC components [6]. With respect to the reference architecture [6], information is encoded before being inserted into the FIFO and checked for errors when extracted from it. If no errors are detected, the coded information is directly sent through the link, bypassing in this way the encoding step performed in the reference architecture before the transmission. Similarly, in the case of a single error, the coded information is corrected and directly transmitted. At the receiving NI, flits are checked for errors, corrected, and copied encoded in the receiving FIFO. When extracted from the FIFO, the decoder decodes the flits and provides them to the NI and the core. The proposed architecture protects the NI from the corruption of the information stored into the FIFO. Functional runtime errors such as corrupt data errors, routing path errors, and control flow errors, due to faults in the FIFO slots storing, respectively, body flits, header flits, and control flow information, are avoided or significantly reduced.
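The offset-register mechanism for skipping faulty slots can be sketched as follows. The sketch assumes that the next pointer is computed as pointer + 1 + offset (so an offset of 0 selects the immediately following slot, as stated above) and, for simplicity, that slots are retired only while the FIFO is empty; both the class name and this restriction are assumptions of the example, and the SECDED coding of flits is left out.

```python
class OffsetFIFO:
    """Circular FIFO whose pointers skip faulty slots through per-slot offsets (sketch)."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = [None] * depth
        self.offset = [0] * depth   # one offset register per slot (TMR in the paper)
        self.faulty = set()
        self.rd = self.wr = 0
        self.count = 0

    def _next(self, ptr):
        return (ptr + 1 + self.offset[ptr]) % self.depth

    def capacity(self):
        return self.depth - len(self.faulty)

    def mark_faulty(self, slot):
        """Retire a slot and recompute offsets; this sketch requires an empty FIFO."""
        if self.count != 0:
            raise RuntimeError("sketch only reconfigures an empty FIFO")
        if len(self.faulty | {slot}) >= self.depth:
            raise ValueError("no working slot would remain")
        self.faulty.add(slot)
        for s in range(self.depth):
            k = 0
            while (s + 1 + k) % self.depth in self.faulty:
                k += 1              # count adjacent faulty slots following s
            self.offset[s] = k
        while self.rd in self.faulty:
            self.rd = (self.rd + 1) % self.depth
        self.wr = self.rd

    def push(self, flit):
        if self.count == self.capacity():
            return False            # Full
        self.slots[self.wr] = flit
        self.wr = self._next(self.wr)
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None             # Exists is deasserted
        flit = self.slots[self.rd]
        self.rd = self._next(self.rd)
        self.count -= 1
        return flit


fifo = OffsetFIFO(4)
fifo.mark_faulty(1)
fifo.mark_faulty(2)
print(fifo.offset, fifo.capacity())
```

The FIFO keeps working with a reduced capacity, which is the graceful degradation referred to in the text.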

Fig. 7. Integration of the proposed FIFO architecture within an NoC link implementing error correction and detection.


4.3 FSM Protocol adaptation and NI kernel operations are controlled by FSMs. The current state of the FSM is stored in a state register. A bit flip due to a permanent or temporary fault in one of the flip-flops of the state register of the FSM may generate unexpected results, or, even worse, bring the system to an undefined state or to a crash. Different techniques, based on some level of either hardware or information redundancy, have been proposed for reducing the sensitivity to faults of FSMs [31]. Due to the limited number of states needed both for the OCP protocol implementation and the NI kernel operations [19], FSMs in our baseline implementation are relatively small Moore state machines. In our study, we adopt the SECDED Hsiao code for storing the state information of the FSM and compare this technique with the baseline and a TMR implementation (in Sections 6 and 7). In the discussed architecture, after calculating the next state of the FSM, the information is passed to a Hsiao encoder and stored in the state register. A decoder is used when retrieving the information about the state in the following clock cycle. Single errors in the state register are directly corrected by the decoder. If a double error is detected, the FSM goes to a reset state (to avoid indeterminate system states) and a warning signal is raised to notify the NI and the core. A fault-tolerant implementation of the FSM allows the system to be protected from corrupt protocol conversion errors.
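As an illustration of how a SECDED Hsiao code can protect a small state register, the following Python sketch encodes a 3-bit state into a 7-bit word and decodes it. The specific parity-check columns, the bit layout, and the choice of state 0 as the reset state are assumptions made for this example; the paper does not specify them. The sketch relies only on the defining property of Hsiao codes [28]: all columns of the parity-check matrix are distinct and have odd weight, so a single error yields an odd-weight syndrome and a double error a nonzero even-weight one.

```python
# Columns of H = [P | I4] as 4-bit integers; all distinct, all of odd weight.
DATA_COLS = (0b0111, 0b1011, 0b1101)           # weight-3 columns, 3 state bits
CHECK_COLS = (0b0001, 0b0010, 0b0100, 0b1000)  # weight-1 columns, 4 check bits
ALL_COLS = DATA_COLS + CHECK_COLS              # index = bit position in the word
RESET_STATE = 0                                # assumed encoding of the reset state


def syndrome(word):
    """XOR the H columns of every bit set in the 7-bit word."""
    s = 0
    for pos, col in enumerate(ALL_COLS):
        if (word >> pos) & 1:
            s ^= col
    return s


def encode(state):
    """Append 4 check bits to a 3-bit state (bits 0-2 data, bits 3-6 checks)."""
    return state | (syndrome(state) << 3)


def decode(word):
    """Return (state, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    s = syndrome(word)
    if s == 0:
        return word & 0b111, "ok"
    if bin(s).count("1") % 2 == 1 and s in ALL_COLS:
        word ^= 1 << ALL_COLS.index(s)  # flip the single faulty bit back
        return word & 0b111, "corrected"
    # Even-weight syndrome (double error) or an unused odd-weight pattern:
    # fall back to the reset state; the caller should raise a warning.
    return RESET_STATE, "uncorrectable"


print(decode(encode(5)))             # (5, 'ok')
print(decode(encode(5) ^ 0b0000100)) # (5, 'corrected')
print(decode(encode(5) ^ 0b0000110)) # (0, 'uncorrectable')
```

The last case mirrors the behavior described above: on a detected double error the FSM moves to a reset state rather than continuing from an indeterminate one.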

5

ERROR DETECTION AND RECONFIGURATION POLICIES

We employ the error-detecting characteristics of the Hsiao code for detecting permanent faults in the modules and for activating online reconfiguration by substituting at runtime the faulty elements with spare working ones.

5.1 Lookup Table Fig. 8 presents the procedure applied when detecting errors in the LUT. In the flow presented in Fig. 8, the generic register i of the LUT (LUT(i) in the figure) stores the routing information for reaching the node associated with the register. The generic register i is considered working if no permanent faults have been detected in it. It is considered partially working when one permanent fault has been detected in it: the register can still be used for providing corrected routing information to the packet, thanks to the error-correcting capabilities of the Hsiao decoder. If a double error is detected in register i and no working spare registers are available, the whole LUT is considered to have failed, since the NI is no longer able to provide correct routing information for packets directed to the NoC node associated with the doubly erroneous register. Therefore, the NI should be put offline, or the node memory-mapped to the address associated with the failed register should no longer be addressed by the NI communications. As shown in Fig. 8, if a single error is detected during the lookup of a LUT register considered as working, the system reacts by performing a check to determine whether the error was caused by a permanent fault. The check consists in copying the information read from the LUT’s register i, corrected by the SECDED decoder, into the same LUT register. After this operation, if the register still presents a single error, the fault is considered permanent, and the LUT register partially working. The detection of a double error in a partially working register requires recalculation of the routing path associated with it. The recalculation is performed by running a software routine that applies the rules of the routing algorithm implemented in the NoC for finding the path to the destination. Without loss of generality, we can assume this routine to be implemented in a fault-tolerant NoC, being in fact needed for dealing with permanent faults in links and router components. After the recalculation, the information is copied into the register and checked again for errors. If working spare elements are still available, the LUT is reconfigured by enabling the use of the associated spare register, and is still considered working. Otherwise, it is considered nonworking.

Fig. 8. Diagram describing the reconfiguration policy applied when detecting errors in the LUTs.
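The LUT policy can be summarized as a small state-transition function. The Python sketch below is one possible reading of the prose description, not a transcription of Fig. 8: the names `Status` and `lut_policy`, the `persists` flag (true when the error is still observed after the register has been rewritten), and the treatment of a double error in a working register in the same way as in a partially working one are all assumptions.

```python
from enum import Enum


class Status(Enum):
    WORKING = "working"
    PARTIAL = "partially working"
    FAILED = "nonworking"


def lut_policy(status, error, persists, spares_left):
    """Return (new_status, spares_left) for one LUT register.

    error:    'single' or 'double', as reported by the SECDED decoder
    persists: True if the error is still present after the register has been
              rewritten (with the corrected value for a single error, with
              the recalculated routing path for a double error)
    """
    if not persists:
        return status, spares_left              # temporary fault: no reconfiguration
    if error == "single":
        if status is Status.WORKING:
            return Status.PARTIAL, spares_left  # one permanent fault, still usable
        return status, spares_left
    # Permanent double error: the decoder can no longer correct the entry.
    if spares_left > 0:
        return Status.WORKING, spares_left - 1  # switch to a spare register
    return Status.FAILED, spares_left


print(lut_policy(Status.WORKING, "single", True, 2))
print(lut_policy(Status.PARTIAL, "double", True, 0))
```

In the paper such a routine runs in software on the tile processor; the sketch only captures the decision logic, not the register accesses.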

5.2 FIFOs Fig. 9 shows the reconfiguration policy implemented for the FIFOs. A partially working FIFO slot i is one whose replacement slots all have at least one permanent fault but can still be used thanks to the correcting capabilities of the SECDED code. The FIFO is nonworking when, for at least one slot, it is not possible to skip all the adjacent faulty slots. In the case of a working FIFO slot, at the detection of a single error by the FIFO output decoder, the index of the faulty slot is recorded. When the same FIFO slot is read again, it will contain a new flit. If the decoder again finds an error, the fault is considered permanent. The offset register is then updated so that the following FIFO operations skip the slot containing the permanent fault. When a fault is detected in the last fault-free replacement slot for slot i, the offset is restored to the initial configuration, so that the one-fault slots are no longer skipped. The FIFO slot is then considered partially working. In this case, at the detection of a double error, the slot is checked for a permanent double fault by flushing the FIFO and asking for the packets stored in the FIFO to be reissued. If the double fault is confirmed, and if no replacement slots with (at most) one fault are available, the FIFO is considered nonworking. If replacement slots are available, the offset register is updated to skip the doubly faulty slot, the FIFO is flushed again, and the message is reissued. If the NoC does not support the reissuing of packets, the above policy can be modified to move to the nonworking FIFO status when the first double error is detected. Policies similar to those presented for the LUT and the FIFO could also apply to FSMs, for which the detection of a single or double error activates a check on the correctness of the information stored in the status register in the following FSM states, to determine whether the fault is permanent or transitory.

Fig. 9. Diagram describing the conservative reconfiguration policies applied when detecting errors in the FIFOs.

5.3 Implementation of the Policies The reconfiguration policies are implemented as software routines running on the processor of the NoC tile. At the detection of an error, an interrupt request is generated by the NI. Additional signals specify the location of the error (LUT, FIFOs, FSM, additional TMR components) and its type (single, double). When the interrupt is processed, the processor calls the interrupt handler, which reads the error information and applies the reconfiguration policy for the component that generated the interrupt request, reading and writing values in the component registers and updating the offset and status registers. The NI interface to the core was extended to support the programming of the configurable elements of the components, such as the offset register of the FIFOs and the status register of the LUT.

6

IMPLEMENTATION RESULTS

In this section, we evaluate the architectural solutions presented in the previous sections. In our evaluation, the baseline architecture is the reference architecture that does not implement any fault-tolerant strategy. TMR architectures are implemented by employing TMR techniques throughout. A SECDED architecture employs the Hsiao code for detecting and correcting errors, without implementing any architectural redundancy. We call FT architectures those employing the solutions previously described. Without loss of generality, in our experiments, we refer to an NoC with a square-mesh topology, and we employ routers with up to five output ports. We assume full connectivity between the cores, i.e., each core is able to communicate with all the other cores. We target an NoC architecture implementing source-based routing. Each router encountered along the path to the destination node employs 3 bits of the routing path for selecting the desired output port [21]. For the considered NoC topology, we imposed the condition that the farthest reachable node is at $2\sqrt{n} - 2$ hops, where $n$ is the number of nodes of the NoC (with this assumption, we are able to cross a $\sqrt{n} \times \sqrt{n}$ mesh from one corner to the opposite one by using, for instance, an XY routing algorithm). The number of routing bits needed for encoding the path is, therefore, equal to $3(2\sqrt{n} - 1)$, i.e., $3(2\sqrt{n} - 2)$ bits for crossing the inter-router links, and 3 bits for addressing the destination NI in the last encountered router. We consider a 35-bit data path (32 bits for the data of the flit, and 3 bits for control signals). We implemented components and NI in VHDL and synthesized them by using Synopsys Design Compiler. We targeted the Nangate 45-nm CSS typical open cell technology library. Results shown were obtained by targeting the synthesis to a clock frequency of 500 MHz. Energy estimation was obtained by using Synopsys Power Compiler.
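The routing-path width used in the experiments follows directly from the hop count. As a quick check, the short Python sketch below evaluates it for a few square meshes; the function name `routing_bits` and its parameters are hypothetical and introduced only for this example.

```python
import math


def routing_bits(n, bits_per_hop=3):
    """Routing-path width for a square mesh of n nodes with source routing."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square for a square mesh")
    hops = 2 * side - 2                # corner-to-corner distance in the mesh
    return bits_per_hop * (hops + 1)   # one extra field selects the destination NI


for n in (9, 25, 64):
    print(n, routing_bits(n))
# 9 15
# 25 27
# 64 45
```

For the 3 × 3 mesh this gives 15 bits per LUT register, and for the 8 × 8 mesh 45 bits, before any parity bits are added.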

6.1 Lookup Table Fig. 10 shows the area (in $\mathrm{mm}^2$) of the different implementations of the LUT, by varying the dimension of the mesh and, therefore, the number and dimension of the registers needed for storing the routing information. The data length $d$ of the LUT registers is equal to $3(2\sqrt{n} - 1)$ for the baseline and the TMR implementations. In the implementations employing error-correcting codes, namely SECDED and FT, the LUT registers store the parity bits along with the routing path information. The number of parity bits is equal to $r + 1$, where $r$ is the minimum integer satisfying $2^r \geq d + r + 1$. In the case of the FT architecture, we present results for implementations with a number of spare registers equal to $n$, $n/2$, and $n/4$ (named FT(n), FT(n/2), and FT(n/4), respectively). With respect to the baseline architecture, the FT architecture introduces an area overhead that varies with the dimension of the NoC. For the evaluated configurations, the maximum overhead was observed for the smallest NoC, i.e., the $3 \times 3$ mesh. Values of maximum overhead are 197, 137, and 107 percent for a number of redundant registers equal to $n$, $n/2$, and $n/4$, respectively. The area overhead decreases with the dimension of the NoC (137, 80, and 51 percent for the three FT implementations, in the case of an $8 \times 8$ mesh). The reduction of the percentage overhead is due to the fact that the ratio between the number of data bits and the number of parity bits used for implementing the SECDED code increases with the number of nodes (in particular, for the configurations analyzed, the number of parity bits is equal to 6 for the $3 \times 3$ mesh, and to 7 for all the others). Moreover, the circuits implementing the decoder and the encoder are shared by all the registers of the LUT, and their percentage overhead decreases as the number of registers in the LUT, i.e., the number of nodes in the NoC, increases. The calculated overhead can, however, be considered acceptable, in particular if compared to the area measured for the TMR architecture. In the case of the $8 \times 8$ mesh, our solution achieves a saving of up to 50 percent, when considering the FT(n/4) implementation.

As presented in Fig. 10b, the energy (in pJ) shows trends similar to those obtained when evaluating the area. When considering the FT implementation of the LUT, the maximum overhead (217 percent) with respect to the baseline implementation is obtained for the $3 \times 3$ topology, while a maximum saving of around 46 percent with respect to the TMR is obtained for the $8 \times 8$ topology. Similarly to the values measured for the area, the maximum overhead can be observed for smaller topologies, while the saving with respect to the TMR implementation increases with the dimension of the NoC. Fig. 10c shows the length of the critical path for the different LUT implementations, by varying the number of nodes in the NoC. As the figure shows, while reducing area and energy consumption with respect to the TMR implementation, the FT architectures increase the critical path of the component. This is mainly due to the fact that both in the SECDED and in the FT implementations a decoder (usually synthesized as a tree of XOR gates [7]) is added to the critical paths of the LUT. For high-speed circuits, a TMR implementation is, therefore, preferable, at a higher cost in terms of area and energy consumption. With respect to the SECDED implementation, the overhead in area and energy consumption is on average around 95, 52, and 29 percent for the FT(n), FT(n/2), and FT(n/4) architectures, respectively.

Fig. 10. Area, energy consumption, and critical path of the different LUT architectures, while varying the dimension of the NoC. In the case of the FT architectures, the maximum area and energy overheads can be observed for smaller NoCs. Overheads decrease with the dimension of the NoC.
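The parity-bit rule stated for the SECDED registers can be checked numerically. The Python sketch below computes $r + 1$ for several data widths; the function name `secded_check_bits` is hypothetical. The widths used are the ones appearing in the paper: 3 bits for the FSM state, 15 and 45 bits for the LUT registers of the 3 × 3 and 8 × 8 meshes, and 35 bits for the flit.

```python
def secded_check_bits(d):
    """Number of SECDED parity bits for d data bits: r + 1, with 2**r >= d + r + 1."""
    r = 1
    while 2 ** r < d + r + 1:
        r += 1
    return r + 1


for d in (3, 15, 35, 45):
    p = secded_check_bits(d)
    print(f"d={d}: {p} parity bits, ({d + p}, {d}) code")
# d=3: 4 parity bits, (7, 3) code
# d=15: 6 parity bits, (21, 15) code
# d=35: 7 parity bits, (42, 35) code
# d=45: 7 parity bits, (52, 45) code
```

The results agree with the figures quoted in the text: 6 parity bits for the 3 × 3 mesh, 7 for the 8 × 8 mesh, a (42, 35) code for the FIFOs, and a (7, 3) code for the FSM state register.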

6.2 FIFOs In the case of the FIFO, we analyzed area and energy overhead while varying the number of slots. In the SECDED and the FT architectures, information stored in the FIFOs is encoded with a (42, 35) SECDED Hsiao code. In the case of the FT architecture, we present results for implementations able to skip $m - 1$, $m/2$, and $m/4$ faulty slots (named FT(m-1), FT(m/2), and FT(m/4), respectively), where $m$ is the total number of slots of the FIFO. Fig. 11 shows results obtained for the FIFO. The maximum overhead was obtained for smaller configurations, i.e., for FIFOs with 4 slots. Values of maximum overhead are 97, 84, and 84 percent, respectively, for the FT(m-1), FT(m/2), and FT(m/4) implementations. The area overhead decreases with the number of slots (83, 73, and 64 percent for the three FT implementations, in the case of a 32-slot FIFO). With respect to the TMR implementation, the maximum saving in area (46 percent) is obtained in the case of a 32-slot FT(m/4) implementation. In the case of the FIFO, the maximum energy overhead with respect to the baseline implementation is 169 percent (32-slot FT(m-1)), while the maximum energy saving with respect to the TMR implementation is 38 percent (4-slot FT(m/4)). Fig. 11c shows the length of the critical path for the different FIFO implementations. As already observed for the LUT, the use of SECDED and FT architectures increases the critical path of the component. The overhead in area and energy consumption of the FT implementations with respect to the SECDED implementation increases with the number of slots, and it was measured in a 32-slot configuration to be around 48 percent for the FT(m-1), 40 percent for the FT(m/2), and 31 percent for the FT(m/4) implementation.

Fig. 11. Area, energy consumption, and critical path of the different FIFO architectures, while varying the number of slots. In the case of the FT architectures, the maximum area and energy overheads can be observed for smaller FIFOs. Overheads decrease with the number of slots.

6.3 FSM Information in the state registers of the FSMs in the SECDED implementation is encoded using a (7, 3) SECDED Hsiao code. Table 4 shows synthesis results obtained for the implementation of the FSMs. The area of the TMR implementation is higher than that of the SECDED architecture (350 percent) and more than three times that of the baseline architecture. This is due to the fact that the implemented FSM drives relatively large signals, and the cost of the voters for those signals is higher than that of the FSM itself.

TABLE 4 Area, Energy, and Critical Path Obtained by Synthesizing the Three Implementations of the FSMs

6.4 Network Interface Fig. 12 shows synthesis results for the overall NI. The baseline NI was implemented by employing baseline versions of the FSM, FIFOs, and LUT. The SECDED NI employs SECDED versions of the FSM, FIFOs, and LUT, and a TMR implementation of the remaining components of the NI. The FT(min) configuration employs the FT(n/4) LUT and FT(m/4) FIFOs, while the FT(max) configuration employs the FT(n) LUT and FT(m-1) FIFOs. All the architectures employ 8-slot FIFOs. For the FSM, we considered the SECDED implementation, while the remaining NI components were implemented by using TMR. In general, the maximum area overhead can be observed for NoCs with smaller dimension, and it decreases when increasing the number of nodes in the topology. For a $3 \times 3$ NoC, the overhead was measured to be around 77 percent for FT(min) and 104 percent for FT(max). The maximum saving with respect to a TMR implementation was measured for the $8 \times 8$ NoC (48 percent for FT(min) and 24 percent for FT(max)).

Fig. 12. Area of different NI architectures, while varying the dimension of the NoC. The FT(min) configuration employs the FT(n/4) LUT and FT(m/4) FIFOs. The FT(max) configuration employs the FT(n) LUT and FT(m-1) FIFOs. Both FT NIs employ SECDED FSMs, with the remaining components implemented in TMR.

7

SURVIVABILITY

To evaluate the effectiveness of the proposed solutions, we measured their survivability. The survivability is defined as the probability of producing correct behavior in the presence of faults. We created a high-level model of the NI and of its components in C++, based on the results of the Synopsys synthesis and on the results of a fault injection campaign, similar to the one described in Section 3, on the different analyzed architectures. The high-level model of each component has been characterized by counting the number of faults producing their related errors. By taking into account the total area of the cells hit by the faults and producing the same error, we can evaluate the percentage of the error’s occurrence. Models are parametric and depend on the design parameters of the NI and the NoC, such as, for instance, the data width, the number of nodes, and the number of slots in the FIFOs. The high-level model simulates the behavior of each component at the occurrence of a new permanent fault by evaluating whether it is able to survive the fault or whether it produces an error. We evaluated the behavior of the system when a certain number of consecutive permanent faults affects it. By employing these models, we inject a defined number of faults in the component. Each single fault is injected in a random position over the area, i.e., a specific component is hit with a probability proportional to its area, while the probability of an error in the component is proportional to the total area of the cells that, when faulty, produce that specific error. Injected faults are mutually independent, and we assume that enough time is left to the system to recover from the effects of the fault, if any. For each fault, we simulate the behavior of the component and determine whether, with the injected sequence of faults, it can still be considered error-free. For each defined number of injected faults, we repeated the experiment 10,000 times. We count the number of times over the total experiments in which, after the defined number of injected faults, the errors generated in the component were noncorrectable or nondetectable. When comparing the different architectures, we normalize the simulated results with respect to the area overhead of the evaluated solution [24] to obtain, for each number of injected faults, the same fault density in each evaluated architecture. Since the number of defects in a design is in general proportional to its area, the use of this metric for assessing the effectiveness of the protection techniques provides a fair comparison between architectures with different area overheads. Therefore, the higher the survivability, the more resilient the system is to faults.
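The area-weighted fault injection described above can be illustrated with a toy Monte Carlo experiment in Python. This is not the authors' C++ model: the representation of a component as a list of regions, each with an area and a number of tolerated faults, is a strong simplification, and the names `survivability` and `scaled_faults`, as well as the way the fault count is scaled with area, are assumptions made for this sketch.

```python
import random


def survivability(regions, n_faults, trials=10_000, seed=0):
    """Fraction of trials in which no region receives more faults than it tolerates.

    regions: list of (area, tolerated_faults) pairs (toy model, an assumption).
    """
    rng = random.Random(seed)
    areas = [area for area, _ in regions]
    survived = 0
    for _ in range(trials):
        hits = [0] * len(regions)
        # Each fault lands in a region with probability proportional to its area.
        for idx in rng.choices(range(len(regions)), weights=areas, k=n_faults):
            hits[idx] += 1
        if all(h <= tol for h, (_, tol) in zip(hits, regions)):
            survived += 1
    return survived / trials


def scaled_faults(n_faults_baseline, area, baseline_area):
    """Scale the fault count with area so that designs see the same fault density."""
    return round(n_faults_baseline * area / baseline_area)


baseline = [(1.0, 0)]               # unprotected: any fault causes an error
protected = [(1.6, 1), (0.4, 0)]    # larger, but most of the area tolerates one fault
k = scaled_faults(1, area=2.0, baseline_area=1.0)
print(survivability(baseline, 1))   # 0.0
print(survivability(protected, k))  # value between 0 and 1, depends on the draws
```

The second design receives twice as many faults because it is twice as large, which is the sense in which the comparison is made at equal fault density.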

7.1 Lookup Table Fig. 13 shows the survivability for the LUT, by varying the number of injected faults, in the case of different NoC dimensions. Values of the survivability are normalized with respect to the area of the baseline architecture of the LUT. As the graphs show, our solutions provide a survivability comparable to the TMR, while showing a significant improvement with respect to a SECDED implementation. For a relatively low number of injected faults, FT implementations are also able to provide better results than the TMR implementation. When the number of errors is too high, the TMR implementation of the NI provides, however, a better survivability. These results are explained by the fact that FT implementations are able to deal with up to two faults as long as enough redundant resources are available, while TMR implementations will only mask one fault. For higher numbers of faults, the number of spare resources available will not be sufficient for masking them. As shown in Fig. 13, the behavior of the survivability of the FT implementations with respect to the TMR implementation varies with the dimension of the NoC (i.e., of the LUT) and with the amount of redundancy. This fact can be explained by considering that, for LUTs with bigger dimensions, the probability of having a configuration leading to error-affected results for a given number of injected faults is lower than in the case of smaller implementations. For a relatively low number of faults, the FT architectures provide a similar level of protection, while, when the number of faults increases, the FT architecture with the highest redundancy, FT(n), is the one that presents the best survivability among them.

Fig. 13. Survivability of the LUT for different NoC dimensions, while varying the number of injected faults. Values of survivability are normalized with respect to the area of the LUT’s baseline architecture.

7.2 FIFOs Results of the analysis of the survivability for the FIFO architectures are shown in Fig. 14. In the figure, we show simulation results for different FIFO architectures by varying the number of slots and the number of faults injected. As observed for the LUT, for a relatively small number of injected faults, the FT architectures outperform the TMR implementation. As in the case of the LUT, the survivability of the FT solutions is also higher than the one obtained for the SECDED implementation. In the case of the FIFO, the FT architecture using the smallest number of redundant slots (FT(m/4)) presents a better survivability. In FT(m-1) and FT(m/2), the increase in area overhead due to being able to use a higher number of redundant slots is not counterbalanced by the increased resilience obtained.

Fig. 14. Survivability of the FIFO for different dimensions, while varying the number of injected faults. Values of survivability are normalized with respect to the area of the FIFO’s baseline architecture.

7.3 FSM Fig. 15 shows the results of the survivability analysis for the FSM. Contrary to what might be expected, the SECDED implementation provides the highest survivability. This is due to the fact that the protection achieved by implementing the FSM in TMR is obtained at the cost of a high area overhead. The TMR FSM is, therefore, subjected to a high fault probability, which is not counterbalanced by the provided protection.

Fig. 15. Survivability of the FSM implementations, while varying the number of injected faults. Values of survivability are normalized with respect to the area of the FSM’s baseline architecture.

7.4 Network Interface To evaluate the survivability of the overall NI, we explored its resistance to faults when varying the implementation of its components. We performed an exhaustive multiobjective design space exploration of the alternative NI architectures obtainable by combining the different implementations of the NI components, as enumerated in Table 5. We evaluated area and survivability of the configurations for a fixed number of injected faults. In the table, BAS refers to the baseline implementation, while the meanings of the other terms are those previously presented in the paper. In the exploration, an NoC featuring 25 nodes ($5 \times 5$) is used, while the number of slots of the two FIFOs of the NI is equal to 8. Fig. 16 shows the results of the exploration when imposing a number of injected faults equal to 3 and 5, equivalent to having a fault every 4,187 and 2,512 equivalent gates, respectively. The figure also shows the Pareto points of the exploration. The Pareto configurations are those that, in the design space exploration, minimize the area and maximize the survivability of the NI. The point of the Pareto set with minimum area is marked in the figures. Obviously, it corresponds to the NI architecture composed of the baseline implementation of all the components. As the figures show, its survivability is, however, 0 percent. $\omega$ is the Pareto point presenting the highest survivability. For three injected faults, $\omega$ is given by the NI architecture obtained when employing the FT(n/2) LUT, the FT(m/4) FIFOs, and by implementing the remaining components in TMR. For five injected faults, $\omega$ is given by the NI architectural configuration composed of the FT(n/4) LUT, the FT(m/4) FIFOs, and by implementing the remaining components in TMR. Among the set of the obtained Pareto points, we selected the ones maximizing the product of survivability and area (also marked in the figures). For both three and five injected faults, this point is represented by the NI configuration composed of the SECDED LUT, FT(m/4) FIFOs, SECDED FSMs, and a TMR implementation of the remaining components. This result can be explained by observing in Fig. 13 that, in the case of the LUT, the SECDED solution provides, for a relatively small number of faults, a survivability comparable to that of the other studied fault-tolerant architectures. It should also be noticed that, for the fault configurations presented, none of the NI architectures included in the Pareto sets employs TMR implementations for the LUT and the FIFOs. This result is, however, to be expected when observing in Figs. 13 and 14 the trend of the survivability of the single NI components.

TABLE 5 Design Space for the NI

Fig. 16. NI architecture exploration for different numbers of injected faults. Values of survivability are normalized with respect to the area of an NI implemented by employing baseline architectures for all its components.

8

CONCLUSIONS

This paper presented a study on the implementation of fault-tolerant network interfaces for NoCs. By performing a fault injection campaign on the NoC, NIs, and routers, we demonstrated how the NI could be the main source of errors in the NoC, in particular when the number of nodes in the network increases. Moreover, we showed that the occurrence of permanent and temporary faults in the network interface could cause an unwanted behavior that may create unrecoverable situations in the NoC, such as deadlock or livelock conditions. We proposed and discussed a functional fault model for the NI based on the behavior of its main components, i.e., the lookup table, FIFOs, and the finite-state machines driving NI operations. We proposed new architectural solutions based on the use of error correcting and detecting codes and a limited amount of redundancy, and discussed policies for the reconfiguration of the components that should be applied at the detection of errors. In our experiments, we obtained a saving of up to 48 percent in the area overhead, as well as a significant energy reduction, with respect to an alternative standard hardware TMR implementation of the NI, while maintaining a similar level of robustness to faults.

ACKNOWLEDGMENTS

This work was partially funded by the European Commission under Project MADNESS (No. FP7-ICT-2009-4-248424).

REFERENCES [1]

R. Marculescu, “Networks-on-Chip: The Quest for on-Chip FaultTolerant Communication,” Proc. IEEE CS Ann. Symp. VLSI, pp. 812, Feb. 2003. [2] J. Srinivasan and S.V. Adve, “RAMP: A Model for Reliability Aware MicroProcessor Design,” IBM Research Report RC23048, 2003. [3] I. Koren and C.M. Krishna, Fault Tolerant Systems. Morgan Kaufmann, 2007. [4] A. Ferrante, S. Medardoni, and D. Bertozzi, “Network Interface Sharing Techniques for Area Optimized NoC Architectures,” Proc. 11th EUROMICRO Conf. Digital System Design Architectures, Methods and Tools (DSD ’08), pp. 10-17, Sept. 2008. [5] L. Fiorin, L. Micconi, and M. Sami, “Design of Fault Tolerant Network Interfaces for NoCs,” Proc. 14th EUROMICRO Conf. Digital System Design (DSD ’11), pp. 393-400, 2011. [6] S. Murali, T. Theocharides, N. Vijaykrishnan, M. Irwin, L. Benini, and G. De Micheli, “Analysis of Error Recovery Schemes for Networks on Chips,” IEEE Design Test of Computers, vol. 22, no. 5, pp. 434-442, Sept./Oct. 2005. [7] A. Frantz, M. Cassel, F. Kastensmidt, E. Cota, and L. Carro, “Crosstalk- and SEU-Aware Networks on Chips,” IEEE Design Test of Computers, vol. 24, no. 4, pp. 340-350, July/Aug. 2007. [8] T. Lehtonen, P. Liljeberg, and J. Plosila, “Online Reconfigurable Self-Timed Links for Fault Tolerant NoC,” VLSI Design, vol. 2007, article 13, 2007. [9] Q. Yu and P. Ampadu, “Transient and Permanent Error CoManagement Method for Reliable Networks-on-Chip,” Proc. Fourth ACM/IEEE Int’l Symp. Networks-on-Chip (NOCS ’10), pp. 145-154, May 2010. [10] A. Ejlali, B. Al-Hashimi, P. Rosinger, S. Miremadi, and L. Benini, “Performability/Energy Tradeoff in Error-Control Schemes for on-Chip Networks,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 18, no. 1, pp. 1-14, Jan. 2010. [11] J. Kim, C. Nicopoulos, D. Park, V. Narayanan, M. Yousif, and C. Das, “A Gracefully Degrading and Energy-Efficient Modular Router Architecture for on-Chip Networks,” Proc. 33rd Int’l Symp. Computer Architecture (ISCA ’06), pp. 
4-15, 2006.

FIORIN AND SAMI: FAULT-TOLERANT NETWORK INTERFACES FOR NETWORKS-ON-CHIP



Leandro Fiorin received the master of engineering degree in embedded system design in 2004 from the University of Lugano (USI), Switzerland, the MS degree in electronic engineering from the University of Cagliari, Italy, and the PhD degree from the Faculty of Informatics, USI, in 2012. He is currently a research associate at the Advanced Learning and Research Institute (ALaRI) on Embedded System Design, USI. Previously, he was also a contract researcher at USI, working on networks-on-chip and embedded systems architectures. His research interests focus on fault-tolerant and secure networks-on-chip and embedded systems, on-chip multiprocessors, and reconfigurable systems. He is coauthor of several scientific papers on networks-on-chip, design methodologies for systems-on-chip, and embedded system security, and of two patents on networks-on-chip security. He is a member of the IEEE.

Mariagiovanna Sami received the Dr Ing degree from Politecnico di Milano, Italy, and subsequently, her Libera Docenza. She is currently a full professor at Politecnico di Milano, and has contributed to creating the Advanced Learning and Research Institute on Embedded Systems Design at Università della Svizzera italiana, Switzerland, where she was the scientific director from 2000 to 2012. Her research interests have always focused on the area of digital systems design, with particular reference to design and testing techniques, defect and fault tolerance of complex systems, and low-power design of embedded systems. She has published more than 250 papers in international scientific journals, proceedings of international conferences, and chapters of books, and she is the coauthor of the books Fault-tolerance through Reconfiguration in VLSI and WSI Arrays and Low-Power Design of VLIW Architectures. She has obtained an international patent concerning power-saving CPU architectures. She has been a program committee member of a number of IEEE international conferences and has chaired several international conferences (in particular, the 1983 Fault-Tolerant Computing Symposium, the 1993 DFT Symposium, and the 1999 IJCNN). She has been the editor-in-chief (jointly with Professor Lutz Richter of Zurich University) of the Journal of System Architecture and has been a member of the board of editors of the IEEE Transactions on Computers and IEEE Design and Test, as well as of the advisory board of IEEE Computer. She is a member of the board of editors of the Journal of Electronic Testing: Theory and Applications published by Kluwer International. She is a member of the Italian National Academy of Sciences (Dei Quaranta) and a lifetime member of the IEEE.

