Fault-tolerant Router for Highly-reliable Many-core 3D-NoC Systems

Abderazek Ben Abdallah, Mitsuhiro Nakamura, Akram Ben Ahmed, Yuichi Okuyama
The University of Aizu, Graduate School of Computer Science and Engineering
Aizu-Wakamatsu 965-8580, Japan

Abstract: As 3D-NoCs keep demonstrating their ability to provide high-bandwidth, low-power interconnect, their reliability has become one of the main concerns of the past few years. This is due to their higher vulnerability to failures, which may cause unacceptable performance degradation or even entire system failure. As a consequence, a lot of research has been conducted to make these systems tolerant of short-term malfunctions and permanent physical damage while minimizing the performance degradation as much as possible. In response to these issues, in this paper we present the architecture and physical design of a reliable 3D-NoC system, called 3D-Fault-Tolerant-OASIS (3D-FTO). The 3D-FTO system is based on a novel robust 3D-router that avoids system failure in the presence of a large number of hard faults and addresses fault occurrence in links, input-buffers, and the crossbar, where faults are most likely to occur.

Keywords: 3D-NoC, Reliability, Architecture, Physical design

I. 3D-Fault-Tolerant-OASIS Router Architecture:

As shown in Fig. 1, the proposed 3D-FTO router contains seven input-ports, a switch-allocator, a crossbar, and a Fault-Control-Module (FCM). In the next paragraphs, we explain the enhancements added to the 3D-FTO router: first, the efficient routing algorithms that tackle link failures; second, the Random-Access-Buffer (RAB) for deadlock-recovery and fault-tolerance, assisted by the Traffic-Prediction-Unit (TPU) in the input-buffer; third, the Bypass-Link-on-Demand (BLoD) approach that handles multiple faulty channels in the crossbar. Finally, the FCM is responsible for assigning and controlling the different detection and recovery tasks of the previously mentioned techniques.

1. Efficient fault-tolerant routing algorithms

The Look-Ahead-Fault-Tolerant (LAFT) routing algorithm [2] is proposed to tackle fault occurrence in inter-router links and TSVs. LAFT takes advantage of look-ahead routing to make the routing decision for the next node and to select the best minimal path while taking its link status into consideration. LAFT provides 46% higher throughput than conventional routing algorithms [2]. Later, LAFT was optimized by combining look-ahead and local routing for better routing decisions. The optimized routing algorithm is named the Hybrid-Look-Ahead-Fault-Tolerant routing algorithm (HLAFT) [1]. For every incoming flit, HLAFT performs a simple computation to judge whether the pre-calculated Next-port identifier will lead to a blocking path. In the case where a possible non-minimal route might occur, HLAFT re-computes the route depending on the fault status of the local and neighboring nodes. HLAFT provides 11.2% higher throughput than LAFT [1].

2. Random-Access-Buffer mechanism

The Random-Access-Buffer mechanism (RAB) was initially proposed to solve the deadlock problem that can occur with LAFT and HLAFT [1, 3]. Most existing 3D-NoC systems either use Virtual-channels (VCs) or add restrictions to the routing selection to avoid deadlock. In our case, RAB is a technique similar to VCs, but it is much simpler. RAB was extended to recover from transient, intermittent, and permanent faults in the input-buffer [3]. When a fault is detected in one of the slots, the main controller (RAB-cntrl in Fig. 2) takes the flagged slots into consideration when assigning the write and read addresses, while continuing to check whether the faults in the flagged slots have recovered.

3. Traffic-Prediction-Unit (TPU):

In order to enhance performance and fully exploit the router resources, we opted for a technique that shares the input-buffer resources among all the input-ports. When an input-buffer cannot host more flits due to failures or congestion, it can redirect the incoming flits to a neighboring empty input-buffer, which quickly forwards them to their destination. The proposed Traffic-Prediction-Unit (TPU) collects information about the traffic load of each input-port via monitoring probes (labeled P in Fig. 1) at a specified time interval. With this information, the TPU can decide the best input-port to host the faulty input-buffer's flits without creating considerable congestion.

4. Bypass-Link-on-Demand (BLoD):

The Bypass-Link-on-Demand mechanism (BLoD) [3] provides additional escape channels whenever the number of faults in the baseline 7x7 crossbar increases. The ctrl unit (shown in Fig. 3) checks the status of the crossbar links. When a fault is detected in one or several links, it sends flags to the Fault-Control-Module, which disables the faulty crossbar links and enables the appropriate number of bypass channels. The number of bypass-links is an important design parameter, since any needless bypass-link results in unnecessary power and area overhead. Therefore, we adopted an incremental approach: we analyze the benchmarks used and the assumed fault rates, and we increment the number of bypass-links until the performance is steady or almost unchanged. From the evaluation results, three bypass-links appeared to provide the best trade-off between hardware cost and performance.

II. Evaluation Results

The proposed 3D-Fault-Tolerant-OASIS (3D-FTO) router was designed in Verilog-HDL and synthesized using a 45 nm technology library [5]. For the TSV integration, we used the FreePDK3D45 design kit [6].
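
The incremental sizing of bypass-links described for BLoD can be illustrated with a minimal Python sketch. It is not the authors' design flow: the function name `choose_bypass_links`, the hook `measure_throughput`, the `tolerance` threshold, and the toy throughput numbers are all assumptions made for illustration. The sketch only shows the stopping rule, namely adding one bypass-link at a time until the throughput gain becomes negligible.

```python
from typing import Callable


def choose_bypass_links(
    measure_throughput: Callable[[int], float],
    max_links: int = 7,
    tolerance: float = 0.01,
) -> int:
    """Return the smallest bypass-link count after which throughput stops improving.

    measure_throughput(n) is a hypothetical hook that would run the chosen
    benchmarks at the assumed fault rates with n bypass-links enabled.
    """
    best = 0
    prev = measure_throughput(0)
    for n in range(1, max_links + 1):
        cur = measure_throughput(n)
        # Stop when the relative gain from one more link falls below tolerance.
        if prev > 0 and (cur - prev) / prev < tolerance:
            break
        best, prev = n, cur
    return best


# Made-up throughput numbers, for illustration only (not results from the paper).
toy_curve = {0: 0.20, 1: 0.30, 2: 0.36, 3: 0.40, 4: 0.401, 5: 0.401}
chosen = choose_bypass_links(lambda n: toy_curve[n], max_links=5)
```

With this toy curve the loop stops at the fourth link, whose gain is below the 1% threshold, so `chosen` is the last count that still helped. In a real flow the hook would wrap a simulation run rather than a lookup table.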
Table I shows the simulation configuration, while Table II lists the parameters adopted for the hardware design of the proposed router, including the TSV properties. We built a system based on the 3D-router in hardware. The performance evaluation shows that the system provides up to 51% higher throughput than conventional systems and degrades gracefully at high fault rates. Due to space limitations, Figure 4 shows only one throughput evaluation, with the Matrix-multiplication benchmark. Table III shows the hardware design results: the proposed 3D router is designed on a 300 µm x 300 µm chip, and the TSV array (164 TSVs) occupies 63% of the total area. The last figure depicts the layout of the proposed router. This router currently deals with hard faults only. An efficient mechanism for handling soft faults is under investigation.

ACKNOWLEDGEMENTS

This work is supported by Competitive Research Funding of the University of Aizu, Ref. P12-2014. It is also supported by the VLSI Design and Education Center (VDEC), the University of Tokyo, Japan, in collaboration with Synopsys, Inc. and Cadence Design Systems, Inc.

References

[1] A. Ben Ahmed, A. Ben Abdallah, "Architecture and Design of High-throughput, Low-latency and Fault Tolerant Routing Algorithm for 3D-Network-on-Chip", Jnl. of Supercomputing, 66(3):1507-1532, December 2013.
[2] A. Ben Ahmed, A. Ben Abdallah, "Graceful deadlock-free Fault-tolerant Routing Algorithm for 3D-Network-on-Chip Architectures", Jnl. of Parallel and Distributed Computing, 74(4):2229-2240, 2014.
[3] A. Ben Ahmed, A. Ben Abdallah, "LA-XYZ: Low Latency, High Throughput Look-Ahead Routing Algorithm for 3D Network-on-Chip (3D-NoC) Architecture", the IEEE 6th Int. Symp. on Embedded Multicore SoCs (MCSoC-12), pp. 167-174, 2012.
[4] Kenichi Mori, A. Ben Abdallah, "OASIS Network-on-Chip Prototyping on FPGA", Master's Thesis, Graduate School of Computer Science and Engineering, The University of Aizu, Feb. 2012, Ref. 19KM-MT11.
[5] Nangate 45nm open cell library, http://www.nangate.com.
[6] FreePDK3D45, http://www.eda.ncsu.edu/wiki/FreePDK3D45:Contents

Figure 1: 3D-Fault-tolerant OASIS-NoC router architecture

Figure 2: Random-Access-Buffer mechanism block diagram

Table I: Simulation configuration
Benchmarks          Matrix-multiplication, JPEG, Transpose, Uniform
Network size        3x6x6, 3x3x3, and 4x4x4
Packet size         1~4 flits
Flit size           34 bits
Buffer depth        4
# injected flits    100~100,000

Figure 3: Bypass-Link-on-Demand block diagram

Table II: Hardware design parameters
Technology       Nangate 45nm, FreePDK3D45
Clock            434 MHz
Voltage          1.1 V
#TSVs            164
TSV size         4.06 µm x 4.06 µm
TSV pitch        10 µm
Keep-out-zone    15 µm

Figure 4: Throughput evaluation with Matrix-multiplication application

Table III: Hardware design results
Number of pins        579
Power (mW)            4.42
Chip size             300 µm x 300 µm
Area (mm²), Router    0.018 (32%)
Area (mm²), TSV       0.04 (68%)

Figure 5: 3D-Fault-tolerant OASIS-NoC router layout
