Low Power Heterogeneous 3D Networks-on-Chip Architectures

Michael Opoku Agyeman, Ali Ahmadinia, Alireza Shahrabi
School of Engineering and Computing
Glasgow Caledonian University, Glasgow, UK
[email protected]

978-1-61284-383-4/11/$26.00 ©2011 IEEE

ABSTRACT
Three-dimensional Network-on-Chip (3D NoC) architectures have attracted considerable interest as a way to address the on-chip communication delays of modern SoC systems. In this paper we propose low power heterogeneous NoC architectures, which combine the power and performance benefits of 2D routers and 3D NoC-bus hybrid router architectures in 3D mesh topologies. Experimental results show a negligible penalty of up to 5% in average packet latency relative to a 3D mesh with a homogeneous distribution of 3D NoC-bus hybrid routers. The heterogeneity, however, provides an improvement of up to 19.7% in total crossbar area and up to 67% in power efficiency of the NoC resources, compared to a 3D mesh with a homogeneous distribution of 3D NoC-bus hybrid routers.

KEYWORDS: Networks-on-Chip, 3D-Integration, Multi-core Architectures

1. INTRODUCTION
Recently, System-on-Chip (SoC) has emerged as one of the most interesting solutions in embedded applications [3]. An SoC consists of several components such as processors, peripherals, memory blocks and power management circuits on a single integrated circuit (IC). Some of the main issues with SoC design are the non-scalable wire delays and power consumption of the on-chip communication infrastructure [13]. These issues have attracted a lot of research over the past few years, on topics such as the maximum number of cores per shared bus, efficient application mapping, reliability and efficient arbitration for access to the shared bus [5].

To optimize the long interconnect wires and larger footprint of SoC design on the traditional 2D IC, 3D IC design technology has been proposed to stack several 2D silicon layers vertically. The layers in the 3D IC provide shorter and more efficient interlayer wires, with lower power consumption and larger resource connectivity than the long interconnect wires of a 2D SoC implementation with an equivalent number of resources [6]. However, the practical design of 3D ICs is still an open question that has attracted a lot of interest over the past few years.

To meet the constraints of SoC design in the third dimension, Network-on-Chip (NoC) has been proposed as a feasible solution. In NoC, the idea of macro network communication is adopted: communication among processing elements is achieved by sending packets of data in a circuit-switched or packet-switched manner using switches, routers and links, but with a limited amount of resources due to on-chip constraints. NoCs are scalable and help in providing configurable parallelism and controlling the interlayer vertical interconnects (TSVs) and intra-layer wires [11]. Combining NoCs and 3D integration technology (3D NoCs) introduces new opportunities and design challenges. One of the main design challenges is TSV manufacturing, which is an expensive and complicated process [16]. However, most of the works on 3D NoCs in the literature assume full vertical connectivity, which may not be optimal for some applications.

This paper presents an investigation of alternate architectures for low power 3D NoC implementation by employing 2D routers and single hop 3D NoC-bus hybrid routers for interlayer communications. Limiting the connectivity among routers results in higher delay in conveying packets to their destinations. Hence, the second contribution of our work is a performance analysis with a detailed comparison of the average packet latency of the considered architectures. The paper is organized as follows. Section 2 presents a brief description of the related work. Section 3 discusses the details of the considered 3D NoC architectures. The proposed heterogeneous architectures are explained in Section 4. In Section 5, we present the implementation details and the experimental analysis. Finally, we conclude this work in Section 6.

2. RELATED WORK
System-on-Chip design is an active research area and several papers in the literature [10, 12, 14, 15] address 3D technology in NoCs. The topology of a network affects its performance and power consumption. In [13], evaluation strategies for comparing the performance and characteristics of NoC architectures under various NoC topologies are presented. The authors of [2] present a survey of current 3D integrated circuit fabrication technologies. Wafer-to-wafer bonding, where the vertical interconnects are implemented using Through Silicon Vias (TSVs), is reported to be a popular and feasible 3D integration technology [9]. In [8], it is demonstrated that, in the Face-To-Back 3D IC manufacturing methodology, interlayer vias traverse active silicon layers. This implies that increasing the number of TSVs reduces the number of on-chip resources. Thus, it is critical to find alternate 3D interconnect designs with fewer TSVs to reduce the manufacturing cost with a minimal performance trade-off.

In [17], an evaluation of the impact of TSVs on 3D NoC design is presented, where a 3D NoC with 5 layers is modeled on a Chip Multiprocessor (CMP) with processors on the first layer and shared cache memories on the other layers. An algorithm for routing packets in a 3D mesh with limited vertical links is presented in [1]. Both [17] and [1] performed their analysis using 2D routers and symmetric 3D routers. However, symmetric 3D routers have a larger chip area and introduce hop-by-hop message passing in the third dimension, which increases the average packet delay.

Our contribution is to explore low power heterogeneous 3D NoC topologies with a reduced number of vertical interconnects by employing small footprint 5 port 2D routers to provide intra-layer hop-by-hop communication and 6 port 3D NoC-bus hybrid routers to provide single hop interlayer communication. We also present the average packet latency analysis associated with the heterogeneous 3D topologies.

3. 3D NoC ARCHITECTURES
A 2D mesh consists of several tiles connected in a grid-like fashion. Each tile is made up of a router and processing elements (PE) connected via a network interface (NI). Each router is connected to four neighboring routers and to the local PE via ports. Figure 1 illustrates a 2D router with 5 ports, four of which connect to other routers in the north, south, east and west directions. A port may consist of several channels, which are composed of bidirectional data links with dedicated buffers.

Figure 1. An Example of a 2D NoC Router with 5 Ports

A typical 3D NoC is implemented from the conventional 2D mesh NoC by placing several layers on top of each other and adding two ports to the 2D router to provide communication in the up and down directions. Adding two ports to a 2D router with 5 ports results in a 7 port router, which requires a 7 × 7 crossbar to convey packets between the input and output channels. 3D routers employed in a 3D mesh are referred to as symmetric 3D routers because of the symmetry along the axis.

In order to reduce power consumption, a hybrid router that saves a port compared to the symmetric 3D router is proposed in [8]. An example of a 3D NoC-bus hybrid router with 6 ports is shown in Figure 2. Unlike the multi-hop communication provided by the 3D symmetric routers, 3D NoC-bus hybrid routers provide single hop communication in the third dimension, thus taking advantage of the negligible packet delays provided by the TSVs. An example of a 3D NoC with 64 hybrid routers is illustrated in Figure 3. The presence of a bus implies that concurrent communication in the third dimension is prohibited, as only one flit can traverse the bus at a time. Single flit traversal implies high packet latencies under heavy traffic conditions. However, in [8] it is stated that contention is not really an issue until 9 silicon layers are used in a hybrid NoC. The 3D NoC-bus hybrid crossbar switch shows a 46% improvement in power consumption over a symmetric 3D router implemented using the same transistor technology. Also, a symmetric 3D router consumes more than 120% of the power consumed by a conventional 2D router [7]. These values contribute significantly to the total area and power consumption of the NoC resources, considering that a typical multi-core NoC design may require several of these routers. Hence we do not consider the symmetric 3D router in our quest to explore low power heterogeneous 3D NoC architectures. An appropriate mix of 2D routers and 3D NoC-bus hybrid routers in 3D topologies will lead to a new architectural framework with reduced vertical interconnect and manufacturing cost and improved power efficiency in both application specific and general purpose SoC design.

Figure 2. An Example of a 6 Port 3D NoC Bus Hybrid Router

Figure 3. An Example of 4 × 4 × 4 3D Mesh

4. HETEROGENEOUS 3D NOC ARCHITECTURES
In this section, we introduce the vertical interconnect patterns used in our investigation of latency and power consumption in heterogeneous 3D NoC architectures. The vertical interconnect patterns are introduced into the NoC architecture by specifying the location of the bus links between 2D layers. The investigated heterogeneous patterns are as follows:

• OE-2D3D: 3D routers are uniformly distributed along columns of each plane. As shown in Figure 4, the uniformity is achieved by placing 2D and 3D routers on even and odd columns respectively in each 3D layer.

Figure 4. OE-2D3D Interconnect Pattern with 50% of 3D Routers for n × n × n 3D Mesh

• Diagonal: 3D routers are distributed along the diagonal of each layer, resulting in a total of 25% 3D routers in regular 3D architectures. Figure 5 illustrates two layers of a diagonal architecture in a 4 × 4 × 4 mesh.

Figure 5. Diagonal Interconnect Pattern with a Total of 25% 3D Routers


• Periphery: 3D routers are distributed along the periphery of each layer, resulting in a total of 75% 3D routers in regular 3D architectures. Figure 6 illustrates two layers of a periphery architecture in a 4 × 4 × 4 mesh.

Figure 6. Periphery Interconnect Pattern with a Total of 75% 3D Routers

• mcolumn: In the mcolumn pattern, we place 2D routers along (n - m) columns adjacent to each other in every layer, where n is the total number of columns and m can range from 1 to n - 1. For a 4 × 4 × 4 3D mesh, the available options for m are 1, 2 and 3, giving a total of 25%, 50% and 75% of 3D routers. We name these patterns onecolumn, twocolumns and threecolumns. The twocolumns architecture, showing only two layers of a 4 × 4 × 4 mesh, is illustrated in Figure 7.

Figure 7. An Example of a 4 × 4 × 4 Twocolumns Interconnect Pattern with a Total of 50% 3D Routers

5. EXPERIMENTAL RESULTS
In this section, we present details of our experimental setup for evaluating heterogeneous 3D NoC architectures with different distribution patterns of 2D routers and 3D NoC-bus hybrid routers, and a 3D mesh architecture with full vertical interconnect, in terms of average packet latency and power consumption. The experimental analysis is performed using a cycle accurate simulation platform, implemented by augmenting Worm_sim [4], an existing 2D NoC simulator which employs wormhole packet-switched routing, and extending its routing schemes and traffic patterns as described below. For evaluating the power consumption of the NoC resources, we use the Ebit energy model to calculate the energy consumption [18]. Power consumption is then calculated from the energy values and the simulation length. The 3D architecture is constructed by placing several 2D grids in each layer and introducing a vertical bus pillar for communication in the third dimension. The routers employed are 3D NoC-bus hybrid routers with 6 × 6 crossbars and 2D routers with 5 × 5 crossbars. With 40 bytes of data per packet, we simulate various packet injection rates in 4 × 4 × 4 heterogeneous NoC architectures and fully connected 3D routers under the 3D mesh topology. The setup is run for a warm-up period of 2000 cycles to allow enough flits to be introduced into the network, and performance statistics are collected after a simulation period of 20,000 cycles. To allow fairness and consistency in the analysis, the same simulation scenario is used for all the architectures.

We selected the XYZ dimension ordered routing (DOR) based deterministic routing and modified it to fit the heterogeneous 3D topologies. Flits are XY routed from their source to destination if the destination and source are in the same layer. If the flits are destined for nodes in other layers, they are routed along the vertical bus that provides the shortest Manhattan distance between the source and destination nodes. To distribute traffic more evenly among all possible 3D routers, when more than one vertical bus provides a shortest path, one is randomly selected. To clearly analyse the effect of heterogeneity on all nodes, we perform our experimental analysis under uniform traffic patterns, where each node in the NoC has equal probability of receiving packets.

As illustrated in Figure 8, introducing heterogeneity comes with a penalty. This is because flits in a 3D mesh have more connectivity between source and destination nodes. It can be seen that the 3D mesh NoC can sustain a higher injection rate than the other networks. It can also be seen that the periphery architecture outperforms the other heterogeneous architectures, with a sustainable injection rate within 5% of that of the 3D mesh. This is because, like threecolumns, periphery has 75% of its routers being 3D. However, for a 4 × 4 × 4 mesh under uniform traffic, 3D routers in the periphery architecture are more evenly distributed around the 2D routers, with a single hop in the x or y direction between any 2D router and a 3D router, whereas in threecolumns there is only a single hop in the east direction from a 2D router to a 3D router. Comparatively, periphery provides more options for shortest vertical paths for flits travelling from 2D routers to other layers. With the enhanced XYZ routing randomly selecting among vertical buses with shortest paths, the average packet latency is reduced. This is a necessary enhancement because if a particular bus were selected as the shortest path all the time, flits would wait longer for arbitration, increasing the average packet latency. Though the difference between the saturation points of 3D mesh and periphery is not that significant, it becomes more significant as the number of 3D routers and the uniformity of their distribution decrease. The sustainable injection rate is very useful in NoC analysis, as the average packet latency between sources and destinations increases exponentially beyond this point. Though they recorded the highest power savings, diagonal and onecolumn sustain the lowest injection rate, of about 20%.

Figure 8. Average Packet Latency of Different 3D NoC Architectures

The normalized power consumption for all the different architectures is shown in Figure 9. It can be easily deduced that the 3D mesh architecture with only 3D routers (3D) has the highest power consumption. This is because there are more vertical links, ports and central arbiters, which consume more energy. By introducing heterogeneity we achieved normalized power consumption of 33%, 35%, 41%, 47%, 68% and 89% for the diagonal, onecolumn, twocolumns, OE-2D3D, threecolumns and periphery architectures respectively. More connectivity implies more switching activity, link traversal and arbitration, and hence higher power consumption. For this reason, we achieved power savings of 32% and 11% with threecolumns and periphery respectively, compared to the 3D mesh. On the other hand, in terms of normalized latency, 3D saves only 5% and 7% of average packet latency compared to that recorded for the periphery and threecolumns architectures. Architectures with a total of 25% 3D routers, onecolumn and diagonal, have the highest power savings of 67% and 65% compared to the power consumption of the 3D mesh. Next in line are architectures with a total of 50% 3D routers: twocolumns and OE-2D3D. With twocolumns and OE-2D3D, we achieved power savings of 53% and 59% respectively, compared to the 3D mesh.

Figure 9. Normalized Power Consumption of Heterogeneous 3D NoC Architectures

Besides saving power by introducing heterogeneity, we also achieved better area efficiency. As shown in Table 1, area efficiency values of 19.7%, 13.2% and 6.6% are obtained with heterogeneous architectures of 25%, 50% and 75% hybrid routers respectively. These values were obtained analytically by adopting parameters from a 90nm crossbar area estimation for a 2D and a 3D hybrid router presented in [7]. The area saved by introducing heterogeneity can be used to improve SoC design by accommodating other functional blocks.
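The power savings quoted in this section are the complement of the normalized power values. As a quick check, the sketch below groups the reported normalized values by the share of 3D routers and converts each to a saving relative to the 3D mesh. The grouping and the names `NORMALIZED_POWER`, `power_savings` and `SAVINGS` are illustrative choices, not part of the simulation platform.

```python
# Normalized power (% of the full 3D mesh), grouped by share of 3D routers.
# The values are the ones reported in the text; the grouping is illustrative.
NORMALIZED_POWER = {
    25: [33, 35],  # onecolumn and diagonal group
    50: [41, 47],  # twocolumns and OE-2D3D group
    75: [68, 89],  # threecolumns and periphery group
}


def power_savings(normalized_percent):
    """Saving relative to the 3D mesh, in percent."""
    return 100 - normalized_percent


SAVINGS = {
    share: [power_savings(p) for p in values]
    for share, values in NORMALIZED_POWER.items()
}

for share, values in SAVINGS.items():
    print(f"{share}% 3D routers: savings of {values[0]}% and {values[1]}%")
```

Grouping by router share makes the trend visible: each additional 25% of 3D routers costs a further share of the saving.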

Table 1. Area Efficiency of 4 × 4 × 4 Heterogeneous 3D Mesh Architectures

Pattern         Efficiency compared with 3D mesh
Onecolumn       19.7%
Twocolumns      13.2%
Diagonal        19.7%
Threecolumns    6.6%
Periphery       6.6%
OE-2D3D         13.2%
3D              0%
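The entries of Table 1 track the share of hybrid routers in each pattern. The sketch below builds each pattern of Section 4 as a per-layer boolean mask for n = 4 (True marks a 3D NoC-bus hybrid router) and reports that share together with an estimated area efficiency. Several details are assumptions made for illustration only: columns are numbered from 0, the diagonal is the main diagonal, the 2D columns of mcolumn sit on the left, and the per-router crossbar saving of about 26.3% is back-calculated from Table 1 rather than taken from [7].

```python
def oe_2d3d(n):
    # 3D routers on odd columns, 2D routers on even columns (0-based, assumed).
    return [[col % 2 == 1 for col in range(n)] for _ in range(n)]


def diagonal(n):
    # 3D routers on the main diagonal of the layer (assumed orientation).
    return [[row == col for col in range(n)] for row in range(n)]


def periphery(n):
    # 3D routers on the outer ring of the layer.
    edge = {0, n - 1}
    return [[row in edge or col in edge for col in range(n)] for row in range(n)]


def mcolumn(n, m):
    # (n - m) adjacent 2D columns on the left, m 3D columns on the right (assumed).
    return [[col >= n - m for col in range(n)] for _ in range(n)]


def share_3d(mask):
    """Fraction of routers in a layer that are 3D hybrid routers."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)


# Back-calculated from Table 1 (19.7% / 0.75); an assumption, not a value from [7].
SAVING_PER_2D_ROUTER = 0.263


def area_efficiency(mask):
    """Estimated crossbar area saved versus an all-hybrid mesh, in percent."""
    return 100 * (1 - share_3d(mask)) * SAVING_PER_2D_ROUTER


N = 4
PATTERNS = {
    "onecolumn": mcolumn(N, 1),
    "twocolumns": mcolumn(N, 2),
    "diagonal": diagonal(N),
    "threecolumns": mcolumn(N, 3),
    "periphery": periphery(N),
    "OE-2D3D": oe_2d3d(N),
    "3D": mcolumn(N, N),  # every router is a hybrid router
}

for name, mask in PATTERNS.items():
    print(f"{name:13s} {100 * share_3d(mask):5.1f}% 3D  "
          f"{area_efficiency(mask):5.1f}% area saved")
```

Under this linear model, patterns with the same share of hybrid routers have the same area efficiency, which is why Table 1 contains only three distinct non-zero values.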


A potentially optimal heterogeneous 3D NoC architecture is periphery, which has 10-20% power and area efficiency with a negligible average packet latency penalty of 5% compared to a 3D mesh. It is, however, up to the designer to decide on the architecture which gives the optimal performance and power trade-off for a particular system of interest.
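The latency side of this trade-off depends on how far a flit must travel within a layer to reach a vertical bus. The following is a minimal sketch of the bus selection step of the modified XYZ routing described in this section, not the simulator's code. It interprets "shortest Manhattan distance" as the in-layer detour from source to bus to destination, which is an assumption, and the function names, the (x, y, z) coordinate tuples and the `bus_positions` list of (x, y) pillar locations are hypothetical.

```python
import random


def manhattan(a, b):
    """In-layer Manhattan distance between two positions (x, y, ...)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def select_bus(src, dst, bus_positions, rng=random):
    """Pick the vertical bus for a flit going from src to dst, both (x, y, z).

    Returns None when source and destination share a layer (plain XY routing).
    Otherwise returns the (x, y) of a bus that minimizes the in-layer detour
    src -> bus -> dst, choosing at random among equally short candidates.
    """
    if src[2] == dst[2]:
        return None
    cost = {bus: manhattan(src, bus) + manhattan(bus, dst)
            for bus in bus_positions}
    best = min(cost.values())
    candidates = sorted(bus for bus, c in cost.items() if c == best)
    return rng.choice(candidates)


def xy_hops(src, dst, bus_positions, rng=random):
    """In-layer hop count of the chosen route (the bus crossing is extra)."""
    bus = select_bus(src, dst, bus_positions, rng)
    if bus is None:
        return manhattan(src, dst)
    return manhattan(src, bus) + manhattan(bus, dst)
```

For example, with buses on the outer ring of a 4 × 4 layer, a flit leaving the interior 2D router at (1, 1, 0) for another layer has two equally short buses to choose from, whereas with 3D routers only in columns 1 to 3 a flit leaving (0, 1, 0) has a single nearest bus. This mirrors the observation that periphery offers more shortest vertical paths than threecolumns.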

6. CONCLUSION
In this paper, we have presented low power heterogeneous 3D NoC architectures for modern SoC systems based on distributions of 3D NoC-bus hybrid routers and conventional 2D routers. Different router distributions and vertical interconnect patterns were modeled and studied under the 3D mesh topology with a cycle accurate simulator under a uniform traffic pattern. Experimental analysis shows that power savings ranging from 11% to as high as 67% relative to 3D NoCs could be achieved with a reduced number of TSVs. By adopting hybrid routers, better power efficiency is obtained compared to equivalent architectures with symmetric routers. On the other hand, a latency difference as close as 5% to that of the 3D mesh was recorded with heterogeneity. Based on our evaluation, it can be confirmed that the right choice of heterogeneous 3D NoC architecture helps reduce the total power consumption with an insignificant penalty in NoC performance.

REFERENCES

[1] K. S. Alexandros Bartzas and D. Soudris. NETWORKS-ON-CHIPS: THEORY AND PRACTICE, pages 1–28. Taylor and Francis Group, LLC, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742, 2009.

[2] E. Beyne and B. Swinnen. "3d system integration technologies," In IEEE International Conference on Integrated Circuit Design and Technology (ICICDT), pages 1–3, 2007.

[3] C. Grecu, P. P. Pande, A. Ivanov, and R. Saleh. "Timing analysis of network on chip architectures for MP-SoC platforms," Microelectronics Journal, 36(9):833–845, 2005.

[4] J. Hu and R. Marculescu. "Energy- and performance-aware mapping for regular NoC architectures," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(4):551–562, 2005.

[5] A. Jantsch and H. Tenhunen. NETWORKS ON CHIP, Kluwer Academic Publishers, 2003.

[6] J. Joyner, R. Venkatesan, P. Zarkesh-Ha, J. Davis, and J. Meindl. "Impact of three-dimensional architectures on interconnects in gigascale integration," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9(6):922–928, 2001.

[7] J. Kim, C. Nicopoulos, D. Park, R. Das, Y. Xie, V. Narayanan, M. S. Yousif, and C. R. Das. "A novel dimensionally-decomposed router for on-chip communication in 3D architecture," SIGARCH Comput. Archit. News, 35(2):138–149, 2007.

[8] F. Li, C. Nicopoulos, T. Richardson, Y. Xie, V. Narayanan, and M. Kandemir. "Design and Management of 3D Chip Multiprocessors Using Network-in-Memory," In International Symposium on Computer Architecture (ISCA), pages 130–141, 2006.

[9] I. Loi, F. Angiolini, and L. Benini. "Supporting vertical links for 3D networks-on-chip: toward an automated design and analysis flow," In Proceedings of the 2nd international conference on Nano-Networks (Nano-Net), pages 1–5, ICST, Brussels, Belgium, 2007.

[10] R. Marculescu, U. Ogras, L.-S. Peh, N. Jerger, and Y. Hoskote. "Outstanding research problems in noc design: System, microarchitecture, and circuit perspectives," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(1):3–21, 2009.

[11] G. D. Micheli and L. Benini. NETWORKS ON CHIPS: TECHNOLOGY AND TOOLS, Morgan Kaufmann, 2006.

[12] J. Owens, W. Dally, R. Ho, D. Jayasimha, S. Keckler, and L.-S. Peh. "Research challenges for on-chip interconnection networks," IEEE Micro, 27(5):96–108, 2007.

[13] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh. "Performance evaluation and design trade-offs for network-on-chip interconnect architectures," IEEE Transactions on Computers, 54(8):1025–1040, 2005.

[14] S. Pasricha and N. Dutt. ON-CHIP COMMUNICATION ARCHITECTURES: SYSTEM ON CHIP INTERCONNECT, Morgan Kaufmann, 2008.

[15] C. Seiculescu, S. Murali, L. Benini, and G. De Micheli. "SunFloor 3D: A tool for Networks On Chip topology synthesis for 3D systems on chips," In Design, Automation Test in Europe Conference (DATE), pages 9–14, 2009.

[16] D. Velenis, M. Stucchi, E. Marinissen, B. Swinnen, and E. Beyne. "Impact of 3d design choices on manufacturing cost," In IEEE International Conference on 3D System Integration (3DIC), pages 1–5, 2009.

[17] T. Xu, P. Liljeberg, and H. Tenhunen. "A study of Through Silicon Via impact to 3D Network-on-Chip design," In 2010 International Conference On Electronics and Information Engineering (ICEIE), pages 333–337, 2010.

[18] T. T. Ye, G. D. Micheli, and L. Benini. "Analysis of power consumption on switch fabrics in network routers," In Proceedings of the 39th annual Design Automation Conference (DAC), pages 524–529, 2002.