DTBR: A dynamic thermal-balance routing algorithm for Network-on ...

Computers and Electrical Engineering xxx (2012) xxx–xxx

Contents lists available at SciVerse ScienceDirect

Computers and Electrical Engineering journal homepage: www.elsevier.com/locate/compeleceng

DTBR: A dynamic thermal-balance routing algorithm for Network-on-Chip

Feiyang Liu a, Huaxi Gu a,⇑, Yintang Yang b

a State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an 710071, China
b Institute of Microelectronics, Xidian University, Xi’an 710071, China

Article info

Article history: Received 4 June 2010; received in revised form 16 December 2011; accepted 16 December 2011; available online xxxx

Abstract

Network-on-Chip (NoC) is replacing the traditional bus-based architecture as the mainstream design methodology for future complex System-on-Chip (SoC) designs. It brings the principles of packet switching and interconnection networks into SoC design and achieves much better performance owing to its high bandwidth, scalability, and reliability. However, thermal problems, such as regional temperature differentials and hotspots, remain one of the main design constraints. This paper proposes a dynamic thermal-balance routing (DTBR) algorithm for Network-on-Chip that addresses both thermal problems. DTBR is a minimal adaptive routing algorithm based on an architectural thermal model. An efficient thermal-aware router is designed to implement the DTBR algorithm. According to the simulation results, the proposed DTBR algorithm makes the network thermal distribution more uniform and reduces the hotspot temperature by about 20% under different traffic patterns. Moreover, DTBR improves packet delay and network throughput compared with other routing algorithms.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

As semiconductor technology continues to develop, more and more transistors can be integrated into a single chip [1,2], and the era of multi-billion-transistor chips is arriving. Such highly integrated on-chip systems face severe thermal problems because of continuously increasing on-chip power density and cooling cost. On-chip thermal problems strongly affect system performance, for example by increasing latency and leakage power and by reducing system reliability [3].

Network-on-Chip is replacing the traditional bus-based architecture as the mainstream design methodology for future System-on-Chip [4–7]. It brings the principles of packet switching and interconnection networks into SoC design. The long wire delay, low throughput, and poor scalability of the traditional bus-based architecture are all addressed by the NoC architecture. Network-on-Chip achieves much better performance owing to its path diversity, scalability, and reliability. However, thermal problems remain one of the main design constraints for NoC [8].

There are two main kinds of thermal problems in Network-on-Chip: regional temperature differentials and hotspots. A regional temperature differential is caused by an unbalanced thermal distribution in the network. It makes link latency and gate latency hard to predict, which increases the possibility of system synchronization failure. A hotspot is a node whose temperature is much higher than that of the other nodes in the network. A hotspot forms when a node processes too much data and generates a large amount of dynamic power consumption. A hotspot node is easily damaged by its high temperature. Both regional temperature differentials and hotspots decrease system reliability and degrade system performance.

First, this paper proposes an architectural thermal model for Network-on-Chip that takes both energy transformation and thermal conductivity into consideration.
⇑ Corresponding author. E-mail address: [email protected] (H. Gu).
0045-7906/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.compeleceng.2011.12.006

Please cite this article in press as: Liu F et al. DTBR: A dynamic thermal-balance routing algorithm for Network-on-Chip. Comput Electr Eng (2012), doi:10.1016/j.compeleceng.2011.12.006

This thermal model is efficient enough to achieve
on-chip thermal management. Based on an analysis of the thermal model, we design a dynamic thermal-balance routing algorithm for NoC, DTBR. To achieve low packet delay, the DTBR algorithm uses minimal adaptive routing. An escaping virtual channel keeps the routing algorithm deadlock-free. We also design an efficient thermal-aware router to implement the proposed routing algorithm. Compared with a typical on-chip router, it requires only a negligible amount of additional control logic. As the simulation results illustrate, the DTBR algorithm achieves better performance in both thermal management and data transmission. It reduces the hotspot temperature by about 20% and improves packet delay and network throughput under different traffic patterns compared with XY routing and the turn-model-based routing algorithms.

The main contributions of this paper are threefold.

(1) We propose an architectural on-chip thermal model that takes both energy transformation and thermal conductivity into consideration.
(2) Based on the thermal model, we propose a dynamic thermal-balance routing algorithm, DTBR, which achieves good performance in thermal balance and data transmission.
(3) We design an efficient thermal-aware router to implement the DTBR algorithm.

The rest of this paper is organized as follows. Section 2 discusses related work on NoC and thermal management mechanisms. Section 3 describes an architectural thermal model for Network-on-Chip. Section 4 discusses the dynamic thermal-balance routing algorithm, DTBR, in detail, together with the thermal-aware router. Section 5 describes the cycle-accurate NoC simulator used to evaluate the proposed DTBR algorithm. The simulation results are analyzed in Section 6. Finally, in Section 7 we conclude the paper.

2. Related works

2.1. Network-on-Chip architecture

Fig. 1 illustrates a basic mesh-based Network-on-Chip architecture [9].
The main components of NoC are IP cores, network interfaces (NI), routers, and physical links. IP cores are the functional units of the digital system, such as CPU, DSP, memory, and I/O units. The network interface allows heterogeneous IP cores with different protocols to communicate transparently. The NI also performs packet encapsulation, end-to-end flow control, and packet reordering. Routers, connected by the physical links, are the most important components of the NoC architecture. They guarantee efficient data communication between any two IP cores in the system.

The routing algorithm determines the route each packet takes from its source to its destination [10]. It is critical to the performance of Network-on-Chip. NoC routing algorithms can be divided into deterministic routing and adaptive routing. In deterministic routing, the route of each packet is determined only by its source and destination addresses, whereas an adaptive routing algorithm chooses the path dynamically according to the network condition. Recently proposed adaptive routing algorithms include turn-model-based routing [11], odd–even routing [12], and DyAD routing [13]. In [14], the authors propose a table-based adaptive minimal routing algorithm for NoC.

The virtual-channel router is one of the most promising router architectures for NoC owing to its low latency and high throughput. Dally and Towles proposed the basic virtual-channel router architecture for interconnection networks [15]. They point out that the virtual-channel router works in a pipelined fashion. In [16], the authors propose a low-latency virtual-channel router using speculative arbitration and look-ahead routing. The deadlock problem in adaptive routing algorithms can be solved by properly allocating the virtual channels [17].
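As an illustration of the deterministic routing described above, XY routing on a 2D mesh can be sketched in a few lines. This is a minimal sketch rather than code from the paper: the coordinate convention (x grows eastward, y grows southward) and the function names `xy_next_hop` and `xy_path` are assumptions made for the example.

```python
def xy_next_hop(current, destination):
    """Return the output direction chosen by XY routing.

    Assumed convention (not from the paper): nodes are (x, y) tuples,
    x increases toward the east and y increases toward the south.
    """
    cx, cy = current
    dx, dy = destination
    if cx != dx:                      # first correct the X offset
        return "east" if dx > cx else "west"
    if cy != dy:                      # then correct the Y offset
        return "south" if dy > cy else "north"
    return "local"                    # the packet has arrived


def xy_path(source, destination):
    """List the directions taken from source to destination."""
    step = {"east": (1, 0), "west": (-1, 0), "south": (0, 1), "north": (0, -1)}
    path, node = [], source
    while node != destination:
        direction = xy_next_hop(node, destination)
        path.append(direction)
        node = (node[0] + step[direction][0], node[1] + step[direction][1])
    return path
```

Because the result depends only on the two addresses, every packet between the same pair of nodes follows the same route regardless of the network load, which is what distinguishes it from an adaptive algorithm.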

Fig. 1. Basic 2D mesh NoC architecture. Legend: Core = IP core, NI = network interface, R = router; routers are connected by physical links.


2.2. Thermal management mechanisms

To design an efficient thermal management mechanism for Network-on-Chip, an accurate thermal profile of the whole network is a prerequisite. Using thermal sensors is a direct and accurate way to obtain the network thermal distribution [18,19]. However, this method needs additional control links to transmit the thermal signals, and the number of thermal sensors and their placement must be calculated precisely. Another way to obtain the network thermal profile is to set up a thermal model for Network-on-Chip [20]. This method avoids the hardware cost of thermal sensors, and it is efficient enough to implement on-chip thermal management.

Recently proposed thermal management mechanisms can be divided into design-time optimization (DTO) and dynamic thermal management (DTM) [21]. In design-time optimization methods, thermal balancing is completed during offline design, taking the worst-case situation into consideration. DTO mechanisms mainly include thermal-aware task mapping, voltage and frequency scaling, etc. A hardware-based thermal balance mechanism is proposed in [22], in which different clock frequencies and voltages are allocated to each node according to the thermal information generated in the node. In [23], the authors optimize the position of IP cores to achieve thermal balance. Design-time optimization methods are effective only when the system is stable, and they always optimize the NoC thermal distribution at the cost of data transmission performance.

Dynamic thermal management methods regulate the network thermal distribution dynamically according to the current thermal condition. They can be divided into reactive DTM and proactive DTM [24]. Reactive DTM acts only when a thermal emergency occurs and avoids the thermal problem at the expense of network performance.
Proactive DTM is a more efficient way to achieve average-case performance because it dynamically balances the thermal distribution of the NoC. In [25], the authors propose a thermal management mechanism that combines hardware-based and software-based methods. The hardware-based method regulates the clock frequency or supply voltage according to the node temperature, while the software-based method relies on task migration. In [26], a multi-clock-frequency thermal balance mechanism is proposed, in which different clock frequencies are dynamically allocated so that thermal regulation is carried out independently for each node. In [27], the authors propose a workload migration method in which hotspot traffic is migrated to other nodes to prevent hotspots from forming. However, this workload migration needs some redundant modules to handle the hotspot traffic. A thermal management mechanism that uses workload migration and DVFS simultaneously is proposed in [28].

3. Thermal model

3.1. Basic thermal model

Two main factors determine the node temperature in Network-on-Chip:

(1) Energy transformation. Each node consumes energy, and the electrical energy is transformed into heat.
(2) Thermal conduction. Heat is conducted among nodes with different temperatures.

Our thermal model takes both energy transformation and thermal conduction into consideration. The current temperature of node i can be expressed as

\[ T_i = T_i^{(0)} + T_T + T_C \tag{1} \]

where $T_T$ and $T_C$ are the temperature contributions of energy transformation and thermal conduction, respectively, and $T_i^{(0)}$ is the initial temperature of node i. $T_T$ is directly associated with the energy consumption of node i. We use a coefficient h (°C/J) to relate the node temperature to its energy consumption. For example, h = 1 °C/J means that the node temperature increases by 1 °C when the node consumes 1 J of energy. $T_T$ can be expressed as

\[ T_T = h \cdot E_{total} \tag{2} \]

where $E_{total}$ is the total energy consumed by node i during system operation. To calculate $T_C$, we introduce the thermal resistance coefficient R to describe the thermal conduction between nodes. It is determined by the material properties of the chip and the distance between routers. $T_C$ can be expressed as

\[ T_C = \sum_{j} R_{ji} \cdot \Delta T_{ji} \tag{3} \]

where $R_{ji}$ represents the thermal conductivity between node j and node i, and $\Delta T_{ji}$ is the temperature differential between node j and node i. According to Eqs. (1)–(3), the temperature of node i can be expressed as

\[ T_i = T_i^{(0)} + h \cdot E_{total} + \sum_{j} R_{ji} \cdot \Delta T_{ji}. \tag{4} \]

The initial temperature of each node is determined by the environmental temperature. If all the nodes have the same initial temperature $T_i^{(0)}$,

\[ \Delta T_{ji} = T_j - T_i \approx h \cdot \Delta E_{ji} \tag{5} \]


where $\Delta E_{ji}$ is the difference in energy consumption between node j and node i. The temperature of node i can be approximately expressed as

\[ T_i = T_i^{(0)} + h \cdot \Bigl( E_{total} + \sum_{j} R_{ji} \cdot \Delta E_{ji} \Bigr). \tag{6} \]
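The step from Eq. (4) to Eq. (6) can be written out explicitly. The lines below substitute the approximation of Eq. (5) into Eq. (4) and factor out h; the summation index j, taken here to run over the nodes that exchange heat with node i, is an assumption about the notation.

\[
\begin{aligned}
T_i &= T_i^{(0)} + h \cdot E_{total} + \sum_{j} R_{ji} \cdot \Delta T_{ji} \\
    &\approx T_i^{(0)} + h \cdot E_{total} + \sum_{j} R_{ji} \cdot h \cdot \Delta E_{ji} \\
    &= T_i^{(0)} + h \cdot \Bigl( E_{total} + \sum_{j} R_{ji} \cdot \Delta E_{ji} \Bigr)
\end{aligned}
\]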

3.2. Energy consumption

According to the results in [29], the router consumes about the same amount of energy as the IP core in Network-on-Chip. In [30], a high-level power model is proposed for the NoC router. We mainly consider the energy consumption of the network, which includes routers and links; IP cores can use separate cooling devices to solve their thermal problems.

Energy consumption is directly associated with the traffic load in the network. Each flit consumes energy when it passes through the routers and physical links. As Fig. 2 shows, each flit goes through several pipeline stages in a virtual-channel router. When a head flit is received at an input port, it is stored in the corresponding virtual-channel buffer. It then goes through routing computation (RC), virtual-channel allocation (VA), and switch allocation (SA) in a pipelined fashion. Once switch allocation is finished, the head flit is forwarded to the next router or the local IP core in the following cycle. Data flits carry no routing information. They are transmitted along the route set up by the head flit and do not go through the RC and VA stages. In each pipeline stage, the virtual-channel router performs a specific operation and consumes energy dynamically. The energy consumed by the head flit and by each data flit can be expressed as Eqs. (7) and (8), respectively.

\[ E_{head\,flit} = E_{receive} + E_{RC} + E_{VA} + E_{SA} + E_{forward}. \tag{7} \]

\[ E_{data\,flit} = E_{receive} + E_{SA} + E_{forward}. \tag{8} \]

We assume that all the packets transmitted in the network have a fixed size of P flits, so the average energy consumption of transmitting a flit can be expressed as

\[ E_{flit} = E_{receive} + E_{SA} + E_{forward} + \frac{1}{P} (E_{RC} + E_{VA}). \tag{9} \]
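Eq. (9) follows from averaging one head flit and P − 1 data flits over a packet of P flits. The short sketch below computes the average that way, so the result can be compared with the closed form; the function name and the stage energies in the usage note are hypothetical placeholders in arbitrary units, not values from the paper.

```python
def average_flit_energy(e_receive, e_rc, e_va, e_sa, e_forward, packet_size):
    """Average energy per flit for a packet of packet_size flits.

    The packet is assumed to contain one head flit and
    packet_size - 1 data flits, as in Eqs. (7)-(9).
    """
    e_head = e_receive + e_rc + e_va + e_sa + e_forward   # Eq. (7)
    e_data = e_receive + e_sa + e_forward                 # Eq. (8)
    # Averaging over the packet reproduces Eq. (9).
    return (e_head + (packet_size - 1) * e_data) / packet_size
```

For instance, with placeholder stage energies of 1.0 (receive), 0.5 (RC), 0.5 (VA), 1.0 (SA), and 2.0 (forward) and P = 8, both this average and the closed form of Eq. (9) give 4.125 units per flit.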

Thus, the energy consumption of the node i can be expressed as

\[ E_i = E_{flit} \cdot N_i \tag{10} \]

where $N_i$ is the total number of flits transmitted by node i. Finally, according to Eqs. (6) and (10), the temperature of node i can be expressed as

\[ T_i = T_i^{(0)} + h \cdot E_{flit} \cdot \Bigl( N_i + \sum_{j} R_{ji} \cdot \Delta N_{ji} \Bigr). \tag{11} \]

From Eq. (11), we find that if the traffic distribution is balanced, that is, $\Delta N_{ji} = 0$, all the nodes in the network have the same temperature,

\[ T = T^{(0)} + h \cdot E_{flit} \cdot N. \tag{12} \]
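The traffic-based temperature estimate of Eq. (11) can be sketched as a small function. This is an illustration under stated assumptions: the function name, the representation of neighbours as (R_ji, N_j) pairs, and the reading of ΔN_ji as N_j − N_i (by analogy with ΔT_ji in Eq. (5)) are choices made for the example, and the values of E_flit and R_ji in the usage note are hypothetical.

```python
def node_temperature(t0, h, e_flit, n_i, neighbours):
    """Estimate a node temperature following Eq. (11).

    t0         initial temperature of the node (degrees Celsius)
    h          temperature/energy coefficient (degrees Celsius per joule)
    e_flit     average energy per flit (joules), see Eq. (9)
    n_i        number of flits transmitted by this node
    neighbours list of (r_ji, n_j) pairs, one per neighbouring node j,
               where n_j is the flit count of that neighbour
    """
    # Conduction term: sum of R_ji * dN_ji with dN_ji = n_j - n_i (assumed).
    conduction = sum(r_ji * (n_j - n_i) for r_ji, n_j in neighbours)
    return t0 + h * e_flit * (n_i + conduction)
```

With a 25 °C initial temperature, h = 3 °C/J, a hypothetical E_flit of 0.001 J, and 1000 flits at every node, the conduction term vanishes and the result is 28 °C, which is the balanced case of Eq. (12). Raising one neighbour to 1200 flits with a hypothetical R_ji of 0.1 lifts the estimate to 28.06 °C.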

4. Dynamic thermal-balance routing algorithm

4.1. DTBR algorithm

Based on the thermal model presented in Section 3, we propose a dynamic thermal-balance routing algorithm for NoC, DTBR. It is a minimal adaptive routing algorithm that distributes the network temperature uniformly across the whole chip. DTBR achieves a balanced thermal distribution as well as good transmission performance.
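The idea of minimal adaptive routing driven by temperature can be illustrated with a short sketch. The exact procedure of the paper is given in its Fig. 3, which is not reproduced in this text, so the selection rule below (choose the coolest neighbour among the directions that lie on a shortest path, with ties resolved in favour of the X dimension) is an assumption for illustration only. The coordinate convention and the `neighbour_temperature` mapping are likewise hypothetical.

```python
def minimal_directions(current, destination):
    """Directions that move a packet closer to its destination on a 2D mesh.

    Assumed convention: nodes are (x, y), x grows east, y grows south.
    """
    cx, cy = current
    dx, dy = destination
    directions = []
    if dx > cx:
        directions.append("east")
    elif dx < cx:
        directions.append("west")
    if dy > cy:
        directions.append("south")
    elif dy < cy:
        directions.append("north")
    return directions


def thermal_aware_next_hop(current, destination, neighbour_temperature):
    """Pick the coolest neighbour among the minimal directions.

    neighbour_temperature maps a direction name to the temperature
    reported by the neighbour in that direction (hypothetical interface).
    """
    candidates = minimal_directions(current, destination)
    if not candidates:
        return "local"
    # min() keeps the first candidate on ties, so X is preferred over Y.
    return min(candidates, key=lambda d: neighbour_temperature[d])
```

Since only directions on a shortest path are considered, the hop count is the same as with XY routing; only the order in which the X and Y offsets are corrected changes with the thermal state.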

Fig. 2. Flit transmission process in a virtual channel router. A head flit passes through the Receive, RC, VA, SA and Forward stages; a data flit passes through the Receive, SA and Forward stages.

The proposed DTBR algorithm uses thermal information to indicate the thermal condition of neighboring nodes, and adaptive routing is then performed to balance the network thermal distribution. The DTBR algorithm chooses a shortest path for each packet from its source to its destination, thus achieving low packet latency. The detailed thermal-balance routing algorithm is described in Fig. 3.

4.2. Deadlock recovery

To prevent the performance deterioration caused by deadlock, a deadlock recovery mechanism based on virtual-channel regulation is introduced. We use an additional virtual channel as the escaping path when a deadlock occurs. A packet is defined as a deadlock packet once it has been buffered in the input FIFO for longer than a predefined time threshold. The deadlock packet must release its allocated resources and is transmitted through the escaping virtual channel. Since the deadlock packet has released the resources it occupied, the network can recover from the deadlock state. The deadlock packet is routed to the destination node by the XY routing algorithm.

4.3. Router architecture

To implement the thermal management mechanism based on the DTBR algorithm, we add some control logic to the typical virtual-channel router. The thermal balance mechanism does not increase the number of router pipeline stages. As Fig. 4 illustrates, the thermal-aware router architecture only needs to add a thermal analyzer and some control links. Compared with the typical virtual-channel router, the additional hardware cost of the thermal-aware router is negligible.

The thermal-aware router operates similarly to the typical virtual-channel router. When the head flit of a new packet is received at an input port, it is stored in the corresponding virtual-channel buffer. The routing computation unit runs the proposed routing algorithm using the thermal information from neighboring nodes.
The virtual-channel allocation unit grants virtual-channel requests to the head flits after the RC stage. In our router architecture, the VA unit also implements the deadlock recovery mechanism. If a packet is defined as a deadlock packet, the VA unit releases the output virtual channel reserved by the head flit and allocates a predefined virtual channel to this packet. The switch allocation unit grants crossbar requests to each flit and controls the connection state of the crossbar. After the SA stage, each flit is transmitted from its output port. The data flits of the packet follow the route set up by the head flit to their destination. A thermal analyzer module detects the temperature of each node without disturbing the router’s operation. The thermal model presented in Section 3 is implemented in the thermal analyzer.

Fig. 3. Proposed dynamic thermal balance routing algorithm.

Fig. 4. Thermal-aware router architecture. Labeled components: RC unit, VC allocator, switch allocator, crossbar, thermal analyzer, thermal information input and output, and north, south, east, west and local input and output ports with DEMUX, MUX and VCID blocks.

5. Simulation environment

To evaluate the performance of the proposed thermal-balance routing algorithm, we set up a cycle-accurate NoC simulator using the SystemC language [31]. The kernel of the simulator is based on Noxim, an open-source NoC simulator [32]. We add virtual channels and management modules to Noxim so that it can simulate the virtual-channel

router based NoC. The simulator is highly parameterized and can be used to simulate delay and throughput performance with different network sizes, packet injection rates, traffic patterns, buffer depths, and numbers of virtual channels.

In the simulation, we use an 8 × 8 2D mesh Network-on-Chip. Each physical channel includes four virtual channels in the router architecture, with one virtual channel used as the escaping channel for deadlock recovery. Each input virtual channel has a separate buffer, and each buffer can store four data flits. Each node generates packets according to a Poisson distribution, and each packet has a fixed size of eight flits.

We compare the proposed DTBR algorithm with XY routing and three turn-model-based routing algorithms (west-first, north-last, and negative-first) under uniform, transpose, hotspot, and bit-reversal traffic patterns. The performance comparison covers three aspects: average packet delay, network saturation injection rate, and network thermal distribution. To ensure that the simulation is accurate enough, we set the simulation time to 200,000 cycles, with 10,000 cycles as warm-up time. We also run the simulation several times at each injection rate and take the average value as the final simulation result.

6. Performance analysis

6.1. Average packet delay

Packet delay is a significant parameter for measuring network performance. In the simulation, we define the packet delay as the time interval from the generation of a packet at the source node to its final reception at the destination node. We use the average packet delay D to illustrate the communication delay performance of the network. It is defined as



\[ D = \frac{1}{N} \sum_{i=1}^{N} D_i. \tag{13} \]

where N is the total number of packets received by the destination nodes and $D_i$ is the delay of packet i.

Tables 1–4 show the simulated average packet delay of the five routing algorithms under the four traffic patterns. These tables show that, for a given traffic pattern, different routing algorithms achieve different average packet delays, and that the performance of a given routing algorithm varies significantly across traffic patterns. From the simulation results in these tables, we obtain the trend of the average packet delay as the injection rate increases, as Fig. 5 shows. In Fig. 5, the average packet delay of the different routing algorithms follows the same trend in all traffic patterns. When the injection rate is low, on-chip

Table 1
Average packet delay of different routing algorithms in uniform traffic. Injection rate (flit/cycle/node)

0.08

0.10

0.12

0.14

0.16

0.18

0.20

0.22

0.23

XY West-first North-last Negativefirst DTBR

16.4838 16.7672 16.5775 16.7535

17.6118 18.0928 17.7954 18.0674

18.9376 19.6693 19.1957 19.6463

20.4826 21.5897 20.9032 21.4912

22.4652 23.9358 22.9962 23.8643

24.9393 27.3460 25.4876 26.9324

28.4694 32.6797 28.9496 31.1948

34.9259 46.3140 144.075 54.5383 157.516 33.4902 36.4569 39.9868 44.2024 50.9876 61.3923 78.2317 37.8453 44.9362 101.906

16.5089 17.7570 19.1236 20.8738 22.9769 25.6512 29.3616 34.9209

0.24

38.8805

0.25

0.26

0.27

0.28

44.5788 64.7719 88.7610

Table 2 Average packet delay of different routing algorithms in transpose traffic. Injection rate (flit/cycle/node)

0.08

0.10

0.12

0.13

0.14

XY West-first North-last Negative-first DTBR

16.6535 16.2617 16.2189 16.6585 16.5571

18.9866 17.8115 17.9262 19.1828 18.1380

24.2914 21.2735 21.5861 24.5681 20.1158

32.3438 24.5768 25.1874 31.4797 21.3369

93.7862 50.8927 64.6705 91.8161 22.6733

0.15

0.16

0.18

0.20

0.22

0.23

0.24

26.1430

30.8189

38.2395

52.0918

70.1245

255.158

502.108 567.091 24.3059

Table 3
Average packet delay (cycles) of different routing algorithms in hotspot traffic.

Injection rate       XY         West-first   North-last   Negative-first   DTBR
(flit/cycle/node)
0.08                 16.1381    16.2243      16.2585      16.3076          15.6705
0.09                 17.2124    17.2146      17.2109      17.1653          16.4162
0.10                 18.5428    18.4368      18.5374      18.3953          17.2730
0.11                 20.4850    19.8860      20.0682      19.8300          18.3060
0.12                 23.3468    21.7999      22.5637      21.8392          19.5267
0.13                 27.9654    24.4178      26.5272      24.3268          21.1504
0.14                 41.9795    28.6669      36.4464      28.1343          23.1950
0.15                 726.389    35.6772      679.821      34.0117          26.1869
0.16                 -          67.4082      -            53.4962          30.6853
0.17                 -          1653.78      -            1258.92          42.3008
0.18                 -          -            -            -                186.123

Table 4
Average packet delay (cycles) of different routing algorithms in bit-reversal traffic.

Injection rate       XY         West-first   North-last   Negative-first   DTBR
(flit/cycle/node)
0.06                 16.0427    16.3088      15.1785      15.3818          15.3267
0.07                 17.2906    17.4998      15.7096      16.0061          15.8719
0.08                 19.2660    19.4411      16.4027      16.5846          16.5128
0.09                 24.0080    23.9606      17.2006      17.2603          17.1878
0.10                 184.565    99.4942      18.1271      18.0058          17.9712
0.12                 -          -            21.5753      19.8507          19.6935
0.14                 -          -            48.2038      22.1751          22.0035
0.15                 -          -            596.255      23.7779          23.5689
0.16                 -          -            -            25.5022          25.3928
0.18                 -          -            -            30.5538          25.9755
0.20                 -          -            -            41.0016          27.3808
0.21                 -          -            -            54.3142          32.5189
0.22                 -          -            -            221.912          38.1928
0.23                 -          -            -            -                49.0875
0.24                 -          -            -            -                100.041

network transmits each packet with little blocking, and the average packet delay is about 20 cycles at this stage. As the injection rate increases, the packet delay increases accordingly. Before network saturation, this increase is modest, and the average packet delay is about 30–50 cycles. When the injection rate exceeds the network’s transmission capacity, the network becomes saturated and the average packet delay increases dramatically. The DTBR algorithm has a lower average packet delay than the other routing algorithms under both light and heavy traffic, except in the uniform traffic pattern.

Fig. 5. Packet delay of different routing algorithms in different traffic patterns: (a) uniform traffic pattern, (b) transpose traffic pattern, (c) hotspot traffic pattern, (d) bit-reversal traffic pattern. Each panel plots average delay (cycles) against injection rate (flits/cycle/node) for XY, west-first, north-last, negative-first and DTBR.

6.2. Saturation injection rate

The saturation injection rate represents the maximum traffic load that can be transmitted in the network. It is defined as the injection rate at which the average packet delay exceeds the saturation threshold. This saturation threshold

can be changed according to different system requirements. This paper sets the network saturation threshold to 60 cycles, about four times the non-blocking average packet delay. From Fig. 5, we obtain the saturation injection rate of all the routing algorithms in the four traffic patterns, as shown in Table 5.

In uniform traffic, the traffic load is evenly distributed in the network. All the routing algorithms have a similar saturation point except for north-last routing, whose saturation injection rate reaches 0.266 flits/cycle/node. The DTBR algorithm saturates when the injection rate reaches 0.236 flits/cycle/node, which is slightly better than XY, west-first, and negative-first routing. In the transpose traffic pattern, the traffic load is not uniformly distributed. XY routing and the turn-model-based routing algorithms saturate at a much lower injection rate. Since the DTBR algorithm is fully adaptive, it can distribute the network traffic effectively. The DTBR algorithm has a saturation rate of 0.224 flits/cycle/node, which is higher than that of the XY, west-first, north-last, and negative-first routing algorithms by 61.15%, 54.48%, 51.35%, and 63.5%, respectively.

Table 5
Saturation injection rate of different routing algorithms (flits/cycle/node).

Routing algorithm    Uniform    Transpose    Hotspot    Bit-reversal
XY                   0.232      0.139        0.144      0.099
West-first           0.224      0.145        0.158      0.103
North-last           0.266      0.148        0.146      0.140
Negative-first       0.234      0.137        0.162      0.210
DTBR                 0.236      0.224        0.173      0.236

In the hotspot traffic pattern, there are more packets

transmitted to the hotspot nodes, which is beyond the receiving capability of these nodes. Hence, the network becomes heavily congested as the injection rate increases, and all the routing algorithms saturate quickly. The DTBR algorithm can spread the hotspot traffic over a much larger region, so it outperforms its counterparts in this traffic pattern. With the DTBR algorithm, the network saturates at 0.173 flits/cycle/node, which is 20.14%, 9.49%, 18.49%, and 6.79% higher than that of XY routing and the three turn-model-based routing algorithms, respectively. Bit-reversal is an application-oriented traffic pattern, in which the DTBR algorithm also has a remarkable advantage over the other routing algorithms.

6.3. Thermal distribution

The network temperature distribution under different routing algorithms is obtained in the simulation. We assume that the initial temperature of each node, $T^{(0)}$, is 25 °C and that the coefficient h is 3 °C/J in the thermal model implemented in the thermal-aware router. To evaluate the thermal balance property of the different routing algorithms, the simulation is stopped when 500,000 packets have been received.

Fig. 6 shows the temperature distribution of different routing algorithms in the uniform traffic pattern. Although the traffic load is uniformly distributed in the network, routing algorithms such as XY routing and the turn-model-based routing algorithms are not symmetrical. They disturb the network thermal distribution and form a hotspot region in the center of the network. The DTBR algorithm distributes the node temperature much more uniformly in the network. With the DTBR algorithm, the maximum node temperature in the network is 34.99 °C, whereas it is 38.85 °C, 41.29 °C, 41.0 °C, and 40.82 °C with XY routing and the three turn-model-based routing algorithms. The DTBR algorithm cuts down 25.29% of the peak temperature compared to XY routing.
This performance improvement will be more significant compared with other routing algorithms. From this Figure, we can see that the temperature increments in the center of network will be about twice over the average temperature increment using XY routing or the turn model based routing algorithms. The DTBR algorithm can eliminate the network temperature differentials effectively. Four hotspot nodes are allocated in the center of the network in the simulation, which have 10% of hotspot traffic more than other nodes. As Fig. 7 illustrates, XY routing and the turn model based routing algorithms cannot process the hotspot traffic properly. Large amount of traffic is transmitted through the center of the network, making the temperature of the network center exceeds the average level. Using the DTBR algorithm, the hotspot traffic is distributed to a wider region.
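As an illustration of how a minimal adaptive choice driven by thermal information can steer traffic around a hot center, the following is a minimal sketch and not the DTBR selection function itself. It assumes a 2D mesh, a hypothetical per-node temperature map, and a simple rule that picks the cooler of the neighbors lying on a shortest path. The names `minimal_next_hops` and `route` and all temperature values are illustrative assumptions, and deadlock handling is left out.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (x, y) position in a 2D mesh


def minimal_next_hops(cur: Node, dst: Node) -> List[Node]:
    """Return the neighbors of `cur` that are one hop closer to `dst`."""
    (cx, cy), (dx, dy) = cur, dst
    hops: List[Node] = []
    if dx != cx:
        hops.append((cx + (1 if dx > cx else -1), cy))
    if dy != cy:
        hops.append((cx, cy + (1 if dy > cy else -1)))
    return hops


def route(src: Node, dst: Node, temperature: Dict[Node, float]) -> List[Node]:
    """Greedy walk: at every hop take the cooler minimal neighbor (assumed rule)."""
    path = [src]
    while path[-1] != dst:
        candidates = minimal_next_hops(path[-1], dst)
        # Ties go to the first candidate, i.e. the x direction.
        path.append(min(candidates, key=lambda n: temperature[n]))
    return path


# Hypothetical 8 x 8 temperature map: 30 °C everywhere, 40 °C at the four center nodes.
hot = {(3, 3), (3, 4), (4, 3), (4, 4)}
temperature = {(x, y): (40.0 if (x, y) in hot else 30.0)
               for x in range(8) for y in range(8)}

path = route((2, 2), (5, 5), temperature)
print(path)
```

In this toy map the route from (2, 2) to (5, 5) keeps its minimal length of six hops while passing around the four hot center nodes. A purely local rule like this cannot always avoid a hot node, for example when only one minimal direction remains.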

Fig. 6. Thermal distribution in uniform traffic.

Please cite this article in press as: Liu F et al. DTBR: A dynamic thermal-balance routing algorithm for Network-on-Chip. Comput Electr Eng (2012), doi:10.1016/j.compeleceng.2011.12.006


Fig. 7. Thermal distribution in hotspot traffic.

This wider distribution decreases the differentials of dynamic energy consumption among nodes, leading to a balanced thermal distribution. Compared with the other routing algorithms, the DTBR algorithm cuts the hotspot temperature increment in the center of the network by about 20%, and the node temperature is more evenly distributed.

Table 6 lists a detailed comparison of the thermal-balance performance of the five routing algorithms in uniform and hotspot traffic. The thermal-balance performance is compared under the condition that the same amount of heat is generated by each routing algorithm: every simulation is stopped when 500,000 packets have been received. Therefore, the average node temperatures of the routing algorithms are nearly equal in Table 6. The proposed DTBR algorithm narrows the gap between the maximum and minimum temperatures, and the peak node temperature of the network is clearly reduced. The mean square deviation is used to illustrate the thermal-balance property of the routing algorithms. In Table 6, the lowest mean square deviation is achieved by the DTBR algorithm in both uniform and hotspot traffic. The simulation results show that the DTBR algorithm distributes the node temperature more uniformly across the whole network without degrading the performance of data transmission.
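The role of the mean square deviation as a balance metric can be sketched as follows. This passage does not give the exact formula, so the sketch assumes the common definition (the mean of the squared deviations from the average node temperature). The function name and the two four-node temperature lists are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def mean_square_deviation(temps: Sequence[float]) -> float:
    """Mean of the squared deviations from the average temperature (assumed definition)."""
    avg = fmean(temps)
    return fmean([(t - avg) ** 2 for t in temps])


# Two hypothetical four-node maps with the same average temperature (30 °C).
balanced = [29.0, 31.0, 30.0, 30.0]
unbalanced = [25.0, 35.0, 26.0, 34.0]

print(mean_square_deviation(balanced))
print(mean_square_deviation(unbalanced))
```

Both lists have the same average, which mirrors the equal-heat condition of the comparison. Only the spread differs, and the metric is larger for the unbalanced list.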

Table 6
Thermal-balance performance of different routing algorithms (temperatures in °C).

Uniform traffic
Routing algorithm   Average temperature   Max temperature   Min temperature   Mean square deviation
XY                  33.6104               38.8475           26.4552           26.3499
West-first          33.6082               41.2936           25.1954           35.8667
North-last          33.5935               40.9983           25.2300           35.7321
Negative-first      33.6197               40.8210           25.4233           33.5760
DTBR                33.6876               34.9940           27.4759           16.1768

Hotspot traffic
Routing algorithm   Average temperature   Max temperature   Min temperature   Mean square deviation
XY                  32.6194               43.2075           25.9125           36.5788
West-first          32.5648               43.4133           25.1069           39.8112
North-last          32.5875               44.5737           25.1191           40.5401
Negative-first      32.5745               42.2942           25.2595           38.1741
DTBR                32.6300               37.4389           26.3712           25.0526
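The gap between maximum and minimum temperature discussed in the text can be derived directly from the Table 6 values, as in the sketch below. The numbers are copied from the table and the initial temperature of 25 °C is taken from Section 6.3. The helper names are assumptions made for this example, and the ratio of peak to average increment is only one possible way to quantify how far the hottest node stands out.

```python
T0 = 25.0  # initial node temperature in °C (Section 6.3)

# (average, max, min, mean square deviation), copied from Table 6.
TABLE6 = {
    "uniform": {
        "XY": (33.6104, 38.8475, 26.4552, 26.3499),
        "West-first": (33.6082, 41.2936, 25.1954, 35.8667),
        "North-last": (33.5935, 40.9983, 25.2300, 35.7321),
        "Negative-first": (33.6197, 40.8210, 25.4233, 33.5760),
        "DTBR": (33.6876, 34.9940, 27.4759, 16.1768),
    },
    "hotspot": {
        "XY": (32.6194, 43.2075, 25.9125, 36.5788),
        "West-first": (32.5648, 43.4133, 25.1069, 39.8112),
        "North-last": (32.5875, 44.5737, 25.1191, 40.5401),
        "Negative-first": (32.5745, 42.2942, 25.2595, 38.1741),
        "DTBR": (32.6300, 37.4389, 26.3712, 25.0526),
    },
}


def temperature_gap(row):
    """Difference between the maximum and minimum node temperature."""
    _, t_max, t_min, _ = row
    return t_max - t_min


def peak_to_average_increment(row):
    """Peak temperature rise over T0 divided by the average rise over T0."""
    avg, t_max, _, _ = row
    return (t_max - T0) / (avg - T0)


for traffic, rows in TABLE6.items():
    for name, row in rows.items():
        print(f"{traffic:8s} {name:15s} gap={temperature_gap(row):7.4f} "
              f"peak/avg increment={peak_to_average_increment(row):5.2f}")
```

The source does not state which baseline its quoted percentage reductions use, so percentages computed from these helpers need not match the figures given in the text.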


7. Conclusions

This paper proposes a dynamic thermal-balance routing algorithm, DTBR, to solve the on-chip thermal problem. The DTBR algorithm is based on an architectural thermal model, which takes both the energy transformation and thermal conduction properties into consideration. According to the thermal model, the network thermal distribution is closely related to the traffic load distribution. DTBR is a minimal adaptive routing algorithm, which dynamically chooses a better route for each packet according to the thermal information. A virtual-channel allocation based deadlock recovery scheme is introduced to keep the DTBR algorithm deadlock-free.

Further, to implement the DTBR algorithm, a thermal-aware router is designed. The thermal analyzer module analyzes the thermal condition of the current node using the proposed thermal model and exchanges thermal information with the neighboring nodes. The hardware overhead of the thermal-aware router is negligible compared with the basic virtual-channel router.

Finally, the DTBR algorithm is evaluated in an 8 × 8 mesh based NoC and compared with XY routing and three turn model based routing algorithms in four different traffic patterns. Simulation results show that the proposed DTBR algorithm makes the distribution of network temperature more uniform and that the hotspot temperature is cut down by more than 20% in the uniform and hotspot traffic patterns. Moreover, better average packet delay and network saturation injection rate are achieved using the DTBR algorithm. In our future work, the DTBR algorithm will be applied to other topologies for Network-on-Chip.

Acknowledgements

The authors would like to thank the editor and reviewers for their helpful comments to improve this paper. This work is supported partly by the National Science Foundation of China under Grant Nos. 60803038, 61070046 and 60725415, the special fund from State Key Lab (No. ISN1104001), the Fundamental Research Funds for the Central Universities under Grant No. K50510010010, the 111 Project under Grant No. B08038, and the fund from Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory under Grant No. ITD-U11009.

Feiyang Liu received the B.E. degree in Telecommunication Engineering from Xidian University, Xi’an, in 2005. He is now a graduate student in the State Key Lab of ISN, Xidian University, Xi’an, China. His current interests include Network-on-Chip, router architecture, and routing algorithms.

Huaxi Gu received the B.E., M.E. and Ph.D. degrees in Telecommunication Engineering and Telecommunication and Information Systems from Xidian University, Xi’an, in 2000, 2003 and 2005, respectively. He is an Associate Professor in the State Key Lab of ISN, Xidian University, Xi’an, China. His current interests include interconnection networks, networks on chip and optical interconnects. He has more than 60 publications in refereed journals and conferences.

Yintang Yang received the Ph.D. degree from the School of Technical Physics, Xidian University, majoring in semiconductors. He is now a professor in the Microelectronics Institute, Xidian University. He won the title of National Model Teacher and the Chinese Youth Science & Technology Award. He was selected into the “Trans-century Outstanding Talents Program” of the Ministry of Education, the National “Key Talents Program” and “New Century Key Talents Program”, and the Shaanxi Provincial “Trans-century Talents” Program. He has more than 100 publications in refereed journals and conferences.
