2010 IRAST International Congress on Computer Applications and Computational Science (CACS 2010)

Cross-Base Routing over Diagonalized Mesh Network on Chip

Mushtaq Ahmed, M. S. Gaur, Vijay Laxmi

Department of Computer Engineering, Malaviya National Institute of Technology, Jaipur, India

Abstract—Developing a new regular topology for Network on Chip (NoC) is a challenging task, as the proposed design must meet application-specific targets for latency, throughput, area, and energy. A suitable routing algorithm for a newly proposed topology must handle congestion and provide deadlock-free transmission. This paper discusses the design and implementation of a five-port regular NoC structure (excluding the core port) derived from the hexagonal structure, a cross-base routing algorithm for the proposed topology, and its comparison with the Odd-Even and North-Last adaptive routing algorithms. Cross-base routing on the proposed five-port two-dimensional regular NoC structure, with its simplified node numbering, improves latency and throughput compared with the other adaptive routing techniques.

II. COMMON NOC TOPOLOGIES

Scalability is one of the design issues for on-chip communication networks. As a result, regular topologies are generally used in general-purpose NoC. Mesh, Torus and Fat Tree are commonly used regular topologies in NoC [5, 13]. In a 2D Mesh [4], each router is connected to its four adjacent neighbours; boundary and corner routers are connected to three and two routers respectively, as shown in Figure 1(a). In a Torus, all nodes have four neighbours, symmetrically, as shown in Figure 1(b).

Keywords- Networks-on-Chip, Cross-Base, Multiport NoC, adaptive routing.

I. INTRODUCTION

Network-on-chip has emerged as an alternative to conventional on-chip communication networks. It is based on the “route packets, not wires” principle proposed by Dally [7]. Topology design for this on-chip network is one of the important issues in NoC research [1, 2]. Mesh, Torus, and Fat Tree are the most common regular topologies used in NoC architectures [2, 12]. In this paper, we propose a regular structure in which each node (except for edge and corner nodes) has five neighbours. This topology is derived from the hexagonal topology discussed in [16]. The only difference is that, unlike in HARTS, nodes can be mapped to a 2D mesh in a one-to-one manner. A new cross-base routing algorithm is proposed for this topology.

Figure 1: (a) 4x4 Mesh and (b) Torus topologies showing nodes and their interconnections.

III. PROPOSED TOPOLOGY

Hexagonal structures for NoC are discussed in [14, 15, 18]. The HARTS project used a hexagonal torus, and a hexagonal honeycomb structure is discussed in [17]. 2D and 3D hexagonal cubical structures, in which each node is connected to its six adjacent neighbours, are discussed in [14, 15]. In these works, nodes are numbered three-dimensionally in a scattered manner to provide shortest-path interconnection links. In a hexagonal structure, routers must be placed systematically.

The paper is organized as follows: Section II highlights common NoC topologies. Section III discusses our topology in detail and compares it with existing topologies. Scalability and cost issues are discussed in Section III.A. Section IV presents the cross-base routing algorithm. Section V briefly describes the NOXIM simulator used in our simulations. Section VI presents the experimental setup and contrasts the results of the new cross-base algorithm with those of the Odd-Even and North-Last routing algorithms. Concluding remarks and future work are given in Section VII.

978-981-08-6846-8/$26.00 ©2010 IRAST


The proposed topology is derived from the basic hexagonal structure, in which each node is connected to its three neighbouring nodes, as shown in Figure 2(a). A simplified addressing scheme akin to XY addressing in a 2D mesh can be achieved by adding columns, as shown in Figure 2(b). As can be seen from Figure 2(c), the hexagonal mesh is thereby modified into a 2D mesh with an additional diagonal, or cross, link. Figure 2(d) shows the modified topology, in which a diagonal link (referred to as a cross link in the subsequent discussion) is added along one diagonal and only in alternate columns to achieve five-neighbour connectivity (IP core excluded).


Figure 2: (a) Hexagonal topology, (b) rows and columns for the addressing scheme, (c) alternate view and (d) modified topology achieving 5-neighbour connectivity for intermediate nodes.

This rearrangement yields a simplified cross-base mesh structure that connects intermediate nodes diagonally with other generic nodes or tiles. Every intermediate node of the network has six ports, including one for the core. The number of directions is thus increased from four to five. Figures 3(a) to 3(d) show the four possible configurations, all of them regular structures. These configurations can be generated as follows:
1. Adding one South-East (SE) port for odd column tiles and one North-West (NW) port for even column tiles (Figure 3(a)).
2. Adding one South-East (SE) port for even column tiles and one North-West (NW) port for odd column tiles (Figure 3(b)).
3. Adding one North-East (NE) port for odd column tiles and one South-West (SW) port for even column tiles (Figure 3(c)).
4. Adding one North-East (NE) port for even column tiles and one South-West (SW) port for odd column tiles (Figure 3(d)).

Table 1 compares the regular architectures of 2D Mesh, Torus and the proposed five-port topology.
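As an illustration of configuration 1, the following Python sketch builds the link set of a small network and counts links and node degrees. It is only a sketch under stated assumptions that are not taken from the paper: columns are numbered from 0, y grows southward, and a port exists only where its neighbour exists. The names build_links and degrees are hypothetical.

```python
from collections import Counter


def build_links(cols, rows):
    """Return the undirected links of a cols x rows mesh with cross links.

    Configuration 1: odd columns get a South-East port, even columns the
    matching North-West port. Assumed here (not stated in the paper):
    0-based column numbers and y increasing towards the south.
    """
    links = set()
    for x in range(cols):
        for y in range(rows):
            # ordinary mesh links to the East and South neighbours
            if x + 1 < cols:
                links.add(((x, y), (x + 1, y)))
            if y + 1 < rows:
                links.add(((x, y), (x, y + 1)))
            # cross link: the SE port of an odd column meets the NW port
            # of the even column to its right, one row further south
            if x % 2 == 1 and x + 1 < cols and y + 1 < rows:
                links.add(((x, y), (x + 1, y + 1)))
    return links


def degrees(links):
    """Count the links attached to each node (the IP core port is excluded)."""
    deg = Counter()
    for a, b in links:
        deg[a] += 1
        deg[b] += 1
    return deg


links = build_links(5, 5)
deg = degrees(links)
# a cross link changes both the column and the row
cross = [(a, b) for a, b in links if a[0] != b[0] and a[1] != b[1]]
print(len(links), len(cross), min(deg.values()), max(deg.values()))
```

Under these assumptions a 5x5 network has eight cross links in addition to the mesh links, and node degrees range from two at a corner to five at an intermediate node (three to six once the core port is counted).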

Figure 3: Four possible configurations of the modified topology achieving 5-neighbour connectivity.

Table 1. Comparison of proposed topology with Mesh, Torus and Fat Tree

Parameter | 2D Mesh | Torus | Proposed | Fat Tree
Degree (incl. IP core) | 3-5 | 5 | 3-6 |
Diameter | $2\sqrt{n}-1$ | $\sqrt{n}$ | $2\sqrt{n}-1$ | $2(\log_2 n - 1) + 1$
Average distance | $\frac{2}{3}\sqrt{n}$ | $\frac{1}{2}\sqrt{n}$ | | $\frac{1}{n-1}\sum_{h=1}^{\log_2 n}(2h-1)2^{h-1}$
Wire cost | $2(n-\sqrt{n})$ | $2n$ | $2(n-\sqrt{n}) + \left\lfloor\frac{\sqrt{n}-1}{2}\right\rfloor(\sqrt{n}-1)$ |
Scaling | | | |

IV. ROUTING ALGORITHM

Routing algorithms select a set of channels, from all available channels, that are used to transmit flow control units (flits) from source to destination. A deterministic routing algorithm determines the route from a source to a destination as a function of the addresses of the source or intermediate node and the destination node. For a given source and destination, the route remains fixed and does not vary. As a result, deterministic routing is not amenable to congestion handling. XY routing is the most common deterministic routing method for 2D mesh-like topologies. Adaptive routing algorithms, on the other hand, use information about the state of the network to make routing decisions [10]. As the route from a source to a destination is not fixed in adaptive routing, deadlock avoidance is a major aspect of an adaptive routing algorithm.

Algorithm 1: Cross Base routing

    // first decide whether the current column is odd or even
    diff_x = destination.x - current.x;
    diff_y = -(destination.y - current.y);   // positive: destination is to the North
    if (diff_x == 0)                 // source and destination in the same column
        if (diff_y > 0)
            Enable North turn;
        else
            Enable South turn;
        endif
    elsif (diff_x > 0)               // destination to the East of source
        if (current.x is odd)        // odd column
            if (diff_y < 0)          // destination also to the South
                Enable SE turn;
            else
                Enable East turn;
            endif
        else
            Enable East turn;
        endif
    else                             // destination to the West of source
        if (current.x is odd)        // odd column
            Enable West turn;
        else
            if (diff_y > 0)          // destination also to the North
                Enable NW turn;
            else
                Enable West turn;
            endif
        endif
    endif

A. CROSS BASE ROUTING

A link is introduced between diagonal nodes, which provides direct communication and hence a shorter path between diagonally placed nodes. This is likely to reduce latency, as flits can be forwarded directly to the diagonal node without using the links in the same row or column. The introduction of an alternate path is also likely to delay the onset of congestion, as traffic is distributed over more links. Cross-base routing is deadlock free [9], as the dependency graph in Figure 4 contains no cycle (for simplicity, the extended dependency graph links are omitted).

Figure 4: Dependency graph for the cross-base routing

Cross-base routing is a slight modification of XY routing. Of the available links, preference is given to the cross link wherever possible; otherwise, the normal XY routing strategy is adopted. The proposed routing algorithm is shown in Algorithm 1, and Figure 5 illustrates its application to the topology. Figure 5(a) shows a path using cross links from source1 to destination1, whereas source2 cannot use a cross link to reach destination2 because none is available between them. Similarly, Figure 5(b) shows a cross-link path between the source3 and destination3 nodes and a normal route from source4 to destination4.
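The routing rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Noxim implementation. It assumes 0-based column numbers, y growing southward (so diff_y > 0 means the destination lies to the north), an SE cross link that is taken only when the destination lies further south, and an NW cross link that is taken only when it lies further north. The function names cross_base_next and route are hypothetical.

```python
def cross_base_next(cur, dst):
    """Return (direction, next_node) for one hop, or None at the destination."""
    (cx, cy), (dx, dy) = cur, dst
    diff_x = dx - cx
    diff_y = -(dy - cy)            # > 0: destination lies to the north
    odd_column = cx % 2 == 1       # assumption: columns are numbered from 0
    if diff_x == 0:                # same column: move vertically
        if diff_y == 0:
            return None
        if diff_y > 0:
            return ("N", (cx, cy - 1))
        return ("S", (cx, cy + 1))
    if diff_x > 0:                 # destination to the east
        if odd_column and diff_y < 0:
            return ("SE", (cx + 1, cy + 1))   # cross link of an odd column
        return ("E", (cx + 1, cy))
    # destination to the west
    if not odd_column and diff_y > 0:
        return ("NW", (cx - 1, cy - 1))       # cross link of an even column
    return ("W", (cx - 1, cy))


def route(src, dst):
    """Follow cross_base_next hop by hop and return the list of visited nodes."""
    path = [src]
    cur = src
    step = cross_base_next(cur, dst)
    while step is not None:
        cur = step[1]
        path.append(cur)
        step = cross_base_next(cur, dst)
    return path


east_path = route((1, 0), (3, 2))
west_path = route((2, 2), (0, 0))
print(east_path, west_path)
```

In both examples the route takes three hops, one of them over a cross link, where plain XY routing between the same nodes would need four.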

V. NOXIM SIMULATOR

The Noxim simulator [11], developed in SystemC, is freely available from SourceForge under GPL license terms. Noxim explores the design space using different NoC parameters such as size, routing algorithm, traffic pattern, file and buffer size, etc., and uses synthetic traffic generators for the analysis and evaluation of delay, throughput and energy consumption.

Figure 5: Cross Base routing: if available, the cross link is chosen; otherwise normal XY routing is applied.

VI. APPLICATION AND CONFIGURATION

Traffic in the network is not always uniform and varies frequently. The traffic over the NoC may be random or uniform with constant bit flow [11]; both can be applied to the network. We used random and transpose traffic with a flit size of ten, comprising two header payload and eight bytes of data payload, over the different routing algorithms while increasing the rate at which packets are generated and transmitted, i.e. the packet injection rate [19]. We applied packet injection rates (PIR) from 0.01 to 0.2 in steps of 0.001, i.e. up to 20%, over the five-port NoC structure; beyond this rate the system becomes stable. The application and configuration parameters, such as traffic pattern, topology and routing algorithm, are initialized, and the simulation experiments are run over the NoC using the Noxim simulator.


A. Experimental Details

The application and traffic pattern are configured for the structure of Figure 3(b) and simulated using the different routing algorithms. We chose Odd-Even and North-Last adaptive routing for comparison with cross-base routing over 5x5 mesh networks, as they are commonly used partially adaptive routing algorithms in NoC. Random and transpose traffic generators with random selection and a packet size of 10 were used, and each simulation ran for five thousand cycles. Graphs of latency and throughput were plotted using MATLAB for comparative study. Delay is calculated as the total time a packet spends in the network on its way from its source to its final destination, and throughput is the amount of data successfully transmitted over the network in a given period of time.
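The delay and throughput definitions above can be made concrete with a small Python sketch. The record layout (injection cycle, reception cycle, flits delivered), the function name summarize and the sample numbers are all hypothetical; they are not taken from the paper or from Noxim's output format. Throughput is expressed here in flits per cycle, which is one possible reading of "data per period of time".

```python
def summarize(packets, total_cycles):
    """Compute delay and throughput figures from delivered-packet records.

    packets: list of (inject_cycle, receive_cycle, flits) tuples, a
    hypothetical record format used only for this illustration.
    """
    delays = [received - injected for injected, received, _ in packets]
    delivered_flits = sum(flits for _, _, flits in packets)
    return {
        "avg_delay": sum(delays) / len(delays),         # global average delay
        "max_delay": max(delays),                       # worst-case packet delay
        "throughput": delivered_flits / total_cycles,   # flits per cycle
    }


# three made-up packets of ten flits each, observed over 100 cycles
stats = summarize([(0, 12, 10), (5, 30, 10), (8, 20, 10)], total_cycles=100)
print(stats)
```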


Improvement in overall throughput and latency has been observed for the proposed topology with additional ports and cross-base routing, compared with the North-Last and Odd-Even adaptive routing algorithms used in the simulation. As the packet injection rate increases, congestion in the network starts to build up, and hence a non-linear increase in delay is observed. The delay under Odd-Even routing increases more, because East-to-North and East-to-South turns are restricted at every node in even columns, and North-to-West and South-to-West turns are restricted at every node in odd columns. With cross-base routing under random traffic, the global average delay and the maximum delay decreased by around 500 cycles at the peak packet injection rates, as shown in Fig. 6(a) and 6(b). Network throughput increased by about 6 percent compared with Odd-Even, helped by the additional port, as shown in Fig. 6(c). Similarly, for transpose traffic the results show an improvement in average and maximum delay (Fig. 6(d) and 6(e)). Fig. 6(f) shows an increase in throughput of up to 3% for transpose traffic, making the proposed architecture and algorithm more suitable for NoC design than the conventional architecture.
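The Odd-Even turn restrictions mentioned above can be written down as a small lookup, sketched here in Python. The sketch covers only the four prohibited turns named in the text; it assumes 0-based column numbers and is not the Odd-Even implementation used in the simulations. The names FORBIDDEN_TURNS and turn_allowed are hypothetical.

```python
# (current heading, new heading) pairs that are prohibited, by column parity
FORBIDDEN_TURNS = {
    "even": {("E", "N"), ("E", "S")},   # East-to-North, East-to-South
    "odd": {("N", "W"), ("S", "W")},    # North-to-West, South-to-West
}


def turn_allowed(column, heading, new_heading):
    """Return True if a packet may turn from heading to new_heading here.

    Assumption: columns are numbered from 0, so column 0 counts as even.
    """
    parity = "even" if column % 2 == 0 else "odd"
    return (heading, new_heading) not in FORBIDDEN_TURNS[parity]


print(turn_allowed(2, "E", "N"), turn_allowed(1, "E", "N"))
```

A packet heading east may therefore turn north or south only in an odd column, which limits the number of paths available to Odd-Even routing as congestion grows.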

[Figure 6 plots: Global Average Delay, Max Delay and Global Average Throughput versus PIR, for random and transpose traffic; curves for oddeven, crossbase and northlast.]

Figure 6: (a), (b) and (c) show the average delay, maximum delay and throughput for random traffic; (d), (e) and (f) show the average delay, maximum delay and throughput for transpose traffic, for Odd-Even, North-Last and Cross-Base routing.

VII. CONCLUSION AND FUTURE WORK

In this paper, we presented a simplified, modified hexagonal topology with additional ports, laid out like a mesh and given a simple addressing scheme, which makes it efficient enough for dynamic adaptive routing algorithms while satisfying current on-chip network constraints. A significant improvement in throughput and latency is observed at the cost of supplementary ports in the topology. Among the algorithms compared, cross-base routing gives the lowest latency even at the highest packet injection rate. Deterministic routing algorithms are unable to handle such situations, which causes a decrease in network throughput. With the use of virtual channels, further improvement may be possible for the proposed network, as virtual paths give flits access to more of the available paths and provide deadlock-free conditions for the applied topology and routing algorithm.

REFERENCES

1. U. Ogras, J. Hu, R. Marculescu, "Key Research Problems in NoC Design: A Holistic Perspective", IEEE TCAD, vol. 23, no. 6, pp. 69-74, 2005.
2. Shashi Kumar, Axel Jantsch, Juha-Pekka Soininen et al., "A Network on Chip Architecture and Design Methodology", in Proceedings of IEEE Computer Society Annual Symposium on VLSI, April 2002.
3. J. Duato, "A Theory of Fault Tolerant Routing in Wormhole Networks", IEEE Trans. Parallel and Distributed Systems, vol. 8, no. 8, pp. 790-801, August 1997.
4. Hemani et al., "Network on a Chip: An Architecture for Billion Transistor Era", in Proc. IEEE NorChip Conf., Nov. 2000, pp. 166-173.
5. Ivanov and G. De Micheli, "The Network-on-Chip Paradigm in Practice and Research", Design & Test of Computers, 22(5):399-403, 2005.
6. J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.
7. William J. Dally and Brian Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks", in Proceedings of DAC, June 2002.
8. M. Palesi, R. Holsmark, S. Kumar, and V. Catania, "A Methodology for Design of Application Specific Deadlock-Free Routing Algorithms for NoC Systems", Proc. Int'l Conf. Oct. 2006.
9. J. Duato, "A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks", IEEE Trans. on Parallel and Distributed Systems, 4(12):1320-1331, December 2003.
10. C.J. Glass and L.M. Ni, "The Turn Model for Adaptive Routing", J. ACM, vol. 41, no. 5, pp. 874-902, Sept. 1994.
11. "Noxim: Network-on-Chip Simulator", http://sourceforge.net/project/noxim, 2008.
12. Axel Jantsch and Hannu Tenhunen, editors, Networks on Chip. Kluwer Academic Publishers, February 2003.
13. Pande et al., "Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures", IEEE Trans. on Computers, vol. 54, no. 8.
14. Catherine D., David S., "3D Hexagonal Network: Modeling, Topological Properties, Addressing Scheme, and Optimal Routing Algorithm", IEEE Trans. Parallel and Dist. Systems, vol. 16, no. 9, Sept. 2005.
15. M.S. Chen, K.G. Shin, and D.D. Kandlur, "Addressing, Routing and Broadcasting in Hexagonal Mesh Multiprocessors", IEEE Trans. Computers, vol. 39, no. 1, pp. 10-18, Jan. 1990.
16. K.G. Shin, "HARTS: A Distributed Real-Time Architecture", Computer, vol. 4, no. 5, pp. 25-35, May 1991.
17. I. Stojmenovic, "Honeycomb Networks: Topological Properties and Communication Algorithms", IEEE Trans. Parallel and Distributed Systems, vol. 8, no. 10, pp. 1036-1042, Oct. 1997.
18. J.F. Myoupo, H.N. Damas, and D. Same, "Hexagonal Mesh: A New Addressing Scheme and an Optimal Routing Algorithm", Proc. Int'l Symp. Parallel and Distributed Computing and Networks, PDCN'02, 2002.
19. Giuseppe Ascia, V. Catania, Maurizio Palesi, "Implementation and Analysis of a New Selection Strategy for Adaptive Routing in Networks-on-Chip", IEEE Trans. on Computers, vol. 57, no. 6, pp. 809-820, June 2008.
