Routing Solution Over a Multi-layered Network on Chip

Saurab Dutta†, Prasun Ghosal‡
† JIS College of Engineering, India
‡ Indian Institute of Engineering Science and Technology, Shibpur, India

Abstract—Mesh-Tree is an enhanced NoC architecture. In a three-dimensional Mesh-tree topology, every layer may be treated as a mesh, and trees connect adjacent layers. This type of architecture is very efficient for broadcasting as well as for high-speed communication. Chip multiprocessors are becoming more widely used to exploit the increasing number of transistors available in modern VLSI technology, and on-chip networks serve as the communication substrate that meets the growing demands of the cores. In this paper, a new architecture and algorithm for multi-layered communication is proposed so that a signal can be transferred from source to destination faster.

Keywords—NoC (Network on Chip); Topology; Routing; Mesh-tree; Performance

I. INTRODUCTION

Network on Chip (NoC) has proven itself a plausible solution to the growing demand for high-performance computing by providing on-chip communication for SoCs (Systems on Chip) in a multicore framework. The advent of three-dimensional architectures has given it a boost in the recent past. For example, Erno Salminen, Ari Kulmala, and Timo D. Hamalainen compared different topologies on parameters such as latency, power, frequency, and size in [1], [2] to better understand their advantages and disadvantages.

In this work, a multilayered approach is applied to routing over 3D NoCs. If there are N layers or levels, then a mid-layer (N/2-th layer) Network-on-Chip (NoC) [3] [4] [5] may be implemented as the gateway for inter-NoC routing. This approach is mainly applicable to pyramid-type networks. If the interlayer switch is placed at the n/2-th layer, the routers on one side of the n/2-th level keep their levels, while the other layers, i.e., the layers over n/2, are incremented by one level. Some level-2 switches, along with their processing elements, are transferred to the mid-layer NoC, and some level-1 switches are shifted to it as well. The middle-layer NoC is connected to the NoCs immediately above and below it. So when a signal travels from a source mesh-of-tree NoC to a destination mesh-of-tree NoC, the level-1 switches of the destination NoC can be reached directly, without XY routing between layers, if the destination switch is among the shifted level-1 switches/routers.

The objective is to propose a 3-D view of the Mesh-tree [6] topology for local and inter-architecture transport. By using the 3-D architecture, the problem of the 2-D architecture is eliminated. The objective of the work is to find a routing algorithm that minimizes both routing time and network congestion in a three-dimensional architecture.

Fig. 1. NoC architecture view

II. TOPOLOGY DESIGN APPROACH

The architecture resembles a pyramid network, which may be thought of as a combination of mesh and tree networks. A pyramid network of size $n^2$ has as its base a 2D mesh containing $n^2$ modules, so the concept is similar to a k-ary tree. For 2D routing, this k-ary concept can be used to search for a specific node. For three dimensions we propose a k-d tree style search. The surfaces are divided into regions, and inter-domain routing from one surface to another uses the proposed search to find the destination surface. The optimal path is then chosen from the generated priority, as explained in Section IV.

We divide the plane along X and Y alternately. Either axis may be split first. If X is split first, we obtain a left half and a right half, each containing some scattered points. If the searched node lies in the left half, that half is then split along Y. By alternately dividing along X and Y, we track down the searched node. In three dimensions, the z-axis is first divided into two parts; we determine which part contains the node and then apply the alternating X-Y division to search for the destination node.
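The alternating split described above can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the authors' implementation: it halves coordinate ranges on a regular grid instead of building a k-d tree over scattered points, and it assumes that z is halved repeatedly until a single layer remains (the text describes only the first z split). The function name `locate` and the `(x, y, z)` tuple layout are hypothetical.

```python
from typing import List, Tuple


def locate(size: Tuple[int, int, int],
           target: Tuple[int, int, int]) -> Tuple[Tuple[int, int, int], List[Tuple[str, int, int]]]:
    """Narrow a grid of `size` = (nx, ny, nz) down to `target` by halving.

    Returns the isolated cell and a trace of (axis, low, high) ranges,
    one entry per split.
    """
    lo = [0, 0, 0]
    hi = [size[0] - 1, size[1] - 1, size[2] - 1]
    trace: List[Tuple[str, int, int]] = []

    def halve(axis: int) -> None:
        # Keep the half of the current range that contains the target.
        mid = (lo[axis] + hi[axis]) // 2
        if target[axis] <= mid:
            hi[axis] = mid
        else:
            lo[axis] = mid + 1
        trace.append(("xyz"[axis], lo[axis], hi[axis]))

    # Assumption: split z first, until one layer is left.
    while lo[2] < hi[2]:
        halve(2)

    # Then alternate between x (axis 0) and y (axis 1).
    axis = 0
    while lo[0] < hi[0] or lo[1] < hi[1]:
        if lo[axis] < hi[axis]:
            halve(axis)
        axis = 1 - axis

    return (lo[0], lo[1], lo[2]), trace


found, steps = locate((8, 8, 4), (5, 2, 3))
```

Each split discards half of the remaining range along one axis, so the number of splits grows with the logarithm of the grid dimensions.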

Copyright © IEEE 2015

Fig. 2. Pyramid architecture

Fig. 3. kd tree surface splitting

Fig. 4. Cluster based k-ary tree routing approach

Fig. 5. Clustering of Mesh-of-tree NoC

III. PROPOSED DESIGN

In this design, an interlayer NoC acts as a gateway for multilayer communication. When communication takes place from a layer of one NoC to a layer of a different NoC, a gateway NoC receives and forwards the incoming and outgoing signals. A multilayer architecture requires interlayer communication. NoC fabrication cost is known to be quite high, but this architecture minimizes the routing time for interlayer communication. In this design the gateway NoC is placed in the n/2-th layer, where n is the number of layers. Placing the gateway in the n/2-th layer decreases the routing time of interlayer communication between different layers: because the NoC sits at the middle layer, communication to all layers takes less time than simple top-to-bottom layer communication.

If we form clusters [7] [8], a few nodes work together within the same cluster, and the cluster head is one of the nodes of that cluster. A cluster head can be identified by 3 bits: the first bit represents the layer, the second bit represents the cluster (such as CH1, CH2, CH3), and the third bit represents the node. A cluster head may be a cluster or simply a node; here we treat the cluster head as a node in the NoC. When the nodes are within the same layer, the nearest cluster head is searched to find the destination node. If the destination node is not in the same layer, the packet moves to the other layer and finds the cluster head there to reach the destination node.

First we divide the total k-dimensional surface into two parts, moving first to the left part and then to the right part. Inside a cluster, the k-d tree splitting algorithm is applied to divide the X-Y plane until the searched node is found. The first question is which plane to divide first. Let the current node position be (fx, fy), let ax be the farthest node in the X plane, and let ay be the farthest node in the Y plane. If (fx − ax) > (fy − ay), the X plane is divided into two parts first; otherwise the Y plane is divided into two parts. This continues until the searched node is found. Once the node is found, routing takes place according to the proposed algorithm.

Fig. 6. Tree type NoC architecture

Fig. 7. Three dimensional design view of the proposed architecture

IV. PROPOSED ALGORITHM

Routing determines the path of each packet between a source and a destination node. In deterministic routing only one path exists between a source and a destination. In adaptive routing, several paths are available between the source node and the destination node. Adaptive routing balances network traffic better and thus allows the network to obtain a higher throughput. Routing strategies can also be classified as source or distributed.

To propagate a packet in the desired direction, a value is calculated so that different routes can be compared. The directional value [9] means that if the destination can be reached in the lowest time through a route, a numerical value such as 200 or 250 is assigned to that route. The percentage of remaining flits indicates how many flits remain to pass through the communication channel after the head flit. In this algorithm the remaining flits are the load of the router. The load calculation takes into account the load of the local (current) router as well as the load of all neighbouring routers: 80% of the load value comes from the local load and 20% from the load of the neighbouring routers, so the main emphasis is on the local router. If the packet can be sent in one of 3 dimensions (the direction from which the packet arrived is ignored), the load of each dimension is calculated. The lower the load of a dimension, the higher the priority of that path.

Evgeny Bolotin, Israel Cidon, Ran Ginosar, and Avinoam Kolodny [10] note that the total size of the routing entries of table i can be estimated by the sum of the entry sizes ($l_{i,j}$). The look-up logic size can be estimated by the address width, log(N), where N is the total number of nodes in the network, multiplied by the number of table entries ($n_i$). Thus the total area cost is the sum over all routing tables in the network:

$$RT_{area} = \sum_{i} \Big( n_i \log_2 N + \sum_{j} l_{i,j} \Big)$$

where i ∈ NoC tables and j ∈ entries of table i. The dynamic power dissipated in these tables can also be estimated from the sizes of the tables, since the total capacitance is proportional to the number of entries and the size of each entry. The same is true of static leakage power, since it is proportional to the number of leaking devices.

Now consider 9 × 9 = 81 nodes. Suppose each XY routing step takes t1 seconds, look-up table searching and execution of the routing algorithm take t2 seconds, and network congestion costs t3 seconds. Then in a 9 × 9 arrangement of L1 nodes, the time required to route between the two farthest nodes (in the mesh-of-tree) can be estimated as $((8 \times t_1 \times t_2) + t_3)$ seconds, and for each node it is $(t_1 \times t_2 + t_4)$ seconds, where $t_4 < t_3$. With the interlayer switch implemented (4 L2 cores with processors and their respective projected L1 routers, 4 × 9 = 36), the routing time for the L1 level routers (for the two farthest nodes) decreases to $((5 \times t_1 \times t_2) + t_3)$, and after the fabrication of the interlayer switch, the routing time between the farthest of the remaining L1 routers is $(6 \times t_1 \times t_2) + t_5$. So the routing time is lower than in a single 9 × 9 NoC.

In the proposed formulation, the routing decision is influenced by the following parameters.
• Source-destination distance
• Available bandwidth
• Load on the routers
Based on these three factors a priority value is generated, and a router uses that priority value to determine the direction of flow.

The source-destination distance is calculated as

$$\text{distance} = \sum x + \sum y$$

where x represents each hop in the x direction and y represents each hop in the y direction. The available bandwidth is calculated as

$$\frac{\text{available bandwidth in the current router}}{\text{total available bandwidth of all routers}}$$

The load [9] of the router is calculated as

$$\text{Load} = 80\% \times \frac{\text{remaining flits of the local router}}{\text{remaining flits of all routers}} + 20\% \times \frac{\text{remaining flits of the neighbour routers}}{\text{remaining flits of all routers}}$$

The priorities of distance, bandwidth, and load are calculated and added together to get the total priority.

$$\text{Priority(distance)} = \frac{\sum x + \sum y}{\max(\sum x) + \max(\sum y)}$$

If $(\sum x + \sum y)$ increases, the value of the fraction increases and the priority decreases.
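The load formula given above can be sketched in Python as follows. This is an illustrative sketch only: the function names, the dictionary of per-direction loads, and the direction labels are assumptions introduced here, not part of the paper's implementation.

```python
from typing import Dict


def router_load(local_flits: float, neighbour_flits: float, total_flits: float,
                local_weight: float = 0.8, neighbour_weight: float = 0.2) -> float:
    """Weighted load: 80% from the local router, 20% from its neighbours.

    All three flit counts are "remaining flits"; `total_flits` is the sum
    over all routers and normalises both terms.
    """
    if total_flits <= 0:
        return 0.0  # assumption: an idle network has zero load
    return (local_weight * local_flits + neighbour_weight * neighbour_flits) / total_flits


def least_loaded(loads: Dict[str, float], incoming: str) -> str:
    """Pick the output direction with the lowest load, ignoring the
    direction the packet arrived from."""
    candidates = {d: v for d, v in loads.items() if d != incoming}
    return min(candidates, key=candidates.get)


load = router_load(local_flits=30, neighbour_flits=50, total_flits=200)
choice = least_loaded({"north": 0.2, "south": 0.1, "east": 0.05, "west": 0.3},
                      incoming="east")
```

Because the incoming direction is excluded before the minimum is taken, the lightly loaded east port in the usage example is never chosen.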
So the priority of distance is calculated as

$$P_1(\text{distance}) = 100 - \frac{\sum x + \sum y}{\max(\sum x) + \max(\sum y)} \times 100$$

The priority of bandwidth is calculated as

$$P_2(\text{bandwidth}) = \frac{\text{available bandwidth in the local router}}{\text{total bandwidth available in all routers}}$$

The higher the available bandwidth of the local router, the higher the priority, and vice versa. The priority of load is calculated as

$$P_3(\text{load}) = 100 - \left[ 80\% \times \frac{\text{remaining flits of the local router}}{\text{remaining flits of all routers}} + 20\% \times \frac{\text{remaining flits of the neighbour routers}}{\text{remaining flits of all routers}} \right]$$

As the load of the local router increases, the priority decreases, so the first component of this calculation is very important. After the three priority values are calculated, they are added to obtain the overall priority:

$$P = P_1(\text{distance}) + P_2(\text{bandwidth}) + P_3(\text{load})$$

The routing decision is taken on the basis of this priority value.

Fig. 8. Load propagation from input port to output ports (North, West, East, and South ports)

If input arrives from a direction, the output cannot be redirected into that direction. For example, if input arrives from the east, routing is done over the remaining 3 directions, i.e., North (x++), South (x--), and West (y--), depending on the priority value. If both the x and y dimensions are busy, routing takes place in the z dimension (z++ or z--): if the packet is destined for an upper layer it goes in the z++ direction, otherwise in the z-- direction. Consider the situation depicted in Figure 9, where the source node s1 is in layer 1 and the destination node is s7. Many alternative paths exist in layer 2, but as those paths are overloaded, the routing path is s1 -> s2 -> s3 -> s4 -> s5 -> s6 -> s7. The discarded paths are marked with a "×" symbol.

The conditions that decide the XYZ routing order are as follows.
If (Priority in X dimension > Priority in Y dimension > Priority in Z dimension) then route in XYZ order
If (Priority in Y dimension > Priority in X dimension > Priority in Z dimension) then route in YXZ order
If (Priority in Z dimension > Priority in X dimension > Priority in Y dimension) then route in ZXY order
So the function tries to find a suitable path where the priority, based on the calculated value, is maximum.

Fig. 9. 3D routing example (axis labels: X++, X--, Y++, Y--, Z++)

Fig. 10. System generated latency comparison graph

Different variable abbreviations used in the pseudocode are given in Table I.

TABLE I. TABLE OF NOTATIONS

| Notation | Meaning |
|----------|---------|
| Xdest | X co-ordinate of actual destination node |
| Ydest | Y co-ordinate of actual destination node |
| Zdest | Z co-ordinate of actual destination node |
| Xcurr | X co-ordinate of current node |
| Ycurr | Y co-ordinate of current node |
| Zcurr | Z co-ordinate of current node |
| Xoffset | difference between Xdest and Xcurr |
| Yoffset | difference between Ydest and Ycurr |
| Zoffset | difference between Zdest and Zcurr |
| Xdiff | absolute difference between current Xdest and Xcurr |
| Ydiff | absolute difference between current Ydest and Ycurr |
| Zdiff | absolute difference between current Zdest and Zcurr |
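The priority computation of Section IV (P1, P2, P3 and their sum) and the choice of routing order can be sketched in Python as below. This is a hedged illustration, not the authors' code: all function names are hypothetical, P2 and P3 are used on the scales printed in the formulas, and sorting the dimensions by priority is a generalisation of the three orderings listed in the text (XYZ, YXZ, ZXY) to any ordering.

```python
from typing import Dict


def distance_priority(hops_x: int, hops_y: int, max_hops_x: int, max_hops_y: int) -> float:
    """P1: fewer remaining hops give a higher priority (0..100)."""
    return 100.0 - (hops_x + hops_y) / (max_hops_x + max_hops_y) * 100.0


def bandwidth_priority(local_bw: float, total_bw: float) -> float:
    """P2: share of the total available bandwidth held by the local router."""
    return local_bw / total_bw


def load_priority(load: float) -> float:
    """P3: 100 minus the weighted load (80% local, 20% neighbours)."""
    return 100.0 - load


def total_priority(p1: float, p2: float, p3: float) -> float:
    """P = P1 + P2 + P3."""
    return p1 + p2 + p3


def dimension_order(priorities: Dict[str, float]) -> str:
    """Return the dimensions sorted by descending priority, e.g. "XYZ".

    Assumption: ties keep the insertion order of the dictionary.
    """
    return "".join(sorted(priorities, key=priorities.get, reverse=True))


p = total_priority(distance_priority(2, 2, 8, 8),
                   bandwidth_priority(10, 40),
                   load_priority(0.17))
order = dimension_order({"X": 100.0, "Y": 90.0, "Z": 130.0})  # Z highest, then X, then Y
```

In the usage example the Z dimension has the highest priority, followed by X and Y, which corresponds to the ZXY case listed in the text.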

The flow of the proposed routing is given in Algorithm 1, and the pseudocode of the proposed XYZ routing algorithm is given in Algorithm 2.

V. RESULT ANALYSIS

We have developed code to see how much routing time can be saved if the signal goes directly to the gateway chip. If the k-d surface splitting algorithm and the layer bits are applied, node searching becomes faster in a multilayer architecture, and if the gateway NoC is installed, the search becomes faster still because it is limited to N/2 layers. So by using the cluster bit, searching for the destination node with the k-ary tree or k-d tree algorithm, and installing the NoC gateway at the N/2-th layer, multilayer communication can be made faster in a dynamic routing environment.

VI. FUTURE SCOPE AND IMPLEMENTATION

By applying a threaded k-ary tree approach we can track the routing path, which allows us to find different routes from source to destination by storing all the routes in an array. If any link becomes faulty in future, the next best path can then be selected based on the priority value. We have applied only the gateway algorithm; routing based on the directional value is not yet implemented in the code. In a distributed architecture where packets are routed and processed, this directional value algorithm will help to find the routing direction.

Algorithm 1: Overall Routing Algorithm

Step 1: Compute (Xoffset, Yoffset, Zoffset)
    Xoffset = Xdest − Xcurr
    Yoffset = Ydest − Ycurr
    Zoffset = Zdest − Zcurr
Step 2:
    if (Xoffset > n/2 or Yoffset > n/2 or Zoffset > n/2) then
        movetogateway(Xcurr, Ycurr, Zcurr)
    end
Step 3: Compute (Xdiff, Ydiff, Zdiff)
    if (Xdest > Xcurr) then Xdiff = Xdest − Xcurr
    else Xdiff = Xcurr − Xdest
    end
    if (Ydest > Ycurr) then Ydiff = Ydest − Ycurr
    else Ydiff = Ycurr − Ydest
    end
    if (Zdest > Zcurr) then Zdiff = Zdest − Zcurr
    else Zdiff = Zcurr − Zdest
    end
Step 4: Select the channel
    if (Xdiff = 0 and Ydiff = 0 and Zdiff = 0) then
        channel = internal    // i.e., exit routing
    else
        follow XYZ routing
    end

Algorithm 2: Proposed XYZ Routing Algorithm

if (Zdiff != 0) then
    if (Zoffset > 0) then channel = Z++
    else channel = Z--
    end
else if (Ydiff != 0) then
    if (Yoffset > 0) then channel = Y++
    else channel = Y--
    end
else
    if (Xoffset > 0) then channel = X++
    else channel = X--
    end
end
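Algorithms 1 and 2 can be sketched together in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function names, the `(x, y, z)` tuple layout, and the unit-step model of a hop are hypothetical, and the gateway is represented only by the Step 2 test (with signed offsets, as written in Algorithm 1), since the paper does not give the gateway's coordinates.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]  # (x, y, z); this layout is an assumption

# One-hop displacement for every output channel.
STEP = {
    "X++": (1, 0, 0), "X--": (-1, 0, 0),
    "Y++": (0, 1, 0), "Y--": (0, -1, 0),
    "Z++": (0, 0, 1), "Z--": (0, 0, -1),
}


def needs_gateway(curr: Coord, dest: Coord, n: int) -> bool:
    """Step 2 of Algorithm 1: any offset larger than n/2 goes via the gateway."""
    return any((d - c) > n / 2 for c, d in zip(curr, dest))


def next_channel(curr: Coord, dest: Coord) -> str:
    """Algorithm 2: resolve the Z offset first, then Y, then X."""
    x_off, y_off, z_off = (d - c for c, d in zip(curr, dest))
    if z_off != 0:
        return "Z++" if z_off > 0 else "Z--"
    if y_off != 0:
        return "Y++" if y_off > 0 else "Y--"
    if x_off != 0:
        return "X++" if x_off > 0 else "X--"
    return "internal"  # Step 4 of Algorithm 1: packet has arrived


def trace_route(src: Coord, dest: Coord) -> List[Coord]:
    """Follow next_channel hop by hop and return the visited nodes."""
    path = [src]
    curr = src
    while True:
        channel = next_channel(curr, dest)
        if channel == "internal":
            return path
        dx, dy, dz = STEP[channel]
        curr = (curr[0] + dx, curr[1] + dy, curr[2] + dz)
        path.append(curr)


path = trace_route((1, 1, 1), (2, 3, 2))
```

For the usage example the packet first moves one layer up, then two hops along Y, then one hop along X, which reflects the Z, Y, X resolution order of Algorithm 2.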

TABLE II. A SIMPLE COMPARISON OF REQUIRED TIME FOR PROPOSED MODEL COMPARED WITH INTERLAYER ROUTING MODEL

| Source | Destination | Time in simple interlayer routing | Time in our proposed model |
|--------|-------------|-----------------------------------|----------------------------|
| 111 | 555 | 4.5 | 3.85 |
| 111 | 888 | 8.344 | 4.375 |
| 221 | 566 | 7.844 | 3.641 |
| 122 | 244 | 2.609 | 2.016 |
| 223 | 456 | 3.922 | 3.609 |
| 222 | 888 | 6.766 | 4.156 |
| 352 | 364 | 2.609 | 2.016 |

TABLE III. TIME COMPLEXITY OF PROPOSED ALGORITHM

| Complexity type | Average case | Worst case |
|-----------------|--------------|------------|
| Space | O(n) | O(n) |
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |

TABLE IV. COST OF CONVENTIONAL NOC VS CLUSTER BASED NOC

| Source | Destination | Cost in conventional NoC structure | Time in cluster based NoC |
|--------|-------------|------------------------------------|---------------------------|
| A11 | A13 | 2 | 1 |
| A11 | A23 | 4 | 2 |
| A11 | A42 | 5 | 4 |
| A11 | A43 | 6 | 4 |
| A21 | A31 | 4 | 2 |
| A21 | A33 | 4 | 2 |
| A24 | A34 | 4 | 3 |
| A31 | A22 | 5 | 2 |
| A31 | A24 | 3 | 2 |
| A33 | A22 | 5 | 3 |
| A41 | A11 | 4 | 3 |
| A43 | A11 | 6 | 4 |

REFERENCES

[1] NoC Benchmark Workgroup, "An initiative towards open network-on-chip benchmarks, white paper, OCP-IP." [Online] http://www.ocpip.org/socket/whitepapers/NoC-BenchmarksWhitePaper-15.pdf, 2007.
[2] P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance evaluation and design trade-offs for network-on-chip interconnect architectures," IEEE Transactions on Computers, vol. 54, pp. 1025–1040, Aug 2005.
[3] L. Benini and G. De Micheli, "Networks on chips: a new SoC paradigm," Computer, vol. 35, pp. 70–78, Jan 2002.
[4] R. Gindin, I. Cidon, and I. Keidar, "NoC-based FPGA: Architecture and routing," in First International Symposium on Networks-on-Chip, pp. 253–264, May 2007.
[5] F. Karim, A. Nguyen, and S. Dey, "An interconnect architecture for networking systems on chips," IEEE Micro, vol. 22, pp. 36–45, Sep 2002.
[6] E. Salminen, A. Kulmala, and T. Hamalainen, "On network-on-chip comparison," in 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools, pp. 503–510, Aug 2007.
[7] F. Ge, N. Wu, X. Qin, and Y. Zhang, "Clustering-based topology generation approach for application-specific network on chip," in World Congress on Engineering and Computer Science Vol II, San Francisco, USA, Oct 2011.
[8] M. Winter, S. Prusseit, and P. Gerhard, "Hierarchical routing architectures in clustered 2D-mesh networks-on-chip," in International SoC Design Conference, pp. 388–391, Nov 2010.
[9] W. Trumler, S. Schlingmann, T. Ungerer, J. Bahn, and N. Bagherzadeh, "Self-optimized routing in a network on-a-chip," in Biologically-Inspired Collaborative Computing, vol. 268 of IFIP The International Federation for Information Processing, pp. 199–212, Springer US, 2008.
[10] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, "Routing table minimization for irregular mesh NoCs," in Design, Automation Test in Europe Conference Exhibition DATE '07, pp. 1–6, April 2007.
