A Method for Router Table Compression for Application Specific Routing in Mesh Topology NoC Architectures
Maurizio Palesi1, Shashi Kumar2, and Rickard Holsmark2
1 DIIT, University of Catania, Italy
[email protected]
2 Jönköping University, Sweden
{Shashi.Kumar, Rickard.Holsmark}@ing.hj.se
Abstract. One way to specialize a general purpose multi-core chip built using NoC principles is to provide a mechanism to configure an application specific deadlock free routing algorithm in the underlying communication network. A table in every router, implemented using a writable memory, makes it possible to specialize the routing algorithm according to the application requirements. In such an implementation the cost (area) of the router will be proportional to the size of the routing table. In this paper, we propose a method to compress the routing table to reduce its size such that the resulting routing algorithm remains deadlock free and retains high adaptivity. We demonstrate through simulation based evaluation that our application specific routing algorithm gives much higher performance, in terms of latency and throughput, than general purpose algorithms for deadlock free routing. We also show that a table size of two entries for each output port gives performance within 3% of the uncompressed table.
S. Vassiliadis et al. (Eds.): SAMOS 2006, LNCS 4017, pp. 373–384, 2006. © Springer-Verlag Berlin Heidelberg 2006
1 Introduction
Routing topology and routing algorithm are the two most important aspects which distinguish various proposed NoC architectures [1,2,3]. Fixed tile size based two dimensional mesh topology is favored by many research groups because of its layout efficiency and the resulting good electrical properties of the signals. It is possible to envision that application area specific NoC chips will soon become off the shelf products like FPGA chips. One can easily imagine that one such chip could be useful for multi-media applications. Such a heterogeneous multi-core chip will be next in line to the current superscalar DSP chips and will provide an order of magnitude more computational power than the current DSPs. One mechanism to specialize such a chip for a specific application, or a set of concurrent applications, will be to configure the routing algorithms in the underlying communication infrastructure. The routing algorithm in such an application area specific NoC chip must provide deadlock free communication with a high degree of adaptivity and low latency. Many deadlock free routing algorithms, such as Odd-Even [4] and the Turn Model [5], have been proposed in the literature for mesh topology networks. In these algorithms, deadlock freedom is achieved at a high loss of adaptivity. Bolotin et al. [1] have proposed hard coded paths for deadlock safe routing for an application for an irregular mesh topology NoC. A non-minimal deadlock
free routing algorithm is described for an irregular mesh topology NoC with regions in [6]. Duato has proposed a general theory to develop adaptive deadlock free routing algorithms for any communication network which uses the wormhole switching technique [7]. Most of the deadlock free routing algorithms proposed in the literature are general purpose and have been designed to handle worst case communication patterns in the network. A NoC system specialized for a set of applications can be regarded as a semi-static system. Here, after the task mapping step, we know which pairs of cores communicate and which pairs never communicate. It may not, however, be possible to know the dynamic variations in the communication traffic among the cores. This information about the communication topology can be incorporated in Duato's theory to design highly adaptive routing algorithms. We call such algorithms Application Specific Routing Algorithms (APSRAs) [8]. The most natural way to implement an APSRA is to provide a table in every router which guides an incoming flit to an appropriate output port of the router. A table implemented using a writable memory makes it possible to specialize the routing algorithm according to the application requirements (in the same way as different functions can be configured in an SRAM based FPGA). As in FPGAs, we even have the possibility of dynamically updating a routing algorithm. However, the implementation of this routing table will constitute a major part of the router cost (area). In this paper, we propose a method to compress the routing table to reduce its size such that the resulting routing algorithm remains deadlock free. We have analyzed the cost saving possible with our lossless compression method for various sizes of mesh topology NoC.
We have also compared the performance of APSRA which uses limited size router tables generated by our methodology with a general purpose deadlock free routing algorithm. The results justify the use of APSRA methodology and our router table compression method.
2 Application Specific Routing Algorithm
In [7] Duato has proposed a general theory to develop adaptive deadlock free routing algorithms for communication networks which use the wormhole switching technique. Duato's method is based on generating a Channel Dependency Graph (CDG), in which every channel is a node and there is a directed edge from a node i to j if channel j can be used after channel i for some communication among resources in the network. A cycle in the CDG indicates a possibility of a deadlock. Duato's method takes only the network topology as input and generates many routing algorithms which will work for all possible situations in the network. In [8] we extended Duato's theory and presented a method to generate routing algorithms for communication networks when the communication graph of the application is known. We applied the extended method to generate a routing algorithm for a mesh topology network. Figure 1 shows the block diagram of the APSRA methodology. There are two main blocks. The first one implements the APSRA methodology whose inputs are:
– A Communication Graph where each vertex ti represents a task, and each directed arc (ti, tj) represents the communication from ti to tj.
Fig. 1. Block diagram of the APSRA methodology
Fig. 2. An example of application specific channel dependency graph (c) for a given topology graph (a), communication graph (b) and a fully adaptive minimal routing
– A Topology Graph where each vertex pi represents a node of the network, and each directed arc (pi, pj) represents a physical unidirectional channel (link) connecting node pi to node pj. (In this paper we focus on mesh topologies.)
– A Mapping Function M : T → P which maps a task t ∈ T on a node p ∈ P.
For the sake of example, let us consider the 2 × 2 mesh depicted in Figure 2(a). Let us suppose a communication graph, CG, in which each task communicates with each other task except for tasks t1 and t4, as shown in Figure 2(b). As mapping function let us consider M(ti) = pi. The APSRA methodology starts by considering a fully adaptive minimal routing and builds the CDG. Then, by exploiting the CG, it extracts from the CDG a sub-graph named Application Specific Channel Dependency Graph (ASCDG). The difference between the CDG and the ASCDG is that the latter does not contain any channel dependencies between channel pairs that do not belong to any admissible source/destination path for the current routing. Figure 2(c) shows the ASCDG for our example. In [8] we demonstrated that if the ASCDG is acyclic then routing is deadlock free. In our example, the ASCDG is acyclic, therefore we can assure that minimal fully adaptive routing is deadlock free for this specific communication graph. More generally, if the ASCDG contains some cycles, in [8] we presented a heuristic to break these cycles while minimising the degradation of adaptiveness. The output of the APSRA methodology is a set of routing tables, one for each node of the network. Unfortunately, as we will see in the next sections, the size of each routing
table grows linearly with network size. For this reason we introduce a second block, named Compression, which performs routing table compression. It takes as inputs: a) the set of routing tables generated by APSRA, and b) a constraint on the maximum routing table size. The compression algorithm (which will be discussed in Section 4) tries to reduce the size of the routing tables so that no compressed routing table exceeds a user defined threshold. However, sometimes this operation is not lossless and the cost to pay is a reduction of adaptiveness. At any rate, in all our experiments the reduction in adaptiveness is very low: in the worst case less than 6 percent against a reduction in routing table size of 66 percent.
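The ASCDG construction for the 2 × 2 example can be illustrated with a small sketch. It is not the APSRA implementation of [8]: the function names are hypothetical, and the placement of p1..p4 on the mesh (p1 and p4 on opposite corners) is an assumption, since Figure 2(a) is not reproduced here. The sketch enumerates the channel dependencies created by minimal fully adaptive routing for a given set of communicating pairs and tests the resulting graph for cycles.

```python
def minimal_next_hops(cur, dst):
    """Neighbours of cur that lie on some shortest path to dst in a mesh."""
    (r, c), (dr, dc) = cur, dst
    hops = []
    if dr != r:
        hops.append((r + (1 if dr > r else -1), c))
    if dc != c:
        hops.append((r, c + (1 if dc > c else -1)))
    return hops

def channel_dependencies(pairs):
    """Edges (ch_in, ch_out): ch_out may follow ch_in on a minimal path of some pair."""
    deps = set()
    for src, dst in pairs:
        stack, seen = [(None, src)], set()
        while stack:
            ch_in, cur = stack.pop()
            for nxt in minimal_next_hops(cur, dst):
                ch_out = (cur, nxt)
                if ch_in is not None:
                    deps.add((ch_in, ch_out))
                if ch_out not in seen:
                    seen.add(ch_out)
                    stack.append((ch_out, nxt))
    return deps

def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    graph = {}
    for a, b in deps:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(n):
        state[n] = 1
        for m in graph[n]:
            if state.get(m) == 1 or (m not in state and visit(m)):
                return True
        state[n] = 2
        return False

    return any(n not in state and visit(n) for n in graph)

# Assumed placement (row, col): p1 and p4 sit on opposite corners.
p1, p2, p3, p4 = (0, 0), (0, 1), (1, 0), (1, 1)
nodes = [p1, p2, p3, p4]
all_pairs = [(s, d) for s in nodes for d in nodes if s != d]
app_pairs = [pair for pair in all_pairs if set(pair) != {p1, p4}]

print(has_cycle(channel_dependencies(all_pairs)))  # True: the full CDG is cyclic
print(has_cycle(channel_dependencies(app_pairs)))  # False: the ASCDG is acyclic
```

Under this assumed placement, dropping the t1/t4 communication removes exactly the dependencies that close the cycle, which is consistent with the acyclic ASCDG described for Figure 2(c).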
3 NoC Router Functionality and Design Options
A NoC router has to perform the same function as a traditional computer network router, which is basically to help packets sent into the network reach their destination. Due to the on-chip physical constraints, size and power consumption need to be given higher consideration when designing NoC routers. This makes complicated routing schemes infeasible. An overview of a generic router for mesh topology NoC is shown in Figure 3(a). The router has five input and five output ports, one for each direction plus one for the local resource. There could be packet buffers to manage variations in traffic. The functionality of finding the route for a packet can be split into a routing function and a selection function. A crossbar switch connects the input and output ports. When a packet enters an input port the routing function has to decide to which output the packet should be forwarded. In the simplest case this is done by examining the destination address in the header of the packet. For more advanced routing schemes additional information in the header could also be used.
Fig. 3. Generic mesh topology router (a). Table-based NoC router (b).
If the used routing algorithm is adaptive, it is possible that the routing function returns multiple output choices. In the case that these outputs are not occupied by other packets, a selection has to be made among these. This corresponds to the selection function. There are several schemes that can be used, for example (pseudo) random selection or selection according to a favoured dimension. It is also possible to use lookahead techniques that sense distant congestion and try to avoid this. If a crossbar is used packets headed for non-conflicting outputs can be simultaneously routed. There could
be a situation where several packets simultaneously want to use the same output. In this case, arbitration between them has to be performed, for example by using round-robin, random or priority policies. One way to implement the routing function is to design it in hardware logic. For simple routing functions, this results in small and fast routers which can be replicated throughout the network. This method has been used by several NoC proposals [1,9]. Another way to implement the routing function, mainly used in non on-chip networks, is to use a routing table [10], depicted in Figure 3(b). The index to the table, where the admissible outputs are stored, is the destination address or a function of the destination address. The values in the table depend on the router, or even on the input of the router, in which it is implemented. Using a table makes it possible to implement more complex routing functions and also to change them. A disadvantage is that a table can take a large amount of space if many destination addresses have to be stored. We therefore believe that compressing the table will be of high importance in the NoC context. In this case there would be some encoding logic to find the right table position. As we show later in the paper, routers with small routing tables are sufficient for APSRA methodology based routing.
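The split between routing function and selection function in a table-based router can be sketched as follows. This is only an illustration: the table contents, the port names, and the function names are hypothetical, and the selection policy shown (prefer the free output with the least occupied downstream buffer, break ties randomly) is just one of the schemes mentioned above.

```python
import random

# Hypothetical routing table of one input port: destination id -> admissible outputs.
routing_table = {5: {"N", "E"}, 6: {"E"}, 9: {"S", "E"}}

def routing_function(table, dst):
    """Return the set of admissible outputs stored for dst (empty if none)."""
    return table.get(dst, set())

def selection_function(admissible, busy, occupancy, rng=random):
    """Pick one free admissible output, preferring the least occupied buffer."""
    free = sorted(o for o in admissible if o not in busy)
    if not free:
        return None  # every admissible output is taken: the header flit waits
    least = min(occupancy.get(o, 0) for o in free)
    return rng.choice([o for o in free if occupancy.get(o, 0) == least])

outputs = routing_function(routing_table, 5)
choice = selection_function(outputs, busy=set(), occupancy={"N": 2, "E": 0})
print(choice)  # E
```

Keeping the two functions separate means the table only has to encode which outputs are admissible; the congestion-dependent choice stays in the selection logic.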
4 Router Table Compression
Looking at Figure 3(b), the AdmissibleOutputs block determines the set of admissible output ports through which a header flit can be forwarded to reach a given destination. There is an AdmissibleOutputs block for each input. It contains a routing table $RT^{curr}_{ipn}$, where the subscript ipn represents the input port name (North, East, South, West, and Local) and the superscript curr indicates the current node id. An $RT^{curr}_{ipn}$ consists of a memory addressed by a destination node id dst, which returns the set of admissible output port(s) which can be used to reach the destination dst. The total number of bits to store in a generic router is:
$$S_{u1} = \mathrm{Size}(RT^{curr}_{North}) + \mathrm{Size}(RT^{curr}_{East}) + \mathrm{Size}(RT^{curr}_{South}) + \mathrm{Size}(RT^{curr}_{West}) + \mathrm{Size}(RT^{curr}_{Local}) \qquad (1)$$
If we consider an H × W mesh based NoC it is simple to show that:
$$S_{u1} = 12 \times (1 + H \times W - H - W) \qquad (2)$$
Since we are dealing with shortest path routing, for a given ipn the destination will be in the opposite quadrants with respect to ipn. For instance, if ipn is West then the destination will be either in the first or in the fourth quadrant. For this reason it is possible to reduce the number of bits to store the admissible outputs from 4 to 3 (i.e., it is enough to store the North, East, and South output directions). If we do that, we have to specialise the AdmissibleOutputs block for a given input port. In this case the total number of bits to store in a generic router is:
$$S_{u2} = 9 \times (1 + H \times W - H - W) \qquad (3)$$
The main problem of this approach is that a great deal of memory locations are wasted. Since in real cases a node communicates with only a small subset of network
nodes, many table entries are never used. An alternative approach is to store the admissible output ports for a set of destinations rather than for a single destination. For the sake of clarity let us consider the west input port. If a router receives a flit from its west input port the destination will be in the first or in the fourth quadrant. The problem is to choose the admissible output ports in accordance with the complete routing table generated by APSRA. There are five alternatives: North, South, East, North & East, South & East. The basic idea is to associate a color to each of these five alternatives (e.g., North=Red, South=Green, East=Blue, North & East=Purple, South & East=Yellow). Then each destination is labeled with a color. (For instance, if for destination d it is admissible to use outputs North and East, destination d is labeled with color purple.) Finally destinations are clustered based on their color.
Fig. 4. (a) Routing table before compression. (b) Color based clustering. (c) Compressed routing table.
For example let us consider the routing table associated to the west input port of node X shown in Figure 4(a). After coloring each destination, a color based clustering is performed [Figure 4(b)]. The constraint is that clustering is performed by means of rectangular regions. In this way it is no longer necessary to store the set of all the destinations but only the set of the regions [Figure 4(c)]. Figure 5 shows the block diagram of the AdmissibleOutputs block which uses the compressed routing table. The block InRegion checks if a destination dst belongs to a region identified by its top left corner (TL register) and its bottom right corner (BR register). If this condition is satisfied the output directions assume the value of the Color register and the output hit is set. The same figure also shows the pseudo-code of the InRegion block for a west input port. For an H × W mesh based NoC and M InRegion blocks per input port the total number of bits to store is:
$$S_c = \text{number of inputs} \times M \times [\mathrm{Size}(Color) + \mathrm{Size}(TL) + \mathrm{Size}(BR)] = 5 \times M \times [3 + (\log_2 W + \log_2 H) + (\log_2 W + \log_2 H)] = 5 \times M \times [3 + \log_2 (W \times H)^2] \qquad (4)$$
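As a quick numerical check of Equations (2), (3), and (4), the sketch below evaluates the three table sizes for square meshes with M = 4. The function names are illustrative, and the logarithm in Equation (4) is taken as real-valued (not rounded up to whole bits), which is an assumption about how the formula is meant to be evaluated.

```python
from math import log2

def su1(h, w):
    """Equation (2): uncompressed table, 4 bits per entry."""
    return 12 * (1 + h * w - h - w)

def su2(h, w):
    """Equation (3): uncompressed table, 3 bits per entry."""
    return 9 * (1 + h * w - h - w)

def sc(h, w, m):
    """Equation (4): compressed table with m InRegion blocks per input port."""
    return 5 * m * (3 + log2((w * h) ** 2))

for n in (6, 7, 8, 9):
    ratio = (su2(n, n) - sc(n, n, 4)) / sc(n, n, 4)
    print(f"{n}x{n}: Su1={su1(n, n)} Su2={su2(n, n)} Sc={sc(n, n, 4):.1f} ratio={ratio:+.0%}")
```

Under these assumptions S_c first drops below S_u2 at 7 × 7, and the ratio (S_u2 − S_c)/S_c evaluates to about 14%, 47%, and 84% for 7 × 7, 8 × 8, and 9 × 9, which appears to be how the percentages quoted in the text for M = 4 are computed.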
From Equations (2), (3), and (4) with M = 4 the compression technique starts to be effective from 7 × 7 mesh size. The saving in terms of the number of bits to store grows
InRegion ( in: dst, TL, BR, Color   out: ao = (N, E, S), hit ) {
    if ( dst.col ≥ TL.col && dst.row ≥ TL.row &&
         dst.col ≤ BR.col && dst.row ≤ BR.row ) {
        ao ← Color
        hit ← 1
    }
    else
        hit ← 0
}
Fig. 5. Block diagram of the AdmissibleOutputs block using the compressed routing table and pseudo-code of the InRegion block of a west input port
very fast with the mesh size (e.g., 14% for 7 × 7, 47% for 8 × 8, 84% for 9 × 9, and so on). The factor M is the number of InRegion blocks operating in parallel on different regions. In other words it represents the available size of the router table in a NoC router. The APSRA methodology produces a routing table for every router for any given application mapped on the NoC. Each of these routing tables will be compressed using the color based clustering method. Let M′ represent the size of the compressed table in a given router. If M′ ≤ M then it is possible to map each region onto an InRegion block. Otherwise, if M′ > M, there are not enough InRegion blocks to host all the regions. It is possible to manage the latter situation by performing a further level of compression at the cost of a loss of adaptiveness. Let us consider again Figure 4(b) where the number of detected regions was M′ = 5 (A, B, R1, R2, I). If M = 4 we have to remove at least one region. To do this we can restrict the set of admissible output ports for destination A from {North, East} to {East}. Doing so changes the color of destination A from purple to blue, and the color-based clustering now returns M′ = 4 regions (R3, R1, R2, I) as shown in Figure 6. Of course, it is possible to reiterate this method to increase the compression ratio at the cost of a degradation of adaptiveness. For instance, it is possible to merge region
Fig. 6. Example of routing table compression with loss of adaptivity (a) Initial table (b) Color based clustering (c) Compressed routing table
Fig. 7. Size constrained compression of routing table (a) Initial table (b) Color based clustering (c) Compressed routing table
R3 and region R1 by restricting the set of admissible outputs for the destinations belonging to R1 from {South, East} to {East}, as shown in Figure 7. Finally, Figure 8 shows the pseudo-code for routing table compression. The function RoutingTableCompression requires as inputs the set of routing tables obtained by APSRA (RT) and the maximum number of regions a router can manage (M). The output is the set of compressed routing tables. The BuildColorMatrix function returns the color matrix cm for a given routing table. The ColorClustering function performs the color-based clustering of a color matrix and returns the set of located regions ℛ. The RestrictRouting procedure tries to merge some regions by restricting the set of admissible output ports for some destinations.
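The lookup performed by the AdmissibleOutputs block of Fig. 5 on a compressed table can be sketched in software as below. The `Region` class and the function names are hypothetical, coordinates are assumed to be (row, col) with rows growing downwards, and the M parallel InRegion blocks are emulated by a simple loop over disjoint regions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    tl: tuple         # (row, col) of the top left corner (TL register)
    br: tuple         # (row, col) of the bottom right corner (BR register)
    color: frozenset  # admissible outputs (Color register)

def in_region(dst, region):
    """InRegion test of Fig. 5: return (hit, ao) for destination dst."""
    hit = (region.tl[0] <= dst[0] <= region.br[0]
           and region.tl[1] <= dst[1] <= region.br[1])
    return hit, (region.color if hit else frozenset())

def admissible_outputs(dst, regions):
    """Emulate the InRegion blocks; at most one region is assumed to hit."""
    for region in regions:
        hit, ao = in_region(dst, region)
        if hit:
            return ao
    return frozenset()  # no hit: dst is not a destination routed from this input

# Hypothetical compressed table of a west input port with two regions.
table = [Region((0, 1), (1, 2), frozenset({"N", "E"})),
         Region((2, 1), (3, 3), frozenset({"S", "E"}))]
print(sorted(admissible_outputs((3, 1), table)))  # ['E', 'S']
```

In hardware all comparisons run concurrently and the hit signals select the Color register to forward; the sequential loop here returns the same result only because the regions do not overlap.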
RoutingTableCompression ( inout: RT, in: M ) {
    for ( p ∈ P )
        for ( l ∈ Lin(p) ) {
            cm ← BuildColorMatrix ( RT(p, l) )
            ℛ ← ColorClustering ( cm )
            while ( |ℛ| > M ) {
                RestrictRouting ( cm, ℛ, RT(p, l) )
                cm ← BuildColorMatrix ( RT(p, l) )
                ℛ ← ColorClustering ( cm )
            }
        }
}
Fig. 8. Pseudo-code for the routing table compression
4.1 ColorClustering Function
Clustering of the color matrix is carried out by expanding each color as much as possible in a rectangular fashion, with the constraint that the expanded region of a color c cannot contain any other color c′ ≠ c. The pseudo code of the function ColorClustering is shown in Figure 9. The input of the function is the color matrix cm. The output is a set of
ℛ ColorClustering ( in: cm ) {
    while ( cm is not fully covered ) {
        c ← GetAColoredElement ( cm )
        R ← GetRawRegion ( c, cm )
        while ( R contains impurities ) {
            p ← GetImpurity ( R, c )
            R ← CutOffImpurity ( p, R )
        }
        ℛ = ℛ ∪ {R}
        Freeze ( cm, R )
    }
}

RestrictRouting ( in: cm, ℛ   inout: RT(p, l) ) {
    R ← GetCandidateRegion ( cm, ℛ )
    if ( R = ∅ )
        abort ( );
    else {
        nc ← GetNewColor ( cm, R )
        ChangeColor ( cm, RT(p, l), nc, R )
    }
}
Fig. 9. Pseudo-code of the ColorClustering function and RestrictRouting procedure
regions ℛ. As said before, a region is identified by three attributes: the top left corner, the bottom right corner, and a color. The external loop iterates until the set of regions covers all the colored elements of the color matrix. That is, for each colored element c of the color matrix cm there exists one and only one region R ∈ ℛ that contains c. First, a colored element c is extracted using function GetAColoredElement. Then, by using function GetRawRegion, a region R containing all the colored elements of the same color as c is extracted. Of course, R could contain some impurities (i.e., colored elements with a color different from that of c). In this case, for each impurity p, extracted by function GetImpurity, R is reshaped so as to cut off the impurity p from R. This is performed by function CutOffImpurity, whose objective is to maximise the density of the reshaped region. The density of a region R is the number of colored elements in R reduced by the number of impurities in R. Finally, when the region R is free of impurities, it is inserted in the region set ℛ and the area of cm corresponding to R is marked with a particular color code that prevents other colored elements from expanding into and overlapping R (function Freeze).
4.2 RestrictRouting Procedure
The procedure RestrictRouting tries to reduce the number of regions by reducing the adaptivity for some source destination pairs. The pseudo code of the procedure is shown in Figure 9. The inputs of the procedure are the color matrix cm and the set of regions ℛ. The current routing table RT(p, l) is an input/output parameter. First of all, the candidate region R is extracted from ℛ by using function GetCandidateRegion. The candidate region is the minimum density region whose color has a cardinality of 2. The cardinality of a color is defined as the number of output directions the color represents. If no region satisfies this constraint the routing table cannot be compressed any further.
Otherwise, function GetNewColor returns the color nc used to fill region R. To explain how this color is calculated, let us suppose the original color of R is yellow. Recall that yellow represents the South & East output directions, green represents the South output direction and blue represents the East output direction. Let the average of the Euclidean distances between each point of R and each green (blue) point of cm be dgreen (dblue). If dgreen < dblue then nc is green, else nc is blue. Finally, function ChangeColor fills region R of cm with color nc. This function also updates the routing table RT(p, l) accordingly (i.e., destinations that belong to R are now reachable by using the output ports defined by nc).
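The interplay of ColorClustering, RestrictRouting, and the compression loop of Fig. 8 can be sketched for a single input port as follows. This is a simplified illustration, not the authors' implementation. The assumptions are: the color matrix is a dict mapping (row, col) to a frozenset of output names and doubles as the routing table of that port; clustering proceeds color by color from the bounding box of the uncovered elements; an impurity is cut off by keeping the densest of the four sub-rectangles that exclude it; and distances are measured from the colored elements of the candidate region. All function names are hypothetical.

```python
from math import dist, inf

def cells_in(box):
    (r0, c0), (r1, c1) = box
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def bounding_box(cells):
    rows, cols = [r for r, _ in cells], [c for _, c in cells]
    return (min(rows), min(cols)), (max(rows), max(cols))

def color_clustering(cm):
    """Cover every colored element of cm with exactly one single-color rectangle."""
    todo, frozen, regions = dict(cm), set(), []

    def impure(p, color):  # frozen by another region, or colored differently
        return p in frozen or cm.get(p, color) != color

    while todo:
        color = next(iter(todo.values()))
        box = bounding_box([p for p, c in todo.items() if c == color])
        while True:
            impurities = [p for p in cells_in(box) if impure(p, color)]
            if not impurities:
                break
            (pr, pc), ((r0, c0), (r1, c1)) = impurities[0], box
            cuts = [((r0, c0), (pr - 1, c1)), ((pr + 1, c0), (r1, c1)),
                    ((r0, c0), (r1, pc - 1)), ((r0, pc + 1), (r1, c1))]
            best = None
            for cut in cuts:  # keep the densest cut holding some element of this color
                own = [q for q in cells_in(cut) if todo.get(q) == color]
                if own:
                    density = len(own) - sum(impure(q, color) for q in cells_in(cut))
                    if best is None or density > best[0]:
                        best = (density, bounding_box(own))
            box = best[1]
        regions.append((box[0], box[1], color))
        for q in cells_in(box):  # Freeze: later regions may not overlap this one
            frozen.add(q)
            todo.pop(q, None)
    return regions

def restrict_routing(cm, regions):
    """Recolor the least dense two-output region with its nearer single output."""
    def colored(region):
        return [q for q in cells_in(region[:2]) if q in cm]
    candidates = [r for r in regions if len(r[2]) == 2]
    if not candidates:
        raise RuntimeError("routing table cannot be compressed any further")
    region = min(candidates, key=lambda r: len(colored(r)))
    points = colored(region)

    def mean_distance(single):
        targets = [q for q, c in cm.items() if c == single]
        if not targets:
            return inf
        return sum(dist(p, t) for p in points for t in targets) / (len(points) * len(targets))

    new_color = min((frozenset({o}) for o in sorted(region[2])), key=mean_distance)
    for q in points:
        cm[q] = new_color

def compress(cm, m):
    """Loop of Fig. 8 for one input port; returns the restricted table and its regions."""
    cm = dict(cm)
    regions = color_clustering(cm)
    while len(regions) > m:
        restrict_routing(cm, regions)
        regions = color_clustering(cm)
    return cm, regions
```

For instance, a table with a North & East row, an East row, and a South & East row clusters into three regions; asking for at most two regions recolors one two-output row to East so that it merges with the East row, trading adaptiveness for table size as described above.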
5 Evaluation Experiments and Results
In this section we analyse the degradation in both adaptiveness and overall performance due to routing table compression. We consider three communication traffic scenarios: random, locality, and hot spot. For random and locality traffic we define the communication density, ρ, as the ratio between the number of communications and the number of tasks. The communication graphs are generated randomly based on two different assumptions. In the random scenario, each task can communicate with every other task with equal probability. In the locality scenario, tasks communicate with a probability that depends on the distance between the nodes on which they are mapped (the probability decreases with distance). Finally, in the hot spot traffic scenario some nodes are designated as hot spot nodes, which receive hot spot traffic in addition to regular uniform traffic. Given a hot spot percentage h, a newly generated packet is directed to each hot spot node with an additional h percent probability. We consider hot spot nodes located at the center of the mesh [nodes (3, 3), (4, 3), (3, 4), (4, 4)] with 20% hot spot traffic.
[Plot: degree of adaptiveness (vertical axis, 0.5 to 1) versus max number of regions (horizontal axis, 0 to 10). Series: Random ρ=2, Random ρ=4, Random ρ=6, Locality ρ=2, Locality ρ=4, Hot spot.]
Fig. 10. Max number of regions versus degree of adaptiveness for random, locality, and hot spot communication traffic
Figure 10 shows the degree of adaptiveness after compression of the routing tables for different values of the factor M for an 8 × 8 mesh. For random communication traffic the compression is lossless down to M = 4 and M = 5 for ρ = 2 and ρ = 4 respectively. For locality communication traffic the compression is lossless down to M = 3 and M = 2 for ρ = 2 and ρ = 4 respectively. For hot spot traffic the compression is lossless down to M = 5. If the lossless hypothesis is relaxed the degradation of adaptiveness is less than 2% for random and locality traffic with ρ = 4, 6% for random traffic with ρ = 2, and 3% for hot spot traffic. Finally, we evaluate the dynamic performance of APSRA before and after routing table compression using a flit-accurate simulator developed in SystemC (Figure 11). As
performance metrics we choose throughput and delay. The evaluations are made on an 8 × 8 network using wormhole switching with a packet size randomly distributed between 2 and 10 flits. In our model, each router has an input-buffer size of 2 flits. The maximum bandwidth of each link is set to 1 flit per cycle. We use the source packet generation rate as load parameter with a Poisson packet injection distribution. For each load value, latency values are averaged over 60,000 packet arrivals after a warm-up session of 30,000 arrived packets. The 95 percent confidence intervals are mostly within 2 percent of the means. If multiple output ports are available for a header flit, the output whose connected input port has the minimum buffer occupancy is chosen. As traffic scenario we use hot spot traffic, which is considered to be more realistic than typical synthetic traffic such as uniform, transpose, etc. For this traffic scenario the average degree of adaptiveness of APSRA is 0.93. Applying our compression technique, adaptiveness reduces to 0.90 with a minimum number of regions equal to two. This extremely low degradation in adaptiveness is also confirmed from the dynamic behaviour point of view: there is no appreciable difference between APSRA and APSRA-compressed in either delay or throughput. Moreover, we see that APSRA, in both its natural and compressed form, outperforms Odd-Even adaptive routing [4].
[Plots: (a) average delay (cycles) and (b) throughput (flits/cycle), each versus packet injection rate (flits/cycle/IP, ×10^-3, from 2 to 4). Series: Odd-Even, APSRA, APSRA-compress.]
Fig. 11. Delay variation (a) and throughput variation (b) under hot spot traffic
6 Conclusions
In this paper, we have highlighted the importance and one possibility of developing application area specific NoC chips for mass production. Such a chip will be able to configure a routing algorithm using the communication topology information of already mapped applications. We have argued that a natural way to provide this configurability is to implement the routing function as a table in a writable memory in each router of the communication infrastructure. We have described a cluster based scheme for lossless compression of these tables. An extension of this scheme for table size constrained compression is also described. Through analysis and simulation based evaluation we demonstrate that by using very small fixed size tables we lose less than 3% performance as compared to the uncompressed table.
We are aware that in any network with fixed size router tables there is always a finite probability that we may not be able to route all required communications. The routability problem may be solved by modifying task mapping on the NoC resources. In the worst case, there is a possibility that routing requirements of an application cannot be satisfied with any possible mapping. Then we must use a NoC chip with larger router tables. It will be interesting to study the routability property as a function of the table size using synthetic as well as real applications. The proposed method can easily be extended for non-homogeneous mesh topologies as well as other topologies. The configurability of the table based router comes at an extra cost of hardware. It will be interesting to compare the hardware cost of the router implementing this scheme with cost of routers implementing general purpose deadlock free routing algorithms like Odd-Even routing. We believe the availability of writable tables in routers will open up many new possibilities for NoC usage as a dynamically configurable computing structure.
References
1. Bolotin, E., Morgenshtein, A., Cidon, I., Kolodny, A.: Automatic and hardware-efficient SoC integration by QoS network on chip. In: IEEE International Conference on Electronics, Circuits and Systems, Tel Aviv (2004)
2. Dally, W.J., Towles, B.: Route packets, not wires: On-chip interconnection networks. In: Design Automation Conference, Las Vegas, Nevada, USA (2001) 684–689
3. Guerrier, P., Greiner, A.: A generic architecture for on-chip packet-switched interconnections. In: Design Automation and Test in Europe, Paris, France (2000) 250–256
4. Chiu, G.M.: The odd-even turn model for adaptive routing. IEEE Transactions on Parallel and Distributed Systems 11 (2000) 729–738
5. Glass, C.J., Ni, L.M.: The turn model for adaptive routing. Journal of the Association for Computing Machinery 41 (1994) 874–902
6. Holsmark, R., Kumar, S.: Design issues and performance evaluation of mesh NoC with regions. In: IEEE Norchip, Oulu, Finland (2005) 40–43
7. Duato, J.: A new theory of deadlock-free adaptive routing in wormhole networks. IEEE Transactions on Parallel and Distributed Systems 4 (1993) 1320–1331
8. Palesi, M., Holsmark, R., Kumar, S., Catania, V.: APSRA: A methodology for design of application specific routing algorithms for NoC systems. Technical Report DIIT-TR-01060406, Dip. di Ingegneria Informatica e delle Telecomunicazioni, Univ. di Catania (2006)
9. Wang, X., Siguenza-Tortosa, D., Ahonen, T., Nurmi, J.: Asynchronous network node design for network-on-chip. In: International Symposium on Signals, Circuits and Systems. Volume 1. (2005) 55–58
10. Vaidya, A.S., Sivasubramaniam, A., Das, C.R.: LAPSES: A recipe for high performance adaptive router design. In: Fifth International Symposium On High-Performance Computer Architecture, Orlando, Florida, USA (1999) 236–243