Yield Enhancement by Robust Application-Specific Mapping on Network-on-Chips

A. Dutta Choudhury
ALaRI - University of Lugano, Lugano, Switzerland
[email protected]

G. Palermo, C. Silvano, V. Zaccaria
Politecnico di Milano, Dipartimento di Elettronica e Informazione
{gpalermo, silvano, zaccaria}@elet.polimi.it
ABSTRACT
The current technological defect densities and production yields are a motivating factor for the introduction of design-for-manufacturability techniques during the high-level design of complex embedded systems based on network-on-chips (NoCs). In this context, we tackle the problem of mapping the IPs of a multi-processing system onto the NoC nodes by taking into account the effective robustness of the system with respect to permanent faults in the interconnection network due to manufacturing defects. In particular, we introduce an application-specific methodology for identifying optimal NoC mappings which minimize the variance of the system power and latency and maximize the probability that the actual system will work when deployed, even in the presence of faulty NoC links. We provide experimental results comparing the proposed methodology with conventional mapping approaches, highlighting the benefits and drawbacks of both techniques.
Categories and Subject Descriptors
B.7.1 [Hardware]: Integrated Circuits—Advanced technologies; B.7.1 [Hardware]: Integrated Circuits—VLSI (very large scale integration)
1. INTRODUCTION
As the complexity of system-level design continues to grow, System-on-Chips are becoming increasingly affected by manufacturing or transient defects which can span the overall layout of the chip. The problem of yield loss (i.e., the percentage of chips which are defective) not only impacts the company's overall revenue but also creates serious challenges for employing such chips in real-time, dependable embedded systems, which currently constitute a wide market. Among the applications which are dramatically influenced by the dependability of the chip, we can identify the automotive and the bio-medical markets.
This work was supported in part by the EC under grant MULTICUBE FP7-216693
Network-on-chips are increasingly becoming the golden choice for implementing interconnection networks within a multi-processing system-on-chip architecture. NoCs deploy packet-based communication within the chip, which is justified by the need for processing speeds no longer achievable with standard connection approaches. NoCs are, however, among the most delicate parts of the chip as far as system reliability with respect to faults, either transient or permanent, is concerned. The need for fast, energy-efficient, fault-aware and fault-tolerant NoCs is of extreme importance and constitutes a major area of research in the embedded system design scenario.

In this paper, we propose a novel approach for determining IP-to-node mappings which are able to tolerate a number of link failures while still producing acceptable operating performance. The proposed approach exploits some key developments in the field of quality and robust design in order to maximize the overall yield of the system with respect to faults while simultaneously optimizing the operating performance of the target architecture. The methodology exploits a dynamic re-routing capability of the underlying network-on-chip, enabling defective chips to work sub-optimally and thus contribute to the system yield.

The rest of the paper is organized as follows. Section 2 describes the state of the art of NoC topology customization and IP-to-node mapping. Section 3 introduces the proposed design methodology, while Section 4 shows the experimental comparisons with state-of-the-art mapping techniques. Finally, Section 5 draws some conclusions and outlines future research directions.
2. STATE OF THE ART OF NETWORK-ON-CHIP DESIGN
A major issue in the network-on-chip design paradigm is the routing technique used for transferring data from one processor node to another by determining the sequence of intermediate forwarding actors (or routers), also called the path. One of the most important problems in routing is deadlock. A deadlock can occur when a circular dependency exists among the channels dedicated to the packets' routes [4]. Usually, the designer is interested in building network routing techniques which are deadlock-free by construction.

Routing techniques can be broadly classified into two categories: deterministic and non-deterministic. In deterministic routing, the path followed by a packet is determined solely by the source and destination. Among the most widely used deterministic routing schemes is XY (or X-then-Y) routing [7], which essentially produces bi-linear trajectories within a rectangular mesh. This type of routing is a minimal-path routing algorithm, is deadlock-free and is also straightforward to implement. Non-deterministic or adaptive routing represents a category of routing techniques in which the path is determined dynamically at run-time [4]. Usually, these techniques are not mutually exclusive and can be integrated to create customized routing techniques which suit the application under consideration. In this paper we focus our attention on the Application-Specific deadlock-free Routing Algorithm (APSRA) proposed by Palesi et al. [12], which exploits an input communication graph in order to produce an adaptive deadlock-free routing table.

Another problem in network-on-chip design is the mapping of cores or application-specific functionalities to the network nodes. This problem is generally treated as an integer linear program, with some extensions to handle multiple objectives. A vast body of research covers this area, and the mapping techniques proposed include Constraint-Driven Communication Synthesis [13], Branch and Bound algorithms [6], SUNMAP, NMAP [10, 9] and genetic algorithms [1]. Although a lot of effort has been devoted to the design of performance- and power-optimal networks on chip, to our knowledge this paper represents the first attempt to combine fault-tolerance at the network mapping and routing layers.
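As a concrete illustration of the deterministic case, the following minimal sketch computes an XY route on a rectangular mesh; the coordinate convention, helper name and example mesh size are our own illustrative assumptions, not part of any of the referenced routing implementations.

```python
# Illustrative sketch of deterministic XY (X-then-Y) routing on a rectangular mesh.
# Coordinate convention and mesh size are assumptions made for this example only.

def xy_route(src, dst):
    """Return the list of (x, y) mesh nodes visited from src to dst,
    moving first along X and then along Y (minimal-path, deadlock-free)."""
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                      # traverse the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                      # then traverse the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

if __name__ == "__main__":
    # Example on a 4x3 mesh (the mesh size used later in the experiments).
    print(xy_route((0, 0), (3, 2)))
    # -> [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
```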
3. A DESIGN METHODOLOGY FOR EFFICIENT FAULT-TOLERANT NETWORKS ON CHIP
We assume that a given application is decomposed into a set of tasks whose memory and functional requirements have been mapped onto a set of IPs (processors, ASICs or memory blocks). Figure 1 shows an IP graph which describes the composition and the communication requirements of such a set of IPs for a VOPD (Video Object Plane Decoder), which is part of the MPEG-4 decoding algorithm. IPs are connected by communication channels which represent flows of information. Each communication channel is characterized by a required communication bandwidth (shown as a label on each graph edge).
The network-on-chip synthesis is carried out by selecting the target NoC fabric topology and by mapping the specified IPs onto the NoC nodes (IP-to-node map). Moreover, the synthesis step identifies a routing policy for the packets, which is either static or dynamic. The fabric topology is usually described by a NoC Topology Graph, i.e., a directed graph connecting network nodes where each edge is annotated with the available bandwidth. Usually, the synthesis is performed by taking into account the estimated performance and power consumption of the overall system and the matching between required and offered bandwidth for each communication channel (quality of service); in this process, a suitable set of Pareto solutions, associated with the execution time delay and the energy consumption, is identified, and a decision-making criterion is applied to select the 'golden' solution to be implemented.

Figure 1: Communication Graph of the VOPD Application.
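As a purely illustrative aid (anticipating Definitions 1-3 in Section 3), the sketch below shows one possible in-memory representation of the communication graph and of the NoC topology graph annotated with bandwidths and fault probabilities; the class names, fields and the tiny example values are our own assumptions and are not taken from Figure 1.

```python
# A minimal, assumed representation of the two inputs of the mapping problem:
# the IP (communication) graph and the NoC topology graph with fault probabilities.
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class IPGraph:
    # comm[(vi, vj)] = required bandwidth of the directed channel vi -> vj
    comm: Dict[Tuple[str, str], float] = field(default_factory=dict)

@dataclass
class NoCTopologyGraph:
    # bw[(ui, uj)] = bandwidth available on the directed link ui -> uj
    # pi[(ui, uj)] = probability that the link becomes unavailable (fault model)
    bw: Dict[Tuple[int, int], float] = field(default_factory=dict)
    pi: Dict[Tuple[int, int], float] = field(default_factory=dict)

# Illustrative (made-up) fragment: one communication channel, two NoC links.
ip_graph = IPGraph(comm={("ip0", "ip1"): 100.0})
noc = NoCTopologyGraph(bw={(0, 1): 400.0, (1, 0): 400.0},
                       pi={(0, 1): 0.008, (1, 0): 0.008})
```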
3.1 Motivational considerations
We assume that the chosen network is able to adapt, to some extent, its routing strategy when a fault is detected. A network fault makes some of the topology links unavailable and produces a change ∆(P) on the current NoC Topology Graph P, resulting in a new graph P*. Assuming that the network is able to dynamically change the routing policy, the new P* can, in principle, still work, producing sub-optimal power and delay figures of merit. Chances are, however, that the new P* cannot be accommodated by any new routing policy without incurring a deadlock. We define the latter state as a non-working state.

The motivating question behind this paper is: is it possible to devise an IP mapping onto the network nodes such that, given a probability distribution for ∆(P), the losses in power efficiency and performance of P* are minimized, as well as the probability of incurring a deadlocked P*? In this paper, we show that this problem can be solved by introducing a suitable methodology for identifying optimal IP-to-node maps, exploiting some key concepts derived from quality design and robust design.

We assume that the initial IP graph as well as the target NoC topology fabric is fixed. In particular, in the experimental results, we concentrate our analysis on 2D mesh topologies, but we want to stress that the proposed methodology is general and can also be applied to other fabrics. We finally assume that the dynamic routing policy follows the guidelines of deadlock-free minimum-path routing as described by APSRA [12].
3.2 The assumed fault model
The fault model used in this paper is derived from the classification proposed in [3]. In particular, we concentrate our efforts on omission failures, represented by the probability that a link in the connection fabric becomes unavailable due to physical reasons (manufacturing defects or permanent defects showing up during the operating life of the chip). More formally, we introduce the following definition of NoC Topology Graph, which includes the fault probability model assumed in this paper:

Definition 1. The NoC Topology Graph is a directed graph P(U, F) where U is the set of network nodes and F is the set of directed edges (ui, uj) representing an existing link between the network nodes ui ∈ U and uj ∈ U. Each edge fi,j ∈ F has a weight bwi,j, which represents the bandwidth available across fi,j, and a probability πi,j of becoming unavailable due to manufacturing defects.

As each link of the network is essentially a data-path composed of n bits, πi,j describes the probability that at least one of the n data lines of the link (ui, uj) is unavailable. As can be noted, we consider a spatially uncorrelated fault model, i.e., a model in which a fault on a specific link is not correlated with faults on nearby links. We plan to study spatial correlation in the fault model in future research.

Starting from an original, non-faulty NoC topology graph P^o, we define Π(P^o) as the set of NoC topology graphs P* that can derive from each possible variation ∆(P^o). We call each element P* a scenario. Each scenario P* has a corresponding probability π(P*) of materializing, which is determined by the probabilities πi,j specified in the original graph P^o.
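To ground the scenario notion, here is a small sketch, under our own assumptions, of how scenarios P* could be drawn from the per-link fault probabilities πi,j of the spatially uncorrelated model; note that the reachability test below is only a crude stand-in for the deadlock-free routability check based on APSRA that is actually used in the paper.

```python
# Hedged sketch: sampling fault scenarios from per-link probabilities pi[i,j]
# (spatially uncorrelated model) and checking a *simplified* routability proxy.
# The real test relies on APSRA deadlock-free minimum-path routing, not on the
# plain reachability check used here for illustration.
import random
from typing import Dict, Set, Tuple

Link = Tuple[int, int]

def sample_scenario(pi: Dict[Link, float], rng: random.Random) -> Set[Link]:
    """Return the set of surviving links: each link fails independently with prob pi."""
    return {link for link, p in pi.items() if rng.random() >= p}

def reachable(links: Set[Link], src: int, dst: int) -> bool:
    """Crude proxy for routability: is dst reachable from src over surviving links?"""
    frontier, seen = [src], {src}
    while frontier:
        u = frontier.pop()
        if u == dst:
            return True
        for a, b in links:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

# Illustrative usage on a tiny made-up topology (node ids and probabilities are ours):
rng = random.Random(0)
pi = {(0, 1): 0.008, (1, 2): 0.008, (0, 3): 0.008, (3, 2): 0.008}
scenario = sample_scenario(pi, rng)
print(scenario, reachable(scenario, 0, 2))
```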
3.3 The target optimization problem
In order to formally construct our target optimization problem, let us introduce the following two definitions:

Definition 2. The IP Graph is a directed graph G(V, E) where V is the set of IPs belonging to the target System-on-Chip and E is the set of edges representing the communication between the IPs vi ∈ V and vj ∈ V. The weight of the edge ei,j ∈ E, denoted by commi,j, represents the bandwidth of the directed communication from vi to vj.
Definition 3. The IP-to-Node mapping function M : V → U is defined as the set of IP-to-node mappings (vi, uj), representing the IP vi ∈ V mapped to the network node uj ∈ U. The set of possible mappings M(P, G) depends on a given network topology graph P and an IP graph G.

In this paper we target the optimization of the power consumption and the application execution delay of our NoC-based system-on-chip. It is worth noting that the power and the delay of the target system are a function of the actual mapping M chosen and the actual scenario, or network topology, P*. As we noted, the actual network topology graph can differ from the original P^o since some of the original links may be unavailable due to manufacturing defects. Our initial problem is finding an optimal IP-to-node mapping M such that the estimated power and delay are minimal for all the possible scenarios derived from the original P^o. Moreover, we want to minimize the probability of a deadlocked combination (M, P*):

    min_{M ∈ M(P^o, G)}  [ η(M, P*),  τ(M, P*),  prob(⊥(M, P*)) ]    ∀P* ∈ Π(P^o)        (1)

where η is the total power consumption of the system-on-chip, τ is the execution time delay and ⊥ is a predicate which is true whenever the combination of the actual mapping M and the current scenario P* results in a network deadlock when using minimum-path routing [12], or when the communication requirements are not satisfied.

The previous formulation is a classical representation of a robust problem. According to [8], the robust optimization problem shown in Eq. 1 is over-constrained and often unsolvable. A set of approaches [5] aim at replacing Equation 1 with a less constrained problem by introducing the minimization of a reduced number of statistical moments of the probability distribution π(P*) with respect to the uncertainty introduced by the manufacturing process.
In this paper, we propose a formulation of the robust optimization problem in Eq. 1 which has the following goals:

1. Optimization of the mean value µ of the target system metrics, i.e., power and delay. The optimization is a multi-objective optimization which results in a Pareto front of mapping solutions H ⊂ M(P^o, G).

2. Optimization of the variance σ² of the target system metrics. Generally, overestimating the fluctuations of the target system metrics due to manufacturing defects impacts the predictability of the system and the design effort. On the other hand, underestimating the fluctuations impacts the manufacturing effort, since the design phase is likely to be reiterated if the system does not meet the application constraints.

3. Maximization of the yield with respect to non-deadlocked combinations (M, P*). Obviously, the yield of the target chip when deployed in the real world should be maximized in order to maximize profits. The yield derived from a specific IP-to-node mapping M is formally defined as:

    Y(M) = 1 − prob(⊥(M, P*) | π(P*))        (2)
In order to limit the dimensions of the final multi-objective optimization problem, we tackle the optimization of the mean and the variance (goals 1 and 2) by defining an aggregate quality performance measure Q_y, which is a monotonic function of the signal-to-noise ratio SN_y for as-small-as-possible problems [2]. For a specific system metric y (either power η or delay τ) and a set of N samples yi picked by considering the possible set of non-deadlocked scenarios Π(P^o), the term Q_y is defined as follows:

    Q_y = 10^(SN_y / 10) = 1 / ( (1/N) · Σ_{i=1..N} yi² )        (3)
It can be proved that the defined performance metric is an aggregate quality measurement of the mean µ and the variance σ² [2]. In our methodology, the quality estimates of the figures of merit depend on the target IP-to-node mapping M and are actually computed by means of a stratified sampling of the scenario space Π(P^o). Stratified sampling is a particular kind of Monte Carlo simulation technique which takes into account the probability π(P*) that a specific scenario P* materializes.

Given the previous definitions, we reformulate our target problem as:

    max_{M ∈ M(P^o, G)}  [ Q_η(M),  Q_τ(M),  Y(M) ]        (4)

The previous problem is a multi-objective mapping problem for which a Pareto set of mappings H should be found; a heuristic for this problem is presented in the following section.
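As a concrete illustration of how Eq. 3 and the objectives of Eq. 4 could be evaluated from sampled scenarios, the sketch below computes Q for a generic metric and the yield from a list of scenario outcomes; the data layout (a list of (deadlocked, power, latency) tuples) and function names are our own assumptions, not the authors' implementation.

```python
# Hedged sketch: computing the Taguchi-style quality Q_y (Eq. 3) and the yield Y
# (Eq. 2) from a set of sampled scenario outcomes. The outcome format is assumed.
from typing import List, Tuple

def quality(samples: List[float]) -> float:
    """Q_y = 10^(SN_y/10) = 1 / ((1/N) * sum(y_i^2)), the 'as-small-as-possible' form."""
    n = len(samples)
    return 1.0 / (sum(y * y for y in samples) / n)

def objectives(outcomes: List[Tuple[bool, float, float]]):
    """outcomes[i] = (deadlocked, power, latency) for the i-th sampled scenario.
    Returns (Q_power, Q_latency, yield); qualities use non-deadlocked scenarios only."""
    ok = [(p, l) for deadlocked, p, l in outcomes if not deadlocked]
    y = len(ok) / len(outcomes)                  # estimated yield Y(M)
    if not ok:                                   # degenerate case: everything deadlocked
        return 0.0, 0.0, y
    q_power = quality([p for p, _ in ok])
    q_latency = quality([l for _, l in ok])
    return q_power, q_latency, y

# Illustrative usage with made-up numbers:
print(objectives([(False, 0.4, 12.0), (False, 0.5, 10.0), (True, 0.0, 0.0)]))
```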
3.4 The optimization heuristic
The proposed optimization heuristic is shown in Algorithm 1. It aims at finding a single optimal mapping belonging to the Pareto set of M with respect to the Q_η(M), Q_τ(M) and Y(M) objective functions. This is done by an initial screening for a differentiated set of mapping solutions in terms of yield (in order to avoid getting stuck in local maxima), followed by a neighborhood search for optimal mappings.

More specifically, in the initial phase (steps 2-7) an iterative random design of experiments is used to refine an approximate estimate of the Pareto set H. In practice, the algorithm generates batches of R random mappings trying to improve the coverage χ between successive evaluations of H (see [14] for a definition of Pareto sets, dominance and coverage). The operator ψ filters a set of mappings by returning only the Pareto set associated with the three objectives Q_η(M), Q_τ(M) and Y(M).

After the initial phase, a k-means clustering algorithm (step 8) is applied to partition H into three (minimum, average and maximum) yield classes. From here on, the algorithm tries to climb from the three centroids found in order to reach a set of better mappings. In practice, for each of the 3 clusters, we select the optimal centroid in terms of the geometric average of the objective functions Φ(M) = Q_η(M) · Q_τ(M) · Y(M) (step 10). Each optimal mapping k0 is refined by a steepest-climb algorithm which employs a neighborhood search for the best Φ. The neighborhood is generated with the operator ν, which considers a specified range ρ around the current configuration k0 (step 11). This operator orders the mapping relations of k0 = {(vi, uj)} and generates a set of new mappings N(k0) by flipping nearby mapping assignments within a distance ρ. The best k1 in N(k0) is then chosen as the next initial point for the iterative neighborhood search (starting at step 14). The search continues until the evaluation of f0 converges to a maximum. Finally, the optimal points derived from the three cluster centroids, {hmin, havg, hmax}, are filtered to derive the final mapping maximizing the function Φ(M).

Algorithm 1 Yield Enhancement by Robust Mapping
Require: G(V, E), P^o(U, F), R, ρ
Ensure: |U| ≥ |V|
 1: H = { }
 2: cov = ∞
 3: while cov > 0 do
 4:    HR = generate R random initial mappings from M(P^o, G)
 5:    cov = χ(ψ(H ∪ HR), H)
 6:    H = ψ(H ∪ HR)
 7: end while
 8: C = {Cmin, Cavg, Cmax} = k-means clustering of H into 3 sets, considering Y(M), ∀M ∈ H
 9: for all Ci ∈ C do
10:    k0 = arg max Φ(M), ∀M ∈ Ci
11:    N(k0) = ν(k0, ρ)
12:    k1 = arg max Φ(M), ∀M ∈ N(k0)
13:    f0 = Φ(k0), f1 = Φ(k1)
14:    while f1 > f0 do
15:       k0 = k1, f0 = f1
16:       N(k0) = ν(k0, ρ)
17:       k1 = arg max Φ(M), ∀M ∈ N(k0)
18:       f1 = Φ(k1)
19:    end while
20:    hi = k0
21: end for
22: return arg max Φ(M), ∀M ∈ {hmin, havg, hmax}
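To make the neighborhood operator ν and the steepest-climb refinement (steps 10-19) more concrete, here is a small sketch under our own assumptions: mappings are represented as a list of node indices (position = IP), Φ is passed in as a callable, and ν swaps the assignments of IPs whose positions in the ordered mapping are at most ρ apart. This is only one plausible reading of the flip operation, not the authors' code.

```python
# Hedged sketch of the neighborhood operator nu and the steepest-climb loop
# (steps 10-19 of Algorithm 1). Representation and swap semantics are assumptions.
from typing import Callable, List

Mapping = List[int]   # mapping[i] = NoC node hosting IP i

def neighborhood(k0: Mapping, rho: int) -> List[Mapping]:
    """nu(k0, rho): generate new mappings by flipping (swapping) nearby assignments
    within a distance rho in the ordered list of mapping relations."""
    neighbors = []
    for i in range(len(k0)):
        for j in range(i + 1, min(i + rho + 1, len(k0))):
            m = list(k0)
            m[i], m[j] = m[j], m[i]          # flip the two nearby assignments
            neighbors.append(m)
    return neighbors

def steepest_climb(k0: Mapping, rho: int, phi: Callable[[Mapping], float]) -> Mapping:
    """Iterated neighborhood search: move to the best neighbor until Phi stops improving."""
    f0 = phi(k0)
    while True:
        neighbors = neighborhood(k0, rho)
        if not neighbors:
            return k0
        k1 = max(neighbors, key=phi)
        f1 = phi(k1)
        if f1 <= f0:                         # converged to a (local) maximum of Phi
            return k0
        k0, f0 = k1, f1
```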
4. EXPERIMENTAL RESULTS
In this section, we show the experimental results obtained by applying the proposed methodology to the fault-tolerant mapping of the VOPD IP graph onto a 12-node (4 × 3) mesh NoC topology. To test the optimization algorithm, we used a statistical model of the contention, the required application bandwidth and the available NoC bandwidth to derive the power and latency figures of merit of the target NoC configurations. The power model has been characterized with a 90 nm standard cell library from STMicroelectronics (5% error w.r.t. the gate-level model), while the latency model is derived with the simulation framework of [11] (10% error w.r.t. a cycle-accurate simulation for low-to-medium traffic levels); in particular, the router model is a 3-stage pipeline architecture implementing table-based routing and wormhole switching.
4.1 Monte-Carlo Estimation of Qualities and Yield
As said before, we use Monte Carlo stratified sampling to estimate the qualities and the yield of each configuration M. The sampling process generates a set of possible scenarios Π(P^o)(M) by using a probability of faulty link πi,j = 0.8% (corresponding to an overall link-down probability of 20%). Each generated scenario P* is tested for routability (i.e., absence of deadlocks) in order to update the estimated yield Y(M), while the corresponding latency and power figures of merit are used to update the estimates of Q_η(M) and Q_τ(M).

The Monte Carlo sampling process is an iterative process which tries to improve the precision of the estimates of the yield and the quality figures of merit. The process uses sampling windows and iterates until no significant change in the estimated yield and qualities is observed between two successive windows. Each window is composed of 100 scenarios, while the variation thresholds between windows have been set to 2.5% (a typical value for confidence intervals) for the qualities Q and to 0.5% for the yield Y; a sketch of this stopping rule is given after Table 1. Figure 2 shows a representative behavior of the estimates of the yield Y and the qualities Q for each sampling window for a single mapping M of the target NoC. Values have been normalized with respect to their maximum and minimum samples.

Figure 2: Convergence of the estimation of yield and qualities.

Figure 3: Probability of a non-working topology, given the number of links down/faulty.

Table 1: Comparison between the 'conventional' and the 'robust' solution

                                       Conventional    Robust
Power quality Q_η(M) [mW^-2]           0.000538        0.000487
Latency quality Q_τ(M) [cycle^-2]      0.0234          0.0259
Yield Y(M)                             0.9650          0.9975
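The sketch below illustrates the windowed stopping rule described above (100-scenario windows, 2.5% relative change on the qualities, 0.5% on the yield); the estimate_window callable, which would sample a window of scenarios and return the running (Q_power, Q_latency, yield) estimates for a mapping, is an assumed placeholder rather than part of the paper's tool, and the choice of an absolute threshold for the yield is our own reading of the text.

```python
# Hedged sketch of the windowed Monte Carlo stopping rule described in Section 4.1.
# estimate_window(mapping, n) is assumed to sample n scenarios and return the
# running estimates (q_power, q_latency, yield_) for that mapping.
from typing import Callable, Tuple

Estimates = Tuple[float, float, float]

def estimate_until_stable(mapping,
                          estimate_window: Callable[[object, int], Estimates],
                          window: int = 100,
                          q_thresh: float = 0.025,
                          y_thresh: float = 0.005) -> Estimates:
    """Iterate 100-scenario windows until the qualities change by less than 2.5%
    and the yield by less than 0.5% between two successive windows."""
    prev = estimate_window(mapping, window)
    while True:
        cur = estimate_window(mapping, window)
        dq_p = abs(cur[0] - prev[0]) / max(abs(prev[0]), 1e-12)   # relative change, Q_power
        dq_l = abs(cur[1] - prev[1]) / max(abs(prev[1]), 1e-12)   # relative change, Q_latency
        dy = abs(cur[2] - prev[2])                                # absolute change, yield
        if dq_p < q_thresh and dq_l < q_thresh and dy < y_thresh:
            return cur
        prev = cur
```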
4.2 Comparison with a conventional approach
In this section, we compare the robust solution found with the proposed approach against a mapping solution found with the SUNMAP [10] algorithm. The SUNMAP algorithm minimizes the overall bandwidth Quality of Service (QoS) of the NoC considering a non-faulty topology P^o. We use the same VOPD benchmark used in [10] and compare our solution with the best solution found in that paper for a 4 × 3 mesh topology. In the following paragraphs, we will label as 'conventional' all the results associated with SUNMAP, while we will use the label 'robust' to refer to the approach proposed in this paper.

Table 1 reports the quality objective functions Q_η(M) and Q_τ(M) and the yield Y(M) for the conventional solution and the robust solution. As expected, the overall yield given by the robust approach is higher than the conventional one. The yield loss of 3.5% of the conventional solution is very significant considering the possible economic loss associated with it. Concerning the quality values, we observe an effective improvement in terms of latency with respect to the conventional solution (10%), with a degradation of the power consumption within the same range (approximately 10%). Note that quality values are computed only for routable topologies derived from the original topology P^o, which are effectively fewer for the conventional case (lower yield) than for the robust case. Moreover, as will be seen in the next paragraphs, the conventional solution is characterized by a very good behavior in terms of power which is very hard to overcome.

Figure 3 shows the actual probability of a non-working system, given that a specific number of links are down, for the conventional and the robust approaches. Although the Monte Carlo sampling generated a set of topologies with a varying number of faulty links, we focus only on the trend of latency and power up to 2 faulty links (with an overall probability of 4% for the 2-faulty-link case). We note that the way in which the mapping has been done greatly influences the resilience with respect to deadlocks in the NoC. While the robust solution is able to reach a yield close to 100%, the conventional approach drops to 85% for 1 link down and 75% for 2 links down. In other words, the proposed methodology increases the overall fault-tolerance by more than 10% and up to 20% for the most probable faulty scenarios. This means that, whenever one or more faults occur in the network, the conventional mapping, even by rerouting packets differently, has a 10% probability of incurring a deadlock.

Figure 4: Percentage variation of the robust solution from the conventional solution in terms of average power and delay.

Figure 4 shows the behavior of the percentage difference between the power and latency of the conventional and robust case. Negative percentage values indicate a smaller value for the robust case with respect to the conventional case. We note that the average and worst-case latency, as the number of faulty links varies, are smaller for the robust approach. The power of the robust case is, however, higher. This is mainly due to how the conventional solution has been selected: the SUNMAP algorithm minimizes the network bandwidth by giving priority to mapping the highest-communicating IPs to complex nodes. With the current power model, this translates into lower power with respect to the robust case. However, the bandwidth decrease does not directly decrease the latency, due to contention.

Figure 5: Percentage variation of the robust solution from the conventional solution in terms of standard deviation on power and delay.

Figure 5 shows the percentage difference in the standard deviation of the power and the latency between the robust and the conventional case. Also in this case, smaller values indicate a reduction in the standard deviation for the robust case. We can note that, except in one case, the overall standard deviation is reduced. This fact is directly related to the use of Taguchi quality figures of merit within the robust approach.

As a final consideration on the experimental results, we can note that the higher yield and reduced standard deviation obtained with our approach confirm that, at the expense of some power increase, it is possible to find a mapping solution which is more resilient to deadlocks than conventional solutions when faulty links are present. This positively answers the question posed in Section 3.1.
5. CONCLUSIONS
In this paper we introduced an application-specific methodology for identifying optimal network-on-chip (NoC) mappings which minimize the variance of the system power and latency and maximize the probability that the actual system will work when deployed, even in the presence of faulty NoC links. Experimental results have shown that the proposed methodology has evident benefits in terms of yield and standard deviation with respect to conventional mapping approaches, while minimizing losses in power and/or latency. Future research is directed towards the analysis and optimization of the influence of more complex probability distributions on the overall system yield and on the variance of the figures of merit.
6. REFERENCES
[1] Giuseppe Ascia, Vincenzo Catania, and Maurizio Palesi. Multi-objective mapping for mesh-based NoC architectures. In CODES+ISSS '04: Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pages 182–187, New York, NY, USA, 2004. ACM.
[2] G. Box. Signal-to-noise ratios, performance criteria, and transformations. Technometrics, 30(1):1–17, Jan 1988.
[3] Flaviu Cristian, Houtan Aghili, and Ray Strong. Clock synchronization in the presence of omission and performance failures, and processor joins. In Zhonghua Yang and T. Anthony Marsland, editors, Global States and Time in Distributed Systems. IEEE Computer Society Press, 1994.
[4] W. J. Dally and H. Aoki. Deadlock-free adaptive routing in multicomputer networks using virtual channels. IEEE Transactions on Parallel and Distributed Systems, 4(4):466–475, Apr 1993.
[5] Kalyanmoy Deb and Himanshu Gupta. Introducing robustness in multi-objective optimization. Evolutionary Computation, 14(4):463–494, 2006.
[6] Jingcao Hu and Radu Marculescu. Exploiting the routing flexibility for energy/performance aware mapping of regular NoC architectures. In DATE '03: Proceedings of the Conference on Design, Automation and Test in Europe, page 10688, Washington, DC, USA, 2003. IEEE Computer Society.
[7] Jingcao Hu and Radu Marculescu. DyAD - smart routing for networks-on-chip. In Proceedings of the Design Automation Conference (DAC), pages 260–263, 2004.
[8] S. Kugele, L. Watson, and M. Trosset. Interplay of numerical integration with gradient based optimization algorithms for robust design optimization. In SoutheastCon, 2007. Proceedings. IEEE, pages 472–477, Jan 2007.
[9] Srinivasan Murali, Luca Benini, and Giovanni De Micheli. Mapping and physical planning of networks-on-chip architectures with quality-of-service guarantees. In ASP-DAC '05: Proceedings of the 2005 Conference on Asia South Pacific Design Automation, pages 27–32, New York, NY, USA, 2005. ACM.
[10] Srinivasan Murali and Giovanni De Micheli. SUNMAP: A tool for automatic topology selection and generation for NoCs. In Proceedings of the Design Automation Conference (DAC), pages 914–919, 2004.
[11] G. Palermo and C. Silvano. PIRATE: A framework for power/performance exploration of network-on-chip architectures. In Proc. of PATMOS, 2004.
[12] Maurizio Palesi, Rickard Holsmark, Shashi Kumar, and Vincenzo Catania. A methodology for design of application specific deadlock-free routing algorithms for NoC systems. In CODES+ISSS '06: Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, pages 142–147, New York, NY, USA, 2006. ACM.
[13] Alessandro Pinto, Luca P. Carloni, and Alberto L. Sangiovanni-Vincentelli. Efficient synthesis of networks on chip. In ICCD '03: Proceedings of the 21st International Conference on Computer Design, page 146, Washington, DC, USA, 2003. IEEE Computer Society.
[14] Eckart Zitzler and Lothar Thiele. Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4):257–271, 1999.