A Multi-Criteria Estimation Tool for System-on-Chip

A. Richard, A. Vander Biest, A. Bartzas, A. Papanikolaou, D. Soudris, D. Milojevic, F. Robert
[arichard,avdbiest,dmilojev,frrobert]@ulb.ac.be ; [alexis, antonis, dsoudris]@microlab.ntua.gr
BEAMS-ULB – Universite Libre de Bruxelles – Belgium, http://beams.ulb.ac.be/
NTUA – National Technical University of Athens – Greece, http://www.microlab.ntua.gr

Abstract

We have developed a multi-objective performance exploration tool to help SoC designers make decisions at the early stages of the design flow. We demonstrate it on a real-time video decoding application.

1. Introduction

Nowadays, systems-on-chip (SoC) are characterized by increasing application complexity and by multiple functional and non-functional constraints (energy, area, cycles, time-to-market, …). To achieve the desired performance under these constraints, designers have to make the right decisions from the early stages of the design flow. Automatic tools do not exist at high abstraction levels, yet large design spaces (as for embedded SoC) cannot be explored efficiently by the designer alone.

Various performance prediction tools can be found in the literature. State-of-the-art tools (SESAME [1], Design Trotter [2], …) generally focus on one kind of platform (like MPSoC), rely on one criterion (cycles or energy) and explore only a part of the system (functionality, platform or neither). We have therefore decided to develop our own tool, Nessie [3], which is more general and flexible and is based on a mapping core interfaced with a closed-form based library [4]. This tool aims at guiding designers in their decisions at the first stages of the flow, in order to limit costly iterations.

To demonstrate how Nessie is able to model a real system and quickly explore a given design space, we have simulated a case study [5] based on an H.264/AVC real-time video decoding application (studied for three different resolutions: CIF, 4CIF, HDTV) running on the 3MF MPSoC platform developed in [5]. Section 2 briefly explains our framework; section 3 presents our case study and the related results. Section 4 concludes the paper.

2. A high-level prediction tool

The tool is based on a hierarchical description of the functionality and the architecture of the SoC, and on an automatic mapping performed in three steps (scheduling, allocation and routing). The mapping is driven by a user-defined objective function used to optimize the allocation and the routing. To estimate the criteria (area, energy, …), Nessie is interfaced with a closed-form based library for model description and evaluation, developed during this research. The library supports the construction of both analytical and dynamic models, which brings a lot of flexibility to Nessie.

The functionality is described with Petri nets, which are able to capture sequentiality, parallelism, control and data dependency. The platform is described in a netlist where components (computation, memory or interconnection blocks) are associated with user-defined costs. The simulation tool evaluates every possible combination of the degrees of freedom defined by the user in one shot, and reports the values of the performance criteria and the mapping process in output files.

Nessie is a C++ framework (see figure 1) interfaced entirely by means of XML files. The input files are read and turned into a structure of C++ objects. The tool also implements a strict verification of the user input files, relying on an XML schema grammar combined with run-time error management, to make the initialization and running of a simulation easier for the user.
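The one-shot evaluation of every combination of degrees of freedom, ranked by a user-defined objective function, can be illustrated with a minimal Python sketch. This is not Nessie's implementation: the degrees of freedom (`num_cores`, `clock_mhz`), the workload size and the two closed-form models in `estimate` are hypothetical and chosen only to make the example self-contained.

```python
from itertools import product

# Hypothetical degrees of freedom; names and values are illustrative only.
DEGREES_OF_FREEDOM = {
    "num_cores": [1, 2, 4],
    "clock_mhz": [100, 200],
}

WORKLOAD_CYCLES = 1_000_000  # assumed workload, not taken from the paper


def estimate(config):
    """Toy closed-form models returning the criteria for one configuration."""
    cores, clock_mhz = config["num_cores"], config["clock_mhz"]
    # 1 MHz = 1000 cycles per millisecond; assumes perfect parallel speed-up.
    time_ms = WORKLOAD_CYCLES / (cores * clock_mhz * 1000)
    # Toy energy model: power grows with cores * f^2, energy = power * time.
    energy = cores * clock_mhz ** 2 * time_ms * 1e-3
    return {"time_ms": time_ms, "energy": energy}


def objective(criteria, w_time=1.0, w_energy=1.0):
    """User-defined objective: weighted sum of the criteria (lower is better)."""
    return w_time * criteria["time_ms"] + w_energy * criteria["energy"]


def explore(dof):
    """Evaluate every combination of the degrees of freedom, best first."""
    names = list(dof)
    results = []
    for values in product(*(dof[name] for name in names)):
        config = dict(zip(names, values))
        criteria = estimate(config)
        results.append((objective(criteria), config, criteria))
    return sorted(results, key=lambda r: r[0])


if __name__ == "__main__":
    for score, config, criteria in explore(DEGREES_OF_FREEDOM):
        print(f"{score:8.2f}  {config}  {criteria}")
```

Changing the weights passed to `objective` changes the ranking without touching the models, which is the kind of separation between mapping core and model library described above.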

Figure 1: Schema of the framework

We have decided to demonstrate the ability of Nessie to quickly simulate and explore a big design space on a real design problem offering several degrees of freedom. This case is presented in the next section.

3. A real case study

The case study is taken from the paper [5] and is based on the mapping of a real-time video decoding application running on a 3MF MPSoC platform. For three application scenarios and three resolutions (HDTV, 4CIF, CIF), we focus on the power consumption of the communication infrastructure interconnecting six ADRES processors [6] for the computation, an ARM processor for the control, two data memories (L2D1, L2D2), two instruction memories (L2Is1, L2Is2), a FIFO (buffer for the data streams) and an external memory interface (EMIF). These nodes are interconnected by two Arteris NoCs: one for the data path (2x2 mesh topology) and one for the instruction path (1 router with 6 inputs and 2 outputs). Furthermore, we have extended the problem with the 3D stacking paradigm, which aims at reducing the wire length and hence the delays and the power consumption. This considerably increases the design space and necessitates a tool such as Nessie.

For the simulation, ten 3D architecture variants have been chosen by placing the nodes differently on 2 or 3 layers. These have been described in one XML file. In the same file, we have represented three Petri nets (one per application scenario) for the different resolutions. The application scenarios are the following:
- Data split: the video streams are divided equally over the 6 ADRES, which puts the stress on the instruction NoC.
- Functional split: the functionality is distributed over the 6 ADRES, which increases the flow on the data NoC.
- Hybrid: the heaviest computational task is mapped on 3 ADRES with data split; the others are mapped on the 3 remaining ADRES with functional split.

The power consumption of the NoC has been estimated during the mapping process thanks to a power model (cf. paper) included in the closed-form based library.
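To make the three scenarios and the size of the explored space concrete, here is a small Python sketch. The task names and weights in `TASKS` are hypothetical (the paper does not list the decoder's task partitioning), as are the helper functions; only the counts (6 ADRES, 10 variants, 3 scenarios, 3 resolutions) come from the text.

```python
from itertools import product

ADRES = [f"ADRES{i}" for i in range(1, 7)]  # six ADRES processors

# Hypothetical decoder tasks with relative computational weights.
TASKS = {
    "entropy_decoding": 2,
    "inverse_transform": 1,
    "motion_compensation": 4,
    "deblocking": 2,
}


def data_split(tasks, cores):
    """Every core runs all tasks on an equal share of the video stream."""
    share = 1 / len(cores)
    return {core: [(task, share) for task in tasks] for core in cores}


def functional_split(tasks, cores):
    """Tasks are distributed round-robin; each handles the whole stream."""
    mapping = {core: [] for core in cores}
    for i, task in enumerate(tasks):
        mapping[cores[i % len(cores)]].append((task, 1.0))
    return mapping


def hybrid(tasks, cores):
    """Heaviest task data-split on half the cores, the rest functionally split."""
    heaviest = max(tasks, key=tasks.get)
    others = [task for task in tasks if task != heaviest]
    half = len(cores) // 2
    mapping = data_split([heaviest], cores[:half])
    mapping.update(functional_split(others, cores[half:]))
    return mapping


SCENARIOS = {"data_split": data_split,
             "functional_split": functional_split,
             "hybrid": hybrid}
VARIANTS = range(1, 11)               # ten 3D architecture variants
RESOLUTIONS = ["CIF", "4CIF", "HDTV"]

# Every (variant, scenario, resolution) triple is one point to evaluate.
design_space = list(product(VARIANTS, SCENARIOS, RESOLUTIONS))

if __name__ == "__main__":
    print(len(design_space), "design points")
    for core, work in hybrid(TASKS, ADRES).items():
        print(core, work)
```

Each entry of a mapping is a `(task, fraction of the stream)` pair, so the difference between replicating code on every core (data split) and moving data between cores (functional split) is visible directly in the output.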

The selected logical prototypes of the final system implementation, produced during the high-level exploration step, have been validated using a C++ transaction-level NoC simulator and an industrial physical design suite. Namely, an extended version of the WormSim NoC simulator has been used to simulate the traffic on the network and to generate the transfer traces between the various IP blocks and routers. In the next step, the floorplan and global routing of the entire 3D system are performed using a novel tool flow for 3D physical prototyping based on engines of the Cadence SoC Encounter tool suite, and the performance metrics (NoC clock frequency and power consumption) are calculated from the resulting physical prototype.

Furthermore, the Nessie simulation of the ten variants and the three application scenarios for three resolutions takes only a few seconds. By contrast, obtaining the performance metrics in the real flow takes one hour per variant, of which floorplanning is the most time-consuming step and the NoC traffic simulation takes 10 minutes on average.
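The orders of magnitude involved can be worked out with a short Python sketch. The one hour per variant and the 10-minute NoC simulation come from the text; the number of reference-flow runs needed per variant and the exact Nessie run time ("a few seconds") are not stated, so both are left as explicit, assumed parameters.

```python
VARIANTS, SCENARIOS, RESOLUTIONS = 10, 3, 3
HOURS_PER_VARIANT = 1.0   # reference flow, from the text
NOC_SIM_MINUTES = 10.0    # average NoC traffic simulation time, from the text


def reference_flow_hours(runs_per_variant=1):
    """Total reference-flow time; runs_per_variant is an assumption."""
    return VARIANTS * runs_per_variant * HOURS_PER_VARIANT


def speedup(nessie_seconds, runs_per_variant=1):
    """Ratio of reference-flow time to an assumed Nessie run time."""
    return reference_flow_hours(runs_per_variant) * 3600 / nessie_seconds


if __name__ == "__main__":
    # Lower bound: a single run per variant.
    print(reference_flow_hours(), "hours")
    # Assumption: one run per scenario and per resolution.
    print(reference_flow_hours(SCENARIOS * RESOLUTIONS), "hours")
    # Share of one variant's hour spent in NoC traffic simulation.
    print(NOC_SIM_MINUTES / (HOURS_PER_VARIANT * 60))
    # Assumed Nessie run time of 10 seconds.
    print(speedup(10.0))
```

Even under the most favorable reading (one reference-flow run per variant), the gap is several orders of magnitude, which is what makes the high-level exploration useful as a first filter.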

Figure 2: Wire power consumption (HDTV)

Figure 3: Wire power consumption (CIF)

The results for the HDTV and CIF resolutions are shown in figures 2 and 3 respectively. They are compared to the results of the original 2D layout, taken as the reference case. They show that 3D stacking reduces the wire power consumption (by 32.8% to 55.3%) but that some architectures are really not efficient (4, 9, 10). Indeed, on these architectures, EMIF and FIFO are on the second layer. While these blocks account for only 5% of the data traffic, this is not relevant for the data split scenario. The simulation also shows that the best application scenario differs depending on the resolution (from hybrid for HDTV to functional split for CIF). Indeed, the functional split reduces the instruction bandwidth, which is more critical when the resolution decreases because the instruction/data ratio increases.

4. Conclusion

With the video decoding case study, we have seen the ability of Nessie to quickly explore a design space and to give the designer input for decisions that are sometimes not intuitive. Moreover, by comparing these results to those obtained with the real flow, we are able to illustrate the correct behaviour of the framework.

5. References

[1] A. Pimentel, “The Artemis workbench for system-level performance evaluation of embedded system architectures at multiple abstraction levels,” International Journal of Embedded Systems, vol. 1, no. 7, 2005. [Online]. Available: http://dare.uva.nl/record/221458
[2] Y. Moullec, J-P. Diguet, N. B. Amor, T. Gourdeaux, and J-L. Philippe, “Algorithmic-level specification and characterization of embedded multimedia applications with Design Trotter,” J. VLSI Signal Process. Syst., vol. 42, no. 2, pp. 185-208, 2006.
[3] A. Vander Biest, A. Richard, D. Milojevic and F. Robert, “A Multi-Objective and hierarchical exploration tool for SoC performance estimation,” LNCS, vol. 5114, pp. 85-95, 2008.
[4] A. Vander Biest, A. Richard, D. Milojevic and F. Robert, “A Framework introducing model reversibility in SoC design Space Exploration,” LNCS, vol. 4599, pp. 211-221, 2007.
[5] D. Milojevic, L. Montperrus and D. Verkest, “Power Dissipation of the Network-on-Chip in Multi-Processor System-on-Chip Dedicated for Video Coding Applications,” Journal of Signal Processing Systems, 2008.
[6] B. Mei, S. Vernalde, D. Verkest, H. De Man and R. Lauwereins, “ADRES: an architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix,” in FPL, 2003, pp. 61-70.
