SoC Design Environment with Automated Configurable Bus Generation for Rapid Prototyping
Sang-Heon Lee, Jae-Gon Lee, Seonpil Kim, Woong Hwangbo, Chong-Min Kyung
Electrical Engineering Department, KAIST, Daejeon, Korea
[shlee, jglee, spkim, woonghb]@vslab.kaist.ac.kr
[email protected]
Abstract – In SoC design it is important that design and verification can be done easily and quickly. RT-level simulation is still a necessary verification method, but its use is limited by its slow speed. We therefore propose a SoC verification environment in which the hardware parts are accelerated in an FPGA and the cores are modeled with an ISS. To connect the ISS, which works at a high abstraction level, to the emulator, which is pin-level accurate, a bus functional model (BFM) is used. For hardware debugging, a bus monitor is designed. Post-processing the data obtained by bus monitoring enables debugging and performance estimation. For easy and quick design and verification, we developed a tool that creates configurable bus architectures automatically. With this tool, the design time from specification to FPGA-based prototyping can be reduced remarkably, so fast verification and design space exploration become possible. AMBA is chosen as the SoC bus protocol.
Keywords: SoC, BFM, AMBA, prototype, bus generation
1 Introduction
For system-on-chip (SoC) design, many high-abstraction-level design methodologies, for example those using SystemC[1], have been introduced and are being developed. But RT-level test and verification is still an important step in the SoC design flow. For this, co-simulation, generally based on an ISS and an HDL simulator, has been used. Co-simulation methodologies provide accurate verification, but their performance is limited. As designs get bigger, especially in the case of SoCs, HDL simulation cannot provide sufficient testing. FPGA-based prototyping is therefore used as an alternative methodology. In a prototyping system, synthesized hardware designs are mapped into an FPGA to raise the operating clock speed.
Of course, co-verification must be able to support concurrent development of the software application that will be executed on an embedded core[3]. In some prototyping systems, a real chip is used to model the core. But cores are not always available in test-chip form, so ISSs are used to model various cores. In our SoC design environment, the core is modeled on the host computer with an ISS and the hardware IPs are mapped into the FPGA. To improve performance, the communication between the ISS and the hardware takes place at the transaction level, which reduces the communication overhead. For this, a bus functional model (BFM), which takes high-abstraction-level commands and generates pin- and cycle-accurate signals, is implemented in the FPGA.
The number of IPs in a SoC design is increasing, and the bus architecture is becoming complicated and hierarchical. IP reuse has become common practice and reusable IP libraries are growing richer. Accordingly, although the required IPs are varied, most of them are already available, and only a small part of the whole SoC design needs to be newly created. For these reasons, designing the on-chip bus architecture becomes a relatively large part of the work. In many cases the bus protocol is fixed very early in the design, so automatic bus generation can make SoC design and verification easier and faster. We therefore developed a tool that generates the bus architecture quickly and easily. This work can reduce the design time considerably and makes design space exploration of the bus architecture possible.
In this paper, the idea is realized on an FPGA-based prototype system[4]. The hardware prototype board is mounted in a PCI slot of the host computer, so communication between hardware and software takes place through the PCI interface. AMBA has been chosen as the on-chip bus protocol.
This paper is organized as follows. Related works are explained in section 2. In section 3, we explain the prototyping and debugging methodology in the proposed environment. In section 4, a tool that generates the on-chip bus architecture automatically is described. In section 5, we show a case study on JPEG. In section 6, we give conclusions.
2 Related works
There are various prototyping systems. In [6], the co-verification system consists of a VLIW processor and an FPGA board. The software component is an ISS of the core running on the VLIW processor, and a BFM is used to generate the pin signals. In [7], a simulation-based method is used for design space exploration, after which the whole system is mapped onto a prototyping system in which a special DSP is implemented in an FPGA. ARM offers the Integrator family[2] to provide the developer with a rapid prototyping environment that enables the integration of hardware and software IP. It consists of ARM test-chip-based Core Modules and FPGA-based Logic Modules. But it mainly aims to assist software development and does not address hardware debugging.
For automatic bus generation, the Synopsys DesignWare AMBA On-chip Bus[4] provides synthesizable and configurable bus components. But this package does not provide a way to interact with cores. Moreover, the generated components are configured in a hardwired manner, so even for small changes the hardware must be regenerated, re-synthesized and re-compiled for the FPGA. In our environment, every component has configuration registers, so hardware characteristics such as the address map, priorities and clock frequencies can be changed immediately.
3 Proposed SoC design environment
3.1 Co-simulation
For co-simulation, the IPC library of the emulator[4] is used. The software application in C/C++ can run as native code or as cross-compiled code on the ISS. The hardware blocks, including the generated bus components (shown in the dotted box) and the application-specific IPs, run in the HDL simulator. Two processes run during co-simulation. The first is the C algorithm process, in which the emulator APIs interact with each other to generate stimulus for the HDL simulator. The second is the HDL simulator process, which executes the models of the bus components and IPs.
3.2 Prototype
Because the bus generator also offers synthesized bus components, one can move to the prototyping step easily and quickly. In the FPGA-based prototyping system used in this paper, the system clock is derived from the PCI clock, which is 33 MHz or 66 MHz. The emulation system executes hundreds of times more cycles per second than simulation, so exhaustive verification or design space exploration is possible.
3.2.1 Bus functional model
The BFM (Bus Functional Model) enables the C algorithm to communicate with the hardware IPs at the transaction level. The BFM receives high-abstraction-level commands from the C algorithm and translates them into pin- and cycle-accurate transactions on the AHB bus. One side of the BFM is the PCI controller interface of the emulator and the other side is the AHB interface. The BFM acts mainly as an AHB master, but it also has an AHB slave interface, so it can be used as a master, as a slave, or as both. The BFM is AMBA AHB rev 2.0 compliant and supports all features of the AHB specification. In the C algorithm, the AMBA API is used to communicate with the BFM.
3.2.2 Debugging
In the prototype system, debuggability is generally lacking because of poor probing. A logic analyzer could help, but using it is troublesome and its channel bandwidth is still insufficient. So we designed the AMBA monitor, a pin-accurate and cycle-accurate debugger for the AMBA development environment.
The AMBA monitor is composed of a hardware part and a software part. The hardware part samples the AHB and APB signal values at every cycle of their respective clocks and sends them to the software part. Several triggering conditions, defined in terms of bus activities, are available. The hardware part starts sampling when a predefined start-triggering condition is met and stops when the stop-triggering condition is met. The software part stores the sampled bus information in a file. When the output file size reaches 1 GByte, it closes the file and opens a new one. The dump can be inspected with a waveform viewer. Statistics of bus activity, coverage testing and protocol violation checking can be obtained by post-processing the dump files. This information helps with performance measurement and with deciding on the bus architecture.
3.2.3 Synchronization
We have applied a simple scheme to synchronize the cycle count between SW and HW. Figure 3 shows the clock flow. The HW holds its clock when there is no request. When the SW needs to access the HW, it sends not only the request but also the cycle count between requests. The BFM can enable or disable the clock generator.
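The synchronization scheme can be sketched in a few lines of Python. This is an illustration under assumptions, not the emulator's implementation: the classes `GatedHardware` and `Bfm`, the `request` signature and the way the cost of a transfer is passed in are all hypothetical. What it takes from the paper is the flow itself: the software sends the cycle count between requests along with each request, the BFM enables the hardware clock for that many cycles before serving the request, and the cycles consumed by the request are returned to the software. The sketch additionally assumes that the application total is the sum of both kinds of cycle counts.

```python
class GatedHardware:
    """Hypothetical hardware side: the clock advances only while enabled."""

    def __init__(self) -> None:
        self.cycles = 0                  # hardware clock cycles elapsed so far

    def run(self, n: int) -> None:
        """Enable the clock generator for n cycles, then hold the clock again."""
        self.cycles += n


class Bfm:
    """Hypothetical BFM front end that keeps SW and HW cycle counts aligned."""

    def __init__(self, hw: GatedHardware) -> None:
        self.hw = hw

    def request(self, idle_cycles: int, bus_cycles: int) -> int:
        """Serve one request and return the cycles it consumed.

        idle_cycles: cycles the software spent since its previous request (N).
        bus_cycles:  cycles the transfer takes on the bus; a stand-in for the
                     real pin-level transfer, which is outside this sketch.
        """
        self.hw.run(idle_cycles)         # first let the hardware catch up by N cycles
        self.hw.run(bus_cycles)          # then perform the transfer itself
        return bus_cycles


def run_application(bfm: Bfm, requests) -> int:
    """requests: iterable of (idle_cycles, bus_cycles). Returns the SW-side total."""
    total = 0
    for idle, cost in requests:
        total += idle + bfm.request(idle, cost)
    return total
```

With this accounting the total computed on the software side always equals the number of cycles the gated hardware clock has actually run, which is the property the scheme is meant to guarantee.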
Figure 1 Co-simulation system
Figure 2 Co-emulation system (host computer running the software algorithm on a C/C++ compiler or ISS, with the AMBA API issuing configuration commands and bus commands over the PCI channel to the FPGA, which contains the bus functional model, configuration module, clock generator, bus monitor, AHB bus, APB bridge, AHB IPs 1 to 3 and APB IPs 1 and 2)
Figure 3 Cycle flow between SW and HW
When the BFM receives a request, it first enables the clock generator for N cycles (the cycle count received with the request) and then handles the request. The number of cycles consumed by the request is returned to the SW, so that the total cycle count of an application can be obtained by summing all the cycle counts.
4 Automated bus generation
To meet the requirements described above, we made a tool that generates the bus architecture automatically from a bus specification. Figure 4 shows the design flow using the tool. The bus generator produces two kinds of bus models from the user's bus specification, using the AMBA bus component library described in section 4.1. One bus model is in HDL, for simulation. The other is in EDIF format, for emulation.
The bus generator takes the bus specification through a graphical user interface and automatically connects all required components. Going from the specification to bus models that are ready for simulation or emulation takes only several minutes, so the time and effort required for the complicated SoC design process are greatly reduced. Moreover, when the bus architecture is not yet fixed, design space exploration can be done accurately and quickly. The bus architecture may include the bus hierarchy, arbitration scheme, memory map, clock speeds and so on. Because the design iteration time is quite small, such design factors can be decided based on the results of simulation or emulation.
4.1 Reconfigurable bus components
An AMBA bus system contains several components. Figure 5 illustrates an example AMBA bus. The gray blocks are application-specific user IPs. There are two kinds of components: the basic AMBA bus blocks and the special function blocks. All components are configurable with parameters.
The arbiter, decoder, muxes, APB bridge and AHB-to-AHB bridge are the basic blocks of the AMBA bus. The arbiter performs priority-based arbitration and supports up to 16 masters. The priorities of the masters and the default master are configurable. The decoder has two address maps, normal and boot, and each address can be configured. The active address map can be switched at runtime by setting the address mode control bit. The decoder includes the default slave. There are two muxes, the master-to-slave mux and the slave-to-master mux, which support up to 16 masters and 16 slaves respectively. The APB bridge also has two address maps, normal and boot. It includes an asynchronous FIFO, so the APB clock can be independent of the AHB clock, and it supports 16 APB slaves. The AHB-to-AHB bridge connects two AHB buses and makes it possible to construct a hierarchical bus architecture. It has an asynchronous FIFO, so the clocks of the two AHB buses can be independent. It supports all burst modes of the master and all responses of the slave.
There are three special function blocks: the BFM, the bus configuration module and the clock generator. The bus functional model (BFM) is used to communicate with the software side at the transaction level. As described above, almost all bus components are configurable, and the bus configuration module performs the configuration: it takes configuration information from the software and distributes it to the configurable components. The clock generator is connected to the configuration module to receive its configuration information. Based on this information, it produces up to 16 different clocks and 4 resets. The fastest clock has the same frequency as the PCI clock, and the slowest is 128 times slower than the PCI clock. The phase of each clock is controllable.
4.2 Bus architecture
Figure 6(a) shows an example AMBA-based SoC bus with three hierarchically organized AHB buses. The system can be redrawn as in figure 6(b), where the connections between the buses and the master/slave IPs are replaced with a switch box. To make the connections configurable, the switch box should be implemented with reasonable resources.
The proposed method is depicted in figure 7. In this scheme, the master-to-slave mux of each AHB bus is merged with the switch box, so only one mux is needed per AHB bus. However, it requires a relatively complex mux controller, which takes not only the configuration information but also HMASTER and HREADY as inputs. HMASTER is the arbiter output that indicates which master currently owns the bus.
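The role of the mux controller in the merged-mux switch box of section 4.2 can be illustrated with a small behavioural model. It is a sketch under assumptions rather than the generated logic: the class `MuxController`, the form of the configuration (a table mapping each local HMASTER value of a bus to the index of a master IP) and the initial data-phase select are hypothetical. The handling of the data phase follows the pipelined AHB protocol, in which the write-data select is the address-phase select of the previous transfer and advances only when HREADY is high.

```python
class MuxController:
    """Hypothetical select logic for one AHB bus of the merged-mux switch box."""

    def __init__(self, config):
        # config[i] = index of the master IP attached as local master i of this bus
        # (assumed form of the configuration information).
        self.config = list(config)
        self.data_sel = self.config[0]   # master whose write data is routed (assumed reset value)

    def step(self, hmaster: int, hready: bool):
        """Evaluate one clock cycle.

        Returns (addr_sel, data_sel): the master IP whose address/control pins
        and the master IP whose write-data pins drive the bus in this cycle.
        """
        addr_sel = self.config[hmaster]  # address phase follows the current bus owner
        data_sel = self.data_sel         # data phase belongs to the previous owner
        if hready:                       # transfer completes, so the pipeline advances
            self.data_sel = addr_sel
        return addr_sel, data_sel
```

For example, with `MuxController([5, 12, 19])` the local master 1 of the bus is master IP 12, and its write data is selected one completed transfer after its address and control signals.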
Figure 4 Design flow with Bus generator
Figure 5 Bus architecture example
Figure 6 Example SoC: (a) hierarchical bus structure and (b) reconstructed bus
Figure 7 Merged-mux based switch box
The signals that are not inputs of the master-to-slave mux, such as HBUSREQ and HLOCK, need another switch box, which would be implemented with a three-stage Clos network. In this scheme, the resource overhead is
$\langle N \times 1 \rangle \times a + \alpha$,
where $\langle N \times 1 \rangle$ denotes an N-to-1 mux that covers all output pins of a master, $a$ is the number of AHB buses, $N$ is the number of master IPs and $\alpha$ is the resource for the Clos network. Note that this includes the master-to-slave mux of the AHB. When the number of AHB buses ($a$) is 3 and the number of master IPs ($N$) is 20, the switch box requires 5010 LUTs, which is 5.3% of the total LUTs of a Xilinx Virtex-II 8000. An FPGA synthesis library is used as the target, because the reconfigurable bus would be used in the FPGA-based prototyping system. The count includes the mux for the inputs of the master and the Clos network for HBUSREQ and HLOCK.
5 Case study
We applied the proposed environment to a JPEG decoding system, shown in Figure 8. The header management part is implemented as a C algorithm on the host side. The other parts, such as VLD (Variable Length Decode), IDCT (Inverse Discrete Cosine Transform) and memory, are implemented in hardware. The BFM, which operates as a master, takes charge of the communication between the algorithm and the hardware. VLD and IDCT each have both a master and a slave interface. So there are three masters and three slaves in the system. The configuration module and clock generator are omitted from the figure.
Figure 8 JPEG decoding system
Using the bus generation tool, the bus system could be created in a few minutes. The next step is connecting the bus to IPs such as VLD, IDCT and memory, which also takes just a few minutes. To map the design into the FPGA, FPGA compilation is needed. The compile time depends on the FPGA device and the operating system and is generally about one or two hours. So when the required IPs are available, a few hours are enough to go from specification to co-emulation. The JPEG hardware ran at a 33 MHz clock frequency.
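To make the case study more concrete, the following sketch shows how the JPEG system might be written down as a bus specification and checked against the component limits given in section 4.1. Everything specific in it is an assumption made for illustration: the class `AhbBusSpec`, the priority order of the masters, and in particular all base addresses and region sizes are made up and are not taken from the actual design or from the tool's input format. Only the three masters (BFM, VLD, IDCT), the three slaves (VLD, IDCT, memory), the limit of 16 masters and 16 slaves, the two address maps and the default slave come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAX_MASTERS = 16  # arbiter/mux limit stated in section 4.1
MAX_SLAVES = 16   # mux limit stated in section 4.1

# slave name -> (base address, size in bytes); all addresses below are made up
AddressMap = Dict[str, Tuple[int, int]]


@dataclass
class AhbBusSpec:
    """Hypothetical bus specification for one AHB bus."""
    masters: List[str]       # highest priority first (assumed convention)
    default_master: str
    normal_map: AddressMap
    boot_map: AddressMap

    def validate(self) -> None:
        """Check the specification against the component limits."""
        if not 1 <= len(self.masters) <= MAX_MASTERS:
            raise ValueError("an AHB bus supports 1 to 16 masters")
        if self.default_master not in self.masters:
            raise ValueError("default master must be one of the masters")
        for amap in (self.normal_map, self.boot_map):
            if len(amap) > MAX_SLAVES:
                raise ValueError("an AHB bus supports at most 16 slaves")
            regions = sorted(amap.values())
            for (base, size), (next_base, _) in zip(regions, regions[1:]):
                if base + size > next_base:
                    raise ValueError("address regions overlap")

    def decode(self, addr: int, boot: bool = False) -> str:
        """Return the slave selected for addr in the active address map."""
        amap = self.boot_map if boot else self.normal_map
        for name, (base, size) in amap.items():
            if base <= addr < base + size:
                return name
        return "default_slave"  # unmapped addresses go to the default slave


# The JPEG system of this section: three masters and three slaves.
jpeg_bus = AhbBusSpec(
    masters=["BFM", "VLD", "IDCT"],
    default_master="BFM",
    normal_map={"MEM": (0x0000_0000, 0x1_0000),
                "VLD": (0x1000_0000, 0x1000),
                "IDCT": (0x2000_0000, 0x1000)},
    boot_map={"MEM": (0x8000_0000, 0x1_0000),
              "VLD": (0x1000_0000, 0x1000),
              "IDCT": (0x2000_0000, 0x1000)},
)
jpeg_bus.validate()
```

Because the decoder keeps both maps, switching the address mode only changes which table `decode` consults; no component has to be regenerated.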
6 Conclusions
We introduced a SoC design environment built on an FPGA-based emulation system. For debuggability, we designed the AMBA monitor, which samples all bus activities in a pin- and cycle-accurate way. To improve the SoC design flow, we developed an automatic bus generation tool. In this environment, the bus architecture can be generated from the bus specification easily and quickly using a library of configurable bus components. With it, the complicated SoC design process from specification to prototyping system can be completed with very little time and effort.
References
[1] Benini, L., "Virtual In-Circuit Emulation for Timing Accurate System Prototyping", ASIC/SOC Conference, pp. 49-53, Sept 2002.
[2] Schaumont, P., "Interactive Cosimulation with Partial Evaluation", Design Automation and Test in Europe Conference and Exhibition, pp. 642-647, Feb 2004.
[3] http://www.arm.com/products/DevTools/IntegratorAP.html
[4] http://www.dynalith.com/2003/iprove.php
[5] http://www.synopsys.com/products/designware/dwlibrary.html
[6] Schnerr, J., "Instruction Set Emulation for Rapid Prototyping of SoCs", Design Automation and Test in Europe Conference and Exhibition, pp. 56-567, Mar 2003.
[7] Bieger, J., "Rapid Prototyping for Configurable System-on-a-Chip Platforms: A Simulation Based Approach", International Conference on VLSI Design, pp. 577-582, Jan 2004.
[8] ARM, AMBA Specification Rev 2.0.