Journal of Systems Architecture 58 (2012) 99–111
On the interfacing between QEMU and SystemC for virtual platform construction: Using DMA as a case

Tse-Chen Yeh, Ming-Chao Chiang *

Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan, ROC
Article history: Received 29 July 2010; received in revised form 3 November 2010; accepted 8 February 2012; available online 3 March 2012.

Keywords: QEMU; SystemC; ESL; SoC; Hardware modeling; DMA; OS; Device driver
Abstract

In this paper, we present an interface for the hardware modeled in SystemC to access those modeled in QEMU on a QEMU and SystemC-based virtual platform. By using QEMU as the instruction-accurate instruction set simulator (IA-ISS) and its capability to run a full-fledged operating system such as Linux, the virtual platform with the proposed interface can be used to facilitate the co-design of hardware models and device drivers at the early stage of the Electronic System Level (ESL) design flow. In other words, by using such a virtual platform, the hardware models and associated device drivers can be cross verified while they are being developed so that malfunctions in the hardware models or the device drivers can be easily detected. Moreover, the virtual platform with the proposed interface is capable of providing statistics of instructions executed, memory accessed, and I/O performed at the instruction-accurate level, thus not only making it easy to evaluate the performance of the hardware models but also making it possible to perform design space exploration.

© 2012 Elsevier B.V. All rights reserved.
1. Introduction

To deal with the increasing complexity of System-on-Chip (SoC), hardware/software co-simulation based on virtual platforms has become a popular approach in the Electronic System Level (ESL) design flow. In [18], three approaches to modeling the processor of a virtual platform are addressed: hardware description language (HDL), instruction set simulator (ISS), and formal. In general, the simulation speed of the HDL approach is far slower than that of the ISS and formal approaches. The formal approach, which uses "compiled simulation" to simulate software statically, is always faster than the ISS approach, which uses "interpretive simulation" to simulate software dynamically. In terms of simulation speed, although a formal approach such as LISA [20,21,16] is the fastest, an ISS approach such as QEMU–SystemC [28],1 which is capable of booting up a full-fledged Linux kernel in about 11 s, is generally fast enough to be acceptable to system architects and software designers. Another simulation framework [31,24], which combines QEMU–SystemC with CoWare's Platform Architect, was proposed in 2009. Although the authors claim that they implement the
* Corresponding author. Tel.: +886 7 5252000x4321; fax: +886 7 5254301. E-mail address: [email protected] (M.-C. Chiang).
1 Throughout this paper, we will use QEMU–SystemC to refer to the virtual platform proposed in [28].
1383-7621/$ - see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.sysarc.2012.02.002
so-called local master interface to access the host memory, no details whatsoever are provided. Fig. 1 shows the main differences in building a co-simulation environment on ISS-based and virtual machine (VM)-based virtual platforms. For the ISS-based virtual platform shown in Fig. 1(a), many hardware and interconnect models need to be built to be able to run a full-fledged operating system (OS). Although it is not trivial to adapt the hardware models and the system functionalities to fit an unmodified OS, the accessibility between the hardware models can be retained if they are all implemented in a single language, say, SystemC. On the other hand, for the VM-based virtual platform given in Fig. 1(b), almost all the hardware models required to run an OS are already contained in the virtual platform; thus, all we have to do is extract the information from the processor model in the virtual machine to make it behave as an ISS [32]. Some ISSs have a predefined interface for connecting the memory and peripheral models; a good example is ARMulator [7]. Others, created from a VM such as QEMU [10], provide no documented interface for interfacing with external peripheral models, because they were designed to mimic a physical machine capable of running a full-fledged system, instead of just for simulation. A good example is the virtual platform described herein. One of the problems with such a virtual platform is that the hardware models are spread across QEMU and SystemC. The consequence is that it is impossible
Fig. 1. The flow of building a co-simulation environment on the ISS and VM-based virtual platforms. (a) Conventional ISS-based virtual platform and (b) VM-based virtual platform. The shaded parts indicate the portions of the virtual platform that need to be built.
for models in SystemC to access models in QEMU without a core interface such as the one described herein.

1.1. Motivation of the work

For hardware/software co-simulation to cover all the hardware modeled in both QEMU and SystemC, QEMU must provide interfaces for memory access, for I/O operations initiated by the processor, and for interrupt handling, as well as for peripherals to access memory directly. Proposed in 2007, QEMU–SystemC succeeded in exporting the I/O interface for virtual hardware devices modeled in SystemC; however, the I/O interface provided by QEMU–SystemC is only capable of simulating the operations of slave devices accessed by the processor model. In other words, the I/O interface provided by QEMU–SystemC is incapable of modeling master devices, which need to access other slave devices. To overcome this limitation, we propose a much more generic interface for connecting the master and slave ports of hardware devices modeled in SystemC to QEMU. To make the idea concrete, we use a Direct Memory Access Controller (DMAC) modeled in SystemC as an example to illustrate how the proposed interface works.

1.2. Contribution of the paper

The main contributions of the paper are threefold:

1. We propose an interface for connecting master/slave ports of hardware devices modeled in SystemC to QEMU, which overcomes the limitations of QEMU–SystemC.
2. The virtual platform2 can facilitate the co-design of hardware models and device drivers at the early stage of the ESL design flow, even before the hardware platform is available. It can even be used to co-verify the correctness of the hardware models and the associated device drivers under development.
3. The virtual platform is capable of providing the statistics of booting up a full-fledged Linux kernel and of handling data movement using the DMAC. In other words, the virtual platform can even be used to benchmark the performance of attached hardware.
2 Since there is no confusion possible, we will use "the virtual platform" to refer to "the virtual platform with the proposed interface" throughout the paper.
1.3. Organization of the paper

The remainder of the paper is organized as follows. The related work is given in Section 2. The proposed interface is presented in Section 3. The experimental results are summarized in Section 4. Section 5 concludes the work.

2. Related work

In this section, we begin with a brief introduction to SystemC and the hardware emulation features the original version of QEMU provides, which to the best of our knowledge are not described in any published documents of QEMU. Next comes a brief description of QEMU–SystemC, the first virtual platform based on QEMU and SystemC. Then comes a brief description of a virtual platform that combines the enhanced QEMU–SystemC wrapper with CoWare's Platform Architect. After that, we introduce the ISS based on QEMU and SystemC, which is used as the processor model in the virtual platform we propose. Finally, we compare the simulation speed of several known ISS-based virtual platforms.

2.1. SystemC

SystemC is an ANSI standard C++ class library developed by the Open SystemC Initiative (OSCI) [29] in 1999 and approved as an IEEE standard in 2005 [22]. Due to the requirements of abstraction at different levels of detail, it has become one of the most popular modeling languages in the ESL design flow [4]. Because SystemC can simulate the concurrency, events, and signals of hardware, the abstraction of a hardware model can be achieved at the transaction level without considering details down to the signal level [17,13]. From the perspective of the ESL design flow, a platform-based design together with SystemC can satisfy the requirements of hardware/software partitioning, post-partition analysis, and verification using transaction level modeling (TLM) and/or register transfer level (RTL) modeling [9]. In this paper, SystemC is the language of choice for modeling the hardware features, such as clock and concurrency, that C cannot model.

2.2.
Hardware emulation features of QEMU

QEMU provides two execution modes: the user mode and the system mode [10]. The user mode is provided to execute programs directly. The system mode is provided to execute an OS of the
target CPU with a software memory management unit (MMU). Since our goal is to simulate a full-fledged system, we will focus on the system mode with a software MMU. The way the load and store instructions of the target CPU access the memory depends on how the virtual address of the target OS is mapped to the virtual address of the host processor. As for the slave ports of I/O, QEMU predefines a set of callback functions in C to act as the slave I/O interface, which can be used to model the virtual hardware devices for the virtual platforms of QEMU. Hardware devices with master ports may access the other peripherals or memory areas directly. Because the memory of QEMU is managed by a software MMU, the most convenient way to access it is to utilize the memory access functions defined by QEMU. In fact, these memory access functions have been applied to a variety of virtual platforms provided by QEMU. In addition, most of the hardware interrupt sources are connected to the virtual interrupt controller modeled by the I/O interface of QEMU. The virtual interrupt controller is ultimately connected to the virtual CPU, which in turn calls a specific function asynchronously to inform the CPU main loop of QEMU that an interrupt is pending.

2.3. QEMU–SystemC

QEMU–SystemC [28] is an open source software/hardware emulation framework for SoC development. It allows devices to be inserted into specific addresses of QEMU and to communicate by means of the PCI/AMBA bus interface, as shown in Fig. 2. The bus interface was upgraded to a TLM-2.0 interface [8] in a follow-up version [27] in 2009. Although the waveform of the AMBA on-chip bus of the QEMU–SystemC framework can be used to trace the accesses of slave devices, no information about the processor and master devices is available for the virtual platform. Because the TLM-2.0 version only changes the interface for modeling the bus connection, the same problem exists as in the PCI/AMBA version.
For instance, the instructions executed, the memory accessed, and so on, which can be valuable to the system designers, are unfortunately not provided.
2.4. Co-simulating QEMU–SystemC with CoWare

Another framework [31,24] that combines the enhanced QEMU–SystemC wrapper with CoWare's Platform Architect [3] is shown in Fig. 3. The QEMU–SystemC wrapper communicates with the CoWare-SystemC wrapper through a socket-based inter-process communication (IPC) interface. This framework utilizes the bus models provided by the off-the-shelf Model Library [2], which supports extensive profiling and analysis capabilities. However, no details whatsoever about the proposed CoWare-SystemC wrapper are provided.

2.5. QEMU and SystemC-based ISS

QEMU is in essence an instruction-accurate virtual machine (IA-VM); however, the instructions executed are only available for offline debugging. Fortunately, by leveraging the strengths of QEMU and SystemC, our implementation shows that this problem can be easily solved by converting QEMU from an IA-VM into an instruction-accurate instruction set simulator (IA-ISS). In practice, the performance of the ISS, whether our concern is latency or bandwidth, depends to a certain degree on the IPC mechanism of the host operating system in use. As a consequence, the experimental results will vary from system to system [23]. The socket-based IPC mechanism allows QEMU and SystemC to be executed on different hosts, whereas the pipe-based IPC mechanism and the shared memory mechanism only allow co-simulation on the same host. No matter which approach is adopted, context switches between QEMU and SystemC are unavoidable unless QEMU and SystemC are implemented as a single thread running in a single process, which, however, is too restrictive. Thus, we adopt the approach of implementing QEMU and SystemC as two threads in one process, since context switches between threads are generally much faster than those between processes. Moreover, as far as the paper is concerned, the shared memory mechanism is designed and used as a unidirectional FIFO between QEMU and SystemC, as shown in Fig. 4. In other words, the communication between QEMU and SystemC is one-way, so that the relative order of the instructions executed, the memory accessed, and the I/O write operations performed is retained by the packet receiver within both the ISS wrapper and the infrastructure interface. Because the interface can simulate different bus transactions by using the information in the received packets, it can be used to build different Bus Functional Models (BFMs). In addition, synchronization between QEMU and SystemC is only needed for I/O read operations, which is achieved by having QEMU call the I/O read function, which passes a pointer for the data to be read to SystemC, and then block until SystemC returns. Furthermore, it is the infrastructure interface that is discussed in this paper. The details will be given in Section 3.
Fig. 2. The block diagram of QEMU–SystemC [28]. The functional descriptions of the PCI/AMBA interface and the PCI/AMBA to SystemC bridge in the block diagram differ from those in the original paper but are identical from the implementation perspective.

Fig. 3. The block diagram of the framework that combines QEMU–SystemC with CoWare's Platform Architect. Note that "M" and "S" denote, respectively, the master and slave ports connected to the on-chip bus model.
Fig. 4. The IPC mechanism used by the ISS wrapper and infrastructure interface described herein.
2.6. Simulation speed of different virtual platforms

As described in [26], the simulation of a functional model at the instruction-accurate level can be made 1,000 to 100,000 times faster than a full cycle-accurate RTL simulation. Most processor-based platforms need to take into account the instruction simulation techniques of the selected ISS, and most of the fastest ISSs using interpretive simulation employ dynamic binary translation to increase their simulation speed. Table 1 compares the simulation speed of several MPSoC/SoC-based virtual frameworks proposed by academic units and commercial sectors. In Table 1, the row labeled "QSC2" refers to the QEMU and SystemC-based framework we propose. The column labeled "Instruction-accurate ISS" refers to the simulation speed of the instruction-accurate ISS, while the column labeled "OS simulated" indicates that the simulation efficiency was gathered by simulating the indicated OS on the virtual platform. The numbers in the sub-columns labeled "w/o trace" and "with trace" of the column labeled "instructions/s" give the simulation speed with the capability of instruction trace turned off and on, respectively. The column labeled "transactions/s" additionally counts the memory accesses on top of the instructions counted in the column labeled "instructions/s." It can easily be seen from Table 1 that QSC2 without trace is only slower than RealView [16], whereas QSC2 with trace is only faster than Benini et al. [12]. This is expected because QSC2 provides much more information about all the instructions executed, all the memory accessed, and even all the I/O operations performed.
As also shown in Table 1, most of the platforms report their simulation efficiency without an OS running on the virtual platform, with the exceptions of RealView, Simics, Mambo, and QSC2. The row labeled "RealView" indicates that the RealView Real-Time System Model for the ARM1176JZ(F)-S processor can simulate a Linux boot at more than 100 MIPS [16]. Furthermore, the simulation speed of LISA with static scheduling [19] is several orders of magnitude faster than that of LISA with dynamic scheduling [19]. Although Simics can be used to boot up a variety of OSs, its simulation efficiency for booting up Linux is in the range of 3.2–9.3 M instructions per second. Also shown in Table 1, ARMulator can provide a simulation efficiency of 2 M instructions per second at the instruction-accurate level [19]. It is important to note that all the statistics given in Table 1 are calculated based on the assumption that only one processor is used on all the platforms.

3. Interfacing with attached hardware

Before we turn our discussion to the proposed interface, we first look at the virtual platform of QEMU, as shown in Fig. 5(a). Basically, the virtual platform of QEMU is made up of the processor model, the software MMU, and the memory and memory-mapped I/O models, which are managed by the software MMU. Moreover, the mechanism for cascading the interrupts is used to cascade the downstream interrupt-driven hardware models to the topmost one, i.e., the interrupt signals of the processor model. As the block diagram of QEMU–SystemC in Fig. 5(b) shows, the interface of the external memory-mapped I/O and the upward-sending interrupt mechanism provides the fundamental capability to attach "simple" hardware models written in SystemC. However,
Table 1
Comparison of the simulation speed of several MPSoC/SoC-based virtual platforms with one processor. The "Instructions/s" and "Transactions/s" columns give the speed of the instruction-accurate ISS; "OS simulated" indicates the OS booted when gathering the numbers.

Simulation technique      Virtual platform       Instructions/s              Transactions/s              OS simulated
Compiled simulation       RealView [16]          100 M                       n/a                         Linux
                          LISA (static) [19]     11–36 M                     n/a                         No
                          LISA (dynamic) [19]    4–6 M                       n/a                         No
Reflective simulation     ReSP [11]              < 2.9 M                     n/a                         No
Interpretive simulation   Benini et al. [12]     31.7 K                      n/a                         No
                          Simics [25]            3.2–9.3 M                   n/a                         Linux
                          OVP [5]                4.2 M                       n/a                         No
                          Mambo [14]             4 M                         n/a                         Linux
                          ARMulator [19]         2 M                         n/a                         No
                          QSC2 [32]              36.21–38.31 M (w/o trace)   49.44–52.26 M (w/o trace)   Linux
                                                 0.75–0.78 M (with trace)    1.10–1.15 M (with trace)
Fig. 5. The block diagram of QEMU vs. the block diagram of the QEMU and SystemC-based virtual platform. (a) QEMU and (b) QEMU–SystemC. The differences between QEMU and QEMU–SystemC are shaded.
Table 2
The proposed interface for the QEMU and SystemC-based virtual platform.

Category                          Function                        Description
Processor-associated interface    cpu_register_io_memory ()       For registering I/O read/write functions
                                  sysbus_init_mmio ()             For registering a memory region for memory-mapped I/O
Memory-associated interface       cpu_physical_memory_read ()     For reading variable-length data from memory
                                  cpu_physical_memory_write ()    For writing variable-length data to memory
                                  ldx_phys ()                     For reading fixed-length data from memory
                                  stx_phys ()                     For writing fixed-length data to memory
Interrupt cascading mechanism     qdev_init_gpio_in ()            For registering an interrupt handler for receiving interrupts from downstream hardware devices
                                  qemu_set_irq ()                 For sending interrupts to upstream hardware devices
the interface is not versatile enough for hardware models that need to access the memory model of QEMU, such as a DMAC, or for upstream hardware models that are capable of receiving the interrupts triggered by downstream devices, such as a vector interrupt controller (VIC). In this section, we turn our discussion to the interface we propose for attaching virtual hardware modeled in SystemC to QEMU. The functions of the interface can be divided into three categories:

1. Processor-associated access. This refers to access initiated by the processor model, which includes reads from peripherals to the processor and writes from the processor to peripherals. In this case, the virtual device plays the role of a slave device on the BFM.

2. Memory-associated access. Because the memory of QEMU is managed by the software MMU described in Section 2.2, all access to the memory of the virtual platform needs to go through the address translation mechanisms of the software MMU.

3. Interrupt cascading mechanism. Although the interrupt line has nothing to do with any data access on the BFM, it is indispensable for a system with interrupt-driven hardware models to work properly.
3.1. Processor-associated access

To fulfill the requirements of being a system emulator, QEMU provides an I/O interface for connecting the target processor to the virtual platforms provided by QEMU. Although undocumented, most of the existing virtual platforms are modeled and constructed on top of this I/O interface. The I/O interface can be divided into two categories: PCI and memory-mapped I/O. Because the virtual platform we propose is aimed at SoC development, we will only present the interface for memory-mapped I/O. Our implementation of the I/O interface is similar in principle to that of QEMU–SystemC [28], except that the interrupt mechanism we provide is much more complete. To ensure the portability of QEMU, the I/O interface provides callback functions to access 8-, 16-, and 32-bit data. The set of callback functions can be registered by calling the function cpu_register_io_memory (). Most hardware devices require a physically contiguous memory region, so the return value of cpu_register_io_memory () is used as the third argument to the function sysbus_init_mmio (), whose purpose is to register the memory-mapped I/O space with QEMU.
Fig. 6. The block diagram of the QEMU and SystemC-based virtual platform. Note that "M" and "S" denote, respectively, the master and slave ports of the hardware models, and that the revised parts of the virtual platform are shaded. (a) Addition of the bidirectional interrupt mechanism. (b) Addition of the internal memory access interface.
After that, the read/write functions can be called by the virtual processor to access the internal state of the hardware model.

3.2. Memory-associated access

In order to handle the diversity of virtual platforms, the memory access mechanism of QEMU is complicated in the sense that it needs to handle endianness, alignment, virtual-to-physical translation, memory-mapped I/O, and so forth. Moreover, some of the memory access functions need to invoke the dynamic binary translation (DBT) to generate executable code at run time to simulate the execution of instructions. Because our purpose is to initiate transactions on behalf of the master port of a virtual device, we will not discuss further the memory access functions that need to deal with DBT. A good example is ldl_code (), the purpose of which is to fetch instructions from the virtual address space, where each instruction occupies 4 bytes. Instead, the following four functions:
cpu_physical_memory_read (), cpu_physical_memory_write (), ldx_phys (), and stx_phys ()
are used by the master ports of a virtual device to access the physical memory of QEMU. The first two are capable of handling variable-length data, while the last two can only be used for fixed-length data. Note that the x in the names of the functions ldx_phys () and stx_phys () can be b, w, l, or q to indicate, respectively, byte, word, long, and quad-word.

3.3. Interrupt cascading mechanism

Due to the complexity of system architecture, the interrupt mechanism of a system is generally complicated. Although not all hardware models need an interrupt line to preempt the execution of a program, it is unavoidable for interrupt-driven hardware models because the only mechanism they have to signal the completion of their operation is an interrupt. For the convenience of replacing a hardware model, the interrupt cascading mechanism needs to be two-way. One is for receiving
interrupts from QEMU while the other is for sending interrupts to QEMU. Most downstream devices (the devices sending interrupts to QEMU) need only to trigger the interrupt. However, to model devices sitting in the middle, such as an interrupt controller or a secondary interrupt controller, both the sending and receiving directions are indispensable. In QEMU, the interrupt processing uses the qdev_init_gpio_in () function to register the interrupt handler, which can receive interrupts from other downstream hardware devices. Then, the function qemu_set_irq () is called to determine whether the incoming interrupt has to be sent upward. To make the proposed interface easier to understand, all the functions defined as part of the interface are summarized in Table 2.

3.4. Integrating interface of the virtual platform

Although processor-associated access is introduced as part of the infrastructure interface we propose, it should be part of the data port of the ISS wrapper from the perspective of implementation or computer organization, as shown in Fig. 6(a and b). The "internal memory access interface" shown in Fig. 6(b) is responsible for registering the functions exported for the attached hardware models written in SystemC, i.e., the same interface proposed in QEMU–SystemC except for the interrupt handler used to receive the interrupts triggered by downstream components. A code fragment showing how the "internal memory access interface" works is given in Listing 1. Although the macro FROM_SYSBUS () in line 37 acts like the macro container_of () used in the Linux kernel, it is only useful where the predefined structure SysBusDevice in the structure soc_state is used. In essence, the implementation of QEMU relies heavily on function pointers, such as CPUReadMemoryFunc and CPUWriteMemoryFunc in lines 22 and 28.
Although the arrays of function pointers sc_soc_readfn[] and sc_soc_writefn[] each hold three read/write functions, for 8-, 16-, and 32-bit data, the virtual platform we describe herein only needs the 32-bit access due to the on-chip bus width. The cpu_register_io_memory () function associates the arrays of read/write functions sc_soc_readfn[] and sc_soc_writefn[] with the structure soc_state and sets the fields that will be accessed and managed by the software MMU accordingly. Eventually, the initialization
Listing 1. Code fragment showing how the internal memory access interface works.
function qdev_init_gpio_in () will register the sc_soc_irq_hdlr () function as the interrupt handler for receiving interrupts from the downstream hardware models. In addition, the third argument of qdev_init_gpio_in () shows that the hardware model reserves 32 interrupt pins to be connected by the downstream hardware models. As discussed in Section 2.5, the shared memory mechanism is used as a unidirectional FIFO between QEMU and SystemC, and the function pointer technique can be used to set up the proposed interface between them. In order to avoid null function pointers, all the functions to be used in the QEMU and SystemC co-simulation need to be set before any meaningful transaction can be initiated, as the code fragment of the initialization function of the virtual Versatile/PB926EJ-S platform given in Listing 2 shows. The declaration in line 11 sets aside an array of pointers to IRQ to indicate the nFIQ and nIRQ signals of the ARM processor. The sysbus_create_varargs () function is used for registering the initialization functions of the hardware modeled in QEMU. After the sysbus_create_varargs () function is called, the soc_int_init () function is called to deliver packets that contain the information for initialization via the unidirectional FIFO from QEMU to SystemC, as shown in Fig. 7. Upon receiving the packets (by the packet receiver on the SystemC side), the information in the packets is retrieved to initialize the function pointers of the infrastructure interface we propose. The first and third arguments of the soc_int_init () function indicate, respectively, the base address of the memory-mapped hardware model and the number of signals used to send interrupts upward to the upstream hardware models. Because the DMAC model only uses the cpu_physical_memory_read (), cpu_physical_memory_write (), and ldl_phys () functions, the code fragment shows that only these function pointers are initialized; they will be used by the hardware modeled in SystemC. In other words, access to the hardware devices modeled in SystemC and in QEMU can be achieved via the internal memory access interface shown in Figs. 6(b) and 7.
Fig. 7. The architecture and the inter-process communication (IPC) mechanism used by the ISS presented in the paper. Both exported from QEMU, the instruction bus port and the data bus port are for connecting the BFM. Furthermore, the internal memory access interface and the interrupt interface are for modeling, respectively, the memory access to the memory model within QEMU and the interrupt-driven hardware in QEMU.
Fig. 8. The block diagram of the virtual platform Versatile/PB926EJ-S of QEMU with the PL080 DMAC model written in C replaced by a hardware model written in SystemC. The processor, memory, and interrupt mechanisms are exported from QEMU to interface with the BFM and hardware devices modeled in SystemC via the ISS wrapper and infrastructure interface.
4. Experimental results

In this section, we turn our attention to the experimental results of using the proposed interface to connect an ARM PrimeCell PL080 DMAC [6] modeled in SystemC to the virtual Versatile/PB926EJ-S platform, which is used as the experimental virtual platform throughout the paper and the details of which are shown in Fig. 8. In addition, the processor model is based on the ARM9 processor without cache. Because the Linux kernel does not provide a device driver for the PrimeCell PL080 of the Versatile/PB926EJ-S platform, we need to develop our own device driver for the purpose of testing.
The performance of the target virtual platform is evaluated based on two different measures: (1) the time it takes to boot up a full-fledged Linux kernel, and (2) the statistics that can be collected while the system is booting. For all the experimental results given in this section, a 2.40 GHz Intel Core 2 Quad Q6600 machine with 2 GB of memory is used as the host, and the target OS is built using the BuildRoot package [1], which is capable of automatically generating almost everything we need, including the cross-compilation tool chain, the target kernel image, and the initial RAM disk. The Linux distribution is Fedora 9, and the kernel is Linux version 2.6.27.12–78. QEMU version 0.11.0-rc1 and SystemC version 2.2.0 (including the reference simulator provided by OSCI) are all compiled by gcc version 4.3.1. Also, the notations used in Tables 4–8 are summarized in Table 3.

Table 3. Notations used in Tables 4–8.

Min | The best-case co-simulation time and the worst-case simulation efficiency of 30 runs, where "simulation efficiency" is defined to be the number of instructions or operations simulated per second as far as this paper is concerned
Max | The worst-case co-simulation time and the best-case simulation efficiency of 30 runs
μ | The mean of the co-simulation time and simulation efficiency of 30 runs
σ | The standard deviation of the co-simulation time and simulation efficiency of 30 runs
N_TX | The total number of transactions
N_TI | The number of target instructions simulated
N_LD | The number of load operations of the virtual processor
N_ST | The number of store operations of the virtual processor
N_DMAR | The number of read operations initiated by the master ports of the DMAC
N_DMAW | The number of write operations initiated by the master ports of the DMAC
N_RD | The number of times the virtual processor reads data from the DMAC (PL080)
N_WT | The number of times the virtual processor writes data to the DMAC (PL080)
4.1. Device driver for DMAC

To make testing easier, the device driver for the DMAC is implemented as a char device [15]. Thereby, an application program can easily be written to control the behavior of the DMAC by calling the ioctl () function defined in the device driver. In practice, most DMA device drivers cannot be characterized as either char or block devices. For the Samsung S3C6410 SoC [30], for example, the DMA device driver is used as the base of a driver stack by exporting functions to be used by other drivers, such as the sound driver or the memory controller driver. That is why we choose to implement the device driver for the DMAC as a char device, so that it is easier to control such a device. By using the virtual platform, the device driver and the DMAC hardware model can be used to cross verify the functionality of each other while they are being developed.(3) A byproduct of this is that it proves that the hardware/software co-simulation on a virtual platform can be used to verify the functionality of hardware models and device drivers at the early stage of the ESL design flow, even before the physical hardware is available. Besides, the number of words moved by the DMAC can be observed from the statistics reported by the IA-ISS.

4.2. Time to boot up Linux

In order to gather the statistics, the initial shell script is modified to enable the option of executing the DMAC test bench and then rebooting the virtual machine automatically as soon as the booting sequence is completed. Furthermore, the predefined no-reboot option of QEMU, a flag that forces a reboot inside the guest to behave like a shutdown, will catch the reboot signal once the OS executes the reboot command after completing the DMAC test in the shell script and then shuts QEMU down. Thus, the test bench can easily estimate the co-simulation time of QEMU and SystemC at the OS level. For the purpose of comparison, we use two test benches to test the data movement: one uses DMA to move the data, and the other does not. The amount of data moved is 2,048,000 words (4 bytes per word), half of which are reads while the other half are writes. When moving the data via DMA, an interrupt will be raised to signal the end of the transfer. The time it takes to boot up a full-fledged Linux kernel is as shown in the column labeled "Co-simulation time" of Tables 4 and 6. The rows labeled "min," "max," and "μ" present, respectively, the best-case, the worst-case, and the average-case running time of booting up the kernel and shutting it down immediately for 30 times. The row labeled "σ" gives the variability. As described in Tables 4 and 6, the column labeled "N_TI" shows the number of target instructions actually executed by the virtual ARM processor. The columns labeled "N_LD" and "N_ST" present, respectively, the number of load and store operations of the virtual processor, including memory-mapped I/O. The columns labeled "N_DMAR" and "N_DMAW" give, respectively, the number of reads and the number of writes initiated by the DMAC, i.e., by the master ports of the DMAC. The column labeled "N_TX" gives the total number of target instructions executed and load and store operations performed. Because the number of read/write operations of the slave port of the DMAC (PL080 in this case) has been counted as the load and store operations of the virtual processor, only the number of read/write operations of the master ports has to be counted. That is,

N_TX = N_TI + N_LD + N_ST + N_DMAR + N_DMAW.

(3) The consequence is that we eventually found two long-standing bugs in the read/write operations of the PL080 DMAC model in QEMU, which are inconsistent with what is specified in the PL080 DMAC Technical Reference Manual [6]. These two bugs can terminate the simulation when accessing some of the control registers, which are located at a specific address region of memory-mapped I/O.
The columns labeled "N_RD" and "N_WT" give an idea of the number of read/write transactions between the virtual processor and the DMAC. Note that all the numbers given are, as the names of the rows suggest, the min, max, and average of booting up the ARM Linux and shutting it down immediately on our virtual platform for 30 times. It is worth pointing out that if DMA is not used in moving the data, then the virtual platform cannot provide any DMA statistics because the data is moved by the load and store instructions of the virtual processor. That explains why the columns labeled "N_DMAR" and "N_DMAW" in Table 6 remain zero. The simulation times of booting up Linux with data movement using DMA and not using DMA are, respectively, 12 min 48.265 s and 11 min 58.602 s in the worst case. The slowdown of the simulation speed with data movement via DMA is due to the I/O synchronization needed for communicating with the DMAC modeled in SystemC. The percentages given in parentheses are defined as
N_a / N_TX × 100%,

where the subscript a is either TI, LD, ST, DMAR, DMAW, TX, RD, or WT. For instance, the percentage given in the column labeled "N_TI" of the row labeled "μ" of Table 4 is computed as
460,050,883.57 / 658,223,885.83 × 100% = 69.89%.

4.3. Simulation efficiency

Tables 5 and 7 show the simulation efficiency of the same results as given in Tables 4 and 6, except that the numbers have been normalized so that they indicate the simulation efficiency instead of the numbers per run. This makes it easier to understand exactly how many instructions are executed or how many load and store operations are performed in a second in
Table 4. Simulation time of booting up the Linux kernel plus data movement with DMA on our virtual platform for 30 times.

Statistics | Co-simulation time | N_TI | N_LD | N_ST | N_DMAR | N_DMAW | N_TX | N_RD | N_WT
Min | 08 min 47.359 s | 445,011,311.00 (70.31%) | 131,257,961.00 (20.74%) | 54,641,476.00 (8.63%) | 1,024,000.00 (0.16%) | 1,024,000.00 (0.16%) | 632,958,748.00 (100.00%) | 3,008.00 (0.00%) | 9,002.00 (0.00%)
Max | 12 min 48.265 s | 479,649,276.00 (69.41%) | 145,416,069.00 (21.04%) | 63,946,144.00 (9.25%) | 1,024,000.00 (0.15%) | 1,024,000.00 (0.15%) | 691,059,489.00 (100.00%) | 3,008.00 (0.00%) | 9,002.00 (0.00%)
μ | 10 min 44.956 s | 460,050,883.57 (69.89%) | 137,461,532.53 (20.88%) | 58,663,469.73 (8.91%) | 1,024,000.00 (0.16%) | 1,024,000.00 (0.16%) | 658,223,885.83 (100.00%) | 3,008.00 (0.00%) | 9,002.00 (0.00%)
σ | 01 min 00.20 s | 10,081,646.36 | 4,074,998.19 | 2,684,633.76 | 0.00 | 0.00 | 16,841,278.31 | 0.00 | 0.00
Table 5. Simulation efficiency of booting up the Linux kernel plus data movement with DMA on our virtual platform for 30 times (i.e., transactions per second).

Statistics | N_TI | N_LD | N_ST | N_DMAR | N_DMAW | N_TX | N_RD | N_WT
Min | 843,848.88 (70.31%) | 248,896.78 (20.74%) | 103,613.43 (8.63%) | 1,941.75 (0.16%) | 1,941.75 (0.16%) | 1,200,242.59 (100.00%) | 5.70 (0.00%) | 17.07 (0.00%)
Max | 624,327.88 (69.41%) | 189,278.53 (21.04%) | 83,234.48 (9.25%) | 1,332.87 (0.15%) | 1,332.87 (0.15%) | 899,506.64 (100.00%) | 3.92 (0.00%) | 11.72 (0.00%)
μ | 718,129.55 (69.92%) | 214,426.27 (20.88%) | 91,382.35 (8.90%) | 1,601.47 (0.16%) | 1,601.47 (0.16%) | 1,027,141.10 (100.00%) | 4.70 (0.00%) | 14.08 (0.00%)
σ | 52,477.67 | 14,152.60 | 4,831.34 | 148.75 | 148.75 | 71,759.12 | 0.44 | 1.31
Table 6. Simulation time of booting up the Linux kernel plus data movement without DMA on our virtual platform for 30 times.

Statistics | Co-simulation time | N_TI | N_LD | N_ST | N_DMAR | N_DMAW | N_TX | N_RD | N_WT
Min | 08 min 15.351 s | 435,402,217.00 (70.74%) | 127,911,363.00 (20.78%) | 52,185,122.00 (8.48%) | 0.00 (0.00%) | 0.00 (0.00%) | 615,498,702.00 (100.00%) | 8.00 (0.00%) | 1.00 (0.00%)
Max | 11 min 58.602 s | 472,317,649.00 (69.84%) | 142,224,553.00 (21.03%) | 61,773,665.00 (9.13%) | 0.00 (0.00%) | 0.00 (0.00%) | 676,315,867.00 (100.00%) | 8.00 (0.00%) | 1.00 (0.00%)
μ | 10 min 15.390 s | 452,457,224.70 (70.30%) | 134,521,372.77 (20.90%) | 56,644,735.00 (8.80%) | 0.00 (0.00%) | 0.00 (0.00%) | 643,623,332.47 (100.00%) | 8.00 (0.00%) | 1.00 (0.00%)
σ | 01 min 04.551 s | 10,355,227.14 | 4,193,136.67 | 2,765,526.43 | 0.00 | 0.00 | 17,313,890.24 | 0.00 | 0.00
Table 7. Simulation efficiency of booting up the Linux kernel plus data movement without DMA on our virtual platform for 30 times (i.e., transactions per second).

Statistics | N_TI | N_LD | N_ST | N_DMAR | N_DMAW | N_TX | N_RD | N_WT
Min | 878,977.19 (70.74%) | 258,223.69 (20.78%) | 105,349.79 (8.48%) | 0.00 (0.00%) | 0.00 (0.00%) | 1,242,550.66 (100.00%) | 0.02 (0.00%) | 0.00 (0.00%)
Max | 657,272.94 (69.84%) | 197,918.39 (21.03%) | 85,963.67 (9.13%) | 0.00 (0.00%) | 0.00 (0.00%) | 941,155.00 (100.00%) | 0.01 (0.00%) | 0.00 (0.00%)
μ | 741,861.21 (70.33%) | 220,382.04 (20.89%) | 92,639.92 (8.78%) | 0.00 (0.00%) | 0.00 (0.00%) | 1,054,883.18 (100.00%) | 0.01 (0.00%) | 0.00 (0.00%)
σ | 63,968.33 | 17,364.13 | 5,959.48 | 0.00 | 0.00 | 87,291.94 | 0.00 | 0.00
Table 8. Normalized simulation statistics of booting up the Linux kernel plus data movement without DMA on our virtual platform for 30 times.

Statistics | N_TI | N_LD | N_ST | N_DMAR | N_DMAW | N_TX | N_RD | N_WT
Min | 463,536,531.94 | 136,176,586.93 | 55,557,159.90 | 0.00 | 0.00 | 655,270,273.51 | 8.00 | 1.00
Max | 504,959,795.25 | 152,053,771.89 | 66,042,878.93 | 0.00 | 0.00 | 723,056,446.08 | 8.00 | 1.00
the worst, best, and average case. For instance, as the row labeled "μ" of Table 5 shows, on average, the instruction-accurate ISS we proposed can execute about 718,129.55 instructions and perform about 214,426.27 load and 91,382.35 store operations in a second. Even in the worst case, as the row labeled "max" of Table 5 shows, it can still execute about 624,327.88 instructions and perform about 189,278.53 load and 83,234.48 store operations in a second.
4.4. Performance evaluation of target virtual platform

As can be seen from Tables 5 and 7, the experimental results cannot easily be compared without normalization. As such, all the instruction and transaction counts are first normalized in terms of the simulation time before they are compared, by

N_b^Norm = N_b × T_cosim,
Compared to the case of not using DMA, the performance gain of using DMA in terms of N_TI can be defined by

G_TI = N_TI^Norm - N_TI^DMA;

the performance gain of using DMA in terms of N_LD and N_ST by

G_c = N_c^Norm - (N_c^DMA + 1,024,000),

where c indicates either LD or ST; and the performance gain of using DMA in terms of N_TX by

G_TX = N_TX^Norm - (N_TX^DMA + 2,048,000),
Fig. 9. The statistics of different data movement settings in a logarithmic plot.
where N^Norm and N^DMA indicate, respectively, the normalized simulation results of not using DMA as shown in Table 8 and the simulation results of using DMA as shown in Table 4. The results are as shown in Table 9. Furthermore, the performance gain of using DMA in terms of N_TI in percentage can be defined accordingly by
(N_TI^Norm - N_TI^DMA) / N_TI^DMA × 100%;

Table 9. Performance gain in booting up the Linux kernel on our virtual platform with DMA for 30 times.

Statistics | G_TI | G_LD | G_ST | G_TX
Min | 18,525,220.94 (4.16%) | 3,894,625.93 (2.94%) | -108,316.10 (-0.19%) | 20,263,525.51 (3.19%)
Max | 25,310,519.25 (5.28%) | 5,613,702.89 (3.83%) | 1,072,734.93 (1.65%) | 29,948,957.07 (4.32%)
the performance gain of using DMA in terms of N_LD and N_ST in percentage by

(N_c^Norm - (N_c^DMA + 1,024,000)) / (N_c^DMA + 1,024,000) × 100%,

where c is as defined above; and the performance gain of using DMA in terms of N_TX in percentage by

(N_TX^Norm - (N_TX^DMA + 2,048,000)) / (N_TX^DMA + 2,048,000) × 100%,
because without DMA, all the transfers are handled by the load and store instructions of the virtual processor, which have been counted as part of the total number of transactions. For instance, the performance gain given in the column labeled "G_LD" of the row labeled "min" of Table 9 is

3,894,625.93 = 136,176,586.93 - (131,257,961.00 + 1,024,000),

and the performance gain in percentage is

(136,176,586.93 - (131,257,961.00 + 1,024,000)) / (131,257,961.00 + 1,024,000) × 100% = 2.94%.
Fig. 10. Performance gain of data movement "with DMA" with respect to that "without DMA" in percentage.
where N_b indicates the statistics shown in Table 7, except N_DMAR, N_DMAW, N_RD, and N_WT, because the number of words transferred is fixed. Moreover, T_cosim indicates the co-simulation time given in the column labeled "Co-simulation time" of Table 4. For instance, the number given in the column labeled "N_TI" of the row labeled "min" of Table 8 is computed as

463,536,531.94 = 878,977.19 per sec × 527.359 sec.

The statistics of the different data movement settings shown in Tables 4, 6, and 8 are compared in Fig. 9. It is interesting to note that N_DMAR and N_DMAW are visible only for the simulation setting "with DMA."
The performance gain of data movement "with DMA" with respect to that "without DMA" in percentage is as shown in Fig. 10, which is essentially a summary of Table 9. Our experimental results given in Table 9 show that with DMA, the performance gain with respect to the number of instructions required to boot up a Linux kernel is about 18.53–25.31 M instructions for transferring 2,048,000 words, or 4.16–5.28%, compared to without DMA. In other words, with DMA, the number of instructions saved in transferring a word is

N_min / 2,048,000 = 18,525,220.94 / 2,048,000 = 9.05

in the worst case and

N_max / 2,048,000 = 25,310,519.25 / 2,048,000 = 12.36

in the best case, where N_min and N_max indicate, respectively, the statistics given in the column labeled "G_TI" of the rows labeled "min" and "max" of
Listing 2. Code fragment showing how all the functions used in the QEMU and SystemC co-simulation on the virtual Versatile/PB926EJ-S platform are set.
Table 9. In other words, using DMA to transfer a word can save 9.05–12.36 instructions for the ARM9 processor. The other columns labeled "G_LD," "G_ST," and "G_TX" in Table 9 show, respectively, the performance gain of using DMA in terms of N_LD, N_ST, and N_TX; that is, the number of load operations, the number of store operations, and the number of total transactions saved. The performance loss shown in the column labeled "G_ST" of Table 9 is caused by the fact that booting up a kernel is a non-deterministic procedure.

5. Conclusion

This paper presents an interface for connecting QEMU to SystemC for a QEMU and SystemC-based virtual platform. The proposed interface enables the master/slave ports of the attached hardware modeled in SystemC to access the virtual platform. As a consequence, such a virtual platform can facilitate the co-design of the hardware models written in SystemC and the associated device drivers while they are being developed. Moreover, it can be used to co-verify the correctness of the hardware models and the associated device drivers. For concreteness, we use the DMAC as an example to show how the proposed interface works. By using such a virtual platform, we were eventually able to fix two long-standing bugs in the DMAC model of QEMU. Furthermore, the virtual platform we proposed can even provide instruction-accurate statistics for measuring the performance of the attached hardware. More importantly, with all the transactions traced, the virtual platform takes only 12 min 48.265 s to boot up a full-fledged kernel, even in the worst case of 30 runs. In other words, the proposed virtual platform provides a viable solution for hardware/software co-simulation where speed is a concern.

Acknowledgment

This work was supported in part by the National Science Council, Taiwan, ROC, under Contracts NSC98-2221-E-110-049 and NSC99-2221-E-110-052.
References

[1] BuildRoot. Available from: .
[2] CoWare Model Library. Available from: .
[3] CoWare Platform Architect. Available from: .
[4] Nine Reasons to Adopt SystemC ESL Design. Available from: .
[5] P. Agrawal, Hybrid Simulation Framework for Virtual Prototyping Using OVP, SystemC & SCML. Available from: .
[6] ARM, PrimeCell DMA Controller (PL080) Technical Reference Manual, 2003. Available from: .
[7] ARM, RealView ARM ISS User Guide, 2007. Available from: .
[8] J. Aynsley, OSCI TLM-2.0 Language Reference Manual, 2009. Available from: .
[9] B. Bailey, G. Martin, A. Piziali, ESL Design and Verification, Morgan Kaufmann Publishers, 2007.
[10] F. Bellard, QEMU, a fast and portable dynamic translator, in: Proceedings of the USENIX Annual Technical Conference, June 2005, pp. 41–46.
[11] G. Beltrame, C. Bolchini, L. Fossati, A. Miele, D. Sciuto, ReSP: a non-intrusive transaction-level reflective MPSoC simulation platform for design space exploration, in: Proceedings of the 13th Asia and South Pacific Design Automation Conference, 2008, pp. 673–678.
[12] L. Benini, D. Bertozzi, D. Bruni, N. Drago, F. Fummi, M. Poncino, Legacy SystemC co-simulation of multi-processor systems-on-chip, in: Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2002, pp. 494–499.
[13] D.C. Black, J. Donovan, SystemC: From the Ground Up, Springer Science + Business Media, 2004.
[14] P. Bohrer, J. Peterson, M. Elnozahy, R. Rajamony, A. Gheith, R. Rockhold, C. Lefurgy, H. Shafi, T. Nakra, R. Simpson, E. Speight, K. Sudeep, E.V. Hensbergen, L. Zhang, Mambo: a full system simulator for the PowerPC architecture, ACM SIGMETRICS Performance Evaluation Review 31 (2004) 8–12.
[15] J. Corbet, A. Rubini, G. Kroah-Hartman, Linux Device Drivers, O'Reilly, 2005.
[16] Design & Reuse, ARM Expands RealView Product Family with Fast Simulation Technology to Speed Up Software Development. Available from: .
[17] T. Grötker, S. Liao, G. Martin, S. Swan, System Design with SystemC, Kluwer Academic Publishers, 2002.
[18] G.R. Hellestrand, Systems Engineering: The Era of the Virtual Processor Model (VPM). Available from: .
[19] A. Hoffmann, T. Kogel, A. Nohl, G. Braun, O. Schliebusch, O. Wahlen, A. Wieferink, H. Meyr, A novel methodology for the design of Application-Specific Instruction-Set Processors (ASIPs) using a machine description language, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 20 (2001) 1338–1354.
[20] A. Hoffmann, O. Schliebusch, A. Nohl, G. Braun, O. Wahlen, H. Meyr, A methodology for the design of Application Specific Instruction Set Processors (ASIP) using the machine description language LISA, in: Proceedings of the International Conference on Computer Aided Design, 2001, pp. 625–630.
[21] A. Hoffmann, O. Schliebusch, A. Nohl, G. Braun, O. Wahlen, H. Meyr, A universal technique for fast and flexible instruction-set architecture simulation, in: Proceedings of the 39th Annual Design Automation Conference, 2002, pp. 22–27.
[22] IEEE Computer Society, IEEE Standard SystemC Language Reference Manual, Design Automation Standards Committee, 2005. Available from: .
[23] P.K. Immich, R.S. Bhagavatula, R. Pendse, Performance analysis of five interprocess communication mechanisms across UNIX operating systems, Journal of Systems and Software 68 (2003) 27–43.
[24] J.-W. Lin, C.-C. Wang, C.-Y. Chang, C.-H. Chen, K.-J. Lee, Y.-H. Chu, J.-C. Yeh, Y.-C. Hsiao, Full system simulation and verification framework, in: Proceedings of the Fifth International Conference on Information Assurance and Security, Aug. 2009, pp. 165–168.
[25] P.S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg, J. Högberg, F. Larsson, A. Moestedt, B. Werner, Simics: a full system simulation platform, Computer 35 (2) (2002) 50–58.
[26] G. Martin, Overview of the MPSoC design challenge, in: Proceedings of the 43rd ACM/IEEE Design Automation Conference, 2006, pp. 274–279.
[27] M. Montón, J. Carrabina, M. Burton, Mixed simulation kernels for high performance virtual platforms, in: Proceedings of the Forum on Specification and Design Languages, September 2009, pp. 1–6.
[28] M. Montón, A. Portero, M. Moreno, B. Martínez, J. Carrabina, Mixed SW/SystemC SoC emulation framework, in: Proceedings of the IEEE International Symposium on Industrial Electronics, June 2007, pp. 2338–2341.
[29] OSCI, Open SystemC Initiative. Available from: .
[30] Samsung, Mobile SoC Application Processor S3C6410. Available from: .
[31] C.-C. Wang, R.-P. Wong, J.-W. Lin, C.-H. Chen, System-level development and verification framework for high-performance system accelerator, in: Proceedings of the International Symposium on VLSI Design, Automation and Test, Apr. 2009, pp. 359–362.
[32] T.-C. Yeh, G.-F. Tseng, M.-C. Chiang, A fast cycle-accurate instruction set simulator based on QEMU and SystemC for SoC development, in: Proceedings of the 15th IEEE Mediterranean Electrotechnical Conference, Apr. 2010, pp. 1033–1038.
Tse-Chen Yeh received the B.S. and M.S. degrees, both in Information Engineering from I-Shou University, Kaohsiung, Taiwan in 1996 and 1998, respectively. He is currently working toward the Ph.D. degree in Computer Science and Engineering at National Sun Yat-sen University, Kaohsiung, Taiwan. His current research interests include system modeling, hardware/software cosimulation, and design space exploration.
Ming-Chao Chiang received the B.S. degree in Management Science from National Chiao Tung University, Hsinchu, Taiwan in 1978 and the M.S., M.Phil., and Ph.D. degrees in Computer Science from Columbia University, New York, NY, USA in 1991, 1998, and 1998, respectively. He had over 12 years of experience in the software industry encompassing a wide variety of roles and responsibilities in both large and start-up companies before joining the faculty of the Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan in 2003, where he is currently an Associate Professor. His current research interests include image processing, evolutionary computation, and system software.