Interface Design of VHDL Simulation for Hardware-Software Cosimulation

Wonyong Sung, Moonwook Oh, Soonhoi Ha
Seoul National University, Codesign and Parallel Processing Laboratory, Seoul, Korea
TEL : 2-880-7292, FAX : 2-886-7589
fyong,mwoh,[email protected]

Abstract

To perform cosimulation, an interface to the VHDL simulation is needed. This interface is responsible for exchanging packets between any VHDL simulator and the cosimulation backplane, PeaCE, a Ptolemy extension that serves as a codesign environment. The interface also manages the simulation so that timed cosimulation is correct. An automatic interface generation mechanism produces the interface without user intervention. The proposed interface mechanism is implemented for two VHDL simulators and verified by covalidation with a "QAM-16" modulation example. The results and lessons from the experiment are described.

1 Introduction

Cosimulation is a key facet of codesign methodology. Through cosimulation, we can validate the functional correctness of the hardware and software working together, ahead of the final synthesis step. Cosimulation also allows us to evaluate each design decision, such as partitioning or component selection. As codesign proceeds, the level of cosimulation evolves from the behavioral level to the implementation level, at which timed cosimulation is necessary to validate the timing requirements. We are not aware of any cosimulation environment that cooperates with various existing VHDL simulators in cosimulation from the behavioral level to the timed level. This paper presents an interface mechanism to combine any VHDL simulator with the proposed cosimulation environment.

In most codesign systems, the software part is usually a C program and the hardware part is written in a hardware description language (HDL) such as VHDL, Verilog, or Hardware-C. For hardware simulation, most people use a hardware simulator such as a VHDL simulator. For software simulation, however, there are various approaches, from using a processor simulator to making stand-alone UNIX processes. We use the latter approach.

To perform cosimulation, the two simulators should be combined, which requires making interfaces for communication between them. To relieve the designer of the burden of making interfaces at each design iteration step, an automatic interface generation facility is devised in this paper. Since the software part runs as a UNIX process, most of our efforts in interface design are focused on the interface for the VHDL simulation.

In the next section, we give a short introduction to our codesign workflow. The requirements and solutions for the VHDL interface are described in section 3. The implementation details that depend on our environment are explained in section 4. Some experimental results and discussions are shown in section 5, and section 6 gives a conclusion and future work.

2 Codesign Methodology

The codesign workflow in PeaCE, a codesign framework we are developing, is represented in figure 1.

2.1 Specification and algorithm simulation

A dataflow graph is chosen as the initial specification model for a given application and simulated for algorithm validation. Dataflow models are very effective for specifying most types of DSP applications [2, 3]. During algorithm development and simulation, the use of a dataflow specification allows one to obtain measures of the algorithmic performance (e.g. bit error ratio) quickly, because there is no need to determine the timing of the operations [9]. Clock, reset and other implementation dependent signals are not required at this level of abstraction. The simulation efficiency is higher than that of discrete event simulation [4].

Figure 1. Hardware Software Codesign Workflow
(Figure not reproduced. Stage labels: SDF or DDF specification, algorithm simulation, partition, code generation with interface, compilation into a UNIX process and VHDL simulator construction, cosimulation, evaluation, synthesis into a DSP executable file and an FPGA loadable file, prototyping board.)

2.2 Partition, Cosimulation, and Evaluation

The next step is to partition the initial dataflow graph into subgraphs: software graphs and hardware graphs. By cosimulation and evaluation, the feasibility and the cost effectiveness of a partition are examined. At each iteration of the codesign process, a new partition is made, which requires rebuilding the interface between the two subgraphs. Since making an interface is tedious and error-prone work, it is desirable to generate the interface automatically [10]. This is the main topic of this paper.

After partitioning, each subgraph is modified in order to add interface nodes at the boundary. From the partitioned graphs, C and VHDL codes are generated. No user intervention is needed for the addition of interface nodes and code generation. The interface node to be added is chosen from the reusable interface node library according to the synthesis target and communication protocol.

Through cosimulation, a designer can check the functional correctness ahead of the final synthesis step. Cosimulation is also used to get profiling information. As shown in [1], the profiling results help the designer partition the target system more efficiently. Another role is to identify the performance bottleneck. Moreover, by timed cosimulation, the exact timing behavior of the whole system can be found. To be simulated, the generated C code is compiled into a UNIX process and the VHDL code generated from the hardware graph is passed to the VHDL simulator for hardware simulation.

Figure 2. The cosimulation backplane and communication with client simulators
(Figure not reproduced. Labels: backplane with BP scheduler and event queue, interface nodes, CGC and VHDL subgraphs, C process, VHDL simulator, dataflow in original design, dataflow in backplane, dataflow using socket.)

To combine two concurrent simulators (a C process and a VHDL simulator), PeaCE introduces and implements a backplane concept, which reduces the number of interface modules for N simulators from N(N-1) to N. In the backplane approach, a software or hardware simulator interacts with PeaCE, the cosimulation backplane, without knowing of the existence of its counterpart. The backplane monitors and manages all communication events between the software and the hardware simulators. On the other hand, the Ptolemy group of U.C. Berkeley [8] devised a common interface mechanism between any pair of simulators, so that they also achieve the same reduction of interface modules to N. Their approach is called "heterogeneous" simulation [12].

Therefore, as shown in figure 2, while the cosimulation is in progress, several UNIX processes are running concurrently and cooperatively: C processes, a VHDL simulator, and PeaCE itself. In figure 2, a C process and a VHDL simulator communicate with each other through the cosimulation backplane, PeaCE. In the backplane, we can use the simulation and visualization capabilities of Ptolemy. In figure 2, a dashed line represents a flow of data within the backplane through a function call and a solid line represents a data transmission through a socket. The user specifies the dataflow with the dotted lines as shown in figure 4.

The backplane supports event-driven scheduling with an event queue, which holds future events sorted by time. All data between processes is transferred through the cosimulation backplane, which makes the event queue in the backplane a global event queue. The backplane scheduler manages the event queue and transmits packets to the client simulators in the order of event generation time. If the destination is an external process such as a C process or a VHDL simulator, the backplane scheduler calls the utility functions to send the packet via a socket. The interface node, automatically inserted by PeaCE, receives the packet and passes it to the C or VHDL module. After transmitting packets to a client simulator, the backplane scheduler waits in a polling loop until it receives the results. To avoid deadlock, the client simulator must always send a response packet to the backplane, even when there is no result data.
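The scheduling rule described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class name `Backplane`, the handler signature, and the in-process function calls standing in for socket traffic are all hypothetical and are not taken from PeaCE. It shows a single global event queue ordered by time, and a client that returns an empty response so that the scheduler is never left waiting.

```python
import heapq
import itertools


class Backplane:
    """Toy model of a cosimulation backplane with one global event queue."""

    def __init__(self):
        self._queue = []               # heap of (time, seq, destination, data)
        self._seq = itertools.count()  # tie-breaker: equal times keep posting order
        self.clients = {}              # name -> handler(time, data) -> list of events
        self.log = []                  # packets in the order they were delivered

    def register(self, name, handler):
        self.clients[name] = handler

    def post(self, time, dest, data):
        heapq.heappush(self._queue, (time, next(self._seq), dest, data))

    def run(self):
        while self._queue:
            time, _, dest, data = heapq.heappop(self._queue)
            self.log.append((time, dest, data))
            # A client must always answer, possibly with an empty list,
            # otherwise a real scheduler would wait in its polling loop forever.
            for t, d, payload in self.clients[dest](time, data):
                self.post(t, d, payload)


def software(time, data):
    # Hypothetical C process: doubles the value, result is due 5 time units later.
    return [(time + 5, "vhdl", data * 2)]


def hardware(time, data):
    # Hypothetical VHDL side: no result data, but it still responds.
    return []


bp = Backplane()
bp.register("c_process", software)
bp.register("vhdl", hardware)
bp.post(10, "c_process", 3)   # posted first, but scheduled later
bp.post(0, "c_process", 1)
bp.run()
```

Because every packet passes through the one queue, the delivery order in `bp.log` follows the time stamps rather than the order in which the packets were posted.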

2.3 Synthesis and prototyping

Based on the simulation results, the evaluation module decides whether the current partition satisfies the system requirements. If they are not satisfied, the codesign process continues to iterate from the partition stage to the evaluation stage. Otherwise, the codesign process reaches the synthesis and testing step. After the validation check through timed cosimulation and the evaluation step, C and VHDL codes are regenerated from the hardware and software subgraphs. In this stage, the generated code includes interface code suitable for the prototyping board, which consists of a DSP and an FPGA.

3 Design and generation of cosimulation interface

In this paper, we use VHDL simulators and UNIX processes as concurrent processes communicating with each other through BSD sockets. An alternative approach is to treat the entire system as a single process by making the C portions of the system procedures called from the VHDL modules [7]. A serious drawback of this approach is that not all software parts can be expressed as procedures. In our environment, the communication between the hardware and the software is modeled as a message passing system. We aim to design a flexible and extensible message passing interface for cosimulation. We first list the desired characteristics of the message passing interface for generic cosimulation.

• No modification of the initial specification : Generally, the initial algorithm specification takes no account of partitioning and interfacing. So, whenever cosimulation is needed, interface code should be generated automatically for every hardware-software partition. A cosimulation is constructed only by adding the new interface, without any modification of the user design.

• No modification of VHDL simulator : Since we use existing VHDL simulators for hardware simulation, this requirement is crucial. Many of the problems we met in this work come from this requirement. It turns out that the interface design would be improved significantly if we could add some capabilities to the simulator. The desirable capabilities are described in section 4.3.

• No modification of VHDL module libraries : We do not change the code of the VHDL modules but add the interface code automatically without the user's intervention. This distinguishes our approach from that of Coumeri and Thomas at Carnegie Mellon University [11].

• No restriction on VHDL specification : A previous work from U.C. Berkeley restricted the VHDL program, which is generated from a program graph with SDF semantics, to a single thread of control [8]. Even though this approach schedules the communication statically, for deadlock avoidance as well as runtime performance improvement, it is too restrictive for general applications. In [8], only one sequential process runs on the VHDL simulator. Our approach places no such restriction on the VHDL model. In fact, we add a new VHDL module as the interface module, which runs concurrently with the VHDL graph.

• Timed cosimulation : After functional correctness is validated, we also need to check the timing requirements of the system. Since the VHDL simulator has the notion of time, we can perform timed cosimulation if the software processes are managed by event-driven scheduling. We then need to define a synchronization protocol between two concurrent event-driven simulators.

4 Implementation

4.1 Automatic interface generation

Our codesign environment, PeaCE, is based on Ptolemy, which is a framework for heterogeneous system specification, simulation and synthesis [6]. In Ptolemy, an application is represented as a block diagram which is given an appropriate semantics. For example, a DSP algorithm is represented as a block diagram with dataflow graph semantics. Each block contains the code to be executed, so the diagram forms a coarse grain dataflow graph. In this paper, we consider a category of applications that can be represented with dataflow graphs, for example, DSP systems.

Ptolemy generates C code from the partitioned dataflow graphs for software and makes stand-alone C processes to be run on the host computer. From the partitioned dataflow graphs for hardware, Ptolemy generates VHDL code that is passed to the VHDL simulator. Before the C and VHDL codes are generated from the partitioned graphs, communication blocks are automatically inserted at each partition boundary. The type and connectivity information of a communication block are determined by those of the partitioned arc. These communication blocks contain all communication and synchronization routines for cosimulation: socket establishment, socket termination, data communication, buffer management, and synchronization by timestamp management. Using the foreign interface mechanism of ANSI/IEEE Std. 1076-1993, the communication codes for VHDL are written in C and called by the VHDL code in the newly added communication modules.
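The insertion of communication blocks at partition boundaries can be sketched in a few lines of Python. This is an illustrative model only, not the Ptolemy or PeaCE implementation: the function name `insert_interface_nodes`, the tuple representation of arcs, and the generated node names are assumptions made for this example. Every arc whose endpoints lie in different partitions is cut and replaced by a send node on the producing side and a receive node on the consuming side, typed after the arc.

```python
def insert_interface_nodes(arcs, partition):
    """Cut every arc that crosses the hardware/software boundary.

    arcs:      list of (source, destination, data_type)
    partition: dict mapping node name -> "hw" or "sw"
    Returns (hw_arcs, sw_arcs, links), where links describes each cut arc.
    """
    sub = {"hw": [], "sw": []}
    links = []
    for src, dst, dtype in arcs:
        side_src, side_dst = partition[src], partition[dst]
        if side_src == side_dst:
            sub[side_src].append((src, dst, dtype))   # arc stays inside one subgraph
            continue
        link_id = len(links)
        send = f"send_{dtype}_{link_id}"               # node type follows the arc type
        recv = f"receive_{dtype}_{link_id}"
        sub[side_src].append((src, send, dtype))
        sub[side_dst].append((recv, dst, dtype))
        links.append((link_id, send, recv, dtype))
    return sub["hw"], sub["sw"], links


# Hypothetical four-node graph: the two middle nodes are mapped to hardware.
arcs = [("src", "filter", "float"), ("filter", "mapper", "float"), ("mapper", "sink", "int")]
partition = {"src": "sw", "filter": "hw", "mapper": "hw", "sink": "sw"}
hw_arcs, sw_arcs, links = insert_interface_nodes(arcs, partition)
```

In this example two arcs cross the boundary, so the hardware subgraph gains one receive node and one send node, and the original nodes are left untouched.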

4.2 Design of interface method

For the VHDL simulator we add three types of interface nodes: a master node, send nodes, and receive nodes. For each partitioned arc, we add either a receive node for an input arc or a send node for an output arc. Send nodes and receive nodes are implemented separately for each data type: float or integer. The VHDL simulator schedules the interface nodes when communication occurs. When there is more than one input communication link between hardware and software, there is a risk of deadlock unless we carefully manage the firing order of the receive nodes. We solve this problem by adding one VHDL entity, called "the master node", which serializes the firings of the communication nodes. The architecture of the interface for VHDL simulation is depicted in figure 3.

Figure 3. Function of three entities of VHDL interface
(Figure not reproduced. Labels: master node, receive and send nodes, user's design modules M_1 and M_2, VHDL signals, VHDL foreign interface, foreign module (C), "wait for 1 ns;", and the time intervals inf(1), object_time, inf(2).)

In the initialization stage, the master node establishes the socket connection and sets up the input and output queues, one for each partitioned link. The master node also scans the output buffer of the send blocks to export new data to the outside. Even when there is no output data, it generates the "END" signal to the outside to indicate that the VHDL partitioned graph has finished its execution.

If we define a simulation loop as the duration between receiving input packets from the backplane and transmitting response packets, a simulation loop is divided into two parts: the interface time (inf(1) + inf(2) in figure 3) and the user module time (object_time in figure 3). At the beginning of a simulation loop, the master node calls the socket receive function, which scans all incoming channels to read all simultaneous data from the backplane until the "GO" control packet is received. Each packet from the outside contains an identification field that specifies which receive node it is destined for. After transferring all received data to the input buffer, the master node generates an "enable" signal to each receive node to run the partitioned graph. The receive nodes get the data from the buffer, which defines the end point of the inf(1) duration. When a send node is scheduled, it writes result data into an output buffer in the foreign module. This is the beginning of the inf(2) duration. After a delta delay in the VHDL simulator, the master node scans the output buffer and transmits the data to the backplane through the socket. This marks the end of inf(2). The time measurement is described in section 5.2.

4.3 Considerations for timed cosimulation

To perform concurrent event-driven simulation, we use a conservative approach so that the clock of the VHDL simulator never runs ahead of the global clock. In conservative timed cosimulation, a client simulator can advance its local time only when it receives a packet whose time stamp is larger than its local time. Since we cannot expect the VHDL simulator to provide an interface through which an external cosimulation engine can hold back the local clock, we let the master node keep the time of the VHDL simulator from advancing ahead of the global clock. The master node plays the role of a runtime manager of the VHDL simulator.

For correct timed cosimulation, the behavior of the master node described in the previous subsection is divided into two parts. At each execution, the master node checks the input connection and sends signals to wake up the receive nodes. After all events at the current time are processed in the VHDL simulator, the master node is scheduled again to check the output buffer and send packets to the cosimulation backplane. Then, the master node goes into a wait loop until it is activated from the outside.

Figure 4 and figure 5 are screen dumps from the QAM cosimulation in PeaCE. Figure 4 shows a top view of the QAM system, which has two super nodes: one has a C subgraph and the other has a VHDL subgraph, which is displayed in figure 5. The top view window also has two simulation support nodes in the backplane: a clock node (a test pattern generation node) and an XGraph node (a visualization node). While the VHDL module graph has 8 nodes, 12 VHDL entities are simulated because of the insertion of a master node, two receive nodes, and a send node. Each of the 12 entities has its own process. Thus 12 concurrent processes are running within the VHDL simulator.

Figure 4. A screen dump of top view window of QAM in PeaCE

Figure 5. A screen dump of QAM cosimulation in PeaCE

For the master node to check the output buffer, it should be scheduled only after all events in the same time wheel are processed. Otherwise, the VHDL simulator may not deliver the current output to the backplane on time. Recall that once the master node receives a packet from the backplane, a response packet must be sent to the backplane, even if there is no result data at that time, so that the backplane scheduler can exit from its wait loop. In the VHDL module graph shown in figure 5, there is a source node, a ramp node, which produces an incremented value whenever it is scheduled. The designer's intention is that the ramp node is scheduled and produces an output value when a packet is received from the backplane. If the ramp node is scheduled and produces a result after the master node has checked the output buffer, the result will be checked and sent out at the next time slot. The backplane will receive the packet after it has advanced to the next time slot. So, the packet will be an old packet and the conservative cosimulation will be broken. Another problem is that the value sent first is a glitch.

Since it is usually not possible to schedule a certain process at the end of the current time wheel, we advance the local time by a delta delay before the master node checks the output buffer and sends response packets to the cosimulation backplane, as shown in figure 6. Although the master node is scheduled at the beginning of the next time slot rather than at the end of the current time slot, we can deliver the final results of the previous time slot to the backplane if we subtract the delta time from the time stamp of the output packets.

    while (GO packet is not received) loop
        receive packet from socket;
        write data into input queue;
    end loop;
    Check timestamp of received packet;
    if (timestamp > simulator's time) advance time;
    Send ENABLE signal to receiver nodes;
    Advance local time for delta delay (1 NS);
    Check data in output queue;
    If data exists then send data to socket;
    Send END signal with next time;

Figure 6. Behavior of the master node

Advancing the local clock by a delta delay is prohibited in conservative distributed simulation. To cure this problem, the duration of the arbitrary time advancement should be small enough to be negligible at the simulation interface. The easiest way is to use a very small time unit such as 1 femtosecond. Since such a small time unit makes the internal data structure of the VHDL simulator inefficient, we use a SCALE parameter, which is the ratio between the time unit in the entire cosimulation and that within the VHDL simulator. When the master node communicates with the backplane, the time stamp of a packet is converted by multiplying or dividing it by the SCALE value.

The behavior of the master node is shown in figure 6. There are two points where time advancement occurs. After packets from the backplane are received, if a future time stamp is detected by calling the check_time_advance() foreign procedure, the VHDL simulator advances its local time. The amount of time advancement is computed by multiplying the SCALE value by the time difference between the backplane and the VHDL simulator. The other advancement is the delta delay advancement described earlier. This is used only once per simulation cycle. If SCALE is larger than 1, timed simulation works well.

However, the SCALE parameter causes another problem. If there is a VHDL module which uses a time unit smaller than SCALE, the VHDL module works differently from the designer's intention. In QAM, for example, a ramp node has a 'wait for 1ns;' statement. After we replace the statement with 'wait for SCALE ns;', the system works correctly. We will construct a VHDL module library in which SCALE is used as one of the generic parameters.

From our experience with interface design and implementation, we list the facilities that VHDL simulators should support for cosimulation.

• A callback function for hooking the scheduler: Before the VHDL simulator's scheduler advances to the next cycle, a function referenced by a function pointer is called. Normally the pointer points to a null function. If a cosimulation environment designer wants to hook into the scheduler, the designer defines a function body and sets the pointer to its address. If the supported language is C++, this can be done with a virtual function. With the callback mechanism, the master node can be executed at the end of the current time wheel. Then, we can do without the delta delay management described above.

• The time of nearest future event: To perform more efficient cosimulation, the time of the nearest future event is required. The current implementation of PeaCE uses the next time increment as the nearest future event, which is a major source of inefficiency in cosimulation time. The C language interface of the Synopsys VHDL simulator (VSS) supports this facility with the 'cliGetNextEventTime' function.
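The master node behavior of figure 6, combined with the SCALE conversion, can be modeled roughly as follows. This Python sketch is an assumption-laden illustration rather than the authors' foreign module: the class `MasterNode`, the packet tuples, the `run_user_design` callback, and the choice of "current global time plus one" as the next time sent with END are all hypothetical. Local time is counted in simulator units, with one backplane time unit assumed equal to SCALE simulator units and the delta delay equal to one simulator unit.

```python
SCALE = 1000   # assumed ratio: one backplane time unit = 1000 simulator time units
DELTA = 1      # delta delay in simulator units (the "1 NS" step of figure 6)


class MasterNode:
    """Toy model of the master node's simulation loop."""

    def __init__(self, scale=SCALE):
        self.scale = scale
        self.local_time = 0      # VHDL simulator clock, in simulator units
        self.input_queues = {}   # receive node id -> list of pending data
        self.output_queue = []   # data written by the send nodes

    def simulation_loop(self, packets, run_user_design):
        """packets: list of (kind, timestamp, receiver_id, data), ending with GO."""
        stamp = 0
        for kind, timestamp, receiver_id, data in packets:
            stamp = max(stamp, timestamp)
            if kind == "GO":                 # GO closes the batch of input packets
                break
            self.input_queues.setdefault(receiver_id, []).append(data)
        # Conservative rule: advance only up to the time stamp just received.
        target = stamp * self.scale
        if target > self.local_time:
            self.local_time = target
        # "Enable" the receive nodes: the user design consumes inputs, writes outputs.
        run_user_design(self.input_queues, self.output_queue)
        # Step over the current time wheel so that all its events have been processed.
        self.local_time += DELTA
        # Subtract the delta again before converting back to backplane time.
        out_stamp = (self.local_time - DELTA) // self.scale
        responses = [("DATA", out_stamp, value) for value in self.output_queue]
        self.output_queue.clear()
        responses.append(("END", out_stamp + 1, None))   # always respond, even if empty
        return responses


def user_design(inputs, outputs):
    # Hypothetical user module: adds one to every received value.
    for receiver_id in sorted(inputs):
        while inputs[receiver_id]:
            outputs.append(inputs[receiver_id].pop(0) + 1)


master = MasterNode()
first = master.simulation_loop(
    [("DATA", 3, "rx0", 10), ("DATA", 3, "rx1", 20), ("GO", 3, None, None)], user_design)
second = master.simulation_loop([("GO", 4, None, None)], user_design)
```

With SCALE larger than 1, the extra delta step leaves the local clock well short of the next backplane time slot, which is consistent with the observation above that timed simulation works when SCALE exceeds 1.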

5 Experiences of cosimulation with designed interface mechanism

We implemented the proposed interface generation mechanism twice and tested both implementations with the QAM modulation example. One implementation is for the Synopsys VSS simulator and the other is for IVSIM, which was developed at the same university [13]. Though both support the VHDL foreign interface, the implementation details of the foreign interface are not specified in the standard, so the interface mechanism in each simulator depends on the simulator implementation.

5.1 Code size overhead of interface blocks

The fixed C module in table 1 is a foreign module which includes the socket handling routines and the VHDL interface routines. In VSS, a foreign module defines the body of a VHDL entity itself, and thus needs extra code to interface with the scheduler of the VSS simulator. On the other hand, the C modules in IVSIM are just procedure calls. As a result, the fixed C part in VSS is larger than that in IVSIM. In VSS, however, the interface designer can exploit more facilities of the VHDL simulator kernel through the CLI (C Level Interface) facility. The fixed VHDL module in VSS includes the entity definitions of the send, receive, and master nodes, while that in IVSIM has only a master node. In contrast, the proportional parts in IVSIM are larger than those in VSS. The proportional part in VSS has only entity instantiation code while, in IVSIM, the proportional part defines and instantiates the entities of the send or receive nodes. Although there are many differences in the detailed implementation, the same interface mechanism using the master node is used for both simulators, and the size of the interface part is about 10% of the whole simulated code for both VSS and IVSIM in the QAM example.

Table 1. Interface code overhead in VSS and IVSIM simulators

                                 VSS      IVSIM
Fixed C module (bytes)         17688       5628
Fixed VHDL module (lines)        341         25
VHDL lines per Receive             2         27
VHDL lines per Send                2         17

5.2 Time overhead of interface blocks

The result of runtime monitoring of the VHDL simulator and the interface module is presented in table 2. To measure the execution time of the user design VHDL modules and that of the interface modules, we use the gettimeofday() UNIX system call. The current time is expressed in elapsed seconds and microseconds since 00:00 GMT, January 1, 1970. We obtain the system time at the four points described in section 4.2. The time spent in the interface modules is the time taken to move data between the UNIX socket and the input/output buffers, while the module time is the duration between when the receive nodes get the input data from the input buffer and when the send nodes write the results into the output buffer. The time overhead of the interface blocks is small enough to be ignored.
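As a worked check of this claim, the share of time spent in the interface can be computed directly from the values reported in Table 2. The short Python sketch below copies those measurements; the helper name `interface_share` is introduced here only for illustration.

```python
# (simulation loops, user module time, interface module time), times in microseconds,
# copied from Table 2 (IVSIM simulator).
rows = [
    (96, 65_905, 5_599),
    (192, 127_768, 8_772),
    (288, 190_005, 11_522),
    (384, 259_545, 15_220),
    (480, 327_625, 18_399),
]


def interface_share(user_us, interface_us):
    """Fraction of the measured time that is spent in the interface modules."""
    return interface_us / (user_us + interface_us)


shares = [interface_share(user, interface) for _, user, interface in rows]
for (loops, _, _), share in zip(rows, shares):
    print(f"{loops:4d} loops: interface share = {100 * share:.1f}%")
```

For these measurements the interface accounts for roughly 5 to 8 percent of the measured time, and the share decreases as the number of simulation loops grows.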

Table 2. Interface time overhead in QAM cosimulation. (The IVSIM simulator is used and all values are in microseconds.)

Simulation Loop    User Module    Interface Module
 96                     65,905               5,599
192                    127,768               8,772
288                    190,005              11,522
384                    259,545              15,220
480                    327,625              18,399

6 Conclusion

We have presented a new interface mechanism for hardware-software cosimulation. We think that the approach, which satisfies all of the requirements in the wish-list, is applicable to all levels of detail of cosimulation and to various existing VHDL simulators. We are not aware of any other cosimulation environment that works with any VHDL simulator and at the same time performs timed cosimulation. We implemented the interface mechanism for both the VSS and IVSIM simulators and compared them. We also verified the feasibility of the proposed approach by conservative timed cosimulation with a "QAM-16" modulation example. The implemented interface and the lessons for more efficient timed cosimulation have been described. As future work, we will improve our cosimulation environment, construct a generic and parameterized VHDL module library, and make a smooth migration path to cosynthesis.

References

[1] C. Passerone, et al. Fast and accurate hardware-software co-simulation using software timing estimates. CODE/CASHE96, 1996.
[2] E. A. Lee. Recurrences, Iteration, and Conditionals in Statically Scheduled Block Diagram Language in VLSI Signal Processing III. IEEE Press, 1988.
[3] E. A. Lee and D. G. Messerschmitt. Synchronous data flow. IEEE Proceedings, September 1987.
[4] G. Jennings. A case against event driven simulation of digital system design. The 24th Annual Simulation Symposium, pages 170–176, April 1991.
[5] IEEE. IEEE Standard VHDL Language : Reference Manual. IEEE, Inc., 345 East 47th Street, New York, NY 10017, USA, 1993.
[6] J. Buck, S. Ha, E. A. Lee, and D. G. Messerschmitt. Ptolemy: A framework for simulating and prototyping heterogeneous systems. International Journal of Computer Simulation, 4:155–182, April 1994.
[7] J. P. Soninen, et al. Co-simulation of real-time control systems. IEEE/ACM Proc. of Euro-Dac'95, pages 170–175, 1995.
[8] J. Pino, Michael C. Williamson, and Edward A. Lee. Interface Synthesis in Heterogeneous System-Level DSP Design Tool. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996.
[9] Peter Zepter, Thorsten Grotker, and Heinrich Meyr. Digital Receiver Design Using VHDL Generation From Data Flow Graphs. Proceedings of 34th DAC, June 1995.
[10] S. Schemeler, et al. A backplane approach for cosimulation in high-level system specification environments. European Design and Test Conference, 1995.
[11] Sari L. Coumeri and Donald E. Thomas. A Simulation Environment for Hardware-Software Codesign. IEEE Design and Test of Computers, pages 16–28, September 1993.
[12] Wayne Wolf. Hardware-software codesign of embedded systems. Proceedings of IEEE, 82:967–989, July 1994.
[13] Y. Kim, K. Kim, Y. Shin, T. Ahn, W. Sung, K. Choi, and S. Ha. An integrated hardware-software cosimulation environment for heterogeneous system prototyping. Proc. of ASPDAC, pages 101–106, August 1995.
