A Parameter-based Mapping Scheme for Behavior/Architecture Co-Design

Marcello Lajolo Politecnico di Torino Torino, Italy [email protected]

Luciano Lavagno Università di Udine Udine, Italy [email protected]

Claudio Passerone Politecnico di Torino Torino, Italy [email protected]

Alberto Sangiovanni-Vincentelli University of California at Berkeley Berkeley, CA, USA [email protected]

Abstract

We present a parameter-based scheme for the architectural mapping of the behavior of an embedded system initially specified at a very high level of abstraction. The user can run two types of simulation on the same netlist: a behavioral simulation with a very abstract model of communication, and a mapped simulation with the behavior mapped onto a specific architecture template. The template, including processors, memories, ASICs and busses, can be created by the designer using blocks from a simulation library. The technique is based on automatically inserting filters at each input/output port of the behavioral blocks; the filters act as access controllers for architectural units (e.g. a bus). The main goal of the approach is to enable the system designer to preserve, as much as possible, a separation between the behavior of the system and the architecture chosen to implement it. Architectural details can be incorporated gradually during the verification step, driven by performance/cost constraints that can be continuously checked in the same co-simulation infrastructure.

1

Introduction

With the ability to mix processors, complex peripherals, and custom hardware and software on a single chip, full-system design and analysis demands a new methodology and set of tools. Hardware/software co-design has received a lot of interest in the last few years and several approaches have been proposed [3]. Its aim is to keep the design of hardware and software together during the entire design flow, so that the system designer can test the hardware/software integration as soon as possible. However, hardware/software co-design does not seem to be the correct starting point for system-level design, because it is too close to an implementation view. The established consensus is that the future of hardware/software co-design will be behavior/architecture codesign, that is, the selection of features and of the architectural platform to implement them. This should be the right level of abstraction at which to begin system design. The partitioning of the design into hardware and software should be determined and guided by the decisions taken at the behavior/architecture codesign level.

To analyze a specific behavior/architecture, we need to describe how the behavior will be implemented using the facilities available in the architecture. This process is called “mapping the behavior onto the architecture”. Some behaviors are mapped to software by assigning them to a software task or interrupt-service routine. Communication between behaviors must also be assigned an implementation, perhaps using a hardware bus, interrupt, polling, mailboxes, shared memory with semaphores, or another mechanism. It is generally necessary to refine the large tokens/transactions from the behavior-level design down into smaller (e.g. byte-sized) pieces that can be carried by the lower-level architectural communication resources.

Co-simulation is widely used to validate both the hardware and the software components of embedded systems, as well as the interaction between them. It requires both a high degree of detail and good performance. This can be obtained by allowing the designer to simulate at a level which provides only the required information and does not waste time generating unnecessary details. The models adopted should be flexible enough to accept additional details as they become available in the design cycle. Rowson lists several co-simulation techniques [4] which illustrate the tradeoff between performance and detail. In [2] a technique is presented that allows communication to be represented at multiple levels of detail and gives the designer the ability to dynamically choose the appropriate level for different parts of the system.

We propose in this paper a parameter-based scheme for the architectural mapping of the behavior of an embedded system initially specified at a very high level of abstraction. The user can run two types of simulation on the same netlist: a behavioral simulation with a very abstract model of communication, and a mapped simulation with the behavior mapped onto a specific architecture template, which the user can customize starting from some models distributed with our simulation library. The methodology is based on automatically inserting behavioral port holes at each input/output port that act as bus access filters. The main aim of the approach is to enable the system designer to maintain an almost total separation between the behavior of the system and the architecture selected to implement it. Architectural details can be incorporated gradually during the verification step, driven by performance/cost constraints that can be continuously checked in the same co-simulation infrastructure.

Once a behavior has been mapped onto an architecture, a new set of verification and analysis tasks must be performed. Initially, for our behavioral simulation, we use an abstract notion of time, assuming an infinitely fast reaction to events. When we map the behavior onto an architecture, our behaviors are annotated with real or estimated delays, based on how they will be implemented. A behavior mapped to software has two different types of delay: its own (estimated) computation time, and the delays caused by sharing a processor with other behaviors and an RTOS. Similar delays occur in hardware. Mapping large tokens onto narrow buses causes delays analogous to the computation time in software, and gaining access to a shared bus causes delays analogous to those caused by sharing a processor. Meeting throughput, reaction time, and latency requirements calls for a verification step using the mapped behavior annotated with architectural delays. At this behavioral level we have estimated rather than cycle-accurate delays. These estimates are not good enough to write test programs, but they can be extremely useful in making the high-level choices of what is in hardware as opposed to what is in software, how to architect the software in terms of number of threads, and which RTOS to choose.

By separating behavior and architecture, we enable their coevolution. Unique requirements of the behavior will require changes in the architecture. Cost considerations within the architecture may imply behavioral modifications. Keeping the behavior implementation-neutral and allowing the independent mapping of behavior onto architecture is essential to good system design and is the essence of what has been termed hardware/software codesign.

The rest of the paper is organized as follows. Section 2 gives an overview of our mapping scheme. Section 3 describes its implementation within the POLIS framework. Section 4 provides some conclusions and an overview of future extensions and improvements.

2

Mapping the behavior onto the architecture

The design of a large and complex embedded system requires a number of steps and refinements, which start from an abstract system-level view and go down to the final implementation. At each step the designer takes various decisions and verifies them using different validation tools: if the results are satisfactory, the decisions are used as a refinement; otherwise the design step is repeated. We are particularly interested in the verification phases performed by simulating a model of the system at different levels of abstraction. In this paper we concentrate on two of them, which occur early in the design flow:

Functional (Behavioral) Simulation: this is usually the first step after the system specifications have been written in an executable form. The main goal at this level is to check the correctness of the chosen algorithm, without considering architectural details. There is no explicit notion of time, and everything runs concurrently.

Performance (Mapped) Simulation: a candidate implementation architecture is selected (based on past design experience) and each behavior of the functional specification is mapped onto an architectural element. Estimates or measurements of the execution time are used to synchronize the simulation.

Mapping a behavior onto an architecture is a refinement of the pure functional simulation and introduces new details that were previously abstracted away. For each module, an implementation in either hardware or software is chosen, and for a software implementation a particular CPU is selected. Communication links should also be mapped onto some architectural resource, such as a hardware bus, shared memories, interrupts and so on. This step may also require refining communication links that use large tokens at the behavioral level into smaller (byte-sized) pieces of data that can be carried by the lower-level architectural communication resources.

While time is completely ignored during functional simulation, performance simulation should take into account the delays associated with each architectural element. These delays can take many forms:

- Computation delays, i.e. how much time it takes to execute a given behavior on a CPU or with a hardware implementation.
- Architectural delays, caused for instance by sharing a processor with other behaviors and an RTOS.
- Communication delays, which may occur for example on a bus with multiple masters, each of which can initiate a transaction.

Delays can be estimated or measured on a reference implementation, and can be annotated on the functional behaviors to perform a mapped simulation.

The main goal of our work is to provide designers with a flexible mechanism to specify an application at the behavioral level, and then gradually refine it to the implementation level within the same framework. At first, only functional simulation is possible; after mapping and delay annotation, performance simulation is applied. As we already have a hardware/software codesign framework (see Section 3) where each module can be assigned either a hardware or a software implementation, here we mainly cover the mapping of communication links onto communication resources.

We have chosen a parameter-based mapping scheme to carry out the process of mapping a behavior onto a target architecture. The approach considers the behavior of each module in the netlist of the system at two different levels of abstraction:

1. Functional level.
2. Mapped level.

Within our methodology, the system is described as a hierarchical netlist containing a collection of blocks, with additional filters at each input/output port. Each filter models the behavior of a particular communication resource, and it is used to specify a mapping between the behavioral communication link and an actual architectural element, such as a wire on the system bus.

At the functional level, the communication between the blocks is performed by means of events, and we are only interested in the time and value of the data to be transferred, without concern for the details of how the transfer is carried out. For instance, at this level all non-functional communication, such as instruction fetches and local data accesses, is abstracted away.

In the mapped view, each module (communication) is assigned to a specific element (communication line) in the architecture template. The filters have an important role here because they are responsible for incorporating the effects of bus latency and of the arbitration mechanism. At this level the traffic on the bus is actually generated, with the sequence of instruction fetches and interrupt or memory-mapped communication. Filters are therefore a powerful mechanism to keep the behavior and the architecture of a system separated.

An example, which will be used in the rest of the paper, is depicted in Figure 1. The system represents a simple connection of two processes (process1 and process2), which communicate through a control connection (represented by the signals DONE1 and DONE2) and a data connection. The data connection is supposed to carry large tokens, written in a shared memory at an address which is also exchanged by the two processes. At this level, concurrent access to the shared memory is not an issue, since a detailed model of the arbitration is not yet considered. However, an arbitration scheme will be necessary once the communication is mapped onto an architecture.
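The role of a filter in the two views can be illustrated with a minimal sketch. It is not the POLIS implementation: the names `Event`, `Filter`, `bus_latency` and `mapped` are hypothetical, and the sketch assumes that the functional view forwards an event unchanged while the mapped view only adds a fixed latency standing in for the cost of the communication resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A behavioral event: only its value and its time matter."""
    name: str
    value: int
    time: float  # simulation time at which the event becomes visible


@dataclass
class Filter:
    """Hypothetical port filter placed between a behavior and its link."""
    bus_latency: float = 0.0  # assumed fixed cost of the mapped resource
    mapped: bool = False      # False: functional view, True: mapped view

    def forward(self, event: Event) -> Event:
        # Functional view: the event passes through with no added delay.
        if not self.mapped:
            return event
        # Mapped view: same name and value, delayed by the resource latency.
        return Event(event.name, event.value, event.time + self.bus_latency)


port_filter = Filter(bus_latency=4.0)
addr_event = Event("ADDR1", 0x100, time=10.0)

functional = port_filter.forward(addr_event)
port_filter.mapped = True
mapped = port_filter.forward(addr_event)
print(functional.time, mapped.time)
```

The behavior that emits `addr_event` is the same in both runs; only the filter setting changes, which is the separation between behavior and architecture that the filters are meant to preserve.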
A possible implementation scheme for the example is shown in Figure 2. Here the communications between process1, process2 and the shared memory have been mapped onto a bus monitored by an arbiter that acts as a bus control unit. Process1 and process2 can be mapped to either hardware or software, and the shared memory is implemented with a reserved portion of the data segment in the CPU memory. The point-to-point communications between process1 and process2 (signals DONE1 and DONE2) have not been mapped onto the bus.

Figure 1: An example of system behavior. (Labels recovered from the figure: PROCESS n. 1, PROCESS n. 2, DONE1, ADDR1, DATA1, DONE2, ADDR2, DATA2.)

Figure 2: The desired architecture. (Labels recovered from the figure: PROCESS 1, PROCESS 2, RTOS, SW, DONE1, DONE2, RnW, ADDR, DATA, REQ1, REQ2, GRANT, ARBITER, SHARED MEMORY, BUS.)

Figure 3: The original architecture template in POLIS. (Labels recovered from the figure: RTOS, SW, HW.)

Our methodology allows the designer to specify and simulate a system at the functional level as in Figure 1 and then to refine it, without changing the netlist, in order to simulate the behavior mapped onto the architecture shown in Figure 2. This is accomplished by using a set of parameters (see Section 3) to describe the architecture for the chosen communication resource.

3

The enhancement of the POLIS architecture template

In POLIS [1] the system is described using a formal behavioral model based on a network of communicating entities called Codesign Finite State Machines (CFSMs), which can be implemented in either hardware or software. The user can map each CFSM to an implementation in hardware or software, estimate the performance, and perform a simulation to evaluate that particular mapping. Finally, a hardware or software implementation, including the RTOS, can be synthesized.

The architecture template currently implemented in POLIS is limited to a single microprocessor and custom hardware, as shown in Figure 3. With this architecture template, the software and the hardware parts of the system are completely separated and there is no notion of a resource shared between hardware and software components. From a functional viewpoint, the only difference between hardware and software is the mechanism used to synchronize the various CFSMs. Hardware CFSMs operate concurrently, while the operation of the software CFSMs is coordinated by a scheduler modeling the RTOS used in the final implementation.

In this work we have enhanced the basic architecture template supported by POLIS and introduced the model of a system bus with a memory and an arbitration mechanism (see Figure 4). The main goal was to define a model of the system interconnect that can be customized through a set of parameters.

Figure 4: The enhanced architecture template in POLIS. (Labels recovered from the figure: MEM, BCU.)

3.1

Mapping Parameters

The extended architectural template is based on using a hierarchical unit in place of each functional block (process) that is part of the behavioral specification. This unit, as illustrated in Figure 5, inserts an object from a pre-defined library of communication blocks (drawn as black boxes in the picture) for each I/O port of the functional block. Figure 5, in particular, shows the mapped view of the example of Figure 1. The designer can use parameters attached to the black boxes to assign each behavioral communication to an architectural bus resource. This uses the same mechanism that in POLIS allows one to attach a parameter to a functional block specifying whether it is implemented in hardware or software. To switch between the functional and the mapped view, it is sufficient to change a parameter of the co-simulation called simulation view.

Three parameters are associated with each communication filter (black box). The first, bus_line, specifies the line of the bus onto which the corresponding communication must be mapped. It is set to NONE by default, which corresponds to a point-to-point communication that does not involve the bus. The second, arbitrated, is taken into account only when bus_line is not set to NONE, and specifies whether the communication through the associated port needs arbitration. When it is set to 1, an event is not actually sent to the bus until the module receives an explicit grant to access the bus. When it is 0, an event is always sent to the bus without waiting for a grant. The third and last parameter is an integer called priority, which specifies the priority of the request sent on that specific line. A higher number means a higher priority. This parameter is set to 0 by default and is taken into account only when the corresponding line has arbitrated set to 1. The bus scheduling policy is priority based; requests with the same priority are served in a round-robin fashion.

Note that the process of assigning the correct values to each bus port hole is very time consuming. The problem is made worse by the fact that the mapping depends in general on the type of implementation (HW/SW) selected for the module. To alleviate this problem, the assignments made once for a given implementation can be stored and reused every time the user re-runs a simulation with the same implementation for the module.

Figure 5: Parameter-based mapping (partial). (Labels recovered from the figure: PROCESS n. 1 and PROCESS n. 2, each containing a BEHAVIOR block, with ports DONE1, ADDR1, DATA1 and DONE2, ADDR2, DATA2. Filter annotations: "bus_line: NONE, arbitrated: NO, priority: 0"; "bus_line: DATA, arbitrated: NO, priority: 0"; "bus_line: ADDR, arbitrated: YES, priority: 1".)

3.2

Bus Parameters

The number of masters of the bus (1) is automatically extracted by counting the number of output ports that have the parameter arbitrated set to 1. Additional parameters are then available to describe the bus. The set of parameters that we provide for this task allows the communication architecture to be specified at a very abstract level. The specification can then be refined to incorporate more and more architectural details, until the final phase where the netlist can be exported and passed to an actual co-verification environment. Table 1 shows some of the parameters available. It is possible to select:

1. Whether to employ Direct Memory Access (DMA), and the size of each DMA block.
2. The number of bus cycles required for handshaking with the arbiter of the bus in DMA and non-DMA mode.
3. The memory access time.

Parameter Name
DMA SUPPORTED
MAX DMA SIZE
HANDSHAKE CYCLES NO DMA
HANDSHAKE CYCLES DMA
MEMORY ACCESS TIME
NUM BITS ADDR BUS
NUM BITS DATA BUS

Table 1: Bus parameters

The bus parameters allow one to describe a new communication architecture rapidly, just by looking at the bus specification. These parameters can also be extracted once for each type of bus and stored in a simulation library.

(1) A master of a bus is a module that can initiate a bus transaction, unlike a slave, which can only react to a bus transaction.
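The filter parameters of Section 3.1 and the master count of Section 3.2 can be illustrated with a minimal sketch. It is not the POLIS implementation: the class and function names (`PortFilter`, `count_masters`, `Arbiter`) and the third master `p3.ADDR3` are hypothetical. The sketch assumes a port is a bus master when its bus_line is not NONE and arbitrated is set, that the arbiter always serves the highest pending priority, and that it rotates among masters of equal priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortFilter:
    """Hypothetical record of the three per-port mapping parameters."""
    port: str
    bus_line: str = "NONE"    # NONE means point-to-point, no bus involved
    arbitrated: bool = False  # True: wait for a grant before using the bus
    priority: int = 0         # higher number means higher priority


def is_master(f: PortFilter) -> bool:
    # arbitrated is only taken into account when bus_line is not NONE.
    return f.bus_line != "NONE" and f.arbitrated


def count_masters(filters) -> int:
    """Number of bus masters, i.e. ports that must request the bus."""
    return sum(1 for f in filters if is_master(f))


class Arbiter:
    """Priority-based arbiter, round-robin among equal priorities."""

    def __init__(self, filters):
        self._ring = {}  # priority -> master ports in declaration order
        for f in filters:
            if is_master(f):
                self._ring.setdefault(f.priority, []).append(f.port)
        self._last = {prio: -1 for prio in self._ring}  # last served index
        self._pending = set()

    def request(self, port: str) -> None:
        if not any(port in ring for ring in self._ring.values()):
            raise ValueError(f"{port} is not an arbitrated bus port")
        self._pending.add(port)

    def grant(self):
        """Return the port that gets the bus next, or None if idle."""
        for prio in sorted(self._ring, reverse=True):
            ring = self._ring[prio]
            start = self._last[prio] + 1
            for k in range(len(ring)):
                i = (start + k) % len(ring)
                if ring[i] in self._pending:
                    self._pending.discard(ring[i])
                    self._last[prio] = i
                    return ring[i]
        return None


filters = [
    PortFilter("p1.DONE1"),                        # not mapped on the bus
    PortFilter("p1.DATA1", "DATA", False, 0),      # on the bus, no grant
    PortFilter("p1.ADDR1", "ADDR", True, 1),
    PortFilter("p2.ADDR2", "ADDR", True, 1),
    PortFilter("p3.ADDR3", "ADDR", True, 2),       # hypothetical third master
]

arbiter = Arbiter(filters)
for port in ("p1.ADDR1", "p2.ADDR2", "p3.ADDR3"):
    arbiter.request(port)
grant_order = [arbiter.grant() for _ in range(3)]
print(count_masters(filters), grant_order)
```

Keeping a rotating index per priority level, rather than a plain FIFO queue, is one simple way to ensure that a master which requests the bus again immediately cannot overtake a waiting master of the same priority.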

4

Conclusions

This paper presented a design methodology that allows the system designer to gradually increase the level of detail of the architectural implementation of a given behavior without changing the co-simulation environment or requiring low-level details to be defined (e.g. the exact bus protocol, the detailed behavior of bus lines and so on). Everything is handled within a homogeneous infrastructure where early architectural implementation choices can be specified and evaluated by customizing architectural templates available in our co-simulation library. In the future we will improve the user interface of our simulator by automating the generation of the mapped-level view of each functional module, and by simplifying the process of choosing the actual parameters.

Acknowledgements: The authors would like to thank Anand Raghunathan from NEC Princeton USA for his help with defining the problem and outlining solutions.

References

[1] F. Balarin, M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, C. Passerone, A. Sangiovanni-Vincentelli, E. Sentovich, K. Suzuki, and B. Tabbara. Hardware-Software Co-Design of Embedded Systems: The POLIS Approach. Kluwer Academic Publishers, Norwell, MA, 1997.
[2] K. Hines and G. Borriello. Dynamic communication models in embedded system co-simulation. In Proc. Design Automation Conf., pages 395–400, June 1997.
[3] G. De Micheli and M. Sami, editors. Hardware/Software Co-Design. Kluwer Academic Publishers, 1996.
[4] J. Rowson. Hardware/software co-simulation. In Proc. Design Automation Conf., pages 439–440, June 1994.
