Rapid Prototyping of Digital Systems with COTS/ASIC Components

S. Famorzadeh, T. Egolf, V. K. Madisetti, P. Kalutkiewicz‡, M. Falco‡, R. Dreiling‡

Abstract: Electronic Systems Design Automation (ESDA) is one of the challenging tasks in rapid prototyping and verification of digital systems. The goals of rapid prototyping include (i) reduction in the design and verification time by an order of magnitude, (ii) reduction in the cost of the design and lifecycle support process, and (iii) capability for rapid insertion of new technology and algorithmic innovations into existing prototypes. This paper discusses the models and views in rapid prototyping, facilitated by VHDL, that allow rapid design and verification of large digital systems.
I. Introduction

The increasing complexity of subsystems realized by digital hardware has created an urgent need for new design methodologies and tools to reduce the complexity of the design to manageable levels, without compromising on the quality of the product. The design of any complex system starts from an abstract level, where only a set of requirements is known, and works its way down to a physical level, where a working hardware prototype is available. There are many different (virtual) paths that one could take to travel from the abstract level to the physical level, and one of the primary problems in system design is identification of those steps that maximize a certain objective function, e.g., time-to-market (TTM) or maximum performance, or both.

The conventional approach to design of digital systems starts by independently specifying and designing hardware and software, followed by an integration and debugging phase on the tested hardware prototype. The disadvantages of this method are well known. The problem is that the two domains are completely isolated from one another, and it is not until late in the design cycle that a prototype is available and the two can be integrated and tested. Often, a redesign of the hardware is warranted.

To combat the problems encountered with the previous method, a different approach using graphical front ends for HW/SW co-design has been developed. These tools provide a medium for executable specification of an implementation-independent system. Examples include Statechart [6], SDL [3], and Statemate [7]. The advantage of these tools is that they provide an interface to powerful CASE tools; however, a significant disadvantage is the lack of hardware support. Hardware design has to take place in a different description medium. Additionally, early integration and code reuse are impossible, and a unified knowledge of different heterogeneous description mechanisms and tools is missing.

Another method uses a mixed language approach which uses a "wrapper-like" interface to existing hardware simulators. Hardware design is performed using standard hardware description languages, while software design is done using high-level programming languages. The advantage of this choice is that both hardware and software are developed in their own environments, and high simulation speeds can be achieved. Disadvantages are that no common debugging environment is available, and very little support is provided for HW/SW co-specification [8][9].

Why VHDL? Common to all approaches is the lack of a unified environment for hardware and software during the early stages of the design, where not all components of the system are at a stage that can be prototyped. In order to meet these requirements, a number of new languages have been developed that support both hardware and software description, e.g., VHDL, Verilog, Hardware C. The VHSIC Hardware Description Language (VHDL) is an IEEE standard (IEEE-1076), and is widely used in all phases of the design of integrated circuits. In addition, VHDL, being a superset of ADA with few exceptions, was designed for the development of complex and concurrent software systems. These powerful expressive capabilities of VHDL make it suitable for the analysis, design and documentation of embedded systems. Another advantage of using VHDL for a unified design representation is the availability of many support tools such as interpreters, compilers, hardware accelerated simulators, and synthesis packages. In this paper, models and views that are essential for rapid prototyping of digital systems with mixed ASIC and Common Off-The-Shelf (COTS) components are presented. Our experience (footnote 1) has shown that by using these models and views one can prototype the system at many different levels of abstraction (as in Figure 1), which in turn shortens the design cycle through a smooth transition from one level of abstraction to another.

Footnote 1: Our recent experience indicates that, using ISA/BFM models as discussed in later sections of this paper, it is possible to install a Mercury(TM) operating system on the virtual prototype of an i860 RISC processor, demonstrating "ideal" hardware/software integration in a virtual prototyping environment.

This research was supported by Advanced Research Projects Agency (ARPA). Authors are with DSP Laboratory, Georgia Tech, Atlanta, GA 30332-0250. ‡Advanced Engineering and Technology Division, Lockheed Sanders, Inc., Nashua, NH 03061-0868. Email: [email protected], [email protected].

II. Views

As mentioned earlier, one of the main problems of a complex design is to define the path, starting from an abstract level of specification down to a physical design. In order to define such a path, one needs to have a virtual model of design representation that can effectively identify all the available design options. The traditional representation of hardware design uses three axes, each representing one of three domains of description, to represent a design. An example is the Y chart due to Gajski [10]. It depicts the structural, behavioral, and geometrical aspects of a design, and is shown in Figure 2. Another model closely related to the Y chart is the X chart [10], which provides an additional axis for testing. Recently, a new design representation was proposed by Ecker [15] that we will utilize in this paper to illustrate the top-down design process (Figure 1).

The Y chart is suitable for the design of ASICs, and as such is too low a level of representation to efficiently model the process of embedded systems design. Indeed, the Y chart includes the concepts of placement and routing, which are mature areas in the industry and not of immediate interest to the systems designer, though necessary for the final stages of design. Ecker's Cube [15] appears to be ideal for representation of a system prototype under design, where one starts from point A and ends with a prototype at point Z (in Figure 1). The Cube relies on three independent orthogonal views of the prototype: the Design View, the Timing View, and the Value View. The Ecker Cube can be partitioned into three classes of models for the prototype under design: Algorithmic and System Level Models, Register Transfer Level Models, and Gate Level Models. These three models have conventionally been utilized in the design of ASICs, and are now extended to general systems design environments. The next few sections describe the various views and models as facilitated by VHDL.

A. Design Views

The Design View encompasses the (i) Structural, (ii) Dataflow, and (iii) Behavioral aspects of the design (footnote 2). The structural view is the traditional hardware design view, and is closely related to the modular netlist and schematic capture view of a typical CAD tool. Therefore, it represents a flat or hierarchical interconnection of parts and their instantiation. The components can be selected from a library or from a set of user-defined models that capture the behavior of components. The component instantiation is a typical VHDL statement utilized in the structural view. Other important statements of VHDL that are directly relevant to this model are block statements and generate statements. The former help in describing a local design hierarchy, while the latter help in generating parameterizable structures.

Dataflow is the second design view under VHDL, and it bridges the gap between the structural and behavioral views. It is important to note that dataflow is not a composite of structural and behavioral views; rather, it represents a modeling view in its own right. The dataflow view represents concurrency, without any explicit connectivity. Each statement in the dataflow view possesses functionality, and the flow of execution is ordered by the availability of data (data-driven). Typical VHDL dataflow statements include conditional and selected signal assignments. In addition, VHDL provides guarded signal assignment, effectively controlling the flow of data in the design.

The third design view is the behavioral view. In the context of the Ecker Cube, the behavioral level of abstraction relates to the use of sequential statements. In general, "behavioral" denotes the specification of any functional property of a system, while the precise form (e.g., parallel, concurrent or sequential) plays a subordinate role. Here, both definitions of behavior are used, and the meaning is derived from the context (design view or functional property). Therefore, unlike the dataflow view, this view uses a control flow mechanism for its execution. Typical VHDL statements for the behavioral view include if, loop, and subroutine calls. The sequential environment in VHDL is provided by the (concurrent) process statement, where the order of execution inside a process is sequential, but multiple processes execute concurrently. Another useful VHDL statement that helps to interface the behavioral and dataflow views is the wait statement (footnote 3).

B. Timing Views

There are three different timing aspects that can be specified: propagation delay-related, clock-related and causality-related. All three timing views are supported under VHDL. Propagation delay is the finest level of timing, and it is the one most closely related to hardware. Using this level of timing, a variety of approaches to modeling can be taken. The typical VHDL views that are applicable to this level are Unit, Fixed, Variable and Generic Variable delays, with the unit delay being the most limited model, and the generic variable delay the most flexible one.

The next level of timing is clock-related, corresponding to a synchronous design. In this design view all combinatorial operations are related to a clock signal, and all sequential operations possess no specified timing. The relevant VHDL construct is the wait on clock. Using this construct, actions of the design can be synchronized to either falling or rising edges of the clock.

The next timing view provides a way for the specification of synchronization points without using a clock or propagation delay information. This level is most useful at system or algorithmic level specification, where the ordering of events is important, and not the actual wall-clock time. At this level, high level constructs such as Semaphores or Send and Receive operations can be performed on abstract
Footnote 3: The interested reader is referred to detailed examples for structural, dataflow and behavioral views in VHDL described by figures 11, 13, and 8, respectively.

Footnote 2: The Behavioral and Structural views can be identified in the Y chart model; however, the Dataflow view is a distinctive characteristic of the Ecker Cube and VHDL.
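As a rough illustration of the three design views (this is not one of the paper's figures), the sketch below describes the same hypothetical 2-to-1 multiplexer three times. The entity and component names (mux2, inv_gate, and2_gate, or2_gate) are assumptions made for this example, and the predefined type bit is used so that no library is required.

```vhdl
-- Leaf cells used by the structural view (hypothetical names).
entity inv_gate is
  port (i : in bit; o : out bit);
end inv_gate;

architecture dataflow of inv_gate is
begin
  o <= not i;
end dataflow;

entity and2_gate is
  port (i0, i1 : in bit; o : out bit);
end and2_gate;

architecture dataflow of and2_gate is
begin
  o <= i0 and i1;
end dataflow;

entity or2_gate is
  port (i0, i1 : in bit; o : out bit);
end or2_gate;

architecture dataflow of or2_gate is
begin
  o <= i0 or i1;
end dataflow;

-- Hypothetical 2-to-1 multiplexer described in each design view.
entity mux2 is
  port (a, b, sel : in bit; y : out bit);
end mux2;

-- Structural view: components are instantiated and wired together.
architecture structural of mux2 is
  component inv_gate
    port (i : in bit; o : out bit);
  end component;
  component and2_gate
    port (i0, i1 : in bit; o : out bit);
  end component;
  component or2_gate
    port (i0, i1 : in bit; o : out bit);
  end component;
  signal nsel, t0, t1 : bit;
begin
  u_inv : inv_gate  port map (i => sel, o => nsel);
  u_a   : and2_gate port map (i0 => a, i1 => nsel, o => t0);
  u_b   : and2_gate port map (i0 => b, i1 => sel, o => t1);
  u_or  : or2_gate  port map (i0 => t0, i1 => t1, o => y);
end structural;

-- Dataflow view: one concurrent, data-driven conditional assignment.
architecture dataflow of mux2 is
begin
  y <= b when sel = '1' else a;
end dataflow;

-- Behavioral view: sequential statements inside a process.
architecture behavioral of mux2 is
begin
  process (a, b, sel)
  begin
    if sel = '1' then
      y <= b;
    else
      y <= a;
    end if;
  end process;
end behavioral;
```

All three architectures select b when sel is '1' and a otherwise. They differ only in whether the description is an interconnection of parts, a data-driven assignment, or a sequential control flow.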
value views. The highest level of modeling corresponds to the causality timing view, and is used for algorithmic and system specification and design purposes. The next lower level is related to the clock in the timing view and is used for Register Transfer Level modeling of the design. The lowest level of modeling is accomplished using the propagation-delay view of timing and is primarily used for Gate or Logic level design. In the following subsections, each of these models is discussed in detail.

A. System and Algorithm-Level Models

System and Algorithm models are the most abstract level of modeling, and in the Ecker Cube they lie on the highest plane as shown in Figure 1. Modeling at a relatively higher level of abstraction provides several advantages. First, the design specification can be captured using high level models which may also serve to document all design modifications. Although a small overhead is incurred for development and maintenance of these models, higher level models greatly reduce the cost of product maintenance throughout the product life cycle. Secondly, by capturing the specifications via a high level model, a set of executable specifications is obtained that can be simulated, alone or within their intended environment, to ensure that the design functionality and performance requirements are satisfied. For example, communication protocols between the design and its environment may be validated. Furthermore, the executable specifications can guide the designer in partitioning the design onto a set of hardware and software tasks. As shown by the cube, there are nine possible models that correspond to this level of virtual prototyping. These models correspond to different combinations of the Design and Value views.
The most interesting models are defined by the vertices running along the Value views at each of the behavioral, dataflow and structural points, which correspond to the following 3-tuples:

(Causality, -, Behavioral)
(Causality, -, Dataflow)
(Causality, -, Structural)

These models represent three virtual prototypes of the system to be designed and assist in specification, partitioning, and selection of devices to be used in the system. In all these models, only causal time relations between synchronization points are specified. Synchronization is typically implemented using semaphores and communication channels, which can also facilitate data exchange. The VHDL constructs that are available for algorithm level modeling are processes, functions, and procedures. These constructs form the basis for describing the behavior independently of any hardware-specific details. Multiple processes in an algorithmic model use signals to interface to each other. The timing at this level, besides the causality view of the events, corresponds to the time that a module is busy executing a specific function, rather than to the inertial or propagation delay of an electrical circuit, or the set-up and hold times of a latch. Statistical packages can also be utilized that collect statistics about the virtual system under simulation. The statistics normally include the current
communication channel data types. These operations can be implemented as subroutines together with type definitions in a package as abstract data types. It is important to note that it is possible to include processing times with the operations that take place, even though the synchronization mechanism is at a causal level of abstraction. These processing times help in evaluating the system performance based on the designer's timing budget.

C. Value Views Under VHDL

Under this view, VHDL provides a type specification to specify the characteristics of an object, usually related to data or control. An object can be any of the following: Signals, Variables, or Constants. Each object can represent a wide variety of types, from scalar numeric types, to composite arrays and records, to file types. The simplest type is the bit type, corresponding to a digital bit value, and can have a value of 0 or 1. For more accurate representation of the digital values, a Multi-Value Level (MVL) system may be used that includes a multitude of different values between 0 and 1 [1].

The next value abstraction level uses composite bit values. Here a number of bit values can be grouped together in an array. This level of abstraction allows a complete manipulation and structured representation of a set of bit values. It allows more semantic flexibility; for example, an eight-bit array can be used to represent an ASCII character. VHDL array types and records support this level of abstraction. Once again, if desired, special packages can be included to allow for MVLs. Another powerful VHDL construct for this level is the capability to declare unconstrained arrays. This allows the development of general models that are not dependent on the width of the data.

The next higher level of abstraction is represented by abstract values. These are normally high level scalar types such as integer, real, or enumerated types. In addition, it is possible to define user-defined types by the use of a record.
For example, to represent the IEEE standard single precision floating point number the following record can be used:

    subtype SINGLE_EXP is INTEGER range -127 to 127;
    subtype SINGLE_MAN is INTEGER range -8388608 to 8388607;
    type IEEE_SP_FP is record
      MANTISSA : SINGLE_MAN;
      EXP      : SINGLE_EXP;
    end record;
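The composite-bit and unconstrained-array ideas from the value views can be sketched in the same spirit. The package name value_views, the subtype byte, the constant ASCII_A, and the function parity are all hypothetical names chosen for this illustration.

```vhdl
package value_views is
  -- Composite bit value: an eight-bit array, e.g. one ASCII character.
  subtype byte is bit_vector(7 downto 0);
  constant ASCII_A : byte := "01000001";  -- the character 'A' (65 decimal)

  -- The parameter is an unconstrained array, so one function
  -- serves vectors of any width.
  function parity (v : bit_vector) return bit;
end value_views;

package body value_views is
  function parity (v : bit_vector) return bit is
    variable p : bit := '0';
  begin
    -- XOR all elements together; the result is '1' for an odd number of ones.
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end parity;
end value_views;
```

Because parity takes an unconstrained bit_vector, the same model can be reused for a byte, a 32-bit word, or any other width without modification.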
Abstraction of values not only offers a clearer design representation, but also enhances the speed of simulation.

III. Models
In the previous section under Design Views, three distinct views provided by VHDL were introduced. In this section, three different levels of modeling in VHDL that are based on those views are presented. These models correspond to different planes in the Ecker Cube, where the position of the planes is fixed by the timing view, and the space spanned by the planes is defined by design and
simulation time, minimum and maximum busy times of a process over all invocations, average and total busy time of a process across all invocations, and the utilization of each process. An example of such a package from [1] is shown in Figure 3. The package declares a number of variables as strings which are used to identify each time value that is reported. It also declares a record (called stat-info) which has 6 fields associated with time, such as minimum and maximum times. In addition, a function called compute-stat is also declared and described in the package body. The function compute-stat writes all the timing information about a particular algorithm in a file for analysis.

The first model, described by (Causality, -, Behavioral), is primarily used for system specification. Behavioral specifications with causal timing attempt to characterize the input-output specification of the system to be designed. That is, for a given set of inputs, the way outputs are derived is functionally specified by means of a mapping from input to output. Here the model describes the behavior of a system using a single thread of control, much in the same way as a von Neumann machine. Given this model, a set of performance metrics can be derived that represents a rough estimate of the computational requirements of the behavioral specification (and other performance criteria such as optimum signal to noise ratio, burst error handling and channel efficiencies). This estimate is based on high level operations such as the number of adds, multiplies, and memory moves. Therefore, at this level a set of rough performance specifications, or gross operational characteristics, can be obtained. For example, an environment like QuickFix [2], a VHDL based environment under development at Georgia Tech, enables an algorithm designer to study the trade-offs associated with fixed point implementation of an algorithm on a certain architecture.
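The busy-time bookkeeping described above can be sketched as follows. This is not the package of Figure 3: the package name busy_stats, the record layout, and the subprograms record_busy and average_busy are assumptions made for illustration, and file output is omitted.

```vhdl
package busy_stats is
  -- Hypothetical record; the package in [1] uses its own field set.
  type stat_info is record
    min_busy    : time;
    max_busy    : time;
    total_busy  : time;
    invocations : natural;
  end record;

  -- Starting value: the minimum is seeded with the largest time value.
  constant EMPTY_STATS : stat_info := (time'high, 0 ns, 0 ns, 0);

  -- Fold the busy time of one invocation into the running statistics.
  procedure record_busy (stats : inout stat_info; busy : in time);

  -- Average busy time over all invocations recorded so far.
  function average_busy (stats : stat_info) return time;
end busy_stats;

package body busy_stats is
  procedure record_busy (stats : inout stat_info; busy : in time) is
  begin
    if busy < stats.min_busy then
      stats.min_busy := busy;
    end if;
    if busy > stats.max_busy then
      stats.max_busy := busy;
    end if;
    stats.total_busy  := stats.total_busy + busy;
    stats.invocations := stats.invocations + 1;
  end record_busy;

  function average_busy (stats : stat_info) return time is
  begin
    if stats.invocations = 0 then
      return 0 ns;
    end if;
    return stats.total_busy / stats.invocations;
  end average_busy;
end busy_stats;
```

A process in an algorithmic model would keep a stat_info variable initialized to EMPTY_STATS and call record_busy each time it finishes a function. Utilization can then be estimated as total_busy divided by the current simulation time.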
The second model, described by (Causality, -, Dataflow), is mainly used for high level system partitioning. At this level, the logical processing requirements for the system may be viewed in terms of the overall dataflow requirements for each of the modes and levels of system operation. Here, the functional decomposition (e.g., FFT or DCT) of the major processing activities of the system can be examined separately, as long as the overall dataflow interfaces between them are maintained. Therefore, this modeling view is most helpful in partitioning the design and evaluating the different alternatives available to the designer. Unlike the previous model, this modeling style represents the system as a set of concurrent processes that are synchronized using tokens, semaphores and communication channels. Concurrent processes are processes which interact with one another: they may cooperate, share data or compete for common resources, as opposed to independent processes, which neither compete nor cooperate with each other.

The final model, described by the 3-tuple (Causality, -, Structural), is used for assignment of actual hardware to the individual parts of the preceding model's partitioning. Here, due to the structural nature of the model, a set of processors, buses, switches, and device controllers is used to perform the system operations at a very high level (see Figure 4). This level is useful in obtaining device dependent performance values. At this level, behavioral models of these components are used to get an estimate of the efficiency of the previous partitioning based on the available devices. At this stage it is possible to partition the design onto hardware and software components, compile the software, and execute it on an Instruction Set Architecture (ISA) model of a popular microcontroller to evaluate the efficiency of the software based on the available instruction set of the device. This model represents the very first prototype of the embedded digital system, with behavioral models of hardware units along with ISA and Bus Functional Models (BFM) of the COTS components. It is possible, at this stage, to define the exact communication protocols between modules, be they COTS parts or ASICs to be developed, via the use of BFMs. By the end of this modeling phase, the partitioning into hardware and software is completed, allowing them to proceed independently and concurrently.

B. Register Transfer Level Models

The remaining task after all possible partitioning steps (at the causal level) includes the design of the hardware at the Register Transfer level. The RT level describes the design at a level much closer to the underlying hardware than the previous level. This level is mainly used for hardware design, and in the early integration of hardware and software. The difference between the register transfer and the algorithmic level is the level of detail at which the internal structure with respect to the clock is specified. At the algorithmic level, the variables in the description do not necessarily correspond to the internal registers of the design, nor do assignment statements correspond to actual transfers; only the input-output behavior is considered.
At the register-transfer level, processes are each implemented as a combination of two modules:

Data Path: the module responsible for operating on and storing data;
Controller: the module responsible for generating the control signals of the data path as well as the state/output sequence of the embedded system.

The data path is usually implemented with operational units, multiplexors, buses, and storage elements such as registers, latches, or memory elements. The control part is usually described by way of a Finite State Machine (FSM). For each of the concurrent processes, this level describes the register transfer behavior in each state of each controller in terms of the arithmetic and logical operations invoked, the registers loaded, and the next state. For this modeling level, the timing view has changed to that of a system clock.

The scope of an RT-level model is generally a chip, and it can be used in several different ways. Using an RT-level description of a design enables the designer to verify that the logical decomposition of the hardware design to register transfer level elements is functionally equivalent to the higher level algorithmic models. It can also be used to verify the clocking scheme, and the sequencing and control of a synchronous system. Furthermore, it can be used to obtain the detailed logic and layout of the design by an automated synthesis process (e.g., logic synthesis followed by silicon compilation).

Once the transformation from the algorithm level to the RT-level is done, it is possible to verify the correctness of the transformation through simulation or via formal verification. A testbench along with a set of test stimuli is created for the algorithm models as part of the system specification, and the same test vectors are then applied to the register transfer model, with the two results being compared to assure correct transformation of the description. This is an important procedure, considering that most design errors occur during these types of transformations.

This modeling view corresponds to the center of the cube and is represented by the 3-tuple (dataflow, clock-related, composite-bit-values). VHDL supports this model through conditional signal assignments using composite types. Selected signal assignments have a format similar to case statements: a single expression is evaluated, its value is matched against the choices of the clauses, and the value associated with the matching choice is assigned to the target signal (see Figure 9). What makes this model interesting is the availability of VHDL-based commercial RT-level synthesis tools. Once an RT-level model is obtained, these tools lead to a hardware prototype.
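A minimal sketch of the (dataflow, clock-related, composite-bit-values) style is given below. It is not Figure 9: the entity rt_slice, its ports, and the operation encoding are assumptions chosen for illustration.

```vhdl
-- Hypothetical 8-bit register-transfer slice.
entity rt_slice is
  port (clk    : in  bit;
        op     : in  bit_vector(1 downto 0);
        a, b   : in  bit_vector(7 downto 0);
        result : out bit_vector(7 downto 0));
end rt_slice;

architecture rtl of rt_slice is
  signal next_value : bit_vector(7 downto 0);
begin
  -- Data path: a selected signal assignment on composite bit values.
  -- The single expression "op" is matched against each choice.
  with op select
    next_value <= a and b when "00",
                  a or b  when "01",
                  a xor b when "10",
                  not a   when others;

  -- Clock-related timing: the register is loaded on the rising clock edge.
  process
  begin
    wait until clk = '1';
    result <= next_value;
  end process;
end rtl;
```

The selected assignment plays the role of the operational unit and multiplexor of the data path, while the process with the wait statement models the register. No propagation delays are specified, so all timing is relative to the clock.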
A. COTS Models

A typical off-the-shelf component is the microprocessor or the programmable DSP chip. Microprocessors are incorporated in most digital system designs. The following discussion describes methods used by us for modeling these devices, and the modeling techniques presented here can be applied to a wide variety of applications, ranging from a 4-bit controller to a state-of-the-art RISC processor.

The first model recommended for a microprocessor is a Functional Instruction Set Architecture Model (FISA) of the device. FISAs are an integral part of simulation at the board level. A FISA is a high level model that is capable of executing the instruction set of a COTS microprocessor [1]. Here, only those parts that are related to the instruction set of the device are implemented. The body of the model is basically a case statement with one when clause per instruction, and based on the current instruction a particular action is taken. The variables in the model include a number of different arrays for the device register file and control registers. No distinction is made between internal and external memory. The executable object code can be stored in a file or, if it is greater than a few thousand words, it can be broken into separate files in order to keep simulation memory requirements at a reasonable level. The same is also true for the data memory. In order to avoid the time-consuming file I/O operations for reading instructions and data, an internal array is used, in much the same way as a RAM is used when a program resides on a hard disk. Most variables, including the data memory arrays, are declared using abstract data types such as integers. Control registers are declared as bit arrays and aliased bitwise for easy access. Data transformations on integer data types of 0 to 32 bits wide are performed using the standard VHDL operators +, *, /. For floating point operations and longer precision data transformation operations, a set of packages can be developed that can handle the desired data types.
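The structure of a FISA described above (an internal program array, an integer register file, and a case statement over the current instruction) can be sketched for a toy machine. The four-instruction set, the record layout, and all names below are hypothetical and are not taken from the i860 model.

```vhdl
entity toy_fisa is
end toy_fisa;

architecture behavioral of toy_fisa is
begin
  execute : process
    -- Hypothetical instruction set of a toy processor.
    type opcode_t is (op_load, op_add, op_mul, op_halt);
    type instr_t is record
      op     : opcode_t;
      rd, rs : natural range 0 to 3;  -- destination and source registers
      imm    : integer;               -- immediate operand for op_load
    end record;
    type program_t is array (natural range <>) of instr_t;
    type regfile_t is array (0 to 3) of integer;

    -- Internal array standing in for the object code file.
    constant program : program_t := (
      (op_load, 0, 0, 6),   -- r0 := 6
      (op_load, 1, 0, 7),   -- r1 := 7
      (op_mul,  0, 1, 0),   -- r0 := r0 * r1
      (op_halt, 0, 0, 0));

    variable regs : regfile_t := (others => 0);
    variable pc   : natural := 0;
    variable ir   : instr_t;
  begin
    loop
      ir := program(pc);              -- fetch
      case ir.op is                   -- decode and execute
        when op_load => regs(ir.rd) := ir.imm;
        when op_add  => regs(ir.rd) := regs(ir.rd) + regs(ir.rs);
        when op_mul  => regs(ir.rd) := regs(ir.rd) * regs(ir.rs);
        when op_halt => exit;
      end case;
      pc := pc + 1;
    end loop;

    -- Functional check only: 6 * 7 must be left in r0.
    assert regs(0) = 42
      report "toy_fisa: unexpected result in r0" severity error;
    wait;                             -- stop the process
  end process execute;
end behavioral;
```

Because the registers are plain integers, each instruction maps onto a single predefined VHDL operator, which is where the simulation speed of this modeling style comes from.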
The reason for using abstract data types is the enhanced simulation speed that is achievable through use of the VHDL standard operators. The FISA only captures the functionality of the processor, and not its performance. All instructions in the FISA are executed in one cycle regardless of their actual cycle time. However, if one is interested in the performance issues of a processor, then a Performance Functional Instruction Set Model (PFISA) needs to be developed. PFISA models are an extension of the FISA; the difference is that the model is no longer based on single cycles, and the actual cycle time information of the instructions is included in the model [1]. For example, consider the i860XP floating point unit, which can be operated in both pipeline and non-pipeline modes. The pipeline is three stages long if single precision floating point numbers are used, and two stages long if double precision floating point numbers are used. The stages for the single precision operations each take a single cycle, the stages for the double precision operations each take two cycles, and, if pipelined, each will produce a result every cycle. Certainly, the mode of operation of the floating point unit will have a great impact on the performance of the algorithm running
C. Gate Level Models

Gate level modeling is a lower level of modeling for a digital systems designer. An interesting point of note is the fact that it was the starting point of CAD design in the 70's and 80's. Conceptually, this level of digital design corresponds to assembly language programming in software engineering. The behavioral description at this level is done by a set of boolean equations and via finite automata. At this level, the design is captured as a sea of gates and flip-flops, once very popular due to the familiarity of the digital designer with gate level design. However, more and more designers are moving to the higher levels as the performance of synthesis tools improves (e.g., behavioral VHDL synthesis), in much the same way as assembly language programming is being abandoned due to better compiler technology.

IV. Systems with ASICs/COTS parts
The preceding sections introduced general models and views that are available under VHDL. In this section, a set of models for Common Off-The-Shelf (COTS) components and ASICs is presented. The intent is that by using these models a virtual prototype of the digital system under design can be obtained at different levels of abstraction. A general digital system is composed of a combination of microprocessors, memories, buses and ASICs. As was explained earlier, it is possible to use different levels of abstraction to describe the behavior of the system. The lower the level of abstraction, the closer the model is to the actual prototype of the system.
on the i860. A sample of the code from the PFISA model of the i860XP is shown in Figure 5. This code is part of the main body of the PFISA model. The multiply instruction is decoded first in the opcode case statement of the model. The appropriate variables are then set in the decode section; two flags and a counter are used to identify the mode of the operation. The counter is used to keep track of the number of cycles before the result of the multiply is written to its destination in the case of non-pipeline floating point operations. The "*" operator has been redefined in a separate package to perform the IEEE standard floating point operation with as many as 9 digits of accuracy. A record is defined having the mantissa as one field and the exponent as the other, both of integer types.

In addition to including cycle times, PFISA also includes other performance issues such as cache hit ratio, and internal vs. external memory cycles. Some of the performance issues (such as cache hits and others) can be modeled statistically, for faster simulation. Reasonably accurate statistical models can be used to approximate the cache performance. Furthermore, the variables of the statistical models can be controlled by the user, so that different performances for best, typical and worst conditions can be simulated [19]. Therefore, given a PFISA it is possible to evaluate the performance of any algorithm on the target processor. Using these methods, it is possible to describe any microcontroller on the market. Recently, an ISA model of the i860XP RISC processor was developed using these very ideas at Georgia Tech with encouraging results. Depending on the type of instructions, integer or floating point, anywhere from 2000 to 4000 i860-cycles per second are executed with the model on a Sparc10 workstation. It is also possible to construct only a Performance Instruction Set Architecture (PISA) model, where no functionality is included with the instructions. The purpose of such a model is to study performance issues alone.
The simulation speed of a PISA model is much greater than that of either the FISA or PFISA models.

FISA and PISA model all the internal activities of a processor; in the real world, however, the processor is intended to interact with many other devices. For a microprocessor model to be useful, it must be capable of modeling all of the processor's external ports. Only then is a faithful model of the processor available. The bus functional model (BFM) includes all interrupts, serial ports, address lines, data lines, timer lines, input-output ports, and any other pin that is visible at the boundaries of the processor[20]. It is important to note that bus functional models are not unique to processors. Any device with external ports requires a bus functional model. The timing information associated with the BFM is usually clock-based or propagation-delay-based. The intent of the BFM is to resolve any signal contentions or timing mismatches that might occur during normal operation in the intended environment of the device. The BFM can be used in conjunction with the PISA, PFISA and FISA models, which yields a complete model of the microprocessor, or it can be used with a much simpler model that includes only those selected instructions that support external bus activity. The former model is most useful in early phases when the designer is interested in simple interactions on the bus.

Memories are omnipresent in digital systems. Memories can easily be modeled in VHDL at a behavioral level by use of signals and variables. Signals model the address lines, data lines, and control lines, such as output enable and read/write signals, while variables in the form of one-dimensional or two-dimensional arrays hold the memory contents. Based on the values on the address lines, the correct data values are located in the array and either read or written depending on the state of the read/write signal. It is also possible to define the memory image at loading time from an external file. This feature is very important, as loading a memory of reasonable size can otherwise require many cycles. The array modeling of memory is quite effective for small sizes; for large memory systems, however, the array size gets out of hand. For large memories the VHDL access types provide a convenient way of implementing virtual memories. A virtual memory is a memory model that maintains only those pages of the memory that have been read or written. The memory process maps the addresses received as part of read or write commands to references to specific pages. If an address is received that is not in the address space of the current page, then a new page is created and added to the list, and the old page can be released if desired. A simple memory model is shown in Figure 6.

For buses, VHDL provides a special signal kind, "bus", that floats to a user-defined value when it is disconnected from all its drivers. A traditional bus is often implemented with a component called a Bus Interface Unit (BIU) or bus controller. Hardware units that need to access the bus interface with the BIU and request particular actions; the BIU is responsible for implementing the details of the bus protocols (arbitration) and for the transfer of data and other information between the various units on the bus.
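The page-based virtual memory built on access types, as described above, can be sketched as a linked list of pages that are allocated on first use. This is an illustrative sketch only: the package name sparse_mem, the page size of 256 words, and the use of integer words are assumptions made for brevity, and the syntax is VHDL-93.

```vhdl
package sparse_mem is
  constant PAGE_SIZE : positive := 256;           -- words per page (assumed)
  type page_data is array (0 to PAGE_SIZE - 1) of integer;

  type page;                                      -- incomplete type, completed below
  type page_ptr is access page;
  type page is record
    number : natural;                             -- page index = address / PAGE_SIZE
    data   : page_data;
    nxt    : page_ptr;                            -- next page in the list
  end record;

  procedure mem_write(variable head  : inout page_ptr;
                      constant addr  : in    natural;
                      constant value : in    integer);
  procedure mem_read (variable head  : inout page_ptr;
                      constant addr  : in    natural;
                      variable value : out   integer);
end package sparse_mem;

package body sparse_mem is
  -- Returns the page that holds addr; allocates a zero-filled page
  -- and links it at the head of the list if none exists yet.
  procedure find_page(variable head : inout page_ptr;
                      constant addr : in    natural;
                      variable p    : out   page_ptr) is
    variable cur : page_ptr := head;
  begin
    while cur /= null loop
      if cur.number = addr / PAGE_SIZE then
        p := cur;
        return;
      end if;
      cur := cur.nxt;
    end loop;
    cur  := new page'(number => addr / PAGE_SIZE,
                      data   => (others => 0),
                      nxt    => head);
    head := cur;
    p    := cur;
  end procedure find_page;

  procedure mem_write(variable head  : inout page_ptr;
                      constant addr  : in    natural;
                      constant value : in    integer) is
    variable p : page_ptr;
  begin
    find_page(head, addr, p);
    p.data(addr mod PAGE_SIZE) := value;
  end procedure mem_write;

  procedure mem_read (variable head  : inout page_ptr;
                      constant addr  : in    natural;
                      variable value : out   integer) is
    variable p : page_ptr;
  begin
    find_page(head, addr, p);
    value := p.data(addr mod PAGE_SIZE);
  end procedure mem_read;
end package body sparse_mem;
```

A memory process would keep a single variable of type page_ptr, initially null, and pass it to mem_read and mem_write; only pages that are actually touched consume simulator memory, regardless of the size of the address space.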
A bus controller is normally defined as a finite state machine in which each state represents an action on the bus, such as data transfer or control signaling.

B. Models for ASICs

Rapid prototyping of an ASIC implies a short turnaround time from specification to synthesis and verification of the device. An efficient way to realize this is to use a synthesis program to generate the ASIC from the algorithmic or register-transfer level. VHDL was developed as a language for modeling applications, namely digital system modeling. For the language to serve as a standard, its simulation semantics must be precisely defined, to ensure independence from implementation. As a consequence, its use for synthesis applications is not straightforward. Because VHDL contains many constructs designed for simulation purposes that are difficult to synthesize, almost all synthesis packages accept only a subset of VHDL; hence the need for two different views, one for synthesis and one for simulation. Synthesis helps to rapidly transform the design from a higher level of abstraction to a lower one through automation, while simulation views assist design by providing a "prototype" of the system early in the design cycle.

C. Models for Synthesis

Synthesis techniques have matured to a state where they can be applied successfully in the generation of VLSI circuits from a high level description. The highest level of abstraction that can be used with some degree of success in the synthesis process is the behavioral view at the RT level. The next levels of abstraction that can be synthesized successfully are the data flow view and structural view at the RT level, and at gate level. The performance of the synthesis tool improves as the level of abstraction is lowered, though the number of possible design options decreases.

At the gate level, the synthesis process is concerned with combinational logic; once the storage elements have been identified, they are mapped directly to the library components that implement them. Gate level logic is described by a set of boolean equations which are transformed to a netlist of components from a given cell library by a two-step process. In the first step, a series of logical transformations is applied to the circuit's equations in order to achieve a performance goal, such as area or speed. In the second step, elements from a cell library are selected and connected in such a way as to accomplish the required functionality.

For synthesis, the register transfer level displays characteristics of both behavior and structure. Synthesis at this level is applicable only to synchronous systems, since it is then possible to describe the design using a simple model of time based on control steps, which correspond to states or cycles of an FSM. There are four main steps in the synthesis process[14] at the RT level: compilation, transformation, scheduling, and allocation. The compilation process is almost identical to the compilation of high level programming languages.
The intent of the compilation process is to convert the given high level representation to an internal representation that is better suited to synthesis, usually a graph. Transformation is concerned with optimization of the given behavioral representation. Almost all the optimization techniques that were originally developed for compilers, such as dead code elimination, loop unrolling, and common subexpression elimination, are also applicable to synthesis. Additionally, there are optimization techniques that are specifically tuned for synthesis, such as hardware-specific local transformations, reducing the depth of the internal graph for faster implementation, and increasing parallelism through concurrent processes. After compilation and optimization, the behavior is translated to hardware through scheduling and allocation. The goal of scheduling is to assign each operation to a particular point in time, while the allocation step assigns each operation to a particular piece of hardware.

Currently available synthesis tools that use VHDL as the input language can perform synthesis at two different levels: the structural and data flow RT levels. Structural RT-level synthesis is very similar to designing at the RT level using schematic capture tools. The design is usually composed of a datapath and a controller, with the datapath described by the interconnection of a number of leaf modules, such as multipliers, ALUs, registers and latches, and the controller described as an FSM.

As many as five different styles have been proposed for implementing FSMs in VHDL[13]. The differences between them lie mainly in the number of processes used, their capacity to model combinational outputs, and the number of signals used as state registers. In the first style, the entire FSM is described in a single process that includes the clock, the state transitions, and the outputs. The second style uses two processes: one contains the clock and the states, while the other specifies the outputs. The third and fourth styles also use two processes; they differ mainly in the way the states, clock and outputs are partitioned between the two processes. Finally, the remaining style uses three separate processes to implement the FSM. The coding style can affect the size of the final synthesized circuit by as much as 20 percent. An example of the first two coding styles is presented in Figure 9.

For the datapath, the leaf modules are described behaviorally, and the synthesis tool either uses a library to find elements that correspond to the leaf modules or synthesizes them using random logic. The controller section is also synthesized using random logic. An example of an IIR filter that is described structurally at the RT level is shown in Figures 7 and 8.

The second level of synthesis is performed at the data flow RT level. Here, an example is used to illustrate some of the VHDL constructs that are useful for synthesis at this level. Two different data flow views of a 4-tap FIR filter are shown in Figures 10 and 11.
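To make the second FSM coding style discussed above concrete, the following sketch places the clock and the state transitions in one process and the outputs in another. It is an illustration written for this discussion, not a reproduction of Figure 9: the entity name fsm_sketch, its ports, and the three states are hypothetical, and the syntax is VHDL-93.

```vhdl
entity fsm_sketch is
  port (clk, reset : in  bit;
        start      : in  bit;
        busy       : out bit);
end entity fsm_sketch;

architecture two_process of fsm_sketch is
  type states is (idle, run, done);
  signal state : states := idle;
begin
  -- Process 1: clock and state transitions (asynchronous, active-low reset).
  transitions : process (reset, clk)
  begin
    if reset = '0' then
      state <= idle;
    elsif clk'event and clk = '1' then
      case state is
        when idle =>
          if start = '1' then
            state <= run;
          end if;
        when run  => state <= done;
        when done => state <= idle;
      end case;
    end if;
  end process transitions;

  -- Process 2: outputs, a purely combinational function of the state.
  outputs : process (state)
  begin
    case state is
      when run    => busy <= '1';
      when others => busy <= '0';
    end case;
  end process outputs;
end architecture two_process;
```

Merging the body of the outputs process into the clocked process would give the single-process (first) style; in that form the outputs are assigned under the clock edge and are therefore registered, which is one reason the styles can differ in synthesized size.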
In the first FIR example (Figure 10), the behavior of the filter is described through four concurrent multiply-and-add assignments. Because the assignments are concurrent, a separate multiplier and adder are allocated for each multiply and add operation. This circuit was synthesized using the Compass synthesis tools with a CMOS 0.8 micron datapath library. In this example, the concurrent signal assignment capability of VHDL was used to obtain an FIR filter that is optimized for speed. The description of Figure 11, on the other hand, is optimized for area through the use of conditional signal assignments. This circuit was also synthesized, and only one multiplier and one adder were allocated for it. The control-step scheduling was described implicitly in the model through the select signal and the use of conditional signal assignment.

D. Simulation Models

A virtual prototype can be used to defer the decision to fabricate the silicon prototype until as late as possible in the development cycle. As mentioned earlier, simulation views are intended to provide a virtual prototype of the system during the design. VHDL allows the designer of a system to have prototypes at different levels of abstraction, where very abstract models are used early in the design cycle and less abstract prototypes are used in the later stages of the design. It is also possible to mix levels of abstraction to allow for system integration, since not all of the designs under development will necessarily progress at the same rate. The use of VHDL during the early stages of the design was discussed in the algorithm and system section. Here, we discuss the simulation view that is sufficiently detailed to permit the model to be used within a larger VHDL model for test generation and fault grading of the larger system.

An ASIC modeling view that is detailed enough for test generation and fault grading of a larger system has to be at the RT level or gate level[1]. Gate level models are not particularly interesting for rapid prototyping of large systems because of their poor simulation efficiency, and at the algorithmic level the model lacks the timing and bit level accuracy that is required for test generation and fault grading of the larger system. The first choice for the prototyping of an ASIC that is to be verified in a larger system is a behavioral RT-level model. The behavioral model must fully define the functions of the device, and include detailed input-output port timings to support test generation and fault detection and isolation of the device when performing board or subsystem simulation.

VHDL synthesis models can be modified to include exact timing behavior after the synthesis is done. It is possible to include worst case, nominal case, and best case timing delays at the model interface. Providing several delay models allows the designers to evaluate the performance of the design under adverse as well as optimistic operating conditions. VHDL allows all timing parameters to be entered as generic parameters, thereby allowing the designer to explore several different timing configurations without modifying the model.
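The use of generics to carry best, nominal and worst case delays can be sketched as follows. The names timing_types, delay_case, delay_table and adder16, the integer ports, and the delay values are all hypothetical placeholders chosen for illustration; they are not taken from the models in this paper, and the syntax is VHDL-93.

```vhdl
package timing_types is
  type delay_case  is (best, nominal, worst);
  type delay_table is array (delay_case) of time;
end package timing_types;

use work.timing_types.all;

entity adder16 is
  generic (timing : delay_case  := nominal;
           -- placeholder delays; real values would be back-annotated
           tpd    : delay_table := (best    => 4 ns,
                                    nominal => 6 ns,
                                    worst   => 9 ns));
  port (x, y : in  integer range 0 to 65535;
        zout : out integer range 0 to 131070);
end entity adder16;

architecture behavior of adder16 is
begin
  -- One event per input change; the delay is chosen by the generic.
  zout <= x + y after tpd(timing);
end architecture behavior;
```

An enclosing design could then select the operating condition per instance, for example with generic map (timing => worst), or override the whole table once measured delays are available, without editing the architecture.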
As an example, consider the structural model of the IIR filter at the RT level shown in Figures 7 and 8. The code was used as input to the COMPASS synthesis program and was synthesized successfully. The new model is an extension of the original model that was used for synthesis, and includes the delays associated with the multiplier, the adder, and the latch and all their ports as generics of type time. It is now possible to back-annotate the propagation delay values obtained after synthesis for each of the elements; the resulting simulation model of the IIR filter is shown in Figure 12. The model includes generic timings for the maximum and minimum delays of the multiplier and adder, while the registers have generic timings for maximum and minimum set-up and hold times. The package iirconfig includes a function that, when passed minimum and maximum times, returns one or the other based on a constant defined in the package. Once the simulation models of all the datapath elements and the controller are obtained, it is possible to construct a model of the entire ASIC at a behavioral level.

Note that without the behavioral simulation models, the synthesized elements themselves have to be used in the simulation, which would slow the simulation considerably. The difference in simulation time can be as high as a couple of orders of magnitude. VHDL simulators are event-driven simulators, and consequently the simulation time is directly proportional to the length of the event list. The synthesized multiplier for the above example contained 450 gates; if this multiplier is used in the simulation, then for each activity on the multiplier's inputs, 450 gates need to be scheduled for simulation. With the high level simulation model, on the other hand, only one event needs to be scheduled.

As was mentioned earlier, VHDL makes it possible to have a virtual prototype of a system at multiple levels of abstraction as the design progresses. With the help of the prototypes, many system level issues can be resolved at different stages of the design, as opposed to the current practice where many system issues are discovered late in the project because early integration is not possible for lack of a realistic prototype.

V. CAE-Based Emulation
Recently, a class of methods has been emerging that combines Computer Aided Engineering (CAE) translation and synthesis software with Field Programmable Gate Array (FPGA) technology to automatically produce hardware prototypes of ASIC designs from a gate-level netlist. The CAE tools offer the capability to automatically map, partition and compile a given logic circuit onto a pre-defined programmable hardware architecture. The FPGAs provide a flexible and programmable platform. Two examples of such systems are Quickturn[23] and Paradigm RP[22]. The advantage of these methods is that once the netlist is mapped onto the FPGAs and a hardware prototype of the off-the-shelf components is ready, it is possible to run the simulation at a much faster rate than is possible using VHDL simulation models.

The major drawback of these systems is that the designers have to create two separate designs in parallel, diluting the effort spent on the main design. According to recent studies, this may only be feasible for designs under 15000 gates[21]. Furthermore, moving back and forth between a masked ASIC version of the design and an FPGA design is not trivial. Other disadvantages include the high initial cost of these tools at this time, and the fact that a prototype can be available only after the system architecture is clearly defined and a hardware prototype of the rest of the system is available. Clearly, the best result is obtained when VHDL simulations are done during the early stages of the design to clearly define the architecture, and CAE-based emulation on hardware platforms is used for the later stages of the design.

VI. Conclusions
The complexity of electronic systems has grown tremendously as a result of advances in IC technology. At the same time, required product development times and life cycles have decreased considerably. Market and competitive pressures make it more critical than ever to design the right product the first time and to do it in significantly less time. Current approaches to system design often fail because of the lack of design environments that allow for early HW/SW integration and debugging of the system, and also because packaging issues (MCM, signal integrity, thermal design, etc.) are not incorporated early in the design process. The design of complex systems demands a process that lets designers work at manageable levels of abstraction and still access detail during the entire design cycle.

A general scheme for system design using VHDL was presented that shortens the design cycle by providing a unified design environment. This scheme solves the problems of common specification and early design integration. The use of VHDL throughout the design process supports a homogeneous and consistent design description and code reusability, and provides support for hardware design down to the netlist representation. The simulation of the virtual prototype of the system at several levels of abstraction helps to identify design errors quickly and efficiently as the design progresses. There are other languages that are capable of providing a unified design environment; however, none has the support of VHDL because of its worldwide acceptance as an IEEE standard.
VII. Acknowledgement

The authors are grateful to the RASSP team from the Advanced Technology Division of Lockheed Sanders Inc., Nashua, NH, for valuable discussions and feedback during the work that resulted in this paper. Feedback from Dr. Mark Richards of ARPA/ESTO and Dr. Gary Shaw of MIT Lincoln Labs is also acknowledged with gratitude.

References

[1] USA, "Army Handbook, The Documentation of Digital Systems with VHDL", Preliminary Final Draft Manuscript, February 9, 1994.
[2] Egolf T., Famorzadeh S., Madisetti V., "Fixed-Point Co-Design in DSP", VLSI Signal Processing-VII, IEEE Press, Oct. 1994.
[3] McHenry J., Midkiff S., "VHDL Modeling for the Performance Evaluation of Multicomputer Networks", IEEE MASCOTS, 1994, pp. 174-178.
[4] Aylor, et al., "A VHDL Based Environment for System Design and Analysis", IEEE VHDL Users Forum, 1994, pp. 110-116.
[5] Pullkkinen O., Kronlof K., "Integration of SDL and VHDL for High-Level Digital Design", European Design Automation Conference, 1992, pp. 624-629.
[6] Drusinsky D., Harel D., "Using Statecharts for Hardware Description and Synthesis", IEEE Transactions on Computer-Aided Design, 1989, pp. 798-806.
[7] Harel D., et al., "STATEMATE: A Working Environment for the Development of Complex Reactive Systems", IEEE Transactions on Software Engineering, 1990, pp. 403-413.
[8] Becker D., et al., "An Engineering Environment for Hardware-Software Co-Simulation", in Proceedings of the 29th ACM/IEEE Design Automation Conference (DAC), 1992, pp. 129-134.
[9] Altma M., et al., "Verification of Systems containing Hardware and Software", in Proceedings of the EURO-VHDL, 1991, pp. 149-156.
[10] Gajski D., Kuhn R., "New VLSI Tools", IEEE Computer, December 1983, pp. 11-14.
[11] Juan H., Holmes N., et al., "Top-Down Modeling of RISC Processors in VHDL", European Design Automation Conference, 1993, pp. 454-459.
[12] Ecker W., "Using VHDL for HW/SW Co-Specification", European Design Automation Conference, 1993, pp. 500-505.
[13] Villar E., Sanchez P., Fundamentals and Standards in Hardware Description Languages, Kluwer Academic Publishers, Netherlands, 1993, pp. 231-262.
[14] Camposano R., "From Behavior to Structure: High-Level Synthesis", IEEE Design and Test of Computers, October 1990, pp. 8-19.
[15] Ecker W., Hofmeister M., "The Design Cube - A Model for VHDL Designflow Representation", Proceedings of the EURO-VHDL, 1992, pp. 752-757.
[16] Petri C., Communication With Automata, Ph.D. thesis, Univ. of Bonn, Germany, 1962.
[17] Aylor, et al., Performance and Fault Modeling With VHDL, Prentice-Hall, Englewood Cliffs, N.J., 1992, pp. 22-145.
[18] Rao R., A Building Block Approach to Uninterpreted Modeling Using VHDL, Ph.D. thesis, Dept. of Electrical Engineering, Univ. of Virginia, Charlottesville, 1990.
[19] Muller J., Kramer H., "Analysis of Multi-Process VHDL Specifications with a Petri Net Model", European Design Automation Conference, 1993, pp. 474-479.
[20] Coelho D., The VHDL Handbook, Kluwer Academic Publishers, Netherlands, 1989.
[21] Zafar N., "Using Emulation to Cut ASIC and Verification Time", Computer's Design ASIC Design, April 1994, pp. A25-A29.
[22] Note S., et al., "Paradigm RP, A System for Rapid Prototyping of Real-Time DSP Applications", DSP Applications, January 1994, pp. 17-23.
[23] Walters S., "Computer-Aided Prototyping for ASIC-Based Systems", IEEE Design and Test of Computers, June 1991, pp. 4-10.
Fig 1. The Ecker Cube (1993)

Fig 2. The Y Chart (1983). Axes: structural representation, functional representation, geometrical representation. Recoverable labels: Processor, Memory, Switches; Systems; Algorithmic, Register Transfer, Boolean Expression, Circuit; Mask Geometries, Cells, Layout Planning.
package statistics is
  -- Type for holding the statistics data as
  -- it is collected during execution
  type stat_info is record
    max_time:       time;
    min_time:       time;
    average_time:   real;
    times_executed: natural;
    total_time:     time;
    utilization:    real;
  end record;
  constant STAT_INIT: stat_info := (0 ns, time'high, 0.0, 0, 0 ns, 0.0);
  -- Various string constants used for printing statistics data.
  constant CT: string := " Current Time = ";
  constant DE: string := " Delay = ";
  constant TE: string := " Times executed = ";
  constant NT: string := " Min Time = ";
  constant XT: string := " Max Time = ";
  constant TT: string := " Total Time = ";
  constant AT: string := " Average Time = ";
  constant UT: string := " Utilization = ";
  constant SP: string := " ";
  -- The procedure for computing and printing statistics data
  procedure compute_stats( stat_data:     inout stat_info;
                           process_delay: time;
                           process_id:    string);
end statistics;

use std.textio.all;
package body statistics is
  procedure compute_stats( stat_data:     inout stat_info;
                           process_delay: time;
                           process_id:    string)
  is
    variable l: line;
  begin
    if process_delay < stat_data.min_time then
      stat_data.min_time := process_delay;
    end if;
    if process_delay > stat_data.max_time then
      stat_data.max_time := process_delay;
    end if;
    stat_data.times_executed := stat_data.times_executed + 1;
    stat_data.total_time := stat_data.total_time + process_delay;
    stat_data.average_time := real(time'pos(stat_data.total_time))
                              /real(stat_data.times_executed);
    stat_data.utilization := real(time'pos(stat_data.total_time))
                             /real(time'pos(now));
    write( l, process_id);
    write( l, CT);
    write( l, now);
    write( l, DE);
    write( l, process_delay);
    write( l, TE);
    write( l, stat_data.times_executed);
    write( l, NT);
    write( l, stat_data.min_time);
    writeline(output, l);
    write( l, XT);
    write( l, stat_data.max_time);
    write( l, TT);
    write( l, stat_data.total_time);
    write( l, AT);
    write( l, stat_data.average_time);
    write( l, UT);
    write( l, stat_data.utilization);
    writeline(output, l);
  end compute_stats;
end statistics;

Fig 3. A VHDL Package for Statistical Information
[Block diagram. Recoverable labels: External Memory; DATA, ADDR, Control; Processor Memory (Prog. and Data); Registers; Host Interface; Host Interface Controller; Serial Port Interface; Transmit Process, Transmit Buffer; Receive Process, Receive Buffer; Control, Hardware, Logic; Instruction Clocking Process; Busy, Reset, Interrupt, Data. The code fragments embedded in the diagram follow.]

Use Instr_Fetch.ALL; Use Instr_Processing.ALL;

CPU: Process(clk, reset)
begin
  -- advance the pipeline if it is applicable
  if( clk'event and clk = '1' ) then
    -- Fetch Instruction
    READ_INSTR(-----);
    pc := pc + 1;
    -- Decode Instruction
    case opcode is
      when multiply =>
        operand1 := instruction(25 downto 21);
        operand2 := instruction(20 downto 15);
        -- Execute Instruction
        MULTIPLY(------) ;
    end case;
  end if;
end process;

Clk : Process
  -- Generate the clock pulse for an X ns period
end process;

Fig 4. An Algorithmic Structural Model.
-- Active while a multiply is pending or the pipeline is still draining.
if((MY_FLAG = true) or (PFMY_COUNTER < 3) or (PFMY_FLAG = true)) then
  -- Pipelined multiply: write back the result leaving the pipeline.
  if PFMY_FLAG = true then
    if(P = '0') then
      FL_REG(FDEST) := PFMY_RESULT;
    end if;
  end if;
  -- Non-pipelined multiply: remember the destination on the first cycle.
  if MY_FLAG = true and PFMY_COUNTER = 0 then
    MY_FDEST := FDEST;
  end if;
  PFMY_COUNTER := PFMY_COUNTER + 1;
  -- Pipelined multiply: advance the stages and start a new product.
  if PFMY_FLAG = true then
    PFMY_RESULT := PFMY_STAGE2_RES;
    PFMY_STAGE2_RES := PFMY_STAGE1_RES;
    PFMY_REAL_SINGLE_OP1 := INT_TO_REAL_SINGLE(PFMY_OP1);
    PFMY_REAL_SINGLE_OP2 := INT_TO_REAL_SINGLE(PFMY_OP2);
    PFMY_REAL_SINGLE_RES := PFMY_REAL_SINGLE_OP1 * PFMY_REAL_SINGLE_OP2;
    PFMY_STAGE1_RES := REAL_SINGLE_TO_INT(PFMY_REAL_SINGLE_RES);
  end if;
  -- Non-pipelined multiply: the result is written when the counter reaches 3.
  if PFMY_COUNTER = 3 and MY_FLAG = true then
    PFMY_REAL_SINGLE_OP1 := INT_TO_REAL_SINGLE(PFMY_OP1);
    PFMY_REAL_SINGLE_OP2 := INT_TO_REAL_SINGLE(PFMY_OP2);
    PFMY_REAL_SINGLE_RES := PFMY_REAL_SINGLE_OP1 * PFMY_REAL_SINGLE_OP2;
    PFMY_RESULT := REAL_SINGLE_TO_INT(PFMY_REAL_SINGLE_RES);
    FL_REG(MY_FDEST) := PFMY_RESULT;
    MY_FLAG := false;
  end if;
  PFMY_FLAG := false;
end if;

Fig 5. A Performance Model of the Floating Point Unit of i860XP.
-- Memory entity has a generic size which defaults to 256 bits.
-- It also has address, data, chip select and read/write control
-- signals.
use work.std_logic_1164.all;
entity memory is
  generic( memsize: integer := 256);
  port( data:    inout std_logic;
        address: natural range 0 to memsize -1;
        CS:      std_logic;
        R_W:     std_logic);
end;

use work.std_logic_1164.all;
architecture behavior of memory is
begin
  memory : process ( CS, R_W, address, data )
    variable memarray: std_logic_vector(0 to memsize -1);
  begin
    if CS = '1' then
      if R_W = '1' then
        data ...

... S );

Library compass_lib, ieee;
use IEEE.STD_LOGIC_1164.ALL;
use COMPASS_LIB.COMPASS.ALL;
use COMPASS_LIB.STDCOMP.ALL;
use COMPASS_LIB.COMPASS_ARITH.ALL;

ENTITY iir IS
  port (XIN  : in BIT_VECTOR(15 downto 0);
        CLK  : in BIT;
        YOUT : inout BIT_VECTOR(15 downto 0));
END iir;

architecture STRUCTURE of iir is
  constant A : BIT_VECTOR(15 downto 0) := "0000000000001111";
  constant B : BIT_VECTOR(15 downto 0) := "0000000000001111";
  constant C : BIT_VECTOR(15 downto 0) := "0000000000001111";
  signal T1, T2, T3, T4, T5, T6, T7 : BIT_VECTOR(15 downto 0);
  signal S, S1, S2, S3, S4, S5, S6 : BIT_VECTOR(15 downto 0);
  signal AA, BB, CC : BIT_VECTOR(15 downto 0);
  COMPONENT reg
    port( X   : in BIT_VECTOR(15 downto 0);
          CLK : in BIT;
          Y   : out BIT_VECTOR(15 downto 0));
  END COMPONENT;
  COMPONENT add
    port( X    : in BIT_VECTOR(15 downto 0);
          Y    : in BIT_VECTOR(15 downto 0);
          CLK  : in BIT;
          ZOUT : out BIT_VECTOR(15 downto 0));
  END COMPONENT;
  COMPONENT mult
    port( X    : in BIT_VECTOR(15 downto 0);
          Y    : in BIT_VECTOR(15 downto 0);
          CLK  : in BIT;
          ZOUT : out BIT_VECTOR(15 downto 0));
  END COMPONENT;
begin
  AA ...
  ... CLK, ZOUT => YOUT );
  U3  : add  port map ( X => S3, Y => S4, CLK => CLK, ZOUT => S1 );
  U4  : mult port map ( X => AA, Y => YOUT, CLK => CLK, ZOUT => S3 );
  U5  : mult port map ( X => BB, Y => T1, CLK => CLK, ZOUT => S4 );
  U6  : add  port map ( X => T2, Y => T6, CLK => CLK, ZOUT => S5 );
  U7  : mult port map ( X => CC, Y => T7, CLK => CLK, ZOUT => S6 );
  U8  : reg  port map ( X => YOUT, CLK => CLK, Y => T1 );
  U9  : reg  port map ( X => T1, CLK => CLK, Y => T2 );
  U10 : reg  port map ( X => T2, CLK => CLK, Y => T3 );
  U11 : reg  port map ( X => T3, CLK => CLK, Y => T4 );
  U12 : reg  port map ( X => T4, CLK => CLK, Y => T5 );
  U13 : reg  port map ( X => T5, CLK => CLK, Y => T6 );
  U14 : reg  port map ( X => T6, CLK => CLK, Y => T7 );
  U15 : add  port map ( X => S5, Y => S6, CLK => CLK, ZOUT => S2 );
end STRUCTURE;

Fig 8. The VHDL Structural Description of IIR Filter at RT level.
architecture STYLE1 of FSM is
  type STATES is (S0,S1,S2,S3);
  signal STATE : STATES := S0;
begin
  process(RESET,CLK)
  begin
    if (RESET = '0') then
      STATE ...