Using UML Activities for System-on-Chip Design and Synthesis
Tim Schattkowsky1, Jan Hendrik Hausmann2, and Gregor Engels2
1 C-Lab, Paderborn, Germany
[email protected]
2 University of Paderborn, Paderborn, Germany
{hausmann, engels}@upb.de
Abstract. The continuous advances in manufacturing Integrated Circuits (ICs) enable complete systems on a single chip. However, the design effort for such System-on-Chip (SoC) solutions is significant. The productivity of the design teams currently lags behind the advances in manufacturing, and this design productivity gap is still widening. One important reason is the lack of abstraction in traditional Hardware Description Languages (HDLs) like VHDL. The UML provides more abstract concepts for modeling behavior that can also be employed for hardware design. In particular, the new UML Activity semantics fit nicely with the inherent data flow in hardware systems. Therefore, we introduce a UML-based design approach for complete SoC specification. Our approach enables generation of complete synthesizable HDL code. The equivalent hardware can be automatically generated using the existing tool chains. As an example, we outline Handel-C code generation for an MP3 decoder design.
1 Introduction
For decades, the design of Integrated Circuits (ICs) has been driven by what has been called Moore’s law, a self-fulfilling prophecy that the complexity of ICs doubles every 18 months. Although physical effects recently broke the correlation between this increase in IC complexity and a similar direct increase in performance, the law still holds for complexity and will continue to do so for at least another decade. The design methods for ICs failed to catch up with this exponential growth in complexity. This design productivity gap has widened over the years and has become one of the most critical issues in hardware design. At the same time, shortened product cycles further increase the pressure for more productivity. To cope with the increasing complexity, Hardware Description Languages (HDLs) are currently moving from Register Transfer Level (RTL) hardware description towards more abstraction by introducing C-based languages. To some extent, this seems to be similar to the move from assembly language to higher level languages like C in software engineering. It appears that IC design now essentially faces the same complexity challenge that finally led to the move towards model-driven methods for software systems. Thus, the investigation of such methods for hardware design seems to be the logical next step.
O. Nierstrasz et al. (Eds.): MoDELS 2006, LNCS 4199, pp. 737–752, 2006. © Springer-Verlag Berlin Heidelberg 2006
Nowadays, model-driven software development is mostly based on the Unified Modeling Language (UML). Its upgrade to version 2.0 [11] has significantly extended the expressiveness of some of its core notations, thereby opening up new application areas. The new token-based Activity semantics fit nicely with the data flow dominated behavior of hardware systems and can be employed to describe such behavior at an increased level of abstraction while providing improved readability compared to traditional textual HDLs. In this paper we will make the case that UML Activities are well suited for modeling the data and control in hardware designs and can serve as the basis for a complete hardware design approach. The next section will discuss related work before section 3 introduces our design approach for hardware systems which is based on Activity Diagrams for behavioral specification and Class, Composite Structure and Deployment Diagrams for providing types, composition and deployment information. Our approach enables complete code generation of synthesizable HDL, which is equivalent to the actual IC. In section 4, we demonstrate such code generation for the Handel-C HDL before section 5 closes with a conclusion and future work.
2 Related Work
There already exist approaches employing more abstract diagrammatic notations for the specification of hardware designs. Various forms of Block Diagrams and Flow Charts have been used in the industry for a long time, resulting in the IEC standard notations Sequential Function Charts (SFC) and Function Block Diagram (FBD) [8]. However, UML 2.0 Activity Diagrams can be considered as a significant superset of SFCs [15] and provide a higher level of abstraction. Furthermore, Block Diagrams have been studied in [7] with the result that Class Diagrams can express all features of Block Diagrams without loss of expressiveness. Petri Nets are another behavioral modeling notation used for hardware specification. An overview of different approaches can be found in [17]. Although Activity Diagrams are based on Petri Net ideas, they seem to be more expressive and have a broad background in the UML. UML is the de facto standard in the Software Engineering world. Having a common notation is beneficial for combined hardware/software development projects. Some compelling examples for these benefits are presented in [3]. Applying standard UML notations to hardware design has been approached in a number of ways. Hallal et al. [7] evaluate various UML diagrams with respect to their applicability to hardware design. McUmber and Cheng [9] provide a metamodel mapping between UML Class Diagrams and state machines on the one hand and VHDL constructs on the other hand. The intention of this work is not only to serve as a basis for VHDL code generation but also to provide a precise semantics for Statecharts. A mapping from Class Diagrams to VHDL code is also proposed by Damasevicius and Stuikys. In [6] they complement this static mapping with metaprogramming techniques to obtain domain-specific code generators. They also focus on the process aspect of hardware development. This process aspect is also targeted by Bahill and Daniels in [2].
YAML [14] is a tool based mainly on UML class and object diagrams which is able to generate SystemC code. Interesting here is the use of Object Diagrams for the
detailed specification of a chip’s design. We also model the instance level, but use Deployment Diagrams to do this. The approach of Björklund and Lilius [3] is based on UML state machines only and produces VHDL code. Recently, Model Driven Architecture (MDA) has elicited a number of approaches which generate code from UML models [4]. Possible targets for these generators include system-level hardware languages. Concrete works include XTUML, which targets various C dialects of different microcontroller architectures, MOCCA, which targets synthesizable VHDL, and works by Thiagarajan et al., which translate Rose RT models to SystemC code. An overview of these approaches can be found in [10]. All these approaches are based on UML 1.x. They employ state machines to model the behavior of systems and cannot exploit the fundamentally different semantics for Activities in UML 2.0. The approach in [3] takes Activity Diagrams into account, albeit only as a representation of the transition system specified by a state machine. SysML [16] is a language for system modeling derived from the UML. Although it is syntactically a strict UML profile, it alters and adds various concepts, especially for modeling continuous systems. These modifications seem to result in semantics that are not consistent with the original UML. Still, the block-oriented structure modeling as well as the emphasis on Activities seem to fit with hardware modeling, but are not directly applicable. Furthermore, these concepts are already contained in the UML. Finally, there is ongoing work to provide specific profiles for SoC design. Fujitsu has already presented preliminary results [12]. The approach focuses on structure modeling for system composition rather than enabling the engineering of a complete system including complete behavior models. The structure modeling employs similar concepts to our approach, but lacks some important elements like support for clock domains.
The OMG MARTE RFP [1] is still in an initial phase. In the context of these ongoing efforts, we have already proposed the application of UML Activities as the core behavior notation for UML-based hardware description [13]. However, there we have only sketched our initial ideas, but did not provide a corresponding design approach. In this paper, we present a new approach for complete SoC design which leads to complete synthesizable system specifications.
3 Hardware Design Based on UML Activities
A model-based design approach for SoC has to capture the system behavior as well as its composition from functional blocks and certain non-functional aspects, like clock domains or the allocation of physical resources. For this, we have identified a UML 2.0 subset for complete SoC specification. This subset is presented as a UML profile. Such a profile is a syntactic subset of the UML with extended domain-specific semantics. The core of this subset is formed by elements for behavior modeling through Activity Diagrams. These elements are complemented by specialized model elements for Class, Composite Structure, and Deployment Diagrams for modeling the structure and physical aspects of the system. Together, the resulting diagrams form a complete system model. From this system model, full code generation for automatic hardware synthesis can be performed. The following subsections describe our modeling approach and the underlying profile. For this, we will use an MP3 decoder chip design as the illustrating example for the remainder of the paper.
3.1 Structure Modeling
In our approach, modeling the internal structure of a System-on-Chip is based on Class Diagrams for the type definition of its building blocks and Composite Structure for defining the assembly of the complete SoC from such blocks. The mapping to physical resources like clock domains is achieved through the application of Deployment Diagrams.
[Figure: profile diagram. The stereotypes Channel, Block, Interconnect, Register, and SizedInteger (attribute Bits: int) extend UML metaclasses, among them Reception, AssociationClass, Class, Property, and Integer.]
Fig. 1. Extensions for modeling hardware blocks
An SoC in our approach is composed from blocks of synchronous logic. Within our profile, types of blocks are defined through specialized active Classes (see Figure 1). Such an active class is called Block in our approach. The behavior of such a Block is defined through a private Activity. This Activity may call sub-Activities on the same instance as well as on instances owned through composition. The Activities belonging to a Class can be considered as the actual methods for Operations where the parameters are mapped to ActivityParameterNodes. However, due to the synchronous nature of Operations, this only fits for non-stream parameters. Thus, a Block may additionally have specialized Reception Features to receive signals that are fed as tokens into an Activity. Such a Reception Feature is called a Channel in our approach and is used to enable interaction between different executing Blocks. The attributes of a Block are Registers. Their type is limited to integer numbers and arrays and nested records of those.
[Figure: profile diagram. The stereotypes Interconnect, Output, TriState, and Input extend the UML metaclasses Interface and Property.]
Fig. 2. Extensions for modeling hardware block interfaces
[Figure: class diagram. Interconnects: I2C («TriState» SCL: int(1), «TriState» SDA: int(1)), SerialMP3 («Input» DATA: int(1), «Input» DCLK: int(1)), SerialPCM («Output» SCLK: int(1), «Output» SDA: int(1), «Output» SLRCLK: int(1)). Blocks: I2CSlave, MP3Decoder, StereoPCMSerializer. Channel: 16BitPCM (PCMData: int(16)).]
Fig. 3. MP3 decoder Class Diagram
However, unlike in software systems, the bit
size of integers for hardware systems needs to be fixed as they determine the size of the resulting circuit. Thus, in our approach the bit size of an integer is explicitly specified through the application of a SizedInteger. For convenience, the SizedInteger is presented as an int with the size given in parentheses, e.g., int(16).
Different Blocks can be connected through their electrical interfaces. This is represented by the Interconnect stereotype. However, the implementation of such an Interconnect is just a set of lines, and it is pointless to model these lines individually as separate elements. Thus, we decided to treat an Interconnect as both an Interface and a Class implementing it. For Class Diagrams, the Interface notation is employed to emphasize the interface semantics. At the instance level, a class instance is assumed to enable symmetric links between participating instances. The lines of the Interconnect have to be classified as Input, Output or TriState (see Figure 2). TriState lines may be used bidirectionally and are essential for the construction of busses, be it on-chip or external.
As an example, we will consider the design of an MP3 player as shown in Figure 3. The design consists of a core MP3Decoder class implementing the MP3 decode algorithm to decode an incoming serial bit stream of MP3 data on its SerialMP3 Interconnect to stereo 16-bit Pulse Code Modulated (PCM) audio. The PCM audio is sent through two 16-bit Channels to the StereoPCMSerializer class, which is responsible for converting the PCM data into serial stereo I2S data to directly interface with common Digital-to-Analog Converter (DAC) circuits. However, no DAC is included in this model. Instead, a SerialPCM Interconnect is present for interfacing with an external DAC. Finally, the MP3Decoder contains an I2CSlave Block implementing the Philips I2C wire interface to enable external control of the decoder. The composition of a Block or a complete SoC from other Blocks has to be determined at design time.
Dynamic instantiation is not possible in hardware as each instance of a block has to be implemented separately in silicon. In our approach, this assembly is specified through the newly introduced Composite Structure Diagram (CSD). We employ CSDs to describe the composition of a Block from other Blocks as well as the composition of the final system. All associations need to be resolved to actual elements. The “lollipop” notation is used to indicate Interconnections to external hardware.
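The structural rules described above can be illustrated with a small sketch. The following Python fragment is not part of the profile or of any tool: the class names `SizedInt`, `Line`, `Interconnect`, and `Block` are hypothetical stand-ins for the stereotypes, the I2C lines are taken from the Class Diagram in Figure 3, and the unsigned wrap-around on overflow is an assumption made only for illustration. It shows how a fixed bit width bounds the values a Register can hold, and how the number of Block instances follows from a composition that is fixed at design time.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    INPUT = "Input"
    OUTPUT = "Output"
    TRISTATE = "TriState"


@dataclass(frozen=True)
class SizedInt:
    """Hypothetical stand-in for the SizedInteger stereotype, e.g. int(16)."""
    bits: int

    def wrap(self, value: int) -> int:
        # Assumption: values are unsigned and wrap around on overflow.
        return value & ((1 << self.bits) - 1)


@dataclass(frozen=True)
class Line:
    name: str
    direction: Direction
    type: SizedInt


@dataclass(frozen=True)
class Interconnect:
    name: str
    lines: tuple


@dataclass(frozen=True)
class Block:
    """A Block type whose parts are fixed at design time."""
    name: str
    parts: tuple = ()

    def instance_count(self) -> int:
        # Every part is a separate piece of hardware, so instances add up.
        return 1 + sum(p.instance_count() for p in self.parts)


BIT = SizedInt(1)
I2C = Interconnect("I2C", (Line("SCL", Direction.TRISTATE, BIT),
                           Line("SDA", Direction.TRISTATE, BIT)))
i2c_slave = Block("I2CSlave")
mp3_decoder = Block("MP3Decoder", parts=(i2c_slave,))

print(SizedInt(16).wrap(70000))      # 70000 mod 65536 = 4464
print(mp3_decoder.instance_count())  # 2: the decoder and its I2CSlave
```

Because the dataclasses are frozen and the parts are tuples, a composition cannot be changed after construction, which loosely mirrors the absence of dynamic instantiation.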
Figure 4 shows the CSD describing the complete SoC for our MP3 decoder example based on the Class Diagram in Figure 3. We notice the MP3Decoder, its I2CSlave, and the two Channels for feeding the decoded PCM data into the StereoPCMSerializer, which outputs serial audio data to the PCMPort. The input for the decoder is provided by the MP3DataPort.
[Figure: composite structure diagram of the «SystemOnChip» MP3DecoderCore. Parts: «Block» Decoder : MP3Decoder, «Block» Serializer : StereoPCMSerializer, «Block» Slave : I2CSlave. «Channel» connectors: LeftChannel : 16BitPCM, RightChannel : 16BitPCM. External ports: MP3DataPort (SerialMP3), PCMPort (SerialPCM), I2CPort (I2C).]
Fig. 4. MP3 decoder Composite Structure Diagram
For synthesis, certain additional platform specific physical parameters must be determined. Some physical parameters, like the physical layout in the chip, are computed by the synthesis tools and require no explicit specification. Other properties like physical pin assignment and the definition of clock domains are the result of explicit design decisions. Such properties must be represented in the design model. For this, we employ a Deployment Diagram variant (see Figure 5). We use the Nodes to represent clock domains, which are an important feature in chip design. Deployed in these clock domains are the same Block instances as in the CSD. All Blocks in the SystemOnChip must be explicitly deployed on such a ClockDomain. The resulting model must conform to the respective CSD. Technically, a ClockDomain is a synchronous block of logic on the chip. Logically, in our approach it may be composed from several Block instances running synchronously at the same clock. The whole SystemOnChip is also a ClockDomain which reflects that the chip is externally clocked at a certain rate. The clock itself may be either an internal clock or supplied externally in the case of an ExternalClockDomain. Internal clocks can be derived from an existing clock through a simple divider or, in the case of DerivedClockDomain, a complex expression. Furthermore, AbsoluteClockDomains enable the specification of absolute clock frequencies, which will be implemented based on available system clocks. However, the combination of the chosen target platform (i.e., FPGA or ASIC type) and the logic depth of the real circuit are the limiting factors for the clock rate of the final circuit. Thus, an actual design may fail to meet an AbsoluteClock specification. This can be detected during simulation. It is important to note that the same Block Instance may actually be part of multiple ClockDomains in different contexts. 
Thus, the respective Deployment Diagrams have to clarify this context by including all relevant Associations. If a Block is placed only within a single ClockDomain, this is not necessary.
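The deployment rule stated above (every Block in the SystemOnChip must be deployed in a ClockDomain) and the simple clock divider lend themselves to a short check. The following Python sketch is an illustration under assumptions, not the authors' tooling: `ClockDomain`, `undeployed_blocks`, and `divided_frequency` are hypothetical names, the deployment is deliberately left incomplete to show the check firing, and the divider value 2 is invented. Only the instance names and the frequency value 14318180 are taken from the figures.

```python
from dataclasses import dataclass


@dataclass
class ClockDomain:
    name: str
    blocks: tuple = ()  # names of the Block instances deployed in this domain


def undeployed_blocks(csd_blocks, domains):
    """Return the Block instances of the CSD that no ClockDomain contains."""
    deployed = {b for d in domains for b in d.blocks}
    return sorted(set(csd_blocks) - deployed)


def divided_frequency(parent_hz: int, divider: int = 1) -> float:
    """Frequency of a domain derived from a parent clock by a simple divider."""
    if divider < 1:
        raise ValueError("divider must be a positive integer")
    return parent_hz / divider


# Deliberately incomplete deployment: the Slave instance is left out.
domains = [
    ClockDomain("SystemClock", ("Decoder",)),
    ClockDomain("AudioClock", ("Serializer",)),
]
print(undeployed_blocks(["Decoder", "Serializer", "Slave"], domains))  # ['Slave']
print(divided_frequency(14318180, 2))  # 7159090.0
```

A check of this kind only covers the structural rule. Whether a synthesized circuit can actually run at a requested absolute frequency depends on the target platform and logic depth, as noted above, and is outside the scope of the sketch.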
[Figure: profile diagram. Metaclasses shown: Class, Device, Node. Stereotypes: SystemOnChip, ClockDomain (Divider: int = 1), DerivedClockDomain (Expression: String), ExternalClockDomain, AbsoluteClockDomain (Frequency: int).]
Fig. 5. Extensions for modeling Clock Domains through Deployments
[Figure: deployment diagram of the «SystemOnChip» MP3DecoderCore with the clock domains «ExternalClockDomain» SystemClock and «AbsoluteClockDomain» AudioClock (tag Frequency = 14318180). Deployed instances: «Block» Decoder : MP3Decoder, «Block» Slave : I2CSlave, «Block» Serializer : StereoPCMSerializer, connected via I2C and the «Channel» links LeftChannel and RightChannel.]
Fig. 6. MP3 decoder Clock Domain Diagram
Figure 6 shows the Deployment Diagram for our MP3 decoder example, in which the blocks from the CSD are placed in ClockDomains. The Associations are included only for the orientation of the reader and could be omitted. In this example, the Blocks are placed in two different ClockDomains. While the MP3Decoder runs in the ExternalClockDomain controlled by the external chip clock, PCM-related Blocks are placed in a separate AbsoluteClockDomain using a fixed clock frequency. The purpose here is to enable real-time playback by using a clock that can be used to directly derive the respective sample rate for feeding data into the external DACs that are to be connected to the StereoPCMSerializer.
3.2 Activity Diagrams
While the structural models provide information about the outside connections of a single block instance, the behavior specification details its inner workings. Behavior specification in our approach is solely based on activities represented by Activity Diagrams. These Activities represent the concurrent data flow and processing in a
Block instance by means of the common model elements for activities. These elements include actions interconnected by object and control flows. Decision, merge, fork, and join nodes are used to control such flows in the Activity. The activities in our approach may contain four types of actions. SendSignalAction and AcceptEventAction are employed to transmit and receive data using a Channel. The CallBehaviorAction invokes sub-Activities and OpaqueActions are employed to embed C-style statements into the Activities (e.g., for assignments). C-style syntax is also used for expressions (e.g., in guards). The semantics of forks and joins for object flows must be defined. In our approach, we essentially consider object flows as direct connections between the logic for actions. Thus, there is no buffering of tokens in our approach. If such buffering is desired, an explicit implementation (e.g., through a FIFO queue) has to be provided. In this context, a fork on an object flow is considered as sending the same data input into multiple target nodes (e.g., actions). This aligns quite well with the proposed token copy semantics as defined by UML 2.0. Joining object flows is only allowed for tokens representing the same object. Joining different data flows should be done by specifying an action which combines these inputs. Finally, the concept of token competition is not supported in our approach. Thus, there can only ever be one outgoing edge from an object node (e.g., a pin).
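The restriction that an object node may have only one outgoing edge can be checked mechanically on a model. The Python sketch below is a minimal illustration, assuming an Activity is given as a plain list of (source, target) edges; the function name `nodes_with_token_competition` and the node names `Log` and `fork1` are hypothetical, while `PCM`, `Reordered`, `Antialias`, and `WritePCM` merely reuse labels from the example design.

```python
from collections import Counter


def nodes_with_token_competition(object_nodes, edges):
    """Return the object nodes (e.g. pins) with more than one outgoing edge."""
    out_degree = Counter(src for src, _ in edges if src in object_nodes)
    return sorted(n for n, count in out_degree.items() if count > 1)


object_nodes = {"Reordered", "PCM"}

# Two edges leave the PCM pin directly, so two targets would compete for a token.
competing = [("Reordered", "Antialias"), ("PCM", "WritePCM"), ("PCM", "Log")]
print(nodes_with_token_competition(object_nodes, competing))  # ['PCM']

# Routing the flow through an explicit fork sends the same data to both targets.
forked = [("Reordered", "Antialias"), ("PCM", "fork1"),
          ("fork1", "WritePCM"), ("fork1", "Log")]
print(nodes_with_token_competition(object_nodes, forked))  # []
```

The second variant corresponds to the fork semantics described above: the fork is a control node, so it may have several outgoing edges, and each target receives the same data input.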
Fig. 7. Pattern for spawning an unlimited number of control flows
Hardware is inherently non-reentrant. This also applies to the actions and activities in our approach since they map directly to a part of a hardware circuit. As a consequence, activities in our approach cannot issue recursive calls. Furthermore, a special pattern in an activity has to be avoided. The fragment in Figure 7 demonstrates the core problem: along the right-hand side loop, any number of tokens can be spawned at the fork, leading to multiple concurrent executions of action A. For hardware synthesis, such a situation is very undesirable. We thus impose the general well-formedness condition that for each fork which is part of a cyclic flow structure (i.e., one flow outgoing from the fork is (transitively) also an incoming flow of the fork), a join must exist in the path which joins all outgoing flows from the fork node. This ensures that each action in the model may be activated by at most one logical thread. This rule of course covers implicit joins and forks on actions. Furthermore, the condition also holds for activities with multiple initial nodes as these can be considered to be forked from a single initial node. Thus, the corresponding fork cannot have a cycle and does not break the rule. For our example, the specification of the behavior of the main MP3Decoder class is shown in Figure 8. This activity controls the actual MP3 decode process which has several stages represented by CallBehaviorActions to nested sub-activities. Many of these stages can be performed concurrently for the two different channels of stereo
[Figure: activity diagram of the MP3Decoder main Activity. Actions shown: HandleI2C(), ReadFrame(), DecodeScaleFactors(), HuffmanDecode(), Dequantize(), DecodeStereo(), and, once per stereo channel, Reorder(), Antialias(), HybridSynthesis(), FrequencyInversion(), PolyphaseSynthesis(), and WritePCM(). Object flows are labeled ScaleFactors, Reordered, Antialiased, HybridIn, HybridOut, Hybrid, Data, and PCM; the results are written to LeftChannel and RightChannel.]
Fig. 8. MP3Decoder Class - Main Activity
FrequencyInv ersion(int(32)[32][18] *Hybrid)
int(5) ss=0
[else] [ss