APGES 2007
Automatic Program Generation for Embedded Systems
Workshop Proceedings
October 4th 2007, Salzburg, Austria
Co-located with GPCE and ESWEEK

Editors: Paul H J Kelly (Imperial College London) and Kevin Hammond (University of St Andrews)
Copyright on the papers in this proceedings remains with the respective authors.

APGES Workshop organisation
Kevin Hammond (University of St Andrews, General Chair)
Paul Kelly (Imperial College, London, Program Chair)

Invited speakers
Paul Caspi (Verimag-CNRS): Some issues in automatic code generation for embedded control systems
Walid Taha (Rice University): Resource Aware Programming

Program committee
Danilo Beuche (pure-systems GmbH, Germany)
Paul Caspi (Verimag-CNRS, Grenoble, France)
Zbigniew Chamski (NXP, the Netherlands)
Mark Dalgarno (Software Acumen Ltd, UK)
Bernd Fischer (University of Southampton, UK)
Reinhard von Hanxleden (Christian-Albrechts-Universität zu Kiel, Germany)
Christoph Kirsch (University of Salzburg, Austria)
Anne-Francoise Le Meur (INRIA & Université des Sciences et Technologies de Lille)
Christian Lengauer (University of Passau, Germany)
Michael Mendler (Otto-Friedrich-Universität Bamberg, Germany)
Olaf Spinczyk (University of Erlangen-Nuernberg, Germany)
Satnam Singh (Microsoft Research, Cambridge, UK)
Jonathan Sprinkle (Berkeley and University of Arizona, USA)
Bruce Trask (MDE Systems, USA)


Introduction and welcome

This collection consists of the seven papers presented on October 4th 2007 at a new workshop on Automatic Program Generation for Embedded Systems (APGES).

The purpose of the workshop

The purpose of the workshop is to build and strengthen a community of researchers, and to identify a common body of principles. Program generation is very widely used, especially in embedded systems software. Our goal is to bring new understanding and theory to the topic, and the papers in this collection provide some progress towards that goal. Particular issues of concern include, for example, assuring the correctness of program generators, generation of programs with assured safety properties, componentization, modularity, and separate compilation.

The refereeing process

All seven papers were reviewed by at least three expert referees, in most cases members of the Program Committee. Five papers were ranked “Accept” in at least two reviews. Two were supported slightly more weakly and have been revised with specific attention to the referees’ comments.

Acknowledgements and thanks

Thank you to the authors of all the submitted papers: we have a strong program thanks to your efforts and to your enthusiasm for the workshop’s goals. We would like to thank our keynote speakers, Paul Caspi and Walid Taha, for their enthusiastic and thoughtful response to our invitation to help frame this event. Thanks in particular must go to the Program Committee, whose members have provided really valuable, helpful and balanced feedback to the authors, and guidance to the selection process. It has been a pleasure for us to work with them. We would also like to acknowledge the kind support of the GPCE and ESWEEK organizers, in particular Julia Lawall, Emir Pasalic, and Christoph Kirsch. Finally, we should acknowledge that inspiration for the workshop grew from IFIP WG2.11 on Program Generation, and from our involvement in the HiPEAC and Artist Networks of Excellence, funded by the EU.

Kevin Hammond, University of St Andrews (General Chair)
Paul H J Kelly, Imperial College London (Program Chair)


Contents

Generative Design of Hardware-in-the-Loop Models
Uwe Ryssel, Jörn Plönnigs, Klaus Kabitzsch and Michael Folie ......... 4

Clock-directed Modular Code Generation from Synchronous Block Diagrams
Dariusz Biernacki, Jean-Louis Colaco and Marc Pouzet ......... 12

Separate Compilation of Hierarchical Real-Time Programs into Linear-Bounded Embedded Machine Code
Arkadeb Ghosal, Daniel Iercan, Christoph Kirsch, Thomas Henzinger and Alberto Sangiovanni-Vincentelli ......... 20

A Domain-Specific Language for Programming Self-Reconfigurable Robots
Ulrik Schultz, David Christensen and Kasper Stoy ......... 28

Automated component-based implementation of data-driven embedded applications
Sergio Yovine, Marcelo Zanconi and Ananda Basu ......... 37

Generating a Statically-Checkable Device Driver I/O Interface
Lea Wittie, Chris Hawblitzel and Derrin Pierret ......... 45

Architectural Exploration of Reconfigurable Monte-Carlo Simulations using a High-Level Synthesis Approach
José G. Coutinho, David Thomas and Wayne Luk ......... 53


Generative Design of Hardware-in-the-Loop Models

Uwe Ryssel, Joern Ploennigs, Klaus Kabitzsch
Department of Computer Science, Dresden University of Technology, Dresden, Germany
{uwe.ryssel, joern.ploennigs, klaus.kabitzsch}@inf.tu-dresden.de

Michael Folie
ITK Engineering GmbH, Munich, Germany
[email protected]

APGES ’07 October 4, 2007, Salzburg, Austria. Copyright the authors.

ABSTRACT

Embedded software is used nowadays in many applications. To ensure the function and reliability of the software, hardware-in-the-loop methods are commonly used to test it in a simulated environment. Due to the rising complexity of the implemented functions, performance limitations and practicability reasons, the simulations are often specialized to test only a few aspects of the software and therefore reach a high diversity. This diversity is difficult for a user to manage and results in wrongly selected components and compatibility problems. This paper presents a generative programming approach that handles the diversity and includes an interface concept to evaluate compatibility. To use this approach profitably in real-world applications, a migration approach based on a model analyzer is presented. The evaluation of the presented approach is exemplified in the automotive domain using MATLAB/Simulink.

Categories and Subject Descriptors: D.2.13 [Software Engineering]: Reusable Software; D.2.9 [Software Engineering]: Management—Software configuration management; I.6.4 [Simulation and Modeling]: Model Validation and Analysis; I.6.5 [Simulation and Modeling]: Model Development

General Terms: Design, Management, Reliability

Keywords: Hardware-in-the-loop, Generative programming, Model design, Model migration

1. INTRODUCTION

Various application areas use embedded software, which performs control tasks in process and building automation and in the automotive domain. The complexity and diversity of the implementations steadily increase as new features are added to already complex control tasks like ABS or ESP. The challenge of the engineering process is to ensure the reliability and safety of the software. Especially in the automotive domain, safety guarantees are crucial for all future x-by-wire technologies.

To increase and guarantee the safety and reliability of the created software components, they are tested repeatedly during their development process. Most of these tests are performed within a simulated car environment, as cars and hardware are unavailable in early stages or the tests are too risky. These so-called in-the-loop tests are introduced and classified by the development stage of the system under test in Section 2.1.

A car is a very complex physical system, where the simulation models of even simple subsystems already get large, complex, and cumbersome to simulate. To increase the simulation performance, the designers use different simulation models that are tailored to the individual test case and function of interest. This results in a high number of simulation models for the components with their optional functions and for the environment in different conditions and levels of detail. The simulation models are often built of components themselves, which simplifies and speeds up the development process, not only because of the reuse of components. All simulation models and components are stored in a component library. These libraries often develop a library scaling problem [1], as each feature added to a family of simulation models results in multiple copies of these models. The explosion of objects first burdens the designer, who has to select the correct simulation model or component out of many variants. As a result, he spends a long time searching and, if he cannot find the correct model, he recreates model variants or connects the wrong ones. Especially with the increasing complexity of the models and the number of interfaces, this compatibility problem intensifies; it not only creates costs but also counteracts the aim of increasing reliability and safety.

Generative programming can solve this problem by aggregating variants of the same simulation model or component in one parameterizable component. The user specifies the individual realization, which is then generated according to his demands. This solution is comparable to a monolithic, parameterizable component, but such components contain much unused, specialized code, which is a problem for the resource-limited real-time hardware used in hardware-in-the-loop tests. Generative programming instead keeps only the relevant code fragments and therefore results in better performing models.

However, even the best concept to reduce the number of models in a library is hard to establish if the benefit applies only to new models and the number of existing models remains the same. Hence, it is very important to migrate existing models to the generative programming approach, to cut the number of variants.

This paper addresses these three problems. First, the diversity of simulation models and components is managed by a generative programming approach. Second, the compatibility problem is addressed with an extended component model in Section 4 that leads to the extended library concept in Section 5. Third, the automatic migration of existing models to the library is explained in Section 6. Sections 2 and 3 introduce the related work and the example domain of this approach.

2. RELATED WORK

2.1 In-the-loop Testing

In-the-loop test methods can be used in embedded software development and production to ensure reliability and safety. These methods connect the system under test (SUT) to a simulation model of the environment (see Figure 1). For instance, if the brake system controller of a car has to be tested, the environment will be the brake system itself, whose simulation consists of models of hydraulic elements like pumps and tubes.

[Figure 1: In-the-loop test method. A simulated environment model passes inputs to the system under test; its outputs are looped back into the simulation.]

The structure of in-the-loop methods is basically always the same. The simulated environment passes input signals to the SUT, which processes the input and creates output signals. The simulation uses these output signals to calculate its new state and new input signals, which closes the loop. As long as both the environment simulation and the SUT are deterministic, the test scenarios are reproducible. This makes it possible to specify test conditions and retest the SUT during all development stages in a test-driven development. Depending on the development stage of the SUT, the in-the-loop methods are classified as:

• Model-in-the-loop (MIL): In this level the tested component is specified as a model. This executable model can be used to check whether the specification complies with the requirements.

• Software-in-the-loop (SIL): In the second level the tested component is available as code, e.g. in the C language. This code can, for instance, be generated automatically from the model specified before. White box methods can additionally be used to test the code.

• Processor-in-the-loop (PIL): In the third, optional level the compiled code is tested on a test board that contains the same processor used later in the target device. This level can be used for black box tests.

• Hardware-in-the-loop (HIL): In the last level the compiled code is uploaded to the target device under test. The environmental simulation used in this level has to meet real-time requirements to allow conclusions about the reliability of the controller software.

These methods can also be used in production to assure quality. The transition between these development stages can be simplified by model-driven architecture methods (MDA, see [12]). Using these methods, the platform-independent model of the MIL test can be transformed to the code tested by SIL. This code can then be cross-compiled for the target device and tested by PIL and HIL test methods. One example of a simulation tool that supports MDA is MATLAB/Simulink [11], which allows transforming models to code for embedded systems with Real-Time Workshop Embedded Coder [10] or TargetLink [7]. Like many standard simulation tools, MATLAB/Simulink uses a function block oriented graphical meta language, as illustrated in Figures 10 and 11. The simulation is designed abstractly by connecting function blocks to a data flow graph. These signal-oriented models are then compiled by the software to an efficient simulation. In-the-loop methods are often used in the automotive domain. In [13], for example, HIL is used to test the reaction of controllers to injected faults, and [9] gives an example of testing an antilock brake system with HIL.

2.2 Generative Programming

Generative programming (GP) is a software engineering paradigm in which a generator automatically creates software products from a software family. The generation process is intended to create highly customized and optimized products based on elementary components, according to a specification. These products can be software components or whole applications. Further information about GP can be found in [3]. GP is often used to create code in high-level languages like C# or Java [6, 8], or to generate graphical user interfaces (GUIs), which represent a higher level compared to simple code generation [15]. Among other application areas, like business process software, GP is also used for embedded systems. For instance, Czarnecki et al. [2] describe experiences with GP in embedded domains like automotive, space and aerospace; they generate code directly for electronic control units and other embedded systems. Weiland and Richter [18] use software product lines to configure Simulink models. However, they do not generate Simulink models directly; instead they create MATLAB scripts which patch a given reference Simulink model, i.e., only a few components are removed or added. Their goal was not to create optimal models, i.e. models without unused parts, or to solve the library management problem, but to simplify the configuration process. In previous work [14], Ryssel et al. presented an approach where function block based models are generated directly by generative programming. The paper additionally introduced an active library concept [4] that is extended here by interoperability definitions. This approach is continued in the present paper.

3. EXAMPLE DOMAIN

For a better understanding, this section introduces a brake system model from our industrial partner working in the automotive domain, who uses HIL tests for quality improvement and for validation of the functions of electronic control units.

[Figure 2: Schema of a brake system (hydraulic schematic).]

Figure 2 shows the hydraulic schema of the modeled example brake system. The model varies in the input signal simulating the brake pedal behavior, in the number of brake circuits (one or two) and brake cylinders (four, six or eight), and in the distribution of the cylinders over the circuits. MATLAB/Simulink is chosen as the example modeling language for this approach because of its function block based graphical language, which is commonly used in the automotive domain to model the environment for HIL tests.

4. COMPONENT MODEL AND EXTENDED PORTS

Before any models can be generated with generative programming, the components have to be described formally. Our approach uses a function block based component model. According to IEC 61499 and IEC 61804, a function block is an encapsulated algorithm with input and output data points, complemented by parameters which can alter both the function and the structure [5]. This corresponds to the component model shown in Figure 3. In this model the less abstract term port is used instead of data point for the interfaces. Input and output ports have predefined data types like integer, float or any structured type based on numeric types. As a restriction for compatibility, only ports which have the same data type can be connected.

[Figure 3: Structure of a basic component (a component with parameters, input ports and output ports).]

The blocks used in MATLAB/Simulink can be represented by such a component model. However, Simulink blocks have only syntactical data types, which makes it difficult to evaluate the compatibility between ports automatically. Compatibility problems occur especially when components created by different design groups are connected. For example, if one group designs the model for the brake system and another group the model for the tire, it can happen that the brake system model outputs the braking force while the tire model needs a negative acceleration as input. Simulink permits connecting these models without noticing the incompatibility, because both use the double type to code the values.

Hence, it is necessary to evaluate the compatibility of models. But this requires semantical information that is usually only available informally in the function block documentation. To achieve both compatibility checks and the generation of components, components are decomposed into subparts and extended ports. A subpart is an atomic implementation of a simple functionality like signal sampling, a moving-average filter or a PID controller algorithm. The extended port is an interface concept to encapsulate single input and output ports with advanced signal processing functionality like alarm thresholds or signal limiters. This decomposition of components is introduced in [14]. In the example mentioned above, the brake system decomposes into the subparts pedal, feed flow, drain flow, reservoir and the brake cylinders. The braking force output is defined as an extended port, while the tire model has an acceleration input as an extended port. To assure a compatible connection, the two participating extended ports have to be predefined as compatible, a so-called complementary port pair. In this case the tire model and the brake system model are incompatible (see Figure 4). The tire model first has to be completed to a wheel model with a braking force input to get a compatible connection (Figure 5).

[Figure 4: Example of a non-complementary port pair (the brake's braking-force output and the tire's acceleration input are not connectable).]

The concept of complementary port pairs is not limited to the same physical value type, like force or pressure. The semantical information can also include ranges of values and extended functionality like alarms. As an example, a pair can describe the transmission of a braking force signal between 30 and 3000 N, sampled at 100 Hz and raising an alarm above 2800 N. If these extended ports are reused in other components, the compatibility definition also covers these implemented and even newly developed components.

[Figure 5: Example of a complementary port pair (the brake's braking-force output connected to the wheel's braking-force input).]

Because complementary port pairs can be detailed and extended in their definition, they can be hierarchically ordered to control their high number. Figure 6 shows a part of the hierarchy used in the example domain. The root is the numerical data type (double), which branches to physical value types and then to semantical types. Each level adds semantical information. The deepest leaf of this example means: a hydraulic braking force between 30 and 3000 N coded as a double value.

[Figure 6: Hierarchical order of port types: double branches to physical value types such as force, which branches to semantical types such as hydraulic brake (30…3000 N), frictional and centrifugal.]

5. EXTENDED ACTIVE LIBRARY

5.1 Active Elements

As mentioned in the introduction, a simple component library cannot solve the software reuse problems. The active library of generative programming is suitable for software reuse, but lacks the compatibility evaluation. Therefore the active library is extended by complementary port pairs to an extended active library. This library contains all elements described in the component model: the subparts, the extended ports and the structure of components and subsystems. These elements are specified in an implementation component configuration language (ICCL, see [3]), which in our case is expressed in XML [19]. Because components and subsystems are composed of subparts and other components, both structural elements are abstracted to so-called active elements in the library. These active elements define the construction plans for the items and can contain other active elements and subparts. Hence, they represent the hierarchy of the domain: active elements are systems at the highest level and detail to subsystems and components on the basic level. Extended ports and complementary port pairs are also treated as special active elements. An example will illustrate the different types.

Figure 7 shows the active element Brakesystem representing a system. This is a simplified version of the brake system model introduced in Section 3. An active element of the type system is defined as a closed group of other active elements, which are connected to implement certain functions. If the group offers an external interface, it is called a subsystem, like the active element Distribution. The content of this element is visible in this view: it contains other active elements and two subparts (the sum blocks). The other active elements Pedal, Feed flow and Drain flow are components. They are the basic active elements and contain no other active elements, except for the special extended ports. Extended ports can contain subparts to implement certain interface functionality, like signal limitation, and are therewith active elements. Extended ports are often described as a complementary port pair, which is a pair of by-definition compatible ports. The example uses two complementary port pair types: p for a hydraulic pressure and Q for a flow. The types are indicated on the connections. The second brake circuit in the example is optional. Each brake circuit consists of one Feed flow and one Drain flow component and the shared Distribution subsystem, which contains the brake cylinders (the four active elements inside the subsystem). These variants are expressed in ICCL as introduced in the next section.

[Figure 7: The active element Brakesystem: a Pedal component feeding one or two brake circuits, each with a Feed flow and a Drain flow component, sharing the Distribution subsystem with four brake cylinders; connections are typed p (hydraulic pressure) and Q (flow); the second circuit is optional.]

5.2 The ICCL

Our ICCL uses configuration parameters to differentiate the variants of a system. There are two types of configuration parameters: structure parameters change the structure of the active element, which can affect the inner composition and interfaces; behavior parameters, by contrast, change only the function, like filter parameters or other constants. Besides these parameters, the ICCL has to contain further information to express active elements:

• Parameters: structure and behavior parameters for the configuration of the active elements.

• Interfaces: the input and output ports of the active element.

• Parts: a list of the contained subparts and active elements with their parameter settings.

• Connections: the connections between the parts.

This information is independent of the target system the active elements are created for. Hence, with a new generator and some additional information, like the library paths of the subparts, the described active elements can be instantiated in every function block based modeling language. The instantiation of interfaces, parts and connections can be varied by the structure parameters.

Listing 1 shows sample code of the ICCL definition for the active element Brakesystem. In the structparameterlist section the structure parameter c is declared, which defines the number of brake circuits (one or two). This structure parameter changes the appearance of the system as defined in the specific elements. The interface section is empty because a system has no interface. The subparts and active elements contained in the specified element are defined in the parts section.

[Listing 1: Example of an active element specification]

The above-declared parameter c conditions the active element FeedFlow2 and the unlisted DrainFlow2. These elements are instantiated only if the condition is fulfilled, which is the case if the system is configured for two brake circuits (c = 2). The active element Distribution will be created in any case, but it has to be configured to consider the number of brake circuits. Hence, its parameter bc is set to c, which means they always have the same value. In this way, subordinate parts are automatically configured by the system. The connections section contains all connections between the defined parts. In the example there are two connections between the Pedal element and the two FeedFlow elements, whereas the second connection is conditioned. The last section contains information for the dynamic creation of the DSL GUI, which is presented in the next section. The formal description of the active elements and the used subparts forms the extended active library.
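The ICCL itself is XML-based; as a compact summary of the information an active element definition carries, the following OCaml type sketch mirrors the four sections above plus the dsl section of Section 5.3. It is purely illustrative: all type and field names are ours, not the ICCL's.

    (* Illustrative data model of an ICCL active element (names are ours;
       the authors' concrete syntax is XML). *)
    type parameter =
      | Structure of string    (* changes composition/interfaces, e.g. c *)
      | Behavior of string     (* changes only the function, e.g. a filter constant *)

    type condition = string    (* a structure-parameter condition, e.g. "c = 2" *)

    type part = {
      element : string;                   (* contained active element or subpart *)
      settings : (string * string) list;  (* parameter bindings, e.g. ("bc", "c") *)
      only_if : condition option;         (* instantiated only if the condition holds *)
    }

    type connection = {
      from_port : string;
      to_port : string;
      conn_if : condition option;         (* e.g. the Pedal-to-FeedFlow2 connection *)
    }

    type active_element = {
      name : string;                      (* e.g. "Brakesystem" *)
      parameters : parameter list;
      interface : string list;            (* input/output ports; empty for a system *)
      parts : part list;
      connections : connection list;
      dsl : string option;                (* GUI description for the DSL, cf. Section 5.3 *)
    }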

5.3 The DSL

A domain-specific language (DSL) is a programming language to describe a specific task in a certain domain [17]. DSLs are used in generative programming to specify the requirements of the system to generate [3]. The language can be a script language, a graphical modeling language or simply a GUI. Our approach uses a GUI as DSL for easy access by the user. As in previous work [14], the GUI is dynamically created from additional information in the ICCL. Therefore, the active element definition can contain a dsl section that specifies the GUI elements used to set the parameters. The example in Listing 1 would, for instance, create a text edit field with the description "Number of brake circuits" and a start value of two. The dynamic creation of the DSL GUI minimizes the effort of creating the DSL. It requires neither a separate complex modeling nor a text-based specification language, and therewith no expert knowledge of the system engineer. However, this approach is only possible as long as the generation targets a single, specified model language, which is Simulink in this case. There is no explicit dependency management across the active elements, but the hierarchical structure of the active elements defines implicit dependencies. As the system engineer will later work with the created models, i.e. recreate parts of them and adapt them manually to his needs, an explicit dependency management would reach its limit. Figure 8 shows a screenshot of the DSL GUI during the configuration of a more complex brake system model than the one in Figure 7.

[Figure 8: The DSL GUI (screenshot).]

5.4 The Model Generator

The generator instantiates the configured active element by assembling the contained active elements and subparts according to the construction plan specified in ICCL (Section 5.1). The contained active elements are created recursively, i.e. the generator calls itself to generate the contained elements. These recursive calls continue until the lowest level is reached, where only subparts have to be copied from a subpart library. After all contained active elements have been created, the generator connects them. The result of the generation process is an executable simulation model based on the requirements specified in the DSL GUI. The generator and the DSL GUI are implemented in the MATLAB script language. This language provides special functions to create models in Simulink and to integrate Java classes that implement the GUI.
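To make the recursive scheme concrete, here is a minimal OCaml sketch of the generator's traversal, reusing the illustrative active_element type from Section 5.2. The authors' implementation is MATLAB script code driving Simulink; this sketch only shows the recursion and the handling of structure-parameter conditions, with eval_cond and lookup as assumed helpers.

    (* Sketch of the recursive instantiation (illustrative only).
       After creating all contained elements, the real generator would
       also draw the connections listed in elem.connections. *)
    let rec generate (lookup : string -> active_element option)
        (eval_cond : condition -> bool) (elem : active_element) : string list =
      List.concat_map
        (fun p ->
          match p.only_if with
          | Some c when not (eval_cond c) -> []   (* e.g. skip FeedFlow2 when c = 1 *)
          | _ -> (
              match lookup p.element with
              | Some sub -> generate lookup eval_cond sub   (* recurse into active element *)
              | None -> [ "subpart:" ^ p.element ]))        (* lowest level: copy subpart *)
        elem.parts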

6. COMPONENT MIGRATION

6.1 The Necessity of Migration

The introduction of software product lines is mostly an organizational process. There are many case studies that deal with the process of creating a new software product line and migrating existing knowledge. An example can be found in [16], where a rather hardware-centric product line is migrated to a new software product line. In the previous sections, the application of generative programming to HIL models was explained. This approach can be applied directly only if the components are newly developed. But in the real world many components already exist, and it is not reasonable to discard them for a new engineering paradigm. The solution is to integrate existing components and their variants. Doing this manually is very time-consuming and cancels out the time-saving benefits of generative programming. To accelerate the migration process, existing software components need to be integrated automatically. For example, Yoshimura et al. [20] automatically merge existing code of embedded systems to obtain reusable components. Their merging process works on the code level and is therefore not applicable to simulation models, but their basic approach of merging existing models into new components is transferable.

Figure 9 illustrates the inclusion of the automatic migration in the whole design process. Existing models are taken from the old component library (a simple set of component and model blocks) and are analyzed and migrated by a model analyzer. The migration is a transformation of models to active elements and can therefore also be used to simplify the design of new components. The domain engineer can keep to his usual software tool, like Simulink, to design a model and then convert this model to an active element without the need to describe it manually. The created active elements are added to the active library and can then be used by the generator. Because the conversion between models and active elements is possible in both directions with the model analyzer and the generator, the active library can actually be hidden from the system and application engineers.

6.2 The Model Analyzer

To automate the integration of existing components, the model analyzer has to accomplish two consecutive duties. The first step is to identify potential component variants: all existing models have to be searched and compared to find similar subsystems, which are candidates for active elements. In the second step, the found component variants have to be described automatically in the ICCL, so the commonalities and differences have to be identified and combined into one active element.

6.3 Identification of Potential Component Variants

The identification of potential component variants is a complex task, because existing models have to be scanned for structural similarities. The simplest approach is to compare two components and count the number of equal parts. However, each part gets its functional meaning from the context of the connected parts, so it is more promising to look for equal component pairs and larger groups. But even this approach is limited to finding only syntactic similarity. Many complex functions can be designed in different ways and still have the same behavior. A simple example is a series of additions, which are commutative and exchangeable in sequence; such sequences are semantically identical but syntactically different. To find such semantical similarities, more semantical information is necessary, like the commutativity rule. Due to these problems, all scan algorithms can only identify candidates for an active element, which need to be reviewed by the domain engineer. In the current approach the identification is still done manually by the domain engineer: he selects a set of existing components and passes it to the conversion algorithm. An automated identification is the subject of ongoing research.

[Figure 9: Managing of models using the model analyzer and generator. Existing models are imported from the old component library and converted by the model analyzer into active elements of the active library; the model generator, called with requirements stated via the GUI, creates new, executable validated models for the application engineer; the domain engineer extends and refines the active elements and compatibility statements.]

6.4 Converting the Variants to ICCL

An ICCL description of an active element contains all information needed to generate its variants. Section 5.1 has explained that the generated variants depend on conditions containing structure parameters. The converter has to gather all possible parts and connections, create the structure parameters and formulate the appropriate conditions. In the simplest case only one structure parameter exists, which also enumerates the variants. But in many cases the component has more than one point of variation and structure parameter. Figures 10 and 11 introduce the variation points of a tube model, a part of the example domain of this paper. Equal in all variants is the block V1, which represents the volume of the tube. The input of this block is a volume flow QA, which is the sum of up to four input flows (QA1, ..., QA4). This is the first variation point, as shown in Figure 10. The second variation point is an optional pressure input port p1 that is added to the result, as shown in Figure 11; this can for example be a back pressure. The varying subparts of the example are the sum blocks in front of and behind V1, which are omitted in the simplest case.

[Figure 10: Variants of the variation point ‘Number of flow inputs’: (a) 1in tube; (b) 2in tube; (c) 3in tube; (d) 4in tube. Each variant sums its flow inputs QA1, ..., QAn into the volume block V1, which outputs the pressure pP.]

[Figure 11: Variants of the variation point ‘Existence of additional pressure input’: (a) 1in tube; (b) 1/1in tube, with an additional pressure input p1 added behind V1.]

The conversion is done in two steps:

• In the first step, active elements are created for each variation point. In the example this means that two active elements are created: one that varies the number of flow inputs by a parameter nQ, and one that varies the additional pressure input by a parameter p.

• In the second step, these two active elements are merged into one active element, which contains two structural parameters that can be set independently.

6.5 Creating a Single Active Element

To convert a variation point to an active element, the single components are converted to lists of interfaces, parts and connections as they are used in the active element description (see Section 5.1). The parts correspond to Simulink blocks, which are treated as subparts. These lists are then compared pairwise to the lists of the other variants. In the part list, entries are equal when they have the same Simulink block type and name. In the example, the block V1 of the type Volume-Block appears in all variants and is equal according to this comparison rule. Interface entries are equal if the ports have identical names, like QA1. Connection list entries are equal if the same parts are connected. Using the name as comparison criterion is not optimal: two blocks with slightly different names, like V1 and V_1, would not be considered equal. Reducing the identification to the block type is more complex, because the connections would have to be involved in the comparison of models. This raises the semantical problem discussed in Section 6.3 and will probably be extended with the solutions found there. Until then, this simple, syntactical solution works for all examples, but potentially creates more entries than needed. The resulting list consists of the merged entries of the source lists, extended by automatically created condition expressions. For example, QA1 is common to all variants and is unconditioned, whereas QA4 exists only in the fourth variant, so the condition nQ==4 is formulated. Besides the blocks themselves, their configurations are compared, too. The left sum block exists in variants 2 to 4, and its number of input ports needs to be adjusted with nQ. This results in the listing below.

[Listing 2: Example of an automatically created subpart configuration]

6.6 Merging Active Elements

In the second step, the set of active elements is merged into one active element with a set of independent variation points. The algorithm compares the lists of elements in the same way as during the creation of the single-variant active elements; only the recombination rules of the conditions are different. However, in some cases the merging is not possible. If the volume block V1 of the example depended on both nQ and p, the merging would fail due to a decision problem occurring later in the generator: if the condition of the first active element with nQ evaluates to true but the condition of the second active element with p decides contrarily, the generator cannot decide whether the component should be created or not. It lacks the semantical information to combine both conditions (by AND, OR, etc.). This information would have to be included in the condition by the model analyzer, but the analyzer lacks the same knowledge. However, this case cannot occur as long as the variation points are independent. That is why the current implementation requires independent variation points or manual merging by the domain engineer.
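A rough OCaml sketch of the comparison-and-condition scheme described in Sections 6.5 and 6.6 follows. It is illustrative only: the authors' tool is implemented in MATLAB, the sketch is restricted to part lists, and all names are ours.

    (* Sketch: merge the part lists of n variants into one conditioned
       part list. A part is identified by its Simulink block type and
       name, per the comparison rule above. *)
    type part_id = { block_type : string; name : string }

    let merge_parts (param : string) (variants : part_id list list) :
        (part_id * string option) list =
      let n = List.length variants in
      (* Every distinct part occurring in any variant. *)
      let all = List.sort_uniq compare (List.concat variants) in
      List.map
        (fun p ->
          (* 1-based indices of the variants containing this part. *)
          let occurs =
            List.filter_map Fun.id
              (List.mapi
                 (fun i v -> if List.mem p v then Some (i + 1) else None)
                 variants)
          in
          let cond =
            if List.length occurs = n then None  (* common to all: unconditioned *)
            else
              Some
                (String.concat " || "
                   (List.map (fun i -> Printf.sprintf "%s==%d" param i) occurs))
          in
          (p, cond))
        all

    (* E.g. with the four variants of Figure 10 and param "nQ", the port
       block QA4 occurs only in the fourth variant, yielding "nQ==4". *)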

6.7 Revising the Active Elements

As a last step, the domain engineer is instructed to review the created active element and to add additional information not contained in the existing models, such as extended port types or DSL descriptions. In the current implementation, a set of MATLAB functions has been developed to adapt the created active elements accordingly.

7. CONCLUSION

This paper has introduced a generative programming approach to manage the diversity of simulation models based on function-block-like components. It reduces the number of models by combining variants in a generative model, called an active element. Further, it solves the compatibility problem by defining complementary port pairs. The model is stored in an extended active library, whose structure was introduced. In our implementation, the user can access the library and use the DSL in his accustomed way via the MATLAB/Simulink interface. Furthermore, the newly introduced migration approach permits the user to import existing models and to create new generative models in the usual way with Simulink. In this area, we are still researching the automatic identification of model variants for migration in an existing library. The common graphical interface and the migration approach simplify the paradigm change to a generative programming approach and make its usage convenient.

The selected example domain of simulation models for in-the-loop tests of automotive embedded systems is particularly suitable for generative programming, as it is component oriented, has a high model diversity, and requires optimized model realizations. The developed solution is therefore already in use by industry. However, the introduced concept is applicable to any domain with comparable attributes, e.g. device and system design in building automation.

8. REFERENCES

[1] Ted J. Biggerstaff. The library scaling problem and the limits of concrete component reuse. In Proceedings of the Third International Conference on Software Reuse, pages 102–109. IEEE, 1994.
[2] Krzysztof Czarnecki, Thomas Bednasch, Peter Unger, and Ulrich W. Eisenecker. Generative programming for embedded software: An industrial experience report. In Proceedings of the 1st ACM SIGPLAN/SIGSOFT Conference on Generative Programming and Component Engineering, pages 156–172. Springer-Verlag, 2002.
[3] Krzysztof Czarnecki and Ulrich W. Eisenecker. Generative Programming: Methods, Tools and Applications. Addison-Wesley, 2000.
[4] Krzysztof Czarnecki, Ulrich W. Eisenecker, Robert Glück, David Vandevoorde, and Todd L. Veldhuizen. Generative programming and active libraries. In Selected Papers from the International Seminar on Generic Programming, pages 25–39, London, UK, 2000. Springer-Verlag.
[5] Christian Diedrich, Terry Blevins, Ludwig Winkel, and Francesco Russo. Function block applications in control systems based on IEC 61804. ISA Annual Conference and Exhibition, Houston, TX, 2001.
[6] Dirk Draheim, Christof Lutteroth, and Gerald Weber. Generative programming for C#. SIGPLAN Notices, 40(8):29–33, 2005.
[7] dSPACE GmbH. TargetLink, visited 2007. http://www.dspace.com/ww/en/pub/home/products/sw/pcgs/targetli.cfm.
[8] Manuel Fähndrich, Michael Carbin, and James R. Larus. Reflective program generation with patterns. In GPCE ’06: Proceedings of the 5th International Conference on Generative Programming and Component Engineering, pages 275–284, New York, NY, USA, 2006. ACM Press.
[9] Ki-Chang Lee, Jeong-Woo Jeon, Don-Ha Hwang, Se-Han Lee, and Yong-Joo Kim. Development of an antilock braking controller using hardware in-the-loop simulation and field test. In Proceedings of the 30th Annual Conference of the IEEE (IECON), volume 3, pages 2137–2141, 2004.
[10] The MathWorks, Inc. Real-Time Workshop Embedded Coder, visited 2007. http://www.mathworks.com/products/rtwembedded/.
[11] The MathWorks, Inc. Simulink — Simulation and Model-Based Design, visited 2007. http://www.mathworks.com/products/simulink/.
[12] Object Management Group (OMG). Model Driven Architecture, visited 2007. http://www.omg.org/mda/.
[13] M. Sonza Reorda and M. Violante. Hardware-in-the-loop-based dependability analysis of automotive systems. In IOLTS ’06: Proceedings of the 12th IEEE International Symposium on On-Line Testing, pages 229–234, Washington, DC, USA, 2006. IEEE Computer Society.
[14] Uwe Ryssel, Joern Ploennigs, and Klaus Kabitzsch. Generative function block design and composition. In Proceedings of the 6th IEEE Workshop on Factory Communication Systems (WFCS), Torino, Italy, pages 253–262, 2006.
[15] Max Schlee and Jean Vanderdonckt. Generative programming of graphical user interfaces. In AVI ’04: Proceedings of the Working Conference on Advanced Visual Interfaces, pages 403–406, New York, NY, USA, 2004. ACM Press.
[16] Mikael Svahnberg and Michael Mattsson. Conditions and restrictions for product line generation migration. In PFE ’01: Revised Papers from the 4th International Workshop on Software Product-Family Engineering, pages 143–154, London, UK, 2002. Springer-Verlag.
[17] Arie van Deursen, Paul Klint, and Joost Visser. Domain-specific languages: An annotated bibliography. SIGPLAN Notices, 35(6):26–36, 2000.
[18] Jens Weiland and Ernst Richter. Konfigurationsmanagement variantenreicher Simulink-Modelle. In Informatik 2005 — Beiträge der 35. Jahrestagung der Gesellschaft für Informatik e. V., Band 2, pages 176–180, 2005.
[19] World Wide Web Consortium (W3C). Extensible Markup Language (XML), visited 2007. http://www.w3.org/XML/.
[20] Kentaro Yoshimura, Dharmalingam Ganesan, and Dirk Muthig. Defining a strategy to introduce a software product line using existing embedded systems. In EMSOFT ’06: Proceedings of the 6th ACM & IEEE International Conference on Embedded Software, pages 63–72, New York, NY, USA, 2006. ACM Press.

Clock-directed Modular Code Generation from Synchronous Block Diagrams

Dariusz Biernacki (INRIA Futurs, Orsay, France)
Jean-Louis Colaco (Siemens VDO, Toulouse, France) ∗
Marc Pouzet (LRI, Univ. Paris-Sud 11, Orsay, France) †

∗ This work started while the author was at Esterel Technologies.
† This work was partially supported by the French ACI Sécurité Alidecs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright the authors, APGES 2007, Oct. 4th 2007, Salzburg, Austria.

ABSTRACT

The compilation of synchronous block diagrams into sequential imperative code has been addressed in the early eighties and can now be considered folklore. However, separate or modular code generation, though widely used in existing compilers and particularly in industrial ones, has been neither precisely described nor entirely formalized. Such a formalization now appears to be a fundamental need in the long-term goal of developing a mathematically certified compiler for a synchronous language, as well as in simplifying existing implementations. This article presents in full detail the modular compilation of synchronous block diagrams into sequential code. We consider a first-order functional language reminiscent of Lustre, which it extends with a general n-ary merge operator, a reset construct and a richer notion of clocks. The clocks are used to express the activation of computations in the program and are specifically taken into account during the compilation process to produce efficient imperative code. We introduce a generic object-based intermediate language to represent transition functions and we present a concise clock-directed translation function from the source to the intermediate language. We also address the target code generation phase by describing a translation from the intermediate language to Java and C.

1. INTRODUCTION

Block diagram formalisms as found in Simulink [18] or Scade/Lustre [22] are widely used for embedded system design. Among them, synchronous block diagrams are based on a discrete model of time where signals are infinite streams and blocks define stream functions. The generation of sequential imperative code from synchronous block diagrams is an old topic and was addressed in the early years

loop must cross an explicit delay. These constraints are, nonetheless, well accepted by Scade users. They are also justified by the need for tracability of the generated code, as required by certification authorities in the context of critical software. Modular compilation of synchronous block diagrams, though largely used in the Lucid Synchrone [21] compiler or in the industrial compiler of Lustre has never been described precisely or formalized entirely. Such a formalization appears now as a fundamental need in the long-term goal to develop a mathematically certified compiler of a synchronous language inside a proof assistant such as Coq [10] as well as in simplifying existing implementations. Additionally, it complements previous work done on the formalization of static analysis (such as clock calculus [7] and initialization analysis [8]), general principles of compilation [5] and language extensions [17, 6]. This article presents in detail the modular compilation of synchronous block diagrams into sequential code. The source language we consider is a first-order declarative language reminiscent of Lustre, general enough to make a suitable intermediate language for the compilation of automata as introduced in [6]. The language provides a n-ary merge operator as a way to combine complementary streams, a reset construct to restart a component in a modular way and a generalized notion of clocks. (Clocks express various activation conditions in the program). We introduce a generic object-based intermediate language to represent sequential transition functions and we illustrate its versatility by giving a translation into Java and C. Synchronous programs are translated modularly into programs from the intermediate language. Clocks play a central role during the process of translation and are specifically treated to generate good control structures. This approach is in contrast with classical compilation methods based on enumeration techniques. The use of an intermediate language and the special treatment of clocks leads to a very concise description of the compilation process. This work is part of a long-term project to develop a certified Lustre compiler implemented in Coq. A reference compiler (based, in particular, on the material presented in this article) has been written in OCaml. Also, the implementation and proofs in Coq are under way. For lack of space, we only describe the main steps in the compilation chain and do not give the formal semantics of the source and target languages. The article is organized as follows. In Section 2, we present a synchronous data-flow kernel. In Section 3, we address the issue of schedulability of a set of equations and of transforming it into a normal form. In Section 4, we define an intermediate sequential language for representing transition functions. In Section 5, we define the translation from the data-flow language to the intermediate language. In Section 6, we describe Java and C code generation from the intermediate language. In Section 7, we sketch the construction of the entire compiler. In Sections 8 and 9, we discuss related and future work and we conclude.

2.

A CLOCKED DATA-FLOW LANGUAGE

We define a synchronous data-flow kernel considered as a basic calculus into which any Lustre program can be translated. Actually, we make it a little more general by equipping it with a means to reset a function application

13

in a modular way, following [16] and we provide value constructors belonging to some enumerated types and filtering mechanisms as introduced in [6]. Moreover, the code generation being done after type and clock verification, we assume that every term is annotated with its proper type and clock information.

2.1

Syntax and Intuitive Semantics

A program is made of a list of global node declarations (d) and type declarations (td). A global node declaration is of the form node f (p) = p with var p in D. To simplify the presentation, only abstract and enumerated types are provided here. a stands for annotated expressions (e) with their clock (ct). Expressions are made of values (v), tuples (a1 , ..., an ), initialized delays (v fby a), variables (x), pointwise applications (op (a1 , ..., an )), node instantiations with a possible reset condition (f (a1 , ..., an ) every a), a combination operation (merge x (C1 → a1 ) ... (Cn → an )) or a sampling operation (a when C(x)). The expression a when C(x) is the sampled stream of a on the instants where x equals C. Symmetrically, merge is the combination operator: if a is a stream producing values belonging to a finite enumerated type bt = C1 + ... + Cn and a1 , ..., an are complementary streams (at a given cycle, at most one stream is producing a value), then it combines them to form a faster stream. f (a1 , ..., an ) every a is the resetable function application: the internal state of the application of f is reset every time the boolean stream a is true. To simplify the presentation, we write op (a1 , ..., an ) for the point-wise application of an external function op (e.g., +, not) to its argument and f (a1 , ..., an ) every False for the application of a stateful function. A value (v) can be a constructor (C) belonging to an enumerated type or any immediate value (i) (e.g., an integer). A pattern pat may be a variable or a tuple of patterns (pat, ..., pat). A declaration (D) can be a collection of parallel equations. An equation defines a value (pat = a). To simplify the presentation, the boolean type is not explicitly given and we assume the existence of an initial environment defining bool = False + True. In the same way, combinatorial functions are provided externally. a e

::= ::=

as D pat d p td v ck ct

::= ::= ::= ::= ::= ::= ::= ::= ::=

ect v | x | v fby a | a when C(x) | (as) | op (as) | f (as) every a | merge x (C → a) ... (C → a) a, ..., a pat = a | D and D x | (pat, ..., pat) node f (p) = p with var p in D x : bt; ...; x : bt type bt | type bt = C + ... + C C|i base | ck on C(x) ck | ct × ... × ct

Clock annotations do not play any role in the data-flow semantics of the language so we omit them in the examples below. v fby a stands for the initialized delay. It is assumed that the first parameter of fby is an immediate value. x y v fby x x+y

x0 y0 v x0 + y0

x1 y1 x0 x1 + y1

x2 y2 x1 x2 + y2

x3 y3 x2 x3 + y3

... ... ... ...

-- count the number of top between two tick node counting (tick:bool; top:bool) returns (o: int) var v: int; let o = if tick then v else 0 -> pre o + v; v = if top then 1 else 0; tel;

Figure 1: The counting node in Scade and in Lustre If op is a combinatorial function, op (a1 , ..., an ) applies it point-wise to its arguments (classical arithmetic operations are written in infix form). The kernel provides a general sampling mechanism based on enumerated types. This way, the classical sampling operation e when x of Lustre, where x is a boolean stream, is written e when True(x). In the same way, e when not x is now written e when False(x). The conditional if/then/else, the delay pre and initialization operator -> of Lustre can be encoded in the following way: if x then e2 else e3

=

e1 -> e2

=

pre (e)

=

merge x (True → e2 when True(x)) (False → e3 when False(x)) if True fby False then e1 else e2 nil fby e

The conditional if/then/else is built from the merge operator and the sampling operator when. The initialization operation e1 -> e2 first returns the very first value of e1 and then the current value of e2 . The uninitialized delay operation pre (e) is a shortcut for nil fby e where nil stands for any constant value which has the type of e. 1 h x y x -> y pre (x) z = x when True(h) t = y when False(h) merge h (True → z) (False → t)

True False x0 x1 y0 y1 x0 y1 nil x0 x0 y1 x0 y1

True x2 y2 y2 x1 x2 x2

False x3 y3 y3 x2 y3 y3

ck

ck

((v fby xck ) + y ck ) , if ck is the clock of x and y, and similarly he writes merge h (True → z)(False → t) instead of ck (merge h (True → z ck on True(h) )(False → tck on False(h) )) , if ck is the clock of h. To make the article self-contained, we present the associated clock conditions which must be verified by annotated terms. In a real implementation, the clock calculus apply to unannotated terms and produces annotated terms (the clock calculus shown here is based on [6]). We define several judgements to express that a program is well clocked. For instance, the judgement H ` e : ct states that the expression e has a clock type ct under the clock environment H, whereas the judgement H ` D states that D is a set of well clocked equations under the clock environment H. An environment H is of the form [x1 : ck1 , ..., xn : ckn ], where xi 6= xj for i 6= j. H ` e : ct ct

H`e

H ` a1 : ck ... H ` an : ck H ` op (a1 , ..., an ) : ck

: ct

H ` a1 : ck ... H ` an : ck

... ... ... ... ... ... ... ...

H ` a : ck

H ` f (a1 , ..., an ) every a : ck × ... × ck H ` a : ck

H ` a : ck

H ` x : ck

H ` a when C(x) : ck on C(x) H ` v fby a : ck H ` x : ck

H ` a1 : ck on C1 (x) ... H ` an : ck on Cn (x)

H ` merge x (C1 → a1 ) ... (Cn → an ) : ck

Forgetting clock annotation, the counting example of Figure 1 is written: node counting (tick:bool; top:bool) = (o:int) with var v: int in o = if tick then v else 0 -> pre o + v and v = if top then 1 else 0

2.2

Clocks do not have to be explicitly given in the source language, e.g., the programmer writes (v fby x) + y instead of

H ` a1 : ct1 ... H ` an : ctn

H ` v : ck

H ` (a1 , ..., an ) : ct1 × ... × ctn

H, x : ck ` x : ck

H ` pat : ct

H ` a : ct

H ` pat = a

H ` D1

H ` D2

H ` D1 and D2

H ` pat1 : ct1 ... H ` patn : ctn H, x : ck ` x : ck

Annotating Terms with Their Clocks

The code generation applies once type verification and clock calculus have been performed. At the end of these steps, every term is annotated with its type and clock. Typing is almost standard [20]. The purpose of the clock calculus is to reject programs which cannot be executed synchronously and is also defined as a type inference system. 1

Then, it is the purpose of the initialization analysis to check that the computation result does not depend on the actual nil value.

14

`base p : Hp

H ` (pat1 , ..., patn ) : ct1 × ... × ctn

`base q : Hq

` r : Hr

Hp , Hq , Hr ` D

` node f (p) = q with var r in D ` x1 : t1 , ..., xn : tn : [x1 : ck1 , ..., xn : ckn ] `base x1 : t1 , ..., xn : tn : [x1 : base, ..., xn : base] In the rules for when and merge, it is assumed that the type correctness of the control variable and the type constructors has been verified by the type checker.
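As a small worked instance of these rules (ours, matching the table of Section 2.1): with H = [h : base; x : base; y : base], the rule for when gives H ⊢ x when True(h) : base on True(h) and H ⊢ y when False(h) : base on False(h), and the rule for merge then gives the combined stream merge h (True → z) (False → t) the clock base again, provided z and t carry these two complementary clocks.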

3. TOWARDS SEQUENTIAL CODE

The language of Section 2 is declarative, with the evaluation of expressions controlled by the clock formalism. In order to generate sequential code from the source language, we first need to address the issue of finding a correct order for the equations, as well as dealing with faster, possibly stateful, computations nested inside slower ones.

3.1 Syntactic Dependences and Scheduling

Following the definition introduced in [14], we say that an expression a statically depends on x if x appears free in a and not as an argument of a delay fby. Left (a) returns the set of variables appearing this way in a (we overload the notation for Left (e) and Left (D)). Def (D) defines the set of variables defined in D. If pat = a is an equation in D, every variable from pat immediately depends on the variables from Left (a). The transitive closure of this relation defines the notion of static dependence. A program is causal when, for each node, the corresponding graph of dependencies is acyclic.

Left (e^ck) = Left (e) ∪ Vars(ck)
Left (v fby a) = ∅
Left (op (a1, ..., an)) = ∪(1≤i≤n) Left (ai)
Left (f (a1, ..., an) every a) = ∪(1≤i≤n) Left (ai) ∪ Left (a)
Left (x) = {x}
Left (v) = ∅
Left (merge x (C1 → a1) ... (Cn → an)) = ∪(1≤i≤n) Left (ai) ∪ {x}
Left (a when C(x)) = {x} ∪ Left (a)

Left (pat = a) = Left (a)
Left (D1 and D2) = Left (D1) ∪ Left (D2)

Def (pat = v fby a) = ∅
Def (pat = a) = Vars(pat)
Def (D1 and D2) = Def (D1) ∪ Def (D2)

Vars(x) = {x}
Vars((pat1, ..., patn)) = ∪(1≤i≤n) Vars(pati)

An equation pat = a from a set of equations D is ready, written (pat = a) ∈ R (D), when it does not depend on any other equation. Equations of the form pat = (v fby a)^ck receive a particular treatment: in this case, pat corresponds to a memory, so it has to be scheduled after any other computation reading the variables of pat.

(pat = (v fby a)^ck) ∈ R (D)  if  Vars(pat) ∩ Left (D) = ∅
(pat = a) ∈ R (D)             if  Left (a) ∩ Def (D) = ∅

We write D|pat=a for the exclusion of the equation pat = a from D. A sequence of equations l = pat1 = a1, ..., patn = an is a feasible schedule of D if l ∈ Sch (D), where:

(pat = a) ∈ Sch (pat = a)  if  Left (a) ∩ Vars(pat) = ∅
(pat = a), l ∈ Sch (D)     if  (pat = a) ∈ R (D) ∧ l ∈ Sch (D|pat=a)

In the remainder, we assume that programs have passed a causality check to ensure the existence of a schedule. The data-flow nature of this language makes the implementation of classical graph-based optimizations (e.g., copy elimination, common-subexpression elimination) particularly easy. We do not detail them here.
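To make the scheduling procedure concrete, here is a small C sketch (our own illustration with hypothetical data structures — the actual compiler works on its AST): it implements the two readiness conditions above and extracts a schedule Kahn-style, applied to the normalized counting equations.

#include <stdbool.h>
#include <stdio.h>

#define MAX_EQS  16
#define MAX_VARS 16

/* One normalized equation, abstracted to what scheduling needs. For a
   memory equation pat = (v fby a)^ck, recall Def(pat = v fby a) = ∅:
   its "def" field records Vars(pat) for the memory rule only and must
   not block ordinary equations. */
typedef struct {
  const char *text;
  bool is_fby;
  bool def[MAX_VARS];   /* variables defined by this equation  */
  bool left[MAX_VARS];  /* variables read instantaneously      */
} Eq;

/* (pat = a) ∈ R(D)            if Left(a) ∩ Def(D) = ∅
   (pat = (v fby a)^ck) ∈ R(D) if Vars(pat) ∩ Left(D) = ∅ */
static bool ready(const Eq *eqs, const bool *pending, int n, int i) {
  for (int j = 0; j < n; j++) {
    if (!pending[j] || j == i) continue;
    for (int v = 0; v < MAX_VARS; v++) {
      if (eqs[i].is_fby) {
        if (eqs[i].def[v] && eqs[j].left[v]) return false;
      } else if (!eqs[j].is_fby) {       /* Def of fby equations is ∅ */
        if (eqs[i].left[v] && eqs[j].def[v]) return false;
      }
    }
  }
  return true;
}

/* Repeatedly emit a ready equation; failure means the dependence
   graph is cyclic, i.e., a causality error. */
static bool schedule(const Eq *eqs, int n, int *order) {
  bool pending[MAX_EQS];
  for (int i = 0; i < n; i++) pending[i] = true;
  for (int k = 0; k < n; k++) {
    int pick = -1;
    for (int i = 0; i < n && pick < 0; i++)
      if (pending[i] && ready(eqs, pending, n, i)) pick = i;
    if (pick < 0) return false;
    order[k] = pick;
    pending[pick] = false;
  }
  return true;
}

int main(void) {
  enum { O, V, M };
  /* o = if tick then v else m + v;  v = if top then 1 else 0;
     m = (0 fby o)  -- the memory introduced for the delay */
  Eq eqs[3] = { {"o = if tick then v else m + v", false, {false}, {false}},
                {"v = if top then 1 else 0",      false, {false}, {false}},
                {"m = (0 fby o)",                 true,  {false}, {false}} };
  eqs[0].def[O] = true; eqs[0].left[V] = true; eqs[0].left[M] = true;
  eqs[1].def[V] = true;
  eqs[2].def[M] = true;                  /* Left(v fby a) = ∅ */
  int order[3];
  if (schedule(eqs, 3, order))
    for (int k = 0; k < 3; k++) printf("%s\n", eqs[order[k]].text);
  return 0;
}

Running the sketch prints the equations in the order v, o, m: the memory assignment is scheduled last, after its only reader o.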


3.2 Putting Equations in Normal Form

We introduce a source-to-source transformation which consists in extracting the stateful computations that appear inside expressions. This is a necessary step towards the translation into sequential code. For example, the following equation (omitting nested clock annotations for clarity):

z = ((((4 fby o) * 3) when True(c)) + k)^(ck on True(c))
and o = (merge c (True → (5 fby (z + 1)) + 2)
                 (False → (6 fby x) when False(c)))^ck

is rewritten into:

z = (((t1 * 3) when True(c)) + k)^(ck on True(c))
and o = (merge c (True → t2 + 2) (False → t3 when False(c)))^ck
and t1 = (4 fby o)^ck
and t2 = (5 fby (z + 1))^(ck on True(c))
and t3 = (6 fby x)^(ck on False(c))

In the same way, node instances (f (a1, ..., an) every e) are extracted from nested expressions. The extraction is made through a linear traversal, introducing an equation for each stateful computation. After the extraction, equations and terms follow the grammar below:

a  ::= e^ck
e  ::= a when C(x) | op (a, ..., a) | x | v
ce ::= merge x (C → ca) ... (C → ca) | e
ca ::= ce^ck
eq ::= x = ca | x = (v fby a)^ck | (x, ..., x) = (f (a, ..., a) every x)^ck
D  ::= D and D | eq

The extraction is straightforward and not detailed here. In the remainder we assume that equations have been normalized. Note that it would also be possible to introduce a new intermediate language instead of the source-to-source transformation. This is essentially a matter of taste, the main advantage of the present formulation being that it avoids redefining auxiliary notions.

4. A SIMPLE OBJECT-BASED LANGUAGE

A classical way to encapsulate a state and a collection of functions that manipulate this state is given by the object paradigm. We are not interested in inheritance and object polymorphism, but only in the capability to encapsulate a piece of memory managed exclusively by the methods of the class. We propose to define a very simple object-based language (in the sense of encapsulation) that will be used as an intermediate language for the translation. Adopting this point of view has two main advantages compared to a direct translation into one target language like C or Java. First, object orientation is a well-known paradigm, and this may help in understanding the basic principles of the first level of our transformation. Second, using it as a generic intermediate language allows us to derive a very simple translation to any target language like C or Java. A stateful stream function or node can be considered as a simple class definition with instance variables and two methods, step and reset. The variables are used to represent the internal state of the node (i.e., one for each delay).

The method step inherits its signature from the node it was generated from, and it implements a single step of the node. The method reset is parameterless and is in charge of the initialization of the state variables. One difference with respect to object orientation is the absence of dynamic object creation, since the block diagrams we consider are not recursive. The syntax of the language is given below. A program is made of a sequence of global definitions (d) of functions and classes. An instruction S may be the assignment of a local variable (x := c) or of a state variable (state (x) := c), a sequence (S ; S), the invocation of the re-initialization method of an object o (o.reset), the invocation of the step method of an object o (o.step (e1, ..., en)), a void statement (skip) or a control structure (case (x) {C1 : S1; ...; Cn : Sn}). If x is of type C1 + ... + C + ... + Cn, we write case (x) {C : S} and case (x) {C1 : skip; ...; C : S; ...; Cn : skip} interchangeably. An expression (e) can be the access to a local variable (x) or to a state variable (state (x)), an immediate integer constant (i) or a value constructor (C), a tuple (e1, ..., en) or a function call (f (e1, ..., en)). A class (f) defines a set of memories (m), a set of instances for the objects used inside the body of the methods step or reset, and these two methods.

d ::= fun f (p) returns (p) = var p in S
    | class f = memory m
                instances j
                reset () returns () = S
                step (p) returns (p) = var p in S
S ::= x := c | state (x) := c | S ; S | skip | o.reset
    | (x, ..., x) = o.step (e, ..., e) | case (x) {C : S; ...; C : S}
e ::= x | v | state (x) | op(e, ..., e)
v ::= C | i
j ::= o : f, ..., o : f
p, m ::= x : t, ..., x : t
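As an illustration (our own rendering in this syntax, not output quoted from the compiler), the counting node of Figure 1 corresponds to a class along the following lines, where x_1 and x_2 are the memories introduced for the two delays; it matches the Java and C code shown in Section 6:

class counting =
  memory x_1 : bool, x_2 : int
  instances
  reset () returns () =
    state (x_1) := True; state (x_2) := 0
  step (tick : bool, top : bool) returns (o : int) =
    var v : int, x_3 : int, b : bool in
    b := state (x_1);
    state (x_1) := False;
    case (top)  {True : v := 1;   False : v := 0};
    case (b)    {True : x_3 := 0; False : x_3 := state (x_2) + v};
    case (tick) {True : o := v;   False : o := x_3};
    state (x_2) := o

Note that counting instantiates no other node, so its instances section is empty.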

5. THE TRANSLATION

The translation closely follows the principles of the co-iterative semantics described in [5]. The main differences are that absent values are not explicitly represented at runtime and that states are modified in place instead of being returned by transition functions. Moreover, we restrict it to the first-order case. We introduce the following notation. If p1 = [x1 : t1; ...; xn : tn] and p2 = [x'1 : t'1; ...; x'k : t'k] then p1 + p2 = [x1 : t1; ...; xn : tn; x'1 : t'1; ...; x'k : t'k], provided for all i, j such that 1 ≤ i ≤ n, 1 ≤ j ≤ k, xi ≠ x'j. [ ] denotes the empty substitution. In the same way, we write m1 + m2 for the composition of two substitutions on memory variables, and j1 + j2 for the composition on object instances. If s1 = S1, ..., Sn and s2 = S'1, ..., S'k are two lists of instructions, we write s1 @ s2 = S1, ..., Sn, S'1, ..., S'k for their concatenation. Clocks in the source language are transformed into control structures in the target language. Intuitively, a computation S on clock base on C1(x1) on C'1(x'1) is transformed into the code case (x1) {C1 : case (x'1) {C'1 : S}}. We define the function Control (., .) such that Control (ck, S) returns a control structure in which S is executed only when ck is true:

Control (base, S)        = S
Control (ck on C(x), S)  = Control (ck, case (x) {C : S})
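As a small worked instance (ours) of this definition: for a computation S whose clock is base on True(c) on False(d), we obtain Control (base on True(c) on False(d), S) = case (c) {True : case (d) {False : S}}, i.e., S is executed only at the instants where c equals True and d equals False.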


We define the function Join(., .) which merges two control structures guarded by the same guards:

Join(case (x) {C1 : S1; ...; Cn : Sn},
     case (x) {C1 : S'1; ...; Cn : S'n})
  = case (x) {C1 : Join(S1, S'1); ...; Cn : Join(Sn, S'n)}
Join(S1, S2) = S1 ; S2   (otherwise)

JoinList(S) = S
JoinList(S1, ..., Sn) = Join(S1, JoinList(S2, ..., Sn))

The translation is defined by a set of mutually recursive functions. TE (m,si,j,d,s) (e) defines the translation of an unannotated expression e in a context (m, si, j, d, s) and returns an expression c of the target language (we overload the notation for annotated expressions a). Here m stands for a memory environment, si for a list of instructions that initialize the memory, j for an environment of node instances, d for an environment of local variables and s for a list of instructions. TA (m,si,j,d,s) (x, ca) defines the translation of an expression which is stored into x and returns a new context. TEq (m,si,j,d,s) (eq) defines the translation of an equation. We use two auxiliary functions: TEList (m,si,j,d,s) (a1, ..., an) translates a list of expressions and returns a list of expressions of the target language, whereas TEqList (m,si,j,d,s) (l) translates a list of equations. The definitions of the translation functions are given in Figure 2. The first six rules apply to stateless expressions. The translation of a merge operator whose result is stored into a pattern pat is obtained by translating each branch and storing the corresponding result in pat. Note that since the result of each branch is annotated with its proper clock, the merge construction does not generate any code by itself. For a node instance (f (a1, ..., an) every x)^ck, we introduce a fresh name o which is an object of class f. The initialization code consists in calling the reset method. The step code consists in calling the reset method of o when x is true and then calling the step method of o; these two actions must be performed only when ck is true. A memory equation x = (v fby a)^ck is translated into an assignment of the state variable x executed when ck is true. Finally, the code generation for a node consists in first scheduling the set of equations and then translating them iteratively.
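As a worked instance (ours) of the memory rule of Figure 2, consider the single normalized equation x = (0 fby y)^(base on True(c)) with x : int. The rule extends the memory with [x : int], prepends state (x) := 0 to the initialization list si, and prepends Control (base on True(c), state (x) := y) = case (c) {True : state (x) := y} to the step instructions s.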

6. TARGET CODE GENERATION

The intermediate language of Section 4 can be quite naturally translated into either a fully-fledged object-oriented language or a low-level imperative language. Our main interest lies in the generation of C code, which is the traditional target of compilers of synchronous languages. Moreover, a compiler for C has recently been certified in Coq [3], which should make it possible to develop a complete certified compiler from Lustre to assembly code. Nonetheless, in order to illustrate the versatility of the intermediate language, we also consider Java code generation.

6.1 Translation into Java

As already pointed out, the intermediate language of Section 4 can be seen as a sequential language with the data encapsulation mechanism characteristic of object-oriented languages. As such, it lends itself to a straightforward translation into existing object-oriented languages, e.g., Java. Each class definition is translated into a Java class definition with two methods step and reset.

TE (m,si,j,d,s) (e^ct)               = TE (m,si,j,d,s) (e)
TE (m,si,j,d,s) (v)                  = v
TE (m,si,j,d+[x:t],s) (x)            = x
TE (m+[x:t],si,j,d,s) (x)            = state (x)
TE (m,si,j,d,s) (op(a1, ..., an))    = let c1, ..., cn = TEList (m,si,j,d,s) (a1, ..., an) in op(c1, ..., cn)
TE (m,si,j,d,s) (a when C(x))        = TE (m,si,j,d,s) (a)

TEList (m,si,j,d,s) (a1, ..., an)    = (TE (m,si,j,d,s) (a1), ..., TE (m,si,j,d,s) (an))

TAList (m,si,j,d,s) (x1, ..., xn)(ca1, ..., can) =
    let (m1, si1, j1, d1, s1) = TA (m,si,j,d,s) (x1, ca1) in
    ...
    TA (m(n-1),si(n-1),j(n-1),d(n-1),s(n-1)) (xn, can)

TA (m,si,j,d,s) (y, (merge x (C1 → ca1) ... (Cn → can))^ck) = TAList (m,si,j,d,s) (y, ..., y)(ca1, ..., can)
TA (m,si,j,d,s) (x, e^ck)            = (m, si, j, d, Control (ck, x := TE (m,si,j,d,s) (e)))

TEq (m,si,j,d,s) (x = ca)            = TA (m,si,j,d,s) (x, ca)
TEq (m,si,j,d+[x:t],s) (x = (v fby a)^ck) =
    let c = TE (m,si,j,d,s) (a) in
    (m + [x : t], [state (x) := v]@si, j, d, [Control (ck, state (x) := c)]@s)
TEq (m,si,j,d,s) ((x1, ..., xk) = (f (a1, ..., an) every x)^ck) =
    let (c1, ..., cn) = TEList (m,si,j,d,s) (a1, ..., an) in
    (m, [o.reset]@si, [(o, f)] + j, d,
     Control (ck, case (x) {True : o.reset})@
     Control (ck, (x1, ..., xk) = o.step (c1, ..., cn))@s)
    where o ∉ Dom(j)

TEqList (m,si,j,d,s) (eq)            = TEq (m,si,j,d,s) (eq)
TEqList (m,si,j,d,s) (eq, l)         = TEq (TEqList (m,si,j,d,s) (l)) (eq)

TP (node f (p) = q with var r in D) =
    let m, si, j, d, s = TEqList ([],[],[],r,[]) (l) in
    class f = memory m
              instances j
              reset () returns () = si
              step (p) returns (q) = var d in JoinList(s)
    where l ∈ Sch (D)

Figure 2: The Translation Function

The state variables specified in the memory section are translated into field declarations. The instance variables specified in the instances section are translated into object creations using their default constructors. Actions and expressions are directly translated into the corresponding Java constructs. In case of multiple outputs, the answer type of the step method is represented as a structure whose fields represent the elements of the tuple. For instance, the counting example of Figure 1 is translated into the following Java code:

public class counting {
  boolean x_1;
  int x_2;

  public void reset() {
    x_1 = true;
    x_2 = 0;
  }

  public int step(boolean tick, boolean top) {
    int o; int x_3; int v; boolean b;
    b = x_1;
    x_1 = false;
    if (top) { v = 1; } else { v = 0; }
    if (b) { x_3 = 0; } else { x_3 = x_2 + v; }
    if (tick) { o = v; } else { o = x_3; }
    x_2 = o;
    return o;
  }
}

6.2 Translation into C

The C code generator follows the principles already demonstrated by the ReLuC compiler (a prototype compiler developed at Esterel Technologies, used as an implementation reference for the next Scade generation). For each class, the state variables specified in the memory section and the instance variables specified in the instances section are gathered in a separate structure, used for representing the internal state of each object. Both the reset and the step functions are translated into functions that accept an additional argument self, passed by reference, that points to a concrete instance of the corresponding state structure (object). If necessary, the answer type of the step function is again represented as a structure that allows tuples to be returned. (As a matter of fact, ReLuC differs from our approach in the way multiple outputs are handled: in ReLuC, a memory structure is extended with an appropriate number of fields for storing the outputs.) Actions and expressions are directly translated into the corresponding C constructs. For instance, the counting example of Figure 1 is translated into the following C code:

typedef struct {
  int x_1;
  int x_2;
} counting_mem;

void counting_reset(counting_mem *self) {
  self->x_1 = 1;
  self->x_2 = 0;
}

int counting_step(int tick, int top, counting_mem *self) {
  int o; int x_3; int v; int b;
  b = self->x_1;
  self->x_1 = 0;
  if (top) { v = 1; } else { v = 0; }
  if (b) { x_3 = 0; } else { x_3 = self->x_2 + v; }
  if (tick) { o = v; } else { o = x_3; }
  self->x_2 = o;
  return o;
}
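For completeness, here is a minimal hand-written harness (ours, not produced by the code generator) showing the intended calling convention — reset once, then one step call per reaction; it assumes the generated definitions above are in the same file:

#include <stdio.h>

int main(void) {
  counting_mem self;
  counting_reset(&self);
  int tick[6] = {0, 0, 1, 0, 0, 1};
  int top[6]  = {1, 1, 0, 1, 0, 1};
  for (int i = 0; i < 6; i++)
    printf("o(%d) = %d\n", i, counting_step(tick[i], top[i], &self));
  return 0;
}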

7. TOWARDS A COMPLETE COMPILER

In this section, we discuss the organization of the entire compiler as well as possible extensions of the technique proposed in this article. The source language we have presented is a first-order data-flow language similar to Lustre. Nonetheless, it exhibits specific constructions which make it both a good target for implementing extensions of Lustre and a good input language for generating efficient sequential code. The two specificities are the n-ary merge (instead of the current operator of Lustre) and a modular reset construct (also absent in Lustre). The merge is used to combine n complementary streams, which introduces a general notion of clocks. The reset is used to restart the behavior of a node. These two constructions can be encoded in Lustre, but then the generated code is inefficient or calls for complex optimization techniques to cancel the effect of the encoding. Providing merge and reset as basic primitives allows for a more direct and efficient compilation. In [6], we proposed a conservative extension of Lustre with hierarchical state automata, basing it on a translation semantics into the clocked data-flow kernel considered in the present article. The merge and reset constructs were used extensively in this encoding. We advocated that this translation not only gives the semantics of the whole language but is also an effective way to implement the compiler, in the sense that the generated code is reasonably good in terms of size and efficiency. This solution has been integrated in the ReLuC compiler of Lustre and in the Lucid Synchrone compiler. Thus, the present article completes this work and supplies the missing part of the compilation chain. Altogether, these results serve as the basis of Scade 6, the next version of Scade. The code generation is done after type checking, clock checking and specific static analyses such as the causality and initialization analyses. If one of these steps fails, the compilation process stops. Type checking is almost standard [20]. The clock calculus rejects programs which cannot be executed synchronously and is defined as a type inference problem [7]. The causality analysis checks the absence of instantaneous loops in order to ensure that a static schedule is feasible. Finally, the initialization analysis checks that the behavior does not depend on the initial values of delays [8]. At the end of these analyses, the program is annotated with type and clock information. Then, constructs that are not part of the data-flow kernel (e.g., control structures such as activation conditions or state machines) are translated into the clocked data-flow kernel. In Section 1, we stressed the importance of modular compilation for separate compilation, code traceability, and for keeping the size of the generated code linear in the


size of the source program. The price to pay is an extra constraint on feedback loops, which must explicitly cross a delay (not nested inside nodes). Thus, in practice, modular compilation affects the causality analysis, which has to reject semantically correct programs because they cannot be compiled modularly. To avoid this restriction, an industrial compiler such as the one in the Scade Suite proposes to inline, on user demand, specific nodes of the model. This feature can also be used to find a good compromise between program size and program speed (as any optimizing compiler silently does). This explains why it is important to complement a synchronous compiler with an inliner. Note that such an inliner is a trivial task in Lustre thanks to its substitution principle. In Section 5 we presented a control optimization which gathers two consecutive control structures on the same guard. There are other optimizations that can be implemented in this translation, particularly around the scheduling policy. The role of scheduling is to transform a partially ordered set of equations into a sequence of assignments. The solution is not unique in general, and we can take advantage of this freedom to favor certain optimizations. For instance, the scheduler can use heuristics which try to schedule equations guarded by the same clock consecutively; the merging of consecutive control structures will then be able to factorize more control conditions. Another classical optimization is related to the reuse of variables (which corresponds to removing copy variables in classical compilation terminology [19]). As mentioned in [14], a stream x and its previous value pre x can be stored in the same variable if the computation of x is not followed by a use of pre x. The ReLuC compiler, as well as the reference compiler we developed to support the present article, implements a scheduling heuristic for that purpose.
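As a hypothetical C illustration of this reuse (ours, not the compiler's actual output): if no read of the memory follows the computation of x in the schedule, the local copy can be eliminated and x stored directly in the state variable.

typedef struct { int x_pre; } mem_t;

/* Before reuse: a local x plus a copy into the memory. */
static int step_before(mem_t *self) {
  int x = self->x_pre + 1;
  self->x_pre = x;               /* the copy the optimization removes */
  return x;
}

/* After reuse: x and pre x share the state variable. */
static int step_after(mem_t *self) {
  self->x_pre = self->x_pre + 1; /* updated in place */
  return self->x_pre;
}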

8. DC AND DC+

This article is related to the work done on the code generation of synchronous languages, in particular Lustre and Signal. We have already pointed out the differences with the academic compiler of Lustre. The distinction with Signal comes from the different expressiveness of our source language and its associated clock calculus. For example, our language only allows functions to be expressed, not relations as Signal does. Moreover, we use a simpler clock calculus based on ML type inference, whereas the clock calculus of Signal calls for boolean resolution [1, 11]. It is not possible, for example, to express in our language the disjunctive clocks of Signal of the form ck1 ∨ ck2 (stating that a value is present if one of the two clocks is true). Clocks are only of the form base on c1 on ... on cn, and they correspond directly to nested control structures. The introduction of an n-ary merge and the general form of clocks presented here do not seem to have been considered in Signal. While this construction could be encoded in Signal, obtaining good code would call for the full expressiveness of its clock calculus. It would be interesting to know whether the resulting code would coincide with the one obtained here with simpler (but dedicated) techniques. This work is also connected with the work on the DC format [13] and its extension DC+ [9], introduced for the compilation of synchronous languages. The DC format allows for the same control properties as the source language we consider. However, as the author of [13] points out,

DC was not considered a programming language, whereas the language we consider does have a static and a dynamic semantics. This means that the result of every step in the compilation chain can be statically type checked or clock checked. This feature is important in compilers used for critical software and has already been used in the qualification process of industrial projects that use Scade as a development tool. Finally, code generation is often related to code distribution (see [12] for a survey and the most recent references). It does not seem, however, that the modular compilation of a language such as the one treated here has been considered in this context.

9. CONCLUSION AND FUTURE WORK

This article has presented the code generation of a synchronous data-flow language into imperative code. This code generation is modular in the sense that each node definition is translated into an independent pair of imperative functions. The principles presented in this article have been in use for several years in the Lucid Synchrone compiler and the ReLuC compiler of Scade/Lustre and have been experimented with on various real-size examples. However, their precise description had never been published before. Such a formalization now appears to be a fundamental need in order to develop a certified compiler for a synchronous language in a proof assistant, as well as to simplify existing implementations. Moreover, it offers an opportunity to replace the process-based certification used today by Scade customers with a stronger mathematical argument based on proof techniques.

Acknowledgements: We thank Alexandre Bertails, Malgorzata Biernacka, Florence Plateau and the anonymous reviewers for useful comments on the presentation of this work.

10. REFERENCES

[1] T. Amagbegnon, L. Besnard, and P. Le Guernic. Implementation of the data-flow synchronous language Signal. In Programming Languages Design and Implementation (PLDI), pages 163–173. ACM, 1995.
[2] A. Benveniste, P. Caspi, S. A. Edwards, N. Halbwachs, P. Le Guernic, and R. de Simone. The synchronous languages 12 years later. Proceedings of the IEEE, 91(1), January 2003.
[3] Sandrine Blazy, Zaynah Dargaye, and Xavier Leroy. Formal verification of a C compiler front-end. In FM 2006: Int. Symp. on Formal Methods, volume 4085 of Lecture Notes in Computer Science, pages 460–475. Springer-Verlag, 2006.
[4] P. Caspi, N. Halbwachs, D. Pilaud, and J. Plaice. Lustre: a declarative language for programming synchronous systems. In 14th ACM Symposium on Principles of Programming Languages. ACM, 1987.
[5] Paul Caspi and Marc Pouzet. A Co-iterative Characterization of Synchronous Stream Functions. In Coalgebraic Methods in Computer Science (CMCS'98), Electronic Notes in Theoretical Computer Science, March 1998. Extended version available as VERIMAG tech. report no. 97–07 at www.lri.fr/~pouzet.
[6] Jean-Louis Colaço, Bruno Pagano, and Marc Pouzet. A Conservative Extension of Synchronous Data-flow with State Machines. In ACM International Conference on Embedded Software (EMSOFT'05), Jersey City, New Jersey, USA, September 2005.
[7] Jean-Louis Colaço and Marc Pouzet. Clocks as First Class Abstract Types. In Third International Conference on Embedded Software (EMSOFT'03), Philadelphia, Pennsylvania, USA, October 2003.
[8] Jean-Louis Colaço and Marc Pouzet. Type-based Initialization Analysis of a Synchronous Data-flow Language. International Journal on Software Tools for Technology Transfer (STTT), 6(3):245–255, August 2004.
[9] Sacres consortium. The declarative code DC+, version 1.4. Technical report, Esprit project EP 20897: Sacres, 1997.
[10] The Coq proof assistant, 2007. http://coq.inria.fr.
[11] Thierry Gautier and Paul Le Guernic. Code generation in the Sacres project. In Towards System Safety, Proceedings of the Safety-critical Systems Symposium, SSS'99, pages 127–149, Huntingdon, UK, February 1999. Springer.
[12] Alain Girault. A survey of automatic distribution method for synchronous programs. In International Workshop on Synchronous Languages, Applications and Programs (SLAP), Edinburgh, UK, April 2005. ENTCS.
[13] N. Halbwachs. The declarative code DC, version 1.2a. Vérimag, Grenoble, France, October 1995. Unpublished report.
[14] N. Halbwachs, P. Raymond, and C. Ratel. Generating efficient code from data-flow programs. In Third International Symposium on Programming Language Implementation and Logic Programming, Passau (Germany), August 1991.
[15] N. Halbwachs and Pascal Raymond. A tutorial of Lustre. http://www-verimag.imag.fr/SYNCHRONE/, 2002.
[16] Grégoire Hamon and Marc Pouzet. Modular Resetting of Synchronous Data-flow Programs. In ACM International Conference on Principles of Declarative Programming (PPDP'00), Montreal, Canada, September 2000.
[17] F. Maraninchi and Y. Rémond. Mode-automata: a new domain-specific construct for the development of safe critical systems. Science of Computer Programming, (46):219–254, 2003.
[18] The MathWorks. http://www.mathworks.com/products/simulink.
[19] Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
[20] B. C. Pierce. Types and Programming Languages. MIT Press, 2002.
[21] Marc Pouzet. Lucid Synchrone, version 3. Tutorial and reference manual. Université Paris-Sud, LRI, April 2006. Distribution available at www.lri.fr/~pouzet/lucid-synchrone.
[22] SCADE. http://www.esterel-technologies.com/scade/, 2007.

Separate Compilation of Hierarchical Real-Time Programs into Linear-Bounded Embedded Machine Code *

Arkadeb Ghosal (UC Berkeley) [email protected]
Daniel Iercan ("Politehnica" U. of Timisoara) [email protected]
Christoph M. Kirsch (University of Salzburg) [email protected]
Thomas A. Henzinger (EPFL) [email protected]
Alberto Sangiovanni-Vincentelli (UC Berkeley) [email protected]

* This work was supported in part by the GSRC grant 2003-DT-660, the NSF grant CCR-0208875, HYCON, the Artist II Network of Excellence on Embedded Systems Design, the European Integrated Project SPEEDS, the SNSF NCCR on Mobile Information and Communication Systems, the Austrian Science Fund Project P18913-N15 and the Center for Hybrid and Embedded Software Systems (CHESS) at UC Berkeley, which receives support from the National Science Foundation (NSF award #CCR-0225610), the State of California Micro Program, and the following companies: Agilent, DGIST, General Motors, Hewlett Packard, Infineon, Microsoft, and Toyota.

Copyright by the authors, APGES 2007, Oct. 4th 2007, Salzburg, Austria.

ABSTRACT

We have recently proposed a coordination language, called Hierarchical Timing Language (HTL), for distributed, hard real-time applications. HTL is a hierarchical extension of Giotto and, like its predecessor, is based on the logical execution time (LET) paradigm of real-time programming. Giotto is compiled into code for a virtual machine, called the Embedded Machine (or E machine). If HTL is targeted to the E machine, the hierarchical program structure needs to be flattened, which makes separate compilation difficult and may result in code of exponential size. In this paper, we propose a generalization of the E machine which supports a hierarchical program structure at runtime through real-time trigger mechanisms that are arranged in a tree. We present the generalized E machine, and a modular compiler for HTL that generates code of linear size. The compiler may generate code for any parts of a given HTL program separately, in any order.

Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: Real-time and embedded systems

General Terms
Language, Compiler, Virtual Machine

Keywords
Real Time, Hierarchy, Code Generation

1. INTRODUCTION

Hierarchical Timing Language (HTL) is a hierarchical coordination language for distributed, hard real-time applications [6]. HTL programs determine the portable and predictable real-time behavior of periodic software tasks running on a possibly distributed system of host computers. An HTL program specifies task-to-host mappings, task frequencies, mode switching, and I/O times and dependencies, but not task implementations, which are assumed to be done in some general-purpose language such as C or Java. A task in HTL is essentially sequential code that reads input, computes, and writes output. HTL offers two fully hierarchical programming constructs: (sequential, conditional, parallel) composition of tasks as well as refinement of abstract into concrete tasks. An abstract task has a frequency, specific I/O times and dependencies, and a worst-case execution time (WCET), but no implementation. An abstract task is a temporally conservative placeholder for a concrete task with an implementation. A concrete task refines an abstract task if the concrete task has the same frequency but at least as much time to compute, i.e., possibly relaxed I/O times and dependencies, and the same or a smaller WCET than the abstract task. The result is that a concrete HTL program is time-safe (schedulable) if it refines a time-safe abstract HTL program [6]. In general, checking refinement in HTL is exponentially faster than checking time safety (schedulability). However, there are abstract HTL programs that are not time-safe but for which time-safe refinements exist. After checking refinement and time safety, HTL programs are compiled into so-called E code of the Embedded Machine [9], or E Machine. E code is virtual machine code with specific instructions for timing I/O activity, native task computation, and host-to-host communication. Time-safe E code is portable and predictable, and therefore provides a hardware- and OS-independent target abstraction for compiling possibly distributed real-time programs. Further compiling E code into native code is possible but has so far not been necessary, even for high-performance applications such as helicopter flight control [12, 11]. However, E code was originally designed as a target for compiling non-hierarchical programs written in Giotto [8], the predecessor of HTL. As a consequence, HTL compilation into conventional E code involves flattening the input HTL programs and may therefore result in exponentially larger output E code programs. Flattening also prohibits compiling parts of large HTL programs separately. In this paper, we propose to extend E code with instructions for maintaining the hierarchical program structure at runtime, to enable separate compilation of (parts of) HTL programs into E code programs whose size is linearly bounded by the size of the HTL code. Our solution trades off runtime performance for compile-time convenience and E code size, because the execution of E code compiled from flattened HTL programs may incur lower runtime overhead than the execution of such hierarchical E code, or HE code. All original and most new instructions can be executed in constant time. However, a single new instruction that involves traversing the hierarchical structure requires time linear in the size of the original HTL program. This can only be avoided by again flattening HTL programs prior to compilation.



For simplicity and clarity, we have chosen to define the new instructions in RISC style, where most instructions have rather simple, "atomic" semantics. So far, runtime performance has not been an issue, but it may easily be improved using CISC-style macro instructions. The contributions of this paper are the design of HE code (Section 5), the design of compile-time support for separate HTL compilation into HE code (Section 6, Section 7), the implementation of runtime support for HE code as part of an existing E Machine implementation, and the implementation of compile-time support for separate HTL compilation into HE code in an existing HTL compiler implementation [1]. Throughout the paper we use a case study (Section 2) to illustrate the contributions of the paper. Sections 3 and 4 discuss the key features of HTL and the E Machine, respectively. Section 8 compares our approach with related work.

2. CASE STUDY

The case study implements a distributed real-time controller for a three-tank system (3TS for short). There are three tanks, T1, T2 and T3 (Fig. 1), each with an evacuation tap: tap1, tap2 and tap3, respectively. The tanks are interconnected via taps tap13 and tap23. Two pumps, P1 and P2, feed water into tanks T1 and T2, respectively. The goal of the controller is to maintain the level of the water in tanks T1 and T2 in the presence and absence of perturbations (simulated by the evacuation taps). If there is no perturbation, a P (proportional) controller is used; under perturbations, a PI (proportional-integral) controller is used [10]. The modeling generates four possible scenarios: (1) both pumps controlled by P controllers, (2) P1 and P2 controlled by P and PI controllers respectively, (3) P1 and P2 controlled by PI and P controllers respectively, and (4) both pumps controlled by PI controllers.

Figure 1: Overview of three tanks system

The controller is implemented in a distributed fashion on three E machines (Fig. 2). Each E machine is implemented in C on a Unix machine. The three E machines implement the controller for P1, the controller for P2, and the interface controller. The tasks are implemented in C. The schedulers of the Unix machines are used for scheduling the released tasks. The E machines communicate with each other through UDP. Communication with the 3TS plant (reading the heights of the water in the tanks and sending the fill debit for each pump) is done via a TCP server implemented on a Windows 98 machine. Refer to http://htl.cs.uni-salzburg.at/HEcode for implementation details and an online demo.

Figure 2: Overview of implementation

3. HIERARCHICAL TIMING LANGUAGE

HTL is centered around two constructs: the core computation and communication model, and the hierarchical programming structure. The first deals with task specification and communication between tasks, while the second deals with composition and refinement of tasks.

Computation and Communication Model. The computation model is the logical execution time (LET) model of task execution. A LET task is a sequential code block with no internal synchronization points. Each task has a release event and a termination event specified by clock ticks or completion events of other tasks. The task reads its inputs at the release event (even if the task starts executing later) and updates its outputs at the termination event (even if the task terminates earlier). The LET model decouples the times when input is read and output is written from the actual execution, which makes the model time- and value-deterministic, portable and composable [9].

The communication model of HTL is based on communicators [6], typed variables that can be accessed (read from or written to) with a specified periodicity. Communicators are used to exchange data with the environment (sensors and actuators are special cases of communicators) or between tasks. A task in HTL reads from certain instances of some communicators, computes a function and writes to certain instances of other communicators. Fig. 3 shows three communicators, h1 (period 100 ms), u1 (period 100 ms) and p1 (period 500 ms): h1 denotes the height in tank T1, u1 denotes the motor current (for pump P1) computed by the controller and p1 denotes the perturbation in tank T1. Task t1 reads the fourth instance of h1, computes the control law for tank T1 and writes to the fifth instance of u1. The latest read time and the earliest write time implicitly specify the LET of the task; in the case of t1, the LET is from 300 to 400 ms. The sequential code of the task is not expressed in HTL but in a "foreign" language (e.g., C in our example).

Figure 3: Interaction between tasks and communicators

Tasks can also communicate with one another through untimed variables referred to as ports. Fig. 3 shows two tasks readHeights and estimateP1 communicating via port p. Task readHeights reads the sensors and writes to the fourth instance of communicator h1. Task estimateP1 reads port p and the fifth instance of u1, computes the perturbation for tank T1 and writes to the second instance of p1. Since task estimateP1 reads the port p, it sees the output of task readHeights as soon as readHeights completes execution and does not have to wait until the fourth instance of h1.

Hierarchical Programming Structure. A set of interacting tasks with the same frequency forms an HTL mode with a specified mode period. For example, the tasks readHeights and estimateP1 belong to mode imode, which has a period of 500 ms. All tasks in a mode execute with the periodicity of the mode. The tasks within a mode interact through ports and communicators; tasks from different modes interact only through communicators. For example, readHeights and t1 are in different modes and interact through communicator h1; readHeights writes h1 while t1 reads h1. HTL allows mode switching (at the end of mode periods) to model changes in real-time controllers. In the complete specification we define two modes oneP and onePI invoking the P and PI control tasks for pump P1, respectively; the modes switch between themselves based on the perturbation in tank T1 (i.e., the value of the communicator p1). A network of modes (with one being the start mode) and mode switches is an HTL module; e.g., the modes oneP and onePI are grouped in one module. An HTL program is a set of modules and a set of communicators. The modes within a module are composed sequentially, while modes from different modules are composed in parallel. The communicators are used to exchange data between tasks in the same module (but possibly different modes) and between tasks in different modules. The HTL program (Fig. 4) for the controller, 3TS Controller, consists of three modules pumpOne, interface and pumpTwo and six communicators. Refer to http://htl.cs.uni-salzburg.at/HEcode for the full specification and the complete program.

Figure 4: HTL program for 3TS controller

A mode (referred to as the parent mode) can be refined by another HTL program (referred to as the refinement program); any mode in the refinement program is a child mode of the parent mode. The mode modeOne is refined (Fig. 4) by program programOne, which has a single module with two modes oneP and onePI. Both oneP and onePI are child modes of the parent mode modeOne. Each task (referred to as a child task) in a child mode maps to a unique task (referred to as a parent task) in the parent mode. Modes modeOne, oneP and onePI invoke tasks t1, t1P and t1PI respectively, with t1 being the parent of the other two tasks. During execution, the parent task is replaced by the child task, i.e., instead of t1, either t1P or t1PI executes. While t1 represents a control task for pump P1, t1P and t1PI are the P and PI versions of the controller, respectively. In other words, a parent task is an abstract specification while child tasks are concrete implementations of the specification. Instead of specifying a functional behavior, a parent task specifies the timing behavior of the concrete task: parent mode and child mode have identical periods, a child task cannot be released later (resp. terminated earlier) than the parent task, and the WCET of the child task is bounded by that of the parent task; these constraints are referred to as refinement constraints. There can be tasks (in the parent mode) which are not parents to any child task and will execute in parallel with the tasks of the child mode. Mode refinement does not add expressiveness; an HTL program with multiple levels of refinement can be translated into an equivalent flat program without refinement. Mode modeOne could be replaced by the switching modes oneP and onePI. However, mode refinement helps in a structured and concise specification. At the top level, mode modeOne invokes the control task for pump P1 without distinguishing the different scenarios. At the second level (program programOne), the distinction is made between absence and presence of perturbations, thus requiring the use of the P and PI control tasks. There can be subsequent refinements (e.g., refOne) which distinguish slower and faster invocations of the PI control task. Refinement helps in the succinct expression of choice (a task is parent to several child tasks in different sequential modes), change (parent and child task have different I/O), space (empty parent tasks that can be refined later) and replacement (replacing a refinement program with another). Mode refinement also helps in conservatively simplifying program analysis: e.g., the schedulability check can be done for the top-level program only, and the refinement constraints preserve [6] schedulability across the hierarchy.

Distribution. HTL modules can be distributed over several hosts. Distribution is specified through a mapping of top-level modules to hosts. All refinements of all modes in a top-level module are bound to the same host to which the module is mapped. The distribution is implemented by replicating shared communicators on all hosts and then having the tasks that write to shared communicators broadcast their outputs. For this purpose, the LET model is extended to include both WCETs and worst-case output transmission times (WCTTs). The semantics (i.e., the real-time behavior) of an HTL program is independent of the number of hosts, but code generation and program analysis take the distribution into account. In the case study, the three modules pumpOne, interface and pumpTwo are implemented on three different hosts.

4. THE EMBEDDED MACHINE

The Embedded Machine, or E Machine, controls the release of tasks and the times when variable values are exchanged (i.e., copied or initialized). The variables are accessed through so-called drivers. A task or a driver is implemented in some other language, e.g., C. In the original E Machine definition there are six E code instructions. There are three non-control-flow instructions: call, release and future. The instruction call(d) executes a driver d. The instruction release(t) releases a task t for execution. The task may not be executed immediately; the actual execution of the task depends on the real-time scheduler being used. The instruction future(e, a) marks the E code at address a for future execution when the predicate e evaluates to true, i.e., when e is enabled. The pair (e, a) is a trigger: the predicate e observes events such as time tick events (raised by the real-time clock) and completion events of tasks (raised by the executing platform) and is enabled when all observed events have occurred. The E machine maintains a FIFO queue of triggers. If multiple triggers in the queue are enabled at the same instant, the corresponding E code is executed in FIFO order, i.e., in the order in which the future instructions that created the triggers were executed. There are two control-flow instructions: if and jump. The conditional instruction if (cnd, a) branches to the E code at address a if the predicate cnd evaluates to true. A condition cnd observes variable states. The non-conditional control-flow instruction jump(a) executes an absolute jump to E code address a. There is one termination instruction, return, which completes the execution of an E code sequence.
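As a small hand-written illustration (ours, not compiler output; d_input, t_ctrl and a0 are made-up names, and the event is written in the (n, cmps) notation of Section 5), E code for a block that reads inputs, releases one task and re-arms itself every 10 time ticks could read:

a0: call(d_input)        // update input variables through a driver
    release(t_ctrl)      // hand t_ctrl to the platform scheduler
    future((10, ∅), a0)  // trigger: re-execute this block after 10 ticks
    return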

Figure 5: Triggers, queue of triggers and implicit tree

We make the following changes to allow the execution of hierarchical code on the E Machine. First, the trigger definitions are modified: each trigger, in addition to an event predicate and an E code address, tracks a parent trigger and a set of children triggers. With the new trigger definition, a trigger queue is an implicit tree (Fig. 5). Second, two stacks are added to track the hierarchy of the program. The stacks are used to remember the position of the code being executed in the hierarchy of the whole program, and to add parent and children information to newly created triggers. Third, the modified E machine maintains three trigger queues instead of one.

22

parallel FIFO queues provide second ordering on simultaneously enabled triggers. In case of code generated for HTL programs, the multiple queues are used to order communicator updates, mode switche checks, communicator reads and task releases. Fourth, E code instructions are modified/ added to operate on the new triggers and to access the stacks and the queues. The new E code is referred as Hierarchical E code (HE code).

5.

The E machine is waiting if none of the triggers in any of the queues are enabled, PC = ⊥ and address stack is empty. The machine is in state writing if there exists at least one enabled trigger in the write queue. The machine is in state switching if there exists no enabled trigger in the write queue but there exists at least one enabled trigger in the switch queue. The machine is in state postswitch if there exists no enabled trigger in the write and the switch queue but there exists at least one enabled trigger in the read queue. If the machine is waiting, a time tick or a task completion event updates the event for the triggers. For a time tick event: for all triggers ((n, ·), ·, ·, ·) where n > 0, the trigger is updated to ((n − 1, ·), ·, ·, ·). For a completion event for task t: for all triggers ((·, cmps), ·, ·, ·) and t ∈ cmps, the trigger is updated to ((·, cmps\ {t}), ·, ·, ·). If the E machine enters into non-waiting state (by enabling some triggers) after handling an event, the write queue is traversed in FIFO order until an enabled trigger is found and the trigger is handled. When a trigger (·, a, ·, ·) is handled, program counter PC is set to a, the name of the trigger is stored in register R0 and the trigger is removed from the queue. The E machine continues the execution at addresses following a until a return instruction is executed. When a return execution is executed, the trigger (which triggered the code execution) is deleted from the system and code execution starts from the address popped from the address stack. This is continued until the address stack is empty. At this point the control starts searching for other enabled triggers in the write queue; if no other trigger is enabled, the machine enters into switching state. If the E machine enters into switching state, the switch queue is traversed in FIFO order (and enabled triggers are handled) until the machine is in state post-switch. If the E machine enters into post-switch state, the read queue is traversed in FIFO order (and enabled triggers are handled) until the machine is in state waiting. The handling of triggers in all the three queues are identical. Next we discuss the effect of executing the HE code instructions. Let the configuration be (state, writeQ , switchQ, readQ , tasks, PC , R0 , R1 , R2 , R3 , parent stack , address stack ) when an instruction at address a is being executed (i.e. PC = a). Once the instruction is executed, the new configuration be (state ′ , writeQ ′ , switchQ ′ , readQ ′ , tasks ′ , PC ′ , R0 ′ , R1 ′ , R2 ′ , R3 ′ , parent stack ′ , address stack ′ ). If ins(a) is being executed, PC ′ = next (a) unless otherwise mentioned. A parameter has the same value over the execution unless otherwise mentioned.

SEMANTICS OF HE CODE

The semantics of an HE code program can be represented as a set of traces where each trace is a sequence of configurations. Each configuration tracks the following: state of program variables, set of released tasks, queues of triggers, address of the current instruction being executed, set of registers storing trigger names, stack of trigger names and stack of addresses. Formally, a trace is a (possibly infinite) sequence of configurations u0 , u1 , · · · where u0 is the starting configuration. Each configuration is a tuple (state, writeQ , switchQ , readQ , tasks , PC , R0 , R1 , R2 , R3 , parent stack , address stack ), where state is variable state, writeQ, switchQ and readQ are FIFO queues of triggers, tasks is a set of tasks, PC is a program counter, R0 , R1 , R2 , and R3 are registers to store trigger names, parent stack is a stack of trigger names, and address stack is a stack of addresses. For any two consecutive configurations ui−1 , ui where i > 0, ui is the result of progress of clock (time tick event), completion of task (task completion event) or execution of an instruction (see below) at configuration ui−1 . The variable state state tracks the values of program variables; e.g. for HTL programs the variables are communicators and ports. The task set tasks tracks the set of tasks released for execution; once a task completes execution the task is removed from tasks . The program counter PC is the address of the current instruction being executed. The set of program addresses is adrset ∪ {⊥}; PC = ⊥ signifies there is no instruction being executed and the E machine is either checking for enabled triggers or waiting for an event. We will denote the instruction at address a as ins(a) and the next address following a as next (a). A trigger g is a tuple (e, a, par , clist), where e is an event, a is an address, par is a trigger name, and clist is a list of trigger names. An event is a pair (n, cmps), where n ∈ N≥0 and cmps is a set of task names. The positive integer n denotes the number of time tick events being waited for. The set cmps denotes the tasks whose completion event is being waited for. A trigger is enabled when n = 0 and cmps = ∅. When a trigger is created, it is assigned an unique name until the trigger is removed. A trigger name is the reference to a trigger; a trigger can be accessed through trigger names. The registers store trigger names. A register can be copied and/or reset without affecting the trigger unless the trigger is removed or modified by HE code instructions. The triggers are unique identities and are not duplicated; however they can be modified when events occur. A trigger may be modified by updating the associated event, changing the parent, or by modifying the children list. The trigger queues writeQ, switchQ and readQ are FIFO queues of triggers. A trigger can be present in at most one queue. The address stack tracks the hierarchical position of the program, mode and module for which code is being executed. The parent stack remembers the hierarchy of the switch triggers. There are two operations to access the stacks: push and pop. Operation push(address stack , a) pushes address a on address stack . Operation pop(address stack ) returns the top value of address stack ; the value is an address a if the stack is non-empty, ⊥ otherwise. Operation push(parent stack , Rx) pushes the trigger name stored in register Rx (where x ∈ 0, 1, 2, 3) on parent stack . 
Operation pop(parent stack ) returns the top value of parent stack ; the value is a trigger name if the stack is non-empty, ⊥ otherwise.
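To make the trigger structure concrete, the following C sketch shows one possible representation of the extended triggers and of the two event updates described above (our own hypothetical rendering; the actual E Machine implementation may differ):

#define MAX_CHILDREN 8

/* A trigger (e, a, par, clist) with event e = (n, cmps): n pending
   time ticks and a set of awaited task completions, here a bitset. */
typedef struct Trigger {
  int n;                   /* time ticks still awaited              */
  unsigned cmps;           /* bitset of awaited task completions    */
  int addr;                /* HE code address to run when enabled   */
  struct Trigger *parent;  /* parent trigger, or 0 at a root        */
  struct Trigger *children[MAX_CHILDREN];
  int nchildren;
} Trigger;

/* A trigger is enabled when n = 0 and cmps = ∅. */
static int enabled(const Trigger *g) { return g->n == 0 && g->cmps == 0; }

/* Time tick: ((n, .), ., ., .) with n > 0 becomes ((n - 1, .), ., ., .). */
static void time_tick(Trigger **queue, int len) {
  for (int i = 0; i < len; i++)
    if (queue[i]->n > 0) queue[i]->n -= 1;
}

/* Completion of task t: cmps becomes cmps \ {t} in every trigger. */
static void task_completion(Trigger **queue, int len, int t) {
  for (int i = 0; i < len; i++)
    queue[i]->cmps &= ~(1u << t);
}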

• ins(a) = call (d ): driver d is executed which updates variable state to state ′ • ins(a) = release(t): tasks ′ = tasks ∪ {t} • ins(a) = writeFuture (e, a): writeQ ′ = writeQ ◦ g ′ where g ′ = (e, a, ⊥, ∅) and R1 ′ stores the name of g ′ • ins(a) = switchFuture (e, a): switchQ ′ = switchQ ◦ g ′ where g ′ = (e, a, ⊥, ∅) and R1 ′ stores the name of g ′ • ins(a) = readFuture (e, a): readQ ′ = readQ ◦ g ′ where g ′ = (e, a, ⊥, ∅) and R1 ′ stores the name of g ′ • ins(a) = jumpIf (cnd , a): if condition cnd is true, then PC ′ = a ′ else PC ′ = next (a) • ins(a) = jumpAbsolute(a ′ ): PC ′ = a ′ • ins(a) = jumpSubroutine(a ′ ): PC ′ = a ′ and address stack ′ = push(address stack , next (a)) • ins(a) = copyRegister (Rx, Ry) where x, y ∈ {0, 1, 2, 3} and x 6= y: copy the content of register Rx to register Ry • ins(a) = pushRegister (Rx) where x ∈ {0, 1, 2, 3}: push the content of register Rx on to parent stack i.e. parent stack ′ = push(parent stack , Rx)

23

• ins(a) = popRegister (Rx) where x ∈ {0, 1, 2, 3}: pop content from parent stack to register Rx i.e. Rx′ = pop(parent stack )

queue. The writing of communicators in a module, reading of communicators in a mode and releasing of tasks in a mode are independent of other modes, modules and programs. The above holds if the HTL program is race free (ensured by structural checks) and if all communicators are written before they are read (ensured by handling triggers in the write queue before that of the switch and the read queue). However checking switches (and subsequent actions) in a mode depend on other modes. For code generated from HTL, triggers in the write and the read queue have no parent and children information; in other words they do not carry any hierarchy information. Only triggers in the switch queue have hierarchy information. In HTL, switches for a parent mode and its children modes are enabled simultaneously due to constraints on timing behavior. The HTL semantics prioritizes the mode switch check (and subsequent action) of the parent mode over those of the children. Consider an instance when modes modeOne, onePI and oneSlow are active (Fig. 6). Mode modeOne has no mode switches i.e. it is invoked repeatedly. There are three possible scenarios: (1) none of the modes switches, (2) only oneSlow switches to oneFast i.e. the new combination is modeOne, onePI and oneFast, and (3) onePI switches i.e. the new combination is modeOne and oneP; the switch of oneSlow does not matter in the transition.

• ins(a) = getParent(Rx, Ry) where x, y ∈ {0, 1, 2, 3} and x ≠ y: load the name of the parent of the trigger pointed to by Rx into register Ry
• ins(a) = setParent(Rx, Ry) where x, y ∈ {0, 1, 2, 3} and x ≠ y: the trigger name in Ry is stored as the parent of the trigger pointed to by register Rx
• ins(a) = copyChildren(Rx, Ry) where x, y ∈ {0, 1, 2, 3} and x ≠ y: the children list of the trigger pointed to by Ry is stored as the children list of the trigger pointed to by register Rx
• ins(a) = setParentOfChildren(Rx, Ry) where x, y ∈ {0, 1, 2, 3} and x ≠ y: set the trigger name in Ry as the parent of all the triggers in the children list of the trigger pointed to by register Rx
• ins(a) = deleteChildren(Rx) where x ∈ {0, 1, 2, 3}: for all trigger names in the children list of the trigger referred to by register Rx, (recursively) delete the triggers pointed to by the children list and remove those triggers from their queues
• ins(a) = replaceChild(Rx, Ry, Rz) where x, y, z ∈ {0, 1, 2, 3} and x ≠ y ≠ z: in the children list of the trigger pointed to by register Rx, replace the trigger name identical to that in Ry by the trigger name in Rz
• ins(a) = cleanChildren(Rx) where x ∈ {0, 1, 2, 3}: delete the children list of the trigger pointed to by register Rx
• ins(a) = return(): PC′ = pop(address stack)

Once a trigger is handled and removed from its queue, the trigger is deleted from the system when the code block (started by the trigger) ends. For a general HE code program, a garbage collector may be necessary to properly remove all de-referenced triggers and to ensure that there is no reference fault (a trigger name being used although the trigger itself has been deleted). Code generated from an HTL program does not create such problems, so we omit the definition of a formal garbage collector. All of the above instructions except deleteChildren can be executed in constant time; the execution of deleteChildren requires time linear in the size of the original HTL description of the involved children.

The E machine starts with the following configuration: state holds the default value of each variable, writeQ = ∅, switchQ = ∅, readQ = ∅, tasks = ∅, PC = ⊥, R0 = ⊥, R1 = ⊥, R2 = ⊥, R3 = ⊥, address stack = ∅, and parent stack = ∅.
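To make the cost argument concrete, the following C sketch (our own illustration; lookup, dequeue and free_trigger are assumed helpers, and the struct mirrors the earlier sketch) shows why deleteChildren is the only non-constant-time instruction: it must walk the whole subtree of triggers below the one named in Rx.

    #include <stddef.h>

    typedef unsigned char trigger_name;    /* as in the sketch above */
    #define NIL ((trigger_name)0)

    typedef struct trigger {
        trigger_name clist[8];             /* children list, NIL-terminated */
        /* event, address and parent fields omitted here */
    } trigger;

    trigger *lookup(trigger_name t);       /* assumed: resolve name to trigger */
    void     dequeue(trigger_name t);      /* assumed: remove from its queue   */
    void     free_trigger(trigger_name t); /* assumed: delete, reclaim name    */

    /* Recursively delete the subtree below trigger t: linear in the number
     * of descendants, hence deleteChildren's non-constant cost. */
    void delete_children(trigger_name t) {
        trigger *g = lookup(t);
        for (size_t i = 0; i < 8 && g->clist[i] != NIL; i++) {
            delete_children(g->clist[i]);  /* grandchildren first */
            dequeue(g->clist[i]);          /* remove child from its queue */
            free_trigger(g->clist[i]);     /* reclaim its name */
            g->clist[i] = NIL;
        }
    }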

6. HANDLING HIERARCHY IN HE CODE



There are two major concerns for handling HTL programs in HE code: tracking the current position in the hierarchy (i.e., which program, module or mode is being executed) and maintaining the hierarchical relation between modes. The first is done by subroutine-like calls that initialize and execute programs, modules and modes; see Section 7 for details. Intuitively, the address stack stores the addresses of programs, modules and modes in a tree-like fashion, so that the E machine knows which program, module or mode is to be initialized or executed once the current one has been handled. Maintaining the hierarchical relation is more involved and is done through triggers and HE code instructions. For HTL programs, the compiler generates triggers as follows: all triggers associated with writing communicators are stored in the write queue, all triggers associated with mode switch checks are stored in the switch queue, and all triggers associated with reading communicators (and subsequently releasing tasks) are stored in the read queue.

The writing of communicators in a module, the reading of communicators in a mode and the releasing of tasks in a mode are independent of other modes, modules and programs. This holds if the HTL program is race free (ensured by structural checks) and if all communicators are written before they are read (ensured by handling triggers in the write queue before those of the switch and the read queues). However, checking switches (and the subsequent actions) in a mode depends on other modes. For code generated from HTL, triggers in the write and the read queues carry no parent or children information; in other words, they carry no hierarchy information. Only triggers in the switch queue carry hierarchy information. In HTL, switches for a parent mode and its children modes are enabled simultaneously due to constraints on timing behavior. The HTL semantics prioritizes the mode switch check (and subsequent action) of the parent mode over those of the children. Consider an instance when modes modeOne, onePI and oneSlow are active (Fig. 6). Mode modeOne has no mode switches, i.e. it is invoked repeatedly. There are three possible scenarios: (1) none of the modes switches, (2) only oneSlow switches to oneFast, i.e. the new combination is modeOne, onePI and oneFast, and (3) onePI switches, i.e. the new combination is modeOne and oneP; the switch of oneSlow does not matter in this transition.

Figure 6: Mode switch for HTL programs

The switching action of HTL is reflected in the HE code as follows. The compiler generates code in such a way that there is exactly one trigger per mode in the switch queue, i.e. the implicit tree in the switch queue is the hierarchy of the modes in the program. When a trigger in the switch queue is enabled, the corresponding mode switch is checked; if the mode switch is false then the mode is reinvoked, otherwise all triggers (in the switch queue) related to the modes in the refinement program of the mode are removed and the target mode is invoked.

Figure 7: Handling switch checks in HE code

Consider the situation when modes modeOne, onePI and oneSlow are executing and the switch condition for onePI is true. Fig. 7.a shows the associated triggers in the switch queue; instead of the queue, the implicit tree structure is shown. First, the triggers in the switch queue from the refinement program of onePI are removed (Fig. 7.b). A new trigger for the target mode oneP is generated (Fig. 7.c), the parent information is transferred to the new trigger (Fig. 7.d), and the trigger for mode onePI is removed. The trigger for mode oneSlow is removed without even checking whether its switch condition is true or false. In another scenario, consider that the mode switch condition of onePI is false, i.e. the mode will be reinvoked. First a new trigger is created for mode onePI in the switch queue (Fig. 7.e), with no parent and children information. Next, the parent and children information of the old trigger for onePI is redirected to the new trigger for onePI (Fig. 7.f) and the old trigger for onePI is removed from the switch queue. The E machine will next traverse the queue to check the mode switch for oneSlow.

7. HTL COMPILER



The compiler (Fig. 8) for HTL ensures that the program satisfies the constraints on parallel composition of modules, refinement of modes and timing of tasks relative to the target platform [6]. The WCET/WCTT information for tasks is provided by an external tool. If the checks go through, the HE code generator generates code for a distributed implementation. The code generation is done by compiling the whole program for each host. Each host maintains its own copies of all communicators and ports; however, tasks are executed on a host only if the corresponding mode (in which the task is invoked) is mapped onto that host. Whenever a task completes execution, the output is broadcast to all hosts and stored in local ports; when a communicator (on a host) is to be written, the value of the local port is copied to the communicator. Released tasks are dispatched for execution by an EDF scheduler; the scheduler is external to the E machine.

Figure 8: Structure of compiler and runtime system

The compiler generates code for programs, modules and modes by invoking Alg. 1, Alg. 2 and Alg. 3, respectively. The compiler uses symbolic addresses to refer to different parts of the code. For each program P, program init address[P] and program start address[P] denote the addresses of the HE code blocks that initialize and execute P, respectively. For each module M, module init address[M] and module start address[M] denote the addresses of the HE code blocks that initialize and execute M, respectively. For each mode m, mode start address[m] is the address of the HE code block that starts m, and target mode address[m] is the address of the HE code block that is executed when another mode switches to m. Each mode m is divided into uniform units whose duration corresponds to the smallest separation between two time events (i.e., communicator writes or reads) in m. Given a mode m, the duration of a unit γ[m] is the gcd of the access periods of all communicators accessed (i.e. read or written) in m, and the total number of units is π[m]/γ[m], where π[m] is the period of m. (For example, if a mode with period π[m] = 500 accesses communicators with periods 100 and 250, then γ[m] = gcd(100, 250) = 50 and the mode has π[m]/γ[m] = 10 units.) For each unit i of every mode m, the compiler generates separate code blocks for updating communicators, checking switches (and related actions) and reading communicators (and releasing tasks): the address of the HE code block that writes communicators is mode unit write[m, i], the address of the HE code block that checks switch conditions is mode unit switch[m, i], and the address of the HE code block that reads communicators is mode unit read[m, i]. The HTL semantics requires that at any instant communicator writes, mode switch checks, communicator reads and task releases be performed in that order, to maintain consistency of communicator values across all modules. The address of the HE code block that sets up the execution order of communicator writes, switch checks and communicator reads (and task releases) is mode body address[m]. Instructions may forward-reference any of the above symbolic addresses and therefore need fix-up during compilation.

Alg. 1 generates code for a program P on a host h. The code at address program init address[P] initializes all communicators declared in P by calling the respective initialization drivers (init(·) denotes the initialization driver for a communicator or a port) and then calls the initialization subroutine of each module. Code at address program start address[P] calls the start subroutine of each module M in P.

Algorithm 1 GenerateECodeForProgramOnHost(P, h)
  set program init address[P] to PC and fix up
  // initialize communicators
  ∀c ∈ communicators(P): emit(call(init(c)))
  // initialize all the modules in P
  ∀M ∈ modules(P): emit(jumpSubroutine(module init address[M]))
  // return from initialization subroutine of P
  emit(return)
  set program start address[P] to PC and fix up
  // start all the modules in P
  ∀M ∈ modules(P): emit(jumpSubroutine(module start address[M]))
  // return from start subroutine of P
  emit(return)
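The "set X to PC and fix up" steps above can be realized as standard backpatching. The following C sketch is our own illustration (the symbol representation, array sizes and function names are assumptions, not the authors' compiler):

    #include <stddef.h>

    #define MAX_FIXUPS 256

    typedef struct { unsigned at; int symbol; } fixup;  /* patch site, symbol id */

    static unsigned code[4096];     /* emitted HE code, one word per slot */
    static unsigned PC;             /* next free address                  */
    static fixup    pending[MAX_FIXUPS];
    static size_t   npending;

    /* Emit an instruction whose operand refers to a not-yet-known symbolic
     * address: leave a placeholder and record the patch site. */
    void emit_ref(unsigned opcode, int symbol) {
        code[PC++] = opcode;
        pending[npending++] = (fixup){ PC, symbol };
        code[PC++] = 0;             /* placeholder operand */
    }

    /* "set X to PC and fix up": the symbol's value is now PC, so patch
     * every recorded forward reference to it. */
    void fix_up(int symbol) {
        for (size_t i = 0; i < npending; i++)
            if (pending[i].symbol == symbol)
                code[pending[i].at] = PC;
    }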


Alg. 2 generates code for a module M on host h. Code at address module init address[M] initializes all task ports (denoted by taskPorts(M)) of the tasks in M by calling the respective initialization drivers. All tasks maintain two sets of local ports, called task input ports and task output ports, which are not accessible by other tasks. At release, a task reads communicators and ports into its task input ports and executes on the values of the task input ports. At completion, the task output ports are updated. The communicators and ports are written from the task output ports when the writing is due. Code at module start address[M] calls the execution code for the start mode start[M] of the module M.

Algorithm 2 GenerateECodeForModuleOnHost(M, h)
  set module init address[M] to PC and fix up
  // initialize task ports
  ∀p ∈ taskPorts(M): emit(call(init(p)))
  // return from initialization subroutine of M
  emit(return)
  set module start address[M] to PC and fix up
  // start the start mode of M
  emit(jumpSubroutine(mode start address[start[M]]))
  // return from start subroutine of M
  emit(return)

We will use the following auxiliary operators for Alg. 3. The set readDrivers(m, i) contains the drivers that load the tasks in mode m with the values of the communicators that are read by these tasks at unit i. The set writeDrivers(m, i) contains the drivers that load the communicators with the outputs of the tasks in mode m that write to these communicators at unit i. The set portDrivers(t) contains the drivers that load the task input ports of task t with the values of the ports on which t depends. The set complete(t) contains the events that signal the completion of the tasks on which task t depends, and that signal the read time of task t. The set releasedTasks(m, i) contains the tasks in mode m, with no precedences, that are released at unit i. The set precedenceTasks(m) contains the tasks in mode m that depend on other tasks.
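The task input/output port discipline described at the start of this subsection amounts to double buffering. Here is a minimal C sketch, with the port count, element type and driver granularity as illustrative assumptions:

    enum { NPORTS = 4 };            /* illustrative port count */

    typedef struct {
        int in[NPORTS];             /* task input ports: snapshot at release */
        int out[NPORTS];            /* task output ports: set at completion  */
    } task_ports;

    /* Read driver, run at release: the task computes only on the snapshot
     * in `in`, so later communicator updates cannot affect a running task. */
    void read_driver(task_ports *p, const int *communicators) {
        for (int i = 0; i < NPORTS; i++)
            p->in[i] = communicators[i];
    }

    /* Write driver, run when the communicator write is due rather than at
     * task completion: this yields the logical-execution-time behavior. */
    void write_driver(const task_ports *p, int *communicators) {
        for (int i = 0; i < NPORTS; i++)
            communicators[i] = p->out[i];
    }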

Alg. 3 first emits code (at address mode start address[m]) for checking all the mode switches in a mode m (lines 1 - 3), so that they are tested the first time m is invoked. Next, code is generated (at address target mode address[m]) to handle the case when no switch is enabled: a call to the code at mode body address[m], followed by a call to the refinement program (if any). This schedules the execution of a mode before the execution of its refinement program. Code at mode body address[m] (lines 40 - 49) sequences the execution order of communicator writes, switch checks and communicator reads (and subsequent task releases) for unit zero of mode m. This is done by emitting a future instruction (line 41) for mode unit write[m, 0] (a trigger is added to writeQ), a future instruction (line 42) for mode unit switch[m, 0] (a trigger is added to switchQ) and a future instruction (line 49) for mode unit read[m, 0] (a trigger is added to readQ). Whenever a trigger is created and added to a queue, its name is stored in register R1.

Once a trigger has been added to the switch queue, the hierarchy information has to be updated (lines 43 - 48). There are two scenarios: one, the code is invoked by handling an enabled trigger in the switch queue, i.e. a mode switch has occurred or a mode is being reinvoked (lines 28 - 39); and two, the code is invoked when a mode is executed for the first time (line 5). In both scenarios register R0 records the relevant hierarchy information: in the first scenario it stores the name of the last trigger in the switch queue that was handled (by the semantics, if any trigger is handled its name is stored in R0); in the second scenario it stores the name of the last trigger in the switch queue that was created. The code in lines 43 - 47 redirects the parent and children of R0 to R1. A copy of R1 is stored in R2 (line 48), because a new trigger for the read queue may overwrite in R1 the name of the last trigger added to the switch queue.

Code emission at lines 6 - 17 checks whether a refinement program exists and, if so, updates the hierarchy information. Before the code generation for the refinement program (line 12), the hierarchy is updated (lines 7 - 11), as refinement adds one level of hierarchy; once the code generation for the refinement program completes, the level is restored (lines 13 - 16). The hierarchy is updated through register R0. The parent of R0 is pushed onto the stack (lines 8 - 9); the parent of the trigger pointed to by R0 is changed to the trigger name in R2 (which contains the name of the last trigger added to the switch queue), and the children list is reset (code for the refinement program has not yet been generated, so there is no children information). In effect, during code generation for the refinement program, the parent of R0 points to the parent trigger of all the triggers to be added to the switch queue for that program. To restore the hierarchy level, the parent of R0 is updated by popping the parent stack; it is then used by the modes of parallel modules.

The code at mode unit write[m, i] (lines 23 - 27) calls the driver of each communicator being written at unit i of mode m. The code at mode unit switch[m, i] (lines 29 - 39) checks the mode switches. In HTL, modes can switch only at period boundaries, so the switches are checked only for unit zero (line 28). If no mode switch occurs (line 33), the code jumps to mode body address[m].
If a mode switch occurs, then all children of the last enabled trigger in the switch queue (whose name is stored in register R0) are removed (lines 34 - 37). The removal of children is recursive; thus all children of subsequent children are also removed. Once the children are removed, the code jumps (lines 38 - 39) to the target address target mode address[m′] of the destination mode m′. The code at mode unit read[m, i] (lines 52 - 71) reads all communicators that are to be read at unit i (by calling drivers that copy from communicators into task input ports), and releases all tasks with no precedences that should be released at unit i. For unit zero (line 58), code is also generated to release tasks with precedences (lines 59 - 69): for each task t with precedences, a trigger is added to readQ that is activated at the completion of the tasks preceding t; the subsequent code writes the input ports of t and releases t. Lines 72 - 76 emit code to jump from one unit to the next; this code adds triggers to the write and the read queues only, since switches are not possible in the middle of HTL modes. The code generation algorithm for a program, module or mode accesses other programs, modules and modes only through symbolic addresses and does not influence their code generation. Thus, parts of an HTL program can be compiled separately, in any order.

Algorithm 3: GenerateECodeForModeOnHost(m, h)
 0  set mode start address[m] to PC and fix up
 1  // check mode switches
 2  ∀(cnd, m′) ∈ switches(m):
 3    emit(jumpIf(cnd, target mode address[m′]))
 4  set target mode address[m] to PC and fix up
 5  emit(jumpSubroutine(mode body address[m]))
 6  if (program P refines m)
 7    // increment the level
 8    emit(getParent(R0, R3))
 9    emit(pushRegister(R3))
10    emit(setParent(R0, R2))
11    emit(cleanChildren(R0))
12    emit(jumpSubroutine(program start address[program[m]]))
13    // decrement the level
14    emit(popRegister(R3))
15    emit(setParent(R0, R3))
16    emit(cleanChildren(R0))
17  end if
18  // return from start subroutine of m
19  // OR wait for other triggers to become enabled
20  emit(return)
21  i := 0
22  while i < π[m]/γ[m] do
23    set mode unit write[m, i] to PC and fix up
24    // write communicators from task output ports
25    ∀d ∈ writeDrivers(m, i): emit(call(d))
26    // wait for other triggers to become enabled
27    emit(return)
28    if (i = 0)
29      set mode unit switch[m, 0] to PC and fix up
30      // check mode switches
31      ∀(cnd, m′) ∈ switches(m):
32        emit(jumpIf(cnd, PC + 2))
33        emit(jumpAbsolute(PC + 4))
34        // cancel all triggers related to the refining
35        // program of m, and its subprograms
36        emit(deleteChildren(R0))
37        emit(cleanChildren(R0))
38        // switch to mode m′
39        emit(jumpAbsolute(target mode address[m′]))
40      set mode body address[m] to PC and fix up
41      emit(writeFuture(π[m], mode unit write[m, 0]))
42      emit(switchFuture(π[m], mode unit switch[m, 0]))
43      emit(getParent(R0, R3))
44      emit(replaceChild(R3, R0, R1))
45      emit(setParentOfChildren(R0, R1))
46      emit(setParent(R1, R3))
47      emit(copyChildren(R1, R0))
48      emit(copyRegister(R1, R2))
49      emit(readFuture(0, mode unit read[m, 0]))
50      emit(return)
51    end if
52    set mode unit read[m, i] to PC and fix up
53    if (mode m is contained in a module on host h)
54      // read communicators into task input ports
55      ∀d ∈ readDrivers(m, i): emit(call(d))
56      // release tasks with no precedences
57      ∀t ∈ releasedTasks(m, i): emit(release(t))
58      if (i = 0)
59        // release tasks with precedences
60        ∀t ∈ precedenceTasks(m):
61          // wait for tasks on which t depends to complete
62          emit(readFuture(complete(t), PC + 2))
63          emit(jumpAbsolute(PC + 3 + |portDrivers(t)|))
64          // read ports of tasks on which t depends,
65          // then release t
66          ∀d ∈ portDrivers(t): emit(call(d))
67          emit(release(t))
68          // wait for other triggers to become enabled
69          emit(return)
70      end if
71    end if
72    if (i < π[m]/γ[m] − 1)
73      // jump to the next unit of mode m
74      emit(writeFuture(γ[m], mode unit write[m, i + 1]))
75      emit(readFuture(γ[m], mode unit read[m, i + 1]))
76    end if
77    // wait for other triggers to become enabled
78    // OR return from body subroutine of m
79    emit(return)
80    i := i + 1
81  end while
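As a recap of the hierarchy maintenance, the relinking sequence emitted at lines 43 - 48 of Alg. 3 can be read as the following C-style pseudocode; the function names mirror the HE code instructions, while the register variables and signatures are our own illustration:

    typedef unsigned char trigger_name;

    /* Registers and instruction-like helpers; signatures are illustrative. */
    extern trigger_name R0, R1, R2, R3;
    extern trigger_name get_parent(trigger_name t);
    extern void replace_child(trigger_name parent, trigger_name oldc, trigger_name newc);
    extern void set_parent_of_children(trigger_name t, trigger_name parent);
    extern void set_parent(trigger_name t, trigger_name parent);
    extern void copy_children(trigger_name dst, trigger_name src);

    /* R0 names the old switch trigger of the mode, R1 the newly created one. */
    void relink_switch_trigger(void) {
        R3 = get_parent(R0);              /* 43: getParent(R0, R3)            */
        replace_child(R3, R0, R1);        /* 44: parent's child R0 becomes R1 */
        set_parent_of_children(R0, R1);   /* 45: R0's children point to R1    */
        set_parent(R1, R3);               /* 46: setParent(R1, R3)            */
        copy_children(R1, R0);            /* 47: R1 inherits R0's children    */
        R2 = R1;                          /* 48: copyRegister(R1, R2)         */
    }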

8. COMPARISON AND RELATED WORK


E code vs. HE code. The E code and the HE code are compared in two ways: runtime overhead and the size of the code generated by the HTL compiler. We measured the time spent interpreting E code and HE code for the HTL program of the 3TS case study; the delay introduced by code interpretation is below 1% for both E code and HE code.

Figure 9: Number of E code instructions (generated E code size, as a function of the number of programs and the number of modules)

Figure 10: Number of HE code instructions (generated HE code size, as a function of the number of programs and the number of modules)

The code size is compared for HTL descriptions with m programs (i.e. one top-level program and m − 1 refinement programs) and n modules (m ≤ n, i.e. we ruled out empty programs), where each module has two modes switching between themselves. For each such scenario there are a number of possible HTL descriptions. Consider the case when m = 2 and n = 3; with the above restrictions there is one top-level program and one refinement program (refining one of the modes of the top-level program). There are two possible HTL descriptions: a top-level program with two modules (i.e. a refinement program with one module), and a refinement program with two modules (i.e. a top-level program with one module). For each m and n, the worst-case code sizes for E code and HE code are compared. The number of HE code instructions depends only on the number of programs and modules and is thus fixed across all descriptions for given m and n. The number of E code instructions depends on the flattening and thus varies widely across the different descriptions for given m and n. Fig. 9 and Fig. 10 compare the code sizes of the E code and the HE code, respectively, for 1 ≤ m ≤ 10 and 1 ≤ n ≤ 10. The worst-case E program (7177 E code instructions) is an order of magnitude larger than the corresponding HE program (555 HE code instructions).

Code Generation for Timed Languages. Timed languages were pioneered by Giotto [8]. In Section 1 we discussed the difference in code generation between a flat structure like Giotto and our proposed approach for HTL. Other LET-based languages include TDL [5] and Timed Multitasking (TM) [13]. Like Giotto, TDL is restricted to one level of periodic tasks, and its code generation technique does not address hierarchical programs. TM, an actor-based language, uses an event-triggered approach by expressing LET through deadlines. TM can express hierarchy by having actors defined in other actors; however, its code generation does not explicitly address the hierarchical structure.

Code Generation for Synchronous Languages. Synchronous languages (e.g. Esterel [3] and Lustre [7]) theoretically subsume HTL; however, HTL offers an explicit hierarchical program structure that supports refinement of tasks into task groups with precedences. Simulink-to-SCADE/Lustre-to-TTA [4] is a tool chain that accepts discrete-time models written in Simulink, translates them to Lustre models, verifies system properties (e.g. schedulability) and generates code for a target time-triggered architecture. Taxys [2], a tool chain that combines Esterel and the model checker Kronos, generates an application-specific scheduler that ensures the timing commitments of tasks. Our code generation technique differs from these two approaches in accounting for the hierarchical structure (e.g. Simulink models are hierarchical but Lustre models are not, which forces the code generator to flatten the structure) and in generating code for a virtual machine (both tool chains above generate code for a specific target), which makes the generated code portable across implementations.



9. CONCLUSION

Previously we presented an implementation of HTL, a hierarchical coordination language for distributed hard real-time applications, on the E Machine, a virtual machine. However, HTL programs had to be flattened because of the limitations of the E Machine. This paper presents a modified E Machine that enables separate and linear-space-bounded compilation of HTL. We introduced the semantics of the modified E Machine and the changes to the compile-time and runtime infrastructure. In the future, we plan to use the modified E Machine for high-performance, 50-100Hz helicopter flight control [1].


10. REFERENCES

[1] J. Auerbach, D. F. Bacon, D. T. Iercan, C. M. Kirsch, V. T. Rajan, H. Röck, and R. Trummer. Java takes flight: Time-portable real-time programming with exotasks. In LCTES, 2007. ACM.
[2] V. Bertin, E. Closse, M. Poize, J. Pulou, J. Sifakis, P. Venier, D. Weil, and S. Yovine. Taxys = Esterel + Kronos. A tool for verifying real-time properties of embedded systems. In Conference on Decision and Control, 2001. IEEE.
[3] F. Boussinot and R. de Simone. The ESTEREL language. Proceedings of the IEEE, 79(9), 1991.
[4] P. Caspi, A. Curic, A. Maignan, C. Sofronis, S. Tripakis, and P. Niebert. From Simulink to SCADE/Lustre to TTA: a layered approach for distributed embedded applications. In LCTES, 2003. ACM.
[5] E. Farcas, C. Farcas, W. Pree, and J. Templ. Transparent distribution of real-time components based on logical execution time. In LCTES, 2005. ACM.
[6] A. Ghosal, D. Iercan, T. A. Henzinger, C. M. Kirsch, and A. Sangiovanni-Vincentelli. A hierarchical coordination language for interacting real-time tasks. In EMSOFT, 2006. ACM.
[7] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data-flow programming language LUSTRE. Proceedings of the IEEE, 79(9), 1991.
[8] T. A. Henzinger, B. Horowitz, and C. M. Kirsch. Giotto: A time-triggered language for embedded programming. Proceedings of the IEEE, 91, 2003.
[9] T. A. Henzinger and C. M. Kirsch. The Embedded Machine: Predictable, portable real-time code. In PLDI, 2002. ACM.
[10] D. T. Iercan. TSL compiler. Technical report, 'Politehnica' University of Timisoara, 2005.
[11] C. M. Kirsch, M. A. A. Sanvido, and T. A. Henzinger. A programmable microkernel for real-time systems. In USENIX VEE, 2005. ACM.
[12] C. M. Kirsch, M. A. A. Sanvido, T. A. Henzinger, and W. Pree. A Giotto-based helicopter control system. In EMSOFT, 2002. LNCS 2491, Springer.
[13] J. Liu and E. A. Lee. Timed multitasking for real-time embedded software. IEEE Control Systems Magazine, 23(1), 2003.




A Domain-Specific Language for Programming Self-Reconfigurable Robots Ulrik P. Schultz, David Christensen, Kasper Støy University of Southern Denmark

ABSTRACT

A self-reconfigurable robot is a robotic device that can change its own shape. Self-reconfigurable robots are commonly built from multiple identical modules that can manipulate each other to change the shape of the robot. The robot can also perform tasks such as locomotion without changing shape. Programming a modular, self-reconfigurable robot is however a complicated task: the robot is essentially a real-time, distributed embedded system, where control and communication paths often are tightly coupled to the current physical configuration of the robot. To facilitate the task of programming modular, self-reconfigurable robots, we have developed a declarative, role-based language that allows the programmer to define roles and behavior independently of the concrete physical structure of the robot. Roles are compiled to mobile code fragments that distribute themselves over the physical structure of the robot using a dedicated virtual machine implemented on the ATRON self-reconfigurable robot.

1. INTRODUCTION


A self-reconfigurable robot is a robot that can change its own shape. Self-reconfigurable robots are built from multiple identical modules that can manipulate each other to change the shape of the robot [4, 9, 11, 14, 16, 18, 24, 23]. The robot can also perform tasks such as locomotion without changing shape. Changing the physical shape of a robot allows it to adapt to its environment, for example by changing from a car configuration (best suited for flat terrain) to a snake configuration suitable for other kinds of terrain. Programming self-reconfigurable robots is however complicated by the need to (at least partially) distribute control across the modules that constitute the robot, and furthermore to coordinate the actions of these modules. Algorithms for controlling the overall shape and locomotion of the robot have been investigated (e.g. [5, 21]), but the issue of providing a high-level programming platform for developing controllers remains largely unexplored. Moreover, constraints on the physical size and power consumption of each module limit the available processing power of each module.

Figure 1: The ATRON self-reconfigurable robot. Seven modules are connected in a car-like structure.

In this paper, we present a role-based approach to programming a controller for a distributed robot system independently of the concrete physical structure of the robot. A role defines a specific set of behaviors for a physical module that are activated when the structural invariants associated with the role are fulfilled. Using the principle of distributed control diffusion [17], the roles are compiled into code fragments that are dynamically diffused throughout the physical structure of the robot and activated where applicable. Our programming language targets the distributed control diffusion virtual machine (DCD-VM) running on the ATRON modular, self-reconfigurable robot [11, 13]. Although the compiler implementation is still preliminary, it is capable of generating code for a complex example involving multiple roles. The rest of this paper is organized as follows. First, Section 2 presents the ATRON hardware, discusses issues in programming the ATRON robot, and describes the DCD-VM. Then, Section 3 presents the main contribution of this paper, a high-level role-based programming language for the DCD-VM. Last, Section 4 presents related work and Section 5 concludes and outlines directions for future work.

2. THE ATRON SELF-RECONFIGURABLE ROBOT

2.1 Hardware

The ATRON self-reconfigurable robot is a 3D lattice-type robot [11, 13]. Figure 1 shows an example ATRON car robot built from 7 modules. Two sets of wheels (ATRON modules with rubber rings providing traction) are mounted on ATRON modules playing the role of an axle; the two axles are joined by a single module playing the role of "connector." As a concrete example of self-reconfiguration, this car robot can change its shape to become a snake (a long string of modules); such a reconfiguration can for example allow the robot to traverse obstacles such as crevices that cannot be traversed using a car shape.

An ATRON module has one degree of freedom, is spherical, is composed of two hemispheres, and can actively rotate the two hemispheres relative to each other. A module may connect to neighbor modules using its four actuated male and four passive female connectors. The connectors are positioned at 90 degree intervals on each hemisphere. Eight infrared ports, one below each connector, are used by the modules to communicate with neighboring modules and to sense the distance to nearby obstacles or modules. A module weighs 0.850 kg and has a diameter of 110 mm. Currently 100 hardware prototypes of the ATRON modules exist.

The single rotational degree of freedom of a module makes its ability to move very limited: in fact, a module is unable to move by itself; the help of another module is always needed to achieve movement. All modules must also stay connected at all times, to prevent modules from being disconnected from the robot. They must avoid collisions and respect their limited actuator strength: one module can lift two others against gravity. A module has 128K of flash memory for storing programs and 4K of RAM for use during execution of the program.

Other examples of self-reconfigurable robots include M-TRAN and SuperBot [14, 18]. These robots are similar from a software point of view, but differ in mechanical design, e.g. degrees of freedom per module, physical shape, and connector design. This means that algorithms controlling change of shape and locomotion will often be robot specific; general software principles are, however, more easily transferred.

2.2 Software


Programming the ATRON robot is complicated by the distributed, real-time nature of the system coupled with limited computational resources and the difficulty of abstracting over the concrete physical configuration when writing controller programs. General approaches to programming the self-reconfigurable ATRON robot include metamodules [5], motion planning and rule-based programming. In the context of this article, we are however interested in role-based control. Role-based control is an approach to behavior-based control for modular robots where the behavior of a module is derived from its context [22]. The behavior of the robot at any given time is driven by a combination of sensor inputs and internally generated events. Roles allow modules to interpret sensors and events in a specific way, thus differentiating the behavior of the module according to the concrete needs of the robot.

2.3 Distributed control diffusion


To enable dynamic deployment of programs on the ATRON robot, we have developed a virtual machine that enables small bytecode programs to move throughout a structure of ATRON modules [17]. The virtual machine supports a concept we refer to as distributed control diffusion: controller code is dynamically deployed to those modules where a specific behavior is needed. The virtual machine, named DCD-VM, has an instruction set that is dedicated to the ATRON hardware and includes operations that are typically required in ATRON controllers. For example, the virtual machine maintains an awareness of the compass direction of each module and the roles of its neighbors, and specific instructions allow this information to be queried. Moreover, the virtual machine provides a lightweight and highly scalable broadcast protocol for distributing code throughout the structure of ATRON modules, making the task of programming controllers that adapt to their immediate surroundings significantly easier.

The DCD-VM supports a basic notion of roles to indicate the state of a module and to provide polymorphic dispatching of remote commands between modules, but at a very low level of abstraction. There is no explicit association between roles and behaviors; this currently has to be implemented manually by the programmer. Moreover, initial experiments with the virtual machine were performed by writing bytecode programs by hand, since no higher-level language (not even an assembly language) was available. To improve the situation, a high-level language for programming the DCD-VM has been developed, which is the subject of this paper.

3. A HIGH-LEVEL ATRON PROGRAMMING LANGUAGE

3.1 Motivating example: obstacle avoidance

As a motivating example, consider a simple obstacle avoidance scenario where a car (such as the one shown in Figure 1 in the introduction) is moving forwards until it detects an obstacle using the forward proximity sensors of the frontmost modules. In this case it reverses while turning, and then continues moving forwards. There are however many ways of making a car from ATRON modules, as shown in Figure 2: the car can be made longer (although more than 6 wheels makes turning impractical), and we can imagine two ATRON cars joining up in the field to create a more powerful vehicle. A controller that is programmed independently of the concrete physical configuration of the robot would address many of these issues; we describe how that can be done using distributed control diffusion, as follows.

First, a query mobile program is used to identify the wheels in the robot. For simplicity, any module with a rotational axis perpendicular to the direction we wish to go in can be considered a wheel when it only has a single, upwards connection. (For the ATRON, a single upwards connection means that the other hemisphere is free to rotate and hence can act as a wheel.) Note that we assume that the robot has been configured with car motion as a purpose: we do not detect any orthogonally aligned modules that may cause friction when moving forward, and free-hanging modules that cannot reach the surface are still considered wheels. The mobile program queries the position and connectivity properties of the module, and sets the role to either "left wheel" or "right wheel," as appropriate. When setting the role, any neighboring modules are notified of the role change, facilitating queries that include the role of neighboring modules. (For example, an "axle" has a "wheel" as a neighbor.)

Once the wheels have been identified, appropriate control commands turning the main actuator in either direction can be sent to the left and right wheels, respectively. Moreover, event handlers for detecting obstacles using the proximity sensors are installed in the front wheels of the robot using another mobile program. When the event is triggered, the module that detected the obstacle sends out a "reverse" command to all wheels in the robot. This way, the controller has effectively been distributed to the relevant modules of the robot. Before the wheels start reversing, the role is changed to a "reversing wheel," which is observed by the axle behavior. The axles then turn an appropriate number of degrees to make the car change orientation while reversing. Once the wheels have finished reversing, they return to their respective forwards-moving roles, and the axles react accordingly by returning to the original position.

Figure 2: Different car configurations (simulated): basic car, long car, collaborating cars.

Figure 3: Obstacle avoidance using the generic distributed control diffusion program (experimental setup; approaching obstacle; stopped; reverse and turn; moving forward again; new obstacles; stopped again; reverse and turn again).

3.2 The RDCD language


The Role-based Distributed Control Diffusion (RDCD) language provides roles as a fundamental abstraction for structuring the set of behaviors that are to be diffused into the module structure. Diffusion of code is however implicit: RDCD is a declarative language that allows roles to be assigned to specific modules in the structure based on invariants; behaviors are implicitly distributed to modules based on their association with roles. RDCD is a domain-specific language in the sense that it is targeted at the ATRON robots and moreover has very limited support for general-purpose computation. RDCD provides primitives for simple decision-making, but all complex computations must be performed in external code.

Our compiler currently does not have a parser and must therefore be given an abstract syntax tree constructed manually. Nevertheless, to present the RDCD language, we show the proposed BNF for RDCD in Figure 4 (non-terminals are written using italicized capitals, concrete syntax in courier font).

    PROGRAM    ::= ROLE∗ DEPLOYMENT
    ROLE       ::= abstract? role NAME extends NAME { MEMBER∗ }
                 | role NAME modifies NAME { MEMBER∗ }
    MEMBER     ::= CONSTANT | INVARIANT | METHOD
    CONSTANT   ::= abstract NAME | NAME := VALUE
    INVARIANT  ::= EXP ;
    METHOD     ::= MODIFIER∗ NAME () BLOCK
    BLOCK      ::= { STATEMENT∗ }
    STATEMENT  ::= ε | FUNCTION ; | if( EXP ) { STATEMENT∗ } else { STATEMENT∗ }
    EXP        ::= VAR | FUNCTION | EXP BINOP EXP | BLOCK
    FUNCTION   ::= self. NAME ( EXP∗ ) | NAME ( EXP∗ ) | NAME . NAME ( EXP∗ )
    MODIFIER   ::= abstract | behavior | startup | command
    VAR        ::= NAME
    VALUE      ::= NUMBER | PREDEFINED
    DEPLOYMENT ::= deployment { NAME∗ }

Figure 4: Proposed BNF for RDCD. Note that for simplicity, commas between function arguments are omitted in the BNF.

An RDCD program declares a number of roles. A role normally extends a super-role, meaning that it inherits all the members of the super-role; the common super-role Module defines the capabilities of all modules. A role can be concrete or abstract, with the usual semantics: all abstract members must be overridden by concrete members for a role to be usable at runtime. A role declares a number of members in the form of constants, invariants, and methods. There is currently no explicit notion of state, so state must be represented using external code and accessed using functions. A constant can be concrete or abstract, and always defines an 8-bit signed value. For a module to play a given role, all invariants declared by the role must be true (in case of conflicts between roles, the choice of role is undefined). An invariant is simply a boolean expression over constants and functions.

Methods are used to define behavior that is active when a module plays a given role or any of its super-roles. A method is simply a sequence of statements that are either function invocations or conditionals. For simplicity, methods currently always take zero arguments, but we expect this limitation to change in the future. Function invocations are either local commands, functions, or global commands. Local commands access the physical state of the module (sensors, actuators, external code) and are prefixed with the term "self." to indicate that it is a local operation. Functions are basically used to represent stateless operations such as computing the size of (i.e., the number of bits in) a bit set.¹ Global commands are of the form "Role.command" and cause the command to be asynchronously invoked on all modules currently playing that role or any of its sub-roles. Arguments to functions are expressions: either constants, compound expressions, or code blocks; a code block allows code to be stored for later use (e.g., an event handler) or to be executed in a special context (see example below). Note that since the code is stateless, no closure representation is required. The function invocation syntax for primitive functionality from the role Module (such as turning the main actuator) is the same as that of user-defined functions.

A method declaration can be prefixed by a modifier, as follows. The method modifier "abstract" works in the usual way (it forces the enclosing role to be declared abstract). The method modifier "behavior" causes the method to execute repeatedly so long as the role is active, whereas the method modifier "startup" causes the method to execute once when the role is activated. Last, the method modifier "command" causes the method to be exported for invocation as a global command.

To supplement the basic "extends" approach to creating a hierarchy of roles, a role can also "modify" another role, meaning that it is a mixin role that can be applied to the designated role or any of its sub-roles. This approach allows smaller units of behavior to be encapsulated into well-structured roles that can be activated throughout specific parts of the role hierarchy. Mixin roles currently cannot be activated automatically using invariants but must be explicitly selected using a special self function assumeRole that takes a mixin role and a code block as arguments and causes the module to temporarily change to the given role while the code block is executed.

To facilitate the implementation, we currently require the programmer to explicitly specify in what order role discovery is performed in the structure. For example, in a car an axle is a module that is attached to wheels, so wheels must be identified before axles. Such dependencies could be detected automatically by an analysis of the invariants, or even made redundant by having role discovery run for a while until it stabilizes; such extensions are however considered future work.

¹ Bit sets are used in the DCD-VM to represent sets of connectors, which conveniently can be done using a single byte since there are only 8 connectors.

Figure 6: Hierarchy of sub-role relations for the RDCD program of Figure 5: Module is the root, with sub-roles Wheel (abstract) and Axle; Wheel has sub-roles LeftWheel and RightWheel; the mixin Reverse yields LeftWheel+Reverse and RightWheel+Reverse. Arrows represent the sub-role relationship, roles in italics are abstract, roles in bold are mixins.

3.3 Example resolved


The complete RDCD program implementing obstacle avoidance in an arbitrary car-like structure of ATRON modules is shown in Figure 5. The role structure is illustrated in Figure 6: a generic wheel role is used as a basis for defining concrete roles for left and right wheels. The difference between a left wheel and a right wheel is in what direction to turn the main actuator to advance and on what connector to monitor for obstacles. Moreover, a mixin role is used to indicate a reversing wheel, since its behavior is different, which should be observable to the rest of the structure.

In more detail, the abstract role Wheel abstracts over constants defining on what side the wheel should be connected, what event handler vector should be monitored for proximity detection, and in what direction the main actuator should turn. Then follow a number of invariants defining a wheel: rotational axis perpendicular to the direction the vehicle should be moving, only connected to a single module, etc. The initial behavior of a wheel is to install an event handler if the y coordinate is positive² and then to start turning continuously to make the car move forward. The event handler starts by disabling itself (to avoid triggering multiple events) and then invokes the method stop on all wheels. The stop command temporarily assumes the mixin role Reverse, reverses, and then restores the wheel to its previous state. The left and right wheels simply concretize the abstract wheel role by defining the abstract constants. The mixin role Reverse is empty; it is in fact used as a "marker role": the role Axle reacts to its adjacent wheel modules assuming the reverse role, and will in this case turn the axle as appropriate. Note that turning the axle depends on the global y coordinate, which for example causes the front and back wheels on the 6-wheeled car to turn in different directions. This steering behavior in the axle is represented using a behavior method that continuously monitors the role of the connected module. (The DCD-VM maintains a locally stored awareness of the roles of the adjacent modules, meaning that distributed communication is only used when the wheel module changes roles, not every time the steering behavior is run.)

² The DCD-VM maintains compass directions and a 3D coordinate system of the entire structure relative to the module where the program was injected into the structure. This module thus determines the directionality for the entire structure.

    abstract role Wheel {
      abstract constant connected_direction, event_handler, turn_direction;
      self.center_position == EAST_WEST;
      sizeof(self.total_connected()) == 1;
      sizeof(self.connected(UP)) == 1;
      sizeof(self.connected(connected_direction)) == 1;
      startup move() {
        if(self.y()>0)
          self.handleEvent(event_handler, {
            self.disableEvent(event_handler);
            Wheel.stop();
          });
        self.turnContinuously(turn_direction);
      }
      command stop() {
        self.assumeRole(Reverse, {
          self.turnContinuously(-turn_direction);
          self.sleepWhileTurning(3);
          self.turnContinuously(turn_direction);
          self.enableEvent(event_handler);
        });
      }
    }
    role RightWheel extends Wheel {
      constant connected_direction := EAST;
      constant turn_direction := 1;
      constant event_handler := EVENT_PROXIMITY_5;
    }
    role LeftWheel extends Wheel {
      constant connected_direction := WEST;
      constant turn_direction := -1;
      constant event_handler := EVENT_PROXIMITY_1;
    }
    role Reverse modifies Wheel { }
    role Axle {
      sizeof(connected_role(DOWN,Wheel)) > 0;
      behavior steer() {
        if(connected_role(DOWN,Reverse) > 0) {
          if(self.y>0) self.turnTowards(30);
          else self.turnTowards(-30);
        } else self.turnTowards(0);
      }
    }
    deployment { RightWheel, LeftWheel, Axle }

Figure 5: RDCD program implementing obstacle avoidance (manually pretty-printed to improve readability)

    RDCD program:                    After member copy-down:
      abstract role P                  role Q extends P
        invariantP                       startup b1() { S1 }
        abstract constant c;             behavior b2() { S2 }
        startup b1() { S1 }              command b3() { S3 }
        behavior b2() { S2 }
      role Q extends P
        constant c = value;
        invariantQ
        command b3() { S3 }

    Resulting bytecode programs:
      if(invariantP && invariantQ) setRole(Q); migrate;
      if(hasRole(P)) S1; migrate;
      if(hasRole(P)) schedule { S2; repeat; } migrate;
      if(hasRole(Q)) install(Q.b3) { S3; }

Figure 7: Basic RDCD compilation process from roles to mobile bytecode programs

3.4 Compiling RDCD to the DCD-VM


We now describe the compilation of RDCD into stateless, mobile programs for the DCD-VM. A critical constraint is the size of the compiled programs, since the DCD-VM currently transmits mobile programs using the standard ATRON communication primitives, which can become unstable when the buffer size exceeds 50 bytes. For this reason, we prefer multiple smaller mobile programs that move concurrently throughout the structure over a single, larger program that is harder to transmit correctly on the physical hardware. Apart from a few peephole optimizations the compiler does not do any analysis and optimization, but there are numerous opportunities for optimization, as will be discussed later. Unless otherwise noted, all generated mobile programs use migration instructions to disperse throughout the module structure.

The compilation process from roles to mobile bytecode programs is illustrated in Figure 7. The first step of the compilation process is to copy down members from super-roles to sub-roles; as will be explained later, mixin roles can currently be ignored at this point. For each role, a mobile program is then generated that checks the associated invariants and sets the role accordingly if all the invariants are satisfied. Next, for each startup method a mobile program is generated that first checks the role and evaluates the method body if the role matches. Similarly for behavior methods, except that compiled behaviors use a special "repeat" instruction that causes the method to be rescheduled for later execution from the start. Last, commands are installed on all modules that implement the appropriate roles. Since mixin roles can currently only be activated explicitly using the function assumeRole, they can simply be represented by generating code sequences for changing to a different role and back again. (Method overriding is non-trivial to update when the role changes, but this can be done since mobile programs essentially can modify the "remote invocation vtable" of each module.)

The copy-down approach allows role-specific constants to be inlined into every program, and moreover simplifies the implementation of features such as startup methods, since they can simply check for an exact role match instead of having to take method overriding in sub-roles into account. The downside is that deep role hierarchies will generate numerous mobile programs, many of which may be redundant. We believe a useful optimization would be to reduce the number of mobile programs by combining them without exceeding the optimal message size for the physical modules.

3.5 Experiments


The RDCD compiler has been implemented in roughly 2000 lines of Java code, but currently does not include a parser. Nevertheless, our running example is the program of Figure 5, which has been fed to the compiler by manually building the AST.³ The output of the compiler is a C program that initializes a collection of arrays with DCD-VM bytecodes. This approach is currently required to run programs on the DCD-VM since it does not support downloading code from a non-module source. In effect, all the code is loaded into a single module (the "connector" module in the car) and diffused throughout the module structure from this module.⁴ The generated code is equivalent in functionality to the hand-written bytecode initially used for obstacle avoidance, and can perform the obstacle avoidance described at the beginning of this section in simulation (the DCD-VM is not currently working on the physical hardware due to various low-level implementation issues). The generated code is however less efficient: generated code fragments are typically 50% larger, and the compiler generates twice the number of code fragments that were present in the manually written code. As mentioned earlier, we believe that improved peephole optimizations and sharing of code between related classes will close the gap between the automatically generated code and the manually written code.

³ We are currently investigating different options for the concrete syntax based on feedback from researchers in the robotics community, and plan on implementing a complete parser when this study has been completed.
⁴ This approach corresponds to reprogramming a single module, which in practice is much easier than reprogramming all modules in a robot; memory is not a practical issue since the constant arrays holding the generated C code are stored in the 128K flash program memory, which is much larger than the 4K of RAM.
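For concreteness, the compiler's C output plausibly has the following shape; the opcode names and numeric values below are invented for illustration and do not reflect the actual DCD-VM instruction encoding:

    #include <stdint.h>

    /* Opcode names and numeric values invented for illustration only. */
    enum { OP_CHECK_INVARIANTS = 1, OP_SET_ROLE, OP_MIGRATE, OP_END };
    enum { ROLE_RIGHT_WHEEL = 2 };

    /* One mobile program per role/behavior, emitted as constant data that
     * the host C compiler places in the 128K flash program memory. */
    static const uint8_t prog_assign_right_wheel[] = {
        OP_CHECK_INVARIANTS,            /* guard: the RightWheel invariants  */
        OP_SET_ROLE, ROLE_RIGHT_WHEEL,  /* executed only if the guard passed */
        OP_MIGRATE,                     /* keep dispersing through the robot */
        OP_END
    };

    static const uint8_t *const mobile_programs[] = { prog_assign_right_wheel };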

3.6 Assessment


The use of a high-level language to program the ATRON modules using the DCD-VM provides a significantly higher level of abstraction to the programmer, which we expect will result in a massive increase in productivity. A larger set of experiments is however required to determine whether this is the case. Moreover, we are also interested in how useful the individual features of RDCD are when writing programs. The required experiments are however out of the scope of this paper due to the preliminary state of the compiler. Nevertheless, we can conclude that the use of inheritance between roles combined with abstract constants allows the compiler to generate mobile code fragments that are small and have minimal resource requirements, which is a perfect fit for the DCD-VM. Moreover, the use of explicitly activated mixin roles provides language support for behaviors that temporarily modify the role a module is playing, without requiring numerous redundant declarations at the source level.

The RDCD compiler currently does not implement type checking, but the language is by design statically typed in the sense that it is possible to check statically that local invocations of behaviors always succeed. (The lack of threading on a single module combined with the simple lexical nesting of the argument to self.assumeRole facilitates type checking.) Due to the distributed nature of the ATRONs, remote invocation of behaviors cannot be guaranteed to succeed. For example, a module may change role just after a remote command has been delivered to the module, but before it has been scheduled for execution (such a command is ignored in the current implementation). In general, we believe that a statically typed approach is likely to be too brittle for a dynamically evolving distributed system, but large-scale experiments are needed to determine what is the most useful approach.

The RDCD language currently does not support state, which significantly simplifies the role change mechanism. Programmers thus have to resort to defining their own operations implemented in C code for manipulating state, which is obviously not a satisfactory solution. Moreover, there is currently no support for migrating state with mobile programs, which complicates e.g. writing a mobile program that finds those potential wheel modules that are at the bottom of the structure. Resolving these issues is considered future work, but we envision allowing the programmer to declare state both globally (persistent across role changes) and locally to roles (transient across role changes), since a preliminary study of existing programs for the ATRON seems to indicate the need for both kinds of state.

4. RELATED WORK


As an alternative to the DCD-VM, we have developed the RAPL system, which statically compiles role declarations written in a simple XML-based language to conventional C programs [7, 8]. Each role declaration is explicitly tied to the physical structure of the robot, making it easy to deploy and experiment with in practice, but less flexible in terms of what robot structures a given program can support. The commands declared for each role can simply be called remotely by the neighboring module. We see this system as a simple precursor to RDCD, since it only supports a small subset of its features, namely the basic concept of roles. Nevertheless, this system is more complete in the sense that it generates, from an XML specification, code that works on the physical modules.

Autonomous robots are often programmed using behavior-based control [3]; behaviors are typically sensor-driven, reactive, and goal-oriented controllers. Certain behaviors may inhibit other behaviors, allowing the set of active behaviors to vary. Modular robots often use the concept of a role, albeit in an ad-hoc fashion: complex overall behaviors can be derived from a robot where different modules react differently to the same stimuli, in effect allowing each module to play a different role (e.g., [1, 5, 19]). Recently, Støy et al. have explicitly used the concept of a role to obtain a very robust and composable behavior [20, 22]. Compared to RDCD, the implementation of roles is ad-hoc and the only control examples investigated are cyclic, signal-driven behaviors for locomotion.

Apart from RDCD and RAPL, the only high-level programming language for modular robots that the authors are aware of is the Phase Automata Robot Scripting Language (PARSL) [10, 25]. Here, XML-based declarations are used to describe the behavior of each module in the PolyBot self-reconfigurable robot [24]. Compared to RDCD and RAPL, the tool support is much more complete and the language has many advanced features for controlling locomotion using behavior-based control. Nevertheless, PARSL completely lacks the concept of a role for structuring the code: each behavior is assigned to a specific module as an atomic unit. Moreover, PARSL has no support for dynamically distributing code in the robot.

Outside the field of robotics, roles and mixins have been investigated in numerous cases, which forms the basis of our language design. Regarding static typing, our role-change mechanism resembles that of Fickle [6], but since RDCD roles have no state and role change is always local to a single behavior, our approach is much more restricted but also easier to both implement and type check statically (although the latter property has not been investigated in practice). Mixin roles are a particularly simple use of the more general concept of a mixin [2] that we expect to explore more generally in future work.

5. CONCLUSION AND FUTURE WORK

In this paper we have presented the design of the RDCD language for programming ATRON modules using role-based programming coupled with distributed control diffusion. The design is supported by a preliminary implementation of a compiler that can generate code for the non-trivial obstacle avoidance scenario. Ongoing improvements to the compiler include completing the front-end parser and improving the back-end optimizations.

In terms of future work, there are numerous improvements that could be made to RDCD and the DCD-VM. In the shorter term, a major issue is enabling the programmer to express more precisely the relations and collaborations between modules, as opposed to describing the individual behaviors that give rise to the collaboration. For example, the obstacle avoidance program in Figure 5 is not obviously an obstacle avoidance algorithm; we believe that a programming language with a greater focus on the collaborations would facilitate expressing such an algorithm clearly and succinctly.

In the longer term, we are interested in generalizing the application domain, not only to other types of modular self-reconfigurable systems, but also to a more general class of embedded devices that could be referred to as physically interlocked systems: networked embedded systems with physical connectors, where the way the systems are connected affects their behavior. Modular robots are an example of such a system, but the authors are currently investigating other systems that share the same characteristics, such as the Playware Tiles and the iBlocks, both shown in Figure 8 [12, 15]. We believe parts of the DCD-VM and the RDCD language would also be applicable to such systems, which could lead to the development of a family of language platforms for physically interlocked systems.

Figure 8: Examples of physically interlocked systems (panels: Playware initial prototype, Playware new prototype, iBlocks initial prototype, iBlocks new prototype). The Playware modules are interactive playgrounds with pressure sensors and color LEDs; adjacent modules communicate using a wired connection that is established when the modules are combined. The iBlocks are physical artifacts that allow children to interact with computing devices; they are equipped with connectors with infrared communication (the newest version uses magnetic connectors), tilt sensors, and LEDs. In both cases, when users physically reconfigure the system, the behavior of the system as a whole should evolve accordingly.

6. REFERENCES

[1] H. Bojinov, A. Casal, and T. Hogg. Multiagent control of self-reconfigurable robots. In Proceedings of the Fourth International Conference on MultiAgent Systems, pages 143–150, 2000.
[2] G. Bracha and W. Cook. Mixin-based inheritance. In N. Meyrowitz, editor, OOPSLA/ECOOP '90 Proceedings, pages 303–311. ACM SIGPLAN, 1990.
[3] R. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2:14–23, March 1986.
[4] A. Castano and P. Will. Autonomous and self-sufficient CONRO modules for reconfigurable robots. In Proceedings of the 5th International Symposium on Distributed Autonomous Robotic Systems (DARS), pages 155–164, Knoxville, Texas, USA, 2000.
[5] D. J. Christensen and K. Støy. Selecting a meta-module to shape-change the ATRON self-reconfigurable robot. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 2532–2538, Orlando, USA, May 2006.
[6] Sophia Drossopoulou, Ferruccio Damiani, Mariangiola Dezani-Ciancaglini, and Paola Giannini. More dynamic object reclassification: FickleII. ACM TOPLAS, 24(2):153–191, 2002.
[7] Nicolai Dvinge. A programming language for ATRON modules. Master's thesis, University of Southern Denmark, 2007.
[8] Nicolai Dvinge, Ulrik P. Schultz, and David Christensen. Roles and self-reconfigurable robots. In Proceedings of the ECOOP'07 Workshop Roles'07 — Roles and Relationships in OO Programming, Multiagent Systems and Ontologies, 2007. To appear.
[9] S. C. Goldstein and T. Mowry. Claytronics: A scalable basis for future robots. Robosphere, November 2004.
[10] Alex Golovinsky, Mark Yim, Ying Zhang, Craig Eldershaw, and Dave Duff. PolyBot and PolyKinetic system: A modular robotic platform for education. In IEEE International Conference on Robotics and Automation (ICRA), 2004.
[11] M. W. Jorgensen, E. H. Ostergaard, and H. H. Lund. Modular ATRON: Modules for a self-reconfigurable robot. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2068–2073, Sendai, Japan, September 2004.
[12] H. H. Lund, T. Klitbo, and C. Jessen. Playware technology for physically activating play. Artificial Life and Robotics Journal, 9:165–174, 2005.
[13] H. H. Lund, R. Beck, and L. Dalgaard. Self-reconfigurable robots with ATRON modules. In Proceedings of the 3rd International Symposium on Autonomous Minirobots for Research and Edutainment (AMiRE 2005), Fukui, 2005. Springer-Verlag.
[14] S. Murata, E. Yoshida, K. Tomita, H. Kurokawa, A. Kamimura, and S. Kokaji. Hardware design of modular robotic system. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2210–2217, Takamatsu, Japan, 2000.
[15] J. Nielsen and H. H. Lund. Modular robotics as a tool for education and entertainment. In Proceedings of the IADIS International Conference on Cognition and Exploratory Learning in Digital Age (CELDA 2005), 2005.
[16] D. Rus and M. Vona. Crystalline robots: Self-reconfiguration with compressible unit modules. Journal of Autonomous Robots, 10(1):107–124, 2001.
[17] Ulrik P. Schultz. Distributed control diffusion: Towards a flexible programming paradigm for modular robots. Submitted for publication; preliminary version available at http://www.mmmi.sdu.dk/~ups/apges07/dcd.pdf.
[18] W.-M. Shen, M. Krivokon, H. Chiu, J. Everist, M. Rubenstein, and J. Venkatesh. Multimode locomotion via SuperBot robots. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation, pages 2552–2557, Orlando, FL, 2006.
[19] Wei-Min Shen, Yimin Lu, and Peter Will. Hormone-based control for self-reconfigurable robots. In AGENTS '00: Proceedings of the Fourth International Conference on Autonomous Agents, pages 1–8, New York, NY, USA, 2000. ACM Press.
[20] Kasper Støy, Wei-Min Shen, and Peter Will. Using role based control to produce locomotion in chain-type self-reconfigurable robots. IEEE Transactions on Robotics and Automation, special issue on self-reconfigurable robots, 2002.
[21] K. Støy. How to construct dense objects with self-reconfigurable robots. In Proceedings of the European Robotics Symposium (EUROS), pages 27–37, Palermo, Italy, May 2006.
[22] K. Støy, W.-M. Shen, and P. Will. Implementing configuration dependent gaits in a self-reconfigurable robot. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation (ICRA'03), pages 3828–3833, Taipei, Taiwan, September 2003.
[23] M. Yim. A reconfigurable modular robot with many modes of locomotion. In Proceedings of the JSME International Conference on Advanced Mechatronics, pages 283–288, Tokyo, Japan, 1993.
[24] M. Yim, D. Duff, and K. Roufas. PolyBot: A modular reconfigurable robot. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 514–520, San Francisco, CA, USA, 2000.
[25] Ying Zhang, Alex Golovinsky, Mark Yim, and Craig Eldershaw. An XML-based scripting language for chain-type modular robotic systems. In Proceedings of the 8th Conference on Intelligent Autonomous Systems (IAS), 2004.

An approach to derivation of component-based implementations from data-oriented specifications



A. Basu    S. Yovine†    M. Zanconi

VERIMAG, Centre Equation, 2 Ave. Vignate, 38610 Gières, France

ABSTRACT

The design and implementation of software-intensive embedded product lines requires dealing with a variety of constantly changing application- and system-dependent functional and non-functional requirements and constraints that spread throughout the development process. Moreover, because product lines are built upon a set of core services which are improved, customized, extended, and integrated to come up with differentiated products, there is a need to resort to component-based approaches. However, many embedded applications (e.g., video compression) are most likely specified in a transformational, data-oriented style. The componentization of such applications is therefore deferred to the implementation phase, where performance and platform constraints are taken into account. This paper discusses a formally-grounded method to carry out this process. The approach consists in integrating (1) the component-based language and execution engine BIP [4], and (2) the coordination language and code-generation infrastructure FXML/Jahuel [1]. The framework is illustrated with an MPEG-4 video encoder.

1. INTRODUCTION

Today's consumer-electronics embedded applications, such as mobile phones and HDTV, are becoming more and more software-intensive systems built from a set of domain-specific core functionalities. Software product lines [9] aim to provide engineering capabilities to deal with mass customization of core applications in order to enable rapid time-to-market. However, the great variability and the constant, rapid evolution of user-, system-, and hardware-level requirements, constraints, and services throughout the product life-cycle make the design, implementation, and maintenance of product lines costly and error-prone. The main challenge is to provide tool-assisted software development frameworks capable of (1) capturing and propagating application requirements and design and platform constraints throughout the development cycle, and (2) supporting software componentization to ease customization, integration, and evolution.

∗ Partially supported by projects NEVA, SCEPTRE, and TAPIOCA.
† Currently visiting professor at DC, FCEyN, Univ. Buenos Aires, Argentina.



The most common industrial practice consists in programming in C, together with application-specific libraries. Typically, application code is customized too early for a particular target, limiting reusability and portability. Moreover, correctness verification and performance optimization are hard to achieve, because system calls (e.g., for threading and resource management) make software analysis extremely difficult [7, 17]. One way of overcoming these problems consists in using a formally-defined domain-specific programming language (e.g., Lustre [8], Esterel [5]). These approaches enable verification, but propose a fully-automatic, platform-dependent implementation phase not well suited for product lines. To some extent, model-driven engineering approaches, based on OMG's model-driven architecture (MDA) [14], model-integrated development [15], or architecture-description languages (ADL) [13], to mention only a few, provide component-based design and implementation frameworks for software product lines, but offer no or very limited means to deal with timing properties.

To overcome the lack of support for dealing with timing properties in state-of-the-art model-driven component-based approaches, BIP [18] provides a language and a theory for incremental composition of components encapsulating time-dependent behaviors. BIP frees the programmer from the burden of having to take care of system-wide behavioral properties such as mutual exclusion, deadlock-freedom, and time-progress, which are ensured by BIP's semantics: the global system model obtained by composition is proven to satisfy these properties by construction, as a consequence of BIP's composition rules.

Nevertheless, many applications, such as video encoding, are not programmed following a component-based approach, but most likely using a transformational, data-oriented one. These applications are better described using languages such as StreamIt [11] and FXML [1]. StreamIt is a full-fledged programming language which provides specific statements and data types (e.g., filters, pipelines, feedback loops) for coding single-input/single-output stream-like computations; it does not handle timing constraints, and its semantics is not formally defined. In contrast, FXML is a formal coordination language with general-purpose constructs for expressing concurrency (e.g., par, forall), where coordination is thought of as managing dependencies between activities [16]. The main difference from other coordination languages (see [16] for a comprehensive survey) is that FXML (1) can express rich control and data precedence constraints, and (2) can be gradually extended with more concrete constructs in order to provide the synchronization, communication, and scheduling mechanisms for implementing the abstract behavior. That is, by design, FXML is an extensible and customizable language oriented towards generating code for multiple platforms via domain-specific semantics-preserving syntactic transformations.

This paper presents a first step towards combining the benefits of FXML and BIP to provide a tool-assisted, formally-grounded method for the derivation of componentized implementations in BIP from data-oriented specifications in FXML. We illustrate the concept with an industrial MPEG-4 video encoder [2].

2. THE CODE GENERATION CHAIN

2.1 The language: FXML

FXML [1] is a coordination language for expressing concurrency, together with control and data dependencies. The basic computation units are assignments and legacy (C, Java, ...) code blocks. Basic units can be composed sequentially or in parallel. FXML provides two parallel composition operations: par, and a forall primitive that declares several concurrent iterations of the same block; this construct is similar to that of FORTRAN 95, with the difference that dependencies between iterations are allowed. In FXML, parallel composition does not entail physical parallel execution at run-time, but only logical concurrency. Control and data dependencies can be annotated with properties that restrict parallelism because of timing or precedence constraints.

Here, we informally introduce FXML syntax and semantics through an illustrative example; the reader is referred to [1] for the formal definition of FXML. The actual syntax is defined by an XML schema; for the sake of readability, in this paper we use a simplified textual version. The specification of a producer-consumer system is as follows:

    var int x
    dep W -> R
    par
      Producer:
        var int p
        {# p = 0; #}
        while(true)
          W: {# x = p++; #}
      Consumer:
        while(true)
          R: {# printf("%d\n", x); #}

The body of an FXML specification is composed of blocks called pnodes. The basic pnode types are assignments, variable declarations (e.g., var int x), and legacy code (e.g., {# p = 0 #}). Basic pnodes are executed atomically. Legacy declarations can be used to encapsulate either pre-existing or newly developed "hazard-free" (e.g., without system calls) code, which can be safely compiled with an optimizing compiler. Tags can be used to provide summaries of legacy code in order to highlight dependencies hidden inside legacy declarations; for instance, a tag can state that variable p is written by the legacy code. Pnodes can be labeled, e.g., W: {# x = p++ #}. Pnodes inside while (and for) loops are automatically indexed: the semantics of the producer's while loop is a sequence W0 W1 ..., where Wi is the i-th occurrence of the assignment labeled W.

The statement dep W -> R specifies a dependency between occurrences of the pnodes labelled W and R. Notice that variables x and p are declared in FXML, but only used in legacy C code. This declaration, together with the annotations about variable usage, allows the compilation chain to eventually synthesize the data dependency dep W -> R, if not explicitly declared, even when it is hidden in the legacy code. The arrow -> means that variable x must be written at least once before being read. This weak semantics can be further constrained: W [strong]-> R means that every value of x must be read at least once (no losses). Dependencies can also be specified to relate specific iterations: e.g., W (i,i)-> R specifies that the value of x read at the i-th iteration of the Consumer's while loop is the one written by the i-th iteration of the Producer's loop. In general, indexed dependencies have the form p (i, f(i)) -> q, where f(i) is affine.

The semantics of a pnode p, denoted [[p]], is a (possibly infinite) set of partial orders, called executions. Fig. 1 shows examples of executions of the producer-consumer system for different types of dependencies between pnodes W and R: (a) weak, (b) strong, and (c) (i,i). Each execution of the composed system contains the union of the executions (in this case, total orders) of the Producer and Consumer pnodes, namely W0 W1 ... and R0 R1 ..., respectively, with precedences added by the dependency declaration dep W -> R. Notice that the (i,i) dependency results in a single execution.

[Figure 1: Examples of executions of the producer-consumer system for (a) a weak, (b) a strong, and (c) an (i,i) dependency, shown as orderings of the occurrences W0, W1, W2, ... and R0, R1, R2, ...]

The semantics of FXML consists of the partial orders consistent with the conjunction of the constraints imposed by dependencies. This allows, for instance, specifying the case of a (consumer-like) pnode C which computes, say, y = f(x1, ..., xn), for xi written by (producer-like) pnodes Pi, i ∈ [1, n]. Another case consists of a pnode P broadcasting the value of a variable x to several consumers Ci, computing yi = fi(x), i ∈ [1, n]. The conjunctive semantics does not, however, easily capture the case where the value of variable x written by P is to be read by a single non-deterministically chosen consumer; the case where a single consumer C needs the value produced by any of many producers P1 ... Pn is not easily specified, either. To overcome this inconvenience, FXML supports hyper-dependencies of the form P1 ... Pn {φ} -> C and P {φ} -> C1 ... Cn, where φ specifies the composition of the individual dependencies Pi -> C and P -> Ci. For simplicity, we only consider here the case where φ is the exclusive disjunction of the dependencies, and restrict individual dependencies to be weak or strong.
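To fix intuitions, the three dependency disciplines can be read as constraints on the precedence relation ≺ of an execution. The following is our own gloss on the informal description above, stated under our assumptions rather than as a definition from [1]:

    \begin{align*}
    \text{weak } W \to R:\;   & \forall i\, \exists j.\ W_j \prec R_i \\
    \text{strong } W \to R:\; & \text{weak, and } \forall j\, \exists i.\ W_j \prec R_i \prec W_{j+1} \\
    (i,i)\ W \to R:\;         & \forall i.\ W_i \prec R_i \prec W_{i+1}
    \end{align*}

Under the (i,i) reading, each read is pinned to exactly one write, which is consistent with panel (c) of Figure 1 containing a single execution.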

2.2 The compilation chain: Jahuel

Jahuel is an FXML-based prototype compilation chain. Compiling an FXML specification consists in transforming it until actual executable code for a specific platform can be generated. Let L denote a language; concretely, L is given by an XML schema, where each element definition has an associated type. A transformation from L to L' is an injective map φ : L -> L'; that is, every element of the XML schema L is in the set of elements of L'. Let E_L be the set of executions of type L, and let F_φ : E_L' -> E_L be the "forgetting" function that forgets any information specific to executions of type L'. φ : L -> L' satisfies that for all executions e' ⊨_L' φ(p), it follows that F_φ(e') ⊨_L p.

The compilation process is a sequence of transformations L0 ↦* L0 ↦ L1 ↦* ... Ln, where L0 is basic FXML. Li ↦* Li is a sequence of transformations from Li to Li, resulting in a sequence of programs p_i^1 ... p_i^n such that [[p_i^(k+1)]] ⊆ [[p_i^k]]. An example of a transformation from L0 to L0 consists in replacing weak dependencies by strong ones; every execution permitted under the strong dependency is also permitted under the weak one, so the set of executions shrinks. Li ↦ Li+1 is a transformation that adds information not expressible in Li; an example consists in inserting communication and synchronization mechanisms (e.g., semaphores, queues, ...) to ensure that dependencies are met.

JAHUEL is constructed to be easily extended to cope with new execution models, by extending the basic FXML XML schema and by adding transformations. JAHUEL is implemented in Java, using the Java Architecture for XML Binding API¹ to manipulate XML documents. JAHUEL provides some general transformations which can be customized for different execution and simulation platforms; currently, it generates code for, e.g., Java, C with pthreads, SystemC, and PWare [3]. The compilation chain is instantiated with the sequence of transformations to be applied. Each transformation reads an input XML file and outputs another XML file to be used by the next one, thus ensuring traceability of implementation choices. The code generation phase for the target platform is done via a stylesheet. An example of using JAHUEL for generating a parallel implementation of an industrial video application has been presented in [2]. Here, we briefly overview two simple examples; in Section 4, we discuss the instantiation of JAHUEL to generate componentized C code via the BIP platform.

¹ http://java.sun.com/developer/technicalArticles/WebServices/jaxb/

Producer-Consumer

Assume that we would like the producer-consumer specified before to be executed on a platform providing threads, locks, and condition variables, implemented, for instance, by the pthread library or by Java. The implementation will consist of two threads sharing variable x. Concurrent accesses to x must be mutually exclusive and, for a weak dependency, x must be written at least once by the producer before the consumer can read it. To do this, basic FXML is extended with appropriate constructs to handle these notions, independently of the actual API provided by the run-time. The transformed specification looks as follows:

    var int x
    mx:mutex
    init:b=0
    dep W -> R
    par
      Producer:
        var int p
        {# p = 0; #}
        while(true)
          mx.lock
          W: {# x = p++; #}
          mx.notify:b=1
      Consumer:
        while(true)
          mx.wait:b==0
          R: {# printf("%d\n", x); #}
          mx.unlock

A tag specifies that pnode Producer will later become a thread. The translation of this tag into actual code (e.g., C with pthreads) requires a rather involved transformation which is out of the scope of this paper. The declaration mx:mutex attaches a mutex to the shared variable x. mx.lock can be directly translated into the corresponding lock operation of the runtime, e.g., pthread_mutex_lock(mx). Similarly, the code generated for the notification mx.notify:b=1 looks like:

    b=1;
    pthread_cond_signal(...);
    pthread_mutex_unlock(mx);

On the consumer's side, the wait statement is translated into

    pthread_mutex_lock(mx);
    while(b==0) pthread_cond_wait(..);

and so on. The code generated for Java looks very much alike.

Besides strengthening the type of the dependency, another implementation decision could be to add a FIFO buffer to store the values of variable x. This makes sense, for instance, in the case of an (i,i) dependency when the producer is faster than the consumer. This leads to another extension of FXML with appropriate tags. The transformed producer-consumer specification would be as follows:

    var buffer:int buf
    dep W -> R
    par
      Producer:
        var int x, p
        {# p = 0; #}
        while(true)
          W: {# x = p++; #}
          buf.put(x)
      Consumer:
        var int x
        while(true)
          buf.get(x)
          R: {# printf("%d\n", x); #}

The shared variable x is replaced by a shared buffer buf. The actual implementation (e.g., array, queue, socket, ...) and size are to be determined later, by a subsequent transformation. Since x is used in the legacy code, a local declaration of x is added to each block of the par statement. Writing and reading x become buf.put(x) and buf.get(x), respectively. For a weak dependency, the buffer is only required to produce values in a way consistent with the order of writes and reads; that is, the value obtained by the (i+1)-th get should not have been put before the value obtained by the i-th get. For a strong dependency, get() is required to deliver all written values, which imposes a fairness constraint. One way of realizing it is to implement get() so as to produce a fresh value each time, if there is one. Notice that never allowing a value to be read twice is more constrained than the strong semantics requires, but it is a legal implementation anyway. The (i,i) case can be implemented with a FIFO buffer.
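To make the buffer contract concrete, the following is a minimal hand-written sketch, in C with pthreads, of a bounded blocking FIFO realizing buf.put and buf.get for the strong and (i,i) cases, where every written value is delivered exactly once and in write order. This is our illustration, not Jahuel output; the names fifo_t, fifo_init, fifo_put, and fifo_get are hypothetical.

    #include <pthread.h>

    #define CAP 16

    typedef struct {
        int data[CAP];
        int head, tail, count;          /* read index, write index, #elements */
        pthread_mutex_t mx;
        pthread_cond_t not_empty, not_full;
    } fifo_t;

    void fifo_init(fifo_t *f) {
        f->head = f->tail = f->count = 0;
        pthread_mutex_init(&f->mx, NULL);
        pthread_cond_init(&f->not_empty, NULL);
        pthread_cond_init(&f->not_full, NULL);
    }

    /* put blocks while the buffer is full, so no value is ever lost. */
    void fifo_put(fifo_t *f, int x) {
        pthread_mutex_lock(&f->mx);
        while (f->count == CAP)
            pthread_cond_wait(&f->not_full, &f->mx);
        f->data[f->tail] = x;
        f->tail = (f->tail + 1) % CAP;
        f->count++;
        pthread_cond_signal(&f->not_empty);
        pthread_mutex_unlock(&f->mx);
    }

    /* get blocks while the buffer is empty and delivers each value
       exactly once, in write order. */
    void fifo_get(fifo_t *f, int *x) {
        pthread_mutex_lock(&f->mx);
        while (f->count == 0)
            pthread_cond_wait(&f->not_empty, &f->mx);
        *x = f->data[f->head];
        f->head = (f->head + 1) % CAP;
        f->count--;
        pthread_cond_signal(&f->not_full);
        pthread_mutex_unlock(&f->mx);
    }

A weak dependency would permit a cheaper realization, e.g., a single overwritable cell, since values may then be lost or read twice.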

Smith-Waterman

Let us now consider the implementation of an application with forall loops. The Smith-Waterman [12] local sequence matching algorithm consists of computing the elements of an (N+1)-by-(M+1) matrix from two strings of lengths N+1 and M+1. In FXML, this algorithm can be expressed as follows:

    forall(int j=1; j<...
        ...
    ... -> LA
    dep LZ ((i,j),(i+1,j+1)) -> LA
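The declared dependencies admit the classic wavefront schedule: all matrix elements on the same anti-diagonal i+j are mutually independent and may execute concurrently, which is exactly the parallelism the forall and dep declarations expose. The following sequential C sketch makes that structure explicit; it is our illustration, not generated code, and the MATCH, MISMATCH, and GAP scores are assumed values.

    #include <string.h>

    #define MATCH     2
    #define MISMATCH -1
    #define GAP      -1

    static int max4(int a, int b, int c, int d) {
        int m = a;
        if (b > m) m = b;
        if (c > m) m = c;
        if (d > m) m = d;
        return m;
    }

    /* H is an (n+1) x (m+1) score matrix; row 0 and column 0 stay zero.
       Cell (i,j) depends only on (i-1,j), (i,j-1) and (i-1,j-1), so all
       cells with the same i+j (one anti-diagonal d) are independent. */
    void smith_waterman(const char *a, const char *b, int n, int m,
                        int H[n + 1][m + 1]) {
        memset(H, 0, sizeof(int) * (size_t)(n + 1) * (m + 1));
        for (int d = 2; d <= n + m; d++)          /* anti-diagonals */
            for (int i = 1; i <= n; i++) {        /* cells on diagonal d */
                int j = d - i;
                if (j < 1 || j > m) continue;
                int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
                H[i][j] = max4(0,
                               H[i - 1][j - 1] + s,
                               H[i - 1][j] + GAP,
                               H[i][j - 1] + GAP);
            }
    }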


[Figure: An execution of the Smith-Waterman forall loops. Matrix elements (i,j) are computed in wavefront order along the anti-diagonals: (1,1) at step 2; (2,1) and (1,2) at step 3; (3,1), (2,2), and (1,3) at step 4; (3,2) and (2,3) at step 5; and so on.]

    exists [int X; X>=0 && X<=5]
    add [int M, int N, int P; M>=0 && 2*P==N]
    (Int[N] n, Int[M] m) {
      let x = n + m;
      if (x > 5) return 5; else return x;
    }

This function adds two numbers. If the sum is greater than 5, the function returns 5; otherwise it returns the sum. This example uses several logical-level variables, also known as specification-level variables, ghost variables, or type-level variables. These logical-level variables, seen in capital letters, are used at compile-time to check the function specification but are not around at run-time. For instance, the singleton type Int[N] can be read as the set of integers i where i is equal to N, or more formally {i : int | i == N}. In the above example, the type information after the function name forms the precondition M >= 0 and 2*P == N, so M is non-negative and N is even. The return type shows a postcondition X >= 0 and X <= 5, so X is between zero and five. Constructs on pre- and postconditions are encoded in the same style as DML, DTAL [24], and ATS [23]. They are also very similar to the conditions in ESC Java [8].

The conditions to access a device register can be encoded in the header of a Clay function. Clay is also able to track logical device state by creating a new singleton type for each state component:

    @type0 State[int Parameter] = native

This type depends on an integer type variable. Values of this type can be passed in and out of functions, and the type variable can be used in pre- and postconditions. The beginning of the declaration, @type0, declares the type to be linear and logical rather than actual state. These features provide Clay's memory safety [20].

    exists [int X; X>=0 && X<=5]
    add [int M, int N, int P, int Y; M>=0 && 2*P==N && N>Y]
    (Int[N] n, Int[M] m, State[Y] y) {
      let x = n + m;
      if (x > 5) return 5; else return @(x,y);
    }

This example modifies the previous example to add a parameter for device state, y, and to include N > Y in the precondition. We use a tuple (@[]), common in functional languages like ML or Haskell, to return both x and the state y. Clay's typechecker verifies that all function calls obey the function precondition. It also verifies that, given each function precondition and function body, it can conclude that the postcondition holds. The verification is done using the Omega [16] constraint checker. The typechecker also verifies that the code is free of the other errors previously mentioned.

After typechecking, the Clay compiler translates the Clay code to C++ code. Clay currently uses a templated struct containing an array to handle tuple return types. This is the only actual C++ feature in the generated C++ code, which can be linked and compiled with C code. Since Clay's typechecker has already verified that function pre- and postconditions are always true, these conditions do not generate run-time checks in C++. Likewise, logical device state variables such as the State example shown above have also been checked statically and will only generate C++ code where the typechecker could not verify safety statically, i.e., where the device state may have changed and the driver must query the device. The logical device state variables themselves do not generate C++ variables and therefore do not use additional space in memory. This is valuable in systems with space constraints, such as embedded systems. Additionally, the static typechecking produces robust code with fewer errors at run-time, which could lead to fewer patches later in the life of software written in Clay.

Although Clay is expressive enough to include a driver IO specification in its code and powerful enough to statically check that the specification is obeyed, it is a difficult language to write in. Figure 1 shows just the header of one register read function in Clay. Clay is relevant to embedded systems programming because it has many of the properties laid out in [9]. Correctness: a Clay program is statically guaranteed to meet the specification in its types and pre- and postconditions. Concurrency: Clay has built-in features for resource sharing and concurrency via locks [10, 20]. Clay programs have the same time and space constraints as C++, as well as its ability to handle asynchronous events.

4. LADDIE

The Language for Automated Device Drivers (Laddie) was intended to provide a simple and safe way to encode the IO specifications for a device driver. Laddie was developed by Lea Wittie and colleagues.

    native exists [u32 Q2, u1 IC, u1 ER, u3 Code, u11 RxBytes; Q2==RxBytes]
      @[Base[D,A], Busy[D,B], RxQUsed[D,Q2], Window[D,W],
        Int[IC], Int[ER], Int[Code], Int[RxBytes]]
      read_RxStatus [u32 A, u32 D, u1 B, u32 Q2, u32 W; (W>=0 && W<...

    Figure 1: The header of one register read function in Clay

        ...
        requires ...>4*count;
        ensures RxQUsed==old(RxQUsed)-(4*count);
    }

    Figure 2: Part of the 3Com 3c509 specification

Basic information

The basic information in an IO rule, except for the register offset, is given using C-style assignment statements which set the variables name, size, type, and access:

    name = RX_PIO_DataRead;
    size = 4;
    type = repeated;
    access = read;

Each IO rule must have a unique name. The register size gives the number of bytes in this register; the size statement is optional, and the default size is 1 byte. If this is a repeated IO call, such as insl(), then the type is repeated and the pre- and postconditions can refer to a variable named count: the IO function will take an input count which determines how many times it repeats the IO call. The default type is non-repeated, since that is the standard IO call. The access for a rule may be read, write, or readwrite; a readwrite access implies that the rule is for both reading and writing the register. The access statement is optional and defaults to readwrite.

The register offset is declared before the IO rule. Separate read and write rules can be given for the same register, provided they use different names:

    port 0x0E { name = Command; }
    port 0x0E { name = Status; }

This feature is used when a register has separate rules governing reads and writes, as seen in the Command and Status register of the 3c509. It is also useful when the same port offset has a different meaning in different windows.¹

¹ This is a common phenomenon occurring in many other devices such as the SMSC LAN91C111 10/100 Non-PCI Ethernet Single Chip MAC + PHY [18].

Fields

Registers are frequently divided up into named fields which hold logically distinct data. The Status register in Figure 2 is divided into many fields, four of which are shown here:

    [15:13] window;
    [12] command_in_progress;
    [11] reserved;
    [7] update_stats;

Fields can be reserved or omitted, which means they should not be used by the driver. Bit 11 of the Status register is explicitly reserved; bits 10-8 are reserved implicitly. Laddie's static semantics forbid fields to overlap or to go beyond the size of the register.

Pre- and postconditions

The precondition section is given before the postcondition section. Both sections are optional and may be omitted. A precondition is preceded by the keyword requires and a postcondition is preceded by ensures. The actual conditions are a set of boolean statements on the device state and register fields:

    requires Window==1 && Busy==false;
    ensures RxQUsed==RXbytes;

Conditions may use the standard boolean and relational operators as well as the +, -, and * math operators. The relational and math operators perform standard C operations on 32-bit integers. We will describe the conditions more formally in Section 5. The preconditions of a readable register may not include the fields, because their value is not known yet. The set of allowed values in a logical state declaration is an implicit pre- and postcondition on every register access:

    integer Window [0:6];

The restrictions on Window add the condition (Window>=0 and Window<=6) to every pre- and postcondition where Window is mentioned.

Multiple pre- or postconditions

Multiple conditions may be given for a register. In the Command register of the 3c509, we used multiple preconditions for simplicity:

    requires Busy==false;
    requires switch command { ... }

Postconditions referring to old device state

Postconditions may refer to old device state using the keyword old:

    ensures RxQUsed==old(RxQUsed)-(4*count);

This postcondition relates the postcondition value of device state to its precondition value. The remaining topics in Laddie are syntactic sugar for notational convenience.

Enumerated field values

Fields may be declared with an enumerated set of allowed values, as seen in the command field of the Command register:

    [15:11] command;
    enum {GlobalReset, SelectRegWindow, ..., StatsEnable=21, StatsDisable, ...};

The format is similar to the standard C enum. On a write, this field can only take one of the enumerated values. On a read, assuming the register was readable, the device will return one of those values for this field. (Note: enum values on a read can only be enforced dynamically and constitute a check on the device rather than the driver; Laddie is primarily intended for statically checking the driver.)

Conditions on the whole IO value

Conditions can refer to the whole read or written value rather than to its fields. The preconditions of a readable register may not include the value, because it is not known yet. This avoids making a special field that is the size of the whole register when the register is not normally partitioned into fields; we use the keyword value to make this simpler to express. The 3c509 does not use this feature, but the Signature register of the Logitech busmouse driver does:

    requires value == 0xa5;

Switch statements

For convenience, a switch statement on a register field may be used instead of a complex boolean expression:

    ensures switch command {
      SelectRegWindow : Window==argument;
      StatsEnable : StatsEnabled==true;
      ...
    }

is equivalent to

    ensures ((command==SelectRegWindow && Window==argument)
             || command!=SelectRegWindow)
         && ((command==StatsEnable && StatsEnabled==true)
             || command!=StatsEnable)
         && ... ;

The body of each switch case is a boolean statement. Like C, Laddie switch statements have an optional default case.

Debugging

It is possible to write false conditions such as

    requires 1>5;
    ensures Window<...

however, these will be caught during a consistency test that checks for pre- and postconditions that are impossible to satisfy. This test and Laddie's static semantic checking allow specifications to be checked for errors before writing the Clay portion of the driver.

The exists keyword declares the return type to be an existential, where the type variables declared here have some value which satisfies the postcondition.

5.2 Types for logical state

Laddie also generates a type for each logical device state and for the base address of the device:

    @type0 StateWindow [int Device, int Val] = native
    @type0 StateBusy [int Device, int Val] = native
    @type0 Statebase [int Device, int Val] = native

Each type has two type variables. The first is the driver memory address; this will be the same for all states associated with a specific driver and device. The driver address is used to link all of the capabilities to a specific driver and device, so the states of two different devices cannot be accidentally interchanged. This allows the produced code to scale to multiples of the same device, or to different devices that happen to have the same logical state names. Unlike the logical device state components, the base address is a state component of the driver and is stored in driver memory. Although we could have used the memory capability for the base address, we chose to create a new state component so that Laddie did not need to know the exact location of the base address within driver memory.

5.3 Native C++ IO function

There is a matching C++ function for every Clay IO function stub:

    ... {
      ...
      c.A[0] = (value >> 11) & 7;    // bits 11:13
      c.A[1] = (value >> 0) & 2047;  // bits 0:10
      c.A[2] = (value >> 15) & 1;    // bit 15
      c.A[3] = (value >> 14) & 1;    // bit 14
      return c;
    }

When Clay compiles, all of the type annotations and logical device state values are erased, leaving us with a function that takes and returns only the base address and fields. A Clay_Obj is a struct containing an array (A) of a given length; it matches the Clay tuple that is returned by the Clay function stub. The necessary bit twiddling is done for each field [i:j] using (value >> i) & Σ_{n=0..j-i} 2^n for reading; for writing, a similar C++ function using the mask Σ_{n=i..j} 2^n is produced for register writes.
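The mask arithmetic above is ordinary bit-field extraction and insertion. A minimal C sketch of the same computation, for a field occupying bits i through j of a register value, follows; this is our illustration, not Laddie output, and the helper names are hypothetical.

    #include <stdint.h>

    /* Mask with (j - i + 1) low bits set, i.e. the sum of 2^n for n = 0 .. j-i. */
    static uint32_t field_mask(int i, int j) {
        int width = j - i + 1;
        return (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
    }

    /* Read field [i:j], as in the generated code above:
       bits 11:13 of RxStatus become (value >> 11) & 7. */
    static uint32_t field_get(uint32_t value, int i, int j) {
        return (value >> i) & field_mask(i, j);
    }

    /* Insert field value f into bits [i:j] of a value to be written back. */
    static uint32_t field_set(uint32_t value, int i, int j, uint32_t f) {
        uint32_t m = field_mask(i, j);
        return (value & ~(m << i)) | ((f & m) << i);
    }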

5.4 IO macros

Finally, because all Clay IO stubs generated by Laddie are repetitive and take the base address as well as a collection of known capabilities, Laddie generates a macro which takes the names of input or output variables:

    #define R_RxStatus(code, RxBytes, IC, ER) \
      let [] (s_baseSTATE##2, \
        s_BusySTATE##2, s_RxQUsedSTATE##2, \
        s_WindowSTATE##2, code, RxBytes, IC, ER) = \
        read_RxStatus(IOADDR, s_baseSTATE, s_BusySTATE, \
          s_RxQUsedSTATE, s_WindowSTATE); \
      s_BusySTATE = s_BusySTATE##2; \
      s_RxQUsedSTATE = s_RxQUsedSTATE##2; \
      s_WindowSTATE = s_WindowSTATE##2; \
      s_baseSTATE = s_baseSTATE##2;

A programmer using the generated Clay code would replace unsafe IO calls like

    rx_status = inw(ioaddr + RX_STATUS);
    short error = rx_status & 0x3800;
    short pkt_len = rx_status & 0x7ff;

with its respective safe IO call and macro usage:

    R_RxStatus(code, pkt_len, complete, error);

6. RESULTS

Laddie is a simple language to use, and we were able to write Laddie specifications for several drivers:

• 3Com 3c509 Network Interface Card
• National Semiconductor PC16550D UART
• National Semiconductor DP8573A Real Time Clock

These specifications took less than an hour to write after we had thoroughly read the respective device manuals published by the manufacturers. Our Laddie specifications are available at [21]. Simple timing tests conducted on a handwritten Clay 3c509 driver and Donald Becker's³ C 3c509 driver [2], using a series of ping packets, indicate only a slight increase in run time for the Clay 3c509 driver. To evaluate the decrease in programmer workload, we can make code length comparisons between Laddie, Clay, and C.

³ Donald Becker has written many of the Ethernet device drivers for Linux.

C 3c509 driver

Donald Becker's 3c509 driver is around 800 lines of code. It is much shorter than the equivalent Clay driver, since it does not include the IO specification.

Clay 3c509 driver

The Clay 3c509 driver has four sections of code:

    Portion of Driver           Lines of Code
    IO interface                1188
    driver functions            1939
    system code                 1090
    capability handling code     411
    total                       4628

The driver functions are the main body of the driver. The IO code includes all of the IO functions, their native C++ translations, and their macros. The system code is made up of the included Linux .h files for the original driver, since they had to be translated into Clay; however, the .h files are not driver specific and could be re-used. The capability handling code converts the driver memory capability from a tuple to individual memory capabilities and back again. The IO code is about 25 percent of the total Clay code.

Laddie portion of the 3c509 driver

Laddie needed 314 sparse lines of code to replace the Clay IO code. (The generated Clay was 1197 dense lines of code, very similar to the original 1188 hand-written lines of Clay code.) The main reason for the dramatic difference between the Laddie IO code length and the Clay IO code length is that the Clay IO code is very repetitive: the main things that differ from function to function are data size, offset, and pre- and postconditions. Since Laddie's syntax provides a concise way to present the necessary information, the generator is able to provide the rest.

7. RELATED WORK

The languages that have the most in common with Laddie are Devil [11, 17], NDL [4], and Hail [19, 25]. Both these languages and Laddie provide a device-driver specification for IO operations on registers. Devil, NDL, and Hail drivers are shorter than C drivers and have similar performance. Laddie allows much of the functionality of these languages in its register IO specifications. However, all three other languages provide safety through standard static checks plus run-time checks on the IO specifications, while Laddie compiles to Clay, which enforces most IO specification invariants at compile time. This section presents a comparison of these languages with Laddie; it also presents other related static verifiers and type-safe languages.

Devil

Devil provides a specification for IO operations on registers, as well as a range of legitimate port offsets from the base address and a set of variables tied to register fields. The variables provide a way to track device state at run time. Devil supports pre- and postconditions on IO operations. Unlike Laddie, Devil is able to define a variable as the concatenation of subfields from several different registers; the compiled Devil code includes read and write functions which hide the details of how each variable is assembled, allowing simpler read and write access to complex variables. Both Devil and Laddie statically guarantee that read/write access and size constraints are obeyed in the driver functions. Both languages can track logical device state in pre- and postconditions: Devil via standard variables, and Laddie via ghost variables.

Conditions in Devil are variable assignments that must be performed before or after the IO operation, respectively. In comparison, Laddie allows more flexible conditions, because its conditions are boolean expressions on ghost variables, whereas Devil's conditions only allow equality rather than the full range of boolean operators. Laddie's ghost variables exist only at compile time, so they do not use extra space in memory at run time. However, Laddie, unlike Devil, is unable to refer to standard program variables in its pre- and postconditions.

NDL

NDL builds on Devil and uses similar syntax. An NDL specification includes IO operations on registers and a collection of driver functions. It also includes a state machine for the logical states a device may be in (reading, sleeping, etc.); preconditions on the current state are used in the IO specification. NDL does not appear to support postconditions. Both NDL and Laddie allow the same IO location to have two different rules for reading and writing. NDL code compiles to C and uses run-time checks to enforce its preconditions and state-machine transitions. Unlike with Laddie, the entire driver is written in NDL's C-like driver syntax. This has the advantage of allowing NDL to support buffer copying to and from a device in one or two lines of code, an operation which normally takes many IO operations.

Hail

Hail provides a specification for IO operations and invariants on registers, as well as an address-space description and a description of the device instantiation. The Hail compiler is capable of catching inconsistencies in a specification; the Clay compiler can catch similar inconsistencies in a Laddie specification. Like Devil, the Hail compiler generates IO functions in C with optional run-time checks on the invariants. The actual driver functions of a Hail driver are written in C using the generated IO functions. According to the HAIL website, the address-space descriptions are not yet implemented in a HAIL compiler. Laddie currently provides #define stubs for the different possible addressing strategies, but Hail's syntax is easier to use and we may adopt this strategy in the future.

SDV

Microsoft's SDV, the Static Driver Verifier, works on Windows device drivers based on the Windows Driver Model [12, 1]. SDV statically checks that the driver obeys a set of built-in rules about the driver/kernel interface. The project is similar to Laddie in that both statically verify a driver's use of a specification. SDV focuses on a fixed driver/kernel specification for the Windows Driver Model, while Laddie focuses on a user-defined driver/device specification which can be tailored to each device. SDV is part of the SLAM toolkit.

Type safe languages

Typed Assembly Language [13] is similar to Clay in that both access memory through load and store primitives and use types to guarantee memory safety. TAL is, however, too low-level for writing drivers easily. Popcorn, a safe C-like language which compiles to TAL, does not support the complex pre- and postconditions needed by safe driver IO. Hoare Type Theory [14] adds types to the standard Hoare triple to form {P} x:t {Q}, where the pre- and postconditions can depend on the types; this is similar to the pre- and postconditions on Clay functions. This language is currently formalized but not implemented.

The Vault [7] programming language embeds keys to manage temporal events in the types of system resources. Keys, like the capabilities of Clay, are associated with a resource and must be held to access the resource. Vault allows a notion of pre- and postconditions with key states. However, they are less flexible than the arithmetic constraints, relations, and inequalities of Clay.

8. CONCLUSIONS AND FUTURE WORK

The combination of Laddie and Clay provides a usable specification language for device IO access functions, together with static type-safety guarantees that the driver uses the device access functions in accordance with the specification. Using Laddie, we have written several device IO specifications, which are available at [21]. Such specifications could be produced by the device manufacturer and distributed to driver writers along with the standard manual.

Currently, Laddie only handles the IO end of a driver; the remainder of the driver must still be written in Clay, a more difficult language to use. We expect to be able to automate the translation of the system code (C structs such as the driver memory) from C to Clay with a reasonably straightforward compiler, which would also generate the capability-handling code needed to access these data structures from the parts of the driver still written in Clay. We have left an easier language for writing the remainder of the driver as future work. A mix of Clay's type system and function pre- and postconditions with a language such as Hume [9], primarily intended for embedded systems, might yield a simpler intermediate language with guarantees of correctness and of space and time costs for constrained systems.

9. AVAILABILITY

An ML program formally translating Laddie to Clay, the Laddie and Clay compilers and Users Guides, and the Laddie specifications of several devices are available at [21].

10. REFERENCES

[1] Thomas Ball, Ella Bounimova, Byron Cook, Vladimir Levin, Jakob Lichtenberg, Con McGarvey, Bohus Ondrusek, Sriram K. Rajamani, and Abdullah Ustuner. Thorough static analysis of device drivers. In EuroSys Conference, Leuven, Belgium, 2006.
[2] Donald Becker. 3Com EtherLink III 3c5x9 driver v. 1.18. http://joshua.raleigh.nc.us/docs/linux-2.4.10_html/284303.html, 2000.
[3] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An empirical study of operating system errors. In ACM Symposium on Operating Systems Principles, pages 73–88, Banff, Alberta, Canada, 2001.
[4] Christopher L. Conway and Stephen A. Edwards. NDL: A domain-specific language for device drivers. In ACM Conference on Languages, Compilers, and Tools for Embedded Systems, Washington, DC, June 2004.
[5] 3Com Corporation. EtherLink III Parallel Tasking ISA, EISA, Micro Channel, and PCMCIA Adapter Drivers Technical Reference. 1-800-NET-3Com, August 1994.
[6] National Semiconductor Corporation. PC16550D Universal Asynchronous Receiver/Transmitter with FIFOs. http://www.national.com, June 1995.


[7] Robert DeLine and Manuel Fähndrich. Enforcing high-level protocols in low-level software. In ACM SIGPLAN Conference on Programming Language Design and Implementation, Snowbird, Utah, June 2001.
[8] Cormac Flanagan, K. Rustan Leino, Mark Lillibridge, Greg Nelson, James B. Saxe, and Raymie Stata. Extended static checking for Java. In ACM SIGPLAN Conference on Programming Language Design and Implementation, Berlin, Germany, 2002.
[9] K. Hammond and G. Michaelson. HUME: A domain specific language for real-time embedded systems. In International Conference on Generative Programming and Component Engineering, Erfurt, Germany, October 2003.
[10] Heng Huang, Lea Wittie, and Chris Hawblitzel. Formal properties of linear memory types. Technical report, Dartmouth College, 2003.
[11] F. Mérillon, L. Réveillère, C. Consel, R. Marlet, and G. Muller. Devil: An IDL for hardware programming. In USENIX Symposium on Operating Systems Design and Implementation, San Diego, CA, October 2000.
[12] Microsoft. Static driver verifier - finding driver bugs at compile-time. http://www.microsoft.com/whdc/devtools/tools/sdv.mspx, 2007.
[13] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. ACM Transactions on Programming Languages and Systems, 21(3):528–569, May 1999.
[14] Aleksandar Nanevski, Greg Morrisett, and Lars Birkedal. Polymorphism and separation in Hoare type theory. In International Conference on Functional Programming, Portland, Oregon, 2006.
[15] George C. Necula and Peter Lee. Safe kernel extensions without run-time checking. In USENIX Symposium on Operating Systems Design and Implementation, Seattle, Washington, October 1996.
[16] William Pugh. The Omega project. http://www.cs.umd.edu/projects/omega/, 2007.
[17] L. Réveillère, F. Mérillon, C. Consel, R. Marlet, and G. Muller. The Devil language. Technical Report 1319, IRISA, Rennes, France, 2000.
[18] SMSC. LAN91C111 10/100 Non-PCI Ethernet Single Chip MAC + PHY. http://www.smsc.com, 2005.
[19] J. Sun, W. Yuan, M. Kallahalla, and N. Islam. HAIL: A language for easy and correct device access. In ACM Conference on Embedded Software, Jersey City, NJ, September 2005.
[20] Lea Wittie. Type-Safe Operating System Abstractions. PhD thesis, Dartmouth College, 2004.
[21] Lea Wittie. http://www.eg.bucknell.edu/~lwittie/research.html, 2007.
[22] Hongwei Xi. Dependent Types in Practical Programming. PhD thesis, Carnegie Mellon University, 1998.
[23] Hongwei Xi. Applied type system (extended abstract). In Post-workshop Proceedings of TYPES 2003. Springer-Verlag LNCS 3085, 2004.
[24] Hongwei Xi and Robert Harper. A dependently typed assembly language. In International Conference on Functional Programming, Florence, Italy, 2001.
[25] W. Yuan, J. Sun, and N. Islam. HAIL language specification and user guide. Technical Report DCL-TR-2005-0006, DoCoMo USA Labs, 2005.

Architectural Exploration of Reconfigurable Monte-Carlo Simulations using a High-Level Synthesis Approach

J.G.F. Coutinho    D.B. Thomas    W. Luk

Imperial College London
Department of Computing
180 Queen's Gate
London SW7 2AZ, UK

ABSTRACT

This paper describes an approach for automatically generating programs that can be compiled into efficient hardware architectures, and illustrates its application in producing designs for Monte-Carlo simulation that are optimised for user requirements in resource utilisation or execution time. Monte-Carlo simulations are widely used in many financial applications and embedded systems, such as option pricing and portfolio evaluation. The intrinsic parallel nature of these applications, together with their inherent computational complexity, makes them ideal candidates for acceleration using reconfigurable hardware. We use the Haydn approach to automatically derive simulation architectures from a C-like description, which describes the functionality of the design without focusing on hardware details such as timing. This approach can also exploit resource sharing within pipelined architectures, allowing the tradeoffs between resource utilisation, parallelism, and execution time to be explored rapidly, safely, and automatically. To illustrate our approach, we present: (1) a model of the GARCH walk simulation in Haydn-C to perform design exploration, (2) a design-flow which supports interactive and batch modes for deriving these architectures, and (3) an evaluation of our approach targeting different CPU and FPGA devices. Our results show that an automatically generated Xilinx Virtex-II design operating at 180MHz is 36 times faster than a 3.2GHz Pentium 4, and 4.4 times faster than a 2.2GHz Quad Opteron system.

Categories and Subject Descriptors
B.6.3 [Design Aids]: Automatic synthesis, Hardware description languages

General Terms
Hardware Design, Design Exploration, Monte-Carlo simulations

Keywords
FPGA, High-level synthesis, Pipelining, Resource sharing, Design exploration, GARCH walk

1. INTRODUCTION

Reconfigurable devices, such as FPGAs, are increasingly popular for implementing computationally-intensive applications, often in embedded systems. The key advantage of reconfigurable technology is that it combines the performance of dedicated hardware with the flexibility of software, without the cost and risk associated with circuit fabrication. In particular, performance can be achieved by exploiting the application's inherent parallelism. Furthermore, reconfigurable devices can be reused many times over for implementing different hardware architectures, thus offering far more flexibility than ASIC solutions.

As reconfigurable technology makes progress in capacity and performance, there is an increasing need for high-level design methods and tools that can effectively address the growing complexity of hardware design and improve designer productivity. Such tools must enhance design maintainability and portability as system requirements evolve, and should facilitate design exploration so that various trade-offs, such as performance versus resource utilisation, can be examined. To address these concerns, we are developing Haydn [5], a novel hardware compilation approach which offers designers a way to capture both cycle-accurate data-paths and high-level designs. Manual and automated optimisation transforms can be used separately or in combination, so that one can achieve the best compromise between development time and design quality: some of our automatically generated designs are comparable in performance to hand-crafted designs.

In this paper we illustrate how to exploit Haydn to describe and implement financial Monte-Carlo simulations in reconfigurable hardware. Monte-Carlo simulations are popular in financial applications, as they are able to value multi-dimensional options and portfolios without an exponential growth in run-time and memory use. Due to their high computational load and intrinsic parallelism, they are ideal candidates for acceleration using reconfigurable hardware. We automatically generate several architectures with different tradeoffs in resource utilisation and performance. The main contributions of this paper are:

• a parameterised design model of the GARCH random walk using the Haydn-C language, which allows developers to control the amount of resources and to experiment with different timing configurations (Section 3); a sketch of one GARCH step is given after this list;
• a fully automated hardware design-flow that operates in interactive and batch modes (Section 4);
• an evaluation of the proposed approach (Section 5). In particular, we automatically generate 57 architectures that target five FPGA devices with different resource and timing configurations, and compare their performance against five CPUs. For instance, the fastest hardware architecture, running on a Xilinx Virtex-II at 180MHz, runs on average 4.4 times faster than a Quad Opteron 275 running at 2.2GHz, and 36 times faster than a Pentium 4 running at 3.2GHz.

2. THE HAYDN APPROACH

This section covers our approach: the Haydn-C language is used for high-level descriptions, from which optimised hardware architectures that meet user requirements are generated automatically. The generation process is guided by resource and scheduling information, and designs can be transformed from high-level behavioural descriptions to low-level structural descriptions, all within Haydn-C. In the following, Section 2.1 explains our motivation, Section 2.2 describes our methodology, Section 2.3 covers hardware descriptions and their interpretation, and Section 2.4 gives an account of source-level transformation.

2.1 Motivation

Current design tools that target reconfigurable devices fall into two camps: behavioural and structural approaches. Each has its own benefits and drawbacks. The behavioural approach usually employs a hardware description language that is similar in syntax and semantics to popular software languages such as C. The goal of behavioural hardware compilers is to derive one or more hardware implementations from a high-level description (a process known as high-level synthesis), abstracting from low-level details such as timing and resource utilisation to let developers focus on algorithmic concerns. However, high-level synthesis often suffers from a lack of user control and transparency over the implementation process. For instance, if an unsuitable design is generated, there is little a designer can do except experiment with behavioural constraints or manually alter the generated design. On the other hand, structural languages, such as languages at the register-transfer level (RTL) and other cycle-accurate description languages, give developers more control over the low-level implementation. At this level of abstraction, developers are able to make decisions that are left to the compiler in the behavioural approach; in particular, they can fine-tune their hardware implementations to approach an optimal solution. However, the structural design methodology has two major disadvantages compared with high-level synthesis, namely low productivity and poor maintainability, which make it highly ineffective for implementing large designs and for performing design exploration. The low productivity is due to the fact that many implementation details and architectural decisions have to be provided at design time.

2.2 Methodology

To overcome the limitations of current design tools, we are developing Haydn, which combines the benefits of both behavioural and structural approaches. In particular, we have devised a source-level transformation process which can transform a behavioural-level design into multiple alternative structural-level designs, and a structural-level design into either structural- or behavioural-level descriptions. We can then exploit the strengths of each design level as follows:

• Behavioural to Structural. Developers can start the hardware design process with a simple behavioural description, without providing architectural details such as timing and architecture-specific resources. Next, the high-level synthesis tool is used to derive an architecture with the aid of high-level annotations that guide the transformation process. This way, developers can rapidly obtain an optimised solution without worrying about low-level hardware details.

• Structural to Structural. Once developers have created the first implementation, they can systematically improve design performance, or find the best tradeoff between resource area and execution time, by modifying constraint parameters, running the source-level transformation process, and verifying the performance of the generated design. Alternatively, they can modify and fine-tune their designs manually.

• Structural to Behavioural. Structural designs can be very difficult to understand, because the high-level design and data-flow become obscured by the implementation-specific details of resource binding and timing. In this case, a generic C description can be automatically produced without the low-level details, so that it is easier to read, modify, verify, and subsequently optimise.

2.3 Hardware Description and Interpretation

The Haydn-C hardware description language has been developed to support the above methodology. Haydn-C is based on the Handel-C language [4], but contains significant differences. It shares the same subset of ANSI C, such as assignment, conditional, and loop statements. Like Handel-C, it adds the par block to express explicit parallel computations, and flexible word-length sizes when declaring variables. However, unlike Handel-C, Haydn-C is a component-based language, like VHDL and Verilog: rather than declaring functions, developers write components with an explicit port interface. We believe that this feature makes it easier to import and export library blocks (such as IP cores), and to work with other HDL tools. Haydn-C also supports other hardware constructs, such as the pipe and wire data structures, which carry data across pipeline stages and within a combinatorial cycle respectively. Furthermore, it provides a source annotation facility to control the source-level transformation process, covering resource and schedule constraints, the type of transformation, and the code to optimise. There are two sets of interpretation rules applied to a Haydn-C description. The first, the structural interpretation, is based on Handel-C timing semantics [4], and maps all operations in the design to a fixed time order. Because hardware synthesis and simulation processes adhere to the structural interpretation rules, developers are able to exert control over the quality of their designs. In other words, users can derive the schedule for a design, and change the schedule by revising the design. For instance, in Fig. 1(c), when operating under the structural interpretation rules, the code is mapped to a fully pipelined schedule, where every operation executes concurrently and completes in one clock cycle. This is because, according to the structural interpretation rules, all statements in a parallel block start execution simultaneously and each assignment statement executes in a single clock cycle. Note that we define the initiation interval (II) as the number of cycles needed to output a result after the previous one. Hence, the design in Fig. 1(c) has an initiation interval of 1. If this code is restructured as shown in Fig. 1(e), the new description is mapped to a pipelined structure that shares an adder and produces a result every other cycle (II=2). In this case, operations in step 1 are executed in parallel, followed by operations in step 2. While we have saved one adder, we generate results at half the rate of Fig. 1(c).
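To make the throughput implication of II concrete, note that a pipelined design delivers results at a rate of

    throughput = f_clk / II

results per second. As a purely illustrative reading (the clock value below is assumed for this example only): at f_clk = 100MHz, the fully pipelined design of Fig. 1(c) (II=1) would deliver 10^8 results per second, while the resource-shared design of Fig. 1(e) (II=2) would deliver 5 x 10^7, half the rate, in exchange for saving one adder.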

Figure 1 comprises five panels: (a) a behavioural description; (b) the abstraction path, where the unscheduler applies the behavioural interpretation to produce a dataflow graph (DFG) over inputs a, b, c, d (one square root, two additions, one multiplication, output y); (c) a fully pipelined design (II=1); (d) the high-level synthesis path, where the unscheduler produces a DFG and the scheduler, driven by constraints, performs scheduled code generation; (e) a pipelined design (II=2) with resource sharing and step control. The code fragments recoverable from the figure are:

Behavioural description, panel (a):

    {
        s = SQRT(a);
        y = (s + b) * (c + d);
    }

Fully pipelined design (II=1), panel (c):

    par {
        sqrt_v4.in(a);
        adder_v4[0].in(sqrt_v4.res, b);
        adder_v4[1].in(c, d);
        mult_v4.in(adder_v4[0].res, adder_v4[1].res);
        y = mult_v4.res;
    }

Pipelined design (II=2) with resource sharing, panel (e):

    par {
        { sqrt_v4.in(a);
          adder_v4[0].in(sqrt_v4.res, b); }
        { par { t0 = adder_v4[0].res;
                adder_v4[0].in(c, d); }
          mult_v4.in(t0, adder_v4[0].res); }
        { t1 = mult_v4.res;
          y = t1; }
    }

Figure 1: Example of design modelling and source-level transformation using the Haydn approach.

In order to perform source-level transformations, we relax the structural interpretation rules, which would otherwise be difficult to satisfy. In this case we follow the behavioural interpretation rules, which map a Haydn-C description into a dataflow graph (DFG). The DFG captures the behaviour of a design by representing program operations and their dependencies. Unlike the structural interpretation, where operations are assigned to a fixed time-order, operations in a DFG are partially ordered by their dependencies. An example of a DFG is shown in Fig. 1(b). A Haydn-C description can reference operators and resources. Operators refer to abstract operations, whereas resources are associated with architectural implementations. In Fig. 1(a), we use the SQRT operator to represent the square root operation without committing to a particular implementation, and thus the design cannot be synthesised to hardware. In contrast, the code in Fig. 1(c) can be synthesised to hardware because it contains the resource reference sqrt_v4, a library block targeting the Virtex-4 device that implements the square root operation. The key point is that developers have the option to use operators or resource references in their Haydn-C descriptions according to the abstraction level at which they wish to program. In our approach, we define a Haydn-C description as behavioural (Fig. 1(a)) when the code is sequential (contains no parallel blocks) and does not reference resources. On the other hand, the code is considered structural when no operators are used, and therefore the design can be synthesised to hardware (Fig. 1(c) and Fig. 1(e)). Note that resources and operators can share the same name; in this case, it is possible for a behavioural description to also be structural.

2.4 Source-Level Transformations

The source-level transformation process receives a Haydn-C description as input, and produces a new, restructured Haydn-C design. There are two types of transformation: high-level synthesis (HLS) and abstraction. The HLS transformation (Fig. 1(d)) turns a behavioural or structural Haydn-C description into a structural Haydn-C design. For instance, the code in Fig. 1(a) or Fig. 1(e) can be automatically restructured to produce the code shown in Fig. 1(c) when given the instruction to generate a pipelined architecture with II=1 and allocated enough resources to fully parallelise the design. Similarly, the HLS process can derive the code shown in Fig. 1(e) from either Fig. 1(a) or Fig. 1(c) when instructed to share an adder using a pipelined architecture with II=2. Note that the functionality of the code shown in Fig. 1(e) is not immediately obvious, and it can therefore be difficult to maintain if we wish to correct or update the algorithm. For this purpose, we use the abstraction transformation (Fig. 1(b)) to turn the structural design (Fig. 1(e)) into a behavioural description (Fig. 1(a)) that better exposes its functionality. In particular, the abstraction process sequentialises the parallel program, and converts all resource references to operators. To support HLS and abstraction, the source-level transformation process is composed of two modules: the unscheduler and the scheduler. The unscheduler generates a dataflow graph from a Haydn-C description using the behavioural interpretation rules. The scheduler performs the reverse: it takes a dataflow graph as input, binds each operator to a resource, and places each resource in a time order; the generated code reflects this new schedule. The HLS process (Fig. 1(d)) combines the unscheduler and scheduler modules, whereas the abstraction process (Fig. 1(b)) relies only on the unscheduler.
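As an illustration of the scheduler's task, the following toy C program, a minimal sketch of our own and not the actual Haydn scheduler, list-schedules the DFG of Fig. 1(b) under a constraint of one adder: every operation is assumed to take one clock cycle, and a ready operation is placed in the earliest step with a free resource.

    #include <stdio.h>

    #define N_OPS 4

    /* The DFG of Fig. 1(b): y = (sqrt(a) + b) * (c + d).
       deps[i][j] != 0 means operation i consumes the result of operation j. */
    static const char *op_name[N_OPS] = { "sqrt", "add1", "add2", "mul" };
    static const int deps[N_OPS][N_OPS] = {
        /* sqrt */ { 0, 0, 0, 0 },
        /* add1 */ { 1, 0, 0, 0 },  /* add1 = sqrt(a) + b */
        /* add2 */ { 0, 0, 0, 0 },  /* add2 = c + d       */
        /* mul  */ { 0, 1, 1, 0 },  /* mul  = add1 * add2 */
    };
    static const int is_add[N_OPS] = { 0, 1, 1, 0 };

    int main(void) {
        int step[N_OPS], scheduled[N_OPS] = { 0 };
        const int n_adders = 1;   /* resource constraint: one shared adder */
        int done = 0;

        for (int t = 0; done < N_OPS; ++t) {
            int adders_used = 0;
            for (int i = 0; i < N_OPS; ++i) {
                if (scheduled[i]) continue;
                /* Ready only if all dependencies finished in earlier steps. */
                int ready = 1;
                for (int j = 0; j < N_OPS; ++j)
                    if (deps[i][j] && (!scheduled[j] || step[j] >= t))
                        ready = 0;
                if (!ready || (is_add[i] && adders_used >= n_adders))
                    continue;
                step[i] = t;          /* bind op i to time step t */
                scheduled[i] = 1;
                adders_used += is_add[i];
                ++done;
            }
        }
        for (int i = 0; i < N_OPS; ++i)
            printf("%-5s -> step %d\n", op_name[i], step[i]);
        return 0;
    }

The sketch only captures time-step placement under a resource bound (printing sqrt and add2 in step 0, add1 in step 1, mul in step 2); the real scheduler additionally overlaps successive inputs to form a pipeline with the requested II, which is how the constraint parameters steer HLS between architectures like Fig. 1(c) and Fig. 1(e).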

Figure 2 plots asset price a (vertical axis) against time t (horizontal axis, t = 0..5). A sample path runs from S0 = (5,0) through (7,1), (8,2), (7,3) and (9,4) to ST = (13,5); the terminal points (a,t) of the five simulated paths are (13,5), (10,5), (7,5), (4,5) and (2,5). The figure annotates the resulting estimates:

    E(A) = Σa/n = (13 + 10 + 7 + 4 + 2)/5 = 7.2
    E(A^2) - E(A)^2 = Σa^2/n - (Σa/n)^2 = 67.6 - 51.84 = 15.76

Figure 2: Example of path based simulation.

The Haydn-C port interface of the garch_walk design, part of the parameterised description presented in Section 3.2, reads:

    design garch_walk {
        in  bit   1  rng_load_enable;
        in  bit   1  rng_load_data;

        in  uint  12 in_data_t;
        in  float 32 in_data_sigma;
        in  float 32 in_data_eps;
        in  float 32 in_data_v;

        in  float 32 parm_a0;
        in  float 32 parm_a1;
        in  float 32 parm_a2;
        in  float 32 parm_mu;

        out uint  12 out_data_t;
        out float 32 out_data_sigma;
        out float 32 out_data_eps;
        out float 32 out_data_v;

        in  bit 1 in_valid;
        out bit 1 out_valid;

        code {
            @HLS.run;
            ...

3. MODELLING

This section focuses on designing a GARCH asset path simulator in Haydn-C. In Section 3.1 we provide an overview of the Monte-Carlo simulation framework, and Section 3.2 presents the parameterised behavioural description of the GARCH model.

3.1 Monte-Carlo Simulation Framework

• S = {(σ ∈
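Although the definition above is cut short here, the estimator arithmetic of Figure 2 and the role of the design's parameter ports (parm_a0, parm_a1, parm_a2, parm_mu) can be illustrated with a small software reference model. The following C sketch uses one common GARCH(1,1) formulation with illustrative parameter values of our own choosing; it is not claimed to be the exact recurrence implemented by garch_walk.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* One standard-normal sample via the Box-Muller transform. */
    static double randn(void) {
        double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return sqrt(-2.0 * log(u1)) * cos(2.0 * 3.14159265358979323846 * u2);
    }

    int main(void) {
        /* Parameters named after the design's ports; values are illustrative. */
        const double a0 = 1e-5, a1 = 0.1, a2 = 0.85, mu = 1e-4;
        const int n_paths = 100000, n_steps = 252;
        double sum = 0.0, sum_sq = 0.0;

        for (int p = 0; p < n_paths; ++p) {
            double v = 1e-4;   /* conditional variance state */
            double s = 100.0;  /* asset price                */
            for (int t = 0; t < n_steps; ++t) {
                double eps = randn();
                s *= exp(mu + sqrt(v) * eps);           /* log-price step      */
                v  = a0 + a1 * v * eps * eps + a2 * v;  /* GARCH(1,1) update   */
            }
            sum += s;
            sum_sq += s * s;
        }
        /* Same estimators as Figure 2: E(A) and E(A^2) - E(A)^2. */
        double mean = sum / n_paths;
        printf("E(A) = %f  Var(A) = %f\n", mean, sum_sq / n_paths - mean * mean);
        return 0;
    }

In the hardware version, each inner-loop step would plausibly correspond to one pass of the state (t, sigma, eps, v) through the garch_walk pipeline, which is where the initiation-interval tradeoffs of Section 2 apply.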
