An Approach to Automatic Code Generation for Safety-Critical Systems
Michael W. Whalen
Mats P.E. Heimdahl
Department of Computer Science and Engineering University of Minnesota fwhalen,
[email protected]
Abstract
Automated translation, or code generation, of a formal requirements model to production code can alleviate many of the problems associated with design and implementation. In this report we outline the requirements of such code generation to obtain a high level of confidence in the correctness of the translation process. We then describe a translator for a state-based modeling language called RSML that largely meets these requirements.
1 Introduction
Incorrect, incomplete, ambiguous, and generally inadequate software system specifications are among the main sources of flaws in safety-critical systems. Formal specification languages partially help us address these problems. When used correctly, such specifications can completely and unambiguously define the expected behavior of the software system. Nevertheless, even if a formal requirements effort produces a correct specification, designing and developing production-quality code from the specification can be a time-consuming and error-prone process.
Automatically generating the production code from a specification alleviates many of the problems in system development. If correct, automated translation guarantees that the behavior of the production code is consistent with the formal specification. There are several commercial systems that are based on this idea and translate a formal or semi-formal specification to executable code, for instance, Statemate from I-Logix [5], the Rose tools from Rational Corporation, and SCADE from Verilog [3].
There are, however, various reasons to distrust any code generated from a formal or semi-formal specification. First, the specification language and/or target language may lack formal semantics. Second, the translation may not be formally defined. Finally, the translation tool may be incorrectly implemented. Consequently, the full benefits of code generation are not realized in safety-critical systems, since the generated code cannot be trusted and must be subjected to the same expensive validation and verification as hand-written code.
In our work we are attempting to devise an approach where code generation from high-level specifications can be used to satisfy customer quality, regulatory, legal, and ethical requirements imposed on safety-critical applications. Our aim is to translate a formal specification expressed as a hierarchical state machine to source code expressed in some imperative language. In particular, we are working with a formal specification language called RSML [7]. Our code generation approach is based on the denotational semantics of both RSML and a simplified imperative target language (SIMPL), which are used to prove that the translator correctly implements the RSML semantics in SIMPL. SIMPL is a subset of several popular imperative languages (for example, C, C++, and Ada), so we can trivially map a SIMPL program to the true target language.
2 Code Generation Overview
The goal of a high-assurance software development effort is to cost-effectively produce a well-understood, well-documented software system that works correctly in its target environment. The goal of code generation is to increase productivity and quality by directly deriving production code from the formal model. If the code generation is convincingly correct, the verification and validation (V&V) performed in the specification stage can be leveraged in V&V of the code for further cost savings.
To provide the level of confidence in the code generation approach needed for its use on safety-critical software in a regulated and litigious society, we have defined a set of requirements of the translation in a previous paper [11].
Requirement 1: The source and target languages must have formally well-defined syntax and semantics.
Requirement 2: The translation between a specification expressed in a source language and a program expressed in a target language must be formal and proven to maintain the meaning of the specification.
Requirement 3: The implementation of the translator must be formally verified to confirm it correctly implements the translation.
Requirement 4: The implementation of the translator must be rigorously tested and treated as high-assurance software.
A formal translation mathematically describes how to correctly convert a specification to a target program, but it does not provide us with a useful software artifact. The implementation of the translator must be proven to correctly implement the abstract translation described in Requirement 2. In addition, since the proofs most likely will be rather complex and possibly erroneous, rigorous testing of the translator is also needed to provide an additional level of confidence in its correctness.
Requirement 5: The generated code must be well structured, well documented, and easily traceable to the original specification.
Finally, to provide the highest level of confidence that the generated code is correct and to satisfy some regulatory agencies, the structure of the code must allow independent means of verification, such as manual inspections and testing.
The work outlined in this short paper is based on several existing compilers and code generators for various formal specification languages.
High Integrity Compilers: High integrity compilers are concerned with creating a provably correct mapping between an imperative language and an assembly language for safety-critical systems. Some of the requirements for provably correct code generation are originally from research in developing a high-integrity compiler [10, 9].
Synchronous Programming Languages: Synchronous programming languages such as Esterel [1] and Lustre [2] are programming languages with extensions to support abstract parallelism and control structures based on events. Synchronous languages have a formal model, and provide some mechanisms for formal verification. These languages have been successfully used in large industrial projects, including Airbus flight software. As synchronous languages can be transformed into finite state automata, very efficient code can be generated from the models.
Synchronous programming languages are not, however, designed to be used as specification languages. They are, instead, an improvement on standard imperative languages for implementation of reactive systems. Also, although it is possible to create a very efficient automaton for the generated code, the generated automata bear little resemblance to the original program, so it is difficult to read the generated code, trace it back to the original program, and independently verify that the translation was done correctly.
This work has been partially supported by NSF grants CCR-9624324 and CCR-9615088, and University of Minnesota Grant in Aid of Research.
3 The Approach
Here we outline our approach to code generation for safety-critical systems and how we address the requirements outlined in the previous section. The format of the short paper necessitates a very brief discussion. The interested reader is referred to [11] for more details.
Source and Target Language: RSML was developed as a requirements specification language specifically for embedded systems and is based on David Harel's Statecharts [4]. An RSML specification consists of a state hierarchy and collections of transitions, variables, interfaces, functions, macros, and constants.
The State Hierarchy is organized as a tree of states. At the lowest level of the tree are the atomic states, which have no children, and are analogous to states in traditional finite state machines. If states have children, the children are organized into equivalence classes. These equivalence classes exist in parallel with one another and are used to model parallel or concurrent parts of a system. If a state has children organized into more than one equivalence class, it is considered a parallel state. If it has only one equivalence class, it is considered a compound state.
Transitions in RSML control the way in which the state machine can move from one state to another. Variables in the specification allow the analyst to record the values reported by various external sensors (in the case of input variables) and provide a place to capture the values of the outputs of the system prior to sending them out in a message (in the case of output variables).
The semantics of the basic constructs of RSML (constants, variables, transitions, functions, and macros) are in our work specified using denotational semantics. Denotational semantics define a language by assigning it a mathematical meaning. Each construct within the language is assigned a meaning
function that, given an initial state, returns an updated state based on the semantics of the construct. The denotational semantics of the basic constructs provide the building blocks for the behavior of the whole specification. The behavior of an RSML machine is created by assembling the transitions (the highest-level blocks) into the next-state relation. Each transition can be thought of as a partial function that is defined if its guarding condition is satisfied. These transitions are grouped by their trigger event and by the source state of the transition. We can thus define the complete behavior of an RSML state machine. For additional information on the RSML semantics the interested reader is referred to [6, 11].
Many specification languages have fully formal semantics; an imperative target language with a manageable formal semantics, however, is difficult to find. We are primarily interested in imperative languages since most functional or logic languages would be unacceptable for safety-critical embedded systems; they make extensive use of recursion and dynamic memory allocation, two features highly undesirable in critical applications. The denotational semantics for SPARK-Ada (a subset of Ada used for safety-critical applications) requires over 500 pages of Z [8]. A similar effort to formalize Modula-2 also required several hundred pages [9]. The likelihood that such voluminous language semantics definitions are correct is small. Thus, to provide rigorous arguments for the correctness of a translation, it is not feasible to use any of the existing general-purpose programming languages as a target language. Instead, one must target a much smaller language by removing complicating features that do not belong in safety-critical applications, such as pointers and sequencers (i.e., statements that cause unusual transfer of control, for instance, goto, break, and continue).
Our target language, SIMPL (for Safety-critical IMPerative Language), contains constructs for variables, constants, functions, procedures, and basic and composite (array and record) types. Since SIMPL corresponds to a strict subset of several languages, translating SIMPL code to a commercial imperative language is a straightforward process.
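As an illustration only, the state hierarchy described above could be encoded in C++ using nothing beyond the kinds of constructs SIMPL is said to contain (constants, functions, arrays, and records). The sketch below is not taken from the RSML translator; the state names, the `StateRecord` layout, and the `classify` function are all assumptions made for this example. It avoids pointers, dynamic memory, recursion, and sequencers.

```cpp
#include <cassert>

// Hypothetical encoding of a small RSML-style state hierarchy.
// Only fixed-size arrays, records, and plain loops are used.

enum StateKind { ATOMIC, COMPOUND, PARALLEL };

const int NUM_STATES = 6;
const int NO_PARENT = -1;

struct StateRecord {
    int parent;    // index of the parent state, or NO_PARENT for the root
    int eq_class;  // equivalence class of this state within its parent
};

// Invented example hierarchy:
//   0 System    (root)
//   1 Off       child of System, class 0
//   2 On        child of System, class 0
//   3 Sensing   child of On, class 0
//   4 Idle      child of On, class 1
//   5 Alerting  child of On, class 1
const StateRecord STATES[NUM_STATES] = {
    {NO_PARENT, 0}, {0, 0}, {0, 0}, {2, 0}, {2, 1}, {2, 1}
};

// Classify a state by counting the distinct equivalence classes among its
// children: none means atomic, one means compound, more than one means parallel.
StateKind classify(int state) {
    bool seen[NUM_STATES] = {};
    int classes = 0;
    for (int i = 0; i < NUM_STATES; i++) {
        if (STATES[i].parent == state && !seen[STATES[i].eq_class]) {
            seen[STATES[i].eq_class] = true;
            classes = classes + 1;
        }
    }
    StateKind kind = ATOMIC;
    if (classes == 1) {
        kind = COMPOUND;
    } else if (classes > 1) {
        kind = PARALLEL;
    }
    return kind;
}

// Usage example: checks the classification of three states in the hierarchy.
void check_hierarchy() {
    assert(classify(0) == COMPOUND);  // System: children in a single class
    assert(classify(2) == PARALLEL);  // On: children in two classes
    assert(classify(1) == ATOMIC);    // Off: no children
}
```

Storing the tree as a flat table of parent indices is one way to stay within a pointer-free, recursion-free subset; other encodings are possible.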
Formal Translation: With formal semantics for our source and target languages, it is possible to create a translation between them. Since the basic constructs in RSML such as variables, types, and user-defined functions have a denotational meaning, it is possible to directly perform equivalence proofs between these entities in the RSML specification and their equivalents in SIMPL.
However, the behavior of the RSML machine as a whole is not directly described by the specification. The behavior is described by the next-state function, which is created from a specification by composing the partial transition functions based on the structure of the state hierarchy. To verify the code generator, we must show that the denotation of the next-state function created by the translator is equivalent to the denotation of the next-state function described in the RSML formal semantics. We do this by performing structural induction on the state hierarchy as stored in the denotational environment created when traversing the RSML abstract syntax tree.
Much of the mapping between the RSML specification and SIMPL is trivial; variables, types, and expressions have a 1:1 correspondence in RSML and SIMPL, and have the same denotation as well. However, there are also more challenging translations to be performed. Transitions are defined to be partial functions in RSML, but are represented as functions in the SIMPL syntax, which must always be total. Therefore, we cannot directly prove equivalence between transitions in RSML and SIMPL. However, we can say that the behavior of a transition in SIMPL is equivalent to a transition in RSML if the RSML transition is defined, and equal to the identity function otherwise. This weaker equivalence turns out to be sufficient to prove ordinary equivalence of the next-state function as a whole if the specification is a total function: since the next-state function is total, if the SIMPL and the RSML functions are equivalent wherever the RSML function is defined, then they are equivalent everywhere.
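The treatment of partial transitions can be sketched in C++ as follows. This is a minimal illustration under assumed names, not output of the RSML code generator: the `MachineState` record, the `STANDBY` and `ACTIVE` modes, the altitude threshold, and both functions are hypothetical. The point is only that the generated function is total: it applies the transition where the guard holds and acts as the identity otherwise.

```cpp
#include <cassert>

// Hypothetical machine state for the sketch.
const int STANDBY = 0;
const int ACTIVE = 1;
const int ALTITUDE_THRESHOLD = 500;  // invented guard constant

struct MachineState {
    int mode;      // current state: STANDBY or ACTIVE
    int altitude;  // an input variable
};

// Guarding condition of the illustrative transition STANDBY -> ACTIVE.
// The corresponding partial function is defined only where this holds.
bool guard_standby_to_active(MachineState s) {
    return s.mode == STANDBY && s.altitude > ALTITUDE_THRESHOLD;
}

// Total function standing in for the partial transition: it agrees with the
// transition wherever the guard is satisfied and is the identity elsewhere.
MachineState standby_to_active(MachineState s) {
    MachineState next = s;
    if (guard_standby_to_active(s)) {
        next.mode = ACTIVE;
    }
    return next;
}

// Usage example: one input inside the guard's domain, two outside it.
void check_transition() {
    MachineState high = {STANDBY, 600};
    MachineState low = {STANDBY, 100};
    MachineState running = {ACTIVE, 600};
    assert(standby_to_active(high).mode == ACTIVE);     // guard holds
    assert(standby_to_active(low).mode == STANDBY);     // identity
    assert(standby_to_active(running).mode == ACTIVE);  // identity
}
```

Returning a copy through a single exit point keeps the function free of pointers and sequencers, in keeping with the restrictions placed on the target language.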
Provably Correct Translator: In an ideal world, the verified design of the code generator would be implemented using a validated compiler in a provably correct language that provides a high level of abstraction. To the best of our knowledge, there are no languages that match these criteria. The next-best choice is to use well-known and well-understood features in a high-level language with a rigorously validated compiler. Languages such as Ada and Prolog can be used for this purpose. Since the current state of the art is unable to extend formal proofs to such endeavors, it is imperative that the implementation is as simple as possible, and is transparently based on the formal translation discussed in the previous section. The simpler and clearer the mapping between the formal semantics and the implementation of the translator, the easier it is to make formal correctness arguments regarding the translator.
We have implemented a prototype code generator that, to the best of our knowledge, correctly implements a translation from RSML to C++. However, because the translation process was not fully formalized at the time of the development, and because the translator is a quite complicated piece of software, we cannot easily reason about its correctness. We are in the process of creating a new code generator that directly and transparently implements the translation process that we described in the previous section. Currently, our code generator does not satisfy Requirement 3 and only partially satisfies Requirement 4 defined in Section 2. The translator currently under development, however, will satisfy both these requirements.
Traceability and Clarity of the Target Code: Although the formal arguments generated above provide a high level of assurance that the software will operate as intended, they are not infallible. Considering the complexity of formal semantics and the volume of proof required to verify the translation and, to a greater extent, the translator, it is possible that errors have been introduced and some of the proofs are incorrect. The output must be readable in order to effectively test and inspect the generated code (and thereby test the code generator). It is also possible that the user may want to instrument the code in some fashion to perform further analysis. If the code is not readable, these analyses would not be possible.
Although we are still in the process of implementing a fully formal code-generation process, the output from our current code generator is designed with readability and testability in mind. The current code generator produces C++ code, but it uses only a safe subset of the language, corresponding to the SIMPL language. Specifically, it uses no dynamic memory, no pointers, no recursion, and no sequencers (for instance, continue or goto) that would otherwise disrupt the program flow of control.
4 Summary and Conclusion
Code generation holds the promise of eliminating much of the time and effort required to design and implement safety-critical software, while at the same time eliminating errors introduced in these stages of development. However, without stringent guidelines on the translation, the implementation of the translator, and the structure of the output, this promise will not be realized because the generated code cannot be trusted. In this paper, we briefly discussed a minimum set of requirements for creating a code generator that is fit to produce code for a safety-critical system. We then outlined our approach to code generation from the specification language RSML. The target language, SIMPL, was designed as a strict subset of many imperative languages used for development of safety-critical systems. By using SIMPL, we can simplify the equivalence proofs for the formal translation and also target multiple implementation languages in a separate, very small, and easy-to-implement translator.
Our whole approach is designed with correctness and acceptance by regulatory agencies in mind. Consequently, we have taken great care in satisfying the set of requirements that a code generator for critical systems must satisfy. Our source language (RSML) and target language (SIMPL) both have formal denotational semantics, and the translation is formally defined and proven to maintain the meaning of an RSML specification in a SIMPL implementation. The generated code is clear, well documented, and traceable to the original specification so that third-party tools, testing, and inspections can be used to further verify the correctness of the code with respect to the formal model. Currently, the only weak link is the translator itself, which has not been formally verified. We are, however, in the process of re-implementing our translator in a provably correct manner. Since the semantics of RSML and SIMPL have been defined with this implementation and proof in mind, we believe this will be a manageable task.
References
[1] Gerard Berry and Georges Gonthier. The Esterel synchronous programming language: Design, semantics, implementation. Science of Computer Programming, 19(2):87-152, 1992.
[2] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data flow programming language Lustre. Proceedings of the IEEE, 79(9):1305-1320, September 1991.
[3] N. Halbwachs, P. Raymond, and C. Ratel. Generating efficient code from data-flow programs. In Third International Symposium on Programming Language Implementation and Logic Programming, Passau (Germany), August 1991.
[4] D. Harel. Statecharts: A visual formalism for complex systems. Science of Computer Programming, pages 231-274, 1987.
[5] D. Harel, H. Lachover, A. Naamad, A. Pnueli, M. Politi, R. Sherman, A. Shtull-Trauring, and M. Trakhtenbrot. Statemate: A working environment for the development of complex reactive systems. IEEE Transactions on Software Engineering, 16(4):403-414, April 1990.
[6] Mats P. E. Heimdahl and Nancy G. Leveson. Completeness and consistency in hierarchical state-based requirements. IEEE Transactions on Software Engineering, pages 363-377, June 1996.
[7] N.G. Leveson, M.P.E. Heimdahl, H. Hildreth, and J.D. Reese. Requirements specification for process-control systems. IEEE Transactions on Software Engineering, pages 684-706, September 1994.
[8] Program Validation Ltd. Formal Semantics of SPARK. Program Validation Ltd., 1998.
[9] Susan Stepney. High Integrity Compilation. Prentice Hall, 1993.
[10] Susan Stepney. Incremental development of a high integrity compiler: Experience from an industrial development. In Proceedings of the IEEE High Assurance Systems Engineering Workshop, 1998.
[11] M.W. Whalen. Provably correct code generation for safety-critical systems. Master's thesis, University of Minnesota, in preparation.