On ASM-Based Specification of Programming Language Semantics and Reusable Correct Compilations

Andreas Heberle and Welf Löwe
IPD, Universität Karlsruhe, 76128 Karlsruhe, Germany
heberle, [email protected]
Abstract. We define general transformations on ASM specifications of programming language semantics. These transformations preserve the semantics of the programming language and can thus be used for the definition of correct compilations. Additionally, we define an extensible language AL for the specification of dynamic programming language semantics and describe how this allows the reuse of verified transformations. Together with a library of object-oriented verified implementations, this leads to a framework for the construction of correct compilers based on the formal specification of source and intermediate language.
1 Introduction

Abstract state machines are well suited for the operational specification of dynamic programming language semantics. They have been used for the specification of C ([GH93]), C++ ([Wal94]) and Java ([BS98]) and, together with Montages, for the specification of Oberon ([KP97a]) and SQL ([DiF97]). For the verified construction of compilers, the formal specification of source and target language semantics is the basis of correctness proofs. These proofs show that transformations of a program preserve its behavior.

We introduce general transformations of operational specifications of programming language semantics which preserve the observable behavior. They can be used to define correct program transformations of a compiler. Furthermore, we define specific semantics transformations for the compilation of imperative programming languages.

Nevertheless, the verified construction of compilers remains a complex and time-consuming task. The reuse of verified compilations would further simplify the whole process. However, up to some restrictions, the specifier is free in the choice of the language that defines the ASM. Thus, in general, no two ASM specifications relate to each other. Restricting the language that defines the ASMs (we call it AL, which stands for ASM-defining Language) allows us to relate two (or even arbitrarily many) specifications that are defined independently of each other. This relation allows the reuse of transformations. Together with an object-oriented library of transformation implementations, the language AL, the set of general transformations, and the set of specific transformations build a framework for the construction of realistic verified compilers.

The paper is organized as follows. In section 2 we discuss related work, describe the basic scheme for the specification of operational semantics, and define our notion of compilation correctness in terms of ASMs. Section 3 shows general transformations of ASM specifications of programming language semantics. In section 4 we introduce our specification language for the dynamic semantics of imperative languages, discuss how AL allows us to relate independent source and intermediate language specifications, and define specific transformations for the compilation of imperative source languages. Finally, section 5 summarizes the results and shows directions of future work.
2 Basics and Related Work

We use abstract state machines (ASM), [Gur95], as the basic formalism to define the dynamic semantics of programming languages. Montages [KP97b] are an extension which allows the specification of complete programming languages, including context-free syntax and static and dynamic semantics. Montages allow the generation of syntactic and semantic analysis. They define a first restriction on the definition of ASMs since they require a program counter and a successor function on tasks. Besides this, the user is free in the specification of dynamic semantics. In this paper we further restrict ASMs and focus on the specification and transformation of dynamic semantics. This gives us the possibility to reuse existing transformations.

In our framework, the complete language specification including syntax and static semantics looks like a Montages specification, but the interpretation is different. We interpret the static part of the Montages as an attribute grammar, whereas in Montages the semantics of the static part is defined by an ASM. We do not need the rule part of Montages because we only allow predefined tasks with fixed semantics in the static part.

The interpretation of syntax and static semantics as an attribute grammar is similar to the approach of the MAX system, cf. [PH97], where occurrence algebras together with ASMs are used to specify programming languages. The focus of the MAX framework is not the construction of verified compilers but the prototyping of realistic programming languages and the generation of language-specific software. Therefore verification issues are not considered there.
2.1 Operational Semantics of Programming Languages

Existing ASM specifications of dynamic semantics are based on the same principle. Programs are represented by a task graph where tasks model the functionality of syntactic programming language constructs. Tasks define different sorts of the ASM. Additionally, the ASM defines a program counter (usually ct for current task) and functions which model an abstract memory, define a run time stack, and hold static information, e.g. about types or variables. For each task, the ASM defines a transition rule, i.e. the dynamic semantics. The edges of the task graph define data and control flow. They are modeled by functions (e.g. nexttask represents the control flow, i.e. the task which is executed after the current task). The abstract representation of the program, together with static information gained during semantic analysis, is part of the initial state of the ASM. Without loss of generality we assume that an ASM rule which describes the dynamic semantics of a task T has the following form:
    if T(ct) ∧ cond then
      Updates
      ct := ...
    endif
We call a task a ground task if, besides the update of the program counter, it updates at most one dynamic function (Updates ≡ lhs := rhs).
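The rule scheme above can be illustrated by a small interpreter. This is only a sketch under our own assumptions, not part of the framework: the names `Task`, `step`, and `run`, and the flattening of all dynamic functions into one dictionary, are hypothetical. It shows that the right-hand sides of all updates are read in the old state before any update takes effect.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

State = Dict[str, object]  # dynamic functions flattened to name -> value (assumption)


@dataclass
class Task:
    sort: str                                      # task sort T
    cond: Callable[[State], bool]                  # guard `cond` of the rule
    updates: Callable[[State], Dict[str, object]]  # right-hand sides, read in the old state
    next_task: Optional["Task"] = None             # control-flow edge (nexttask)


def step(ct: Task, state: State):
    """Fire the rule of the current task once and return (new ct, new state)."""
    new_values = ct.updates(state)                 # read phase: only the old state is visible
    return ct.next_task, {**state, **new_values}   # write phase plus ct := nexttask(ct)


def run(ct: Optional[Task], state: State) -> State:
    """Execute tasks until there is no current task or its guard fails."""
    while ct is not None and ct.cond(state):
        ct, state = step(ct, state)
    return state


# A ground task that updates only z, preceded by a task that swaps x and y in parallel.
bump = Task("Assign", lambda s: True, lambda s: {"z": s["x"] + 1})
swap = Task("Swap", lambda s: True, lambda s: {"x": s["y"], "y": s["x"]}, bump)

final = run(swap, {"x": 1, "y": 2, "z": 0})
```

In this sketch `swap` is not a ground task because it updates two functions at once, whereas `bump` is.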
2.2 Correctness of Compilations

The operational semantics of a programming language $L$ defines an ASM $A$. The program is part of the initial state of the ASM. The state transitions are based on the instructions of $L$ or of the program, respectively. In general, not all state transitions of $A$ are observable from outside. An observer cannot distinguish runs of two different programs as long as they show the same input/output behavior. For our purposes it is sufficient to assume that only those events are observable which read an input from the environment or write an output to the environment. We model these events by input and output streams. Thus, observable behavior can be modeled by merging all states where the following state transition does not change the interpretation of the input or output stream, see figure 1.

A compilation of a program $\pi_1 \in L_1$ to a program $\pi_2 \in L_2$ is correct if $\pi_1$ and $\pi_2$ show the same observable behavior. The correctness is based on simulation of ASMs, i.e., $A_2(\pi_2)$ has to simulate $A_1(\pi_1)$, in a sense similar to simulation in complexity and computability theory. The relation $\rho$ in figure 1 maps injectively the observable parts of the states of the target program to the observable parts of the states of the source program. For every observable behavior of the target program there must be a corresponding behavior of the source program. $\rho$ can be implemented by a relation on complete states which is compatible with $\rho$, i.e., any two states can be related whose observable parts are related by $\rho$. A detailed discussion of observable behavior and our notion of compiler correctness can be found in [GDG+96] and [ZG97].
Fig. 1. Simulation of observable behavior (diagram: a run i, q1, q2, q3, q4 and a run i', q'1, q'2, q'3, q'4; states separated by I/O transitions are related by ρ)
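The merging of states between observable events can be sketched in a few lines. The following is a simplified illustration under our own assumptions: a run is a list of dictionaries with the hypothetical keys `inp` and `out` for the streams, and we only compare the I/O traces of two single runs. The relation ρ of the text is more general than this equality check.

```python
def observable_trace(run):
    """Merge consecutive states whose input/output streams do not differ."""
    trace = []
    for state in run:
        obs = (tuple(state["inp"]), tuple(state["out"]))
        if not trace or trace[-1] != obs:
            trace.append(obs)
    return trace


def same_observable_behavior(run1, run2):
    """Two runs are indistinguishable here if their merged I/O traces coincide."""
    return observable_trace(run1) == observable_trace(run2)


# Hypothetical source run: read x, then write 2 * x.
source_run = [
    {"inp": [7], "out": [], "x": None},
    {"inp": [], "out": [], "x": 7},
    {"inp": [], "out": [14], "x": 7},
]
# Hypothetical target run with one additional internal (unobservable) step.
target_run = [
    {"inp": [7], "out": [], "r0": 0, "r1": 0},
    {"inp": [], "out": [], "r0": 7, "r1": 0},
    {"inp": [], "out": [], "r0": 7, "r1": 14},
    {"inp": [], "out": [14], "r0": 7, "r1": 14},
]
```

The target run has four states but only three observable ones, so both runs collapse to the same trace.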
3 General Transformations on ASM Specifications

In this section we discuss transformations of operational semantics specifications.¹ We define some general theorems on transformations which can be used to prove the correctness of transformations and compilations, respectively. The ASM semantics defines that the updates of a rule are executed in parallel. Dependencies between updates exist if, for example, the new interpretation of a term $l_1$ depends on the old interpretation of a term $l_2$ which is also updated within the same rule.
Definition 1. Let READ be the set of functions that are read, i.e. interpreted, in a rule $R$ and UPDATED be the set of functions that are redefined, i.e. updated, in $R$. A function $l \in \mathit{UPDATED}$ depends on the interpretation of a function $r \in \mathit{READ}$ iff $l$ is redefined in an update $u \in R$ and $r$ is interpreted in $u$. This relation is denoted by $r \to_u l$, the union of the dependencies of all updates in a rule $R$ by $\to_R$, and its transitive closure by $\to_R^{*}$. Let $l_1 \in \mathit{UPDATED}$ be the function updated in update $u_1$, and $\mathit{READ}_2 \subseteq \mathit{READ}$ be the set of functions interpreted in an update $u_2$. $u_2$ is independent of $u_1$, denoted by $u_1 \not\to u_2$, iff $\neg\exists (l_1, r_2),\ r_2 \in \mathit{READ}_2 : l_1 \to_R^{*} r_2$.
3.1 Sequentialization of Updates

During the compilation of a source language to an intermediate representation we replace complex source language constructs by implementations which use simpler intermediate language constructs. The first theorem is the basis for splitting complex tasks.

¹ To avoid confusion with transformations of programs as part of the compilation process we call the latter compilations.
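Theorem 1. Let $R$ be a rule that updates at most one observable function, and let $R_1; R_2$ be the sequential execution of rules $R_1$ and $R_2$. $R_1; R_2$ simulates $R$ if

$$\forall u \in R_i,\ i \in \{1, 2\} : u \in R \qquad (1)$$
$$\forall u \in R : (u \in R_1 \wedge u \notin R_2) \vee (u \in R_2 \wedge u \notin R_1) \qquad (2)$$
$$\forall u_i \in R_1 : \forall u_j \in R_2 : u_i \not\to u_j \qquad (3)$$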
(1) and (2) guarantee that the two sets of updates in $R_1$ and $R_2$, respectively, define a partitioning of the set of updates of $R$. (3) guarantees that reading all functions in $R$ concurrently and updating them afterwards yields the same result as reading the functions in $R_1$ and $R_2$ interleaved with the updates of $R_1$ and $R_2$, respectively.

Proof. Let $R$ be applied in a state $p$ and yield a state $q$, i.e. $p \xrightarrow{R} q$. In order to prove that this transition is simulated by $p \xrightarrow{R_1} q' \xrightarrow{R_2} q''$, we show that $q = q''$.
(1) and (2) imply that $q = q''$, provided that the interpretation of the functions read in $R_1$ and $R_2$, respectively, is the same as the interpretation of the corresponding functions read in $R$. This is obviously true for the functions read in $R_1$, since both rules interpret the functions in state $p$. Furthermore, the interpretation of functions not updated by $R_1$ is the same in $p$ and $q'$. By (3), no function read in $R_2$ is updated in $R_1$. Therefore, the interpretation of the functions read in $R_2$ is the same as the interpretation of the corresponding functions in $R$.

In order to define the simulation relation $\rho$ we have to consider three alternatives:

1. $R_1$ performs the observable update,
2. $R_2$ performs the observable update, or
3. neither $R_1$ nor $R_2$ performs an observable update.

Figure 2 shows the corresponding simulations. The case that both $R_1$ and $R_2$ perform an observable update is excluded by the assumption of the theorem.
Fig. 2. Simulation of R by R1; R2 (diagram with three panels, one for each of the cases 1 to 3: the transition from p to q under R is related by ρ to the run from p via q' to q'' under R1 and R2)
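The conditions of theorem 1 can be checked mechanically on small examples. The following sketch is illustrative only and rests on our own assumptions: an update is a hypothetical triple `Update(lhs, reads, rhs)`, states are dictionaries, and condition (3) is checked in the direct form used in the proof (no function read in R2 is updated in R1) rather than via the transitive closure of Definition 1.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple

State = Dict[str, int]


class Update(NamedTuple):
    lhs: str                      # function that is updated
    reads: FrozenSet[str]         # functions interpreted on the right-hand side
    rhs: Callable[[State], int]   # right-hand side, evaluated in the old state


def fire(rule: List[Update], state: State) -> State:
    """Apply all updates of a rule in parallel: read everything first, then write."""
    new_values = {u.lhs: u.rhs(state) for u in rule}
    return {**state, **new_values}


def is_sequentialization(R, R1, R2) -> bool:
    subset = all(u in R for u in R1 + R2)                     # condition (1)
    partition = all((u in R1) != (u in R2) for u in R)        # condition (2)
    independent = all(u1.lhs not in u2.reads                  # condition (3), direct form
                      for u1 in R1 for u2 in R2)
    return subset and partition and independent


u_a = Update("a", frozenset({"b"}), lambda s: s["b"] + 1)   # a := b + 1
u_b = Update("b", frozenset(), lambda s: 0)                 # b := 0
u_c = Update("c", frozenset({"d"}), lambda s: s["d"] * 2)   # c := d * 2
R = [u_a, u_b, u_c]
p = {"a": 0, "b": 5, "c": 0, "d": 3}

good = fire([u_b, u_c], fire([u_a], p))   # R1 = {u_a}, R2 = {u_b, u_c}
bad = fire([u_a, u_c], fire([u_b], p))    # R1 = {u_b} overwrites b before u_a reads it
```

Splitting off `u_a` first is a valid sequentialization and yields the same state as the parallel rule; executing `u_b` first violates condition (3) and changes the result.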
3.2 Composition of Tasks

Sometimes the machine offers more complex tasks than the intermediate language, e.g. for memory access. Therefore a theorem which defines the introduction of more complex tasks is useful. Using the sequentialization theorem we are able to define a composition theorem for tasks. However, we have to deal with restrictions introduced by the observable behavior. Our definition of compiler correctness requires that the mapping $\rho$ of states maps the states of the target machine injectively to the states of the source machine. Let $R$ perform the same updates as $R_1; R_2$. Then we distinguish four cases:

1. $R_1$ changes an observable function, the execution of $R_2$ is not observable: $q \xrightarrow{R_1} q'$ is observable from outside because an observable function is updated. $q' \xrightarrow{R_2} q''$ does not change an observable function, so $q'$ and $q''$ cannot be distinguished. Therefore $q \xrightarrow{R} q''$ is a simulation of $q \xrightarrow{R_1} q' \xrightarrow{R_2} q''$.
2. $R_2$ executes the observable update, the updates of $R_1$ are not observable: see 1.
3. The updates of $R_1$ and $R_2$ are not observable: the execution of $R_1$ before $R_2$ cannot be distinguished from the execution of $R$.
4. $R_1$ and $R_2$ execute observable updates: the transition $q \xrightarrow{R_1} q'$ as well as the transition $q' \xrightarrow{R_2} q''$ is observable. Thus $q$, $q'$, and $q''$ can be distinguished. Since $R$ executes the updates of $R_1$ and $R_2$ together, there does not exist any state $p$ which relates to $q'$; the execution of $R$ has a different observable behavior than the execution of $R_1; R_2$.

Figure 3 shows cases 1 to 3 graphically.
Fig. 3. Simulation of R1; R2 by R (diagram with three panels for the cases 1 to 3: the run from q via q' to q'' is related by ρ to the single transition from q to q'')
Theorem 2 (Composition). The sequential execution of rules $R_1$ and $R_2$ is simulated by the execution of a rule $R$ iff $R_1; R_2$ is a correct sequentialization of $R$, cf. theorem 1.
Proof. The construction of $R$ from $R_1$ and $R_2$, using theorem 1, assures that $R$ and $R_1; R_2$ perform the same updates and that $q \xrightarrow{R_1} q' \xrightarrow{R_2} q''$, $q \xrightarrow{R} p$ implies $q'' = p$. Furthermore, theorem 1 excludes case 4. Thus $R$ simulates $R_1; R_2$, defining $\rho$ as shown in figure 3.
Fig. 4 shows the transformation to compose two tasks. Based on ground tasks, we are able to define more complex tasks inductively. The corresponding compilation replaces all occurrences of the sequence $T_1; T_2$ by $T_{1;2}$.

    if T1(ct) then
      Updates1
      ct := ct2        -- ct2 ∈ T2
    endif
    if T2(ct) then
      Updates2
      Proceed
    endif

      ⟹

    if T1;2(ct) then
      Updates1
      Updates2
      Proceed
    endif

Fig. 4. Composition of Tasks
3.3 Specialization of Tasks

Another purpose of an intermediate representation is the specialization of tasks. This means that static semantic properties are explicitly represented in the program. For example, if the not operator of the source language is overloaded, its semantics depends on the type of the argument. The transition rule for the corresponding not task reflects this property by a conditional statement. Figure 5 shows the usual definition of not and the corresponding intermediate code implementation.
    if Not(ct) then
      if (ct.arg).type = BOOL then
        ct.value := ¬(ct.arg).value
      else
        ct.value := ~(ct.arg).value
      endif
    endif

      ⟹

    if BooleanNot(ct) then
      ct.value := ¬(ct.arg).value
    endif
    if Complement(ct) then
      ct.value := ~(ct.arg).value
    endif

Fig. 5. Specialization of Tasks
Theorem 3 (Specialization of Tasks). If the updates of a task $t$ depend on $n$ values of a static function $f$, the task can be replaced by $n$ specialized tasks $t_1, \ldots, t_n$ which define updates depending on the possible values of $f$. The observable behavior of two programs $\pi$ and $\pi'$ cannot be distinguished if $\pi'$ results from $\pi$ by just replacing each occurrence of $t$ by the corresponding specialized task $t_i$.
Proof. Specialization preserves the observable behavior because it assures that the same updates are performed. Only the representation of the tasks becomes more specialized.
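The specialization of the overloaded not task from figure 5 can be sketched as follows. The dictionary representation of tasks and the names `eval_not`, `specialize`, and `eval_specialized` are assumptions made for this illustration; they are not part of AL.

```python
def eval_not(task):
    """General rule: the static type of the argument is inspected when the task runs."""
    if task["arg_type"] == "BOOL":
        return not task["arg_value"]
    return ~task["arg_value"]


def specialize(task):
    """Compile-time step: select the specialized task sort from the static type."""
    sort = "BooleanNot" if task["arg_type"] == "BOOL" else "Complement"
    return {**task, "sort": sort}


# One rule per specialized task; none of them inspects the type any more.
SPECIALIZED_RULES = {
    "BooleanNot": lambda t: not t["arg_value"],
    "Complement": lambda t: ~t["arg_value"],
}


def eval_specialized(task):
    return SPECIALIZED_RULES[task["sort"]](task)


tasks = [
    {"sort": "Not", "arg_type": "BOOL", "arg_value": True},
    {"sort": "Not", "arg_type": "INT", "arg_value": 5},
]
results = [(eval_not(t), eval_specialized(specialize(t))) for t in tasks]
```

For every task the general and the specialized rule compute the same value, which is the core of the argument in the proof.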
3.4 Elimination of Indeterminism in Control Flow

Another important transformation is the elimination of indeterminism. Source languages often define an indeterministic evaluation order, e.g. for parameters. The target machine computes deterministically. Therefore the indeterminism has to be eliminated on the way from source to target language. It is commonly accepted that choosing one order out of a set of possible orders is a correct implementation of indeterminism. The following theorem represents this strategy.
Theorem 4 (Elimination of Indeterminism). If a task $t_1$ defines an indeterministic execution order, it can be replaced by a task $t_2$ which deterministically chooses one execution order out of the set of valid execution orders of $t_1$.

Proof. Since $t_1$ allows every order out of the set of valid orders, it allows in particular the order of $t_2$. Thus the implementation of $t_1$ by $t_2$ is correct.

Remark 1. Theorem 4 should be used carefully. Information about indeterminism in the execution order should be preserved as long as possible because it is the basis for various kinds of optimizations.
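The argument of the proof can be made concrete on a toy example. In the sketch below, which is an assumption-laden illustration and not part of the framework, three hypothetical subtasks `f`, `g`, and `h` each write their name to an output list; the behaviors allowed by the indeterministic task are all permutations, and the deterministic task picks one of them.

```python
from itertools import permutations


def execute(order, tasks):
    """Run the named subtasks in the given order and return the produced output."""
    state = {"out": []}
    for name in order:
        tasks[name](state)
    return tuple(state["out"])


# Hypothetical subtasks with an observable effect each.
tasks = {
    "f": lambda s: s["out"].append("f"),
    "g": lambda s: s["out"].append("g"),
    "h": lambda s: s["out"].append("h"),
}

# Behaviors allowed by the indeterministic task t1: every execution order is valid.
allowed = {execute(order, tasks) for order in permutations(tasks)}

# The deterministic task t2 fixes one order, here alphabetical.
chosen = execute(sorted(tasks), tasks)
```

Because `chosen` is an element of `allowed`, every behavior of the deterministic task is also a behavior of the indeterministic one.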
4 Reuse of Verified Transformations

Our goal is a general framework which simplifies the construction of correct compilers. The general transformation schemes of section 3 simplify the definition of correct compilations, but they do not allow the reuse of verified transformations or compilations. Up to now, the specification of programming languages has been too liberal because the specifier of a language is free to choose the language that defines the ASM. As a consequence, transformations and compilations depend on the combination of source and intermediate language and the ASMs that they are defined with. Restricting these ASMs, by restricting the constructs used to define an ASM, allows us to relate independently defined source and intermediate languages.
We describe a possible syntactic and semantic instance of this specification language and show how it can be used to reuse transformations. We call this language AL; it is defined in appendix A. Using the transformations from the previous section, we additionally introduce special transformations that simplify the specification of imperative source and intermediate languages.
4.1 The ASM Definition Language AL

The definition language AL presented here is one possible language that meets the following requirements²:

- In order to define the semantics of each program of an imperative language, we need constructs to model various data types, memory, and control and data flow.
- In order to be as unrestrictive as possible in this definition, we need the ability to define indeterminism, e.g. in the evaluation order of parameters.

Additionally, the level of abstraction should be convenient for specifying source as well as intermediate languages. We have to ensure that a compilation from AL to IR is complete. If source and intermediate language are defined independently, we need at least one transformation from each AL construct to intermediate code to guarantee a complete transformation. Therefore the language should be small.
Types, Values, and Memory: It should be possible to specify the usual types of imperative (object-oriented) programming languages in terms of AL. Therefore, we need at least the basic types Int, Real³, and Bool and a constructor for structured types. A set type is general enough to specify all kinds of structured types but still simple enough to be compiled to intermediate code without any problems. Names are represented by strings of finite length. Pairs (name, type) describe declarations. We allow name to be an integer value because this is useful for the specification of arrays. Instances of types are values. The memory of AL is considered to be an unstructured collection of objects which is initially empty. Operations on the memory are specified explicitly by Create, Set, and Get. These functions exist for each type $T \in \mathit{Type}$. Objects are accessed by a symbolic reference. This is close to existing intermediate languages and simplifies the transformation.

² There may exist other languages that comply with these requirements. We do not consider AL a universal compiler language.

³ Int and Real are generic data types. A conversion procedure from one valid instance to another can be defined depending on the parameters of the two instances.
Tasks: In this section, we introduce only those tasks which are used in the following to define specific transformations. A complete formal description of the tasks in AL is given in the appendix. We already mentioned the basic tasks for types. Create extends the memory by a reference which describes an object. This reference is then used to access the object. The following rule describes the dynamic semantics of Create.

    if Create(ct) then
      if BasicType(ct.static_type) then
        extend Reference with r
          r.static_type := ct.static_type
          r.value := ⊥
          ct.value(reclevel) := r
        endextend
      else
        extend Reference with r
          r.static_type := ct.static_type
          ct.value(reclevel) := r
          do ∀ t ∈ Parts(ct)
            r.(t.selector) := t.value
          enddo
        endextend
      endif
      proceed
    endif
Set and Get use the reference to change or read the value of an object. They are specified in the appendix. From the theoretical point of view it is sufficient to define a goto language with an infinite memory because such a simple language is Turing complete. For the specification of languages this is not desirable, but defining at least one construct similar to the goto of imperative languages settles any discussion about the expressive power of AL.
    if One(ct) then
      choose t ∈ Alternatives(ct) satisfying t.key = ct.condition.value(reclevel)
        if t ≠ ⊥ then
          ct := t
        else
          proceed
        endif
      endchoose
    endif

One branches control flow according to the value of a key attribute. This represents the case construct of imperative programming languages. All is another special task which is used to model indeterminism in control flow.
    if All(ct) then
      if |NotExecuted(Parts(ct))| = 0 then
        do ∀ t ∈ Parts(ct)
          NotExecuted(t) := true
        enddo
        proceed
      else
        choose t satisfying t ∈ NotExecuted(Parts(ct))
          NotExecuted(t) := false
          ct := t
        endchoose
      endif
    endif
All executes a list of subtasks in arbitrary order. A task is computed completely before the next task is started. We use this construct to model, for example, indeterminism in the evaluation of subexpressions or function parameters. Additionally, All represents freedom in control flow, which is needed for effective optimizations. The language AL also comprises tasks for calling a function or procedure without parameters (Call), for returning from a procedure (Return), and for read (Read) and write (Write) operations. AL need not provide a call with parameters because this can be modeled by an implementation in AL.
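The bookkeeping of the All rule can be mimicked by a short loop. This is a sketch under our own assumptions: the names `run_all`, `execute`, and `choose` are hypothetical, the set `not_executed` plays the role of NotExecuted, and the resetting of NotExecuted for a later re-execution of the same All task is left out.

```python
def run_all(parts, execute, choose=min):
    """Execute every part exactly once, in an order determined by `choose`."""
    not_executed = set(parts)
    order = []
    while not_executed:                  # |NotExecuted(Parts(ct))| > 0
        t = choose(not_executed)         # choose t satisfying t ∈ NotExecuted(Parts(ct))
        not_executed.discard(t)          # NotExecuted(t) := false
        execute(t)                       # ct := t; the subtask runs to completion
        order.append(t)
    return order


log = []
left_to_right = run_all(["a", "b", "c"], log.append)              # choose = min
right_to_left = run_all(["a", "b", "c"], log.append, choose=max)  # another valid choice
```

Any function that returns an element of the set is an admissible `choose`; fixing it to one particular function is exactly the elimination of indeterminism from section 3.4.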
4.2 AL and the Reuse of Transformations

In this section we describe how AL can be used to define the dynamic semantics of programming languages. Then we show how this specification can be used to relate independently defined source and intermediate language specifications. Additionally, we discuss how this specification allows the reuse of transformations, which simplifies the definition of correct compilations and thus the construction of correct compilers.

For the specification of a programming language we use a technique known as translational semantics specification. Usually, an ASM or another operational specification defines the dynamic semantics of a language. In our approach the dynamic semantics is instead defined by a mapping from the source language SL to AL. Since there exists a formal specification of AL (by the ASM in section 4.1), this mapping defines the semantics of SL indirectly. AL acts as an additional intermediate language during the compilation of a source language SL. AL separates the compilation process from SL to IR and can be considered a semantic interface between source languages and intermediate representations. An additional compilation from AL to IR completes the compilation from SL to IR. The definition of the compilation from AL to IR is independent of the mapping from SL to AL. Nevertheless, the use of AL as an intermediate level allows us to relate SL and IR without additional work.

AL was designed to be as small and simple as possible because this reduces the complexity of the transformation from AL to IR. At first sight this causes problems for the specification of source languages: it makes the specification of a language difficult because the language designer has to define complex transformations in terms of low-level constructs. In fact, this disadvantage can be avoided. We are able to predefine more complex tasks based on AL tasks. These new tasks represent extensions of AL. Their semantics is also defined by a mapping to AL. From the compilation point of view, these extensions introduce new intermediate languages AL+. A look at existing compilers for imperative languages shows that the extensions introduced for a simpler specification of dynamic semantics represent intermediate languages which already appear in a real compiler. For example, we would introduce more complex data types like arrays or records. The compiler maps these structures to simpler machine types. The definition of the complex structure in terms of AL is exactly such a mapping. In the other direction, for the compilation of AL to a language which has, for example, basic blocks, we define a transformation to basic blocks which is reusable for transformations to other intermediate representations.

Remark 2. The definition of AL extensions for the transformation into an intermediate representation has additional reasons. A future goal of this approach is that intermediate languages are specified like source languages, by a mapping from IR to AL which defines the semantics of IR. Then we try to invert this mapping and construct the transformation from AL to IR automatically. This is not always possible, but we are able to introduce generic AL extensions for which the inversion already exists. This approach leads to a framework where the definition of source and intermediate language allows the generation of a correct compiler. Furthermore, source-to-source transformations are possible if the source language fulfills some properties of intermediate representations.
4.3 Special Transformations and Compilations

We discuss several examples of relevant transformations and in each case either sketch the verification of the transformation or define sufficient requirements that imply its correctness. A more detailed application of the methodology to a real-life C-like programming language, together with examples of AL extensions, can be found in [Heu98].
Mapping of Structured Data Types: The set data type in AL is quite general. Hence it is not difficult to define a record type or an array type in terms of set. To introduce static arrays, we extend the type algebra of AL by

$$\mathit{static\_array} : \mathit{Int} \times \mathit{Int} \times \mathit{Type} \to \mathit{Type}$$

The first argument describes the lower index ($l$) of the array, the second argument describes the upper index ($u$). The Type argument defines the element type of the array. The creation of a new array object is defined recursively. This means that we create a new set instance and create $\mathit{size} = u - l + 1$ elements of the corresponding type. The following rule describes the new task CreateStaticArray:
    if CreateStaticArray(ct) then
      if ct.counter = ⊥ then
        extend Reference with r
          r.static_type := ct.static_type
          r.value := r
          ct.value(reclevel) := r
        endextend
        ct.counter := 0
        ct := ct.next_task        % element_type.Create
      else
        ct.value(reclevel).(ct.counter) := ct.next_task.value(reclevel)
        if ct.counter < ct.static_type.size - 1 then
          ct.counter := ct.counter + 1
          ct := ct.next_task
        else
          ct.counter := ⊥
          ct := ct.rettask
        endif
      endif
    endif
Multiple applications of theorems 1 and 3 lead to a sequence of create tasks. First, CreateStaticArray creates a set object; then it creates size objects of type element_type and assigns their references to the corresponding selectors. The resulting task graph is the AL implementation of CreateStaticArray.
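The effect of the unfolded task sequence can be sketched with a toy memory model. Everything in this example is an assumption made for illustration: the class `Memory`, the integer references, and the helper `element_ref` are hypothetical, and selectors are numbered from 0 to size - 1 as in the counter of the rule above.

```python
import itertools


class Memory:
    """Unstructured collection of objects, accessed by symbolic references."""

    def __init__(self):
        self._fresh = itertools.count()
        self.objects = {}

    def create(self, static_type, value=None):
        ref = next(self._fresh)          # extend Reference with r
        self.objects[ref] = {"static_type": static_type, "value": value}
        return ref


def create_static_array(mem, lower, upper, element_type):
    """Create a set object and size = upper - lower + 1 element objects."""
    size = upper - lower + 1
    arr = mem.create(("static_array", lower, upper, element_type))
    selectors = {}
    for counter in range(size):          # one Create of element_type per selector
        selectors[counter] = mem.create(element_type)
    mem.objects[arr]["value"] = selectors
    return arr


def element_ref(mem, arr, index):
    """Map a source-level index to the reference stored under selector index - lower."""
    _, lower, upper, _ = mem.objects[arr]["static_type"]
    if not lower <= index <= upper:
        raise IndexError(index)
    return mem.objects[arr]["value"][index - lower]


mem = Memory()
arr = create_static_array(mem, 3, 7, "Int")
```

An array with bounds 3 and 7 thus occupies six objects: the set object itself and five elements.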
Compilation of Basic Types: Integer and floating point types in AL are generic. [HH97] defines the semantics of the integer data type together with a generic transformation scheme between different integer data types. The definition of floating point numbers follows the IEEE standard. The definition of a generic conversion routine remains to be done.
Control Structures: AL offers enough control structures to fulfill the requirements of a goto language. Together with an infinite memory, AL is Turing complete. The definition of the standard control structures is straightforward. The semantics of a While task

    if While(ct) then
      if ct.condition.value(reclevel) then
        ct := ct.truetask
      else
        ct := ct.falsetask
      endif
    endif

is implemented by a One task without further work. A more interesting transformation is the introduction of functions with parameters because this implies an extension of the run time system of AL. In general, we are able to implement extensions of the run time system in AL itself. For example, a parameter stack could be modeled as a stack-structured data type together with some functions for modifying this stack. Another possibility for extending the run time system is the introduction of new dynamic functions together with new tasks for the modification of these functions. Theoretically, it is possible to derive the AL implementation of the tasks by multiple applications of the general theorems.
Splitting of One: One describes a case construct. It is common knowledge that a case construct can be implemented by a cascade of if-then-else constructs. Hence, we can transform One in the same way and implement it by a cascade of special One tasks with exactly two subtasks.

Sequentialization of All: In section 3.4 we already discussed the transformation of indeterminism. Thus, the mapping of All is the multiple application of theorems 4 and 1.

Memory Mapping and Address Calculation: Intermediate languages often have the address space of the target machine. Since AL uses symbolic addresses, we have to define a mapping of symbolic to relative addresses which fulfills the restrictions of the target machine, i.e. the size of basic types and alignment. The correctness proof is quite complex, but the mapping itself can be defined generically. A definition of the memory mapping together with a detailed correctness discussion can be found in [GZG+98].
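A mapping from symbolic to relative addresses that respects size and alignment can be sketched as follows. This is a minimal illustration under stated assumptions, not the mapping defined in [GZG+98]: the function names and the sizes and alignments in `layout` are hypothetical values for an imagined target machine.

```python
def align_up(offset, alignment):
    """Smallest multiple of `alignment` that is not below `offset`."""
    return (offset + alignment - 1) // alignment * alignment


def map_addresses(decls, layout):
    """Assign a relative address to every declared name.

    decls  -- list of (name, type) pairs, as in AL declarations
    layout -- type -> (size, alignment) of the target machine (assumed values)
    """
    addresses, offset = {}, 0
    for name, typ in decls:
        size, alignment = layout[typ]
        offset = align_up(offset, alignment)   # respect the alignment restriction
        addresses[name] = offset
        offset += size                         # reserve space for the object
    return addresses, offset


layout = {"Bool": (1, 1), "Int": (4, 4), "Real": (8, 8)}
decls = [("flag", "Bool"), ("n", "Int"), ("x", "Real"), ("ok", "Bool")]
addresses, total = map_addresses(decls, layout)
```

The padding between `flag` and `n` shows why the mapping cannot be derived from the symbolic addresses alone: it needs the size and alignment information of the target machine.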
5 Conclusions

We presented general transformations of ASM-based operational programming language semantics. The transformations preserve the observable behavior of a program and can thus be used to simplify the definition and verification of compilations. Additionally, we established a language AL for the specification of programming language semantics. AL is powerful enough to specify properties of imperative programming languages. However, it does not claim to be a universal compiling language. Instead, we presented mechanisms to extend AL. The extensions come together with special transformations (cf. 4.3) whose correctness has to be proven by hand using the general or specific theorems on transformations. Predefined extensions simplify the specification of programming languages. This is reuse of specific and verified transformations.

Usually, source and intermediate language are defined independently. The use of AL for the specification of dynamic semantics relates two possibly independent specifications. The specification of the source language SL defines a transformation from SL to AL, and the definition of the intermediate language defines a mapping from AL to IR. Their composition defines the transformation from SL to IR. The correctness of the complete mapping is established by the correctness of local transformations.

The language AL and the general and specific transformations are parts of a framework for the semantics-based generation of complete correct compiler frontends. For the specification of complete programming languages we use a formalism similar to Montages. A Montage in our framework consists of three parts: the definition of the context-free syntax, the specification of a mapping to AL constructs which defines tasks and data and control flow information, and the definition of the static semantics. In our interpretation, these parts define a special attribute grammar. We do not need the fourth part of original Montages, which describes the dynamic semantics of tasks, because we only allow AL tasks and functions which already have a formal semantics. For the implementation we use an object-oriented library of compilations. This library provides implementations of general and special transformations. Some of these implementations are generic, e.g. address mapping or the conversion of basic types.

Currently, for each specification of an intermediate language IR, we have to define the mapping from AL to IR by hand. It is our goal that the mapping which defines IR in terms of AL can be inverted to generate this mapping automatically. This would lead to a fully automatic generator with language specifications as input. Automatic inversion of a language specification is not straightforward because the specification of dynamic semantics loses static information needed for, e.g., address calculation. If we define a mapping from IR addresses to AL addresses, we do not need to say anything about alignment and type size. However, we need this information to define a mapping from symbolic AL addresses to real IR addresses. Besides real-life case studies, e.g. for a C-like language ([Heu98]), the inversion problem is one of the main parts of our current work.

Acknowledgements: This work is supported by the Deutsche Forschungsgemeinschaft project Go 323/3-1 Verifix (Construction of Correct Compilers). We are grateful to our colleagues in Verifix.
References

[BS98] E. Börger and W. Schulte. Programmer Friendly Modular Definition of the Semantics of Java. In J. Alves-Foss, editor, Formal Syntax and Semantics of Java, LNCS. Springer, 1998.
[DiF97] B. DiFranco. Semantica Statica e Dinamica di SQL diretto (ISO 9075) mediante i Montaggi. Master's thesis, Università di L'Aquila, 1997. In preparation (in Italian).
[GDG+96] W. Goerigk, A. Dold, T. Gaul, G. Goos, A. Heberle, F. von Henke, U. Hoffmann, H. Langmaack, H. Pfeifer, H. Ruess, and W. Zimmermann. Compiler Correctness and Implementation Verification: The Verifix Approach. In Compiler Construction, volume 1060 of LNCS. Springer, 1996. Poster Session, International Conference on Compiler Construction 1996.
[GH93] Y. Gurevich and J. K. Huggins. The Semantics of the C Programming Language. In LNCS, volume 702, pages 274–308. Springer-Verlag, February 1993.
[Gur95] Y. Gurevich. Evolving Algebras: Lipari Guide. In E. Börger, editor, Specification and Validation Methods. Oxford University Press, 1995.
[GZG+98] W. Goerigk, W. Zimmermann, T. Gaul, A. Heberle, and U. Hoffmann. Correct compilation of a while-language with parameterless recursive procedures. Technical report, IPD, Universität Karlsruhe, 1998.
[Heu98] D. Heuzeroth. Spezifikation und Verifikation von standardisierten Transformationen am Beispiel der Übersetzung der imperativen Sprache IS. Master's thesis, University of Karlsruhe, 1998. In preparation.
[HH97] D. Heuzeroth and A. Heberle. Algebraische Spezifikation eines generischen Integer-Datentyps. Technical report, IPD, Universität Karlsruhe, October 1997.
[IEE85] IEEE. Standard for Binary Floating-Point Arithmetic, Std. 754-1985. Technical report, ANSI/IEEE, 1985.
[KP97a] P. W. Kutter and A. Pierantonio. The Formal Specification of Oberon. J.UCS, 3(5):443–503, 1997.
[KP97b] P. W. Kutter and A. Pierantonio. Montages: Specifications of Realistic Programming Languages. J.UCS, 3(5):416–442, 1997.
[PH97] A. Poetzsch-Heffter. Prototyping realistic programming languages based on formal specifications. Acta Informatica, 34(10):737–772, 1997.
[Wal94] C. Wallace. The Semantics of the C++ Programming Language. In E. Börger, editor, Specification and Validation Methods, pages 131–164. Oxford University Press, 1994.
[ZG97] W. Zimmermann and T. Gaul. On the Construction of Correct Compiler Back-Ends: An ASM Approach. Journal of Universal Computer Science, 3(5):504–567, 1997.
A An ASM Specification for AL

AL programs are task graphs where the nodes describe tasks and the edges describe control and data flow information. AL consists of the sorts Type, String, Pair and Task.
A.1 Types, Values, and Memory

The algebra Type describes the types of AL. Constants of sort Type are int, real, bool (basic data types), $\top$ (error type), and $\bot$ (undefined type). The sort String defines all strings of finite length; String is the set of all valid names. Pair relates names or selectors ($\mathit{Selector} = \mathit{String} \cup \mathit{Index}$, $\mathit{Index} \subseteq \mathit{int}$) to types. This is necessary to express declarations. The following fixed function symbols construct complex types:

\[
\begin{aligned}
\mathit{func} &: \mathit{String} \times \mathit{Type} \times \mathit{Task} \to \mathit{Type}\\
{\uparrow} &: \mathit{Type} \to \mathit{Type}\\
\mathit{pair} &: \mathit{Type} \times \mathit{Selector} \to \mathit{Pair}\\
\mathit{set} &: \mathit{Pair\ Set} \to \mathit{Type}
\end{aligned}
\]

func defines a function type. The first argument is the function name, the second argument describes the result type, and the third argument points to the first task of the function. $\uparrow$ is used to define reference types. pair represents pairs; set is necessary to define record or array types. After static semantic analysis of AL programs, each node of the program graph (or the AST) has an attribute static type (which may be $\bot$). Instances of types are values.

The memory is an unstructured collection of objects which is initially empty. Objects are modeled by references. For each type $T \in \mathit{Type}$ with $T \neq \bot$ and $T \neq \top$ there are the functions:

\[
\begin{aligned}
T.\mathit{create} &: {} \to {\uparrow} T\\
T.\mathit{set} &: {\uparrow} T \times T\\
T.\mathit{get} &: {\uparrow} T \to T
\end{aligned}
\]

The function T.create adds an instance, i.e. a value, of type T to the memory and returns a reference to this value. The function T.set takes a reference to an old value of type T and a new value and replaces, as a side effect on the memory, the old value by the new one. The function T.get takes a reference to a value of type T and returns the value. It has no side effects on the memory. The dynamic semantics of these operations is specified later on.

Besides the error type and the undefined type, AL defines the basic types Int, Real and Bool. Int is a generic data type. A concrete instance of Int is defined by the functions

\[
\begin{aligned}
\mathit{int}.\mathit{minint} &: {} \to \mathbb{N}\\
\mathit{int}.\mathit{maxint} &: {} \to \mathbb{N}\\
\mathit{int}.\mathit{arithmetic} &: {} \to \mathit{Bool}
\end{aligned}
\]

where minint and maxint denote the minimum and maximum integer of the instance, and arithmetic defines the arithmetic type (ring or overflow) of the instance. Hence Int is defined as

\[ \mathit{Int} : \mathbb{N} \times \mathbb{N} \times \mathit{Bool} \to \mathit{Type} \]

Additionally, Int defines the usual unary and binary operations on integers:

\[
\begin{aligned}
\mathit{binary\ operation} &: \mathit{Int} \times \mathit{Int} \to \mathit{Int}\\
\mathit{unary\ operation} &: \mathit{Int} \to \mathit{Int}
\end{aligned}
\]

with $\mathit{binary\ operation} \in \{+, -, \times, \mathit{pow}, \mathit{div}, \mathit{mod}\}$ and $\mathit{unary\ operation} \in \{-, \mathit{abs}\}$.

The type Real is defined by the IEEE floating point standards [IEE85]. Thus, it also depends on the bit representation:

\[ \mathit{Real} : \{32, 64, 128\} \to \mathit{Type} \]
The type Bool is defined as the common Boolean algebra, i.e., it defines the constants true and false and the operations $\lor$, $\land$, $\lnot$.

Function types define the following projections:

\[
\begin{aligned}
\mathit{result} &: T \to T\\
\mathit{firsttask} &: T \to \mathit{Task}
\end{aligned}
\]

Remark 3. For extensions of the function type, we can define

\[ \mathit{arguments} : T \to T \]

which represents the parameters of a function. result is $\bot$ if we model a procedure type.

Set types define a projection for each pair of the set. The name of the projection is equal to the selector component of the pair. Hence, for each pair $(T, s)$ of a set $S$, there is a projection

\[ s : S \to T \]

which gives a value of type $T$.
Input and output are modeled in AL by potentially infinite lists of values:

\[
\begin{aligned}
\mathit{input} &: \mathit{Values}\\
\mathit{output} &: \mathit{Values}
\end{aligned}
\]

We assume the usual operations on lists to be defined.
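To make the memory operations T.create, T.set and T.get of this section more concrete, the following Python sketch models the memory as an initially empty collection of objects that are accessed through references. It is only an illustration under stated assumptions: the names `Memory` and `Ref`, the integer identifiers, and the use of `None` for the undefined value are hypothetical and not part of AL.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Ref:
    """A reference to an object in memory (hypothetical representation)."""
    ident: int
    static_type: str


@dataclass
class Memory:
    """Unstructured collection of objects; it starts out empty."""
    cells: Dict[Ref, Any] = field(default_factory=dict)

    def create(self, static_type: str) -> Ref:
        # T.create: add a new instance of type T and return a reference to it.
        ref = Ref(len(self.cells), static_type)
        self.cells[ref] = None  # assumption: None stands in for "undefined"
        return ref

    def set(self, ref: Ref, value: Any) -> None:
        # T.set: replace the old value by the new one (side effect on memory).
        self.cells[ref] = value

    def get(self, ref: Ref) -> Any:
        # T.get: return the stored value; the memory is left unchanged.
        return self.cells[ref]


mem = Memory()
r = mem.create("int")
mem.set(r, 42)
```

Here `mem.get(r)` yields the value stored by the preceding `set`, while a freshly created object reads as undefined until it is set.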
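A.2 Tasks

AL defines different types of tasks:

\[ \mathit{Task} = \mathit{Create} \cup \mathit{Set} \cup \mathit{Get} \cup \mathit{BasicOps} \cup \mathit{One} \cup \mathit{All} \cup \mathit{Call} \cup \mathit{Return} \cup \mathit{Read} \cup \mathit{Write} \]

Each task computes (depending on the recursion level) a value which may be undefined:

\[ \mathit{value} : \mathit{Task} \times \mathbb{N} \to \mathit{Value} \]

The sort Value comprises the elements of Int, Real and Bool, and holds all references:

\[ \mathit{Value} = \mathit{Int} \cup \mathit{Real} \cup \mathit{Bool} \cup \mathit{Reference} \cup \{\bot\} \]

Control flow is modeled by an abstract program counter ct and a static function nexttask:

\[
\begin{aligned}
\mathit{ct} &: \mathit{Task}\\
\mathit{nexttask} &: \mathit{Task} \to \mathit{Task}
\end{aligned}
\]

The basic operations Create, Get and Set are used to create a new reference, read a value, and set a value.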
    if Create(ct) then
      if BasicType(ct.static_type) then
        extend Reference with r
          r.static_type := ct.static_type
          r.value := ⊥
          ct.value(recl) := r
        endextend
      else
        extend Reference with r
          r.static_type := ct.static_type
          ct.value(recl) := r
          do ∀ t ∈ Parts(ct)
            r.(t.selector) := t.value
          enddo
        endextend
      endif
      proceed
    endif

Create creates new objects and adds the new references to the universe Reference. Create on structures or arrays creates a new reference and assigns the parts of the structure to the selectors. This assumes that the parts are created explicitly. The Set task changes the value of an object. The destination (source) is accessed via the dest (source) attribute.
    if Set(ct) then
      ct.dest.value(recl).value := ct.source.value(recl).value
      proceed
    endif
Get reads the value of an object.
    if Get(ct) then
      if BasicType(ct.type) then
        ct.value(recl) := ct.source.value(recl).value
      else
        ct.value(recl) := ct.source.(ct.selector.value(recl))
      endif
      proceed
    endif
The basic operations are constructed similarly. They use the results of other tasks for the computation of their own result. These tasks are accessed via left and right. The value of a task depends on the recursion level recl. We show the integer addition and omit the other operations because of space limitations.
    if intplus(ct) then
      if ct.left.value(recl) +_I ct.right.value(recl) > MaxInt then
        exception := IntOverflow
        if IntArithmetic = Overflow then
          ct := OverflowError
        else
          proceed
        endif
      elsif ct.left.value(recl) +_I ct.right.value(recl) < MinInt then
        exception := IntUnderflow
        if IntArithmetic = Overflow then
          ct := UnderflowError
        else
          proceed
        endif
      else
        ct.value(recl) := ct.left.value(recl) +_I ct.right.value(recl)
        proceed
      endif
    endif
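The case distinction of the intplus rule can be paraphrased in Python as follows. This is a sketch under assumptions rather than the paper's definition: the class `IntInstance`, the function `int_plus`, and the encoding of the outcome as a triple (value, exception, next control) are hypothetical. As in the rule, no result value is assigned when the sum leaves the range, so the sketch returns `None` for the value in those cases.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IntInstance:
    """Parameters of a concrete Int instance (hypothetical names)."""
    min_int: int
    max_int: int
    overflow_arithmetic: bool  # True: overflow arithmetic, False: ring arithmetic


def int_plus(inst: IntInstance, left: int, right: int
             ) -> Tuple[Optional[int], Optional[str], str]:
    """Return (value, exception, next_control) for one intplus step."""
    total = left + right  # addition on unbounded mathematical integers
    if total > inst.max_int:
        # Overflow: raise the exception; only overflow arithmetic diverts control.
        control = "OverflowError" if inst.overflow_arithmetic else "proceed"
        return None, "IntOverflow", control
    if total < inst.min_int:
        # Underflow is handled symmetrically.
        control = "UnderflowError" if inst.overflow_arithmetic else "proceed"
        return None, "IntUnderflow", control
    # In range: store the sum and continue with the successor task.
    return total, None, "proceed"


int8 = IntInstance(min_int=-128, max_int=127, overflow_arithmetic=True)
ring8 = IntInstance(min_int=-128, max_int=127, overflow_arithmetic=False)
```

With these two instances, the same out-of-range sum sets the exception in both cases but only changes control for the overflow-arithmetic instance.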
The semantics of intplus considers overflow and underflow during the computation and indicates this with an exception. Since AL should be as simple as possible, there are only four tasks that change control flow. One is comparable to a case construct in imperative languages. It changes control depending on the value of a key.
    if One(ct) then
      choose t ∈ Alternatives(ct) satisfying t.key = ct.condition.value(recl)
        if t ≠ ⊥ then
          ct := t
        else
          proceed
        endif
      endchoose
    endif
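Read as a case construct, the One rule selects the alternative whose key equals the value of the condition and falls through to the regular successor if there is none. The Python sketch below illustrates this reading. The function `one_step`, the dictionary from keys to task names, and the explicit `fallthrough` argument (standing in for the effect of proceed) are assumptions of this example, and it further assumes that keys are unique.

```python
from typing import Dict, Hashable, Optional


def one_step(alternatives: Dict[Hashable, str],
             condition_value: Hashable,
             fallthrough: str) -> str:
    """Return the task that receives control after a One task."""
    # choose t in Alternatives(ct) satisfying t.key = condition value
    chosen: Optional[str] = alternatives.get(condition_value)
    # No matching alternative (t is undefined): continue with the successor.
    return chosen if chosen is not None else fallthrough


branches = {1: "task_A", 2: "task_B"}
```

For example, a condition value of 2 transfers control to `task_B`, whereas an unmatched value leaves control with the fall-through task.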
Another requirement for AL was that we can model indeterminism. All is the AL task which executes a set of tasks in arbitrary order. Each subtask is executed completely before control changes for the execution of the next task.
    if All(ct) then
      if |NotExecuted(Parts(ct))| = 0 then
        do ∀ t ∈ Parts(ct)
          NotExecuted(t) := true
        enddo
        proceed
      else
        choose t satisfying t ∈ NotExecuted(Parts(ct))
          NotExecuted(t) := false
          ct := t
        endchoose
      endif
    endif
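The effect of the All rule, namely that every subtask runs exactly once, in an arbitrary order, and to completion before the next one starts, can be simulated with a small Python loop. This is an illustrative sketch only: the function `run_all`, the representation of subtasks as callables, and the seeded random generator used to make the arbitrary choice are assumptions. The reset of the NotExecuted flags performed by the rule is implicit here, because the set of pending parts is rebuilt on every call.

```python
import random
from typing import Callable, List, Optional


def run_all(parts: List[Callable[[], None]],
            rng: Optional[random.Random] = None) -> List[int]:
    """Execute every part exactly once in an arbitrary order.

    Returns the indices of the parts in the order in which they ran.
    """
    rng = rng or random.Random()
    not_executed = set(range(len(parts)))       # all parts are still pending
    order: List[int] = []
    while not_executed:                         # |NotExecuted(Parts(ct))| > 0
        t = rng.choice(sorted(not_executed))    # choose an arbitrary pending part
        not_executed.discard(t)                 # NotExecuted(t) := false
        parts[t]()                              # the part runs to completion
        order.append(t)
    return order                                # nothing pending: proceed


log: List[int] = []
subtasks = [lambda i=i: log.append(i) for i in range(4)]
order = run_all(subtasks, random.Random(0))
```

Whatever order the generator picks, `order` is a permutation of the part indices and `log` records the same sequence.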
Parts(ct) describes the set of subtasks. Each of these tasks has an attribute NotExecuted which indicates whether the task has already been computed. Call represents a function call without parameters. A call saves the return task (caller), increments the recursion level (recl) and changes control to the first task of the function.
    if Call(ct) then
      ct := firsttask(ct.id)
      caller(recl + 1) := ct
      recl := recl + 1
    endif
Remark 4. It is not necessary to define function calls with parameters because they can be implemented with existing AL constructs.

Finally, Return defines the return from a function. It decrements the recursion level and gives control to the successor of the calling task.
    if Return(ct) then
      ct := nexttask(caller(recl))
      recl := recl - 1
    endif
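The interplay of Call and Return through caller and recl can be traced with a small Python sketch. The class `Machine`, the string task names, and the dictionary encoding of nexttask are assumptions made for illustration. Since all updates of an ASM rule fire simultaneously, the caller saved by Call is the value of ct before the jump; the sketch makes this explicit with a temporary variable.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Machine:
    ct: str                                   # abstract program counter
    recl: int = 0                             # recursion level
    caller: Dict[int, str] = field(default_factory=dict)


def call(m: Machine, first_task: str) -> None:
    # Save the old ct as caller of the next level, then jump to the callee.
    old_ct = m.ct
    m.caller[m.recl + 1] = old_ct
    m.recl += 1
    m.ct = first_task


def ret(m: Machine, nexttask: Dict[str, str]) -> None:
    # Control passes to the successor of the calling task of this level.
    m.ct = nexttask[m.caller[m.recl]]
    m.recl -= 1


m = Machine(ct="call_f")
call(m, "f_entry")
state_after_call = (m.ct, m.recl, m.caller[1])
m.ct = "f_return"                             # the callee reaches its Return task
ret(m, {"call_f": "after_call"})
```

After the call the machine is at recursion level 1 with `call_f` recorded as caller; after the return it continues at the successor of `call_f` on level 0.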
input and output represent (possibly infinite) streams. Read reads the first element of the input stream. Write adds a value to the output stream.
    if Read(ct) then
      ct.value(recl) := first(input)
      input := tail(input)
      proceed
    endif

    if Write(ct) then
      output := output ++ [ct.source.value(recl)]
      proceed
    endif
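As a small illustration of the stream view of input and output, the following Python sketch represents the possibly infinite input by an iterator and the output by a list. The function names `read` and `write` and the choice of `itertools.count` as an example input are assumptions of this sketch.

```python
import itertools
from typing import Iterator, List


def read(inp: Iterator[int]) -> int:
    # Yields first(input); the advanced iterator then plays the role of tail(input).
    return next(inp)


def write(out: List[int], value: int) -> None:
    # output := output ++ [value]
    out.append(value)


inp = itertools.count(1)   # an unbounded input stream 1, 2, 3, ...
out: List[int] = []
write(out, read(inp))
write(out, read(inp))
```

Each `read` consumes exactly one element, so the third `read` on this stream returns the third element even though the stream itself never ends.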
The definition of the macro proceed depends on the definition of control flow. AL allows control flow to be defined explicitly by control flow edges or implicitly by data dependencies. Control flow edges then have to be inserted into the task graph wherever data dependencies make this necessary. In all other cases the following task is chosen arbitrarily out of the set of valid successors. We define the macro

    proceed ==
      choose this_task ∈ ct.nexttask
          satisfying ∀ task ∈ ct.nexttask: task before this_task implies Executed(task)
        ct := this_task
        Executed(this_task) := true
      endchoose
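The selection performed by proceed can be sketched in Python as a filter followed by an arbitrary choice. The function below is an illustration under assumptions: the name `proceed`, the encoding of the before relation as a set of pairs, the `executed` set, and the random generator are all hypothetical. A successor is admissible if every other successor that has to come before it has already been executed.

```python
import random
from typing import Optional, Set, Tuple


def proceed(successors: Set[str],
            before: Set[Tuple[str, str]],
            executed: Set[str],
            rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick an admissible successor, mark it as executed, and return it.

    (a, b) in `before` means that task a has to be executed before task b.
    Returns None if no successor is admissible.
    """
    rng = rng or random.Random()
    admissible = sorted(
        t for t in successors
        if all(p in executed for p in successors if (p, t) in before)
    )
    if not admissible:
        return None
    this_task = rng.choice(admissible)   # arbitrary choice among valid successors
    executed.add(this_task)              # Executed(this_task) := true
    return this_task


succ = {"a", "b", "c"}
order = {("a", "b"), ("b", "c")}
done: Set[str] = set()
first = proceed(succ, order, done)
```

With the ordering a before b before c and nothing executed yet, only `a` is admissible, so the first step is forced; once more tasks are marked as executed, the choice becomes genuinely arbitrary.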