More Advice on Proving a Compiler Correct

© 1993 Cambridge University Press. J. Functional Programming 1 (1): 1-000, January 1993

More Advice on Proving a Compiler Correct: Improve a Correct Compiler

Erik Meijer

University of Utrecht, Department of Computer Science, P.O. Box 80.089, NL-3508 TB Utrecht, The Netherlands. email: [email protected]

Abstract

As an alternative to the classical approach to the compiler correctness problem, where a given compiler is proved correct from scratch, we propose a method for deriving correct compilers from a denotational semantics via a series of refinements. Each such optimization step corresponds to an efficiency improvement in the corresponding compiler. Our technique combines the standard initial algebra semantics approach with aspects of Action Semantics. Instead of expressing semantic functions as homomorphisms from the initial algebra (syntax) to some semantic domain, they are factored through an algebra of actions. Compilers can then be obtained by viewing semantic functions as translations from the source language to the initial action algebra. We illustrate our method by deriving several compilers for a language of arithmetic expressions. Though simple, this example shows all the steps necessary to deal with more realistic languages.

1 Introduction

One of the objectives of denotational semantics is to give a precise description of programming languages that can serve as a standard against which implementations can be verified, or, preferably, from which correct compilers can be derived. In this paper we advocate the use of a calculational approach to derive correct and efficient implementations of programming languages from their denotational descriptions. Besides the correctness aspect, a transformational approach can help to understand the relationships between various existing implementations, or even suggest new ones.

In contrast to the consensus about the formal definition of syntax by means of some variant of context-free grammars, there is no agreement about a method, let alone about a notation, for specifying the semantics of programming languages. Following the adage "the essence of denotational semantics is the translation of conventional programs into equivalent functional programs" (MacLennan, 1990), we give semantics to a language by defining an interpreter in a functional language. For concreteness we will use Gofer (Jones, 1994) as our semantic meta-language, though any lazy functional language with constructor classes would serve equally well.

Starting from an obviously correct but possibly inefficient interpreter, we progressively make interpreters more operational. Because of the way interpreters are factored through an algebra of run-time actions via a polymorphic transformer, the refinement of one interpreter into another can be confined to a local refinement on these actions alone: it suffices to show how the old actions can be emulated by the new ones. Compilers are obtained by viewing interpreters as translations from the source language to the initial algebra (syntax) of the underlying action algebra. This is achieved by replacing semantic actions by their syntactic counterparts. Informally, one can think of an interpreter as mapping a source program to some abstract functional denotation, whereas the corresponding compiler maps the source program into a data structure representing this denotation. The structure of the two is the same except for the underlying action algebra.

The idea of viewing functions as data to turn an interpreter into a compiler has been used by many researchers, albeit in an informal way. In this paper we make this idea precise and formal. The generation of a compiler from an interpreter also yields a residual interpreter for the target language. It is our ultimate goal to arrive at residual interpreters in tail-recursive form, corresponding to the fetch/decode/execute loop of a conventional von Neumann architecture.

A language of simple arithmetic expressions is used as the running example; larger examples can be found in (Meijer, 1992). We begin with a standard interpreter, which is first refined into a stack-based interpreter, and finally into a continuation-passing version. At each stage a compiler and a residual interpreter can be obtained by the above-mentioned factorisation process. The expressions example is small enough to keep derivations short and understandable, yet it exercises all techniques needed to derive compilers for more involved languages. We conclude with a discussion of related work and point out some possibilities for future research. In the appendix we collect all code developed along the way.

2 Abstract syntax as initial algebras

To be able to speak about the abstract syntax of a language and to define interpreters in a compositional way, we use the elementary categorical concepts of functor, (initial) algebra, and catamorphism. A functor is a type constructor f together with a function map that lifts a function of type (a -> b) to a function of type (f a -> f b), respecting the identity and distributing over composition: map id = id and map (f . g) = map f . map g. In Gofer we capture this by defining a constructor class (Jones, 1993) Functor:

  class Functor f where
    map :: (a -> b) -> (f a -> f b)

It is not possible to impose laws on a class definition; it therefore remains the responsibility of the programmer to ensure that each instance of class Functor satisfies the two properties required of a functor. Given a functor f that captures the recursive shape of an algebraic data type, we define a type constructor Rec that ties the recursive knot:

  data Rec f = In (f (Rec f))

This definition makes essential use of the ability in Gofer to abstract over type constructors instead of only over type variables. The abstract syntax of a programming language can be defined as the recursive type Rec f for a suitably chosen functor f. The running example we use is a language of arithmetic expressions. Usually the abstract syntax of expressions is defined by

  data Expr = Num Int | Add Expr Expr

Here we define expressions by type Expr = Rec E, where E is a functor that describes the shape of expressions:

  data E e = Num Int | Add e e

The map corresponding to type constructor E is given by

  instance Functor E where
    map f = \e -> case e of
                    Num n    -> Num n
                    Add e e' -> Add (f e) (f e')

For convenience, in the sequel we will omit the \e -> case e of part and instead write such definitions as map f = (Num n -> Num n; Add e e' -> Add (f e) (f e')).

An f-algebra is a type a together with an operation phi of type f a -> a. In Gofer this could be expressed by the type synonym

  type Algebra f a = Functor f => f a -> a

The target a of an f-algebra is called its carrier, and functor f its signature. A homomorphism from an f-algebra phi :: f a -> a to an f-algebra psi :: f b -> b is a function h :: a -> b that recursively replaces operation phi by operation psi; equationally this is expressed by h . phi = psi . map h, or as a diagram:

           phi
    f a --------> a
     |            |
   map h          h
     v            v
    f b --------> b
           psi

The f-algebra In :: f (Rec f) -> Rec f is special among the category of f-algebras: it is the initial f-algebra. For any other algebra phi :: f a -> a there is a unique homomorphism (|phi|) :: Functor f => Rec f -> a, called a catamorphism (Meijer et al., 1991), that recursively replaces the initial algebra In by the target algebra phi:

  (|_|)          :: Functor f => (f a -> a) -> (Rec f -> a)
  (|phi|) (In x) =  phi (map (|phi|) x)

Catamorphisms are the generalization of the familiar operator foldr on lists to arbitrary recursive data types. We will see an example shortly.
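To make the correspondence with foldr concrete, here is a small executable sketch in Haskell, close to the paper's Gofer: the Prelude names Functor and map are hidden so the paper's constructor class can be defined verbatim, and cata is written as a named function in place of the (|_|) banana brackets. The list shape functor L and the helpers fromList and sumL are illustrative names, not from the paper.

```haskell
import Prelude hiding (Functor, map)

-- the paper's constructor class
class Functor f where
  map :: (a -> b) -> (f a -> f b)

-- tying the recursive knot
data Rec f = In (f (Rec f))

-- catamorphism: (|phi|) in the paper's notation
cata :: Functor f => (f a -> a) -> Rec f -> a
cata phi (In x) = phi (map (cata phi) x)

-- lists as a recursive type: L a is the shape functor of [a]
data L a e = Nil | Cons a e

instance Functor (L a) where
  map _ Nil        = Nil
  map f (Cons a e) = Cons a (f e)

fromList :: [a] -> Rec (L a)
fromList = foldr (\a e -> In (Cons a e)) (In Nil)

-- cata on L-structures is exactly foldr: sumL corresponds to foldr (+) 0
sumL :: Rec (L Int) -> Int
sumL = cata (\x -> case x of Nil -> 0; Cons a s -> a + s)

main :: IO ()
main = print (sumL (fromList [1, 2, 3, 4]))  -- 10, same as foldr (+) 0 [1,2,3,4]
```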

3 Semantics as catamorphisms

An interpreter for a programming language Rec f, which for us is synonymous with a denotational semantics, is a function from Rec f to some semantic domain m. A semantics is compositional if it is defined as a catamorphism (|phi|) :: Rec f -> m. Structuring a denotational semantics by considering the syntax of the source language as the carrier of an initial algebra and giving the valuation function as a catamorphism is called Initial Algebra Semantics (Goguen et al., 1977).

In a traditional denotational semantics, the algebra phi of a semantics (|phi|) is encoded using the untyped lambda-calculus. Although mathematically sound, such unstructured language descriptions have rather poor pragmatic qualities. It is hard to identify the essential semantic constructs of the programming language under consideration, and the (automatic) generation of compilers from the semantic description is virtually impossible. Action Semantics as developed by Mosses and Watt (1983; 1986) is an attempt to improve the readability and modularity of formal descriptions of programming languages. The semantic domain m is presented as the carrier of a g-algebra psi, where the set of actions psi corresponds to the run-time operations of the programming language in question.

For our expression language we take the usual domain of integers Int, with individual numbers and addition as the semantic actions: (Num n -> n; Add e e' -> e + e') :: E Int -> Int. A standard interpreter eval :: Rec E -> Int for expressions can then be defined as the catamorphism eval = (|Num n -> n; Add e e' -> e + e'|). Unfolding this definition to eliminate the use of catamorphisms and maps, we see that it indeed has the expected compositional behaviour:

  eval (In x) = (Num n    -> n
                ; Add e e' -> (eval e) + (eval e')) x
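The standard interpreter can be assembled into a runnable Haskell module along the same lines (again hiding the clashing Prelude names, with cata standing in for the banana brackets; the sample term ex is our own):

```haskell
import Prelude hiding (Functor, map)

class Functor f where
  map :: (a -> b) -> (f a -> f b)

data Rec f = In (f (Rec f))

cata :: Functor f => (f a -> a) -> Rec f -> a
cata phi (In x) = phi (map (cata phi) x)

-- shape functor of expressions
data E e = Num Int | Add e e

instance Functor E where
  map _ (Num n)    = Num n
  map f (Add e e') = Add (f e) (f e')

type Expr = Rec E

-- the standard interpreter: a catamorphism over the E-algebra on Int
eval :: Expr -> Int
eval = cata alg
  where
    alg (Num n)   = n
    alg (Add m n) = m + n

-- sample term: (1 + 2) + 3
ex :: Expr
ex = In (Add (In (Add (In (Num 1)) (In (Num 2)))) (In (Num 3)))

main :: IO ()
main = print (eval ex)  -- 6
```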

3.1 Transformers

Unlike in the example above, the signatures of the run-time algebra and of the abstract syntax often do not match directly. For real-life examples, the essence of writing an interpreter for a language Rec f is to transform the given run-time algebra psi :: g m -> m into a compile-time algebra t psi :: f m -> m of the same signature as the abstract syntax, by means of a transformer (Fokkinga, 1992): a polymorphic function from g-algebras to f-algebras:

  type Trans g f a = (Functor g, Functor f) => (g a -> a) -> (f a -> a)

Intuitively, the polymorphism requirement on transformers avers that the compile-time algebra t psi is built by composing the run-time actions psi, without assuming anything about the particular carrier type. A transformer acts as an adaptor that permits the use of a g-algebra where an f-algebra is required.
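As a small illustration of the adaptor idea, consider a hypothetical source functor F with an extra Twice construct (this functor and the names trans and evalF are our own, not the paper's): every F-construct is expressed using only the run-time E-actions, polymorphically in the carrier.

```haskell
import Prelude hiding (Functor, map)

class Functor f where
  map :: (a -> b) -> (f a -> f b)

data Rec f = In (f (Rec f))

cata :: Functor f => (f a -> a) -> Rec f -> a
cata phi (In x) = phi (map (cata phi) x)

-- run-time actions: the paper's E-signature
data E e = Num Int | Add e e
instance Functor E where
  map _ (Num n)    = Num n
  map f (Add e e') = Add (f e) (f e')

-- hypothetical richer source syntax, with doubling as sugar
data F e = Lit Int | Plus e e | Twice e
instance Functor F where
  map _ (Lit n)     = Lit n
  map f (Plus x y)  = Plus (f x) (f y)
  map f (Twice x)   = Twice (f x)

-- a transformer: builds an F-algebra from any E-algebra, without
-- assuming anything about the carrier a
trans :: (E a -> a) -> (F a -> a)
trans psi (Lit n)    = psi (Num n)
trans psi (Plus x y) = psi (Add x y)
trans psi (Twice x)  = psi (Add x x)

evalF :: Rec F -> Int
evalF = cata (trans (\x -> case x of Num n -> n; Add m n -> m + n))

main :: IO ()
main = print (evalF (In (Twice (In (Plus (In (Lit 2)) (In (Lit 3)))))))  -- 10
```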


4 Refining interpreters

Given a language and its interpreter, expressed respectively as an initial algebra and a catamorphism, our goal now is to refine the interpreter into a more operational version. The free theorems (Wadler, 1989) for catamorphisms and transformers are our main instruments for doing this. From the polymorphic type (|_|) :: Functor f => (f a -> a) -> (Rec f -> a) of catamorphisms we obtain a free theorem that is known as the fusion law:

    h . phi = psi . map h
    ---------------------
    h . (|phi|) = (|psi|)

A diagram makes the various types explicit:

           phi                            (|phi|)
    f a --------> a                Rec f --------> a
     |            |                     \          |
   map h          h         =>           \         | h
     v            v                (|psi|)\        v
    f b --------> b                        `-----> b
           psi

The fusion law states that the composition of a catamorphism with a homomorphism is again a catamorphism. Unlike homomorphisms, catamorphisms are in general not closed under composition. If partial or infinite values of Rec f are allowed, we must add the extra condition that h is strict; in this paper we won't consider such values.

The free theorem for a transformer t :: (Functor g, Functor f) => (g a -> a) -> (f a -> a) tells us that homomorphisms on g-algebras are homomorphisms on the transformed f-algebras too:

    h . phi = psi . map h
    -------------------------
    h . t phi = t psi . map h

The overloading of map makes a diagram useful in this case:

           phi                           t phi
    g a --------> a                f a --------> a
     |            |                 |            |
   map h          h         =>    map h          h
     v            v                 v            v
    g b --------> b                f b --------> b
           psi                           t psi

As argued above, interpreters will often be based on a transformed algebra, that is, they are built as the composition of a transformer and a catamorphism: (|t psi|) :: (Functor g, Functor f) => (g a -> a) -> (Rec f -> a). The free theorem for this combination will be referred to as the transformer/fusion law:

           phi                            (|t phi|)
    g a --------> a                Rec f ---------> a
     |            |                      \          |
   map h          h         =>            \         | h
     v            v               (|t psi|)\        v
    g b --------> b                         `-----> b
           psi

An interpreter (|t phi|) :: Rec f -> m is refined into another interpreter (|t psi|) :: Rec f -> u by fusing with a homomorphism h :: m -> u. If we assume that h is injective, the refined interpreter is equivalent to the old one:

      (|t phi|)
    =   { assume h is injective with left-inverse h^-1 }
      h^-1 . h . (|t phi|)
    =   { assume h is a g-homomorphism: h . phi = psi . map h }
      h^-1 . (|t psi|)

Of course, refinement only makes sense if the new interpreter (|t psi|) is more efficient than the previous one (|t phi|), and the inverse h^-1 is cheap to compute. Ideally, the new algebra psi is itself of the form t' chi, where the actions chi are in some sense more primitive than psi. This corresponds to enlarging the layer of interpretation. The creative part of refining an interpreter lies mainly in inventing the embedding homomorphism h. The choice for h is usually suggested by its type or by the algebraic properties of psi.
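The fusion law can be checked on a familiar instance: for list catamorphisms, foldr plays the role of (|_|), and fusing h = (2*) with the sum algebra succeeds because the premise h (f x y) = g x (h y) holds. This sketch uses plain Haskell foldr rather than the paper's Rec-based catamorphisms; the function names are our own.

```haskell
-- Fusion on lists: h . foldr f e = foldr g (h e), provided h (f x y) = g x (h y).
-- Take h = (2*), f = (+), e = 0. The premise 2*(x+y) = 2*x + 2*y holds
-- with g x z = 2*x + z, so doubling a sum fuses into a single fold.

double :: Int -> Int
double = (2 *)

-- two passes: sum, then double
sumThenDouble :: [Int] -> Int
sumThenDouble = double . foldr (+) 0

-- one pass: the fused catamorphism
fusedSum :: [Int] -> Int
fusedSum = foldr (\x z -> 2 * x + z) 0

main :: IO ()
main = print [ (sumThenDouble xs, fusedSum xs) | xs <- [[], [1, 2, 3], [5]] ]
```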

4.1 Example

We are now able to refine our expression evaluator so that the evaluation of expressions is broken into a sequence of elementary steps, using a stack for storing intermediate results. Stacks are represented by lists, and come with operations for pushing a number on a stack, adding the two topmost elements, sequencing, and the identity operation on stacks.

  push      :: Int -> ([Int] -> [Int])
  push n ns =  n : ns

  add                :: [Int] -> [Int]
  add (n' : n : ns)  =  n + n' : ns

  seq             :: ([Int] -> [Int]) -> ([Int] -> [Int]) -> ([Int] -> [Int])
  (s `seq` s') ns =  s' (s ns)

  skip    :: [Int] -> [Int]
  skip ns =  ns

These operations can be combined into an S-algebra

  (PUSH n -> push n; ADD -> add; SEQ s s' -> s `seq` s'; SKIP -> skip)

where functor S is defined as

  data S s = PUSH Int | ADD | SEQ s s | SKIP

To refine the standard interpreter for expressions into a stack-based interpreter we require an injective function h :: Int -> ([Int] -> [Int]) and a transformer trans :: (S a -> a) -> (E a -> a) such that h is an E-homomorphism:

  h . (Num n -> n; Add e e' -> e + e')
    = trans (PUSH n -> push n; ADD -> add; SEQ s s' -> s `seq` s'; SKIP -> skip) . map h

Type considerations suggest trying push as the refining homomorphism, the more so because push is injective with left-inverse \f -> head (f []). As transformer, we take the following function that captures the idea of postfix traversal of an expression:

  trans       :: (S a -> a) -> (E a -> a)
  trans stack =  (Num n    -> push n
                 ; Add s s' -> s `seq` s' `seq` add)
    where
      push n     = stack (PUSH n)
      add        = stack ADD
      s `seq` s' = stack (SEQ s s')

Using the fact that push distributes over addition,

  push (e + e') = (push e) `seq` (push e') `seq` add

it is straightforward to prove that push is indeed an E-homomorphism. Thus we have refined our previous interpreter eval = (|Num n -> n; Add e e' -> e + e'|), which imposes no order of evaluation, into a stack-based interpreter in which the order of evaluation is completely fixed:

  eval e = head ((| trans (PUSH n    -> push n
                          ; ADD      -> add
                          ; SEQ s s' -> s `seq` s'
                          ; SKIP     -> skip) |) e [])

By unfolding this definition to eliminate the use of catamorphisms, maps and transformers, this is readily seen to be the case:

  eval e       = head (eval' e [])
  eval' (In x) = (Num n    -> push n
                 ; Add e e' -> (eval' e) `seq` (eval' e') `seq` add) x
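The stack-based interpreter can be assembled into a runnable module as follows (seq is renamed seq' to avoid the Prelude name, and the partial pattern of add is given an explicit underflow error; the sample term ex is our own):

```haskell
import Prelude hiding (Functor, map)

class Functor f where
  map :: (a -> b) -> (f a -> f b)

data Rec f = In (f (Rec f))

cata :: Functor f => (f a -> a) -> Rec f -> a
cata phi (In x) = phi (map (cata phi) x)

data E e = Num Int | Add e e
instance Functor E where
  map _ (Num n)    = Num n
  map f (Add e e') = Add (f e) (f e')

-- run-time stack actions
push :: Int -> [Int] -> [Int]
push n ns = n : ns

add :: [Int] -> [Int]
add (n' : n : ns) = n + n' : ns
add _             = error "add: stack underflow"

seq' :: ([Int] -> [Int]) -> ([Int] -> [Int]) -> ([Int] -> [Int])
(s `seq'` t) ns = t (s ns)

-- postfix traversal: the compile-time E-algebra built from stack actions
evalS :: Rec E -> [Int] -> [Int]
evalS = cata alg
  where
    alg (Num n)    = push n
    alg (Add s s') = s `seq'` s' `seq'` add

eval :: Rec E -> Int
eval e = head (evalS e [])

ex :: Rec E
ex = In (Add (In (Num 1)) (In (Add (In (Num 2)) (In (Num 3)))))

main :: IO ()
main = print (eval ex)  -- 6
```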

5 Factoring an interpreter into a compiler

Besides limiting the work needed to refine an interpreter, the combined transformer/fusion law can also be used to turn an interpreter into a compiler. An interpreter (|t psi|) based on an action algebra psi :: g m -> m can be factored into a compiler (|t In|) from source language Rec f into target language Rec g, and a residual interpreter (|psi|) from Rec g to the original semantic domain m:

             (|t In|)
    Rec f -----------> Rec g
          \            /
 (|t psi|) \          / (|psi|)
            v        v
                m

That is, compilers can be constructed from interpreters by replacing the action algebra psi by the initial algebra In, or, more informally, by replacing semantic actions by their syntactic counterparts. The factorization theorem follows immediately from the combined transformer/fusion law with the substitutions phi := In and h := (|psi|). Here we see two important heuristics of functional programming in action: inspecting free theorems of polymorphic functions, and instantiating a conditional law so that the preconditions of the law become vacuous.

The stack-based interpreter yields a compiler into stack machine code by factoring the catamorphism (|trans (PUSH n -> push n; ...)|) into the composition (|PUSH n -> push n; ...|) . (|trans In|). In the expanded version of eval, this corresponds to replacing the core of the interpreter eval' :: Rec E -> ([Int] -> [Int]) by the composition of the compiler comp :: Rec E -> Rec S and the residual interpreter run :: Rec S -> ([Int] -> [Int]):

  eval e      = head (run (comp e) [])

  comp (In x) = (Num n    -> push n
                ; Add s s' -> (comp s) `seq` (comp s') `seq` add) x
    where
      push n     = In (PUSH n)
      add        = In ADD
      s `seq` s' = In (SEQ s s')

However, the residual interpreter

  run = (| PUSH n   -> push n
         ; ADD      -> add
         ; SEQ s s' -> s `seq` s'
         ; SKIP     -> skip |)

is not yet tail-recursive, so further refinement is needed.

Factoring an interpreter into a compiler and a residual interpreter is a disciplined form of partial evaluation (Jones and Gomard and Sestoft, 1993). The work done at compile time is specializing a translated source program in the carrier of the f-algebra (t In) to a target program in Rec g, thereby removing a layer of interpretation. The more complicated the transformer t is, the more work is done at compile time. In ordinary partial evaluation the signature g of residual programs (and the run-time actions psi) are the unknowns in the factorization process, and much care has to be taken to write interpreters in such a way that they are amenable to good specialization. In contrast, we stage our interpreters manually into a compile-time and a run-time part, in an attempt to structure the interpreter a priori.
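The factored compiler and residual interpreter can be run directly; here is a self-contained sketch in the same style as before (the sample term ex is our own; comp applies the postfix transformer to the initial S-algebra, so it builds target syntax instead of acting on a stack):

```haskell
import Prelude hiding (Functor, map)

class Functor f where
  map :: (a -> b) -> (f a -> f b)

data Rec f = In (f (Rec f))

cata :: Functor f => (f a -> a) -> Rec f -> a
cata phi (In x) = phi (map (cata phi) x)

data E e = Num Int | Add e e
instance Functor E where
  map _ (Num n)    = Num n
  map f (Add e e') = Add (f e) (f e')

data S s = PUSH Int | ADD | SEQ s s | SKIP
instance Functor S where
  map _ (PUSH n)   = PUSH n
  map _ ADD        = ADD
  map f (SEQ s s') = SEQ (f s) (f s')
  map _ SKIP       = SKIP

-- compiler: semantic actions replaced by their syntactic counterparts
comp :: Rec E -> Rec S
comp = cata alg
  where
    alg (Num n)    = In (PUSH n)
    alg (Add s s') = In (SEQ (In (SEQ s s')) (In ADD))

-- residual interpreter for stack code
run :: Rec S -> [Int] -> [Int]
run = cata alg
  where
    alg (PUSH n)   ns            = n : ns
    alg ADD        (n' : n : ns) = n + n' : ns
    alg ADD        _             = error "add: stack underflow"
    alg (SEQ s s') ns            = s' (s ns)
    alg SKIP       ns            = ns

eval :: Rec E -> Int
eval e = head (run (comp e) [])

ex :: Rec E
ex = In (Add (In (Num 1)) (In (Add (In (Num 2)) (In (Num 3)))))

main :: IO ()
main = print (eval ex)  -- 6
```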

5.1 Parsing and unparsing

Strictly speaking, a compiler is a mapping from the concrete syntax of the source language to the concrete syntax of the target language, and not, as we defined it, a mapping from the abstract syntax of the source language to the abstract syntax of the target language. Hence a real compiler is a function compile :: String -> String such that the following diagram commutes:

             compile
    String ---------> String
      |                 |
    parse             parse
      v                 v
    Rec f ----------> Rec g
            (|t In|)

The concept of free theorems also proves useful for making the jump from abstract to concrete syntax. If we assume that every abstract syntax tree has a textual representation, i.e. that the parsing function is surjective with right-inverse unparse, we can define the compiler as the composition of three passes:

  compile = unparse . (|t In|) . parse

If the unparser is expressed as a catamorphism (|phi|) and the parser is expressed in the form parse = p In, with p a polymorphic function of type (f a -> a) -> a, the "Acid Rain" theorem (Meijer, 1994)

  (|phi|) . p In = p phi

gives a monolithic compiler compile = p (t phi) that need not build intermediate trees at all. Instead, the parser for the source program computes the target program on the fly. The expression p (t phi) exemplifies once again that the essence of writing a compiler is to find a transformer t that shows how each operation of the source language is translated into a combination of operations of the target language.

Take for example the following parser for expressions, written using monad comprehensions:

  parse = [ In (Add e e') | _  <- symbol '('
                          , e  <- expr
                          , _  <- symbol '+'
                          , e' <- expr
                          , _  <- symbol ')' ]
       ++ [ In (Num n) | n <- number ]

and a catamorphic unparser for stack-machine code:

  unparse = (| PUSH n   -> "push(" ++ show n ++ "); "
             ; ADD      -> "add; "
             ; SEQ s s' -> s ++ s'
             ; SKIP     -> "skip; " |)

The complete compiler unparse . (|trans In|) . parse can be combined into a single function that directly translates the concrete syntax of expressions to the concrete syntax of code for a stack machine:

  compile = [ e ++ e' ++ "add; " | _  <- symbol '('
                                 , e  <- expr
                                 , _  <- symbol '+'
                                 , e' <- expr
                                 , _  <- symbol ')' ]
         ++ [ "push(" ++ show n ++ "); " | n <- number ]
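The monolithic compiler can be sketched in plain Haskell with the list-of-successes technique standing in for Gofer's monad comprehensions. The Parser type, symbol, and number below are assumptions mirroring the paper's fragment; the grammar is expr ::= '(' expr '+' expr ')' | number.

```haskell
import Data.Char (isDigit)

-- list-of-successes parsers: a parser returns all (result, remaining input) pairs
type Parser a = String -> [(a, String)]

symbol :: Char -> Parser Char
symbol c (x : xs) | x == c = [(c, xs)]
symbol _ _                 = []

number :: Parser Int
number s = case span isDigit s of
             ([], _)    -> []
             (ds, rest) -> [(read ds, rest)]

-- parse concrete expressions straight to stack code: no trees are built
expr :: Parser String
expr s =  [ (e ++ e' ++ "add; ", r4)
          | (_,  r0) <- symbol '(' s
          , (e,  r1) <- expr r0
          , (_,  r2) <- symbol '+' r1
          , (e', r3) <- expr r2
          , (_,  r4) <- symbol ')' r3 ]
       ++ [ ("push(" ++ show n ++ "); ", r) | (n, r) <- number s ]

-- take the first parse that consumes all input
compile :: String -> String
compile s = head [ code | (code, "") <- expr s ]

main :: IO ()
main = putStrLn (compile "(1+(2+3))")
```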

6 Attribute grammars for context information

Using transformers (g a -> a) -> (f a -> a) of first-order type, it is only possible to define the meaning of a construct in terms of its immediate substructures. In particular it is impossible to let the meaning of a construct depend on its compile-time context. Realistic compilers generate code in a context-sensitive way. To incorporate context information we will use transformers of higher type t :: (g a -> a) -> (f (h a -> j a) -> (h a -> j a)).

Catamorphisms (|t psi|) :: (g a -> a) -> (Rec f -> (h a -> j a)) built using higher-order transformers correspond to attribute grammars with inherited attributes of type h a and synthesized attributes of type j a. Often it will be the case that h a = j a = a. This provides a heuristic for refining an interpreter of type Rec f -> a into one of type Rec f -> (a -> a), i.e., by finding an embedding homomorphism of type a -> (a -> a).

The combined transformer/fusion law for attribute grammars expressed as catamorphisms (|t psi|) :: (g a -> a) -> (Rec f -> (h a -> j a)) is:

           phi                        (|t phi|) x
    g a --------> a             h a ------------> j a
     |            |              |                 |
   map h          h       =>   map h             map h
     v            v              v                 v
    g b --------> b             h b ------------> j b
           psi                        (|t psi|) x

A corollary of this law is the compiler factorization law for attribute grammars:

  map (|psi|) . (|t In|) x = (|t psi|) x . map (|psi|)

We leave it as an interesting exercise for the reader to write down the free theorems for higher-order catas and transformers separately.

6.1 Eliminating sequencing

As the last step in the refinement of our expression interpreter, we will eliminate the sequencing operation seq from the set of run-time actions. No actual machine has such a sequencing operation in its instruction set. Instead, instructions are chained one after the other, each one taking the next as its continuation. In this way we arrive at a tail-recursive version of our interpreter. As suggested by the above heuristic for refining interpreters to attribute grammar form (substitute a = [Int] -> [Int]), we take

  ([Int] -> [Int]) -> ([Int] -> [Int])

as the new semantic domain. Now we must find an injective function h of type ([Int] -> [Int]) -> (([Int] -> [Int]) -> ([Int] -> [Int])), a transformer trans', and a C-algebra cont such that h is an S-homomorphism:

  h . (PUSH n -> push n; ADD -> add; SEQ s s' -> s `seq` s'; SKIP -> skip)
    = trans' cont . map h

There is one apparent choice for h, namely the action seq, which is already of the right type; moreover seq is injective with left-inverse \ss -> ss skip. There is also a transformer trans' from C-algebras to S-algebras over continuations, where functor C is defined as

  data C s = PUSH' Int s | ADD' s | SKIP'

  trans'      :: (C a -> a) -> (S (a -> a) -> (a -> a))
  trans' cont =  (PUSH n      -> \s -> push' n s
                 ; ADD        -> \s -> add' s
                 ; SEQ ss ss' -> \s -> ss (ss' s)
                 ; SKIP       -> \s -> s)
    where
      push' n s = cont (PUSH' n s)
      add' s    = cont (ADD' s)

A straightforward proof shows that, because seq is associative with identity skip, the required homomorphism property holds for the following new action algebra:

  trans' (PUSH' n s -> push' n s; ADD' s -> add' s; SKIP' -> skip)
    where push' n s ns = s (n : ns) and add' s (n' : n : ns) = s (n + n' : ns)

In summary we have derived a continuation-passing interpreter:

  eval e = head ((| trans (trans' (PUSH' n s -> push' n s
                                  ; ADD' s   -> add' s
                                  ; SKIP'    -> skip)) |) e skip [])

Factoring this interpreter gives a compiler \e -> (|trans (trans' In)|) e (In SKIP'), and a residual interpreter (|PUSH' n s -> push' n s; ADD' s -> add' s; SKIP' -> skip|). Unfolding the latter makes clear that it is tail-recursive, as required:

  run (In x) = (PUSH' n s -> \ns -> run s (push n ns)
               ; ADD' s   -> \ns -> run s (add ns)
               ; SKIP'    -> \ns -> skip ns) x
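The continuation-passing machine can be run as follows; in this sketch the compiler produces target code Rec C directly (each expression takes the code that follows it), and run is the tail-recursive fetch/decode/execute loop. The sample term ex is our own.

```haskell
import Prelude hiding (Functor, map)

class Functor f where
  map :: (a -> b) -> (f a -> f b)

data Rec f = In (f (Rec f))

cata :: Functor f => (f a -> a) -> Rec f -> a
cata phi (In x) = phi (map (cata phi) x)

data E e = Num Int | Add e e
instance Functor E where
  map _ (Num n)    = Num n
  map f (Add e e') = Add (f e) (f e')

-- target code with explicit continuations: each instruction carries the next
data C s = PUSH' Int s | ADD' s | SKIP'

-- compiler: an expression is a function from the code that follows it
-- to the complete code for "evaluate me, then continue"
comp :: Rec E -> Rec C -> Rec C
comp = cata alg
  where
    alg (Num n)   s = In (PUSH' n s)
    alg (Add c c') s = c (c' (In (ADD' s)))   -- left operand first, postfix add

-- tail-recursive residual interpreter
run :: Rec C -> [Int] -> [Int]
run (In (PUSH' n s)) ns            = run s (n : ns)
run (In (ADD' s))    (n' : n : ns) = run s (n + n' : ns)
run (In (ADD' _))    _             = error "add: stack underflow"
run (In SKIP')       ns            = ns

eval :: Rec E -> Int
eval e = head (run (comp e (In SKIP')) [])

ex :: Rec E
ex = In (Add (In (Num 1)) (In (Add (In (Num 2)) (In (Num 3)))))

main :: IO ()
main = print (eval ex)  -- 6
```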

7 Dealing with sharing

The factorization of an interpreter into a compiler and a residual interpreter may lead to problems if parse trees of either the source or the target language possess a graph structure. For an interpreter this is fine, but such a possibility will cause a compiler to duplicate code or, in the case of cycles, even to run forever. As an example we extend our source and target languages with conditionals:

  data E e = Num Int | Add e e | If e e e
  data C s = PUSH' Int s | ADD' s | SKIP' | IF' s s

The new transformer trans'' :: (C a -> a) -> (E (a -> a) -> (a -> a)) is the composition of transformers trans and trans', extended to deal with conditionals. First composing and then extending avoids the labour of extending the two transformers trans and trans' separately.

  trans'' cont = (Num n          -> \s -> push' n s
                 ; Add ss ss'     -> \s -> ss (ss' (add' s))
                 ; If ss ss' ss'' -> \s -> ss (gofalse' (ss' s) (ss'' s)))
    where
      push' n s     = cont (PUSH' n s)
      add' s        = cont (ADD' s)
      gofalse' s s' = cont (IF' s s')

The interpreter for the extended language is now given by

  eval e = head ((| trans'' (PUSH' n s -> push' n s
                            ; ADD' s   -> add' s
                            ; IF' s s' -> gofalse s s'
                            ; SKIP'    -> skip) |) e skip [])

where the new action gofalse uses the value on top of the stack to decide which branch to pursue:

  gofalse s s' (n : ns) = if n == 0 then s' ns else s ns

Code duplication occurs in the interpretation of conditionals:

  If ss ss' ss'' -> \s -> ss (gofalse' (ss' s) (ss'' s))

One solution would be to assume that sharing is not lost when computing the target program. Then, if the (built-in) unparser takes care of sharing, no code duplication will occur, and even cyclic target programs can be mapped to a finite textual representation. This is essentially the approach taken by Sethi (1984). Since we don't want to assume anything unconventional about our semantic meta-language (I know of no lazy functional language that allows access to the internal graph representation of values), this option is rejected as too naive.

We can gain the effect of the above solution by making the sharing explicit on the object level using labels and references. In Gofer, adding the two new constructors needed to achieve this can be done very elegantly:

  data Ext f a = LABEL Idf a | REF Idf a | Just (f a)

This is the second example where we abstract over a type constructor instead of over just a type variable. Sharing of continuations in the interpretation of conditionals is made explicit by labelling one occurrence with a fresh label and letting the other occurrence refer to that label. The state monad takes care of the supply of fresh labels.

  trans''' :: (Ext C a -> a) -> (E (a -> State Int a) -> (a -> State Int a))
  trans''' cont
    = (Num n          -> \s -> [push' n s]
      ; Add ss ss'     -> \s -> [s'' | s'  <- ss' (add' s)
                                     , s'' <- ss s']
      ; If ss ss' ss'' -> \s -> [s'''' | fi    <- gensym
                                       , s'    <- [label fi s]
                                       , s''   <- ss' (ref fi s')
                                       , s'''  <- ss'' s'
                                       , s'''' <- ss (gofalse' s'' s''')])
    where
      push' n s     = cont (Just (PUSH' n s))
      add' s        = cont (Just (ADD' s))
      gofalse' s s' = cont (Just (IF' s s'))
      label l s     = cont (LABEL l s)
      ref l s       = cont (REF l s)

This is a good example of a higher-order transformer. An interpreter built using trans''' instead of trans'' only needs to define the interesting operations, using a transformer ext :: (f a -> a) -> (Ext f a -> a) to strip labels and references. Compilers leave labels and references intact for the unparser. The following unparser for Rec (Ext C) maps node labels to program labels and node references to gotos:

  unparse = (| Just (PUSH' n s) -> "push(" ++ show n ++ "); " ++ s
             ; Just (ADD' s)    -> "add; " ++ s
             ; Just SKIP'       -> "skip; "
             ; Just (IF' s s')  -> "if(pop) {" ++ s ++ "} " ++ s'
             ; LABEL l s        -> l ++ ": " ++ s
             ; REF l s          -> "goto " ++ l ++ "; " |)
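The labelling idea can be sketched in a few lines of plain Haskell with the state monad replaced by a manually threaded label counter. The types Expr and Code and the functions comp, unparse, and compile below are our own simplifications, not the paper's definitions; the point is only that the shared join point of a conditional becomes one Label node and one Ref node instead of two copies.

```haskell
data Expr = Num Int | Add Expr Expr | If Expr Expr Expr

-- flat target code with labels and gotos instead of shared subtrees
data Code = PushC Int Code | AddC Code | GoFalse Code Code
          | Label Int Code | Ref Int | Halt

-- compile with continuation s; the Int is the fresh-label supply,
-- threaded by hand where the paper uses a state monad
comp :: Expr -> Code -> Int -> (Code, Int)
comp (Num n)    s i = (PushC n s, i)
comp (Add a b)  s i = let (cb, i1) = comp b (AddC s) i
                          (ca, i2) = comp a cb i1
                      in (ca, i2)
comp (If c t e) s i = let shared   = Label i s             -- label the join point
                          (ct, i1) = comp t (Ref i) (i + 1)  -- then-branch refers to it
                          (ce, i2) = comp e shared i1        -- else-branch carries it
                          (cc, i3) = comp c (GoFalse ct ce) i2
                      in (cc, i3)

unparse :: Code -> String
unparse (PushC n s)   = "push(" ++ show n ++ "); " ++ unparse s
unparse (AddC s)      = "add; " ++ unparse s
unparse (GoFalse t e) = "if(pop) {" ++ unparse t ++ "} " ++ unparse e
unparse (Label l s)   = "L" ++ show l ++ ": " ++ unparse s
unparse (Ref l)       = "goto L" ++ show l ++ "; "
unparse Halt          = "halt; "

compile :: Expr -> String
compile e = unparse (fst (comp e Halt 0))

main :: IO ()
main = putStrLn (compile (If (Num 1) (Num 2) (Num 3)))
```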

8 Comparison with related work In this nal section we take step back and consider our approach to the compiler correctness problem in the context of other work in this area. Many people (Burstall and Landin, 1969; Chirica, 1976; Dybjer, 1985; Jones and Schmidt, 1980; McCarthy and Painter, 1967; Morris, 1973; Burstall and Landin, 1969; Thatcher et al., 1980; Mosses, 1980) have advocated the use of algebraic means to tackle the compiler correctness problem. The traditional initial algebra semantics approach can be phrased as follows. Given a compiler c :: Rec f ! Rec g that translates a source language to a target language, together with semantics m :: Rec f ! m and a :: Rec g ! u for these languages, one seeks an embedding e :: m ! u such that the following diagram commutes

15

More Advice

Rec f c- Rec g m

?

m

a

? u e

This is most easily done by making the corners of the diagram into carriers of falgebras and de ning the arrows as homomorphisms. Then the diagram commutes by initiality of Rec f. Our paper can be seen as a next step in the development of initial algebra semantics. Instead of proving a given compiler correct we aim to calculate a compiler for a language from its denotational semantics alone. We accomplish this by a divide and conquer approach, taking further advantage of the algebraic framework. First we re ne interpreter (jt j) based on action algebra into an interpreter (jt (t0 )j) based on action-algebra via a homomorphic embedding e such that e  = t0  map e.

Rec f

? @@ 0 (jt j) ? (jt (t )j) ? @ ? @R- u m e

Via the combined transformer/fusion law the improved interpreter (jt (t0 In)j) can be factored into a compiler (jt (t0 In)j) and a residual interpreter (j j)

Rec f

@

j

( t (t0

In)j) - Rec g

? ?(j j) @@R ? ? u

)j)@

j

( t (t0

By glueing together the two triangles, we see that we have solved the traditional compiler correctness problem. 0 Rec f (jt (t In)-j) Rec g

j j)

(t

?

m

jj - u?

( )

e

The work of (Lee, 1990) on High Level Semantics also builds on ideas from Action Semantics. Lee calls the interpreter (jt j) the macro semantics and the actual de nition of the run-time actions the micro semantics. Di erent implementations can be given for the actions of the micro semantics, for example a code generator or a continuation style semantics, but the macrosemantics remains unchanged. This xed division removes the possibility of shifting computations from run-time to compile-time, which is a key aspect of our method. Moreover, no attempt is made

16

Erik Meijer

to relate di erent micro semantics; Lee's goal is to develop a modular approach to writing compilers rather than proving compilers correct. In a sense the monadic approach to semantics (Moggi, 1989; Wadler, 1983) is similar in spirit to the idea of Action Semantics. The fact that the set of monad operations is xed makes them less suitable as run-time actions. As we have seen, monads can however be used to structure interpreters on the meta-level. Our technique of factoring an interpreter into a compiler and an abstract machine is used informally in (Friedman et al., 1992, Chapter 12). Given an interpreter they rst abstract the elementary actions performed by the interpreter. This corresponds to making explicit the transformer from run-time actions to compile-time actions. Next they modify the interpreter so that instead of performing each action it generates a fragment of code representing the action by replacing the actions by corresponding data objects. Finally a residual interpreter for the compiled code is derived from the actions performed by the original interpreter. The problem of code duplication due to compiling conditionals as well as dealing with recursive procedures is deferred to an exercise. Moreover, no attempt is made to prove the correctness of this approach. In many related papers on semantics directed code generation (1984; 1989; 1986; 1991; 1990; 1984) the distinction between denotations and their representations as data is also left implicit. In retrospect this is perhaps not surprising, because in Lisp (the traditional meta-language of choice) there is no distinction between code and data. A more operational route is taken by Pagan (1988) who converts an interpreter into a compiler by replacing \the interpreters action chunks with `write' statements that output the chunks". Although Pagan recognizes the problems associated with loops in the source language, he does not propose a general solution for solving this question. 
As noted before, the factorization of an interpreter into a compiler via the combined transformer/fusion law is a restricted form of partial evaluation. That is, optimizations like constant propagation, procedure inlining, loop unrolling and dead code elimination are not automatically performed by the resulting compiler. Not surprisingly, many of the binding-time improvement transformations used to make programs more suited for partial evaluation (Jones, Gomard and Sestoft, 1993, Chapter 12) can be recognized in this paper as well, for example conversion into continuation passing style, eta conversion (Danvy et al., 1994) and of course improvements from free theorems (Holst and Hughes, 1990). For further details about work in semantics directed compiler generation, the reader is referred to Schmidt (1986); for a more fundamental study consult Tofte (1990).

9 Future work

The crucial factor that makes it possible to refine an inefficient interpreter into an efficient one is finding a homomorphic embedding between the semantic domains of the two interpreters. For first-order languages this is usually no problem, but things get notoriously hard when domains themselves become recursive, especially when function-spaces are involved (Sethi and Tang, 1980; Stoy, 1977; Henson, 1987; Reynolds, 1974). This effectively means that our techniques do not apply directly to interpreters for functional languages (Lester, 1988), and thus we cannot reach our original goal of bringing some structure into the wealth of compilation techniques for functional languages by calculating each one of them from the generic meta-interpreter for the λ-calculus (Meijer and Paterson, 1991). Recent work on recursive types involving function-spaces (Freyd, 1990; Pitts, 1993) seems likely to provide the extra theory needed to solve this problem.

When a technique works for toy examples, this does not automatically mean that it scales up to real problems as well. To see whether our method works for real languages, we are in the process of constructing a compiler for the Oberon language (Wirth, 1988) using the techniques of this paper. The PacSoft project at OGI (PacSoft, 1994) has already shown that algebraic programming is feasible for general purpose programming.

Acknowledgements

I am very grateful for the constructive criticism of the JFP referees, in particular referee "D", which led me to expose the algebraic foundations of this work more clearly. Maarten Fokkinga, Graham Hutton, Johan Jeuring, and Jeff Lewis pointed out several errors and obscurities in the endless stream of drafts. Finally I would like to thank Phil Wadler for encouraging me to write and revise this paper.

References

Appel, A. 1984. Semantics-Directed Code Generation. In POPL'84, pp. 315–324.
Burstall, R.M. and Landin, P.J. 1969. Programs and their Proofs: an Algebraic Approach. Machine Intelligence 4, Edinburgh University Press.
Chirica, L.M. 1976. Contributions to Compiler Correctness. PhD thesis, University of California at LA, USA.
Clinger, W. 1984. The Scheme 311 Compiler: An Exercise in Denotational Semantics. In Lisp&FP'84, pp. 356–364.
Danvy, O., Malmkjær, K. and Palsberg, J. 1994. The Essence of Eta-Expansion in Partial Evaluation. In PEPM'94, pp. 11–20. Technical Report 94/9, Department of Computer Science, The University of Melbourne, Australia.
Dybjer, P. 1985. Using Domain Algebras to Prove the Correctness of a Compiler. LNCS 182, pp. 98–108.
Fokkinga, M. 1992. Law and Order in Algorithmics. PhD thesis, University of Twente, Enschede, The Netherlands.
Freyd, P. 1990. Recursive Types reduced to Inductive Types. In Proc. LICS'90.
Friedman, D.P., Wand, M. and Haynes, C.T. 1992. Essentials of Programming Languages. The MIT Press.
Goguen, J.C., Thatcher, J.W., Wagner, E.G. and Wright, J.B. 1977. Initial Algebra Semantics and Continuous Algebras. JACM, 24 (1): 68–95.
Henson, M.C. 1987. Elements of Functional Languages. Blackwell Scientific Publications.
Holst, C.K. and Hughes, J. 1990. Towards Binding-Time Improvement For Free. In S.L. Peyton Jones, G. Hutton and C.K. Holst (editors), Functional Programming, Glasgow 1990, Springer Workshops in Computing, pp. 83–100.


Jones, M.P. 1993. A System of Constructor Classes: Overloading and Implicit Higher-Order Polymorphism. In Proceedings FPCA'93, pp. 52–61.
Jones, M.P. 1994. Gofer 2.30 release notes. At http://www.cs.yale.edu/HTML/YALE/CS/haskell/yale-fp.html
Jones, N.D. (editor) 1980. Semantics-Directed Compiler Generation. LNCS 94.
Jones, N.D., Gomard, C.K. and Sestoft, P. 1993. Partial Evaluation and Automatic Program Generation. Prentice Hall.
Jones, N.D. and Schmidt, D.A. 1980. Compiler Generation From Denotational Semantics. In (Jones, 1980), pp. 70–93.
Kelsey, R. and Hudak, P. 1989. Realistic Compilation by Program Transformation. In POPL'89, pp. 281–292.
Krantz, D., Kelsey, R., Rees, J., Hudak, P., Philbin, J. and Adams, N. 1986. ORBIT: An Optimizing Compiler For Scheme. In SIGPLAN'86 Symposium on Compiler Construction, pp. 219–233.
Lee, P. 1990. Realistic Compiler Generation. The MIT Press.
Lester, D. 1988. Combinator Graph Reduction: A Congruence and its Applications. D.Phil. Thesis, report PRG-73, Oxford.
MacLennan, B.J. 1990. Functional Programming: Practice and Theory. Addison-Wesley.
McCarthy, J. and Painter, J. 1967. Correctness of a Compiler for Arithmetic Expressions. In Symposium on Applied Mathematics 19.
Meijer, E. 1992. Calculating Compilers. PhD thesis, University of Nijmegen, The Netherlands.
Meijer, E. 1994. Acid Rain: Deforestation for Free. Utrecht University, The Netherlands.
Meijer, E. and Paterson, R. 1991. Down with Lambda Lifting. Utrecht University, The Netherlands.
Meijer, E., Fokkinga, M. and Paterson, R. 1991. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. In John Hughes (editor), Proceedings FPCA'91, LNCS 523, pp. 124–144.
Moggi, E. 1989. Computational lambda-calculus and monads. In Proc. LICS'89.
Morris, F.L. 1973. Advice on Structuring Compilers and Proving them Correct. In POPL'73, pp. 144–152.
Mosses, P. 1980. A Constructive Approach to Compiler Correctness. In (Jones, 1980), pp. 189–210.
Mosses, P. 1983. Abstract Semantic Algebras! In D. Bjørner (editor), Formal Description of Programming Concepts II, pp. 63–88, North-Holland.
PacSoft. 1994. The PacSoft papers directory: http://www.cse.ogi.edu/PacSoft/
Pagan, F.G. 1988. Converting Interpreters into Compilers. SP&E, 18 (6): 509–527.
Pettersson, M. 1990. Generating Efficient Code From Continuation Semantics. LNCS 477, pp. 165–178.
Pitts, A.M. 1993. Relational properties of recursively defined domains. In Proc. LICS'93.
Reynolds, J.C. 1974. On the relation between direct and continuation semantics. LNCS 19, pp. 157–168.
Schmidt, D. 1986. Denotational Semantics. Allyn and Bacon.
Sethi, R. 1984. Control Flow Aspects of Semantics Directed Compiler Generation. ACM TOPLAS, 5 (4): 554–595.
Sethi, R. and Tang, A. 1980. Constructing Call-by-Value Continuation Semantics. JACM, 27 (3): 580–597.
Stoy, J.E. 1977. Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. The MIT Press.
Thatcher, J.W., Wagner, E.C. and Wright, J.B. 1980. Advice on Structuring Compilers and Proving them Correct. In (Jones, 1980), pp. 65–188.
Tofte, M. 1990. Compiler Generators. EATCS Monographs on Theoretical Computer Science, Springer-Verlag.


Wadler, P. 1989. Theorems For Free! In FPCA'89.
Wadler, P. 1992. Monads for functional programming. In M. Broy (editor), Program Design Calculi, Proceedings of the Marktoberdorf Summer School, 30 July–8 August 1992.
Wand, M. 1991. Correctness of Procedure Representations in Higher-Order Assembly Language. LNCS 598, pp. 294–311.
Watt, D.A. 1986. Executable Semantic Descriptions. SP&E, 16 (1): 13–43.
Wirth, N. 1988. From Modula to Oberon. SP&E, 18 (7): 671–690.

A Actual Gofer code

For those who like to play, we collect in this appendix all the code needed to try out the interpreters and compilers developed in the paper.

A.1 Auxiliary functions

First of all, the definitions for recursive types and catamorphisms. The Functor class is part of the standard constructor class prelude.

data Rec f = In (f (Rec f))

cata :: Functor f => (f a -> a) -> (Rec f -> a)
cata phi (In x) = phi (map (cata phi) x)
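As a quick illustration (our own, not from the paper), the Rec/cata machinery can be exercised on a toy functor of Peano naturals; in present-day Haskell the overloaded map of Gofer's constructor classes is written fmap, so the definitions read:

```haskell
-- Modern-Haskell rendering of the appendix's Rec and cata (Gofer's
-- overloaded `map` is `fmap` today); NatF is our own toy functor.
newtype Rec f = In (f (Rec f))

cata :: Functor f => (f a -> a) -> Rec f -> a
cata phi (In x) = phi (fmap (cata phi) x)

-- Peano naturals as the least fixed point of NatF.
data NatF n = Zero | Succ n

instance Functor NatF where
  fmap _ Zero     = Zero
  fmap f (Succ n) = Succ (f n)

-- Folding a natural down to an Int is a one-line catamorphism.
toInt :: Rec NatF -> Int
toInt = cata phi
  where phi Zero     = 0
        phi (Succ n) = n + 1

three :: Rec NatF
three = In (Succ (In (Succ (In (Succ (In Zero))))))
```

Here toInt three yields 3; every structurally recursive function on Rec NatF arises in this way by varying the algebra phi.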

The parsing monad and state monad are among the standard examples of monads in functional programming. Restricted type synonyms avoid some subtle strictness issues that arise from using data definitions. By making Parser an instance of various classes of monads, we can take full advantage of monad comprehension notation.

type Parser a = String -> [(a,String)]
  in mapP, resultP, bindP, zeroP, orelse, item, parse

resultP :: a -> Parser a
resultP a i = [(a,i)]

bindP :: Parser a -> (a -> Parser b) -> Parser b
(p `bindP` f) i = [ (b,i'') | (a,i') <- p i, (b,i'') <- f a i' ]

A.2 Direct semantics

The functor E of arithmetic expressions, the parser for expressions, and the direct interpreter with its two compiler variants.

data E e = Num Int | Add e e

instance Functor E where
  map f = \e -> case e of
                  Num n    -> Num n
                  Add e e' -> Add (f e) (f e')

number :: Parser Int
number = [ord n - ord '0' | n <- ...]

expr :: (E a -> a) -> Parser a
expr f = [f (Add e e') | _ <- ...]

eval :: Functor E => Rec E -> Int
eval = cata (\e -> case e of Num n -> n; Add e e' -> e+e')

compile :: Functor E => String -> String
compile = cata (\e -> case e of Num n -> show n; Add e e' -> e++"+"++e')
        . cata In
        . parse (expr In)

compile' :: Functor E => String -> String
compile' = parse (expr (\e -> case e of Num n -> show n; Add e e' -> e++"+"++e'))
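Since parts of the original listing were lost in transcription, the following self-contained sketch (modern Haskell, with bindP spelled out instead of monad comprehensions; the concrete grammar of digits separated by '+' is our assumption) shows the whole direct pipeline end to end:

```haskell
import Data.Char (isDigit, ord)

-- List-of-successes parsers in the style of the appendix.
type Parser a = String -> [(a, String)]

resultP :: a -> Parser a
resultP a i = [(a, i)]

bindP :: Parser a -> (a -> Parser b) -> Parser b
(p `bindP` f) i = [ (b, i'') | (a, i') <- p i, (b, i'') <- f a i' ]

item :: Parser Char
item []     = []
item (c:cs) = [(c, cs)]

sat :: (Char -> Bool) -> Parser Char
sat q = item `bindP` \c -> if q c then resultP c else const []

number :: Parser Int
number = sat isDigit `bindP` \c -> resultP (ord c - ord '0')

-- Parse "d+d+...+d", folding with the supplied algebra on the fly, so the
-- same parser yields a value or code depending on its instantiation.
expr :: (Int -> a) -> (a -> a -> a) -> Parser a
expr num add = number `bindP` \n -> rest (num n)
  where
    rest e ('+':i) = (number `bindP` \n -> rest (add e (num n))) i
    rest e i       = [(e, i)]

parse :: Parser a -> String -> a
parse p i = head [ a | (a, "") <- p i ]

evalStr :: String -> Int
evalStr = parse (expr id (+))

compileStr :: String -> String
compileStr = parse (expr show (\c c' -> c ++ "+" ++ c'))
```

For example evalStr "1+2+3" evaluates the expression while compileStr "1+2+3" reproduces its text, mirroring the eval/compile pair above.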


A.3 Stack-based semantics

The semantic actions for the stack-based interpreter push, add, seq, and skip have signature S. There is no need to prime the constructors and operations to distinguish them from the constructors and operations of the direct interpreter.

data S s = PUSH Int | ADD | SEQ s s | SKIP | IF s s

instance Functor S where
  map f = \s -> case s of
                  PUSH n   -> PUSH n
                  ADD      -> ADD
                  SEQ s s' -> SEQ (f s) (f s')
                  SKIP     -> SKIP
                  IF s s'  -> IF (f s) (f s')

push n ns       = n:ns
add (n':n:ns)   = n+n':ns
(s `seq` s') ns = s' (s ns)
skip ns         = ns

trans :: (S a -> a) -> (E a -> a)
trans stack = \e -> case e of
                      Num n    -> push n
                      Add s s' -> s `seq` (s' `seq` add)
  where push n     = stack (PUSH n)
        add        = stack ADD
        s `seq` s' = stack (s `SEQ` s')
        skip       = stack SKIP

The stack-based interpreter, and the two variants of the corresponding compiler, are built using the above transformer trans.

eval :: (Functor E, Functor S) => Rec E -> Int
eval e = head (cata (trans (\s -> case s of
                                    PUSH n   -> push n
                                    ADD      -> add
                                    SEQ s s' -> s `seq` s'
                                    SKIP     -> skip)) e [])

compile :: (Functor E, Functor S) => String -> String
compile = cata (\s -> case s of
                        PUSH n   -> "push("++ show n++");"
                        ADD      -> "add;"
                        SEQ s s' -> s ++ s')
        . cata (trans In)
        . parse (expr In)


compile' :: (Functor E, Functor S) => String -> String
compile' = parse (expr (trans (\s -> case s of
                                       PUSH n   -> "push("++ show n ++");"
                                       ADD      -> "add;"
                                       SEQ s s' -> s ++ s')))
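As a small sanity check (our own, not in the paper), the correctness statement behind this section — running the stack code of an expression on the empty stack leaves exactly its direct value on top — can be tested on concrete expressions:

```haskell
-- Direct semantics versus stack semantics for the expression language;
-- `stack` composes the actions left-to-right, as seq does in A.3.
data Expr = Num Int | Add Expr Expr

direct :: Expr -> Int
direct (Num n)    = n
direct (Add e e') = direct e + direct e'

push :: Int -> [Int] -> [Int]
push n ns = n : ns

add :: [Int] -> [Int]
add (n' : n : ns) = (n + n') : ns
add _             = error "stack underflow"

-- Num pushes its value; Add runs both operands and then adds.
stack :: Expr -> ([Int] -> [Int])
stack (Num n)    = push n
stack (Add e e') = add . stack e' . stack e

-- The agreement that the transformer/fusion argument establishes in general.
agrees :: Expr -> Bool
agrees e = stack e [] == [direct e]
```

For instance agrees (Add (Num 1) (Add (Num 2) (Num 3))) holds; the paper establishes this for all expressions by fusion rather than by testing.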

A.4 Continuation-passing semantics

For the continuation passing interpreter we extend functor E with an extra alternative and modify the parser for expressions to accept conditional expressions of the form ( expr ? expr : expr ).

data E e = Num Int | Add e e | If e e e

expr :: (E a -> a) -> Parser a
expr f = ... as before ...
      ++ [ f (If e e' e'') | _ <- ... ]

...

eval e = head ((cata (trans (\s -> case s of
                                     PUSH n s -> push n s
                                     ADD s    -> add s
                                     SKIP     -> skip
                                     IF s s'  -> gofalse s s')) e skip `startingWith` 0) [])

compile :: String -> String
compile e = cata (\s -> case s of
                          Just (PUSH n s) -> "push("++show n++");"++s
                          Just (ADD s)    -> "add"++";"++s
                          Just SKIP       -> "skip;"
                          Just (IF s s')  -> "if(pop){"++s++"}"++s'
                          LABEL l s       -> l++":"++s
                          REF l s         -> "goto "++l++";")
             (cata (trans In) (parse (expr In) e) (In (Just SKIP)) `startingWith` 0)

Fusing the compiler is just a matter of cut and paste.
