∀ni=0 . (Ti ≈ Ti′)    F ≈e F′
------------------------------------------- [→-≈]
(-> (T ni=1) F T0) ≈ (-> (T′ ni=1) F′ T0′)

All other type-equivalence rules for FLARE/E are straightforward.

Figure 16.1   FLARE/E types, effects, and regions.
The dup! procedure takes a cell c containing a list, modifies it to contain a list with the first element duplicated, and returns the new list. For maximum utility, the dup! procedure should have a polymorphic type that abstracts over (1) the type ?t of the elements in the list in the cell c and (2) the region ?r of the cell c. Here is a type schema for dup! with the desired degree of polymorphism:

(generic (?t ?r)                       {?t is type of list elements; ?r is region of cell}
    (-> ((cellof (listof ?t) ?r))      {type of the argument c}
        (maxeff (read ?r) (write ?r))  {latent effect of dup!}
        (listof ?t)))                  {type of result of dup!}
The apply-twice procedure is polymorphic in the input type of f, the output type of f, and the latent effect of f:

(generic (?t1 ?t2 ?e)        {?t1 is input type of f; ?t2 is output type of f; ?e is latent effect of f}
    (-> ((-> (?t1) ?e ?t2)   {type of f}
         ?t1)                {type of x}
        ?e                   {latent effect of apply-twice, inherited from f}
        ?t2))                {type of result of apply-twice}
In this case, the latent effect ?e of the argument procedure f is inherited by apply-twice. If we assume that the bools cell is allocated in region r4 and the ints cell is allocated in region r5, then we have the following instantiations for the generic-bound variables in the two applications of apply-twice:

Variable  (apply-twice dup! bools)        (apply-twice dup! ints)
?t        bool                            int
?r        r4                              r5
?t1       (cellof (listof bool) r4)       (cellof (listof int) r5)
?t2       (listof bool)                   (listof int)
?e        (maxeff (read r4) (write r4))   (maxeff (read r5) (write r5))
So the type and effect of Etwice are:

Etwice : (pairof (listof bool) (listof int))
         ! (maxeff (init r4) (read r4) (write r4) (init r5) (read r5) (write r5))
Types, effects, and regions together are descriptions — they describe program expressions. We saw descriptions earlier, in Section 12.3.1, where they were used to specify the structure of type constructors. Here, effects and regions are new kinds of descriptions for describing program behavior. In FLARE/E, a description identifier δ can name any description and supersedes FLARE's type identifier τ, which can name only types. This allows us to treat descriptions uniformly in type schemas (as illustrated above) and allows us to define notations for substitution and unification uniformly with types, effects, and regions. This uniformity simplifies our presentation, but can lead to ill-formed descriptions (e.g., a type appearing in a position where an effect is expected, or vice versa). Such ill-formed descriptions can be avoided by using a simple kind system as discussed in Section 12.3.2 (see Exercise 16.3).
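To make the kind discipline concrete, here is a small Python sketch (an illustration for this discussion, not code from the book) that represents descriptions as tagged tuples and checks each use of a description variable against a declared kind, in the spirit of Exercise 16.3:

# A minimal kind checker for descriptions (illustrative sketch). Descriptions
# are tagged tuples; kind_env maps description variables such as "?t" to
# "type", "effect", or "region".

def kind_of(desc, kind_env):
    """Return the kind of desc, or raise ValueError if it is ill formed."""
    if isinstance(desc, str):
        if desc.startswith("?"):
            return kind_env[desc]              # declared kind of a variable
        if desc in ("unit", "int", "bool", "symb"):
            return "type"
        return "region"                        # region constants such as "r1"
    tag = desc[0]
    if tag == "cellof":                        # (cellof T R)
        _, t, r = desc
        if kind_of(t, kind_env) != "type" or kind_of(r, kind_env) != "region":
            raise ValueError("ill-formed cellof")
        return "type"
    if tag == "->":                            # (-> (T*) F T0)
        _, args, latent, result = desc
        if (any(kind_of(a, kind_env) != "type" for a in args)
                or kind_of(latent, kind_env) != "effect"
                or kind_of(result, kind_env) != "type"):
            raise ValueError("ill-formed arrow type")
        return "type"
    if tag in ("init", "read", "write"):       # base effects mention a region
        if kind_of(desc[1], kind_env) != "region":
            raise ValueError("ill-formed base effect")
        return "effect"
    if tag == "maxeff":
        if any(kind_of(f, kind_env) != "effect" for f in desc[1:]):
            raise ValueError("ill-formed maxeff")
        return "effect"
    raise ValueError(f"unknown description {desc!r}")

# Well kinded: (-> ((cellof ?t ?r)) (read ?r) ?t)
kind_of(("->", (("cellof", "?t", "?r"),), ("read", "?r"), "?t"),
        {"?t": "type", "?r": "region"})

A checker like this rejects, for example, (cellof ?e ?d) when ?e is declared a region and ?d a type, which is exactly the kind of ill-formed description the text warns about.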
16.2.2 Type and Effect Rules
An effect system is a set of rules for assigning effects to program expressions. Figures 16.2 and 16.3 present a type and effect system that assigns both a type and an effect to every FLARE/E expression. The system is based on type/effect judgments of the form

TE ⊢ E : T ! F
This is pronounced "expression E has type T and effect F in type environment TE." As in FLARE, the type environments in this system map identifiers to type schemas. The type/effect rules in Figure 16.3 are similar to the type rules for the full FLARE language presented in Figures 13.3 (page 775), 13.19 (page 805), and 13.24 (page 815), except that they determine the effects of expressions in addition to their types. Literals, variable references, errors, and abstractions are all pure because their evaluation does not touch the store and so can have no store effect. Variable references would not be pure if FLARE/E included mutable variables (set!). The [genvar] rule allows substitution of arbitrary descriptions (types, effects, regions) for the generic-bound description variables in a type schema. Substituting the wrong kind of description (e.g., substituting a type for a description variable used as an effect) would lead to an ill-formed type expression. But this is not problematic, because descriptions in the [genvar] rule must be "guessed" correctly to show that an expression is well typed. The formal parameters of generic can be annotated with kind information to guarantee that all types resulting from this substitution are well formed (see Exercise 16.3). The rules for all compound expressions (except abstractions) use maxeff to combine the effects of all subexpressions and include them in the effect of the whole expression. The [→-intro] and [→-elim] rules communicate effect information from the point of procedure definition to the point of procedure application. The [→-intro] rule includes the effect of an abstraction's body as the latent effect
Domains
TE ∈ TypeEnvironment = Ident ⇀ TypeSchema
Other type and effect domains are defined in Figure 16.1.

Type Functions
egen : Type → TypeEnvironment → TypeSchema
(egen T TE) = (generic (δ ni=1) T),
    where {δ1, ..., δn} = FrDescIds_ty[[T]] − (FrDescIds_tyenv TE)

egenPureSP : Type → TypeEnvironment → Exp → TypeSchema
(egenPureSP Tdefn TE E) = (egen Tdefn TE), if pure E, where pure is defined in Figure 13.25 on page 816
                        = Tdefn, otherwise

Figure 16.2   Type/effect rules for FLARE/E, Part 1.
in the procedure type of the abstraction, and the [→-elim] rule includes the latent effect of a procedure type in the effect of a procedure application. Latent effects are also propagated by the [prim] rule to handle the fact that the types of cell operators must now carry nontrivial latent effects. Operator types in the primitive type environment TEprim must now carry latent effects, which are pure except for the cell operators. For example:

cell : (generic (?t ?r) (-> (?t) (init ?r) (cellof ?t ?r)))
^    : (generic (?t ?r) (-> ((cellof ?t ?r)) (read ?r) ?t))
:=   : (generic (?t ?r) (-> ((cellof ?t ?r) ?t) (write ?r) unit))
+    : (-> (int int) pure int)
cons : (generic (?t) (-> (?t (listof ?t)) pure (listof ?t)))
The [letSP] and [letrecSP] rules for FLARE/E are similar to the [letLP] and [letrecLP] rules for FLARE (Figure 13.24 on page 815). One difference is that egenPureSP is defined in terms of egen, which generalizes over all free description variables (not just type variables) in the type T that do not appear in the type environment TE. We assume that the FrDescIds_ty function returns the free type, effect, and region variables in a type and the FrDescIds_tyenv function returns all of the free type, effect, and region variables in a type environment. The definitions of these functions are left as an exercise (Exercise 16.2). The SP subscript, which stands for "syntactic purity," emphasizes that these rules and functions use the same syntactic test for expression purity that is used in FLARE. This seems crazy — why not use the effect system itself to determine
Type/Effect Rules

TE ⊢ #u : unit ! pure   [unit]
TE ⊢ N : int ! pure   [int]
TE ⊢ B : bool ! pure   [bool]
TE ⊢ (sym Y) : symb ! pure   [symb]

TE ⊢ (error Y) : T ! pure   [error]

TE ⊢ I : T ! pure   [var]
  where TE(I) = T

TE ⊢ I : ([Di/δi]ni=1)Tbody ! pure   [genvar]
  where TE(I) = (generic (δ ni=1) Tbody)

TE ⊢ Etest : bool ! Ftest   TE ⊢ Ethen : T ! Fthen   TE ⊢ Eelse : T ! Felse
---------------------------------------------------------------------------- [if]
TE ⊢ (if Etest Ethen Eelse) : T ! (maxeff Ftest Fthen Felse)

TE[Ii : Ti]ni=1 ⊢ Ebody : Tbody ! Fbody
------------------------------------------------------------ [→-intro]
TE ⊢ (abs (I ni=1) Ebody) : (-> (T ni=1) Fbody Tbody) ! pure

TE ⊢ E0 : (-> (T ni=1) Flatent Tres) ! F0   ∀ni=1 . (TE ⊢ Ei : Ti ! Fi)
------------------------------------------------------------------------ [→-elim]
TE ⊢ (E0 E ni=1) : Tres ! (maxeff Flatent F ni=0)

TEprim ⊢ O : (-> (T ni=1) Flatent Tres) ! pure   ∀ni=1 . (TE ⊢ Ei : Ti ! Fi)
----------------------------------------------------------------------------- [prim]
TE ⊢ (prim O E ni=1) : Tres ! (maxeff Flatent F ni=1)

∀ni=1 . (TE ⊢ Ei : Ti ! Fi)
TE[Ii : (egenPureSP Ti TE Ei)]ni=1 ⊢ E0 : T0 ! F0
--------------------------------------------------- [letSP]
TE ⊢ (let ((Ii Ei)ni=1) E0) : T0 ! (maxeff F ni=0)

∀ni=1 . (TE[Ij : Tj]nj=1 ⊢ Ei : Ti ! Fi)
TE[Ii : (egenPureSP Ti TE Ei)]ni=1 ⊢ E0 : T0 ! F0
------------------------------------------------------ [letrecSP]
TE ⊢ (letrec ((Ii Ei)ni=1) E0) : T0 ! (maxeff F ni=0)

TE ⊢ E : T ! F
----------------- [does]
TE ⊢ E : T ! F′
  where F ⊑e F′

{Ii : Ti}ni=1 ⊢ Ebody : Tbody ! Fbody
------------------------------------------------------------------ [prog]
⊢prog (flarek (I ni=1) Ebody) : (-> (T ni=1) Fbody Tbody) ! pure

Figure 16.3   Type/effect rules for FLARE/E, Part 2.
purity? The reason is that an effect-based test for purity complicates the reconstruction of types and effects and the relationship between FLARE/E and FLARE. This is explored in more detail in Section 16.2.5. Because our use of effect equivalence and type equivalence in FLARE/E type derivations is implicit, the type and effect system does not include an explicit type
rule for type equivalence (e.g., the [type-≈] rule in Figure 11.20 on page 680). For example, consider the following FLARE/E type/effect derivation:

  ...
TE ⊢ E1 : (-> (int) (maxeff (read r1) (write r2)) bool) ! pure
  ...
TE ⊢ E2 : int ! (maxeff (read r1) (read r2))
--------------------------------------------------------------- [→-elim]
TE ⊢ (E1 E2) : bool ! (maxeff (read r1) (read r2) (write r2))
This is valid because the effect

(maxeff (maxeff (read r1) (write r2)) pure (maxeff (read r1) (read r2)))

specified by the [→-elim] rule can be simplified to the following effect using implicit effect equivalence:

(maxeff (read r1) (read r2) (write r2))
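The implicit effect equivalence being used here is just the ACUI theory of maxeff: maxeff is associative, commutative, and idempotent, with pure as its unit. A small Python sketch (illustrative helper names, not part of FLARE/E) makes the simplification mechanical by normalizing any effect to its set of base effects:

# A sketch of effect normalization under the ACUI theory of maxeff: flatten
# nested maxeffs, drop pure (the unit), and deduplicate (idempotence).
# Effects are s-expression-like tuples: "pure", ("read", "r1"),
# or ("maxeff", F1, F2, ...).

def base_effects(effect):
    """Return the set of base effects denoted by an effect expression."""
    if effect == "pure":
        return frozenset()
    if isinstance(effect, tuple) and effect[0] == "maxeff":
        return frozenset().union(*(base_effects(f) for f in effect[1:]))
    return frozenset([effect])            # a base effect like ("read", "r1")

def normalize(effect):
    """Rebuild a canonical (maxeff ...) from the base-effect set."""
    bases = sorted(base_effects(effect))
    if not bases:
        return "pure"
    return bases[0] if len(bases) == 1 else ("maxeff", *bases)

F = ("maxeff",
     ("maxeff", ("read", "r1"), ("write", "r2")),
     "pure",
     ("maxeff", ("read", "r1"), ("read", "r2")))
assert base_effects(F) == {("read", "r1"), ("read", "r2"), ("write", "r2")}
# normalize(F) => ("maxeff", ("read", "r1"), ("read", "r2"), ("write", "r2"))

Under this view, two effects are equivalent (≈e) exactly when they normalize to the same base-effect set, and one effect is a subeffect of another when its base-effect set is a subset of the other's.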
The effect of an expression determined by our type and effect system is a conservative approximation of the actions performed by the expression at run time. Combining the effects of subexpressions with maxeff can lead to effects that overestimate the actual actions performed. For example, suppose that Epure has effect pure, Eread has effect (read r6), and Ewrite has effect (write r6). Then (if #t Epure Eread) has effect (read r6) even though it does not touch the store at run time, and the conditional (if b Eread Ewrite) has effect (maxeff (read r6) (write r6)) even though only one of its branches is taken at run time. It is possible to inflate the effect of an expression via the [does] rule, which allows an expression with effect F to be given a larger effect F′ as long as F is a subeffect of F′. In order to derive a type and effect for an expression, it is sometimes necessary to use the [does] rule to get the latent effects embedded in two procedure types to be the same. Consider the expression

Eifproc = (if b (abs () Eread) (abs () Ewrite))
relative to a type environment TE in which b has type bool, Eread has type T and effect (read r6), and Ewrite has type T and effect (write r6). Without using the [does] rule, we can show:

TE ⊢ (abs () Eread) : (-> () (read r6) T) ! pure
TE ⊢ (abs () Ewrite) : (-> () (write r6) T) ! pure
The [if] rule requires that the types of the two branch expressions be the same, but in this case the procedure types are not the same because their effects differ. To show that Eifproc is well typed, it is necessary to use the [does] rule to give the effect Frw = (maxeff (read r6) (write r6)) to the bodies of both procedures, resulting in the following derivation:

TE ⊢ b : bool ! pure   [var]
  ...
TE ⊢ Eread : T ! (read r6)
TE ⊢ Eread : T ! Frw   [does]
TE ⊢ (abs () Eread) : (-> () Frw T) ! pure   [→-intro]
  ...
TE ⊢ Ewrite : T ! (write r6)
TE ⊢ Ewrite : T ! Frw   [does]
TE ⊢ (abs () Ewrite) : (-> () Frw T) ! pure   [→-intro]
TE ⊢ (if b (abs () Eread) (abs () Ewrite)) : (-> () Frw T) ! pure   [if]
In the above example, the [does] rule is used to artificially inflate the effects of procedure bodies before forming procedure types so that the procedure types (and, specifically, their latent effect components) will be identical elsewhere in the type derivation. This is the key way in which the [does] rule is used in practice. The FLARE/E type system does not support any form of subtyping, so there is no direct way to show that a procedure with type (-> (int) (maxeff (read r) (write r)) int) can be used in place of one with type (-> (int) (maxeff (init r) (read r) (write r)) int). However, as illustrated above, the [does] rule can be used in conjunction with the [→-intro] rule to inflate the base effects in the latent effect of a procedure when it is created. The [does] rule permits an expression to take on many possible effects. We shall see below (on page 962 in Section 16.2.3) that there is a well-defined, indeed practically computable, notion of least effect. Henceforth, when we refer to the effect of an expression, we mean the smallest effect that can be proven by our rules. When we discuss effect reconstruction (Section 16.2.3), we will show how to automatically calculate the smallest effect allowed by the rules. How is the FLARE/E type and effect system related to the FLARE type system studied earlier? It has exactly the same typing power as FLARE — a
program is typable in FLARE if and only if it is typable in FLARE/E. This relationship is a consequence of the following theorem, which uses the notations ⌊T⌋eT and ⌊TE⌋eTE (see Exercise 16.4) to stand for the result of erasing effect and region information from the FLARE/E type T and FLARE/E type environment TE:

Theorem 16.1   TE′ ⊢ E : T′ in the FLARE type system if and only if there exist a FLARE/E type environment TE, a FLARE/E type T, and an effect F such that ⌊TE⌋eTE = TE′, ⌊T⌋eT = T′, and TE ⊢ E : T ! F in the FLARE/E type/effect system.

Proving that TE ⊢ E : T ! F implies ⌊TE⌋eTE ⊢ E : ⌊T⌋eT is easily done by showing that erasing all effect information in the FLARE/E type/effect derivation yields a FLARE type derivation (see Exercise 16.5). The other direction (⌊TE⌋eTE ⊢ E : ⌊T⌋eT implies TE ⊢ E : T ! F) is proven by showing that the judgments and procedure types in a FLARE type derivation can always be extended with effect information to yield a FLARE/E type/effect derivation (Exercise 16.6).

Exercise 16.1 Consider the following program:

(flarek (b)
  (let ((c (prim cell 2)))
    (let ((one (abs (x) 1))
          (get (abs (y) (prim ^ y)))
          (setc! (abs (z) (let ((_ (prim := c z))) z))))
      ((abs (appc)
         ((if b setc! one)
          (prim + (appc get) (appc one))))
       (abs (f) (f c))))))
a. Give a type derivation showing that the above program is well typed in the FLARE/E type system. You will need to use the [does] rule to inflate some latent effects in procedure types, but use the minimal latent effect possible.

b. How would your answer to part a change if the subexpression ((abs (appc) . . . ) (abs (f) (f c)))
were changed to (let ((appc (abs (f) (f c)))) . . . )?
Exercise 16.2 Define the following functions for determining the free description identifiers of various domains:

FrDescIds_reg : Region → P(DescId)
FrDescIds_eff : Effect → P(DescId)
FrDescIds_ty : Type → P(DescId)
FrDescIds_tysch : TypeSchema → P(DescId)
FrDescIds_tyenv : TypeEnvironment → P(DescId)
Exercise 16.3 Intuitively, each description identifier that is a formal parameter in a FLARE/E generic expression denotes one of a type, an effect, or a region. For example, in the type schema

(generic (?a ?b ?c)
    (-> ((-> (?a) ?b ?a) (cellof ?a ?c))
        (maxeff ?b (read ?c) (write ?c))
        ?a))
?a denotes a type, ?b denotes an effect, and ?c denotes a region. This intuition can be formalized using a simple kind system (cf. Section 12.3.2) based on the following domains:

K ∈ Kind ::= type | effect | region
DK ∈ DescIdKind ::= (δ K)
TS ∈ TypeSchema ::= T | (generic (DK*) T)
The TypeSchema domain has been changed so that every formal parameter declared by generic has an explicit kind. In the modified system, the example type schema above would be rewritten to have the form (generic ((?a type) (?b effect) (?c region)) . . . )
We say that a type schema with explicitly kinded parameters is well kinded if each reference to the parameter in the body of the type schema is consistent with its kind. For example, the type schema above (with explicit kinds) is well kinded. However, the schema (generic ((?d type) (?e region)) (-> (?d) pure (cellof ?e ?d)))
is not well kinded because region ?e is used as a type and the second occurrence of type ?d is used as a region.

a. Develop a formal deduction system for determining the well-kindedness of a type schema with explicitly kinded parameters.

b. Define variants of each of the functions in Exercise 16.2 that return an element of P(DescIdKind) (in which each description identifier is paired with its kind) rather than an element of P(DescId).

c. Modify the definition of the egen function in Figure 16.2 to use the functions from part b to return a type schema with explicitly kinded parameters. Under what simple conditions is the type schema guaranteed to be well kinded? Explain.

d. Modify the [genvar] rule to guarantee that only descriptions of the appropriate kind are substituted for generic-bound description parameters in the body of the type schema. Argue that the type resulting from these substitutions is always well formed.
Exercise 16.4 Define the following effect-erasure functions for FLARE/E types, type schemas, and type environments:

effectErase_ty : Type_FLARE/E → Type_FLARE
effectErase_tysch : TypeSchema_FLARE/E → TypeSchema_FLARE
effectErase_tyenv : TypeEnvironment_FLARE/E → TypeEnvironment_FLARE

The notations ⌊T⌋eT, ⌊TS⌋eTS, and ⌊TE⌋eTE abbreviate (respectively) (effectErase_ty T), (effectErase_tysch TS), and (effectErase_tyenv TE). Each function should erase all effect and region information from the FLARE/E entity to yield the FLARE entity. In the definition of effectErase_tysch, it is helpful (but not absolutely necessary) to assume that it is possible to determine the kind of each generic parameter (see Exercise 16.3).

Exercise 16.5 The notion of effect erasure from Exercise 16.4 can be extended to type/effect judgments and type/effect derivations in FLARE/E as follows:

effectErase_judge : TypeJudgment_FLARE/E → TypeJudgment_FLARE
  The notation ⌊TJ⌋eTJ abbreviates (effectErase_judge TJ).
  ⌊TE ⊢FLARE/E E : T ! F⌋eTJ = (⌊TE⌋eTE ⊢FLARE E : ⌊T⌋eT)

effectErase_deriv : TypeDerivation_FLARE/E → TypeDerivation_FLARE
  The notation ⌊TD⌋eTD abbreviates (effectErase_deriv TD).
  For a derivation ending in an instance of the [does] rule,

  ⌊        TD
    ----------------------- [does] ⌋eTD = ⌊TD⌋eTD
    TE ⊢FLARE/E E : T ! F

  and for all other type derivations,

  ⌊ TD1 ... TDn ⌋         ⌊TD1⌋eTD ... ⌊TDn⌋eTD
    ------------   eTD  =  ----------------------
         TJ                        ⌊TJ⌋eTJ
Prove that effectErase_deriv is a well-defined function. That is, if TD is a FLARE/E type derivation, then ⌊TD⌋eTD is a legal FLARE type derivation according to the type rules of FLARE. Your proof should be by induction on the structure of a type derivation TD and by case analysis on the type rule used in the root node of the type derivation tree. The well-definedness of effectErase_deriv proves that TE ⊢FLARE/E E : T ! F implies ⌊TE⌋eTE ⊢FLARE E : ⌊T⌋eT in Theorem 16.1.

Exercise 16.6 This exercise sketches a proof of the forward direction of Theorem 16.1 (i.e., that any FLARE type derivation can be annotated with appropriate effect information to yield a FLARE/E type/effect derivation) and asks you to work out the details. A simple approach is to assume that all cells are allocated in a single region (call it δreg), in which case the maximal effect is

Fmax = (maxeff (init δreg) (read δreg) (write δreg))
Then any FLARE type derivation can be transformed to a FLARE/E type/effect derivation by:
• changing every FLARE cell type (cellof T) to the FLARE/E cell type (cellof T δreg);

• changing every nonprimitive FLARE arrow type (-> (T ni=1) T0) to the FLARE/E arrow type (-> (T ni=1) Fmax T0);

• using the [does] rule to inflate the effect of every procedure body to Fmax before the [→-intro] rule is applied; and

• introducing and propagating effects as required by the FLARE/E analogues of the FLARE type rules.

a. Based on the above sketch, formally define a transformation

𝒯𝒟 : TypeDerivation_FLARE → TypeDerivation_FLARE/E

that transforms a valid FLARE type derivation for a well-typed expression into a valid FLARE/E type/effect derivation for the same expression.

b. Suppose that TD is a FLARE type derivation for the type judgment TE ⊢ E : T. Then (𝒯𝒟 TD) is a FLARE/E type/effect derivation for the type/effect judgment TE′ ⊢ E : T′ ! F. Show that ⌊TE′⌋eTE = TE and ⌊T′⌋eT = T. This completes the proof of Theorem 16.1.
16.2.3 Reconstructing Types and Effects: Algorithm Z
Effect-Constraint Sets

We can adapt the FLARE type reconstruction algorithm (Algorithm R) from Section 13.3 to reconstruct effects as well as types. Recall that R has the signature

R : Exp → TypeEnvironment → (Type × TypeConstraintSet)

and is expressed via deductive-style rules involving judgments of the form

R[[E]] TE = ⟨T, TCS⟩

Elements TCS ∈ TypeConstraintSet are abstract sets of type-equality constraints that are collected and solved by the algorithm. The extended algorithm, which we call Algorithm Z (Figures 16.4, 16.6, 16.8, and 16.9), has the signature

Z : Exp → TypeEnvironment → (Type × TypeConstraintSet × Effect × EffectConstraintSet)

and is expressed via deductive-style rules involving judgments of the form

Z[[E]] TE = ⟨T, TCS, F, FCS⟩
Domains
FC ∈ EffectConstraint = DescId × Effect
  ; (>= δ F) stands for an element ⟨δ, F⟩ of EffectConstraint
FCS ∈ EffectConstraintSet = FC*
  ; Define dom(FCS) = {δ | (>= δ F) ∈ FCS}
σ ∈ DescSubst = DescId ⇀ Desc
us ∈ UnifySoln = DescSubst + Failure
Other type and effect domains are defined in Figure 16.1.

Functions
solveFCS : EffectConstraintSet → DescSubst
 = λFCS . fix_DescSubst (λσ . λδ_FCS . (σ (maxeff δ F ni=1)), where (>= δ Fi) ∈ FCS for each i ∈ [1..n])
   where ⊥DescSubst = λδ_FCS . pure
   and λδ_FCS . dbody stands for λδ . if δ ∈ dom(FCS) then dbody else undefined end

solveTCS : TypeConstraintSet → UnifySoln is defined as in Figure 13.12 on page 790, where it is assumed that unify is modified in a straightforward way to handle the unification of listof types, pairof types, cellof types, and -> types with latent effects (which are guaranteed to be effect variables). A successful unification now results in an element of DescSubst rather than TypeSubst because nontype description variables are encountered in the unification of cellof types (in which region variables are unified) and -> types (in which effect variables are unified).

Figure 16.4   Domains and functions for the FLARE/E type and effect reconstruction algorithm.
In addition to returning the type T and type-constraint set TCS of an expression E relative to a type environment TE, Algorithm Z returns:

1. The effect F of the expression.

2. A collection FCS ∈ EffectConstraintSet of effect inequality constraints having the form (>= δ F). Such a constraint means that the effect F′ denoted by the effect variable δ must be at least as large as F — i.e., F′ ⊒e F.

What are the effect inequality constraints for? As mentioned in the discussion of the [does] rule beginning on page 954, the most challenging problem encountered when constructing a type/effect derivation in FLARE/E is guaranteeing that the latent effects of procedure types are identical in all situations where the rules require that two procedure types be the same. The purpose of the effect-
constraint sets generated by Algorithm Z is to solve this problem. An effect constraint (>= δlat Fbody) is generated by Algorithm Z when a derivation in the implicit type/effect system would use the [does] rule to inflate the effect of a procedure body from Fbody to a larger effect F′body ⊒e Fbody in conjunction with an application of the [→-intro] rule:
TE[Ii : Ti]ni=1 ⊢ Ebody : Tbody ! Fbody
----------------------------------------- [does]
TE[Ii : Ti]ni=1 ⊢ Ebody : Tbody ! F′body
-------------------------------------------------------------- [→-intro]
TE ⊢ (abs (I ni=1) Ebody) : (-> (T ni=1) F′body Tbody) ! pure
The extent to which Fbody needs to be inflated by the [does] rule depends on how the procedure type introduced by the [→-intro] rule flows through the rest of the type/effect derivation and is compared to other procedure types. Algorithm Z addresses this problem by introducing the description variable δlat to stand for F′body and by generating an effect inequality constraint (>= δlat Fbody) that must later be solved. The type and effect reconstruction system handles the above derivation pattern by a single application of the [→-introZ] rule:
Z[[Ebody]] TE[Ii : δi]ni=1 = ⟨Tbody, TCSbody, Fbody, FCSbody⟩
------------------------------------------------------------------- [→-introZ]
Z[[(abs (I ni=1) Ebody)]] TE = ⟨(-> (δ ni=1) δlat Tbody), TCSbody,
                                pure, (>= δlat Fbody) . FCSbody⟩
This is like the [→-introR] type reconstruction rule for FLARE except that: (1) it introduces the description variables δ ni=1 for the parameters instead of type variables; (2) it specifies that abstractions have a pure effect; and (3) it adds the effect constraint (>= δlat Fbody) to whatever effect constraints were generated in reconstructing the type and effect of Ebody. For a reason explained later, an effect-constraint set is concretely represented as a sequence of effect constraints. So (FC . FCS) is the result of inserting the effect constraint FC into the effect-constraint set FCS, FCS1 @ FCS2 is the union of effect-constraint sets FCS1 and FCS2, and @ni=1 FCSi is the union of the n effect-constraint sets FCS1, ..., FCSn. We still use the set notation FC ∈ FCS to indicate that effect constraint FC is an element of the effect-constraint set FCS. We define dom(FCS) as the set of effect variables δ appearing in constraints of the form (>= δ F) within FCS. A solution to an effect-constraint set FCS is a substitution σ ∈ DescSubst = DescId ⇀ Desc such that dom(σ) = dom(FCS) and (σ δ) ⊒e (σ F) for every effect constraint (>= δ F) in FCS. Although the formal signature of a solution
Fi = (init r7)    Fr = (read r7)    Fw = (write r7)    Fmax = (maxeff Fi Fr Fw)

FCS_ex = [(>= δ1 pure),
          (>= δ2 (maxeff δ1 Fi)),
          (>= δ3 (maxeff δ2 Fr)),
          (>= δ4 (maxeff δ2 Fw)),
          (>= δ4 δ5),
          (>= δ5 (maxeff δ3 δ4))]

Figure 16.5   FCS_ex is an example of an effect-constraint set.
substitution is DescId ⇀ Desc, the signature is really DescId ⇀ Effect, since all the description variables being solved denote effects. (An effect constraint also typically contains description variables denoting regions, but these will not be in dom(σ) for a solution substitution σ.) There are infinitely many solutions for any effect-constraint set. For example, consider the effect-constraint set FCS_ex in Figure 16.5. Below are four solutions to FCS_ex:

σ      (σ δ1)   (σ δ2)           (σ δ3)           (σ δ4)   (σ δ5)
σex1   pure     Fi               (maxeff Fi Fr)   Fmax     Fmax
σex2   Fi       Fi               (maxeff Fi Fr)   Fmax     Fmax
σex3   Fr       (maxeff Fi Fr)   (maxeff Fi Fr)   Fmax     Fmax
σex4   Fmax     Fmax             Fmax             Fmax     Fmax
There are also infinitely many solutions of the form σexF, parameterized by F ∈ Effect, that map every effect variable in FCS_ex to (maxeff F Fmax). Since the Effect domain is a pointed CPO (see Sections 5.2.2 and 5.2.3) under the ⊑e ordering, the domain DescId ⇀ Effect is also a pointed CPO, and so there is a well-defined notion of a least solution to an effect-constraint set. The structure of the CPO and the existence of a least solution depend critically on the ACUI nature of effect combination via maxeff. The iterative approach to finding least fixed points from Section 5.2.5 can be used to calculate the least solution to an effect-constraint set FCS. We start with an approximation σ0 that maps each effect variable δ in dom(FCS) to pure. For each step j, we define a better approximation σj that maps each δ in dom(FCS) to an effect that combines (σj−1 δ) with (σj−1 F) for each F such that (>= δ F) is in FCS. (σj δ) is guaranteed to be at least as big as (σj−1 δ), and so in this
sense is a "better" approximation. Since there are finitely many effect variables δ ∈ dom(FCS) and since (σj δ) always denotes some combination of the finite number of base effects mentioned in FCS, the iteration is guaranteed to converge to a fixed point in a finite number of steps. For example, this process finds the least solution to FCS_ex in three steps (the fourth step verifies that σ3 is a solution):

j   (σj δ1)   (σj δ2)   (σj δ3)          (σj δ4)          (σj δ5)
0   pure      pure      pure             pure             pure
1   pure      Fi        Fr               Fw               pure
2   pure      Fi        (maxeff Fi Fr)   (maxeff Fi Fw)   (maxeff Fr Fw)
3   pure      Fi        (maxeff Fi Fr)   Fmax             Fmax
4   pure      Fi        (maxeff Fi Fr)   Fmax             Fmax
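This iteration is easy to implement directly. The following Python sketch (an illustration under the simplifying assumption that effects are sets whose members are base effects or effect variables; it is not the book's code) computes the least solution and, run on FCS_ex, reproduces the table above:

# A sketch of solveFCS: iterate from the bottom solution (everything pure)
# until a fixed point. Effects are modeled as sets whose elements are base
# effects (strings like "init r7") or effect variables (strings like "d2").

def solve_fcs(fcs):
    """fcs: list of (var, effect-set) pairs, one per constraint (>= var F)."""
    sigma = {v: frozenset() for v, _ in fcs}      # bottom: every variable pure

    def interp(effect_set, sigma):
        """Expand variables in an effect through the current solution."""
        out = set()
        for item in effect_set:
            out |= sigma[item] if item in sigma else {item}
        return frozenset(out)

    while True:
        new = {v: sigma[v] for v in sigma}
        for v, f in fcs:
            new[v] = new[v] | interp(f, sigma)    # monotone: only grows
        if new == sigma:
            return sigma                          # least fixed point reached
        sigma = new

Fi, Fr, Fw = "init r7", "read r7", "write r7"
fcs_ex = [("d1", set()),            # (>= d1 pure)
          ("d2", {"d1", Fi}),       # (>= d2 (maxeff d1 Fi))
          ("d3", {"d2", Fr}),       # (>= d3 (maxeff d2 Fr))
          ("d4", {"d2", Fw}),       # (>= d4 (maxeff d2 Fw))
          ("d4", {"d5"}),           # (>= d4 d5)
          ("d5", {"d3", "d4"})]     # (>= d5 (maxeff d3 d4))
print(solve_fcs(fcs_ex))
# d1 -> {}, d2 -> {Fi}, d3 -> {Fi, Fr}, d4 and d5 -> {Fi, Fr, Fw}

Each pass of the while loop corresponds to one row of the table; the successive approximations for d1 through d5 match the σ0 through σ4 columns shown above.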
The definition of the solveFCS function in Figure 16.4 formalizes this strategy for finding the least solution to an effect-constraint set. Let

λδ_FCS . effect-expression

stand for a partial function that denotes the value of effect-expression when δ ∈ dom(FCS) and is otherwise undefined. The least solution of an effect-constraint set FCS is the least fixed point of a series of solutions starting with the bottom solution ⊥DescSubst = λδ_FCS . pure. An approximate solution σ is transformed to a better solution σ′ by mapping each effect variable δ ∈ dom(FCS) to (σ (maxeff δ F1 ... Fn)), where {F1, ..., Fn} is the set of all effects Fi appearing in constraints of the form (>= δ Fi) in FCS. Since

(σ′ δ) = (σ (maxeff δ F1 ... Fn)) = (maxeff (σ δ) (σ F1) ... (σ Fn))

clearly (σ δ) ⊑e (σ′ δ) for each δ mentioned in FCS, so the transformation is monotonic. By the argument given above, each chain of solutions is finite, so monotonicity is sufficient to guarantee the existence of a least solution for fix_DescSubst.

Simple Type/Effect Reconstruction Rules

The Algorithm Z type/effect reconstruction rules for most expressions are presented in Figure 16.6. The rules for literals, errors, and nongeneric variable references are similar to the FLARE type reconstruction rules, but additionally specify a pure effect and an empty effect-constraint set. The [→-introZ] rule has already been discussed above. The [ifZ], [→-elimZ], and [primZ] rules are similar to
Function Signature for Type/Effect Reconstruction of Expressions
Z : Exp → TypeEnvironment → (Type × TypeConstraintSet × Effect × EffectConstraintSet)

Type/Effect Reconstruction Rules for Expressions

Z[[#u]] TE = ⟨unit, {}TCS, pure, []FC⟩   [unitZ]
  ([boolR], [intR], and [symbR] are similar)

Z[[(error Y)]] TE = ⟨δ, {}TCS, pure, []FC⟩   [errorZ]
  where δ is fresh

Z[[I]] TE = ⟨T, {}TCS, pure, []FC⟩   [varZ]
  where TE(I) = T

Z[[I]] TE = ⟨unit, failTCS, pure, []FC⟩   [var-failZ]
  where I ∉ dom(TE)

∀3i=1 . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
----------------------------------------------------------------------- [ifZ]
Z[[(if E1 E2 E3)]] TE = ⟨T2, (∪3i=1 TCSi) ∪ {T1 ≐ bool, T2 ≐ T3}TCS,
                         (maxeff F 3i=1), @3i=1 FCSi⟩

Z[[Ebody]] TE[Ii : δi]ni=1 = ⟨Tbody, TCSbody, Fbody, FCSbody⟩
----------------------------------------------------------------------- [→-introZ]
Z[[(abs (I ni=1) Ebody)]] TE = ⟨(-> (δ ni=1) δlat Tbody), TCSbody,
                                pure, (>= δlat Fbody) . FCSbody⟩
  where δ ni=1 and δlat are fresh

∀ni=0 . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
----------------------------------------------------------------------- [→-elimZ]
Z[[(E0 E ni=1)]] TE = ⟨δres, (∪ni=0 TCSi) ∪ {T0 ≐ (-> (T ni=1) δlat δres)}TCS,
                       (maxeff F ni=0 δlat), @ni=0 FCSi⟩
  where δlat and δres are fresh

Z[[Oop]] TEprim = ⟨Top, TCS0, pure, []FC⟩   ∀ni=1 . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
--------------------------------------------------------------------------- [primZ]
Z[[(prim Oop E ni=1)]] TE = ⟨δres, (∪ni=0 TCSi) ∪ {Top ≐ (-> (T ni=1) δlat δres)}TCS,
                             (maxeff F ni=1 δlat), @ni=1 FCSi⟩
  where δlat and δres are fresh

Figure 16.6   The FLARE/E type/effect reconstruction algorithm for simple expressions, expressed via deduction rules. For let polymorphism see Figure 16.8.
their FLARE type reconstruction counterparts except that (1) they combine the effects of all subexpressions (and the latent effect of the applied procedure in the case of [→-elimZ] and [primZ]) and (2) they combine the effect-constraint sets of all subexpressions.
Z[[b]] TE = ⟨bool, {}TCS, pure, []FC⟩   [varZ]
  ...
Z[[Eread]] TE = ⟨Tread, TCSread, (read δrreg), FCSread⟩
Z[[(abs () Eread)]] TE = ⟨(-> () δreff Tread), TCSread, pure,
                          (>= δreff (read δrreg)) . FCSread⟩   [→-introZ]
  ...
Z[[Ewrite]] TE = ⟨Twrite, TCSwrite, (write δwreg), FCSwrite⟩
Z[[(abs () Ewrite)]] TE = ⟨(-> () δweff Twrite), TCSwrite, pure,
                           (>= δweff (write δwreg)) . FCSwrite⟩   [→-introZ]
Z[[(if b (abs () Eread) (abs () Ewrite))]] TE
  = ⟨(-> () δreff Tread),
     TCSread ∪ TCSwrite ∪ {bool ≐ bool, (-> () δreff Tread) ≐ (-> () δweff Twrite)}TCS,
     pure,
     ((>= δreff (read δrreg)) . FCSread) @ ((>= δweff (write δwreg)) . FCSwrite)⟩   [ifZ]

Figure 16.7   Type/effect reconstruction corresponding to the type/effect derivation of Eifproc on page 955.
As an example, Figure 16.7 shows the fragment of the type/effect derivation for Eifproc from page 955 expressed in the reconstruction system. Distinct effect variables δreff and δweff are introduced as the latent effects for (abs () Eread) and (abs () Ewrite), respectively, but these are forced to be the same by the type constraint

(-> () δreff Tread) ≐ (-> () δweff Twrite)
We assume that the unification algorithm used by the type-constraint set solver solveTCS is extended to unify the latent effects of two procedure types and the regions of two cellof types. The extension is straightforward, because both of these are guaranteed to be description variables: all procedure types generated by the [→-introZ ] rule have latent effects that are description variables and all regions are description variables. Modifying the algorithm to unify arbitrary effects would be significantly more complicated, because the algorithm would need to generate a set of effect constraints in addition to a solution substitution (see [JG91] for details).
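The shape of this extension can be sketched as follows (a hypothetical Python representation, not the actual solveTCS implementation): because the latent effect of every arrow type and the region of every cellof is a variable, unification only ever equates two such variables, so no general effect unification is needed. The occurs check is omitted for brevity:

# Sketch: unification where arrow latent effects and cellof regions are
# guaranteed to be variables, so unifying them only equates two variables.
# Types: "int" | ("tv", name) | ("cellof", T, region_var) | ("->", [T...], eff_var, T)

def walk(t, subst):
    """Follow substitution bindings for variables."""
    while isinstance(t, tuple) and t[0] in ("tv", "ev", "rv") and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, tuple) and t1[0] in ("tv", "ev", "rv"):
        return {**subst, t1: t2}
    if isinstance(t2, tuple) and t2[0] in ("tv", "ev", "rv"):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and t1[0] == t2[0]:
        if t1[0] == "cellof":              # unify contents, then region vars
            subst = unify(t1[1], t2[1], subst)
            return unify(t1[2], t2[2], subst)
        if t1[0] == "->":                  # args, latent effect vars, result
            _, args1, e1, r1 = t1
            _, args2, e2, r2 = t2
            if len(args1) != len(args2):
                raise TypeError("arity mismatch")
            for a1, a2 in zip(args1, args2):
                subst = unify(a1, a2, subst)
            subst = unify(e1, e2, subst)   # both are effect variables
            return unify(r1, r2, subst)
    raise TypeError(f"cannot unify {t1} and {t2}")

# (-> () dreff Tread) = (-> () dweff Twrite) forces dreff = dweff:
s = unify(("->", [], ("ev", "dreff"), ("tv", "Tread")),
          ("->", [], ("ev", "dweff"), ("tv", "Twrite")), {})
# s binds ("ev","dreff") to ("ev","dweff") and ("tv","Tread") to ("tv","Twrite")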
Algebraic Type Schemas for Let Polymorphism

A key difference between Algorithm Z and Algorithm R is how let polymorphism is handled. Recall that Algorithm R uses type schemas of the form (generic (τ ni=1) T) to permit a type identifier to be instantiated to different types in different contexts. For example, the identity function (abs (x) x) can be used on any type of input. When it is let-bound to an identifier, it has the type schema (generic (?t) (-> (?t) ?t)). The job of a type schema is to describe all of the possible types of an identifier by determining type variables that can be generalized. In the implicit type/effect system of FLARE/E, type schemas were elaborated with effect and region variables. Reconstructing effects and regions requires us to extend type schemas further to carry along a set of constraints on the effects and regions they describe. In Algorithm Z, generic type schemas (Figure 16.8) are modified to have the form (generic (δ*) T (FCS)), where FCS contains effect constraints that may involve the effect and region variables in δ*. We call a type schema that includes an effect-constraint set an algebraic type schema [JG91]. The fact that effect-constraint sets appear within algebraic type schemas that have an s-expression representation is the reason that we have chosen to represent effect constraints using the s-expression notation (>= δ F) and to represent effect-constraint sets as sequences of such constraints. As a simple example, consider the algebraic type schema that the primitive type environment TEprim assigns to the cell assignment operation (:=):

(generic (?t ?e ?r)
    (-> ((cellof ?t ?r) ?t) ?e unit)   {type}
    ((>= ?e (write ?r))))              {effect constraints}
This type schema has three parts: (1) the description variables (?t ?e ?r) describe the type, effect, and region variables that can be generalized in the type schema; (2) the procedure type (-> ((cellof ?t ?r) ?t) ?e unit) describes the cell assignment operation and notes that its application has effect ?e; and (3) the effect-constraint set ((>= ?e (write ?r))) describes the constraints on the effect variable ?e. In this case, the assignment operation can have any effect as long as it is larger than (write ?r), where ?r specifies the region in which the cell is allocated. A swap procedure that swaps the contents of two cells would have the following algebraic type schema:

TS_swap = (generic (?t ?e ?r1 ?r2)
              (-> ((cellof ?t ?r1) (cellof ?t ?r2)) ?e unit)
              ((>= ?e (maxeff (read ?r1) (write ?r1) (read ?r2) (write ?r2)))))
Domains
ATS ∈ AlgebraicTypeSchema ::= T | (generic (δ*) T (FCS))
TE ∈ TypeEnvironment = Ident ⇀ AlgebraicTypeSchema

Type Functions
zgen : Type → TypeEnvironment → TypeConstraintSet → EffectConstraintSet → AlgebraicTypeSchema
(zgen T TE TCS FCS)
 = (generic (δ1 ... δn) (σ T) ((σ FCS))),
     if solveTCS TCS = (TypeSubst↦UnifySoln σ)
     and {δ1, ..., δn} = (FrDescIds_ty[[(σ T)]] ∪ FrDescIds_FCS[[(σ FCS)]])
                          − (FrDescIds_tyenv (σ TE))
 = undefined, otherwise

zgenPureSP : Type → TypeEnvironment → TypeConstraintSet → EffectConstraintSet → Exp → AlgebraicTypeSchema
(zgenPureSP Tdefn TE TCS FCS E)
 = (zgen Tdefn TE TCS FCS), if pure E (defined in Figure 13.25 on page 816)
 = Tdefn, otherwise

Type/Effect Reconstruction Rules

Z[[I]] TE = ⟨([δ′i/δi]ni=1)Tbody, {}TCS, pure, ([δ′i/δi]ni=1)FCSbody⟩   [genvarZ]
  where TE(I) = (generic (δ ni=1) Tbody (FCSbody))
        δ′ ni=1 are fresh

∀ni=1 . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
Z[[E0]] TE[Ii : (zgenPureSP Ti TE TCSi FCSi Ei)]ni=1 = ⟨T0, TCS0, F0, FCS0⟩
---------------------------------------------------------------------------- [letSPZ]
Z[[(let ((Ii Ei)ni=1) E0)]] TE = ⟨T0, TCS0 ∪ TCSdefns, (maxeff F ni=0), @ni=0 FCSi⟩
  where TCSdefns = ∪ni=1 TCSi

∀ni=1 . (Z[[Ei]] TE[Ij : δj]nj=1 = ⟨Ti, TCSi, Fi, FCSi⟩)
Z[[E0]] TE[Ii : (zgenPureSP Ti TE TCSdefns FCSdefns Ei)]ni=1 = ⟨T0, TCS0, F0, FCS0⟩
---------------------------------------------------------------------------- [letrecSPZ]
Z[[(letrec ((Ii Ei)ni=1) E0)]] TE = ⟨T0, TCS0 ∪ TCSdefns, (maxeff F ni=0), @ni=0 FCSi⟩
  where δ ni=1 are fresh
        TCSdefns = (∪ni=1 TCSi) ∪ (∪ni=1 {δi ≐ Ti}TCS)
        FCSdefns = @ni=1 FCSi

Figure 16.8   FLARE/E type/effect reconstruction for let polymorphism.
The effect-constraint set in this algebraic type schema constrains the latent effect of the swap procedure type to include read and write effects for the regions of both cells. As in FLARE type reconstruction, the type/effect reconstruction system introduces type schemas in the rules for let and letrec expressions. Algebraic type schemas are created by the zgenPureSP and zgen functions defined in Figure 16.8. These are similar to the rgenPure and rgen functions used in FLARE type reconstruction (as defined in Figure 13.26 on page 818), except that:

• zgenPureSP takes an additional argument, an effect-constraint set, and returns an algebraic type schema rather than a regular type schema. The purity of the expression argument determines whether generalization takes place, and the effect-constraint set is passed along to zgen. zgenPureSP employs the same syntactic purity test used in Algorithm R and the FLARE/E type/effect system; see Section 16.2.5 for a discussion of an alternative purity test.

• zgen takes one additional argument, an effect-constraint set, which it incorporates into the algebraic type schema. Note that the schema generalizes over free description variables in the effect constraints (as well as in the type) that are not mentioned in the type environment.

As in rgen, the type constraints in zgen must be solved before generalization can take place. Why not solve the effect constraints as well? Because effect constraints involve inequalities rather than equalities, they can only be solved globally, not locally. E.g., knowing that ?e is larger than (read r1) and (write r2) does not allow us to conclude that ?e = (maxeff (read r1) (write r2)), since there may be other constraints on ?e elsewhere in the program that force it to encompass more effects. So the solution of effect inequality constraints must be delayed until all effect constraints in the whole program have been collected (see the [progZ] rule in Figure 16.9, which is discussed later). In contrast, type equality constraints can be solved eagerly as well as lazily. This is why algebraic type schemas must carry effect constraints but not type constraints. When the type environment assigns an algebraic type schema to a variable, the [genvarZ] rule instantiates the parameters of the type schema with fresh description variables. Because different description variables are chosen for different occurrences of the variable, the definitions associated with the variables may be used polymorphically. Although the variable reference itself is pure, its effect-constraint set includes instantiated versions of the algebraic type schema's effect constraints. For example, suppose that cell x is an integer cell allocated in region
Domains for Type/Effect Reconstruction of Programs
RA ∈ ReconAns = ProgType + Failure
Failure = {fail}

Function Signature for Type/Effect Reconstruction of Programs
Zpgm : Prog → ReconAns

Type/Effect Reconstruction Rules for Programs

Z[[E]] {Ii : δi}ni=1 = ⟨T, TCS, F, FCS⟩
------------------------------------------ [progZ]
Zpgm[[(flarek (I1 ... In) E)]] = RApgm
  where δ ni=1 are fresh
  RApgm = (ProgType↦ReconAns (σFCS (σTCS (=> (δ ni=1) F T)))),
              if solveTCS TCS = (TypeSubst↦UnifySoln σTCS)
              and solveFCS (σTCS FCS) = σFCS
        = (Failure↦ReconAns fail), otherwise

Figure 16.9   The FLARE/E type/effect reconstruction algorithm for programs, expressed via a deduction rule.
rx and cell y is an integer cell allocated in region ry. Then the reconstruction of (swap x y) would yield an effect-constraint set equivalent to

((>= e1 (maxeff (read rx) (write rx) (read ry) (write ry))))
(where e1 is the fresh description variable substituted for ?e). The reconstruction of (swap x x) would yield an effect-constraint set equivalent to

((>= e2 (maxeff (read rx) (write rx))))
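The instantiation performed by [genvarZ] can be sketched directly (illustrative Python, with schemas as (parameters, type, constraints) triples; the names are hypothetical): renaming the generic parameters to fresh variables in both the type and the stored constraints is all that is required:

# Sketch of [genvarZ]: instantiate an algebraic type schema by renaming its
# generic parameters to fresh variables in both the type and the constraints.
import itertools

fresh_ids = itertools.count()

def instantiate(schema):
    """schema = (params, type, fcs); returns (type', fcs') with fresh variables."""
    params, ty, fcs = schema
    renaming = {p: f"{p.strip('?')}{next(fresh_ids)}" for p in params}

    def subst(d):
        if isinstance(d, str):
            return renaming.get(d, d)
        return tuple(subst(x) for x in d)

    return subst(ty), [subst(c) for c in fcs]

ts_swap = (("?t", "?e", "?r1", "?r2"),
           ("->", (("cellof", "?t", "?r1"), ("cellof", "?t", "?r2")), "?e", "unit"),
           [(">=", "?e", ("maxeff", ("read", "?r1"), ("write", "?r1"),
                                    ("read", "?r2"), ("write", "?r2")))])

t1, fcs1 = instantiate(ts_swap)
t2, fcs2 = instantiate(ts_swap)

Each reference to swap calls instantiate anew, which is why (swap x y) and (swap x x) above contribute constraints on the distinct effect variables e1 and e2.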
Instantiating the effect-constraint set of the algebraic type schema in the [genvarZ] rule is consistent with the view that referencing an expression variable bound by a let or letrec to a pure expression E is equivalent to replacing the variable reference by E.

Reconstructing Programs

At the level of a whole program, the [progZ] rule in Figure 16.9 models the result of successful type/effect reconstruction as a program type whose

• parameter types are the types of the program parameters;

• result type is the type of the program body; and

• latent effect is the effect of the program body.
This program type is constructed via

(σFCS (σTCS (=> (δ ni=1) Fbody Tbody)))
where

• δ ni=1 are the description variables generated for the types of the program parameters;

• Fbody is the effect reconstructed for the program body, Ebody;

• Tbody is the type reconstructed for Ebody;

• σTCS is the description substitution that is the solution of the type constraints TCSbody collected in the reconstruction of Ebody; and

• σFCS is the description substitution solveFCS (σTCS FCSbody) that is the global solution of all the effect constraints FCSbody collected in the reconstruction of Ebody.

Before solveFCS is called, the substitution σTCS must be applied to each constraint in FCSbody to incorporate information gleaned from unifying latent effect variables in procedure types and region variables in cell types. For a similar reason, σFCS is applied to the program type after σTCS has been applied. Note that the application of σFCS resolves effect variables not only in Fbody but also in the latent effects of any procedure types that occur in Tbody.

Algorithm Z has the Power of FLARE/E

Algorithm Z succeeds if the type constraints and effect constraints collected for the program body are solvable. We have seen from the discussion of solveFCS on page 963 that the effect constraints are always solvable, so reconstruction succeeds if and only if the type constraints are solvable. The following theorems say that Algorithm Z is sound and complete for the FLARE/E implicit type/effect system in Figure 16.3 on page 953:

Theorem 16.2 (Soundness of Algorithm Z)   Suppose Z[[E]] TE = ⟨T, TCS, F, FCS⟩. If σTCS is any solution of TCS, σFCS is any solution of (σTCS FCS), and σ = σFCS ∘ σTCS, then (σ TE) ⊢ E : (σ T) ! (σ F).

Theorem 16.3 (Completeness of Algorithm Z)   If TE ⊢ E : T ! F, then Z[[E]] TE = ⟨T′, TCS, F′, FCS⟩, where there are solutions σTCS of TCS and σFCS of (σTCS FCS) such that ((σFCS ∘ σTCS) T′) = T and ((σFCS ∘ σTCS) F′) = F.
In both of these theorems, σTCS may be less general than the most general unifier calculated by (solveTCS TCS), and σFCS may be greater than the least solution calculated by (solveFCS (σTCS FCS)). In both theorems, it is necessary to apply the composition σFCS ∘ σTCS to both types and effects rather than just applying σTCS to types and σFCS to effects. For types, σFCS may be needed to resolve latent effect variables in procedure types that were determined when solving the effect constraints in FCS. For effects, σTCS may be needed to resolve effect and region variables that were unified as part of solving the type constraints in TCS. Together, the soundness and completeness theorems for Algorithm Z imply a principality result similar to the one shown for Algorithm R: any type that can be assigned to an expression in FLARE/E is a substitution instance of the type found by Algorithm Z, and any effect that can be assigned to an expression in FLARE/E is a substitution instance of the effect found by Algorithm Z. Since FLARE expressions are typable in the FLARE/E implicit type/effect system if and only if they are typable in the FLARE implicit type system (by Theorem 16.1 on page 956), and expressions are typable in the FLARE implicit type system if and only if their type can be reconstructed by Algorithm R (by Theorems 13.7 and 13.8 on page 799), a consequence of Theorems 16.2 and 16.3 is that Algorithm Z and Algorithm R succeed on exactly the same set of FLARE expressions and programs.

Exercise 16.7

a. Write a FLARE/E abstraction Eswapabs that swaps the contents of its two cell arguments.

b. Show the derivation for (Z[[Eswapabs]] {}), the type/effect reconstruction of Eswapabs in the empty type environment.

c. Use zgenPureSP to create an algebraic type schema for Eswapabs, supplying the type and effect information from part b as arguments.

d. Is your algebraic type schema from part c equivalent to TS_swap defined on page 966? Explain any discrepancies.

Exercise 16.8 Construct an Algorithm Z type/effect derivation for the following program:

(flarek (a b)
  (let ((mapcell (abs (c f)
                   (let ((v (prim ^ c)))
                     (let ((_ (prim := c (f v))))
                       v)))))
    (mapcell a (abs (x) (mapcell b (abs (y) x))))))
Exercise 16.9 Give type/effect reconstruction derivations for the programs in part a and part b of Exercise 16.1 on page 956.

Exercise 16.10 Write a FLARE/E program whose type/effect reconstruction uses an algebraic type schema whose effect-constraint set has more than one constraint. Show this by giving the type/effect reconstruction derivation for your program.

Exercise 16.11 Modify the FLARE/E implicit type/effect system and Algorithm Z to handle mutable variables (via the set! construct). Begin by studying Exercise 13.11 on page 820 to get a sense for the issues involved in this extension. In particular, references to variables modified by set! are no longer pure — they have a read effect! Your system should distinguish variables modified by set! from those that are not; references to the latter can still be considered pure.
16.2.4 Effect Masking Hides Unobservable Effects
We now explore some variations on the FLARE/E type/effect system. The first involves effect masking, which allows effects to be deleted from an expression when they cannot be observed from outside of the expression. For example, consider the following procedure, which sums the elements of a list of integers:

Esumabs = (abs (ints)
            (let ((sum (cell 0)))
              (letrec ((loop (abs (ns)
                               (if (null? ns)
                                   (^ sum)
                                   (begin (:= sum (+ (^ sum) (car ns)))
                                          (loop (cdr ns)))))))
                (loop ints))))
Suppose that the cell named sum is in region rs. According to the effect rules we have studied thus far, the latent effect of the type for this procedure is (maxeff (init rs) (read rs) (write rs))
Intuitively, however, the sum cell is completely internal to the summation procedure and cannot be observed outside the procedure. There is no experiment that a client can perform to determine whether or not the summation procedure uses cells in its implementation. We can use the type/effect system to prove that the effects within the summation procedure are unobservable outside the procedure. We do this by showing that no cell in region rs can be referenced outside the let expression that is the body of the procedure. Region rs does not appear in the type (int) of the
TE ⊢ E : T ! F
----------------- [effect-masking]
TE ⊢ E : T ! F′

where F′ ⊑e F and
∀BF ∈ (F[[F]] − F[[F′]]) . (∀δ ∈ FrDescIds_eff[[BF]] .
    ((δ ∉ FrDescIds_ty[[T]])                                 [export restriction]
     ∧ (∀I ∈ FrIds[[E]] . (δ ∉ FrDescIds_ty[[TE(I)]]))))     [import restriction]

Figure 16.10   An effect-masking rule for FLARE/E.
procedure body, nor does it appear in the type environment in the types of the free variables used in the procedure body (ints, cell, ^, :=, +, null?, car, cdr). This shows that region rs is inaccessible outside the procedure body, and so cannot be observed by any client of the procedure. We can add effect masking to FLARE/E by extending the type/effect rules with the [effect-masking] rule in Figure 16.10. This rule says that any base effect BF can be deleted from the effect of an expression E as long as it is purely local to E — i.e., it cannot be observed elsewhere in the program. BF is local to E if no effect or region variable δ appearing in it is mentioned in the type of any free variable used by E (the import restriction) or can escape to the rest of the program in the type of E (the export restriction). In some sense, the [effect-masking] rule is the opposite of the [does] rule, since it allows deflating the effect of an expression as opposed to inflating it. In the case of the list summation procedure, the [effect-masking] rule formalizes our above reasoning about effect observability. It allows the let expression to be assigned the pure effect, making the latent effect of the procedure type for Esumabs pure as well. Note that the [effect-masking] rule does not allow any effects to be deleted from the letrec expression in the list summation procedure. Although rs does not appear in the type (int) of this expression, it does appear in the type (cellof int rs) of the free variable sum used in the expression. Effect masking is an important tool for encapsulation. The [effect-masking] rule can detect that certain expressions, while internally impure, are in fact externally pure. It thus permits impure expressions to be included in otherwise stateless functional programs; expressions can take advantage of local side effects for efficiency without losing their referential transparency. As we will see in Section 16.3.1, it also allows effects that denote control transfers to be masked, indicating that an expression may perform internal control transfers that are not observable outside of the expression.
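The import and export restrictions are directly checkable, given the free-description-identifier functions of Exercise 16.2. The following Python sketch (illustrative; it crudely treats any non-keyword atom as a description identifier) decides whether a base effect may be masked:

# Sketch of the [effect-masking] side conditions: a base effect may be deleted
# when its effect/region variables are mentioned neither in the expression's
# type (export restriction) nor in the types of its free identifiers
# (import restriction).

def free_desc_ids(desc):
    """Collect description variables (here: strings like 'rs') in a term."""
    keywords = ("int", "unit", "bool", "symb", "->", "cellof", "listof",
                "init", "read", "write", "maxeff", "pure")
    if isinstance(desc, str):
        return set() if desc in keywords else {desc}
    return set().union(*(free_desc_ids(d) for d in desc)) if desc else set()

def maskable(base_effect, expr_type, free_var_types):
    """True if base_effect (e.g. ("read", "rs")) is unobservable outside."""
    for delta in free_desc_ids(base_effect):
        if delta in free_desc_ids(expr_type):                # export restriction
            return False
        if any(delta in free_desc_ids(t) for t in free_var_types):
            return False                                     # import restriction
    return True

# The let body of the summation procedure has type int, and none of the types
# of its free identifiers mention region rs, so its effects on rs can be masked:
assert maskable(("read", "rs"), "int", ["int", ("listof", "int")])
# Inside the letrec, sum : (cellof int rs) is free, so masking is not allowed:
assert not maskable(("read", "rs"), "int", [("cellof", "int", "rs")])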
If the [effect-masking] rule is so important, why didn't we include it as a rule in the FLARE/E type/effect system presented in Figure 16.3 on page 953? The reason is that it complicates the story of type reconstruction. The effects computed by the solveFCS function in Algorithm Z are the least effects for the type/effect system presented in Figure 16.3. But they are no longer the least effects when the [effect-masking] rule is added, since this rule allows even smaller effects. For example, Algorithm Z would determine that the effect of the let expression that is the body of Esumabs includes init, read, and write effects for the region rs in which the sum cell is allocated, but we have seen that these can be eliminated by the [effect-masking] rule.

Exercise 16.12 Consider the following FLARE expression:

(abs (a)
  (let ((b (cell 1)))
    (snd (let ((_ (:= a (^ b)))
               (c (cell 2))
               (d (cell 3)))
           (let ((_ (:= c (^ e)))) {e is a free variable}
             (pair c d))))))
a. Construct a FLARE/E type/effect derivation for this expression that does not use the [effect-masking] rule. Assume that each cell is allocated in a separate region.

b. Construct a FLARE/E type/effect derivation for this expression that uses the [effect-masking] rule to find the smallest allowable effect for each subexpression.
16.2.5 Effect-based Purity for Generalization
It may be surprising that the egenPureSP function for type generalization in the FLARE/E type rules (Figure 16.2 on page 952) determines expression purity using a syntactic test rather than using the effect system itself. Here we explore an alternative type/effect system FLARE/EEP that determines purity via an effect-based test rather than a syntactic test. (The EP subscript stands for “effect purity.”) The key difference between FLARE/EEP and FLARE/E is the new function egenPureEP in Figure 16.11. Like egenPureSP , egenPureEP uses its third argument to determine the purity (and thus the generalizability) of an expression. However, egenPureEP ’s third argument is an effect determined from the effect system, whereas egenPureSP ’s is an expression whose effect is determined by a separate syntactic deduction system. The [letEP ] and [letrecEP ] rules employ effect-based purity by passing appropriate effects to egenPureEP .
New Type Function
egenPureEP : Type → TypeEnvironment → Effect → TypeSchema
(egenPureEP Tdefn TE F) = (egen Tdefn TE), if F ≈e pure
                        = Tdefn, otherwise

Modified Type/Effect Rules

∀ni=1 . (TE ⊢ Ei : Ti ! Fi)
TE[Ii : (egenPureEP Ti TE Fi)]ni=1 ⊢ E0 : T0 ! F0
--------------------------------------------------- [letEP]
TE ⊢ (let ((Ii Ei)ni=1) E0) : T0 ! (maxeff F ni=0)

∀ni=1 . (TE[Ij : Tj]nj=1 ⊢ Ei : Ti ! Fi)
TE[Ii : (egenPureEP Ti TE Fi)]ni=1 ⊢ E0 : T0 ! F0
------------------------------------------------------ [letrecEP]
TE ⊢ (letrec ((Ii Ei)ni=1) E0) : T0 ! (maxeff F ni=0)

Figure 16.11   Modified type/effect rules for FLARE/EEP, a system that uses the effect system itself rather than syntactic tests to determine purity.
The FLARE/EEP type system is more powerful than the FLARE and FLARE/E systems: every expression typable in FLARE and FLARE/E is typable in FLARE/EEP, but there are expressions typable in FLARE/EEP that are not typable in FLARE or FLARE/E. Consider the expression:

EcurriedPair = (let ((cp (abs (x) (abs (y) (prim pair x y)))))
                 (let ((cp1 (cp 1)))
                   (prim pair (cp1 #u) (cp1 #t))))
This expression is not well typed in FLARE. According to the syntactic definition of purity in Figure 13.25 on page 816, the application (cp 1) is considered impure, so the type of cp1 cannot be generalized to a polymorphic type and must be a monomorphic type of the form (-> (Ty ) (pairof int Ty )). Since (cp1 #u) requires Ty to be unit and (cp1 #t) requires Ty to be bool, no FLARE typing is possible. Similar reasoning shows that EcurriedPair is not well typed in FLARE/E. In contrast, EcurriedPair is well typed in FLARE/EEP , as shown by the type/effect derivation in Figure 16.12. The key difference is that the effect system can deduce that the application (cp 1) is pure, and this allows the type of cp1 to be generalized in FLARE/EEP . The extra typing power of FLARE/EEP derives from using the more precise purity test of the effect system itself in place of the crude syntactic purity test used in FLARE and FLARE/E.
Abbreviations
EcurriedPair = (let ((cp Eabs1)) (let ((cp1 (cp 1))) (prim pair (cp1 #u) (cp1 #t))))
Eabs1 = (abs (x) Eabs2)
Eabs2 = (abs (y) (prim pair x y))
TE1 = {cp : (generic (?x ?y) (-> (?x) pure (-> (?y) pure (pairof ?x ?y))))}
TE2 = TE1[cp1 : (generic (?y) (-> (?y) pure (pairof int ?y)))]
Tiu = (pairof int unit)
Tib = (pairof int bool)

Type/Effect Derivation
TEprim ⊢ pair : (-> (?x ?y) pure (pairof ?x ?y)) ! pure   [genvar]
{x : ?x, y : ?y} ⊢ x : ?x ! pure   [var]
{x : ?x, y : ?y} ⊢ y : ?y ! pure   [var]
{x : ?x, y : ?y} ⊢ (prim pair x y) : (pairof ?x ?y) ! pure   [prim]
{x : ?x} ⊢ Eabs2 : (-> (?y) pure (pairof ?x ?y)) ! pure   [→-intro]
{} ⊢ Eabs1 : (-> (?x) pure (-> (?y) pure (pairof ?x ?y))) ! pure   [→-intro]
TE1 ⊢ cp : (-> (int) pure (-> (?y) pure (pairof int ?y))) ! pure   [genvar]
TE1 ⊢ 1 : int ! pure   [int]
TE1 ⊢ (cp 1) : (-> (?y) pure (pairof int ?y)) ! pure   [→-elim]
TEprim ⊢ pair : (-> (Tiu Tib) pure (pairof Tiu Tib)) ! pure   [genvar]
TE2 ⊢ cp1 : (-> (unit) pure (pairof int unit)) ! pure   [genvar]
TE2 ⊢ #u : unit ! pure   [unit]
TE2 ⊢ (cp1 #u) : (pairof int unit) ! pure   [→-elim]
TE2 ⊢ cp1 : (-> (bool) pure (pairof int bool)) ! pure   [genvar]
TE2 ⊢ #t : bool ! pure   [bool]
TE2 ⊢ (cp1 #t) : (pairof int bool) ! pure   [→-elim]
TE2 ⊢ (prim pair (cp1 #u) (cp1 #t)) : (pairof Tiu Tib) ! pure   [prim]
TE1 ⊢ (let ((cp1 (cp 1))) (prim pair (cp1 #u) (cp1 #t))) : (pairof Tiu Tib) ! pure   [let]
{} ⊢ EcurriedPair : (pairof Tiu Tib) ! pure   [let]

Figure 16.12   Type/effect derivation for EcurriedPair in FLARE/EEP.
Given the apparent advantages of effect-based purity over syntactic purity, why did we adopt syntactic purity as the default in the FLARE/E type/effect system? The reason is that effect-based purity greatly complicates type reconstruction. With syntactic purity, the decision to generalize types in the [letSPZ] and [letrecSPZ] type reconstruction rules is independent of solving the effect constraints collected during reconstruction. With effect-based purity, type generalization may depend on the result of solving effect constraints. This introduces a fundamental dependency problem: the decision to generalize must be made when processing let and letrec expressions, but the effect constraints cannot be solved until the whole program body has been processed. One way to address this dependency problem is via backtracking (see Exercise 16.16).

Exercise 16.13  Show that the FLARE/E_EP type/effect system can be made even more powerful by extending it with the [effect-masking] rule in Figure 16.10 on page 973. That is, give an expression that is typable in FLARE/E_EP + [effect-masking] that is not typable in FLARE/E_EP.

Exercise 16.14  Thai Ping suggests the following subtyping rule for FLARE/E procedure types:

∀i ∈ [1..n] . (Ti′ ⊑ Ti)    Tbody ⊑ Tbody′    F ⊑e F′
----------------------------------------------------------- [→-⊑]
(-> (T1 ... Tn) F Tbody) ⊑ (-> (T1′ ... Tn′) F′ Tbody′)
a. Suppose that the FLARE/E_EP type system were extended with Thai's rule as well as with a version of the [inclusion] type rule in Figure 12.1 on page 703. Give an example of an expression that is well typed in the extended system but not well typed in the original one.

b. Suppose that the FLARE/E type system were extended with Thai's rule as well as the [inclusion] type rule. Are there any expressions that are well typed in the extended system but not well typed in the original one? Either give such an expression or show that the two systems are equivalent in terms of typing power.

Exercise 16.15  Bud Lojack thinks that a small modification to Algorithm Z can make it sound and complete for FLARE/E_EP, the version of the FLARE/E type system using effect-based purity. He modifies the reconstruction rules for let and letrec to use a new zgenPureEP function that performs an effect-based purity test (Figure 16.13). Excitedly, Bud shows his modifications to Thai Ping. But Thai bursts Bud's bubble when he observes, "Your modified rules are just another way of reconstructing types and effects for FLARE/E, not for FLARE/E_EP. The problem is that the purity test in zgenPureEP involves effect expressions containing effect variables that may eventually be shown to be pure but are conservatively assumed to be impure when the purity test is performed."
Show that Thai is right by fleshing out the following two steps, which show that replacing the [letSPZ]/[letrecSPZ] rules by the [letEPZ]/[letrecEPZ] rules does not change which expressions can be reconstructed by Algorithm Z.

a. Prove the following lemma:

Lemma 16.4  In Bud's modified Algorithm Z, suppose that Z[[E]] TE = ⟨T, TCS, F, FCS⟩ and (solveTCS TCS) = (TypeSubst UnifySoln σTCS). Then (σTCS F) ≈e pure if and only if (pure E) according to the deduction system for pure defined in Figure 13.25 on page 816.

Hint: What is the form of every latent effect in a procedure type generated by Algorithm Z? What does this imply about the purity of procedure applications?

b. Using Lemma 16.4, show that in any type/effect derivation from Bud's modified Algorithm Z, any instances of the [letEPZ] and [letrecEPZ] rules can be replaced by the [letSPZ] and [letrecSPZ] rules without changing the validity of the derivation.

Exercise 16.16  Bud Lojack's version of Algorithm Z (see Exercise 16.15) fails to reconstruct the types and effects of some expressions that are well typed in the FLARE/E_EP type/effect system because it doesn't "know" the purity of certain effect variables that eventually turn out to be pure. This drawback can be addressed by aggressively assuming that all effect variables are pure unless there is evidence otherwise, and backtracking in any case where the assumption is later proven to be false.

a. Design and implement a backtracking version of Bud's modified Algorithm Z based on this idea.

b. Show that your modified version of Algorithm Z can successfully reconstruct the type and effect of the expression EcurriedPair defined on page 975.

New Type Function

zgenPureEP : Type → TypeEnvironment → TypeConstraintSet → EffectConstraintSet → Effect → AlgebraicTypeSchema

(zgenPureEP Tdefn TE TCS FCS F) = (zgen Tdefn TE TCS FCS),
    if (solveTCS TCS) = (TypeSubst UnifySoln σ) and (σ F) ≈e pure
(zgenPureEP Tdefn TE TCS FCS F) = Tdefn, otherwise

Modified Type/Effect Reconstruction Rules

∀i ∈ [1..n] . (Z[[Ei]] TE = ⟨Ti, TCSi, Fi, FCSi⟩)
Z[[E0]] TE[Ii : (zgenPureEP Ti TE TCSi FCSi Fi)] (for i ∈ [1..n]) = ⟨T0, TCS0, F0, FCS0⟩
-------------------------------------------------------------------------------- [letEPZ]
Z[[(let ((I1 E1) ... (In En)) E0)]] TE =
  ⟨T0, TCS0 ∪ TCSdefns, (maxeff F0 F1 ... Fn), FCS0 @ FCS1 @ ... @ FCSn⟩
where TCSdefns = TCS1 ∪ ... ∪ TCSn

∀i ∈ [1..n] . (Z[[Ei]] TE[Ij : δj] (for j ∈ [1..n]) = ⟨Ti, TCSi, Fi, FCSi⟩)
Z[[E0]] TE[Ii : (zgenPureEP Ti TE TCSdefns FCSdefns Fi)] (for i ∈ [1..n]) = ⟨T0, TCS0, F0, FCS0⟩
-------------------------------------------------------------------------------- [letrecEPZ]
Z[[(letrec ((I1 E1) ... (In En)) E0)]] TE =
  ⟨T0, TCS0 ∪ TCSdefns, (maxeff F0 F1 ... Fn), FCS0 @ FCS1 @ ... @ FCSn⟩
where δ1 ... δn are fresh
      TCSdefns = (TCS1 ∪ ... ∪ TCSn) ∪ ({δ1 ≐ T1} ∪ ... ∪ {δn ≐ Tn})
      FCSdefns = FCS1 @ ... @ FCSn

Figure 16.13  Bud Lojack's modified type/effect reconstruction rules for let and letrec in Algorithm Z (Exercise 16.15).
16.3 Using Effects to Analyze Program Behavior
Thus far we have considered a system for calculating only store effects. Store effects are especially useful for guiding compiler optimizations like parallelization, common subexpression elimination, dead code elimination, and code hoisting (see Section 17.6). We now explore other kinds of effects and show how effect information can be used to reason about program behavior and guide the implementation of programs.
16.3.1 Control Transfers
Effects can be used to analyze control transfers, such as those expressed via the label and jump constructs studied in Section 9.4. Recall that (label Icp Ebody ) evaluates Ebody in an environment where Icp names the control point corresponding to the continuation of the label expression, and (jump Ecp Eval ) jumps to
the control point denoted by Ecp with the value of Eval. Here is a simple example in a version of FLARE/E extended with these two constructs:

Eproc1 = (abs (x y)
           (+ 1 (label exit
                  (* 2 (if (< y 0) (jump exit y) x)))))
In Eproc1, label gives the name exit to the control point that returns the value of the label expression. If y is negative, the jump to exit returns y as the value of the label expression, and Eproc1 returns one more than the value of y. Otherwise, no jump is performed, the value of the label expression is double the value of x, and Eproc1 returns one more than double the value of x. (See Section 9.4 for more examples of nonlocal exits.)
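Before turning to the typing rules, it may help to see label and jump operationally. The following Python sketch is our own analogy (the class and function names are hypothetical); it simulates control points with exceptions, which suffices for the outward, nonlocal-exit jumps in Eproc1:

class Jump(Exception):
    def __init__(self, cp, value):
        self.cp = cp          # the control point being jumped to
        self.value = value    # the value supplied to the jump

def label(body):
    # Evaluate body, passing it a token naming the continuation
    # of this label expression.
    cp = object()
    try:
        return body(cp)
    except Jump as j:
        if j.cp is cp:        # a jump to *this* control point
            return j.value
        raise                 # a jump to some outer control point

def jump(cp, value):
    raise Jump(cp, value)

# Eproc1 = (abs (x y) (+ 1 (label exit (* 2 (if (< y 0) (jump exit y) x)))))
def proc1(x, y):
    return 1 + label(lambda exit: 2 * (jump(exit, y) if y < 0 else x))

assert proc1(5, 3) == 11    # no jump performed: 1 + (2 * 5)
assert proc1(5, -4) == -3   # jump returns y directly: 1 + (-4)

Note that this exception-based simulation captures only outward exits; it cannot express a control point that is reentered after its label expression has returned, which is exactly the behavior exhibited by Eproc2 later in this section and tracked by the comefrom effect.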
The control behavior of label and jump can be modeled by introducing a new type and two new effect constructors:

T ∈ Type ::= . . . | (controlpointof T R)
FCR ∈ EffectConstructor = . . . ∪ {goto, comefrom}

TE[Icp : (controlpointof Tbody R)] ⊢ Ebody : Tbody ! Fbody
------------------------------------------------------------- [cp-intro]
TE ⊢ (label Icp Ebody) : Tbody ! (maxeff (comefrom R) Fbody)

TE ⊢ Ecp : (controlpointof Tval R) ! Fcp    TE ⊢ Eval : Tval ! Fval
-------------------------------------------------------------------- [cp-elim]
TE ⊢ (jump Ecp Eval) : Tany ! (maxeff (goto R) Fcp Fval)

Figure 16.14  Type/effect rules for label and jump.
The type (controlpointof T R) describes a control point in region R that expects to receive a value of type T. An expression has effect (goto R) if it might jump to a control point in R, and it has effect (comefrom R) if it creates a control point in R that could be the target of a jump. (The effect name comefrom is a play on the name goto, and was inspired by a spoof article [Cla73] on a COME FROM statement dual to a GOTO statement.) Although regions represent areas of memory in store effects, they represent sets of control points in control effects, and can have other meanings for other kinds of effects.

The FLARE/E type/effect system can be extended to handle control effects with the two rules in Figure 16.14. In the [cp-intro] rule, (label Icp Ebody) introduces a control point with type (controlpointof Tbody R) into the type environment in which Ebody is type-checked. The type of the label expression must be the same whether Ebody returns normally (without encountering a jump) or a jump is performed to the named control point. This constrains the received value type in the controlpointof type to be the same as the type Tbody of Ebody. The effect of the label expression includes (comefrom R) to indicate that it introduces a control point in region R.

The [cp-elim] rule requires that in (jump Ecp Eval) the type of Ecp must be (controlpointof Tval R), where the received value type Tval must match the type of the supplied value Eval. The effect of a jump expression includes (goto R) to model its control-point-jumping behavior. The jump expression has an unconstrained type, Tany, that is determined by the context in which it is used. For example, in

(* 2 (if (< y 0) (jump exit y) x))
the jump expression has type int to match the type of x. But in

(* 2 (if (scor (< y 0) (jump exit x)) y x))
the jump expression must have type bool because it appears in a context that requires a boolean value.

Returning to our example,

Eproc1 = (abs (x y)
           (+ 1 (label exit
                  (* 2 (if (< y 0) (jump exit y) x)))))

exit has type (controlpointof int cp1), where cp1 is a control region. The expression (jump exit y) has type int and effect (goto cp1). The label expression has type int and an effect, (maxeff (comefrom cp1) (goto cp1)), describing that it establishes a control point in region cp1 that is the target of a jump that may be performed in its body.

In this simple example, the control effects (comefrom cp1) and (goto cp1) are completely local to the label expression. So a system that supports effect masking (Section 16.2.4) can delete them from the effect of the label expression and the latent effect of the abstraction, making these effects pure. This highlights that effect masking works for all effects, including control effects and store effects. When a control effect in region R can be masked from expression E, it means that no part of the program outside E will be subject to unexpected control transfers with respect to the continuation associated with R. Effect masking of control effects is powerful because it allows module implementers to use control transfers internally, while allowing clients of the modules to insist that these internal control transfers not alter the clients' control flow. In a system using explicit types and effects at module boundaries, a client can guarantee this invariant by ensuring that it does not call module procedures with control effects.

As an example where control effects cannot be deleted, consider:

Eproc2 = (label exit
           (abs (y) (if (= y 0)
                        (jump exit (abs (z) z))
                        (+ 1 y))))
exit has type (controlpointof int cp1), where cp1 is a control region. The expression (jump exit y) has type int and effect (goto cp1). The label expression has type int and an effect, (maxeff (comefrom cp1) (goto cp1)), describing that it establishes a control point in region cp1 that is the target of jump that may be performed in its body. In this simple example, the control effects (comefrom cp1) and (goto cp1) are completely local to the label expression. So a system that supports effect masking (Section 16.2.4) can delete them from the effect of the label expression and the latent effect of the abstraction, making these effects pure. This highlights that effect masking works for all effects, including control effects and store effects. When a control effect in region R can be masked from expression E , it means that no part of the program outside E will be subject to unexpected control transfers with respect to the continuation associated with R. Effect masking of control effects is powerful because it allows module implementers to use control transfers internally, while allowing clients of the modules to insist that these internal control transfers not alter the clients’ control flow. In a system using explicit types and effects at module boundaries, a client can guarantee this invariant by ensuring that it does not call module procedures with control effects. As an example where control effects cannot be deleted, consider: Eproc2 = (label exit (abs (y) (if (= y 0) (jump exit (abs (z) z)) (+ 1 y))))
In this example, evaluating the label expression returns the procedure created by (abs (y) . . . ) without peforming any jumps. This procedure behaves like an incrementing procedure when called on a nonzero argument. But applying it to 0 has the bizarre effect of returning from the label expression a second time with the identity procedure instead! What are the types in this example? Let’s assume that the control point for exit is in region cp2. Then (abs (y) . . .) must have type Tproc2 = (-> (int) (goto cp2) int)
because it takes an integer y, returns an integer (+ 1 y), and may jump to exit. The type of exit must be (controlpointof Tproc2 cp2), because the [cp-intro] rule requires the received value type of the control point to be the same as the body type. The type of (abs (z) z) must also be Tproc2, because the [cp-elim] rule requires the received value type of the control point to match the type of the value supplied to jump. Finally, the label expression has type Tproc2 and effect (comefrom cp2), which does not include a goto effect because no jump can be performed by evaluating the label expression.

Because cp2 appears in the type Tproc2 of Eproc2, the (comefrom cp2) effect cannot be deleted from Eproc2 via effect masking. This effect tracks the fact that the procedure resulting from Eproc2 can jump back into Eproc2 if it is called with the argument 0. Since the impurity of Eproc2 is externally observable (see Exercise 16.17), the control effect cannot be deleted.

Exercise 16.17

a. Assuming Eproc2 is the expression studied above, what is the value of the following expression?

(let ((g Eproc2)
      (h Eproc2))
  (list (g 1) (h 1) (h 0)))
b. Based on your answer to part a, argue that Eproc2 cannot be a pure expression.

Exercise 16.18  Extend the Algorithm Z type/effect reconstruction rules to handle label and jump.

Exercise 16.19  Control effects can be used to describe the behavior of the procedure cwcc (see Section 9.4.4).

a. Give a type schema for cwcc that is as general as possible.

b. Show how your type schema for cwcc can be instantiated in the following FLARE/E expressions:

i.  Eproc1′ = (abs (x y)
                (+ 1 (cwcc (abs (exit)
                             (* 2 (if (< y 0) (exit y) x))))))

ii. Eproc2′ = (cwcc (abs (exit)
                      (abs (y) (if (= y 0)
                                   (exit (abs (z) z))
                                   (+ 1 y)))))
c. Consider the following FLARE/E abstraction:

Eproc3 = (abs (x y)
           (+ 1 (cwcc (abs (exit)
                        (* 2 (if (scor (< y 0) (exit x))
                                 (exit y)
                                 x))))))

i. Explain why Eproc3 is ill typed in FLARE/E.

ii. In an explicitly typed dialect of FLARE/E with universal (i.e., forall) types, (1) give a type for cwcc and (2) write a well-typed version of Eproc3 with appropriate explicit type annotations.

iii. Convert Eproc3 to a well-typed FLARE/E abstraction that uses label and jump instead of cwcc.

iv. What feature of the label and jump type/effect rules makes the well-typedness of your converted abstraction possible?

v. Show that your converted abstraction can be given a pure latent effect in a version of FLARE/E with the [effect-masking] rule.
16.3.2 Dynamic Variables
In a dynamically scoped language (see Section 7.2.1), dynamically bound variables (i.e., the free variables of a procedure) take their meaning from where the procedure is called rather than where it is defined. References to dynamically bound variables can be tracked by an effect system in which (1) the effect of an expression is the set of dynamically bound variables it might reference and (2) procedure types are extended to have the form

(-> (Targ*) ((Idyn Tdyn)*) Tresult)

Each binding (Idyn Tdyn) serves both as a kind of latent effect (each name Idyn is a dynamically bound variable that may be referenced wherever the procedure is called) and as a way to check (using Tdyn) that the dynamically bound variable is used with the right type at every invocation of the procedure. This sketch for how effects can be used to give types to dynamic variables is fleshed out in Exercise 16.20.

Exercise 16.20  Dinah McScoop likes both dynamic scoping and explicit types, so she creates a new language, DIFLEX, that includes both! The syntax of DIFLEX is like FLEX, except for the definition and type of procedures, which have been modified as follows:

E ∈ Exp ::= . . . | (abs ((Ifml Tfml)*) ((Idyn Tdyn)*) Ebody)
T ∈ Type ::= . . . | (-> (Targ*) ((Idyn Tdyn)*) Tresult)
In abs, the first list of identifiers and types, ((Ifml Tfml)*), specifies the formal parameters of the procedure and their types. The second list, ((Idyn Tdyn)*), specifies the names and types of the dynamically bound identifiers (all non-parameter identifiers) that appear in Ebody. Procedure types include the names and types of dynamically bound identifiers in addition to the usual parameter type list and result type.

As usual, in a procedure application, the procedure's parameter types must match the types of the actual arguments. Because DIFLEX is dynamically scoped, the types of the dynamically bound identifiers in the procedure type must match the types of these identifiers wherever the procedure is called, not where it is defined. For example, the following expression is well typed in Dinah's language because the dynamically bound variable x is a boolean where procedure p is called (the fact that x is an integer where p is created is irrelevant):

(let ((x 1))
  (let ((p (abs ((y int)) ((x bool)) (if x y 0))))
    (let ((x #t))
      (p 1)))) {This expression evaluates to 1}

In contrast, the following expression is ill typed:

(let ((x #t))
  (let ((p (abs ((y int)) ((x bool)) (if x y 0))))
    (let ((x 1))
      (p 1)))) {x is not a boolean in this call to p}
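The dynamic lookup that makes the first example well behaved (and the second erroneous) can be pictured with a small Python sketch of our own; the helper names are hypothetical, and since Python itself is lexically scoped, we thread a stack of dynamic frames by hand:

dyn_stack = [{}]          # stack of dynamic-binding frames, innermost last

def dyn_lookup(name):
    # Search the *callers'* bindings, newest first.
    for frame in reversed(dyn_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def with_dyn(bindings, thunk):
    dyn_stack.append(bindings)
    try:
        return thunk()
    finally:
        dyn_stack.pop()

# p = (abs ((y int)) ((x bool)) (if x y 0))
def p(y):
    return y if dyn_lookup("x") else 0

# (let ((x #t)) (p 1)): x is a boolean where p is called
assert with_dyn({"x": True}, lambda: p(1)) == 1
# Calling p where x is bound to an int would find an int where a bool
# is expected, which is precisely the error Dinah's type system rules
# out statically.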
Dinah realizes that uses of dynamic variables can be tracked by an effect system. Dinah extends the FLEX typing framework to employ type/use judgments of the form

TE ⊢ E : T & IS

which means "in type environment TE, E has type T and may use identifiers from the set IS." Assume IS ∈ IdSet = P(Ident). For example, Dinah's type/use rule for variable references is:

TE ⊢ I : TE(I) & {I}   [var]
Dinah provides the following examples of type/use judgments for her system:

{x : int} ⊢ (prim + 1 x) : int & {x}
{} ⊢ (let ((x 1)) (prim + 1 x)) : int & {}
{x : bool, y : int} ⊢ (if x y 0) : int & {x, y}
{x : int} ⊢ (abs ((y int)) ((x bool)) (if x y 0)) : (-> (int) ((x bool)) int) & {}
{x : bool, p : (-> (int) ((x bool)) int)} ⊢ (p 1) : int & {p, x}
In the final type judgment, note that the identifier set for (p 1) includes x because the procedure p has a dynamic reference to x.

a. Write type/use rules for the following constructs: let, abs, and procedure application.
b. Briefly argue that your type/use rules guarantee that in a well-typed program, an identifier can never be unbound or used with an incorrect type.

c. Dinah's friend Thai Ping observes that the following DIFLEX expression is ill typed:

(abs ((b bool)) ()
  (let ((f (abs ((x int)) ((c int)) (prim + x c)))
        (g (abs ((y int)) ((d int)) (prim * y d))))
    (let ((c 1) (d 2))
      ((if b f g) 3))))
i. Explain why this expression is ill typed.

ii. Thai suggests that expressions like this can be made well typed by extending the type/use rules for DIFLEX with a type-inclusion rule (see Figure 12.1 on page 703). Define an appropriate notion of subtyping for DIFLEX's procedure types, and show how Thai's example is well typed in the presence of type inclusion.
d. Based on ideas from the DIFLEX language and type system, develop an explicitly typed version of the DYNALEX language (Exercise 7.25 on page 349) named DYNAFLEX. Describe the syntax and type system of DYNAFLEX. Hint: Since DYNAFLEX has two namespaces — a static one and a dynamic one — the type system needs two type environments.
16.3.3 Exceptions
Recall from Section 9.6 that exception-handling mechanisms specify how to deal with abnormal conditions in a program. A type/effect system can be used to track the exceptions that might be raised when evaluating an expression. One way to do this in FLARE/E is to use new base effects with the form (raises Itag Tinfo) to indicate that an expression raises an exception with tag Itag and information of type Tinfo. If an expression E handles an exception with tag Itag, the (raises Itag Tinfo) effect can be removed from the effect set of E. Exercise 16.21 explores a specialized effect system that tracks only exceptions and not other effects.

Java is an example of an explicitly typed language with an effect system for exceptions. It tracks a subset of exceptions known as checked exceptions. If a checked exception is thrown (Java's terminology for raising an exception) in the body of a method, then it must either be explicitly handled by a try/catch statement or explicitly listed in a throws clause of the method specification. For example, a Java method that displays the first n characters of a text file might have the following specification:

public static void readFirst (int n, String filename)
  throws FileNotFoundException, EOFException;
The throws clause indicates that the readFirst method may not handle the case where there is no file named filename (in which case a FileNotFoundException is thrown) and might attempt to read past the end of a file (in which case an EOFException is thrown; EOF stands for "End Of File"). The throws clause serves as an explicit latent exception effect for the method. Any method invoking readFirst in its body must either handle the exceptions it throws or explicitly declare them in its own throws clause.

Exercise 16.21  Bud Lojack wants to add exceptions with termination semantics to FLARE. He extends the FLARE expression and type syntax as follows:

E ∈ Exp ::= . . . | (raise Itag Einfo) | (handle Itag Ehandler Ebody)
T ∈ Type ::= . . . | (handlerof Tinfo)
The dynamic semantics of the raise and handle constructs is described in Section 9.6. Bud's new type (handlerof Tinfo) stands for an exception handler that processes exception information with type Tinfo. In Bud's new type rules, the handlerof type is used to communicate type information from the point of the raise to the point of the handle:

TE ⊢ Ehandler : (-> (Tinfo) Tbody)    TE[Itag : (handlerof Tinfo)] ⊢ Ebody : Tbody
---------------------------------------------------------------------------------- [handle]
TE ⊢ (handle Itag Ehandler Ebody) : Tbody

TE ⊢ Itag : (handlerof Tinfo)    TE ⊢ Einfo : Tinfo
---------------------------------------------------- [raise]
TE ⊢ (raise Itag Einfo) : Traise
Note that because raise never returns in termination semantics, the type Traise of a raise expression (like the type of an error expression) can be any type required by the surrounding context.

Bud proudly shows his new rules to type guru Thai Ping, who is unimpressed. "Your rules make the type system unsound!" exclaims Thai. "You've assumed that exception handlers are statically bound when they're actually dynamically bound."

a. Explain what Thai means. In particular, provide expressions Eouter and Einner such that the following expression is well typed according to Bud's rules, but generates a dynamic type error:

(handle an-exn Eouter
  (let ((f (abs () (raise an-exn 17))))
    (handle an-exn Einner
      (f))))
Thai observes that raising an exception Itag with a value of type Tinfo is similar to referencing a dynamic variable named Itag bound to a handler procedure with type (-> (Tinfo) Tresult), where Tresult can be different for different handlers associated with Itag. Since dynamic variables can be typed using an effect system (see Exercise 16.20), Thai aims to develop a similar effect system for typing exceptions. Thai's system is based on "effects" from the following domain:

ES ∈ ExceptionSpec = Ident ⇀ Type

An exception specification ES is a partial function mapping the name of an exception that can be raised to the type of the information value with which it is raised. For example, representing a partial function as a set of bindings, the exception specification {bounds → int, wrong → bool} indicates that the bounds exception is raised with an integer and the wrong exception is raised with a boolean. Two exception specifications can be combined via ⊕ or ⊖, which require that they agree on names for which they are both defined:

ES1 ⊕ ES2 = λI . if I ∈ dom(ES1) then (ES1 I)
                 else if I ∈ dom(ES2) then (ES2 I)
                 else undefined,
    if ∀I ∈ (dom(ES1) ∩ dom(ES2)) . (ES1 I) ≈ (ES2 I);
    undefined otherwise

ES1 ⊖ ES2 = λI . if (I ∈ dom(ES1)) ∧ (I ∉ dom(ES2)) then (ES1 I)
                 else undefined,
    if ∀I ∈ (dom(ES1) ∩ dom(ES2)) . (ES1 I) ≈ (ES2 I);
    undefined otherwise
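Representing an exception specification as a finite map, the ⊕ combination (and the masking performed by handle) can be sketched in a few lines of Python; this is our own illustration with hypothetical helper names, not part of Thai's formal system:

def combine(es1, es2):
    # ES1 ⊕ ES2: merge two specifications, insisting that they agree
    # on any exception name they both mention.
    for name in es1.keys() & es2.keys():
        if es1[name] != es2[name]:
            raise ValueError("conflicting info types for " + name)
    return {**es1, **es2}

def mask(es, tag):
    # Remove a handled exception from a specification, as the
    # handle rule does.
    return {name: ty for name, ty in es.items() if name != tag}

assert combine({"bounds": "int"}, {"wrong": "bool"}) == \
       {"bounds": "int", "wrong": "bool"}
assert mask({"x": "int", "y": "symb"}, "x") == {"y": "symb"}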
In Thai’s type/exception system, judgments have the form TE E : T # ES
which means “in type environment TE , expression E has type T and may raise exceptions as specified by ES .” For example, the judgment TE Etest : bool # {x → int, y → symb}
indicates that if Etest returns normally, its value will be a boolean, but that evaluation of Etest could raise the exception x with an integer value or the exception y with a symbol value. Thai’s system guarantees that no other exceptions can be raised by Etest . Thai’s system also uses exception masking to remove exceptions from judgments when it is clear that they will be handled. E.g., the exception specification of TE (handle x (abs (z) (> z 0)) Etest ) : bool # {y → symb}
does not include x → int from Etest because the exception named x has been handled by the handle expression. Thai eliminates Bud’s handlerof from the FLARE type system and instead changes procedure types to carry a latent exception specification describing exceptions that might be raised when the procedure is applied :
T ∈ Type ::= . . . FLARE types except for -> . . . | (-> (Targ*) ESlat Tres)
Here are two of the type/exception rules from Thai's system:

TE ⊢ N : int # {}   [int]

TE ⊢ E1 : bool # ES1    TE ⊢ E2 : T # ES2    TE ⊢ E3 : T # ES3
---------------------------------------------------------------- [if]
TE ⊢ (if E1 E2 E3) : T # ES1 ⊕ ES2 ⊕ ES3
Thai seeks your help in fleshing out other parts of his type/exception system:

b. Give the type/exception rules for abs, procedure application, raise, and handle.

c. Give a type/exception derivation for the following expression, which should be well typed according to your rules:

(abs (n m)
  (handle e (abs (a) (not a))
    (let ((f (abs (x) (if (prim < x 0) (raise e x) (prim + x n)))))
      (prim < 0
        (if (handle e (abs (y) (prim > y n)) (prim = m (f n)))
            (handle e (abs (z) (if (prim = z n) (raise e #f) z))
              (prim * 2 (f m)))
            (handle e (raise e #t)
              (raise e (raise e (sym beardly)))))))))
d. Describe how to extend Thai's type/exception system so that it tracks errors generated by the error construct as well as exceptions.

e. Discuss the technical challenges that need to be addressed in order to modify Algorithm Z to automatically reconstruct types and exception specifications for Thai's system (FLARE+{raise, handle}). For simplicity, ignore all store effects and focus solely on tracking exception specifications. What do constraints on exception specifications look like? Can they always be solved?
16.3.4 Execution Cost Analysis
It is sometimes helpful to have an estimate for the cost of evaluating an expression. A cost might measure abstract units of time or other resources (e.g., memory space, database accesses, network bandwidth) required to evaluate the expression. An effect system can estimate the cost of evaluating an expression by (1) associating a cost effect with each expression and (2) extending procedure types to have a latent cost effect that is accounted for every time a procedure is called. Exercise 16.22 explores a simple cost system based on this idea. For practical cost systems, it must be possible to express costs that depend on the size of data structures (e.g., the length of a list or dimensions of a matrix) [RG94]. Cost systems can be helpful for parallel scheduling; two noninterfering expressions should be scheduled for parallel execution only if their execution times
are large enough to outweigh the overheads of the mechanism for parallelism. Cost systems also provide a simple way to conservatively determine which expressions must terminate and which might not. Cost systems can even be used to approximate the complexity of an algorithm [DJG92].

Exercise 16.22  In order to estimate the running time of FLARE programs, Sam Antics wants to develop a set of static rules that assign every expression a cost as well as a type. The cost of an expression is a conservative estimate of how long the expression will take to evaluate. Sam develops a type/cost system for Discount, a variant of FLARE in which procedure types carry latent cost information:

T ∈ Type ::= . . . FLARE types except for -> . . . | (-> (Targ*) Clat Tres)
C ∈ Cost ::= NT | loop | (sumc C*) | (maxc C*)
NT ∈ NatLit ::= 0 | 1 | 2 | . . .
For example, the Discount type (-> (int int) 5 bool) is the type of a procedure that takes two integers, returns a boolean result, and costs at most 5 abstract time units every time it is called. Sam formulates a cost analysis in Discount via type/cost judgments of the form

TE ⊢ E : T $ C

which means "in type environment TE, expression E has type T and cost C." For example, here are Sam's type/cost rules for integers and (nongeneric) variable references:

TE ⊢ N : int $ 1   [int]

TE ⊢ I : TE(I) $ 1   [var]
That is, Sam assigns both integers and variable references a cost of 1 abstract time unit. In addition, Sam specifies the following costs for some other Discount expressions:

• The cost of an abs expression is 2.

• The cost of an if expression is 1 more than the cost of the predicate expression plus the maximum of the costs of the two branch expressions.

• The cost of an n-argument procedure application is the sum of the cost of the operator expression, the cost of each operand expression, the latent cost of the operator, and n.

• The cost of an n-argument primitive application is the sum of the cost of each operand expression, the latent cost of the primitive operator (as specified in the primitive type environment TEprim), and n.

Here are some example types of primitive operators:

TEprim(+) = (-> (int int) 1 int)
TEprim(>) = (-> (int int) 1 bool)
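The cost arithmetic these rules rely on is simple enough to sketch in Python; this is our own toy encoding (using None for the infinite cost loop, with hypothetical function names), not part of Sam's system:

LOOP = None   # the abstract cost of a possibly diverging computation

def sumc(*costs):
    return LOOP if LOOP in costs else sum(costs)

def maxc(*costs):
    return LOOP if LOOP in costs else max(costs)

# Cost of (prim + a 7): operands a and 7 cost 1 each, the latent cost
# of + is 1, and n = 2 arguments are passed.
assert sumc(1, 1, 1, 2) == 5
# Any subcomputation that may diverge poisons the whole cost.
assert sumc(1, LOOP, 1) is LOOP
assert maxc(3, LOOP) is LOOP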
Here are some example judgments that hold in Sam's system:

{a : int} ⊢ (prim + a 7) : int $ 5
{a : int, b : int} ⊢ (prim > (prim + a 7) b) : bool $ 9
{a : int} ⊢ (abs (x) (prim > x a)) : (-> (int) 5 bool) $ 2
{a : int, gt : (-> (int) 5 bool)} ⊢ (gt 17) : bool $ 8
{a : int, b : int, gt : (-> (int) 5 bool)} ⊢ (if (gt b) (prim + b 1) 0) : int $ 14
The abstract cost loop is assigned to expressions that may diverge. For example, the expression

Ehang = (letrec ((hang (abs () (hang)))) (hang))

is assigned cost loop in Discount. Because it is undecidable whether an arbitrary expression will diverge, it is impossible to have a type/cost system in which exactly the diverging expressions have cost loop. So Sam settles for a system that makes a conservative approximation: every program that diverges will be assigned cost loop, but some programs that do not diverge will also be assigned loop.

The cost constructs (sumc C1 . . . Cn) and (maxc C1 . . . Cn) are used for denoting, respectively, the sum and maximum of the costs C1 . . . Cn, which may include nonnumeric costs like loop and cost identifiers (see part d). Sam's system ensures that sumc and maxc satisfy sensible cost-equivalence axioms, such as:

(sumc NT1 NT2) ≈c NT3, where N[[NT3]] = N[[NT1]] +Nat N[[NT2]]
(sumc loop NT) ≈c (sumc NT loop) ≈c (sumc loop loop) ≈c loop
(maxc NT1 NT2) ≈c NT3, where N[[NT3]] = (max N[[NT1]] N[[NT2]])
(maxc loop NT) ≈c (maxc NT loop) ≈c (maxc loop loop) ≈c loop

In Sam's system, such cost equivalences can implicitly be used wherever costs are mentioned.

a. Give type/cost rules for abs, procedure application, primitive application, and if.

b. Sam wants the following Discount expression to be well typed:

Eif = (if b
          (abs (x) (prim + x x))
          (abs (y) (prim + (prim + y y) y)))
But the types of the two branches, (-> (int) 5 int) and (-> (int) 9 int), are procedure types differing in their latent costs, which causes this expression to be ill typed. To fix this problem, define (1) a sensible cost-comparison relation ≤c, (2) a notion of subtyping in Discount, and (3) a type/cost inclusion rule for Discount (a variant of the [inclusion] rule in Figure 12.1 on page 703). Show that Eif is well typed with your extensions.
c. Define type/cost rules for monomorphic versions of let and letrec. Show why Ehang must be assigned cost loop using your rules.

d. Define type/cost rules for polymorphic versions of let and letrec and a rule for referencing a variable whose type is a generic type schema. You may assume that the Cost domain is extended to include description variables δ ∈ DescId that can stand for costs. Using your rules, give a type/cost derivation showing that the following expression is well typed:

(let ((app5 (abs (f) (f 5))))
  (if (app5 (abs (x) (prim > x 0)))
      (app5 (abs (y) y))
      (app5 (abs (z) (prim + z 1)))))

e. Discuss the technical challenges that need to be addressed in order to modify Algorithm Z to automatically reconstruct types and costs for Discount. For simplicity, ignore all store effects and focus solely on calculating costs. What do cost constraints look like? Can they always be solved?

f. In Sam's Discount type/cost system, every recursive procedure has latent cost loop. Since Discount uses recursion to express iteration, all iterations are conservatively assigned the infinite cost loop. While this is sound, it is not very useful. For example, it would be nice for an iteration summing the integers from 1 to n to have a finite cost that depends on n. Design a simple iteration construct that would allow assigning finite costs to some iterations, and discuss the technical issues that arise in the context of your construct.
16.3.5 Storage Deallocation and Lifetime Analysis
In implementations of languages like FLARE/E, it is often difficult to determine statically when a cell can no longer be referenced. For this reason, cells are typically allocated in a dynamically managed storage area called the heap, where they are reclaimed dynamically by a garbage collector (see Chapter 18). However, an effect system with regions enables a framework for the static allocation and deallocation of memory. The following expression illustrates the key idea:

(let ((my-cell (cell 1))    {assume this cell is in region rm}
      (your-cell (cell 2))) {assume this cell is in region ry}
  (pair (^ my-cell) (abs () (^ your-cell))))
The region rm for my-cell is completely local to the let expression and can be deleted from the effect of the let expression. This means that the allocation and all uses of my-cell occur within the let expression, so my-cell may be deallocated when the let expression is exited. In contrast, the region ry for your-cell appears in the type (pairof int (-> () (read ry) int)) of the let expression, indicating that your-cell must outlive the let expression. But if ry does not "escape" some enclosing expression E, it may be deallocated when E is exited.

Region-based static storage management can be formalized by extending the expressions of FLARE/E with a binding construct (letregion R E) that declares the region named R in the scope of the body expression E. In the dynamic semantics, this construct creates a new segment of memory named R in which cells may be allocated, evaluates E to a value V, and then deallocates the entire segment R before returning V. So memory is organized as a stack of segments: entering letregion pushes a new segment onto the stack, and exiting letregion pops its segment off the stack. We also replace the cell primitive by the kernel construct (cell E R), in which the region name R explicitly indicates the segment in which the cell should be allocated. We assume that letregion is used only to declare cell regions and that other regions, such as regions representing control points in control effects, are handled as before.

Using the region name R in the cell construct is sound only if (1) it is in the lexical scope of a letregion expression declaring R and (2) the cell cannot outlive (i.e., escape from the scope of) the letregion expression declaring R. Condition 1 can be expressed by requiring that a program body contain no free cell region names; i.e., all cell regions mentioned in the program body must be bound by an enclosing letregion. Condition 2 is expressed by the [letregion] type/effect rule in Figure 16.15.

TE ⊢ E : T ! F
--------------------------------- [letregion]
TE ⊢ (letregion R E) : T ! F′
where F[[F′]] = {BF | (BF ∈ F[[F]]) ∧ (R ∉ FrDescIds_eff[[BF]])}
      R ∉ FrDescIds_ty[[T]]                                      [export restriction]
      ∀I ∈ FrIds[[E]] . (R ∉ FrDescIds_ty[[TE(I)]])              [import restriction]
      ∀BF ∈ F[[F′]] . (BF ≠ (comefrom R′)) for any region R′     [control restriction]

Figure 16.15  The type/effect rule for region-based storage management.

This is a specialized version of the [effect-masking] rule in Figure 16.10 on page 973 guaranteeing that it is safe to deallocate the memory segment created by (letregion R E) once the evaluation of E is complete. The effect F′ of (letregion R E) contains all base effects in the effect F of E except for those that mention R. As in the [effect-masking] rule, the export and import restrictions of the [letregion] rule guarantee that R is only used locally and may safely be excluded from F′.

In a system without control transfers, the export and import restrictions are enough to justify that it is safe to deallocate the memory segment named by R, since no cell allocated in R can be referenced again upon termination of the
letregion expression. However, in the presence of control effects, an additional control restriction is necessary to guarantee that the rest of the program can never jump back into the letregion expression. If such a jump were to occur, the memory segment associated with R might be accessed after the termination of the letregion expression, and so deallocation of this segment would be unsafe. This possibility can be precluded by requiring that the letregion expression not have a comefrom effect, and thus cannot be the target of any control transfers.

The above cell example can be transformed to use letregion as follows:

(letregion rm
  (let ((my-cell (cell 1 rm))
        (your-cell (cell 2 ry))) {ry is free here but is presumably}
                                 {bound by an enclosing letregion.}
    (pair (^ my-cell) (abs () (^ your-cell)))))
This expression is well typed, so it is safe to deallocate the region rm containing my-cell upon exiting (letregion rm . . .). Although only one cell is allocated in a region in this example, in general arbitrarily many cells may be allocated in a single region. But an attempt to allocate your-cell in rm in this example would make the letregion expression ill typed because the export restriction would be violated. It is necessary to allocate your-cell in a separate region ry that is declared by some other letregion expression syntactically enclosing this one. In the worst case, ry might be declared by a top-level letregion that wraps the entire program body.

We have focused on the region-based storage management of cells, but any type of value (e.g., pairs, lists, procedures, and even integers and booleans) can be associated with regions of memory. In FLARE/E, all these values are immutable and so they have no effects observable by the programmer. However, even immutable values must be stored somewhere, and regions are useful for managing the storage of such values. In this context, effects and regions can be used to perform a static lifetime analysis that determines where in the program a value created at one point can still be "alive." (A closely related analysis is an escape analysis, which determines which values can escape the scope in which they were declared.) This is necessary for determining when the storage associated with the value can be deallocated. The lifetime analysis of immutable values is explored in Exercise 16.24.

A practical region-based storage management system requires a way to automatically determine the placement of letregion declarations and annotate cell expressions with region information while maintaining the well-typedness of a program. A crude approach is to wrap all letregions around the program body, but a more useful (and challenging!) goal is to make the scope of every letregion
as small as possible. Procedures that can be polymorphic in regions are helpful for shrinking the scope of letregions; see Exercise 16.23. One such region-based storage management system has been designed and implemented by Tofte and Talpin [TT97]. They developed and proved correct an algorithm for translating an implicitly typed functional language into a language with explicit letregion expressions, region annotations, and region-polymorphic procedures. Their system handles integers, procedures, and immutable pairs, all of which are allocated in regions, but it can easily be extended to mutable data as well.

Exercise 16.23

a. The following is a FLARE/E program in which the two cell expressions have been annotated with explicit regions. Add explicit letregion expressions declaring r1 and r2 so that (1) the resulting expression is well typed and (2) the scope of each letregion expression is as small as possible:

(flarek (a b)
  (let ((f (abs (x)
             (let ((p (cell (prim - x 1) r1))
                   (q (cell (prim + x 1) r2)))
               (prim pair (prim ^ p) q)))))
    (let ((s (prim fst (f a)))
          (t (prim snd (f b))))
      (prim + s (prim ^ t)))))
b. Sketch an algorithm for adding letregion declarations and explicit cell regions to a well-typed FLARE/E expression E so that (1) the resulting expression E′ is well typed and (2) the scope of each letregion expression in E′ is as small as possible. You may assume that you are given the complete type derivation for E.

c. Polly Morwicz observes that tighter letregion scopes can often be obtained if some procedures are region-polymorphic. For example, using the pabs and pcall constructs from Figure 12.9 on page 731, she modifies the procedure f in the program from part a to abstract over region r2:

(flarek (a b)
  (let ((f (pabs (r2)
             (abs (x)
               (let ((p (cell (prim - x 1) r1))
                     (q (cell (prim + x 1) r2)))
                 (prim pair (prim ^ p) q))))))
    (let ((s (prim fst ((pcall f r3) a)))
          (t (prim snd ((pcall f r4) b))))
      (prim + s (prim ^ t)))))
Add explicit letregion expressions to Polly’s modified expression, striving to make all letregion scopes as small as possible.
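The stack discipline that letregion imposes can be mimicked directly in Python. The following sketch is our own illustration (with hypothetical names), not the Tofte and Talpin system: each region is a list of cells, and popping a region deallocates every cell in it at once.

from contextlib import contextmanager

region_stack = []

@contextmanager
def letregion():
    region = []                       # a fresh memory segment
    region_stack.append(region)
    try:
        yield region
    finally:
        region_stack.pop()            # deallocate the whole segment

def cell(value, region):
    ref = [value]                     # a one-slot mutable cell
    region.append(ref)
    return ref

with letregion() as ry:
    your_cell = cell(2, ry)
    with letregion() as rm:
        my_cell = cell(1, rm)
        result = (my_cell[0], lambda: your_cell[0])
    # rm (and my-cell) is gone here, but your-cell is still live in ry,
    # matching the typing of the letregion example above.
    assert result[0] == 1 and result[1]() == 2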
Exercise 16.24  Thai Ping wants to use regions and effects to perform lifetime analysis and storage management for pairs and other immutable values in FLARE/E. He begins by modifying the type grammar of FLARE/E to extend a pairof type to include the region where it is stored:

T ∈ Type ::= . . . all types except (pairof T T) . . . | (pairof T T R)

He also extends the effect grammar to include a new access effect constructor:

FCR ∈ EffectConstructor = . . . ∪ {access}
Thai explains that the effect (access R) is a lifetime effect used both for allocating an immutable pair value in region R and for extracting its components.

a. Write the type schemas for pair and fst in the primitive type environment used by the FLARE/E implicit type/effect system.

b. Explain how access effects and the [letregion] rule can be used to aggressively deallocate pair p in the following expression:
The FLARE/E pair primitive does not take explicit regions, but you may assume that the scope of the region R declared by (letregion R E ) includes any pairof types and access effects that appear in the type derivation of E . c. access effects are used for the lifetime analysis of immutable values and should not affect the purity of expressions. For example, the expressions (prim pair 1 2) and (prim fst p) both have effects of the form (access R), but should still be considered pure since they do not have store effects or control effects that could cause them to interfere with other expressions. Describe how to modify the FLARE/E notion of purity to handle lifetime effects. d. FLARE/E lists and procedures can also be modified to support a region-based lifetime analysis similar to the one Thai developed for pairs. Describe all the changes that need to be made to the FLARE/E syntax and type rules to accomplish this.
16.3.6 Control Flow Analysis
In function-oriented languages, a control flow analysis tracks the flow of higher-order procedures in a program. (Although traditionally used to track the flow of procedure values, the same analysis can easily be extended to track the flow of any kind of value.) Each abstraction in a program can be annotated with a distinct label, just as each cell expression can be associated with a region name. Then every procedure type can be annotated with the set of labels
describing the abstractions that could be the source of that type. Although these labels are not effects, such an analysis can be accomplished using the machinery of an effect system.

Consider the following FLARE expression, in which each abstraction has been annotated with an explicit integer label:

(let ((inc (abs 1 (x) (+ x 1)))
      (dbl (abs 2 (y) (* y 2)))
      (app3 (abs 3 (f) (f 3)))
      (app4 (abs 4 (g) (g 4))))
  (list (app3 inc) (app4 inc) (app4 dbl)))
The annotated type of inc would be (-> (int) {1} int) and that of dbl would be (-> (int) {2} int). A type/label system can determine that the argument g to app4 has type (-> (int) {1, 2} int) (because it might be either the inc or dbl procedure) while the argument f to app3 has type (-> (int) {1} int) (because it can only be the inc procedure).

Knowing which procedures reach which call sites can guide program optimizations. For example, if only one procedure reaches a call site, the call can be replaced by an inlined version of the procedure's body. Information from a control flow analysis is particularly important for choosing procedure representations in a compiler (see Section 17.10.2).

A control flow analysis is simpler than the lifetime analysis discussed in Section 16.3.5. In lifetime analysis, the latent effect in a procedure type describes all values that might be referenced when the procedure is called (see Exercise 16.24). In a control flow analysis, the annotation on a procedure type just describes which source abstractions might flow to the expression with that type. Consult [NNH98] for an extensive discussion of control flow analysis and how it can be expressed in an effect system.
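The label flow in the example can be computed by a tiny propagation pass. The following Python fragment is our own toy illustration of the idea (hypothetical names, and nothing like a full control flow analysis):

flows = {}   # variable name -> set of abstraction labels reaching it

def flow(var, labels):
    flows.setdefault(var, set()).update(labels)

flow("inc", {1})                        # inc is abstraction 1
flow("dbl", {2})                        # dbl is abstraction 2
flow("f", flows["inc"])                 # (app3 inc)
flow("g", flows["inc"] | flows["dbl"])  # (app4 inc) and (app4 dbl)

assert flows["f"] == {1} and flows["g"] == {1, 2}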
16.3.7 Concurrent Behavior
Thus far we have studied only sequential programs, in which execution can be visualized as the progress of a single control token that moves through the program, performing each operation it encounters along the way. The path taken by the control token is known as a control thread. This single thread of control can be viewed as a time line along which all operations performed by the computation are arranged in a total order. For example, a computation that sequentially performs the operations A, B, C, and D can be depicted as the following total order, where X → Y means that X is performed before Y:

→ A → B → C → D →
In a concurrent program, multiple control threads may be active at the same time, allowing the time relationship between operations to be a partial order rather than a total order. Here is a sample partial order that declares that A precedes B along one control thread and C precedes D along another control thread, but does not otherwise constrain the operation order: A→B → fork
@
@
join →
C→D
The diagram introduces two new nodes labeled fork and join. The purpose of these nodes is to split and merge control threads so that a computation has a distinguished starting edge and a distinguished ending edge. A control token reaching a fork node splits into two subtokens on the output edges of the node. When tokens are on both input edges of a join node, they merge into a single token on the output node. If only one input edge to a join has a token, it cannot move forward until the other edge contains a token. Any node like join that forces one control token to wait for another is said to synchronize them. There are many linguistic mechanisms for specifying concurrency and synchronization, some of which are described in the Web Supplement to this book.

Suppose that on any step of a multithreaded computation, only one control token is allowed to move. (There are concurrent models in which multiple control tokens can move in a single step, but we shall not consider these.) Then a particular execution of a concurrent program is associated with the sequence of its observable actions, which we shall call an interleaving. The behavior of the concurrent program is the set of all possible interleavings that can be exhibited by the program. For example, assuming that all operations (except for fork and join) are observable, then the behavior of the branching diagram above is:

{ABCD, ACBD, ACDB, CABD, CADB, CDAB}

The behavior of a concurrent program may be the empty set (no interleavings are possible), a singleton set (exactly one interleaving is possible), or a set with more than one element (many interleavings are possible). A concurrent program with more than one interleaving exhibits nondeterministic behavior. Although sequential programs can exhibit nondeterminism (for example, a purely sequential language can exhibit nondeterminism if operand expressions in procedure applications may be evaluated in any order or if it supports an (either E1 E2) construct that returns the value of one of E1 or E2), nondeterminism is most commonly associated with concurrent programs.
In some models of concurrency, concurrently executing threads can communicate by having one thread send a value to another thread over a channel to which they share access. Communication establishes a timing constraint between the threads: a value sent over a channel cannot be received by the receiving thread until it has been sent by the sending thread. We can extend FLARE/E to be a channel-based concurrent language by adding the following four constructs:

(channel): Create and return a new channel.

(send! Echan Eval): First evaluate Echan to the channel value Vchan, then evaluate Eval to the value Vval, and then send Vval over the channel Vchan. It is an error if Vchan is not a channel.

(receive! Echan): Evaluate Echan to the channel value Vchan and then return the next value received from the channel Vchan. It is an error if Vchan is not a channel.

(cobegin E1 . . . En): Evaluate each of E1 . . . En in a separate thread and return the value of En.

For example, here is a procedure that uses three channels to communicate between three threads:

Econcabs = (abs (x)
             (let ((a (channel)) (b (channel)) (c (channel)))
               (cobegin
                 (send! c (+ 1 (receive! a)))
                 (send! c (* 2 (receive! b)))
                 (begin
                   (send! a (- x 3))
                   (send! b (/ x 4))
                   (+ (receive! c) (receive! c))))))
Since + is commutative, the order in which the values are received from channel c by the third thread does not affect the value returned by the procedure. But the returned value would depend on the order if the + were replaced by a noncommutative operator like -. An effect system can be used to analyze the communication behavior of a channel-based concurrent program. If we interpret a region R as denoting an abstract channel, then we can model sending a value over channel R with an effect (out R) and model the receipt of a value from this channel with an effect (in R). In a simple communication-effect system (such as the one described in [JG89b]), in and out effects can be tracked just like the store and control effects
studied earlier. Such a system can determine that an expression communicates on certain channels, but the ACUI nature of the maxeff effect combiner makes it impossible to determine any ordering on these communications. E.g., if channels a, b, and c in the above example are in regions ra, rb, and rc, respectively, then the body of Econcabs has the effect

(maxeff (in ra) (in rb) (in rc) (out ra) (out rb) (out rc))
which does not indicate the relative order of the communication actions or the number of times they are performed. However, the information is sufficient to show that the communication effects are completely local to the procedure body and so can be deleted by effect masking.

In more sophisticated communication-effect systems (such as the one described in [ANN97]), the ordering of communication effects is modeled by specifying the sequential and parallel composition of effects. For example, in such a system, the effect of the cobegin expression in Econcabs might be:

(par (seq (in ra) (out rc))
     (seq (in rb) (out rc))
     (seq (out ra) (out rb) (in rc) (in rc)))
where seq is used to combine effects for sequential execution and par is used to combine effects for parallel execution. This shows the ordering of channel operations in each thread and the fact that the third thread receives two values from the channel in region rc. Such a specification resembles the kinds of specifications used in process algebra frameworks like Communicating Sequential Processes (CSP) [Hoa85] and the Calculus of Communicating Systems (CCS) [Mil89].
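For intuition, the three-thread example Econcabs can be approximated in Python using queues as channels. This is our own sketch under the assumption that cobegin maps onto ordinary threads; it is not a definition of FLARE/E's semantics:

import threading, queue

def concabs(x):
    a, b, c = queue.Queue(), queue.Queue(), queue.Queue()
    # (send! c (+ 1 (receive! a))) and (send! c (* 2 (receive! b)))
    threading.Thread(target=lambda: c.put(1 + a.get())).start()
    threading.Thread(target=lambda: c.put(2 * b.get())).start()
    a.put(x - 3)
    b.put(x / 4)
    # (+ (receive! c) (receive! c)): order-independent because + commutes
    return c.get() + c.get()

assert concabs(12) == (1 + (12 - 3)) + (2 * (12 / 4))   # 10 + 6.0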
16.3.8 Mobile Code Security
In modern computer systems, it is often desirable for applications on a local computer to automatically download and execute mobile code from remote Internet sites. But this is a dangerous prospect, since executing arbitrary mobile code might destroy or steal local information or use the local computer’s resources for nefarious purposes like sending spam email, attacking Web servers, or spreading viruses. One application of effects is to provide mobile code security by labeling primitive operations with latent effects that describe their actions. For example, all procedures that write on the local computer’s disk could carry a write-disk latent effect. Other latent effects could be assigned to display and networking procedures. These effects create a verifiable, succinct summary of the actions of
imported mobile code. These effects can be presented to a security checker (which might involve a user dialogue box) that accepts or rejects mobile code on the basis of its effects.

Since mobile code is downloaded and executed on the fly, any security analysis performed by the local computer must be relatively quick in order to be practical. Although some properties can efficiently be deduced by analyzing the downloaded code from scratch, other important properties are too expensive for the local computer to reconstruct. For example, for arbitrary low-level code, it is difficult to prove memory safety properties like the following: (1) no variable is accessed until it is initialized; (2) no out-of-bounds access is made to an array; and (3) there is no dereference of a pointer to a deallocated memory block. (See Chapter 18 for a discussion of memory allocation and deallocation.)

This problem can be addressed by requiring the code producer to include explicit type and effect annotations in the mobile code that are sufficient to allow the code consumer to rapidly verify security properties. For example, the types might encode a proof that no array access is out of bounds, and a simple type-checking procedure by the code consumer could verify this proof. Generating the appropriate annotations might be expensive for the producer, but the consumer can use type and effect rules to quickly verify that the annotations are valid. This is an example of a technique called proof-carrying code [NL98, AF00], in which mobile code carries a representation of proofs of various properties in addition to the executable code. It is used for properties that are difficult for the consumer to determine from raw low-level code, but are easy for the consumer to verify if the producer of the low-level code (which presumably has access to more information, in the form of the high-level source program) provides a proof.
Notes

Effect systems were introduced by Lucassen and Gifford in [Luc87, LG88], which outlined the need for a new kind of static analysis for describing program behavior. Early experiments with the design and use of effect systems were performed in the context of the FX-87 programming language [GJLS87], an explicitly typed language including effects and regions. Later versions of FX incorporated region and effect inference [JG91]. Effects were used to guide standard compiler optimizations (e.g., common subexpression elimination, dead code elimination, and code hoisting) as well as to find opportunities for parallel evaluation [Luc87, HG88]. We explore effect-based code optimization in Section 17.6.2.
The first polymorphic type/effect reconstruction system was presented in [JG91]. The improved reconstruction systems in [TJ92, TJ94a] guaranteed principal types and minimal effects. Our Algorithm Z incorporates two key features of the improved systems in a derivation-style reconstruction algorithm: it (1) allows subeffecting via the [does] rule to compute minimal effects and (2) requires the latent effect of a procedure type to be a description variable, which simplifies the unification of procedure types and the solution of effect constraints. Without the second feature, it would be necessary to modify the unification algorithm to produce effect-equality constraints between the latent effects of two unified procedure types and to extend the effect-constraint solver to handle such equality constraints.

A wide variety of effect systems have been developed, including systems for cost accounting [DJG92, RG94, CW00], control effects [JG89a], and communication effects [JG89b]. The FX-91 programming language [GJSO92] included all of these features. Other examples of effect systems include control flow analysis [TJ94b], region-based memory management [TT97], behavior analysis for concurrency [ANN97], atomicity effects for concurrency [FQ03], register usage analysis [Aga97, AM03], and trace effects for verifying program safety properties [SSH08]. As noted in Section 16.3.3, Java has a simple effect system for tracking exceptions that can be thrown by a method [GJS96]. Monadic systems for expressing state can be extended with an effect system [Wad98].

For a detailed introduction to effect systems and a summary of work done in this area, see [TJ94a], [NNH98, Chapter 5], and [ANN99].
Part IV
Pragmatics
17 Compilation

Bless thee, Bottom! bless thee! thou art translated.
— William Shakespeare, A Midsummer Night's Dream, act 3, scene 1
17.1 Why Do We Study Compilation?
Compilation is the process of translating a high-level program into instructions that can be directly executed by a low-level machine, such as a microprocessor or a simple virtual machine. Our goal in this chapter is to use compilation to further our understanding of advanced programming language features, including the practical implications of language design choices. To be a good designer or user of programming languages, one must know not only how a computer carries out the instructions of a program (including how data are represented) but also the techniques by which a high-level program is converted into something that runs on an actual computer. In this chapter, we will show the relationship between the semantic tools developed earlier in the book and the practice of translating high-level language features to executable code.

Our approach to compilation is different from the approach taken in most compiler texts. We assume that the input program has already been parsed and is syntactically correct, thus ignoring issues of lexical analysis and parsing that are important in real compilers. We also assume that type and effect checking are performed by the reconstruction techniques we have already studied. Our focus will be a series of source-to-source program transformations that implement complex high-level naming, state, and control features by making them explicit in an FL-like intermediate compilation language. A key benefit of our approach is that it dispenses with traditional special-purpose compilation machinery like symbol tables, invocation frames, stacks, and basic blocks. These notions are uniformly represented as patterns in the structure of the intermediate code. The result of compilation will be a program in a restricted subset of the intermediate
language that can be viewed as instructions for a simple virtual register machine. In this way we avoid details of code generation that are important when targeting a real microprocessor. Throughout the compilation process, efficiency will take a back seat to clarity, modularity, expressiveness, and demonstrable correctness.

The notion of compilation by source-to-source transformation has a rich history. Beginning with Guy Steele's Rabbit compiler ([Ste78]), there is a long line of research compilers based on this approach. (See the notes at the end of this chapter for more details.) In homage to Rabbit, we will call our compiler Tortoise.

We study compilation for the following reasons:

• We can review many of the language features presented earlier in this book in a new light. By showing how programs can be transformed into low-level machine code, we arrive at a more concrete understanding of these features.

• We present some simple ways to implement language features by translation. These techniques can be useful in everyday programming, especially if your programming language doesn't support the features that you need.

• We will see how complex translations can be composed out of many simple passes. Although in practice these passes might be merged, we will discuss them separately for conceptual clarity.

• We will see that the inefficiencies that crop up in the compiler are a good motivation for studying static semantics. These inefficiencies can be addressed by a combination of two methods:

  • Developing smarter translation techniques that exploit information known at compile time.

  • Restricting source languages to make them more amenable to static analysis techniques.

  For example, we'll see (in Section 18.2.2) that dynamically typed languages imply a run-time overhead that can be reduced by clever techniques or eliminated by requiring the language to be statically typable.

We begin with an overview of the transformation-based architecture of Tortoise (Section 17.2). We then discuss the details of each transformation in turn (Sections 17.3–17.12).
17.2 Tortoise Architecture
17.2.1 Overview of Tortoise
The Tortoise compiler is organized into ten transformations that incrementally massage a source language program into code resembling register machine code (Figure 17.1). The input and output of each transformation are programs written either in dialects of FLARE or in dialects of an FL-like intermediate language named FIL that is defined later. The output of the compiler is a program in FILreg, a dialect of FIL whose constructs can be viewed as instructions for a low-level register machine. We review FLARE in this section and present the dialects of FIL later as they are needed.

We will see that dialects of FL (including FLARE) can be powerful intermediate languages for compilation. Many low-level machine details find a surprisingly convenient expression in FL-like languages. Some advantages of structuring our compiler as a series of source-to-source transformations on dialects of FL are:

• All the intermediate languages are closely related to FL, a language whose semantics we already understand well.

• When intermediate languages are closely related, compiler writers are more likely to develop modular stages and experiment with their ordering.

• The result of every transformation stage is executable source code in a dialect of FL. This facilitates reading and testing the transformation results using an interpreter (or compiler) for the dialect. Because the dialects are so similar, their interpreters are closely related. Indeed, modulo the verification of certain syntactic constraints, a single interpreter can be used for most of the dialects.

Each compiler transformation expects its input program to satisfy certain preconditions and produces output code that satisfies certain postconditions. These conditions will be stated explicitly in the formal specification of each transformation. They will help us understand the purpose of each transformation, and why the compiler is sound. A compiler is sound when it produces low-level code that faithfully implements the formal semantics of the compiler's source language. We will not formally prove the soundness of any of the transformations because such proofs can be very complex. Indeed, soundness proofs for some of these transformations have been the basis for Ph.D. dissertations! However, we will informally argue that the transformations are sound.
Figure 17.1 Organization of the Tortoise compiler. The pipeline of transformations and intermediate dialects is:

FLARE/V
  → Desugaring → Globalization → Assignment Conversion → FLARE
  → Type/Effect Reconstruction → FLARE
  → Translation → FIL
  → Renaming → CPS Conversion → FILcps
  → Closure Conversion → Lifting → FILlift
  → Register Allocation → FILreg

The initial transformations translate the FLARE/V source program to a FLARE program. This is translated into the FIL intermediate language and is then gradually transformed into a form that resembles register machine code.
Tortoise implements each transformation as a separate pass for clarity of presentation and to allow for experimentation. Although we will apply the transformations in a particular order in this chapter, other orders are possible. Our descriptions of the transformations will explore some alternative implementations and point out how different design choices affect the efficiency and semantics of the resulting code. We generally opt for simplicity over efficiency in our presentation.
17.2.2 The Compiler Source Language: FLARE/V
The source language of the Tortoise compiler is FLARE/V, a version of the FLARE language presented in Chapter 13 extended with mutable variables (using the set! construct from the FLAVAR language presented in Section 8.4). We include mutable variables in the source language because they are a standard feature in many languages and we wish to show how they can be automatically transformed into mutable cells (via the assignment conversion transformation in Section 17.5).

FLARE/V is a stateful, call-by-value, statically scoped, function-oriented, and statically typed language with type reconstruction that supports mutable cells, mutable variables, pairs, and homogeneous immutable lists. For convenience, the complete syntax of FLARE/V is presented in Figures 17.2 and 17.3. This is the same as the presentation of FLARE in Figure 13.23 on page 814 except that (1) FLARE/V includes mutable variables via the set! construct and (2) the desugaring of a full-language program into a kernel program does not introduce bindings for standard identifiers like the names of primitive operations.1 All primitive names (such as *, >, and cons) may still be used as free identifiers in a FLARE/V program, where they denote global procedures performing the associated primitive operations, but this is implemented by the globalization transformation presented in Section 17.4 rather than via desugaring. As before, (prim * E1 E2) may be written as (* E1 E2) in almost any context. We say "almost any" because these names can be assigned and locally rebound like any other names. For example, the program

(flare (x y)
  (let ((- +))
    (begin (set! / *)
           (- (/ x x) (/ y y)))))

calculates the sum of the squares of x and y.

1 For simplicity, we reuse the program keywords flare and flarek for FLARE/V rather than introducing new ones.
Kernel Grammar

P ∈ Prog ::= (flarek (I*formal) Ebody)

E ∈ Exp ::= L | I | (error Ymessage) | (if Etest Ethen Eelse)
          | (set! Ivar Eval) | (prim Oprimop E*arg)
          | (abs (I*formal) Ebody) | (Erator E*rand)
          | (let ((Iname Edefn)*) Ebody) | (letrec ((Iname Edefn)*) Ebody)

L ∈ Lit ::= #u | B | N | (sym Y)
B ∈ BoolLit = {#t, #f} as in FL
N ∈ IntLit = as in FL and FLARE
Y ∈ SymLit = as in FL and FLARE

O ∈ Primop ::= + | - | * | / | %                          ; arithmetic ops
             | < | <= | = | != | > | >= | bool=? | sym=?  ; relational ops
             | not | and | or                             ; logical ops
             | pair | fst | snd                           ; pair ops
             | cons | car | cdr | null | null?            ; list ops
             | cell | ^ | := | cell=?                     ; mutable cell ops

Keyword = {abs, error, flarek, if, let, letrec, prim, set!, sym}
SugarKeyword = {begin, cond, def, flare, list, recur, scand, scor}
I ∈ Ident = SymLit − ({Y | Y begins with @} ∪ Keyword ∪ SugarKeyword)

Figure 17.2 Kernel grammar for the FLARE/V language.
Figure 17.4 presents a contrived but compact FLARE/V program that illustrates many features of the language, such as numbers, booleans, lists, locally defined recursive procedures, higher-order procedures, tail and nontail procedure calls (see Section 17.9.1 for a discussion of tail versus nontail calls), and mutable variables. We will use it as a running example throughout the rest of this chapter.

The revmap procedure takes a procedure f and a list elts of elements and returns a new list that is the reversal of the list obtained by applying f to each element of elts. The accumulation of the new list ans is performed by a local iterative loop procedure that is defined using the recur sugar, which abbreviates the declaration and invocation of a recursive procedure (see the worked instance below). The loop procedure performs an iteration in a single state variable xs denoting the unprocessed elements of elts. Although ans could easily be made a second argument to loop, here it is defined externally to loop and updated via set! to illustrate the use of a mutable variable.

The example program takes two integer arguments, a and b, and returns a list of the two booleans ((7 · a) > b) and (a > b). For example, on the inputs 6 and 17, the program returns the list ⟨true, false⟩.
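For instance, the recur in Figure 17.4 is shorthand for declaring loop with letrec and immediately invoking it. Instantiating the recur rule of Figure 17.3 (writing Eif for the if expression that forms the loop body):

(recur loop ((xs elts)) Eif)
ds (letrec ((loop (abs (xs) Eif)))
     (loop elts))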
Syntactic Sugar

(@Oprimop E^n_i=1) ds (prim Oprimop E^n_i=1)

(cond (else Edefault)) ds Edefault
(cond (Etest1 Ethen1) (Etesti Etheni)^n_i=2 (else Edefault))
  ds (if Etest1 Ethen1 (cond (Etesti Etheni)^n_i=2 (else Edefault)))

(scand) ds #t
(scand Econjunct E*rest) ds (if Econjunct (scand E*rest) #f)

(scor) ds #f
(scor Edisjunct E*rest) ds (if Edisjunct #t (scor E*rest))

(recur Iproc ((Ii Ei)^n_i=1) Ebody)
  ds (letrec ((Iproc (abs (I^n_i=1) Ebody))) (Iproc E^n_i=1))

(begin) ds #u
(begin E) ds E
(begin E1 E*rest) ds (let ((_ E1)) (begin E*rest)),
  where _ is a special identifier that can never be referenced

(list) ds (prim null)
(list E1 E*rest) ds (prim cons E1 (list E*rest))

(def (IprocName I*procFormal) EprocBody)
  ds (def IprocName (abs (I*procFormal) EprocBody))

(flare (I*pgmFormal) EpgmBody (def Inamei Edefni)^n_i=1)
  {Assume procedure defs already desugared to (def I E) by the previous rule.}
  ds (flarek (I*pgmFormal)
       {Compiler handles standard identifiers via globalization, not desugaring.}
       (letrec ((Inamei Edefni)^n_i=1) EpgmBody))

Figure 17.3 Syntactic sugar for the FLARE/V language.
(flare (a b)
  (let ((revmap
         (abs (f elts)
           (let ((ans (null)))
             (recur loop ((xs elts))
               (if (null? xs)
                   ans
                   (begin (set! ans (cons (f (car xs)) ans))
                          (loop (cdr xs)))))))))
    (revmap (abs (x) (> x b)) (list a (* a 7)))))

Figure 17.4 revmap program.
tf ∈ TransformFLARE/V = ExpFLARE/V → ExpFLARE/V

mapsubFLARE/V : ExpFLARE/V → TransformFLARE/V → ExpFLARE/V
mapsubFLARE/V[[L]] tf = L
mapsubFLARE/V[[I]] tf = I
mapsubFLARE/V[[(error Ymsg)]] tf = (error Ymsg)
mapsubFLARE/V[[(if Etest Ethen Eelse)]] tf = (if (tf Etest) (tf Ethen) (tf Eelse))
mapsubFLARE/V[[(set! Ivar Eval)]] tf = (set! Ivar (tf Eval))
mapsubFLARE/V[[(abs (I^n_i=1) Ebody)]] tf = (abs (I^n_i=1) (tf Ebody))
mapsubFLARE/V[[(Erator E^n_i=1)]] tf = ((tf Erator) (tf Ei)^n_i=1)
mapsubFLARE/V[[(prim O E^n_i=1)]] tf = (prim O (tf Ei)^n_i=1)
mapsubFLARE/V[[(let ((Ii Ei)^n_i=1) Ebody)]] tf = (let ((Ii (tf Ei))^n_i=1) (tf Ebody))
mapsubFLARE/V[[(letrec ((Ii Ei)^n_i=1) Ebody)]] tf = (letrec ((Ii (tf Ei))^n_i=1) (tf Ebody))

Figure 17.5 The mapsubFLARE/V function simplifies the specification of purely structural transformations.
17.2.3 Purely Structural Transformations
Most of the FLARE/V and FIL program transformations that we shall study can be described by functions that traverse the abstract syntax tree of the program and transform some of the tree nodes but leave most of the nodes unchanged. We will say that a transformation is purely structural for a given kind of tree node if applying it to such a node yields the same kind of node, in which each child node is a transformed version of the corresponding child of the original node. We formalize this notion for FLARE/V via the mapsubFLARE/V function defined in Figure 17.5. This function returns a copy of the given FLARE/V expression whose immediate subexpressions have been transformed by a given transformation tf. A FLARE/V transformation is purely structural for a given kind of node if its action on that node can be written as an application of mapsubFLARE/V. As an example of mapsubFLARE/V, consider a transformation T that rewrites every occurrence of (if (prim not E1) E2 E3) to (if E1 E3 E2). The fact that T is purely structural on all but if nodes is expressed via a single invocation of mapsubFLARE/V in the following definition:
T : ExpFLARE/V → ExpFLARE/V
T[[(if (prim not E1) E2 E3)]] = (if (T[[E1]]) (T[[E3]]) (T[[E2]]))
T[[E]] = mapsubFLARE/V[[E]] T, for all other expressions E

subexpsFLARE/V : ExpFLARE/V → Exp*FLARE/V
subexpsFLARE/V[[L]] = [ ]
subexpsFLARE/V[[I]] = [ ]
subexpsFLARE/V[[(error Ymsg)]] = [ ]
subexpsFLARE/V[[(if Etest Ethen Eelse)]] = [Etest, Ethen, Eelse]
subexpsFLARE/V[[(set! Ivar Eval)]] = [Eval]
subexpsFLARE/V[[(abs (I^n_i=1) Ebody)]] = [Ebody]
subexpsFLARE/V[[(Erator E^n_i=1)]] = [Erator, E1, ..., En]
subexpsFLARE/V[[(prim O E^n_i=1)]] = [E1, ..., En]
subexpsFLARE/V[[(let ((Ii Ei)^n_i=1) Ebody)]] = [E1, ..., En, Ebody]
subexpsFLARE/V[[(letrec ((Ii Ei)^n_i=1) Ebody)]] = [E1, ..., En, Ebody]

Figure 17.6 The subexpsFLARE/V function returns a sequence of all immediate subexpressions of a given FLARE/V expression.
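As a worked instance of the transformation T defined above:

T[[(if (prim not b) 1 2)]] = (if b 2 1)

since the subexpressions b, 1, and 2 are handled by the purely structural mapsubFLARE/V case, which leaves them unchanged.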
When manipulating expressions, it is sometimes helpful to extract from an expression a collection of its immediate subexpressions. Figure 17.6 defines a subexpsFLARE/V function that returns a sequence of all child expressions of a given FLARE/V expression.
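For example:

subexpsFLARE/V[[(let ((y (prim + x 1))) (set! z y))]] = [(prim + x 1), (set! z y)]

Only the immediate subexpressions are returned; the nested x and 1 inside the prim application are not extracted.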
17.3 Transformation 1: Desugaring
The first pass of the Tortoise compiler performs desugaring, converting the convenient syntax of FLARE/V into a simpler kernel subset of the language. The advantage of having the first transformation desugar the program is that subsequent analyses and transformations are simpler to write and prove correct because there are fewer syntactic forms to consider. Moreover, subsequent transformations do not require modification if the language is extended with new syntactic shorthands. We will provide preconditions and postconditions for each of the Tortoise transformations. In the case of desugaring, these are:
Preconditions: The input to the desugaring transformation is a well-formed full FLARE/V program.

Postconditions: The output of the desugaring transformation is a well-formed kernel FLARE/V program.

We will say that a program is well formed in a language when it satisfies the grammar of the language — i.e., it does not contain any syntactic errors. There is an additional postcondition that we expect for desugaring (and all other transformations we study): the output program should have the same behavior as the input program. This is a fundamental property of each compilation stage that we will not explicitly state in every postcondition. One consequence of this property is that if the input program never encounters a dynamic type error, then neither does the output program. For dialects of FLARE, we can use a notion of well-typedness to conservatively approximate which programs never encounter a dynamic type error. (Although we have not formally described a type system for full FLARE/V, it is possible to define one by extending the type system of kernel FLARE with type rules for set! and all the syntactic sugar constructs.) We expect that Tortoise stages transforming programs in these dialects should preserve well-typedness.

The desugaring process for FLARE/V is similar to the rewriting approach to desugaring summarized in Figures 6.6 and 6.7 on pages 232 and 233, so we will not repeat the details of the transformation process here. Figure 17.7 shows the result of desugaring the revmap example introduced in Figure 17.4. The (recur loop . . .) desugars into a letrec, the begin desugars into a let that binds the special variable _ (which we assume is never referenced), and the list desugars into a null-terminated nested sequence of conses.
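For instance, the scand (short-circuit and) sugar unwinds by repeated application of the rules in Figure 17.3:

(scand (> x 0) (< x 10))
ds (if (> x 0) (scand (< x 10)) #f)
ds (if (> x 0) (if (< x 10) (scand) #f) #f)
ds (if (> x 0) (if (< x 10) #t #f) #f)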
17.4 Transformation 2: Globalization
In general, a program unit being compiled may contain free identifiers that reference externally defined values in standard libraries or other program units. Such free identifiers must somehow be resolved via a name resolution process before they are referenced during program execution. Depending on the nature of the free identifiers, name resolution can take place during compilation, during a linking phase that takes place after compilation but before execution (see Section 15.1), or during the execution of the program unit. In cases where name resolution takes place after compilation, the compiler may still require some information about the free identifiers, such as their types, even though their values may be unknown.
(flarek (a b)
  (let ((revmap
         (abs (f elts)
           (let ((ans (null)))
             (letrec ((loop (abs (xs)
                              (if (null? xs)
                                  ans
                                  (let ((_ (set! ans (cons (f (car xs)) ans))))
                                    (loop (cdr xs)))))))
               (loop elts))))))
    (revmap (abs (x) (> x b))
            (prim cons a (prim cons (* a 7) (prim null))))))

Figure 17.7 revmap program after desugaring.
In the Tortoise compiler, we consider a very simple form of compile-time linking that resolves free references to standard identifiers that name primitive operators, such as +, *, and cons.

The Wrapping Strategy

One approach is the wrapping strategy, specified in Figure 17.8: the GW function wraps the program body in a let that binds each free identifier I of the program to an abstraction, ABS[[I]], that performs the corresponding primitive operation. The clauses defining ABS are:

ABS[[I]] = (abs (I′^n_i=1) (prim I I′^n_i=1)),
  where I ∈ PrimopFLARE/V, I′^n_i=1 are fresh, and either
  TE_prim(I) = (-> (T^n_i=1) Tres) or TE_prim(I) = (generic (τ^m_j=1) (-> (T^n_i=1) Tres))

ABS[[I]] = undefined, where I ∉ PrimopFLARE/V

Figure 17.8 The wrapping approach to globalization.
The wrapping strategy used here includes bindings for only the standard identifiers actually used in the program rather than all those that are supported by the language. For example, the wrapping strategy transforms the program

(flarek (x y) (+ (* x x) (* y y)))

into

(flarek (x y)
  (let ((+ (abs (v.0 v.1) (prim + v.0 v.1)))
        (* (abs (v.2 v.3) (prim * v.2 v.3))))
    (+ (* x x) (* y y))))

We assume that identifiers ending in a period followed by a number (such as v.0 and v.1) are names that are freshly generated during the compilation process.

Constructing an abstraction for a primitive operator (via ABS) requires knowing the number of arguments that it takes. In FLARE/V, this can be determined from the type of the primitive operator name in the primitive type environment, TE_prim. ABS is a partial function because it is undefined for identifiers that are not the names of primitive operators. wrap is also a partial function because it is undefined if any invocation of ABS in its definition is undefined. Similarly, GW is undefined if the invocation of wrap in its definition is undefined; this is how the failure of the globalization transformation is modeled in the case where a free identifier in the program is not the name of a primitive operator. The wrapping strategy can be extended to handle standard identifiers that are not the names of primitive operators (see Exercise 17.2).
The Inlining Strategy

A drawback of the wrapping strategy is that global procedures are invoked via the generic procedure-calling mechanism rather than the mechanism for invoking primitive operators (prim). We will see in later stages of the compiler that the latter is handled far more efficiently than the former. This suggests an alternative approach in which calls to global procedures are transformed into primitive applications. Replacing a procedure call by an instantiated version of its body is known as inlining, so we shall call this the inlining strategy for globalization. Using the inlining strategy, the sum-of-squares program is transformed into:

(flarek (x y) (prim + (prim * x x) (prim * y y)))
There are three situations that need to be handled carefully in the inlining strategy for globalization:

1. A reference to a global procedure can be converted to an instance of prim only if it occurs in the rator position of a procedure application. References in other positions must be handled either by wrapping or by converting them to abstractions. Consider the expression

   (cons + (cons * (null)))
which makes a list of two procedures. The occurrences of cons and null can be transformed into prims, but the + and * cannot be. They can, however, be turned into abstractions containing prims:

   (prim cons (abs (v.0 v.1) (prim + v.0 v.1))
         (prim cons (abs (v.2 v.3) (prim * v.2 v.3))
               (prim null)))
Alternatively, we can "lift" the abstractions for + and * to the top of the enclosing program and name them, as in the wrapping approach.

2. In languages like FLARE/V, where local identifiers may have the same name as global standard identifiers for primitive operators, care must be taken to distinguish references to global and local identifiers.2 For example, in the program (flare (x) (let ((+ *)) (- (+ 2 x) 3))), the invocation of + in (+ 2 x) cannot be inlined, but the invocation of - can be:

   (flare (x)
     (let ((+ (abs (v.0 v.1) (prim * v.0 v.1))))
       (prim - (+ 2 x) 3)))

2 Many programming languages avoid this and related problems by treating primitive operator names as reserved keywords that may not be used as identifiers in declarations or assignments. This allows compiler writers to inline all primitives.
3. In FLARE/V, the values associated with global primitive identifier names can be modified by set!. For example, consider

   (flarek (x y)
     (* (+ x (let ((_ (set! + -))) y))
        (+ x y)))
in which the first occurrence of + denotes addition and the second occurrence denotes subtraction. It would clearly be incorrect to replace the second occurrence by an inlined addition primitive. Correctly inlining addition for the first occurrence and subtraction for the second occurrence is possible in this case, but can be justified only by a sophisticated effect analysis. A simple conservative way to address this problem in the inlining strategy is to use wrapping rather than inlining for any global name that is mutated somewhere in the program. For the above example, this yields:

   (flarek (x y)
     (let ((+ (abs (v.2 v.3) (prim + v.2 v.3))))
       (prim * (+ x (let ((_ (set! + (abs (v.0 v.1) (prim - v.0 v.1)))))
                      y))
               (+ x y))))
All of the above issues are handled by the definition of the inlining approach to globalization in Figure 17.9. The GI_prog function uses MutIds_prog (defined in Figure 17.10) to determine the mutated free identifiers of a program — i.e., the free identifiers that are targets of assignments — and wraps the program body in abstractions for these. All other free identifiers should name primitives that may be inlined in call positions or expanded to abstractions (via ABS from Figure 17.8) in other positions. The identifier-set argument to GI_exp keeps track of the unmutated free identifiers in the program that have not been locally redeclared. Again, the undefined cases of partial functions are used to model the situations in which globalization fails.

Figure 17.11 shows our revmap example after the globalization stage using the inlining strategy. In this case, all references to free identifiers have been converted to primitive applications. In this and subsequent examples, we "resugar" primitive applications (prim O . . .) to (@O . . .) to make the code more concise.

Exercise 17.1 What is the result of globalizing the following program using (1) the wrapping strategy and (2) the inlining strategy?

(flare (* /)
  (+ (let ((+ *)) (- + 1))
     (let ((* -)) (* / 2))))
IS ∈ IdSet = P(IdentFLARE/V)

GI_prog : ProgFLARE/V ⇀ ProgFLARE/V
GI_prog[[P]] = (flarek (I^n_i=1) (wrap[[GI_exp[[Ebody]] IS_unmuts]] IS_muts)),
  where P = (flarek (I^n_i=1) Ebody),
  IS_muts = MutIds_prog[[P]], IS_unmuts = (FrIds[[P]]) − IS_muts,
  wrap is defined in Figure 17.8, and MutIds_prog is defined in Figure 17.10

GI_exp : ExpFLARE/V → IdSet ⇀ ExpFLARE/V
GI_exp[[(Irator E^n_i=1)]] IS =
  if Irator ∈ IS
  then if Irator ∈ PrimopFLARE/V
       then (prim Irator (GI_exp[[Ei]] IS)^n_i=1)
       else undefined end
  else (Irator (GI_exp[[Ei]] IS)^n_i=1) end

GI_exp[[I]] IS = if I ∈ IS then ABS[[I]] else I end,
  where ABS is defined in Figure 17.8

GI_exp[[(abs (I^n_i=1) Ebody)]] IS = (abs (I^n_i=1) (GI_exp[[Ebody]] (IS − ∪^n_i=1{Ii})))

GI_exp[[(let ((Ii Ei)^n_i=1) Ebody)]] IS
 = (let ((Ii (GI_exp[[Ei]] IS))^n_i=1) (GI_exp[[Ebody]] (IS − ∪^n_i=1{Ii})))

GI_exp[[(letrec ((Ii Ei)^n_i=1) Ebody)]] IS
 = (letrec ((Ii (GI_exp[[Ei]] IS′))^n_i=1) (GI_exp[[Ebody]] IS′)),
   where IS′ = IS − ∪^n_i=1{Ii}

GI_exp[[E]] IS = mapsubFLARE/V[[E]] (λEsub . GI_exp[[Esub]] IS), otherwise.

Figure 17.9 The inlining approach to globalization.
Exercise 17.2 The globalization strategies described in this section assume that all standard identifiers name primitive procedures, but a standard library typically contains other kinds of entities. Describe how to extend globalization (both the wrapping and inlining strategies) to handle standard identifiers that name (1) literal values (e.g., true standing for #t) and (2) nonprimitive procedures (e.g., length and map from the FL standard library). Keep in mind that the nonprimitive procedures might be recursive or even mutually recursive.
17.5 Transformation 3: Assignment Conversion
Assignment conversion removes all mutable variables from a program by converting them to mutable cells. We will say that the resulting program is assignment-free because it contains no occurrences of the set! construct.
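As a small preview (a sketch; the full transformation is defined in Figure 17.12 below), a program that assigns to its parameter x, such as

(flarek (x) (let ((_ (set! x (@+ x 1)))) x))

is converted to

(flarek (x)
  (let ((x (@cell x)))
    (let ((_ (@:= x (@+ (@^ x) 1))))
      (@^ x))))

in which x is bound to an explicit cell, the assignment becomes a @:= on the cell, and each reference becomes a @^.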
MutIds_prog : ProgFLARE/V → P(IdentFLARE/V)
MutIds_prog[[(flarek (I^n_i=1) Ebody)]] = MutIds[[Ebody]] − ∪^n_i=1{Ii}

MutIds : ExpFLARE/V → P(IdentFLARE/V)
MutIds[[(set! I E)]] = {I} ∪ MutIds[[E]]
MutIds[[(abs (I^n_i=1) Ebody)]] = MutIds[[Ebody]] − ∪^n_i=1{Ii}
MutIds[[(let ((Ii Ei)^n_i=1) Ebody)]] = (∪^n_i=1 MutIds[[Ei]]) ∪ (MutIds[[Ebody]] − ∪^n_i=1{Ii})
MutIds[[(letrec ((Ii Ei)^n_i=1) Ebody)]] = ((∪^n_i=1 MutIds[[Ei]]) ∪ MutIds[[Ebody]]) − ∪^n_i=1{Ii}
MutIds[[E]] = ∪_{E′ ∈ subexps[[E]]} MutIds[[E′]], otherwise.
(Since literals, variable references, and error expressions have no subexpressions, they have no mutated free identifiers.)

Figure 17.10 Mutated free identifiers of FLARE/V expressions and programs.
Assignment conversion makes all mutable storage explicit and simplifies later passes by making all variable bindings immutable. After assignment conversion, all variables denote values rather than implicit cells containing values. A variable may be bound to an explicit cell value whose content varies with time, but the explicit cell value bound to the variable cannot change. As we will see later in the closure conversion stage (Section 17.10), assignment conversion is important because it allows environments to be treated as immutable data structures that can be freely shared and copied without concerns about side effects.

In our compiler, assignment conversion precedes type and effect reconstruction because reconstruction is simpler in a language without mutable variables (FLARE) than in one with them (FLARE/V). Additionally, in a language without mutable variables, all variable references are guaranteed to be pure, which enhances let-style polymorphism.

A straightforward approach to assignment conversion is to make an explicit cell for every variable in a given program. For example, the factorial program

(flarek (x)
  (let ((ans 1))
    (letrec ((loop (abs (n)
                     (if (@= n 0)
                         ans
                         (let ((_ (set! ans (@* n ans))))
                           (loop (@- n 1)))))))
      (loop x))))
(flare (a b)
  (let ((revmap
         (abs (f elts)
           (let ((ans (@null)))
             (letrec ((loop (abs (xs)
                              (if (@null? xs)
                                  ans
                                  (let ((_ (set! ans (@cons (f (@car xs)) ans))))
                                    (loop (@cdr xs)))))))
               (loop elts))))))
    (revmap (abs (x) (@> x b))
            (@cons a (@cons (@* a 7) (@null))))))

Figure 17.11 revmap example after globalization using inlining.
can be assignment-converted to

(flarek (x)
  (let ((x (@cell x)))
    (let ((ans (@cell 1)))
      (letrec ((loop (@cell (abs (n)
                              (let ((n (@cell n)))
                                (if (@= (@^ n) 0)
                                    (@^ ans)
                                    (let ((_ (@:= ans (@* (@^ n) (@^ ans)))))
                                      ((@^ loop) (@- (@^ n) 1)))))))))
        ((@^ loop) (@^ x))))))
In the converted program, each of the variables in the original program (x, ans, loop, n) is bound to an explicit cell. Each variable reference I in the original program is converted to a cell reference (@^ I), and each variable assignment (set! I E) in the original program is converted to a cell assignment of the form (@:= I E′), where E′ is the converted E.

The code generated by the naive approach to assignment conversion can contain many unnecessary cell allocations, references, and assignments. A cleverer strategy is to make explicit cells only for those variables that are mutated in the program. Determining exactly which variables are mutated when a program executes is undecidable. We employ a simple conservative syntactic approximation that defines a variable to be mutated if it is assigned within its scope. In the
factorial example, the alternative strategy yields the following program, in which only the ans variable is converted to a cell:

(flarek (x)
  (let ((ans (@cell 1)))
    (letrec ((loop (abs (n)
                     (if (@= n 0)
                         (@^ ans)
                         (let ((_ (@:= ans (@* n (@^ ans)))))
                           (loop (@- n 1)))))))
      (loop x))))
The improved approach to assignment conversion is formalized in Figure 17.12. The AC_prog function wraps the transformed body of a FLARE/V program in a let that binds each mutated program parameter (that is, each mutated free identifier in the body) to a cell. The free identifiers syntactically assigned within an expression are determined by the MutIds function defined in Figure 17.10. Expressions are transformed by the AC_exp function, whose second argument is the set of in-scope identifiers naming variables that have been transformed to cells. Processing of variable references transforms such identifiers to cell references; variable assignments are transformed to cell assignments.

The only other nontrivial cases for AC_exp are the binding constructs abs, let, and letrec. All of these cases use the partition function to partition the identifiers declared by these constructs into two sets: the mutated identifiers IS_M that are assigned somewhere in the given expressions, and the unmutated identifiers IS_U that are not assigned. In each of these cases, any subexpression in the scope of the declared identifiers is processed by AC_exp with an identifier set that includes IS_M but excludes IS_U. The exclusion is necessary to prevent the conversion of local unmutated variables that have the same name as external mutated variables. For example,

(flarek (x)
  (let ((_ (set! x (@* x 2))))
    ((abs (x) x) x)))

is converted to

(flarek (x)
  (let ((x (@cell x)))
    (let ((_ (@:= x (@* (@^ x) 2))))
      ((abs (x) x) (@^ x)))))

Even though the program parameter x is converted to a cell, the x in the abstraction body is not.
IS ∈ IdSet = P(IdentFLARE/V)

AC_prog : ProgFLARE/V → ProgFLARE
Preconditions: The input to AC_prog is a well-formed, closed, kernel FLARE/V program.
Postconditions: The output of AC_prog is a well-formed, closed, assignment-free, kernel FLARE program.

AC_prog[[(flarek (I^n_i=1) Ebody)]]
 = (flarek (I^n_i=1) (wrap-cells IS_muts (AC_exp[[Ebody]] IS_muts)))
   where IS_muts = MutIds[[Ebody]] and MutIds is defined in Figure 17.10.

AC_exp : ExpFLARE/V → IdSet → ExpFLARE
AC_exp[[I]] IS = if I ∈ IS then (@^ I) else I end
AC_exp[[(set! I E)]] IS = (@:= I (AC_exp[[E]] IS))
AC_exp[[(abs (I^n_i=1) Ebody)]] IS
 = let ⟨IS_M, IS_U⟩ be (partition {I1, ..., In} [Ebody])
   in (abs (I^n_i=1) (wrap-cells IS_M (AC_exp[[Ebody]] ((IS ∪ IS_M) − IS_U))))
AC_exp[[(let ((Ii Ei)^n_i=1) Ebody)]] IS
 = let ⟨IS_M, IS_U⟩ be (partition {I1, ..., In} [Ebody])
   in (let ((Ii (maybe-cell Ii IS_M (AC_exp[[Ei]] IS)))^n_i=1)
        (AC_exp[[Ebody]] ((IS ∪ IS_M) − IS_U)))
AC_exp[[(letrec ((Ii Ei)^n_i=1) Ebody)]] IS
 = let ⟨IS_M, IS_U⟩ be (partition {I1, ..., In} [E1, ..., En, Ebody])
   in (letrec ((Ii (maybe-cell Ii IS_M (AC_exp[[Ei]] IS′)))^n_i=1)
        (AC_exp[[Ebody]] IS′)),
   where IS′ = ((IS ∪ IS_M) − IS_U)
AC_exp[[E]] IS = mapsubFLARE/V[[E]] (λEsub . AC_exp[[Esub]] IS), otherwise.

wrap-cells : IdSet → ExpFLARE → ExpFLARE
wrap-cells {} E = E
wrap-cells {I1 ... In} E = (let ((Ii (@cell Ii))^n_i=1) E), where n ≥ 1.

partition : IdSet → Exp*FLARE/V → (IdSet × IdSet)
partition IS [E1 ... En] = let IS_M be ∪^n_i=1(MutIds[[Ei]]) in ⟨IS ∩ IS_M, IS − IS_M⟩

maybe-cell : Ident → IdSet → ExpFLARE/V → ExpFLARE/V
maybe-cell I IS E = if I ∈ IS then (@cell E) else E end

Figure 17.12 An assignment-conversion transformation that converts only those variables that are syntactically assigned in the program.
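As small worked instances of the auxiliary functions (with a and b standing for identifiers):

(partition {a, b} [(set! a (@+ a 1))]) = ⟨{a}, {b}⟩
(wrap-cells {a} (@^ a)) = (let ((a (@cell a))) (@^ a))
(maybe-cell a {a} 1) = (@cell 1)
(maybe-cell b {a} 2) = 2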
Abstractions are processed like programs in that the transformed abstraction body is wrapped in a let that binds each mutated identifier to a cell. This preserves the call-by-value semantics of FLARE/V, since an assignment to the formal parameter of an abstraction is transformed to a cell assignment that modifies the content of a cell that is allocated locally within the abstraction. The transformation can be modified to instead implement a call-by-reference semantics (see page 436), in which a formal parameter assignment is transformed to an assignment of a cell passed into the abstraction from the point of application (Exercise 17.5). In processing let and letrec, maybe-cell is used to wrap the binding expressions for mutated identifiers in applications of the cell primitive. These two forms are processed similarly except for scoping differences in their declared names.

Figure 17.13 shows our revmap example after the assignment-conversion stage. The only variable assigned in the input program is ans, and this is converted to a cell.

Intuitively, consistently converting a mutated variable along with its references and assignments into explicit cell operations should not change the observable behavior of a program. So we expect that assignment conversion should preserve both the type safety and the meaning of a program. However, formally proving such intuitions can be challenging. See [WS97] for a proof that a version of assignment conversion for Scheme is a meaning-preserving transformation.

Exercise 17.3 Show the result of assignment-converting the following programs using AC_prog:

(flarek (a b c)
  (let ((_ (set! a (@+ a c))))
    (abs (a d)
      (let ((_ (set! c (@* a b))))
        (set! d (@+ c d))))))

(flarek (x)
  (letrec ((f (abs (y) (@pair y (g (@- y 1)))))
           (g (abs (z) (let ((_ (set! g (abs (w) w)))) (f z)))))
    (f x)))
Exercise 17.4 Can assignment conversion be performed before globalization? Explain.

Exercise 17.5 Suppose that FLARE/V had a call-by-reference semantics rather than a call-by-value semantics for mutable variables (see Section 8.4). Modify the definition of assignment conversion so that it implements call-by-reference semantics. (Compare to Exercise 8.22 on page 439.)
(flare (a b)
  (let ((revmap
         (abs (f elts)
           (let ((ans (@cell (@null))))
             (letrec ((loop (abs (xs)
                              (if (@null? xs)
                                  (@^ ans)
                                  (let ((_ (@:= ans (@cons (f (@car xs))
                                                           (@^ ans)))))
                                    (loop (@cdr xs)))))))
               (loop elts))))))
    (revmap (abs (x) (@> x b))
            (@cons a (@cons (@* a 7) (@null))))))

Figure 17.13 revmap program after assignment conversion.
Exercise 17.6 A straightforward implementation of the AC prog and AC exp functions in Figure 17.12 is inefficient because (1) it traverses the AST of every declaration node at least twice: once to determine the free mutated identifiers, and once to transform the node; and (2) it may recalculate the free mutated identifiers for the same expression many times. Describe how to modify the assignment-conversion algorithm so that it works in a single traversal of the program AST and calculates the free mutated identifiers only once at every node. Note: You may need to modify the information stored in the nodes of a FLARE/V AST.
17.6 Transformation 4: Type/Effect Reconstruction
The fourth stage of the Tortoise compiler is type and effect reconstruction. Only well-typed FLARE programs are allowed to proceed through the rest of the compiler. The details of how types and effects are reconstructed were described earlier, in Section 16.2.3. Note that assignment conversion must precede this stage because type and effect reconstruction was defined for the FLARE language, which does not include set!.

Preconditions: The input to type/effect reconstruction is a well-formed, closed kernel FLARE program that is assignment-free.

Postconditions: The output of type/effect reconstruction is a valid, closed kernel FLARE program that is assignment-free.

We will use the term valid to describe a program or expression that is well formed and is guaranteed not to encounter a dynamic type error.
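For example (an illustration in the notation of Chapter 16; the reconstruction algorithm's actual output format may differ), reconstruction would assign a cell-reading procedure such as (abs (c) (@^ c)) the type schema

(generic (?t ?r)
  (-> ((cellof ?t ?r)) {type of the argument c}
      (read ?r)        {latent effect}
      ?t))             {type of the result}

It is precisely such latent effects that the effect-based optimizations of Section 17.6.2 depend on.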
17.6.1 Propagating Type and Effect Information
Although neither FLARE nor FIL (the intermediate language to be used from the next compilation stage on) has explicit types or effects, this does not mean that the type and effect information generated by the FLARE type/effect reconstruction phase is thrown away. This information can be passed through the compiler stages via a separate channel, where it is appropriately transformed by each pass. In an actual implementation, this information might be stored in abstract syntax tree nodes for FLARE and FIL expressions, in symbol tables mapping variable names to their types, or in explicit type/effect derivation trees. We assume that this type and effect information is available for later stages, where it can be used to guide the compilation process. Similarly, the results from other static analyses, such as flow information [NNH98, DWM+01], could be computed at this stage and passed along to other compiler stages.

An alternative approach used in many modern research compilers is to use so-called typed intermediate languages (TILs) that carry explicit type information (possibly including effect, flow, and other analysis information) through all stages of the compiler. In these systems, program transformations effectively transform type derivations of programs. The fact that each program manipulated by a TIL-based compiler is well typed has several advantages. The compiler can avoid generating code to check for run-time type errors, because these are provably impossible. The explicit type information carried by a TIL can be inspected to guide compilation (e.g., determining clever representations for certain types) and to implement run-time operations (such as tag-free garbage collection and checking safety properties of dynamically linked code). It also serves as an important tool for debugging a compiler implementation: if the output of a transformation doesn't type-check, the transformation must have a bug.

The reason that we do not use TILs in our presentation is to keep our compiler simple. TILs typically require a sophisticated type system with universal and existential types. Specifying each compiler stage becomes more complicated because it transforms not only expressions but their types. The explicit type information is often larger than the code it describes, which makes it impractical to show the result of compilation of even relatively simple expressions. See [MWCG99] for a presentation of using TILs to translate a high-level language all the way down into a typed assembly language.
17.6.2 Effect-based Code Optimization
The effect information reconstructed for a FLARE program is important for enabling many standard code optimizations performed by a compiler. We now discuss some of these in the context of the Tortoise compiler.
Many program transformations require knowledge about expression interference (see Section 8.3.6). In our system, two expressions interfere if they both write to the same region or if one has a read effect on a region the other has an init or write effect on. A pure expression does not interfere with any other expression because it does not depend on the store in any way. For example, if two expressions interfere, it is unsafe to reorder them relative to each other, since this could change the order of operations on the store locations manipulated by both expressions. But if two expressions do not interfere, then it may be possible to reorder them, execute them in parallel, or perform other improvements.

As a simple example of how effects can enable code optimizations, we demonstrate how the following FLARE abstraction can be improved if certain effect information is known:

(abs (n)
  (letrec ((loop (abs (i)
                   (if (@= i 0)
                       (@^ x)
                       (begin (h (f i) (g i))
                              (@:= x (k (g (f i))))
                              (@:= x (h (g i) (k n)))
                              (loop (@- i 1)))))))
    (loop n)))
Assume that this abstraction appears in a scope where x is a cell in region rx and f, g, h, and k are procedures with the following latent effects:

Procedure   Latent Effect
f           (read rx)
g           (maxeff (read ry) (write ry))
h           (maxeff (read rz) (write rz))
k           pure
Since the latent effects of f and g do not overlap, (f i) and (g i) do not interfere, and may be executed in parallel. This means that in a computer with multiple processing units, the expressions can be executed at the same time on different processing units. This is an improvement because it allows (f i) and (g i) to be executed in the maximum of the execution times for the two expressions rather than the sum of their times.3

3 In practice, there is often an additional overhead associated with parallel execution; the individual execution times must be big enough to justify this overhead.

If FLARE is extended with a letpar binding construct whose binding definition expressions are executed in parallel, then the begin expression in our example can be transformed to:
(letpar ((a (f i)) (b (g i)))
  (begin (h a b)
         (@:= x (k (g (f i))))
         (@:= x (h (g i) (k n)))
         (loop (@- i 1))))
Extending FLARE with mutable arrays, each of which has an associated region, would further expand opportunities for parallelism. For example, given two arrays in distinct regions, loops to sum their elements could be executed in parallel.

If an expression occurs more than once and it does not interfere with itself or any intervening expressions, then the result of the first occurrence can be named and the name can be used for the subsequent occurrences. This is known as common subexpression elimination. For example, the only effect of (f i) is (read rx), so it does not interfere with the invocations of f, g, and h (none of which has a (write rx) effect) that appear before the second occurrence. Since the first occurrence of (f i) already has the name a, the second occurrence of (f i) can be replaced by a:

(letpar ((a (f i)) (b (g i)))
  (begin (h a b)
         (@:= x (k (g a))) {(f i) replaced by a}
         (@:= x (h (g i) (k n)))
         (loop (@- i 1))))
Although (g i) also appears twice, its second occurrence cannot be eliminated because it interferes with the first occurrence as well as with (g a). Because g both reads and writes region ry, the second (g i) may have a different value than the first one.

When an expression does not contribute to a program in its value or its effect, it may be removed via a process known as dead code elimination. For example, the second assignment expression, (@:= x (h (g i) (k n))), does not read rx before writing it, so the first assignment to x, in (@:= x (k (g a))), is unnecessary. This leaves (k (g a)), which cannot be entirely eliminated because (g a) writes to region ry, which is read later by (g i). But the invocation of k can be eliminated because it is pure and its result is not used:

(letpar ((a (f i)) (b (g i)))
  (begin (h a b)
         (g a) {assignment to x and call to k eliminated}
         (@:= x (h (g i) (k n)))
         (loop (@- i 1))))
It might seem unlikely that a programmer would ever write dead code, but it occurs in practice for a variety of reasons. For example, the assumptions in place when the code is originally written may no longer hold when the code is later modified. In our example, perhaps g and/or h initially had a latent (read rx) effect justifying the first assignment to x, but the procedures were later changed to remove this effect, and the programmer neglected to remove the first assignment to x. Perhaps the dead code was not written by a human but was created by an automatic program generator or was the result of transforming another program. Generators and transformers can be simpler to build when they are allowed to produce code that contains inefficiencies (such as common subexpressions and dead code) that are cleaned up by later optimization phases.

When an expression in the body of a procedure or loop is guaranteed to have the same value for every invocation of the procedure or loop, it may be lifted out of the body via a transformation called code hoisting. In our example, since k is a pure procedure and n is an immutable variable defined outside the loop procedure, the invocation (k n) in the body of loop always has the same value. We can hoist it outside the definition of loop so that it is calculated only once rather than for every invocation of loop:

(abs (n)
  (let ((c (k n))) {(k n) has been hoisted outside loop}
    (letrec ((loop (abs (i)
                     (if (@= i 0)
                         (@^ x)
                         (letpar ((a (f i)) (b (g i)))
                           (begin (h a b)
                                  (g a)
                                  (@:= x (h (g i) c)) {c replaces (k n)}
                                  (loop (@- i 1))))))))
      (loop n))))
Note that if the k in (k n) were replaced by f or g, the expression could not be hoisted. The loop body writes to regions (rx and ry) that are read by these procedures, so (f n) and (g n) are not guaranteed to be loop-invariant.

In each of the optimizations we have mentioned, effect information is critical for justifying the optimization. Without any effect information, we would need to conservatively assume that all invocations of f, g, h, and k are impure and interfere with each other and with the assignments to x. With these conservative assumptions, none of the optimizations we performed on our example would be permissible!
17.7 Transformation 5: Translation
In this transformation, a kernel FLARE program is translated into the FIL intermediate language. All subsequent transformations are performed on FIL programs. We first present the FIL language and then describe how to transform FLARE to FIL.
17.7.1 The Compiler Intermediate Language: FIL
The main stages of our transformation-based compiler use an intermediate language that we call FIL, for Functional Intermediate Language. Like FLARE, FIL is a stateful, call-by-value, statically scoped, function-oriented language. However, FIL is simpler than FLARE in two important ways:

1. FIL supports fewer features than FLARE. It does not have a recursion construct (letrec) or an assignment construct (set!), and it represents both cells and pairs with a single form of mutable product. So specifying FIL transformations requires fewer cases than FLARE transformations.

2. Unlike FLARE, FIL does not have a formal type system and does not support type reconstruction. Although all of the remaining transformations can be expressed in a typed framework, the type systems and transformations are rather complex to describe. Specifying these transformations in FIL is much simpler. However, we will not completely disregard type and effect information. As discussed later (page 1035), we will assume that certain type and effect information is preserved by FIL programs, but will not formally describe how this is accomplished.

The Syntax of FIL

The syntax of FIL is specified in Figure 17.14. FIL is similar to many of the stateful variants of FL that we have studied. Some notable features of FIL are:

• As in FLARE, multiargument abstractions and applications are hardwired into the kernel rather than being treated as syntactic sugar, and the abstraction keyword is abs. Unlike in FLARE, FIL applications have an explicit app keyword.

• As in FLARE, multibinding let expressions are considered kernel expressions rather than sugar for applications of explicit abstractions.
Kernel Grammar

P ∈ ProgFIL ::= (fil (I*formal) Ebody)

E ∈ ExpFIL ::= L | I | (error Ymessage)
             | (if Etest Ethen Eelse) | (prim Oprimop E*arg)
             | (abs (I*formal) Ebody) | (app Erator E*rand)
             | (let ((Iname Edefn)*) Ebody)

L ∈ LitFIL ::= #u | B | N | (sym Y)
B ∈ BoolLit = {#t, #f} as in FLARE/V.
N ∈ IntLit = as in FLARE/V.
J ∈ PosLit = {1, 2, 3, . . .}
Y ∈ SymLit = as in FLARE/V.

O ∈ PrimopFIL ::= + | - | * | / | %                          ; arithmetic ops
                | < | <= | = | != | > | >= | bool=? | sym=?  ; relational ops
                | not | and | or                             ; logical ops
                | cons | car | cdr | null | null?            ; list ops
                | mprod | (mget J) | (mset! J) | mprod=?     ; mut. prod. ops
                | . . . other primitives will be added as needed . . .

KeywordFIL = {abs, app, error, fil, if, let, let*, prim, sym}
I ∈ IdentFIL = SymLit − ({Y | Y begins with @} ∪ KeywordFIL)

Syntactic Sugar

(@mget J Emprod) ds (prim (mget J) Emprod)
(@mset! J Emprod Enew) ds (prim (mset! J) Emprod Enew)
(@Oop E^n_i=1) ds (prim Oop E^n_i=1), where Oop ∉ {(mget J), (mset! J)}
(let* () Ebody) ds Ebody
(let* ((I1 E1) (Irest Erest)*) Ebody) ds (let ((I1 E1)) (let* ((Irest Erest)*) Ebody))

Figure 17.14 Syntax of FIL, the Tortoise compiler intermediate language.
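For example, the let* sugar unwinds into nested lets by repeated rewriting:

(let* ((a 1) (b (@+ a 2))) (@* a b))
ds (let ((a 1)) (let* ((b (@+ a 2))) (@* a b)))
ds (let ((a 1)) (let ((b (@+ a 2))) (let* () (@* a b))))
ds (let ((a 1)) (let ((b (@+ a 2))) (@* a b)))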
• Unlike FLARE/V, FIL does not have mutable variables (i.e., no set!). But FIL does have mutable products (also known as mutable tuples), which are created via mprod, whose component slots are accessed via mget and changed via mset!, and which are tested for equality (i.e., same location in the store) via mprod=?. We treat mget and mset! as "indexed primitives" (mget Jindex) and (mset! Jindex) in which the primitive operator includes the index Jindex of the manipulated component slot. If we wrote (prim mget Eindex Emp), this would imply that the index could be calculated by an arbitrary expression Eindex when in fact it must be a positive integer literal Jindex. So we instead
write (prim (mget Jindex) Emp) (and similarly for mset!). Treating mget and mset! as primitives rather than as kernel constructs simplifies the definition of several transformations.

• Unlike FLARE, FIL does not include cells and pairs; both are implemented as mutable products.4

• Unlike FLARE, FIL does not have any explicit kernel expression form (such as letrec) for recursive definitions. It is assumed that the "knot-tying" of recursion is instead performed by setting the components of mutable products. This is the approach taken in the translation from FLARE to FIL.

• Other data include integers, booleans, symbols, and immutable lists, all of which are in FLARE.

• Unlike FLARE, FIL does not support globally bound standard identifiers for procedures like + and cons; every primitive operation must be invoked explicitly via prim (or its @ sugar).

(fil (a b)
  (let ((revmap
         (abs (f elts)
           (let* ((ans (@mprod (@null)))
                  (loop (@mprod #u))
                  (_ (@mset! 1 loop
                       (abs (xs)
                         (if (@null? xs)
                             (@mget 1 ans)
                             (let ((_ (@mset! 1 ans
                                        (@cons (app f (@car xs))
                                               (@mget 1 ans)))))
                               (app (@mget 1 loop) (@cdr xs))))))))
             (app (@mget 1 loop) elts)))))
    (app revmap
         (abs (x) (@> x b))
         (@cons a (@cons (@* a 7) (@null))))))

Figure 17.17 revmap program after translation.
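The encoding of FLARE cells as one-slot mutable products is visible in the figure above (ans becomes an mprod manipulated with (mget 1) and (mset! 1)). As a sketch of the representation used for cells and pairs (the fst/snd slot numbering is an assumption for illustration; the translation's actual clauses may differ):

(@cell E)        becomes (@mprod E)
(@^ Ecell)       becomes (@mget 1 Ecell)
(@:= Ecell Enew) becomes (@mset! 1 Ecell Enew)
(@pair E1 E2)    becomes (@mprod E1 E2)
(@fst Epair)     becomes (@mget 1 Epair)
(@snd Epair)     becomes (@mget 2 Epair)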
17.8 Transformation 6: Renaming

A program is uniquely named if no identifier is declared more than once within it. For example, the expression

((abs (x) (x w)) (abs (x) (let ((x (* x 2))) (+ x 1))))

is not uniquely named, because x is declared three times. In contrast, the alpha-equivalent expression

((abs (x) (x w)) (abs (y) (let ((z (* y 2))) (+ z 1))))

is uniquely named.
Some of the subsequent program transformations we will study require that programs are uniquely named to avoid problems with variable capture or otherwise simplify the transformation. Here we describe a renaming transformation whose output program is a uniquely named version of the input program.

The renaming transformation is presented in Figure 17.18. In this transformation, every bound identifier in the program is replaced by a fresh identifier. Fresh names are introduced in all declaration constructs: the fil program construct and abs and let expressions. Renaming environments in the domain RenEnv are used to associate these fresh names with the original names and communicate the renamings to all variable references. Renaming is a purely structural transformation for all other nodes.

As in many other transformations, we gloss over the mechanism for generating fresh identifiers. This mechanism can be formally specified and implemented by threading some sort of name-generation state through the transformation. For example, this state could be a natural number that is initially 0 and is incremented every time a fresh name is generated. The fresh name can combine the original name and the number in some fashion.
Renaming Environments

re ∈ RenEnv = Ident → Ident

rbind : Ident → Ident → RenEnv → RenEnv
 = λIold Inew re . λIkey . if Ikey = Iold then Inew else (re Ikey) end

(rbind Iold Inew re) is abbreviated as [Iold→Inew]re; this notation associates to the right,
i.e., [I1→I1′][I2→I2′]re = [I1→I1′]([I2→I2′]re)

Renaming Transformation

Rprog : ProgFIL → ProgFIL
Preconditions: The input to Rprog is a valid kernel FIL program.
Postconditions: The output of Rprog is a valid and uniquely named kernel FIL program.

Rprog[[(fil (I^n_i=1) Ebody)]]
 = (fil (I′^n_i=1) (Rexp[[Ebody]] ([I1→I1′] . . . [In→In′] (λI . I)))), where I′^n_i=1 are fresh.

Rexp : ExpFIL → RenEnv → ExpFIL
Rexp[[I]] re = (re I)
Rexp[[(abs (I^n_i=1) Ebody)]] re
 = (abs (I′^n_i=1) (Rexp[[Ebody]] ([I1→I1′] . . . [In→In′]re))), where I′^n_i=1 are fresh.
Rexp[[(let ((Ii Ei)^n_i=1) Ebody)]] re
 = (let ((Ii′ (Rexp[[Ei]] re))^n_i=1) (Rexp[[Ebody]] ([I1→I1′] . . . [In→In′]re))),
   where I′^n_i=1 are fresh.
Rexp[[E]] re = mapsubFIL[[E]] (λEsub . Rexp[[Esub]] re), otherwise.

Figure 17.18 Renaming transformation.
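As a small worked instance of these definitions (fresh names chosen for illustration): starting from the empty renaming environment (λI . I),

Rexp[[(abs (x) (app x (abs (x) x)))]] (λI . I)
= (abs (x.1) (app x.1 (abs (x.2) x.2)))

assuming the generator yields x.1 and then x.2. The inner binding [x→x.2] shadows the outer [x→x.1] in the renaming environment, so the innermost reference to x resolves to x.2.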
In our examples, we assume that renamed identifiers have the form prefix.number, where prefix is the original identifier, number is the current name-generator state value, and "." is a special character that may appear in compiler-generated names but not in user-specified names.5 Later compiler stages may rename generated names from previous stages; we assume that only the prefix of the old generated name is used as the prefix for the new generated name. For example, x can be renamed to x.17, and x.17 can be renamed to x.42 (not x.17.42).

Figure 17.19 shows our running example after the renaming stage.

Exercise 17.11 What changes need to be made to Rexp to handle the FILsum language (see Exercise 17.10)?

5 prefix is not really necessary, since number itself is unique. But maintaining the original names helps human readers track variables through the compiler transformations.
(fil (a.0 b.1)
  (let ((revmap.2
          (abs (f.3 elts.4)
            (let* ((ans.5 (@mprod (@null)))
                   (loop.6 (@mprod #u))
                   (_ (@mset! 1 loop.6
                        (abs (xs.7)
                          (if (@null? xs.7)
                              (@mget 1 ans.5)
                              (let ((_ (@mset! 1 ans.5
                                         (@cons (app f.3 (@car xs.7))
                                                (@mget 1 ans.5)))))
                                (app (@mget 1 loop.6) (@cdr xs.7))))))))
              (app (@mget 1 loop.6) elts.4)))))
    (app revmap.2
         (abs (x.8) (@> x.8 b.1))
         (@cons a.0 (@cons (@* a.0 7) (@null))))))

Figure 17.19 revmap program after renaming.
Exercise 17.12 This exercise explores ways to formalize the generation of fresh names in the renaming transformation. Assume that rename is a function that renames variables according to the conventions described above. E.g., (rename x 17) = x.17 and (rename x.17 42) = x.42.

a. Suppose that the signature of Rexp is changed to accept and return a natural number that represents the state of the fresh name generator:

Rexp : ExpFIL → RenEnv → Nat → (ExpFIL × Nat)

Give modified definitions of Rprog and Rexp in which rename is used to generate all fresh names uniquely. Define any auxiliary functions you find helpful.

b. An alternative way to thread the name-generation state through the renaming transformation is to use continuations. Suppose the signature of Rexp is changed as follows:

Rexp : ExpFIL → RenEnv → RenameCont → Nat → ExpFIL

RenameCont is a renaming continuation defined as follows:

rc ∈ RenameCont = ExpFIL → Nat → ExpFIL

Give modified definitions of Rprog and Rexp in which rename is used to generate all fresh names uniquely. Define any auxiliary functions you find helpful.

c. The mapsubFIL function cannot be used in the above two parts because it does not thread the name-generation state through the processing of subexpressions. Develop modified versions of mapsubFIL that would handle the purely structural cases in the above parts.
17.9 Transformation 7: CPS Conversion
Did he ever return, no he never returned
And his fate is still unlearned
— Bess Hawes and Jacqueline Steiner, "Charley on the MTA"

In Chapter 9, we saw that continuations are a powerful mathematical tool for modeling sophisticated control features like nonlocal exits, unrestricted jumps, coroutines, backtracking, and exceptions. Section 9.2 showed how such features can be simulated in any language supporting first-class procedures. The key idea in these simulations is to represent a possible future of the current computation as an explicit procedure, called a continuation. The continuation takes as its single parameter the value of the current computation. When invoked, the continuation proceeds with the rest of the computation.

In these simulations, procedures no longer return to their caller when invoked. Rather, they are transformed so that they take one or more explicit continuations as arguments and invoke one of these continuations on their result instead of returning the result. A program in which every procedure invokes an explicit continuation parameter in place of returning is said to be written in continuation-passing style (CPS).

As an example of CPS, consider the FIL expression Esos in Figure 17.20. It defines a squaring procedure sqr and a sum-of-squares procedure sos and applies the latter to 3 and 4. E^cps_sos is the result of transforming Esos into CPS form. In E^cps_sos, each of the two procedures sqr and sos has been extended with a continuation parameter, which by our convention will come last in the parameter list and begin with the letter k. The sqrcps procedure invokes its continuation ksqr on the square of its input. The soscps procedure first calls sqrcps on a with a continuation that names the result asqr. This continuation then calls sqrcps on b with a second continuation that names the second result bsqr. Finally, soscps invokes its continuation ksos on the sum of these two results. The initial call (sos 3 4) must also be converted. We assume that klet* names a continuation that proceeds with the rest of the computation given the value of the let* expression.

The process of transforming a program into CPS form is called CPS conversion. Here we shall study CPS conversion as a stage in the Tortoise compiler. Whereas globalization makes explicit the meaning of standard identifiers, and assignment conversion makes explicit the implicit cells of mutable variables, CPS conversion makes explicit all control flow in a program. In addition to transforming every procedure to use an explicit continuation, our compiler's CPS transformation also makes explicit the order in which primitive operations are executed.
Esos = (let* ((sqr (abs (x) (@* x x)))
              (sos (abs (a b) (@+ (app sqr a) (app sqr b)))))
         (app sos 3 4))

E^cps_sos = (let* ((sqrcps (abs (x ksqr) (app ksqr (@* x x))))
                   (soscps (abs (a b ksos)
                             (app sqrcps a
                                  (abs (asqr)
                                    (app sqrcps b
                                         (abs (bsqr)
                                           (app ksos (@+ asqr bsqr)))))))))
              (app soscps 3 4 klet*))

Figure 17.20 E^cps_sos is a CPS version of Esos.
Performing CPS conversion as a compiler stage has several benefits:

• Procedure-calling mechanism: A compiler must implement the mechanism for calling a procedure, which specifies: how arguments and control are passed from the caller to the procedure when it is called; how the procedure's result value and control are passed from the procedure back to the caller when the procedure returns; and how values needed by the caller after the procedure call are preserved during the procedure's execution. Continuations are an explicit representation of the stack of procedure-call invocation frames used in traditional compilers to implement the call/return mechanism of procedures. In CPS-converted code, a continuation (such as (abs (asqr) ...) above) corresponds to a pair of (1) an invocation frame that saves variables needed after the call (i.e., the free variables of the continuation, which are b and ksos in the case of (abs (asqr) ...)) and (2) a return address (i.e., a specification of the code to be executed after the call). Since CPS procedures never return, every procedure call in a CPS-converted program can be viewed as an assembly code jump that passes arguments. In particular, invoking a continuation corresponds in assembly code to jumping to a return address with a return value in a distinguished return register.

• Code linearization: CPS conversion makes explicit the order in which subexpressions are evaluated, yielding code that linearizes basic computation steps in a way similar to assembly code. For example, the body of soscps makes it clear that the square of a is calculated before the square of b. We shall see that our CPS transformation also linearizes nested primitive applications. For instance, CPS-converting the expression (@* (@+ c d) (@- c 1)) yields
code in which it is clear that the addition is performed first, followed by the subtraction, and then the multiplication (a sketch of such code appears after this list).

• Sophisticated control features: Representing control explicitly in the form of continuations facilitates the implementation of advanced control features (such as nonlocal exits, exceptions, and backtracking) that can be challenging to implement in traditional stack-based approaches.

• Uniformity: Representing control features via procedures keeps intermediate program representations simple and flexible. Moreover, any optimizations that improve procedures will work on continuations as well. But this uniformity also has a drawback: because of the liberal use of procedures, the efficiency of procedure calls in CPS code is of the utmost importance, making certain optimizations almost mandatory.

We present the Tortoise CPS transformation in four stages. The structure of the CPS code produced by the CPS transformation is formalized in Section 17.9.1. A straightforward approach to CPS conversion that is easy to understand but leads to intolerable inefficiencies in the converted code is described in Section 17.9.2. Section 17.9.3 presents a more complex but considerably more efficient CPS transformation that is used in Tortoise. Finally, we consider the CPS conversion of advanced control constructs in Section 17.9.4.
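Returning to the code-linearization bullet above, here is a plausible sketch of our own of the CPS-converted form of (@* (@+ c d) (@- c 1)), using the let*-plus-fresh-names shape produced by the transformation studied later in this chapter (the names t.1, t.2, t.3 and the continuation name k are illustrative, not output of the actual implementation):

(let* ((t.1 (@+ c d))       ; the addition is performed first
       (t.2 (@- c 1))       ; then the subtraction
       (t.3 (@* t.1 t.2)))  ; then the multiplication
  (app k t.3))              ; finally, the result is passed to the continuation k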
17.9.1 The Structure of Tortoise CPS Code
All procedure applications can be classified according to their relationship to the innermost enclosing procedure declaration (or program). A procedure application is a tail call if its implicit continuation is the same as that of its enclosing procedure. In other words, no computational work is done between the termination of the inner tail call and the termination of its enclosing procedure; these two events can be viewed as happening simultaneously. All other procedure applications are nontail calls. These are characterized by pending computations that must take place between the termination of the nontail call and the termination of a call to its enclosing procedure.

The notion of a tail call is important in CPS conversion because every procedure call in CPS code must be a tail call. Otherwise, it would have to return to perform a pending computation.

As concrete examples of tail versus nontail calls, consider the FIL abstractions in Figure 17.21.

• In Eabs1, the call to g is a tail call because a call to Eabs1 returns a value v when g returns v. But both calls to f are nontail calls because the results of these calls must be passed to g before Eabs1 returns.
Eabs1 = (abs (f g x) (app g (app f x) (app f (@+ x 1))))

Eabs2 = (abs (p q r s y)
          (let ((a (app p (app q y))))
            (app r a (app s a))))

Eabs3 = (abs (filter pred base zs)
          (if (@null? zs)
              (app base zs)
              (if (app pred (@car zs))
                  (@cons (@car zs) (app filter pred base (@cdr zs)))
                  (app filter pred base (@cdr zs)))))

Figure 17.21 Sample abstractions for understanding tail versus nontail calls.
• In Eabs2, only the call to r is a tail call. The results of the calls to p, q, and s must be further processed before Eabs2 returns.

• In Eabs3, there are two tail calls: the call to base, and the second call to filter. The result of the first call to filter must be processed by @cons before Eabs3 returns, so this is a nontail call. The result of pred must be checked by the if, so this is a nontail call as well. In this example, we see that (1) a procedure body may have multiple tail calls and (2) the same procedure can be invoked in both tail calls and nontail calls within the same expression.

Tail and nontail calls can be characterized syntactically. The FIL expression contexts in which tail calls can appear are defined by TC in the following grammar:

TC ∈ TailCallContext ::= □
                       | (if Etest TC E)
                       | (if Etest E TC)
                       | (let ((I E)*) TC)
                       | (abs (I*) TC)
In FIL, an application expression Eapp = (app E E*) is a tail call if and only if the enclosing program can be expressed in the form (fil (I*) TC{Eapp}), i.e., as the result of filling a tail context in the program body with Eapp. Any application that does not appear in a tail context is a nontail call. In particular, applications occurring in (but not wrapped by abs in) if tests, let definition expressions, and app and prim arguments are nontail calls.
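As an illustrative program of our own (not from Figure 17.21), the tail contexts make the classification mechanical:

(fil (x)
  (let ((g (abs (n) (@* n n)))   ; helper abstractions, just for illustration
        (h (abs (n) (@+ n 1))))
    (if (@= x 0)
        (app g x)                ; tail call: fills TC in (fil (x) (let ... (if Etest TC E)))
        (app h (app g x)))))     ; the outer app fills a TC, so it is a tail call;
                                 ; the inner (app g x) is an app argument, hence a nontail call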
P ∈ Progcps ::= (fil (I*formal) Ebody)

E ∈ Expcps ::= (app Irator V*rand)
             | (if Vtest Ethen Eelse)
             | (let ((Iname LEdefn)) Ebody)
             | (error Ymessage)

V ∈ ValueExpcps ::= L | I

LE ∈ LetableExpcps ::= L | (abs (I*formal) Ebody) | (prim Oprimop V*arg)

L ∈ Lit = as in full FIL
Y ∈ SymLit = as in full FIL
O ∈ Primop = as in full FIL
I ∈ Ident = as in full FIL

Figure 17.22 Kernel grammar for FILcps, the subset of FIL in CPS form. The result of CPS conversion is a FILcps program.
Understanding tail calls is essential for studying the structure of Tortoise CPS code, which is defined by the grammar for FILcps, a restricted dialect of FIL presented in Figure 17.22. The FILcps grammar requires specialized component expressions for many constructs that can have arbitrary component expressions in FIL: the rator of an app must be an identifier; the rands of an app, arguments of a prim, and test of an if must be literals or identifiers; and the definition expression of a let must be a literal, abstraction, or primitive application. As explained below, these restrictions guarantee that all FILcps procedure calls are tail calls, that all procedure calls and primitive applications are linearized, and that FILcps code resembles assembly code in many ways:

• The definition of Expcps in FILcps guarantees that app expressions appear precisely in the tail contexts TC discussed above. So every call in a FILcps program is guaranteed to be a tail call. In a continuation-based denotational semantics (see Section 9.3) of a FILcps program, the expression continuation k for every app expression is exactly the same: the top-level continuation of the program. We say that procedure calls in a CPS program "never return" because no procedure call establishes a new control point to which a value can be returned. This explains why calls in a CPS program can be viewed as assembly-language jumps (that happen to additionally pass arguments).

• Operands of app and prim must be literals or variables, so one application (of a procedure or a primitive) may not be nested within another. The test subexpression of an if must also be a literal or variable. The definition subexpression of a let can only be one of a restricted number of simple "letable expressions" that does not include apps, ifs, or other lets. These restrictions impose the straight-line nature of assembly code on the bodies of FIL abstractions and programs, which must be elements of Expcps. The only violation of the straight-line property is the if expression, which has an element of Expcps for each branch. This branching code would need to be linearized elsewhere in order to generate assembly language (see Exercise 17.16 on page 1056).

• The grammar effectively requires specifying the order of evaluation of primitive applications by forcing the result of every primitive application to be named by a let. So the CPS transformation of an expression containing nested primitive applications uses a sequence of nested single-binding let expressions to introduce names for the intermediate results returned by the primitives. For example, CPS-converting the expression

(@+ (@- 0 (@* b b)) (@* 4 (@* a c)))

in the context of an initial continuation ktop.0 yields:6

(let* ((t.3 (@* b b))
       (t.2 (@- 0 t.3))
       (t.5 (@* a c))
       (t.4 (@* 4 t.5))
       (t.1 (@+ t.2 t.4)))
  (app ktop.0 t.1))

The let-bound names represent abstract registers in assembly code. Mapping these abstract registers to the actual registers of a real machine (a process known as register allocation; see Section 17.12) must be performed by a later compilation stage.

• The operator of an app must be an identifier. In classical CPS conversion, the operator of an app may be an abstraction as well. However, we require that all abstractions be named in a let binding so that certain properties of the FILcps structure are preserved by later Tortoise transformations. In particular, the subsequent closure-conversion stage will transform abstractions into applications of the mprod primitive. Such applications cannot appear in the context of CPS values V, but can appear in "letable expressions" LE.

• Every execution path through an abstraction or program body Ebody must end in either an app or an error. The Expcps grammar does not include literals, identifiers, and abstractions, because these would allow procedures and

6 The particular let-bound names used are irrelevant. Here and below, we show the results of CPS conversion using our implementation of the transformation described in Section 17.9.3.
(let* ((sqrFILcps (abs (x ksqr)
                    (let ((t1 (@* x x)))
                      (app ksqr t1))))
       (sosFILcps (abs (a b ksos)
                    (let ((k1 (abs (asqr)
                                (let ((k2 (abs (bsqr)
                                            (let ((t2 (@+ asqr bsqr)))
                                              (app ksos t2)))))
                                  (app sqrFILcps b k2)))))
                      (app sqrFILcps a k1)))))
  (app sosFILcps 3 4 klet*))

Figure 17.23 A CPS version of Esos expressed in FILcps.
programs to return values. But FILcps procedures and programs never return, so the last action in a procedure or program body must be to call a procedure or signal an error. Moreover, apps and errors can appear only as the final expressions executed in such bodies; they cannot appear in let definitions, procedure or primitive operands, or if tests. Modulo the branching allowed by if, program and abstraction bodies in FILcps are similar in structure to basic blocks in traditional compilers. A basic block is a sequence of statements such that the only control transfers into the block are at the beginning and the only control transfers out of the block are at the end.

The fact that ValueExpcps does not include abstractions or primitive applications means that E^cps_sos in Figure 17.20 is not a legal FILcps expression. A FILcps version of the Esos expression is presented in Figure 17.23. To satisfy the syntactic constraints of FILcps, let-bound names must be introduced to name abstractions (the continuations k1 and k2) and the results of primitive applications (t1 and t2). Note that some calls (to sqrFILcps and sosFILcps) are to transformed versions of procedures in the original Esos expression. These correspond to the jump-to-subroutine idiom in assembly code. The other calls (to ksqr and ksos) are to continuation procedures introduced by CPS conversion. These model the return-from-subroutine idiom in assembly code.

We will assume that the grammar for FILcps in Figure 17.22 describes the structure of CPS code after the standard FIL simplifications in Figure 17.15 have been performed. The CPS conversion functions we study below sometimes generate expressions that are illegal according to the FILcps grammar before such simplifications are performed. However, in all these cases, simplification
yields a legal FILcps expression. For example, CPS conversion might generate (let ((a.2 b.1)) (app k.0 a.2)), which is not a FILcps expression because the variable reference b.1 is not an element of the domain LetableExpcps. However, applying the [copy-prop] simplification to this expression yields (app k.0 b.1), which is indeed a FILcps expression.

The next two sections present two different CPS transforms, each of which converts every procedure call in the program into a tail call:

Preconditions: The input to CPS conversion is a valid, uniquely named kernel FIL program.

Postconditions: The output of CPS conversion is a valid, uniquely named kernel FILcps program.
17.9.2 A Simple CPS Transformation
The first transformation we will examine, SCPS (for Simple CPS conversion), is easier to explain, but generates code that is much less efficient than that produced by the second transformation. The SCPS transformation is defined in Figure 17.24. SCPSexp transforms any given expression E to an abstraction (abs (Ik) E′) that expects as its argument Ik an explicit continuation for E and eventually calls this continuation on the value of E within E′. This explicit continuation is immediately invoked to pass along (or "return") the values of literals, identifiers, and abstractions.

Each abstraction is transformed to take as a new additional final parameter a continuation Ikcall that is passed as the explicit continuation to its transformed body. Because the grammar of FILcps does not allow abstractions to appear directly as app arguments, it is also necessary to name the transformed abstraction in a let using a fresh identifier Iabs.

In the transformation of an app expression (app E0 E1 ... En), explicit continuations specify that the rator E0 and rands E1 ... En are evaluated in left-to-right order before the invocation takes place. The fresh variables I0 ... In are introduced to name the values of the subexpressions. Since every procedure has been transformed to expect an explicit continuation as its final argument, the transformed app must supply its continuation Ik as the final rand.

The let transformation is similar, except that the let-bound names are used in place of fresh names for naming the values of the definition expressions. The unique naming requirement on input programs to SCPS guarantees that no variable capture can take place in the let transformation (see Exercise 17.15).
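As a small worked instance of these clauses (our own, with arbitrarily chosen fresh names), applying SCPSexp to the identity abstraction (abs (x) x) and then performing the [implicit-let] and [copy-prop] simplifications discussed below yields:

(abs (k.0)                       ; k.0 receives the continuation for the whole expression
  (let ((abs.1 (abs (x k.1)      ; the transformed abstraction takes an extra continuation k.1
                 (app k.1 x))))  ; the body "returns" x by invoking that continuation
    (app k.0 abs.1)))            ; the named abstraction is passed along to k.0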
The transformation of prim expressions is similar to that for app and let. The syntactic constraints of FILcps require that a fresh variable (here named Ians) be introduced to name the result of a prim expression before passing it to the continuation.

In a transformed if expression, a fresh name Itest names the result of the test expression and the same continuation Ik is supplied to both transformed branches. This is the only place in SCPS where the explicit continuation Ik is referenced more than once in the transformed expression. The transformed error construct is the only place where the continuation is never referenced. All other constructs use Ik in a linear fashion, i.e., they reference it exactly once. This makes intuitive sense for regular control flow, which has only one possible "path" out of every expression other than if and error. Even in the if case, only one branch can be taken in a dynamic execution even though the continuation is mentioned twice. In Section 17.9.4 we will see how CPS conversion exposes the nonlinear nature of some sophisticated control features.

FIL programs are converted to CPS form by SCPSprog, which adds an additional parameter Iktop that is an explicit top-level continuation for the program. It is assumed that the mechanism for program invocation will supply an appropriate procedure for this argument. For example, an operating system might construct a top-level continuation that displays the result of the program on the standard output stream or in a window within a graphical user interface.

The clauses for SCPSexp contain numerous instances of the pattern

(app (SCPSexp[[E1]]) E2)
where E2 is an abstraction or variable reference. But SCPSexp is guaranteed to return an abs expression, and the FILcps grammar does not allow any subexpression of an app to be an abs. Doesn't this yield an illegal FILcps expression? The result of SCPSexp would be illegal were it not for the [implicit-let] simplification, which transforms every app of the form

(app (abs (Ik) E1) E2)

to the expression

(let ((Ik E2)) E1)

Since the grammar for letable expressions LE permits definition expressions that are abstractions, the result of SCPSexp is guaranteed to be a legal FILcps expression when E2 is an abstraction. When E2 is a variable, the [copy-prop] simplification will also be performed, eliminating the let expression.
SCPSprog : ProgFIL → Progcps

SCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Iktop)               ; Iktop fresh
    (app (SCPSexp[[Ebody]]) Iktop))

SCPSexp : ExpFIL → Expcps

SCPSexp[[L]] = (abs (Ik) (app Ik L))   ; Ik fresh
SCPSexp[[I]] = (abs (Ik) (app Ik I))   ; Ik fresh

SCPSexp[[(abs (I1 ... In) Ebody)]]
= (abs (Ik)                            ; Ik fresh
    (let ((Iabs                        ; Iabs fresh
            (abs (I1 ... In Ikcall)    ; Ikcall fresh
              (app (SCPSexp[[Ebody]]) Ikcall))))
      (app Ik Iabs)))

SCPSexp[[(app E0 ... En)]]
= (abs (Ik)                            ; Ik fresh
    (app (SCPSexp[[E0]])
         (abs (I0)                     ; I0 fresh
           ...
           (app (SCPSexp[[En]])
                (abs (In)              ; In fresh
                  (app I0 ... In Ik))) ... )))

SCPSexp[[(let ((I1 E1) ... (In En)) Ebody)]]
= (abs (Ik)                            ; Ik fresh
    (app (SCPSexp[[E1]])
         (abs (I1)
           ...
           (app (SCPSexp[[En]])
                (abs (In)
                  (app (SCPSexp[[Ebody]]) Ik))) ... )))

SCPSexp[[(prim O E1 ... En)]]
= (abs (Ik)                            ; Ik fresh
    (app (SCPSexp[[E1]])
         (abs (I1)                     ; I1 fresh
           ...
           (app (SCPSexp[[En]])
                (abs (In)              ; In fresh
                  (let ((Ians (prim O I1 ... In)))  ; Ians fresh
                    (app Ik Ians)))) ... )))

SCPSexp[[(if Etest Ethen Eelse)]]
= (abs (Ik)                            ; Ik fresh
    (app (SCPSexp[[Etest]])
         (abs (Itest)                  ; Itest fresh
           (if Itest
               (app (SCPSexp[[Ethen]]) Ik)
               (app (SCPSexp[[Eelse]]) Ik)))))

SCPSexp[[(error Ymsg)]] = (abs (Ik) (error Ymsg))  ; Ik fresh

Figure 17.24 A simple CPS transformation.
As a simple example of SCPS, consider the CPS conversion of the incrementing program Pinc = (fil (a) (@+ a 1)). Before any simplifications are performed, SCPSprog[[Pinc]] yields

(fil (a ktop.0)
  (app (abs (k.2)
         (app (abs (k.6) (app k.6 a))
              (abs (t.3)
                (app (abs (k.5) (app k.5 1))
                     (abs (t.4)
                       (let ((t.1 (@+ t.3 t.4)))
                         (app k.2 t.1)))))))
       ktop.0))
Three applications of [implicit-let] simplify this code to

(fil (a ktop.0)
  (let ((k.2 ktop.0))
    (let ((k.6 (abs (t.3)
                 (let ((k.5 (abs (t.4)
                              (let ((t.1 (@+ t.3 t.4)))
                                (app k.2 t.1)))))
                   (app k.5 1)))))
      (app k.6 a))))

A single [copy-prop] replaces k.2 by ktop.0 to yield the final result P′inc:
(fil (a ktop.0)
  (let ((k.6 (abs (t.3)
               (let ((k.5 (abs (t.4)
                            (let ((t.1 (@+ t.3 t.4)))
                              (app ktop.0 t.1)))))
                 (app k.5 1)))))
    (app k.6 a)))

P′inc is a legal FILcps program; go ahead and check! Its convoluted nature makes it a bit tricky to read. Here is one way to read this program:
The program is given an input a and top-level continuation ktop.0. First evaluate a and pass its value to continuation k.6, which gives it the name t.3. Then evaluate 1 and pass it to continuation k.5, which gives it the name t.4. Next, calculate the sum of t.3 and t.4 and name the result t.1. Finally, return this answer as the result of the program by invoking ktop.0 on t.1.

This is a lot of work to increment a number! Even though the [implicit-let] and [copy-prop] rules have simplified the program, it could still be simpler: the
continuations k.5 and k.6 merely rename the values of a and 1 to t.3 and t.4, which is unnecessary.

In larger programs, the extent of these undesirable inefficiencies becomes more apparent. For example, Figure 17.25 shows the result of using SCPS to transform a numerical program Pquad with several nested subexpressions. Try to read the transformed program as we did with P′inc. Along the way you will notice numerous unnecessary continuations and renamings. The result of performing SCPS on our revmap example is so large that it would require several pages to display. The desugared revmap program has an abstract syntax tree with 46 nodes; transforming it with SCPSprog yields a result with 314 nodes. And this is after simplification; the unsimplified transformed program has 406 nodes!

Can anything be done to automatically eliminate the inefficiencies introduced by SCPS? Yes! It is possible to define additional simplification rules that will make the CPS-converted code much more reasonable. For example, in (let ((I Edefn)) Ebody), if Edefn is a literal or abstraction, it is possible to replace the let by the substitution of Edefn for I in Ebody. This simplification is traditionally called constant propagation (for literals) and, when followed by [implicit-let], inlining (for abstractions). For example, two applications of inlining on P′inc yield

(fil (a ktop.0)
  (let ((t.3 a))
    (let ((t.4 1))
      (let ((t.1 (@+ t.3 t.4)))
        (app ktop.0 t.1)))))
and then copy propagation and constant propagation simplify the program to

(fil (a ktop.0) (let ((t.1 (@+ a 1))) (app ktop.0 t.1)))

Performing these additional simplifications on P′quad in Figure 17.25 gives the following much improved CPS code:
(fil (a b c ktop.0)
  (let* ((t.20 (@* b b))
         (t.16 (@- 0 t.20))
         (t.9 (@* a c))
         (t.5 (@* 4 t.9))
         (t.1 (@+ t.16 t.5)))
    (app ktop.0 t.1)))
These examples underscore the inefficiency of the code generated by SCPS.
Why don't we just modify FIL to include the constant propagation and inlining simplifications? Constant propagation of literals is not problematic,7 but inlining is a delicate transformation. In FILcps, it is legal to copy an abstraction only to certain positions (such as the rator of an app, where it can be removed by [implicit-let]). When a named abstraction is used more than once in the body of a let, copying the abstraction multiple times makes the program bigger. Unrestricted inlining can lead to code bloat, a dramatic increase in the size of a program. In the presence of recursive procedures, special care must often be taken to avoid infinitely unwinding a recursive definition. Since we insist that FIL simplifications be straightforward to implement, we do not include inlining as a simplification. Inlining issues are further explored in Exercise 17.17.

Does that mean we are stuck with an inefficient CPS transformation? No! In the next section, we study a cleverer approach to CPS conversion that avoids generating unnecessary code in the first place.

Exercise 17.13 Consider the FIL program P = (fil (x y) (@* (@+ x y) (@- x y))).

a. Show the result P1 generated by SCPSprog[[P]] without performing any simplifications.

b. Show the result P2 of simplifying P1 using the standard FIL simplifications (including [implicit-let] and [copy-prop]).

c. Show the result P3 of further simplifying P2 using inlining in addition to the standard FIL simplifications.

Exercise 17.14

a. Suppose that begin, scand, scor, and cond (from FLARE/V) were kernel FIL constructs. Give the SCPSexp clauses for these four constructs.

b. Suppose that FILcps were extended to include mutable variables by adding the assignment construct (set! I E) as an element of LE. Give the SCPSexp clause for set!.

Exercise 17.15

a. Give a concrete example of how variable capture can take place in the let clause of SCPSexp if the initial program is not uniquely named.

b. Modify the let clause of SCPSexp so that it works properly even if the initial program is not uniquely named.

7 Nevertheless, we do not include constant propagation in our list of standard simplifications because we don't want constants to be copied when we get to the register-allocation stage of our compiler.
Pquad = (fil (a b c) (@+ (@- 0 (@* b b)) (@* 4 (@* a c))))

SCPSprog[[Pquad]] = P′quad, where P′quad =
(fil (a b c ktop.0)
  (let* ((k.17 (abs (t.3)
                 (let* ((k.6 (abs (t.4)
                               (let ((t.1 (@+ t.3 t.4)))
                                 (app ktop.0 t.1))))
                        (k.15 (abs (t.7)
                                (let* ((k.10 (abs (t.8)
                                               (let ((t.5 (@* t.7 t.8)))
                                                 (app k.6 t.5))))
                                       (k.14 (abs (t.11)
                                               (let ((k.13 (abs (t.12)
                                                             (let ((t.9 (@* t.11 t.12)))
                                                               (app k.10 t.9)))))
                                                 (app k.13 c)))))
                                  (app k.14 a)))))
                   (app k.15 4))))
         (k.26 (abs (t.18)
                 (let* ((k.21 (abs (t.19)
                                (let ((t.16 (@- t.18 t.19)))
                                  (app k.17 t.16))))
                        (k.25 (abs (t.22)
                                (let ((k.24 (abs (t.23)
                                              (let ((t.20 (@* t.22 t.23)))
                                                (app k.21 t.20)))))
                                  (app k.24 b)))))
                   (app k.25 b)))))
    (app k.26 0)))

Figure 17.25 Simple CPS conversion of a numeric program.
Exercise 17.16 Control branches in linear assembly language code are usually provided via branch instructions that perform a jump if a certain condition holds but "drop through" to the next instruction if the condition does not hold. We can model assembly-style branch instructions in FILcps by restricting if expressions to the form

(if Vtest (app Vrator V*rand) Eelse)
which immediately performs a subroutine jump (via app) if the test is true and otherwise drops through to Eelse. Modify the SCPSexp clause for if so that all transformed ifs have this restricted form.

Exercise 17.17 This exercise explores procedure inlining. Consider the following [copy-abs] simplification rule, where AB ranges over FIL abstractions:

(let ((I AB)) Ebody) −−simp−→ [AB/I]Ebody    [copy-abs]
Together, [copy-abs] and the standard FIL [implicit-let] and [copy-prop] rules implement a form of procedure inlining. For example (let ((inc (abs (x) (@+ x 1)))) (@* (app inc a) (app inc b)))
can be simplified via [copy-abs] to (@* (app (abs (x) (@+ x 1)) a) (app (abs (x) (@+ x 1)) b))
Two applications of [implicit-let] give (@* (let ((x a)) (@+ x 1)) (let ((x b)) (@+ x 1)))
and two applications of [copy-prop] yield the inlined code (@* (@+ a 1) (@+ b 1))
a. Use inlining to remove all calls to sqr in the following FIL expression. How many multiplications does the resulting expression contain? (let ((sqr (abs (x) (@* x x)))) (app sqr (app sqr (app sqr a))))
b. Use inlining to remove all calls to sqr, quad, and oct in the following FIL expression. How many multiplications does the resulting expression contain? (let* ((sqr (abs (x) (@* x x))) (quad (abs (y) (@* (app sqr y) (app sqr y)))) (oct (abs (z) (@* (app quad z) (app quad z))))) (@* (app oct a) (app oct b)))
c. What happens if inlining is used to simplify the following FIL expression? (let ((f (abs (g) (app g g)))) (app f f))
d. Can expressions like the one in part c ever arise in the compilation of a FLARE/V program? Explain.
e. Using only standard FIL simplifications, the result of SCPSprog is guaranteed to be uniquely named if the input is uniquely named. This property does not hold in the presence of inlining. Write an example program Pnun such that the result of simplifying SCPSprog[[Pnun]] via inlining is not uniquely named. Hint: Where can duplication occur in a CPS-converted program?

f. Inlining multiple copies of an abstraction can lead to code bloat. Develop an example FIL program Pbloat where performing inlining on the result of SCPSprog[[Pbloat]] yields a larger transformed program rather than a smaller one. Hint: Where can duplication occur in a CPS-converted program?

Exercise 17.18 Emil P. Mentor wants to modify the CPS transformation to add a little bit of profiling information. Specifically, the modified CPS transformation should produce code that keeps a count of user procedure (not continuation) applications. Users will be able to access this information with the new construct (app-count), which is added to the grammar of kernel FILcps expressions:

E ∈ Exp ::= ... | (app-count)
Emil gives the following example (where he uses the notation x, y normally used for pair values to represent mutable products with two components): (let ((f (abs (x) (prim mprod x (app-count)))) (id (abs (y) y)) (twice (abs (g) (abs (z) (app g (app g z)))))) (prim mprod (app f (app-count)) (prim mprod (app id (app f (app id (app-count)))) (app f (app (app (app twice twice) id) (app-count)))))) − − − → 0 , 1 , 1 , 3 , 4 , 16 F IL
In the modified SCPS transformation, all procedures (including continuations) should take as an extra argument the number of user procedure applications made so far. For example, here are Emil's new SCPS clauses for program, literals, and conditionals:

SCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Iktop)  ; Iktop fresh
    (app (SCPSexp[[Ebody]]) 0 Iktop))

SCPSexp[[L]]
= (abs (In Ik)  ; In (app count) and Ik (continuation) fresh
    (app Ik In L))

SCPSexp[[(if Etest Ethen Eelse)]]
= (abs (In0 Ik)  ; In0 and Ik fresh
    (app (SCPSexp[[Etest]]) In0
         (abs (In1 Itest)  ; In1 and Itest fresh
           (if Itest
               (app (SCPSexp[[Ethen]]) In1 Ik)
               (app (SCPSexp[[Eelse]]) In1 Ik)))))
Write the modified SCPS clauses for abs, app, let, and app-count.
17.9.3 A More Efficient CPS Transformation
Reconsider the result of SCPS on the program (fil (a) (@+ a 1)):

(fil (a ktop.0)
  (let ((k.6 (abs (t.3)
               (let ((k.5 (abs (t.4)
                            (let ((t.1 (@+ t.3 t.4)))
                              (app ktop.0 t.1)))))
                 (app k.5 1)))))
    (app k.6 a)))
The inefficient code we eliminated by inlining in the last section is shown in gray. Our goal in developing a more efficient CPS transformation is to perform these simplifications as part of CPS conversion itself rather than waiting to do them later. Instead of eliminating unsightly gray code as an afterthought, we want to avoid generating it in the first place!

Our approach is based on a diabolically simple shift of perspective: we view the gray code as part of the metalanguage specification of the transformation rather than as part of the FIL code being transformed. If we change the gray FIL lets, abss, and apps to metalanguage lets, λs, and applications, our example becomes:

(fil (a ktop.0)
  let k6 be (λV3 . let k5 be (λV4 . (let ((t.1 (@+ V3 V4)))
                                      (app ktop.0 t.1)))
            in (k5 1)
  in (k6 a))
To enhance readability, we will keep the metalanguage notation in gray and the FILcps code in black teletype font. Note that k5 and k6 name metalanguage functions whose parameters (V3 and V4) must be pieces of FILcps syntax; in particular, FILcps value expressions (i.e., literals and variable references). Indeed, k5 is applied to the FILcps literal 1 and k6 is applied to the FILcps identifier a. The result of evaluating the gray metalanguage expressions in our example yields

(fil (a ktop.0) (let ((t.1 (@+ a 1))) (app ktop.0 t.1)))
which is exactly the simplified result we want! We have taken computation that would have been performed when executing the code generated by CPS conversion and instead performed it when the code
is generated. The output of CPS conversion can now be viewed as code that is executed in two stages: the gray code is the code that can be executed as part of CPS conversion, while the black code is the residual code that can only be executed later at run time. This notion of staged computation is the key idea of an approach to optimization known as partial evaluation. The goal of partial evaluation is to evaluate at compile time all static expressions, i.e., those expressions that do not depend on information known only at run time, and leave behind a residual dynamic program that is executed at run time. In our case, the static expressions are the gray metalanguage code that is executed "for free" as part of CPS conversion, and the dynamic expressions are the black FILcps code.

Our improved approach to CPS conversion will make heavy use of gray abstractions of the form (λV . ...) that map FILcps value expressions (i.e., literals and variable references) to other FILcps expressions. Because these abstractions play the role of continuations at the metalanguage level, we call them metacontinuations. In the above example, k5 and k6 are examples of metacontinuations. A metacontinuation can be viewed as a metalanguage representation of a special kind of context: a FILcps expression with named holes that can be filled only with FILcps value expressions. Such contexts may contain more than one hole, but a hole with a given name can appear only once. For example, here are metacontinuations that will arise in the CPS conversion of the incrementing program:

Context Notation                                   Metalanguage Notation
(app ktop.0 □1)                                    λV1 . (app ktop.0 V1)
(let ((t.1 (@+ □3 □4))) (app ktop.0 t.1))          λV4 . (let ((t.1 (@+ V3 V4))) (app ktop.0 t.1)) ; V3 is free
(let ((t.1 (@+ □3 1))) (app ktop.0 t.1))           λV3 . (let ((t.1 (@+ V3 1))) (app ktop.0 t.1))
Figures 17.26 and 17.27 present an efficient version of CPS conversion that is based on the notions of staged computation and metacontinuations. We call this transformation MCPS (for metaCPS conversion). The metavariable m ranges over metacontinuations in the domain MetaCont, which consists of functions that map FILcps value expressions to FILcps expressions. The MCPS functions in Figures 17.26 and 17.27 are similar to the SCPS functions in Figure 17.24 (page 1051). Indeed, except for the let and if clauses, the MCPS clauses can be derived automatically from the SCPS clauses by the following transformation process:
Domain
m ∈ MetaCont = ValueExpcps → Expcps

Conversion Functions
mc→exp : MetaCont → Expcps
= (λm . (abs (Itemp) (m Itemp)))  ; Itemp fresh

id→mc : Ident → MetaCont
= (λI . (λV . (app I V)))

MetaCPS Program Transformation
MCPSprog : ProgFIL → Progcps

MCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Iktop)  ; Iktop fresh
    (MCPSexp[[Ebody]] (id→mc Iktop)))

Figure 17.26 An efficient CPS transformation based on metacontinuations, Part 1.
• Transform every continuation-accepting FILcps abstraction (abs (Ik) ...) into a metacontinuation-accepting metalanguage abstraction (λm . ...).

• Transform every FILcps application (app Ik V) in which Ik denotes a continuation to a metacall (i.e., metalanguage function call) of the form (m V), where m is the metacontinuation that corresponds to Ik. This makes sense because the metacontinuation m is a metalanguage function that expects a value expression V as its argument.

• Transform every FILcps application (app (SCPSexp[[E]]) (abs (I) ...)) to a metacall (MCPSexp[[E]] (λV . ...)). This transforms every FILcps continuation of the form (abs (I) ...) into a metacontinuation of the form (λV . ...), thus providing the metacontinuation-accepting function returned by MCPSexp[[E]] with the metacontinuation it expects.

• Transform every FILcps application (app (SCPSexp[[E]]) Ik) in which Ik has not already been transformed to m to a metacall (MCPSexp[[E]] (id→mc Ik)), where id→mc converts a FILcps identifier Ik denoting an unknown continuation to a metacontinuation (λV . (app Ik V)). This conversion is necessary to provide the metacontinuation-accepting function returned by MCPSexp[[E]] with the metacontinuation it expects.

• Transform every FILcps application (app I0 ... In Ik) in which I0, ..., In are the bound variables of continuations and Ik denotes the continuation bound by an SCPSexp clause to (let ((Ik′ (mc→exp m))) (app V0 ... Vn Ik′)), where

  • Ik′ is a fresh name;

  • V0, ..., Vn are the bound variables of the metacontinuations that correspond to the continuations binding I0, ..., In;
MetaCPS Expression Transformation
MCPSexp : ExpFIL → MetaCont → Expcps

MCPSexp[[L]] = (λm . (m L))
MCPSexp[[I]] = (λm . (m I))

MCPSexp[[(abs (I1 ... In) Ebody)]]
= (λm . (let ((Iabs                      ; Iabs fresh
                (abs (I1 ... In Ikcall)  ; Ikcall fresh
                  (MCPSexp[[Ebody]] (id→mc Ikcall)))))
          (m Iabs)))

MCPSexp[[(app E0 ... En)]]
= (λm . (MCPSexp[[E0]]
          (λV0 . ...
            (MCPSexp[[En]]
              (λVn . (let ((Ik (mc→exp m)))  ; Ik fresh
                       (app V0 ... Vn Ik)))) ... )))

MCPSexp[[(let ((I1 E1) ... (In En)) Ebody)]]
= (λm . (MCPSexp[[E1]]
          (λV1 . ...
            (MCPSexp[[En]]
              (λVn . (let* ((I1 V1) ... (In Vn))
                       (MCPSexp[[Ebody]] m)))) ... )))

MCPSexp[[(prim O E1 ... En)]]
= (λm . (MCPSexp[[E1]]
          (λV1 . ...
            (MCPSexp[[En]]
              (λVn . (let ((Ians (prim O V1 ... Vn)))  ; Ians fresh
                       (m Ians)))) ... )))

MCPSexp[[(if Etest Ethen Eelse)]]
= (λm . (MCPSexp[[Etest]]
          (λVtest . (let ((Ikif (mc→exp m)))  ; Ikif fresh
                      (if Vtest
                          (MCPSexp[[Ethen]] (id→mc Ikif))
                          (MCPSexp[[Eelse]] (id→mc Ikif)))))))

MCPSexp[[(error Ymsg)]] = (λm . (error Ymsg))

Figure 17.27 An efficient CPS transformation based on metacontinuations, Part 2.
• m is the metacontinuation variable bound by the MCPSexp clause corresponding to the SCPSexp clause that binds the continuation variable Ik; and

• mc→exp is a function that converts a metacontinuation m to a FILcps continuation (abs (I) (m I)). For example:

(mc→exp (λV3 . (let ((t.1 (@+ V3 1))) (app ktop.0 t.1))))
= (abs (t.2) (let ((t.1 (@+ t.2 1))) (app ktop.0 t.1)))
In this case, there is no metacontinuation-accepting function to process the metacontinuation m, so mc→exp is necessary to convert the gray m into a black residual FILcps abstraction. The FILcps grammar forces this abstraction to be named, which is the purpose of the (let ((Ik′ ...)) ...).

The MCPS clauses for let and if are based on the above transformations, but also contain some special-purpose code. The let clause contains additional code to construct a residual let* expression binding the original let-bound identifiers. To avoid potential duplication involving the metacontinuation m, the if clause gives the name Ikif to a residual version of m and uses (id→mc Ikif) in place of m for the two branches.

The key benefit of the metacontinuation approach to CPS conversion is that many beta reductions that would be left as residual run-time code in the simple approach are performed at compile time. The MCPS functions are carefully designed so that every metacontinuation-accepting function (λm . ...) that arises in the conversion process is applied to a metacontinuation of the form (λVformal . M′), where M′ is a metalanguage expression denoting a FILcps expression. Observe that in each (λm . M) that appears in the MCPS definition, the metacontinuation m is referenced at most once in M. If m is referenced zero times in M, then the metacall ((λm . M) (λVformal . M′)) simply reduces to M. If m is referenced once in M, then M can be written as M{m}, where M{·} is a one-holed metalanguage expression context. By the usual metalanguage beta-reduction rule, each metacall of the form

((λm . M{m}) (λVformal . M′))
can be reduced to

M{(λVformal . M′)}

In the case where m is applied to a value expression within M, the metacall

((λm . M{(m Vactual)}) (λVformal . M′))

reduces to

M{[Vactual/Vformal]M′}
via two beta reductions. Since MCPSexp[[Vactual]] = (λm . (m Vactual)), metacalls of the form

(MCPSexp[[Vactual]] (λVformal . M′))

are special cases of this pattern that can be reduced to

[Vactual/Vformal]M′
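For a small concrete instance of this last pattern (our own; the names f, k, and x are illustrative): since MCPSexp[[x]] = (λm . (m x)), the metacall

(MCPSexp[[x]] (λV . (app f V k)))

reduces in one step to (app f x k); the value expression x is simply plugged into the hole of the context that the metacontinuation represents.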
The fact that m is referenced at most once in every function that accepts a metacontinuation guarantees that reducing a metacall makes the metalanguage expression smaller, and so the metacall reduction process eventually terminates. At this point, no gray code remains, since all metacalls have been eliminated and there is no way other than a metacall to include gray code in an element of Expcps. So all that remains is a black residual FILcps program. Another consequence of the fact that m is referenced at most once in every metacontinuation-accepting function is that there is no specter of duplication-induced code bloat that haunts more general inlining optimizations. Using mc→exp to convert m to a FILcps abstraction named Ikif in the if clause of MCPSexp is essential for avoiding code duplication.

We illustrate compile-time beta reductions in Figure 17.28, which shows the CPS conversion of the expression (app f (@* x (if (app g y) 2 3))) relative to an initial continuation named k. The example illustrates how MCPS effectively turns the input expression "inside out." In the input expression, the call to f is the outermost call, and (app g y) is the innermost call. But in the CPS-converted result, the call to g is the outermost call and the call to f is nested deep inside. This reorganization is necessary to make explicit the order in which operations are performed:

1. first, g is applied to y;
2. then the result t.5 of the g application is tested by if;
3. the test determines which of 2 or 3 will be named t.3 and multiplied by x;
4. then f is invoked on the result t.1 of the multiplication;
5. finally, the result of the f application is supplied to the continuation k.

Variables such as t.1, t.3, t.5 can be viewed as registers that hold the results of intermediate computations.
The example assumes that (mc→exp (id→mc k)) can be simplified to k.8 To see why, observe that

(mc→exp (id→mc k))
= ((λm . (abs (Itemp) (m Itemp))) (λV . (app k V)))
= (abs (Itemp) ((λV . (app k V)) Itemp))
= (abs (Itemp) (app k Itemp))
The final expression can be simplified to k by the [eta] rule. This eta reduction eliminates an abstraction in cases where the CPS transformation would have generated a trivial continuation that simply passed its argument along to another continuation with no additional processing. This simplification is sometimes called the tail-call optimization because it guarantees that tail calls in the source program require no additional control storage in the compiled program. In particular, there is no need to push an invocation frame corresponding to a trivial continuation onto the procedure-call stack. This allows tail calls to compile to assembly code jumps that pass arguments.

A language is said to be properly tail recursive if implementations are required to compile source tail calls into jumps. Our FIL mini-language is properly tail recursive, as is the real language Scheme. Such languages can leave out iteration constructs (like while and for loops) and still express the constant-control-space iterative computations specified by such constructs using recursive procedures that invoke themselves via tail calls (see the sketch below).

Figure 17.29 shows the result of using MCPS to CPS-convert our revmap example. Observe that the output of CPS conversion looks much closer to assembly language code than the input (Figure 17.19 on page 1041). You should study the code to convince yourself that this program has the same behavior as the original program. CPS conversion has introduced only one nontrivial continuation abstraction: k.41 names the continuation of the call to f (now called f.3) in the body of the loop. Each input abstraction has been extended with a final argument naming its continuation: revmap (which has been renamed to abs.10) takes continuation argument k.20; the looping procedure (abs.25) takes continuation argument k.29; and the greater-than-b procedure (abs.11) takes continuation k.18. Note that the looping procedure (which is not only named abs.25 but is also named t.26 and t.37 when it is extracted from the first slot of the mutable product t.23) is always invoked with the same continuation as the enclosing abstraction (k.20 when it is named t.26 and k.29 when it is named t.37). So it requires only constant control space and is thus truly iterative like loops in traditional languages.
8 Subsequently, (let ((t.0 k)) (app V1 V2 t.0)) is simplified to (app V1 V2 k) by an application of [copy-prop].
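To make the constant-control-space claim concrete, here is a small sketch of our own (names illustrative) of an iterative factorial written with an accumulator, using the same mutable-product knot-tying idiom as revmap. Unlike the earlier non-tail-recursive factorial sketch, the recursive call here fills a tail context, so after metaCPS conversion it is invoked with the same continuation as its enclosing abstraction and compiles to a jump:

(fil (n)
  (let* ((down (@mprod #u))
         (_ (@mset! 1 down
              (abs (i acc)
                (if (@= i 0)
                    acc                                          ; done: acc holds the product
                    (app (@mget 1 down) (@- i 1) (@* i acc)))))))  ; tail call: a jump
    (app (@mget 1 down) n 1)))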
(MCPSexp[[(app f (@* x (if (app g y) 2 3)))]] (id→mc k))

= ((λm . (MCPSexp[[f]]
           (λV1 . (MCPSexp[[(@* x (if (app g y) 2 3))]]
                    (λV2 . (let ((t.0 (mc→exp m)))
                             (app V1 V2 t.0)))))))
   (id→mc k))

= (MCPSexp[[f]]
    (λV1 . (MCPSexp[[(@* x (if (app g y) 2 3))]]
             (λV2 . (let ((t.0 (mc→exp (id→mc k))))
                      (app V1 V2 t.0))))))

= (MCPSexp[[(@* x (if (app g y) 2 3))]] (λV2 . (app f V2 k)))

= ((λm . (MCPSexp[[x]]
           (λV3 . (MCPSexp[[(if (app g y) 2 3)]]
                    (λV4 . (let ((t.1 (@* V3 V4)))
                             (m t.1)))))))
   (λV2 . (app f V2 k)))

= (MCPSexp[[x]]
    (λV3 . (MCPSexp[[(if (app g y) 2 3)]]
             (λV4 . (let ((t.1 (@* V3 V4)))
                      (app f t.1 k))))))

= (MCPSexp[[(if (app g y) 2 3)]]
    (λV4 . (let ((t.1 (@* x V4)))
             (app f t.1 k))))

= ((λm . (MCPSexp[[(app g y)]]
           (λV5 . (let ((kif.2 (mc→exp m)))
                    (if V5
                        (MCPSexp[[2]] (id→mc kif.2))
                        (MCPSexp[[3]] (id→mc kif.2)))))))
   (λV4 . (let ((t.1 (@* x V4)))
            (app f t.1 k))))

= (MCPSexp[[(app g y)]]
    (λV5 . (let ((kif.2 (abs (t.3) (let ((t.1 (@* x t.3))) (app f t.1 k)))))
             (if V5
                 (MCPSexp[[2]] (λV6 . (app kif.2 V6)))
                 (MCPSexp[[3]] (λV7 . (app kif.2 V7)))))))

= ((λm . (MCPSexp[[g]]
           (λV8 . (MCPSexp[[y]]
                    (λV9 . (let ((t.4 (mc→exp m)))
                             (app V8 V9 t.4)))))))
   (λV5 . (let ((kif.2 (abs (t.3) (let ((t.1 (@* x t.3))) (app f t.1 k)))))
            (if V5 (app kif.2 2) (app kif.2 3)))))

= (let ((t.4 (abs (t.5)  ; (abs (t.5) ...) = (mc→exp (λV5 . ...))
               (let ((kif.2 (abs (t.3) (let ((t.1 (@* x t.3))) (app f t.1 k)))))
                 (if t.5 (app kif.2 2) (app kif.2 3))))))
    (app g y t.4))  ; substituted g for V8 and y for V9 in (app V8 V9 t.4)

Figure 17.28 An example of CPS conversion using metacontinuations.
Note that the looping procedure (abs.25) is the only nontrivial value ever stored in the first slot of the mutable product named t.23, so all references to this slot (i.e., the values named by t.26 and t.37) denote the looping procedure. In the case of t.26, this fact can be automatically discovered by a simple peephole optimization (a local code optimization that transforms small sequences of instructions) on let* bindings:

(let* (... (I1 (@mset! J Improd V)) (I2 (@mget J Improd)) ...) Ebody)
−−simp−→ (let* (... (I1 (@mset! J Improd V)) (I2 V) ...) Ebody)
In conjunction with the [copy-prop] simplification, this peephole optimization can justify simplifying

(let* (... (t.24 (@mset! 1 t.23 abs.25))
           (t.26 (@mget 1 t.23)))
  (app t.26 elts.4 k.20))

to

(let* (... (t.24 (@mset! 1 t.23 abs.25)))
  (app abs.25 elts.4 k.20))
in the CPS-converted revmap code. A much more sophisticated analysis would be necessary to determine that t.37 denotes the looping procedure. However, even this knowledge cannot be used to replace t.37 by abs.25, because abs.25 is a let-bound variable whose scope does not include the body of the abstraction (abs (xs.7 k.29) ...).

The conciseness of the code in Figure 17.29 is a combination of the simplifications performed by reducing metacalls at compile time and the standard FILcps simplifications. To underscore the importance of the latter, Figure 17.30 shows the result of MCPS before any FILcps simplifications are performed. Nine applications of the [copy-prop] rule and four applications of the [eta] rule are used to simplify the code in Figure 17.30 to the code in Figure 17.29. In addition to making the code shorter, these simplifications are essential for performing the tail-call optimization. For example, the call (app t.37 t.38 k.40) in Figure 17.30 uses the trivial continuation k.40 = (abs (t.39) (app kif.30 t.39)), which itself uses the trivial continuation kif.30 = (abs (t.43) (app k.29 t.43)). This call is transformed to the new call (app t.37 t.38 k.29) by using two applications of the [eta] rule (simplifying (abs (t.43) (app k.29 t.43)) to k.29 and (abs (t.39) (app kif.30 t.39)) to kif.30) and two applications of the [copy-prop] rule (replacing kif.30 and k.40 by k.29).

A drawback of the [copy-prop] simplifications is that they rename some of the identifiers from the input, making it harder for programmers to compare the
(fil (a.0 b.1 ktop.9)
  (let* ((abs.10
           (abs (f.3 elts.4 k.20)
             (let* ((t.22 (@null))
                    (t.21 (@mprod t.22))
                    (t.23 (@mprod #u))
                    (abs.25
                      (abs (xs.7 k.29)
                        (let ((t.31 (@null? xs.7)))
                          (if t.31
                              (let ((t.42 (@mget 1 t.21)))
                                (app k.29 t.42))
                              (let* ((t.34 (@car xs.7))
                                     (k.41 (abs (t.35)
                                             (let* ((t.36 (@mget 1 t.21))
                                                    (t.33 (@cons t.35 t.36))
                                                    (t.32 (@mset! 1 t.21 t.33))
                                                    (t.37 (@mget 1 t.23))
                                                    (t.38 (@cdr xs.7)))
                                               (app t.37 t.38 k.29)))))
                                (app f.3 t.34 k.41))))))
                    (t.24 (@mset! 1 t.23 abs.25))
                    (t.26 (@mget 1 t.23)))
               (app t.26 elts.4 k.20))))
         (abs.11 (abs (x.8 k.18)
                   (let ((t.19 (@> x.8 b.1)))
                     (app k.18 t.19))))
         (t.14 (@* a.0 7))
         (t.15 (@null))
         (t.13 (@cons t.14 t.15))
         (t.12 (@cons a.0 t.13)))
    (app abs.10 abs.11 t.12 ktop.9)))

Figure 17.29 revmap program after metaCPS conversion (with simplifications).
input and output of CPS conversion. In the revmap example, [copy-prop] changes ans.5 to t.21 and loop.6 to t.23 and also replaces two occurrences of _ (by t.32 and t.24). Since techniques to avoid such renamings are complex, and the particular names used don’t affect the correctness of the resulting code, we opt to accept such renamings without complaint.
(fil (a.0 b.1 ktop.9)
  (let* ((abs.10
           (abs (f.3 elts.4 k.20)
             (let* ((t.22 (@null))
                    (t.21 (@mprod t.22))
                    (ans.5 t.21)
                    (t.23 (@mprod #u))
                    (loop.6 t.23)
                    (abs.25
                      (abs (xs.7 k.29)
                        (let* ((kif.30 (abs (t.43) (app k.29 t.43)))
                               (t.31 (@null? xs.7)))
                          (if t.31
                              (let ((t.42 (@mget 1 ans.5)))
                                (app kif.30 t.42))
                              (let* ((t.34 (@car xs.7))
                                     (k.41 (abs (t.35)
                                             (let* ((t.36 (@mget 1 ans.5))
                                                    (t.33 (@cons t.35 t.36))
                                                    (t.32 (@mset! 1 ans.5 t.33))
                                                    (_ t.32)
                                                    (t.37 (@mget 1 loop.6))
                                                    (t.38 (@cdr xs.7))
                                                    (k.40 (abs (t.39) (app kif.30 t.39))))
                                               (app t.37 t.38 k.40)))))
                                (app f.3 t.34 k.41))))))
                    (t.24 (@mset! 1 loop.6 abs.25))
                    (_ t.24)
                    (t.26 (@mget 1 loop.6))
                    (k.28 (abs (t.27) (app k.20 t.27))))
               (app t.26 elts.4 k.28))))
         (revmap.2 abs.10)
         (abs.11 (abs (x.8 k.18)
                   (let ((t.19 (@> x.8 b.1)))
                     (app k.18 t.19))))
         (t.14 (@* a.0 7))
         (t.15 (@null))
         (t.13 (@cons t.14 t.15))
         (t.12 (@cons a.0 t.13))
         (k.17 (abs (t.16) (app ktop.9 t.16))))
    (app revmap.2 abs.11 t.12 k.17)))

Figure 17.30 revmap program after metaCPS conversion (without simplifications).
Exercise 17.19 Use MCPSexp to CPS-convert the following FIL expressions relative to an initial metacontinuation (id→mc k).

a. (abs (f) (@+ 1 (app f 2)))

b. (abs (g x) (@+ 1 (app g (@* 2 x))))

c. (abs (f g h x) (app f (app g x) (app h x)))

d. (abs (f) (@* (if (app f 1) 2 3) (if (app f 4) 5 6)))

Exercise 17.20 Use MCPSprog to CPS-convert the following FIL programs:

a. The program Pquad from Figure 17.25 (page 1055).

b. (fil (x)
     (let ((fact (@mprod #u))
           (_ (@mset! 1 fact
                (abs (n)
                  (if (@= n 0)
                      1
                      (@* n (app (@mget 1 fact) (@- n 1))))))))
       (app (@mget 1 fact) x)))
c. (fil (x) (let ((fib (@mprod #u)) (_ (@mset! 1 fib (abs (n) (if (@ y n) y (throw y))))) (@- (f 5) (catch (abs (z) (@* 2 z)) (f 3))))))

(app Ecatchabs 0) −−FIL−→ 2 {5 - 3 = 2}
(app Ecatchabs 4) −−FIL−→ -1 {5 - (2*3) = -1}
(app Ecatchabs 8) −−FIL−→ 6 {5 + 1 = 6}
a. Sam modifies the standard SCPS conversion clauses to translate every expression into a procedure taking two continuations: an exception continuation and a normal continuation. Sam's SCPS conversion clauses for programs, literals, and conditionals are:

SCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Ikntop) ; Ikntop fresh
    (let ((Iketop (abs (Iinfo) ; Iketop and Iinfo fresh
                    (error uncaught-exception))))
      (app (SCPSexp[[Ebody]]) Iketop Ikntop)))

SCPSexp[[L]]
= (abs (Ike Ikn) ; Ike (exception cont.) and Ikn (normal cont.) fresh
    (app Ikn L))

SCPSexp[[(if Etest Ethen Eelse)]]
= (abs (Ike Ikn) ; Ike and Ikn fresh
    (app (SCPSexp[[Etest]])
         Ike
         (abs (Itest) ; Itest fresh
           (if Itest
               (app (SCPSexp[[Ethen]]) Ike Ikn)
               (app (SCPSexp[[Eelse]]) Ike Ikn)))))
Write the SCPSexp clauses for abs, app, throw, and catch.
b. For MCPS, Sam modifies MCPSexp to take an additional argument (an identifier naming the current exception continuation) before the metacontinuation argument. For example:

MCPSprog[[(fil (I1 ... In) Ebody)]]
= (fil (I1 ... In Ikntop) ; Ikntop fresh
    (let ((Iketop (abs (Iinfo) ; Iketop and Iinfo fresh
                    (error uncaught-exception))))
      (MCPSexp[[Ebody]] Iketop (id→mc Ikntop))))

MCPSexp : ExpFIL → Ident → MetaCont → Expcps

MCPSexp[[L]] = (λIke m . (m L))

MCPSexp[[(if Etest Ethen Eelse)]]
= (λIke m . (MCPSexp[[Etest]] Ike
              (λVtest . (let ((Ikif (mc→exp m))) ; Ikif fresh
                          (if Vtest
                              (MCPSexp[[Ethen]] Ike (id→mc Ikif))
                              (MCPSexp[[Eelse]] Ike (id→mc Ikif)))))))
Write the MCPSexp clauses for abs, app, throw, and catch.

c. Based on the metaCPS conversion of FIL+{throw, catch}, explain how to perform metaCPS conversion for FIL+{raise, handle}.
17.10 Transformation 8: Closure Conversion
In a block-structured language, code can refer to variables declared outside the current block (i.e., in an outer procedure or class declaration). As we have seen in Chapters 6–7, the meaning of such free variable references is often explained in terms of environments. Traditional interpreters and compilers have special-purpose machinery to manage environments. The Tortoise compiler avoids such machinery by making all environments explicit in the intermediate language. Each procedure is transformed into an abstract pair of code and environment, where the code explicitly accesses the environment to retrieve values formerly referenced by free variables. The resulting abstract pair is known as a closure because its code component is closed — i.e., it contains no free variables. The process of transforming all procedures into closures is traditionally called closure conversion. Because it makes all environments explicit, environment conversion is another name for this transformation.

Closure conversion transforms a program that may contain higher-order procedures into one that contains only first-order procedures: rather than passing
a procedure as a parameter or returning one as a result, a transformed program passes or returns a closure data structure. This technique is not only useful as a compiler transformation, but programmers may also apply it manually to simulate higher-order procedures in languages that support only first-order procedures (such as C, Pascal, and Ada) or objects with methods (such as Smalltalk, Java, C++, and C#). All one needs is a way to embed a procedure value (or a reference to a procedure) in a data structure (or object).

In the Tortoise compiler, closure conversion has the following specification:

Preconditions: The input to closure conversion is a valid kernel FIL program.

Postconditions: The output of closure conversion is a valid kernel FIL program in which all abstractions are closed.

Other properties: If the input program is in FILcps, so is the output program.

In the Tortoise compiler, the closure conversion stage follows the renaming and CPS conversion stages, but closure conversion can be performed on any FIL program, even ones that are not uniquely named or in FILcps. The reason that Tortoise performs closure conversion after CPS conversion is so that closure conversion will be performed on the continuation procedures introduced by CPS conversion as well as on the user-defined procedures already in the program. The Tortoise closure conversion specification requires that any FILcps program will be transformed to another FILcps program, so the output of the closure conversion stage of the compiler is guaranteed to be in FILcps.

There are numerous approaches to closure conversion that differ in their representations of environments and closures. We shall focus on one class of representations, flat closures, and then briefly discuss some alternatives.
17.10.1 Flat Closures
Consider the following example:

(let ((linear (abs (a b)
                (abs (x) (@+ (@* a x) b)))))
  (let ((f (app linear 4 5))
        (g (app linear 6 7)))
    (@+ (app f 8) (app g 9))))
Given a and b, the linear procedure returns a procedural representation of a line with slope a and y-intercept b. The f and g procedures represent two such lines, each of which is associated with the abstraction (abs (x) . . . ), which has free variables a and b. In the case of f, these variables have the bindings 4 and 5, respectively, while for g they have the bindings 6 and 7. We will convert this example by hand and then develop an automatic closure conversion transformation.

One way to represent f and g as closed procedures is shown below:

(let ((fgcode (abs (env x)
                (let ((a (@mget 1 env))
                      (b (@mget 2 env)))
                  (@+ (@* a x) b))))
      (fenv (@mprod 4 5))
      (genv (@mprod 6 7)))
  (let ((fclopair (@mprod fgcode fenv))
        (gclopair (@mprod fgcode genv)))
    (@+ (app (@mget 1 fclopair) (@mget 2 fclopair) 8)
        (app (@mget 1 gclopair) (@mget 2 gclopair) 9))))
In this approach, the two procedures share the same code component, fgcode, which takes an explicit environment argument env in addition to the original argument x. The env argument is assumed to be a tuple (product) whose two components are the values of the former free variables a and b. These values are extracted from the environment and given their former names in a wrapper around the body expression (@+ (@* a x) b). Note that fgcode has no free variables and so is a closed procedure. The environments fenv and genv are tuples holding the free variable values. The closures fclopair and gclopair are formed by making explicit code/environment pairs, each combining the shared code component with a specific environment for the closure. To handle the change in procedure representation, each call of the form (app f E) must be transformed to (app (@mget 1 fclopair) (@mget 2 fclopair) E) (and similarly for g) in order to pass the environment component as the first argument to the code component.

Closure conversion can be viewed as an exercise in abstract data type implementation. The abstraction being considered is the procedure, whose interface has two operations: abs, which creates procedures, and app, which applies procedures. The goal of closure conversion is to find an implementation of this interface that behaves the same, but in which procedure creation requires no free variables. As in traditional data structure problems, we're keen to design correct implementations that are as efficient as possible.
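Before automating the transformation, it may help to see the hand-converted code/environment pairs transcribed into a mainstream language. The following Python sketch mirrors the FIL code above; the tuple encodings of closures and environments are our own, not part of FIL:

def fg_code(env, x):
    a, b = env                     # extract the former free variables
    return (a * x) + b

f_clopair = (fg_code, (4, 5))      # closure = (code, environment tuple)
g_clopair = (fg_code, (6, 7))

def apply_closure(clo, *args):
    code, env = clo                # every call passes the environment as
    return code(env, *args)        # an extra first argument to the code

assert apply_closure(f_clopair, 8) == 37   # 4*8 + 5
assert apply_closure(g_clopair, 9) == 61   # 6*9 + 7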
(let ((linear
       (@mprod {this closure (clo.1) has only a code component}
        (abs (clo.1 a b) {the parameter clo.1 is not referenced}
          (@mprod {this closure (clo.2) has code + vars {a, b}}
           (abs (clo.2 x)
             (let ((a (@mget 2 clo.2))
                   (b (@mget 3 clo.2)))
               (@+ (@* a x) b)))
           a b)) {vars used by clo.2 = {a, b}}
        ))) {clo.1 has no vars}
  (let ((f (app (@mget 1 linear) linear 4 5))
        (g (app (@mget 1 linear) linear 6 7)))
    (@+ (app (@mget 1 f) f 8)
        (app (@mget 1 g) g 9))))

Figure 17.33  Result of closure-converting the linear example.
For example, a more efficient approach to using explicit code/environment pairs is to collect the code and free variable values into a single tuple, as shown below:

(let ((fgcode (abs (clo x)
                (let ((a (@mget 2 clo))
                      (b (@mget 3 clo)))
                  (@+ (@* a x) b)))))
  (let ((fclo (@mprod fgcode 4 5))
        (gclo (@mprod fgcode 6 7)))
    (@+ (app (@mget 1 fclo) fclo 8)
        (app (@mget 1 gclo) gclo 9))))
This approach, known as closure-passing style, avoids creating a separate environment tuple every time a closure is created, and avoids extracting this tuple from the code/environment pair every time the closure is invoked. If we systematically use closure-passing style to transform every abstraction and application site in the original linear example, we get the result shown in Figure 17.33. The inner abs has been transformed into a tuple that combines fgcode with the values of the free variables a and b from the outer abs. For consistency, the outer abs has also been transformed; its tuple has only a code component since the original abs has no free variables.

By convention, we will refer to a closure tuple by the name of the first argument of its code component. In this example, the code comments refer to the outer closure tuple as clo.1 and the inner closure tuple as clo.2.
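The closure-passing variant transcribes just as directly. Compared with the sketch above, the separate environment tuple disappears: the free variable values ride in the closure tuple itself, and each call fetches the code from slot 1 and passes the whole closure to it (again our own Python encoding, offered only as a sketch):

def fg_code(clo, x):
    _, a, b = clo                  # slots 2 and 3 hold the free variables
    return (a * x) + b

f_clo = (fg_code, 4, 5)            # one tuple: code + free variable values
g_clo = (fg_code, 6, 7)

assert f_clo[0](f_clo, 8) == 37    # call convention: (app (@mget 1 f) f 8)
assert g_clo[0](g_clo, 9) == 61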
Figure 17.34 shows an example involving nested open procedures and unreferenced variables. In the unconverted clotest, the outermost abstraction, (abs (c d) . . .), is closed; the middle abstraction, (abs (r s t) . . .), has c as its only free variable (d is never used); and the innermost abstraction, (abs (y) . . .), has {c, r, t} as its free variables (d and s are never used). In the converted clotest, each abstraction has been transformed into a tuple that combines a closed code component with all the free variables of the original abstraction. The resulting tuples are called flat closures because all the environment information has been condensed into a single tuple that does not reflect any of the original nesting structure. Note that unreferenced variables from an enclosing scope are ignored. For example, the innermost body does not reference d and s, so these variables are not extracted from clo.3 and are not included in the innermost closure tuple.

A formal specification of the flat closure conversion transformation is presented in Figure 17.35. The transformation is specified via the CLexp function on FIL expressions. The only nontrivial cases for CLexp are abs and app. CLexp converts an abs to a tuple containing a closed code component and all the free variables of the abstraction. The code component is derived from the original abs by adding a closure argument Iclo and extracting the free variables from this argument in a wrapper around the body. The order of the free variables is irrelevant as long as it is consistent between tuple creation and projection. An app is converted to another app that applies the code component of the converted rator closure to the closure and the converted operands.

Certain parts of the CLexp definition are written in a somewhat unnatural way to guarantee that an input expression in FILcps will be translated to an output expression in FILcps. This is the purpose of the separate clause for converting an abs that occurs in a let binding and of the let* bindings in the app and abs conversions. We will ignore these details in our informal examples of closure conversion. Note that the unique naming property is not preserved by CLexp: the names Ifvi declared in the body of the closed abstraction stand for variables that are logically distinct from variables with the same names in the mprod application that creates the closure tuple.

Figure 17.36 shows the revmap example after closure conversion. In addition to transforming procedures present in the original code in Figure 17.4 on page 1011 (clo.56 is revmap, clo.52 is loop, and clo.60 is the greater-than-b procedure), closure conversion also transforms the continuation procedures introduced by CPS conversion (clo.48 is the continuation for the f call — compare Figure 17.29 on page 1067). The free variables in converted continuations
Unconverted Expression

(let ((clotest (abs (c d)
                 (abs (r s t)
                   (abs (y)
                     (@+ (@/ (@* r y) t)
                         (@- r c)))))))
  (let ((p (app clotest 4 5)))
    (let ((q1 (app p 6 7 8))
          (q2 (app p 9 10 11)))
      (@+ (app q1 12) (app q2 13)))))

Converted Expression

(let ((clotest
       (@mprod {this closure (clo.1) has only a code component}
        (abs (clo.1 c d) {the parameter clo.1 is never referenced}
          (@mprod {this closure (clo.2) has code + var {c}}
           (abs (clo.2 r s t)
             (let ((c (@mget 2 clo.2)))
               (@mprod {this closure (clo.3) has code + vars {c, r, t}}
                (abs (clo.3 y)
                  (let ((c (@mget 2 clo.3))
                        (r (@mget 3 clo.3))
                        (t (@mget 4 clo.3)))
                    (@+ (@/ (@* r y) t)
                        (@- r c))))
                c r t))) {vars used by clo.3 = {c, r, t}}
           c)) {vars used by clo.2 = {c}}
        ))) {clo.1 has no vars}
  (let ((p (app (@mget 1 clotest) clotest 4 5)))
    (let ((q1 (app (@mget 1 p) p 6 7 8))
          (q2 (app (@mget 1 p) p 9 10 11)))
      (@+ (app (@mget 1 q1) q1 12)
          (app (@mget 1 q2) q2 13)))))

Figure 17.34  Flat closure conversion on an example with nested open procedures.
correspond to the caller-saved register values that a traditional implementation would save on the stack during a subroutine call that returns to the control point represented by the continuation. In the Tortoise compiler, this saving behavior is automatically implemented by performing closure conversion after CPS conversion, but the saved values are stored in the continuation closure rather than on an explicit stack. For example, continuation closure clo.48 includes the values needed by the loop after a call to f: the cell t.21 resulting from the assignment
CLexp : ExpFIL → ExpFIL

CLexp[[(abs (I1 ... In) Ebody)]]
= let {Ifv1, ..., Ifvk} be FrIds[[(abs (I1 ... In) Ebody)]]
    ; assume an appropriate definition of FrIds for FIL
  in (@mprod (abs (Iclo I1 ... In) ; Iclo fresh
               (let* ((Ifv1 (@mget 2 Iclo))
                      ...
                      (Ifvk (@mget k+1 Iclo))) ; N[[n]] = n for n ∈ Nat
                 CLexp[[Ebody]]))
             Ifv1 ... Ifvk)

CLexp[[(let ((Iabs (abs (I1 ... In) Eabsbody))) Eletbody)]]
  ; special case of abs conversion that preserves FILcps
= let (@mprod Ecode Ifv1 ... Ifvk) be CLexp[[(abs (I1 ... In) Eabsbody)]]
  in (let* ((Icode Ecode) ; Icode fresh
            (Iabs (@mprod Icode Ifv1 ... Ifvk)))
       CLexp[[Eletbody]])

CLexp[[(app Erator E1 ... En)]]
= (let* ((Iclo CLexp[[Erator]]) ; Iclo fresh
         (Icode (@mget 1 Iclo))) ; Icode fresh
    (app Icode Iclo CLexp[[E1]] ... CLexp[[En]]))

CLexp[[E]] = mapsubFIL[[E]] CLexp, otherwise.

Figure 17.35  The flat closure conversion transformation CLexp.
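As a reading aid for Figure 17.35, here is a toy Python rendering of CLexp over a small tuple-encoded AST. It is a sketch under simplifying assumptions: the node shapes, the free_vars stand-in for FrIds, and the omission of the FILcps-preserving special cases are all ours, and the input is assumed to contain no let expressions.

import itertools

_fresh = (f"clo.{i}" for i in itertools.count(1))

def free_vars(e):
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "abs":                        # bound parameters are not free
        return free_vars(e[2]) - set(e[1])
    # lit, app, prim, mprod, mget: union over the subexpression tuples
    return set().union(set(), *(free_vars(x) for x in e[1:]
                                if isinstance(x, tuple)))

def cl(e):
    tag = e[0]
    if tag == "abs":                        # (abs (I ...) Ebody)
        fvs = sorted(free_vars(e))          # any fixed order will do
        clo = next(_fresh)
        body = cl(e[2])
        # wrap the body: (let ((Ifv_j (@mget j+1 clo))) ...)
        for j, fv in reversed(list(enumerate(fvs, start=2))):
            body = ("let", fv, ("mget", j, ("var", clo)), body)
        code = ("abs", [clo] + e[1], body)
        return ("mprod", code, *[("var", v) for v in fvs])
    if tag == "app":                        # (app Erator E ...)
        c = next(_fresh)
        return ("let", c, cl(e[1]),
                ("app", ("mget", 1, ("var", c)), ("var", c),
                 *[cl(r) for r in e[2:]]))
    if tag in ("var", "lit"):
        return e
    return (tag, *[cl(x) if isinstance(x, tuple) else x for x in e[1:]])

On (abs (x) (@+ x b)), for example, cl produces the expected two-slot tuple (@mprod (abs (clo.1 x) (let ((b (@mget 2 clo.1))) (@+ x b))) b), mirroring the figure's abs clause.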
conversion of ans, the cell holding the looping procedure t.23, the loop state variable xs.7, and the end-of-loop continuation k.29.

In Figure 17.36, we assume that the top-level continuation ktop.9 supplied by the operating system is consistent with the calling convention used by the closure-converted code. I.e., ktop.9 must be a closure tuple whose first slot contains an abstraction with two parameters: (1) the closure tuple and (2) the argument expected by the system's unary continuation procedure. Alternatively, for the case where closure conversion is known to follow CPS conversion, we could define a special program-level closure conversion function CLprog that assumes that the final argument in the input FILcps program is an unconverted unary continuation procedure (see Exercise 17.30).

In order to work properly, CLexp requires that the input expression contain no assignments (set!). This is necessarily true in FIL, which does not support set!, but would be an issue in extensions to FIL that include set! (e.g., see Exercises 17.14 and 17.23 in Sections 17.9.2 and 17.9.3). The reason for this restriction is that the copying of free variable values by CLexp in the abs clause
does not preserve the semantics of mutable variables. Consider the following example of a nullary procedure that increments a counter every time it is called in FIL+{set!}:

(let ((count 0))
  (abs ()
    (let* ((new-count (@+ count 1))
           (_ (set! count new-count)))
      new-count)))
Closure-converting this example yields:

(let ((count 0))
  (@mprod
   (abs (clo)
     (let* ((count (@mget 2 clo)))
       (let* ((new-count (@+ count 1))
              (_ (set! count new-count)))
         new-count)))
   count))
The set! in the transformed code changes the local variable count within the abstraction, which is always initially bound to the value 0. So the closure-converted procedure always returns 1, which is not the correct behavior. Performing assignment conversion before closure conversion fixes this problem, since count will then name a sharable mutable cell rather than a number, and the set! will be transformed to an mset! on this cell.

The interaction between mutable variables and closure conversion arises in practice in Java. Java's anonymous inner classes allow the programmer to create an instance of an unnamed class (the inner class) within the method of another class (the outer class). Because it is possible for the inner class instance to refer to parameters and local variables of the enclosing method, the inner class instance is effectively a closure over these variables. For example, Figure 17.37 shows how an inner class can be used to express the linear example from page 1076 in Java. The IntFun interface is a specification for a class providing an app method that takes a single integer argument and returns an integer result. The linear method of the Linear class takes integers a and b and returns an instance of an anonymous class satisfying the IntFun specification whose app method maps an argument x to the result (a*x)+b. This instance corresponds to the first-class procedure (abs (x) (+ (* a x) b)) in FIL. Java requires any enclosing local variables mentioned in the inner class (a and b in this example) to be declared immutable (using the keyword final). This restriction allows the Java compiler to copy the values of these variables into instance variables of the anonymous inner class instance rather than attempting to share the locations of these variables (which would require some form of assignment conversion).
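Both the failure mode and the assignment-conversion fix can be seen in a small Python sketch, where a one-element list plays the role of FIL's mutable cell (the tuple and list encodings are our stand-ins, not FIL constructs):

# Broken: the closure tuple copies the *value* of count, so every call
# computes 0 + 1.
def counter_code_broken(clo):
    count = clo[1]
    return count + 1

broken = (counter_code_broken, 0)
assert broken[0](broken) == broken[0](broken) == 1   # stuck at 1

# Assignment-converted: count names a shared mutable cell; copying the
# cell reference into the closure preserves sharing.
def counter_code(clo):
    cell = clo[1]
    cell[0] = cell[0] + 1          # the mset!-style update on the cell
    return cell[0]

counter = (counter_code, [0])
assert counter[0](counter) == 1
assert counter[0](counter) == 2    # state now persists across calls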
(fil (a.0 b.1 ktop.9)
  (let* ((code.57 {code of clo.56}
          (abs (clo.56 f.3 elts.4 k.20)
            (let* ((t.22 (@null))
                   (t.21 (@mprod t.22))
                   (t.23 (@mprod #u))
                   (code.53 {code of clo.52}
                    (abs (clo.52 xs.7 k.29)
                      (let* ((t.21 (@mget 2 clo.52))
                             (t.23 (@mget 3 clo.52))
                             (f.3 (@mget 4 clo.52))
                             (t.31 (@null? xs.7)))
                        (if t.31
                            (let* ((t.42 (@mget 1 t.21))
                                   (code.44 (@mget 1 k.29)))
                              (app code.44 k.29 t.42))
                            (let* ((t.34 (@car xs.7))
                                   (code.49 {code of clo.48}
                                    (abs (clo.48 t.35)
                                      (let* ((t.21 (@mget 2 clo.48))
                                             (t.23 (@mget 3 clo.48))
                                             (xs.7 (@mget 4 clo.48))
                                             (k.29 (@mget 5 clo.48))
                                             (t.36 (@mget 1 t.21))
                                             (t.33 (@cons t.35 t.36))
                                             (t.32 (@mset! 1 t.21 t.33))
                                             (t.37 (@mget 1 t.23))
                                             (t.38 (@cdr xs.7))
                                             (code.46 (@mget 1 t.37)))
                                        (app code.46 t.37 t.38 k.29))))
                                   (k.41 (@mprod code.49 t.21 t.23 xs.7 k.29)) {clo.48}
                                   (code.50 (@mget 1 f.3)))
                              (app code.50 f.3 t.34 k.41))))))
                   (abs.25 (@mprod code.53 t.21 t.23 f.3)) {clo.52}
                   (t.24 (@mset! 1 t.23 abs.25))
                   (t.26 (@mget 1 t.23))
                   (code.54 (@mget 1 t.26)))
              (app code.54 t.26 elts.4 k.20))))
         (abs.10 (@mprod code.57)) {clo.56}
         (code.61 {code of clo.60}
          (abs (clo.60 x.8 k.18)
            (let* ((b.1 (@mget 2 clo.60))
                   (t.19 (@> x.8 b.1))
                   (code.58 (@mget 1 k.18)))
              (app code.58 k.18 t.19))))
         (abs.11 (@mprod code.61 b.1)) {clo.60}
         (t.14 (@* a.0 7))
         (t.15 (@null))
         (t.13 (@cons t.14 t.15))
         (t.12 (@cons a.0 t.13))
         (code.62 (@mget 1 abs.10)))
    (app code.62 abs.10 abs.11 t.12 ktop.9)))

Figure 17.36  revmap program after closure conversion.
Exercise 17.28

a. A function f is idempotent iff (f (f x)) = (f x) for all x ∈ dom(f). CLexp is not idempotent. Explain why. Can any closure conversion transformation be idempotent?

b. In the abs clause for CLexp, suppose FrIds[[(abs (I1 ... In) Ebody)]] is replaced by the set of all variables in scope at that point. Is this a meaning-preserving change? What are the advantages and disadvantages of such a change?
c. In a FIL-based compiler, CLexp must necessarily be performed after an assignment conversion pass. Could we perform it before a renaming pass? A globalization pass? A CPS-conversion pass? Explain.

Exercise 17.29 In the abs clause, the CLexp function uses a wrapping strategy to wrap the body of the original abs in a let* that extracts and names each free variable value in the closure. An alternative substitution strategy is to replace each free reference in the original abs by a closure access. Here is a modified version of the clo.2 code component from Figure 17.33 that uses the substitution strategy:

(abs (clo.2 x)
  (@+ (@* (@mget 2 clo.2) x)
      (@mget 3 clo.2)))
Neither strategy is best in all situations. Describe situations in which the wrapping strategy is superior and in which the substitution strategy is superior. State all of the assumptions of your argument.

Exercise 17.30

a. Define a program-level closure conversion function CLprog that expects a FILcps program:

CLprog : Progcps → Progcps
In both the input and output programs, the final program argument Iktop is expected to be the top-level unary continuation procedure. CLprog must handle Iktop specially so that it is applied directly to its single argument rather than via the closure application convention. It is not necessary to modify CLexp.

b. Show the result of using your CLprog function to closure-convert the following program:

(fil (a b ktop)
  (let ((add-a (abs (x k)
                 (let ((t (@+ x a)))
                   (app k t)))))
    (if b
        (app add-a a ktop)
        (app ktop a))))
Exercise 17.31 Using anonymous inner classes, complete the following translation of the clotest example from Figure 17.34 into Java by filling in the hole in the following code with a single Java expression:
interface IntFun { public int app (int x); }

public class Linear {
  public static IntFun linear (final int a, final int b) {
    return new IntFun() {
      public int app (int x) {return (a*x)+b;}
    };
  }

  public static int example () {
    IntFun f = linear(4,5);
    IntFun g = linear(6,7);
    return f.app(8) + g.app(9);
  }
}

Figure 17.37  Using anonymous inner classes to express the linear example from page 1076 in Java.
interface IntFun1 { public int app (int x); }
interface IntFun2 { public IntFun3 app (int x, int y); }
interface IntFun3 { public IntFun1 app (int x, int y, int z); }

public class Clotest {
  public static int example () {
    IntFun2 clotest = □; // the hole to be filled
    IntFun3 p = clotest.app(4,5);
    IntFun1 q1 = p.app(6,7,8);
    IntFun1 q2 = p.app(9,10,11);
    return q1.app(12) + q2.app(13);
  }
}
17.10.2 Variations on Flat Closure Conversion
Now we consider several variations on flat closure conversion. We begin with an optimization to CLexp . Why does CLexp transform an already closed abs into a closure tuple? This strategy simplifies the transformation by enabling all procedure applications to be transformed uniformly to “expect” such a tuple. But it is also possible to use nonuniform transformations on abstractions and applications as long as the correct behavior is maintained. Given a control flow analysis (see page 995) that indicates which procedures flow to which call sites (application expressions that use the procedures in their rator positions), we can do a better job via so-called selective closure conversion [WS94].
(let ((linear (abs (a b) {this closed abstraction is not transformed}
                (@mprod {this is the closure tuple for an open abstraction}
                 (abs (clo.2 x)
                   (let* ((a (@mget 2 clo.2))
                          (b (@mget 3 clo.2)))
                     (@+ (@* a x) b)))
                 a b)))) {free vars of clo.2}
  (let ((f (app linear 4 5)) {this application is not transformed}
        (g (app linear 6 7))) {this application is not transformed}
    (@+ (app (@mget 1 f) f 8)
        (app (@mget 1 g) g 9))))

Figure 17.38  Result of selective closure conversion in the linear example.
In this approach, originally closed procedures that flow only to call sites where only originally closed procedures are called are left unchanged by the closure conversion process, as are their call sites. This avoids unnecessary tuple creation and projection. The result of selective closure conversion for the linear example is presented in Figure 17.38 (compare Figure 17.33 on page 1078). Because the linear procedure is closed, its abstraction and the calls to linear are not transformed. But the procedure returned by invoking linear has free variables (a and b), and so must be converted to a closure tuple.

In selective closure conversion, a closed procedure pclosed cannot be optimized when it is called at the same call site s as an open procedure popen in the original program. The call site must be transformed to expect for its rator a closure tuple for popen, and so pclosed must also be represented as a closure tuple since it flows to the rator position of s. This representation constraint can similarly force other closed procedures that share call sites with pclosed to be converted, leading to a contagious phenomenon called representation pollution [DWM+01]. In the following example, although f is closed, selective closure conversion must still convert f to a closure tuple because it flows to the same call site (app (if b f g) 3) as the open procedure g:

Epolluted = (abs (b c)
              (let ((f (abs (x) (@+ x 1)))
                    (g (let ((a (if b 4 5)))
                         (abs (y) (@+ (@* a y) c)))))
                (@+ (app f 2)
                    (app (if b f g) 3))))
Representation pollution can sometimes be avoided by duplicating a closed procedure and using different representations for the two copies. For instance, if we split f in Epolluted into two copies, then the copy that flows to the call site (app f 2) need not be converted to a tuple in the closure-converted code:

(abs (b c) {assume the outer abstraction need not be converted to a tuple}
  (let ((f1 (abs (x) (@+ x 1))) {this copy is not converted to a tuple}
        (f2 (@mprod (abs (clo.1 x) (@+ x 1)))) {this copy is converted to a tuple}
        (g (let ((a (if b 4 5)))
             (@mprod (abs (clo.2 y) {this must be converted to a tuple}
                       (let ((a (@mget 2 clo.2))
                             (c (@mget 3 clo.2)))
                         (@+ (@* a y) c)))
                     a c))))
    (@+ (app f1 2) {this is an unconverted call site}
        (let ((clo.3 (if b f2 g))) {this is a converted call site}
          (app (@mget 1 clo.3) clo.3 3)))))
When closed and open procedures flow to the same call site (e.g., f2 and g above), we can force the closed procedure to have the same representation as the open one (i.e., a closure tuple). Another way to handle heterogeneous procedure representations is to affix tags to procedures to indicate their representation. Call sites where different representations flow together perform a dynamic dispatch on the tagged value. For example, using the oneof notation introduced in Section 10.2, we can use code to tag a closed procedure and closure to tag a closure tuple, as in the following conversion of Epolluted:

(abs (b c) {assume the outer abstraction need not be converted to a tuple}
  (let ((f1 (abs (x) (@+ x 1))) {this copy is not converted to a tuple}
        (f2 (one code (abs (x) (@+ x 1)))) {tagged as a closed procedure}
        (g (let ((a (if b 4 5)))
             (one closure {tagged as a closure}
                  (@mprod (abs (clo y)
                            (let ((a (@mget 2 clo))
                                  (c (@mget 3 clo)))
                              (@+ (@* a y) c)))
                          a c)))))
    (@+ (app f1 2) {this is an unconverted call site}
        (app-generic (if b f2 g) 3))))
Here, (app-generic Erator E1 ... En) is assumed to desugar to

(let ((I1 E1) ... (In En)) ; I1 ... In are fresh
  (tagcase Erator Irator
    (code (app Irator I1 ... In))
    (closure (app (@mget 1 Irator) Irator I1 ... In))))
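A Python sketch of this dispatch may make the calling conventions explicit; the tag strings and pair encoding are our illustrative choices, not part of FIL's oneof machinery:

def app_generic(proc, *args):
    tag, payload = proc
    if tag == "code":                    # closed procedure: call it directly
        return payload(*args)
    if tag == "closure":                 # closure tuple: closure-passing call
        return payload[0](payload, *args)
    raise ValueError("unknown procedure representation")

f2 = ("code", lambda x: x + 1)
assert app_generic(f2, 2) == 3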
This tagging strategy is not necessarily a good idea. Analyzing and converting programs to handle tags is complex, and the overhead of tag manipulation can offset the gains made by reducing representation pollution [DWM+01]. In an extreme version of the tagging strategy, all procedures that flow to a given call site are viewed as members of a sum-of-products data type. Each element in this data type is a tagged environment tuple. The tag indicates which abstraction created the procedure, and the environment tuple holds the free variable values of the procedure. A procedure call can then be converted to a dispatch on the environment tag that calls an associated closed procedure. Using this strategy on Epolluted yields

(abs (b c)
  (let ((fcode (abs (x) (@+ x 1))) {code for f}
        (fenv (one abs1 (@mprod))) {tagged environment for f}
        (gcode (abs (y a c) (@+ (@* a y) c))) {code for g}
        (genv (let ((a (if b 4 5)))
                (one abs2 (@mprod a c))))) {tagged environment for g}
    (@+ (app fcode 2)
        (app-env (if b fenv genv) 3))))
where (app-env Eenv Erand) is an abbreviation for

(let ((Irand Erand))
  (tagcase Eenv Ienv
    (abs1 (app fcode Irand))
    (abs2 (app gcode Irand (@mget 1 Ienv) (@mget 2 Ienv)))))
The procedure call overhead in the dispatch can often be reduced by an inlining process that replaces some calls by appropriately rewritten copies of their bodies. E.g., app-env could be rewritten as

(let ((Irand Erand))
  (tagcase Eenv Ienv
    (abs1 (@+ Irand 1))
    (abs2 (@+ (@* (@mget 1 Ienv) Irand) (@mget 2 Ienv)))))
This example uses only a single app-env procedure, but in the worst case a different environment application procedure might be needed at every call site. This environment-tagging strategy is known as defunctionalization [Rey72] because it removes all higher-order functions from a program. Defunctionalization is an important closure conversion technique for languages (such as Ada and Pascal) in which function pointers cannot be stored in data structures — a feature required in all the previous techniques we have studied. Some drawbacks of defunctionalization are that it requires the whole program (it cannot be performed on individual modules) and that environment application procedures like app-env might need to dispatch on all abstractions in the entire program.
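A defunctionalized rendering of the two abstractions from Epolluted can be sketched in Python; the class names and the apply_env dispatcher are illustrative inventions of this sketch, not part of [Rey72]:

from dataclasses import dataclass

@dataclass
class Abs1Env:          # environment for (abs (x) (@+ x 1)): no free variables
    pass

@dataclass
class Abs2Env:          # environment for (abs (y) (@+ (@* a y) c))
    a: int
    c: int

def apply_env(env, arg):    # one first-order dispatcher for this call site
    if isinstance(env, Abs1Env):
        return arg + 1
    if isinstance(env, Abs2Env):
        return env.a * arg + env.c
    raise ValueError("unknown abstraction tag")

assert apply_env(Abs1Env(), 2) == 3
assert apply_env(Abs2Env(a=4, c=7), 3) == 19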
In practice, type and control flow information can be used to significantly narrow the set of abstractions that need to be considered at a given call site, making defunctionalization a surprisingly efficient approach to closure conversion [CJW00].

A closure need not carry with it the value of a free variable if that variable is available in all contexts where the closure is invoked. This observation is the key idea behind so-called lightweight closure conversion [WS94, SW97], which can decrease the number of free variables in a procedure by adding extra arguments to the procedure if those arguments are always dynamically available at all call sites for the procedure. In our example, the lightweight optimization is realized by rewriting Epolluted as follows before performing other closure conversion techniques:

(abs (b c)
  (let ((f (abs (x c) (@+ x 1))) {(3) By 2, need param c here.}
        (g (let ((a (if b 4 5)))
             (abs (y c) (@+ (@* a y) c))))) {(1) Add c as param.}
    (@+ (app f 2 c) {(4) By 3, must add c as an arg here, too.}
        (app (if b f g) 3 c)))) {(2) By 1, need arg c here.}
Since g's free variable c is available at the one site where g is called, we should be able to pass it as an argument at the site rather than storing it in the closure for g. But representation constraints also force us to add c as an argument to f, since f shares a call site with g. If f were called in some context outside the scope of c, this fact would invalidate the proposed optimization. This example only hints at the sophistication of the analysis required to perform lightweight closure conversion in practice.

Exercise 17.32 Consider the following FIL abstraction Eabs:

(abs (b)
  (let ((f (abs (x) (@+ x 1)))
        (g (abs (y) (@* y 2)))
        (h (abs (a) (abs (z) (@/ z a))))
        (p (abs (r) (app r 3))))
    (@+ (app (if b f g) 4)
        (@* (app p (app h 5))
            (app p (app h 6))))))
a. Show the result of applying flat closure conversion to Eabs.

b. The transformation can be improved if we use selective closure conversion instead. Show the result of selective closure conversion on Eabs.

c. Suppose we replace (app h 6) by g in Eabs to give Eabs'. Then selective closure conversion on Eabs' does not yield an improvement over regular closure conversion on Eabs'. Explain why.

d. Describe a simple meaning-preserving change to Eabs' after which selective closure conversion will be an improvement over regular closure conversion.
Exercise 17.33 Using the flat closure conversion techniques presented so far, translate the following FIL program into C, Java, and Pascal. The program has the property that equality, remainder, division, and subtraction operations are performed only when p is called, not when q is called. Your translated programs should also have this property.

(fil (n)
  (let* ((p (abs (w)
              (if (@= 0 w)
                  (abs (x) x)
                  (if (@= 0 (@% w 2))
                      (let ((p1 (app p (@/ w 2))))
                        (abs (y) (@* 2 (app p1 y))))
                      (let ((p2 (app p (@- w 1))))
                        (abs (z) (@+ 1 (app p2 z)))))))))
    (let ((q (app p n)))
      (@+ (app q 1) (app q n)))))

17.10.3 Linked Environments
Thus far we have assumed that all free variable values of a procedure are stored in a single flat environment or closure. This strategy minimizes the information carried in a particular closure. However, it is often the case that a free variable is referenced by several closures. Setting aside a slot for (a pointer to) the value of this variable in several closures/environments increases the space requirements of the program. For example, in the flat clotest example of Figure 17.34, closures p, q1, and q2 all contain a slot for the value of free variable c. An alternative approach is to structure closures to enhance sharing and reduce copying. In a code/environment model, a high degree of sharing is achieved when every call site bundles the environment of the called procedure (the parent environment) together with the argument values to create the environment for the body of the called procedure. In this approach, each closed abstraction takes a single argument, its environment, and all variables are accessed through this environment. This is called a linked environment approach because environments are represented as chains of linked components called frames. Figure 17.39 shows this approach for the clotest example. Note that the first slot of environment frames env1, env2, and env3 contains (a pointer to) its parent frame. Variables declared by the closest enclosing abs are accessed directly from the current frame, but variables declared in outer abstractions require one or more indirections through parent frames. For instance, in the body of the innermost abs, variable y, which is the first argument of the current frame, env3, is accessed via (@mget 2 env3); the variable r, which is the first argument one frame back, is accessed via (@mget 2 (@mget 1 env3)); and the variable c, which is the first argument two frames back, is accessed via (@mget 2 (@mget 1 (@mget 1 env3))).
(let ((clotest
       (@mprod
        (abs (env1) {env1 = env0, c, d}
          (@mprod
           (abs (env2) {env2 = env1, r, s, t}
             (@mprod
              (abs (env3) {env3 = env2, y}
                (@+ (@/ (@* (@mget 2 (@mget 1 env3)) {get r}
                            (@mget 2 env3)) {get y}
                        (@mget 4 (@mget 1 env3))) {get t}
                    (@- (@mget 2 (@mget 1 env3)) {get r}
                        (@mget 2 (@mget 1 (@mget 1 env3)))))) {get c}
              env2))
           env1))
        (@mprod)))) {This is env0 = the empty environment}
  (let ((p (app (@mget 1 clotest) (@mprod (@mget 2 clotest) 4 5))))
    (let ((q1 (app (@mget 1 p) (@mprod (@mget 2 p) 6 7 8)))
          (q2 (app (@mget 1 p) (@mprod (@mget 2 p) 9 10 11))))
      (@+ (app (@mget 1 q1) (@mprod (@mget 2 q1) 12))
          (app (@mget 1 q2) (@mprod (@mget 2 q2) 13))))))

Figure 17.39  A version of the clotest example with linked environments.
In general, each variable has a lexical address ⟨back, over⟩, where back indicates how many frames back the variable is located and over indicates its position in the resulting frame. A variable with lexical address ⟨b, o⟩ is translated to (@mget o (@mget^b 1 e)), where e is the current lexical environment frame and (@mget^b 1 e) stands for the b-fold composition of the first projection starting with e. (Assume that back indices b start at 0 and over indices o start at 2.) Traditional compilers often use such lexical addresses to locate variables on a stack, where so-called static links are used to model chains of frames stored on the stack. Linked environments are also commonly used in interpreters for block-structured languages. For example, the environment model interpreter in [ASS96] represents procedures as closures whose environments are linked frames.

Figure 17.40 depicts the shared environment structure in the clotest example with linked environments. Note how the environment of p is shared as the parent environment of q1's environment and q2's environment. In contrast with the flat environment case, p, q1, and q2 all share the same slot holding c, so less slot space is needed for c. Another advantage of sharing is that the linked environment approach to closure conversion can support set! directly without the need for assignment conversion (see Exercise 17.35).
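Lexical addressing over linked frames is easy to prototype. In the following Python sketch (our own encoding), a frame is a tuple whose first slot holds the parent frame, so argument j sits at tuple index j; the values match Figure 17.39:

def lookup(frame, back, over):
    for _ in range(back):          # follow the parent link 'back' times
        frame = frame[0]
    return frame[over - 1]         # over starts at 2; index 0 is the parent

env0 = ()                          # the empty environment
env1 = (env0, 4, 5)                # c, d
env2 = (env1, 6, 7, 8)             # r, s, t
env3 = (env2, 12)                  # y

assert lookup(env3, 0, 2) == 12    # y has address <0, 2>
assert lookup(env3, 1, 2) == 6     # r has address <1, 2>
assert lookup(env3, 2, 2) == 4     # c has address <2, 2>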
However, there are several downsides to linked environments. First, variable access is slower than for flat closures because of the indirections through parent environment links. Second, environment slots hold values (such as d and s) that are never referenced, so space is wasted on these slots. A final subtle point is that shared slots can hold onto values longer than they are actually needed by a program, leading to space leaks in the storage manager (see Section 18.1). Some of these points and some alternative linked strategies are explored in the exercises.

Exercise 17.34

a. In the context of closure-converting the following FIL expression, discuss the issues involved in converting let expressions in the linked environment approach described above:

(abs (a)
  (let ((b (@+ a 1))
        (c (@* a a)))
    (let ((f (abs (d) (@+ a (@* c d)))))
      (@mprod (app f b) (app f c)))))
let-bound names (such as b and f) that do not appear as the free variables of an abstraction should not be put in environment frames.

b. Formally define a closure conversion transformation on FIL expressions that implements the linked environment approach. Do not worry about preserving the CPS form of input programs.

Exercise 17.35 Use the linked environment approach to closure-convert the following FIL+{set!} expression. A set! in the input expression should be converted to an mset! on an environment tuple in the converted expression.

(let ((r (abs (x)
           (abs (y)
             (let ((z (@+ x y)))
               (let ((_ (set! x z)))
                 z))))))
  (let ((s1 (app r 1))
        (s2 (app r 2)))
    (@mprod (app s1 3) (app s2 4) (app s1 5))))
Exercise 17.36 The linked environment approach illustrated by the code in Figure 17.39 constructs a mutable tuple representing an environment frame at every call site. An alternative approach, which we shall call the code/linked-env representation, is to construct new linked environment frames only when building the closure tuple. This way, the procedure calling convention looks exactly like that for code/environment pairs; the only difference from the code/environment approach studied earlier is that the environments are not flat but are composed of linked frames.
Figure 17.40  Depiction of the links in the linked clotest example. {The original diagram shows clotest pairing (abs (env1) ...) with the empty frame env0; p pairs (abs (env2) ...) with a frame holding c = 4 and d = 5 that links to env0; q1 and q2 pair (abs (env3) ...) with frames holding r = 6, s = 7, t = 8 and r = 9, s = 10, t = 11, respectively, both linking back to p's frame; and the frames for y = 12 and y = 13 link to q1's and q2's frames.}
a. Show the code/linked-env approach for the clotest example by fleshing out the hole in the following code:

(let ((clotest □))
  (let ((p (app (@mget 1 clotest) (@mget 2 clotest) 4 5)))
    (let ((q1 (app (@mget 1 p) (@mget 2 p) 6 7 8))
          (q2 (app (@mget 1 p) (@mget 2 p) 9 10 11)))
      (@+ (app (@mget 1 q1) (@mget 2 q1) 12)
          (app (@mget 1 q2) (@mget 2 q2) 13)))))
b. Compare the code/linked-env approach with the linked environment approach discussed in the text on the following points: number of tuples created, efficiency of accessing variables, omitting variables from environment frames, converting let expressions, and handling set!.

c. Formally define a closure conversion transformation on FIL expressions that implements the code/linked-env strategy. Do not worry about preserving the CPS form of input programs.
17.11 Transformation 9: Lifting
Programmers nest procedures when an inner procedure needs to use variables that are declared in an outer procedure. The free variables in such an inner procedure are bound by the outer procedure. We have seen that closure conversion eliminates free variables in every procedure. However, because it leaves abstractions in place, it does not eliminate procedure nesting.

A procedure is global when it is declared at top level — i.e., in the outermost scope of a program. Lifting (also called lambda lifting [10]) is the process of eliminating procedure nesting by collecting all procedure abstractions and declaring them as global procedures. All procedure abstractions must be closed before lifting is performed — otherwise, lifting them to top level would break the fundamental connection between free variable references and their associated declarations. Once all of the procedures in a program are declared at top level, each one can be compiled into straight-line code (modulo branches for any if expressions in its body) and given a global name. [11] In the analogy with assembly code, such a name corresponds to an assembly code label for the first instruction in the subroutine corresponding to the procedure.

In the Tortoise compiler, the result of the lifting phase is a program in FILlift (Figure 17.41), a variant of the FILcps language. The key difference between FILlift and FILcps is that abstractions may appear only at the top level of a program in new declaration constructs having the form (def S AB), where AB is an abstraction and S is a special kind of identifier called a subroutine name. Each subroutine name subr0, subr1, subr2, . . . is the concatenation of the name subr and a natural number literal. For n ∈ Nat, we use subrn to stand for the result of concatenating the name subr with the digits of the numeral for n (e.g., subr17 is the identifier subr17). The definition of Proglift requires that subr0 be used for the first subroutine, subr1 be used for the second subroutine, etc. This requirement makes it possible to refer to procedures by number rather than by name. Every subroutine name is a legal identifier and so may be used as a variable reference elsewhere in a program.

[10] In the literature on compiling functional programming languages (e.g., [Joh85, Pey87]), "lambda lifting" often refers to a process that not only lifts all functions to top level, but also serves as a closure conversion transformation in which closures are represented as partially applied curried functions.

[11] It is possible to compile a procedure with nested internal procedures directly to assembly code by placing unconditional branch instructions around the code for the internal procedures. Avoiding unnecessary unconditional branches is important for modern processors with instruction caches, instruction prefetching, and pipelined architectures.
P ∈ Proglift ::= (fil (I*formal) Ebody (def subr0 AB0) . . . (def subrn ABn))

AB ∈ Abstractionlift ::= (abs (I*formal) Ebody)

E ∈ Explift ::= (app Irator V*rand) | (if Vtest Ethen Eelse)
             | (let ((Iname LEdefn)) Ebody) | (error Ymessage)

V ∈ ValueExplift ::= L | I

LE ∈ LetableExplift ::= L | (prim Oprimop V*arg)

L ∈ Lit = as in full FIL
Y ∈ SymLit = as in full FIL
O ∈ Primop = as in full FIL
Keywordlift = {abs, app, def, error, fil, if, let, let*, prim, sym}
I ∈ Identlift = SymLit − ({Y | Y begins with @} ∪ Keywordlift)
NT ∈ NatLit = {0, 1, 2, . . .}
S ∈ Subr = identifiers of the form subrn, where subrn abbreviates the
           name subr followed by the numeral n
; For n ∈ Nat, the notation In stands for the identifier that results from
; concatenating the characters of the name I with the digit characters of
; the numeral in NatLit that denotes n.

Figure 17.41  Grammar for FILlift, the result of the Tortoise lifting stage.
As in other FL-like languages we have studied, the names declared by def have global scope — they may be referenced in all expressions in the program, including the bodies of all def declarations.

The Tortoise lifting transformation LIprog has the following specification:

Preconditions: The input to LIprog is a valid kernel FILcps program in which every abstraction is closed.

Postconditions: The output of LIprog is a program in which every abstraction is globally defined via def at the top level of a program, as specified in the FILlift grammar in Figure 17.41. The free identifiers in each abstraction must be a subset of the subroutine names bound by defs in the program.

Although abstractions are required to be closed before lifting, abstractions after lifting are not necessarily closed. This is because each nested abstraction is replaced by a def-bound subroutine name that is necessarily free in the immediately enclosing abstraction. But the def-bound subroutine names are the only names that can be free in the abstractions that result from the lifting transformation.
Here is a sketch of the algorithm employed by LIprog for a program containing n abstractions:

1. Associate with each abstraction ABi (0 ≤ i ≤ n) in the program the subroutine name subri.

2. Replace the abstraction ABi in the program by a reference to its associated name, subri.

3. Return a program of the form

   (fil (I*fml) E'body (def subr0 AB'0) . . . (def subrn AB'n))
where AB'0, . . ., AB'n are the transformed versions of all the abstractions AB0, . . ., ABn in the original program, and E'body is the transformed body.

For example, Figure 17.42 shows the revmap example after lambda lifting. subr0 is the code for the revmap procedure, subr1 is the code for the loop procedure, subr2 is the code for the continuation of the call to f within the body of the loop procedure, and subr3 is the code for the greater-than-b procedure. The example shows how replacing each abstraction with its unique subroutine name can introduce free variables into otherwise closed abstractions. For instance, the body of the abstraction named subr0 contains a reference to subr1 and the body of the abstraction named subr1 contains a reference to subr2.

In the revmap example, code.62 always denotes the subroutine named subr0, code.46 and code.54 always denote subr1, code.58 always denotes subr2, and code.50 always denotes subr3. In all these cases, it would be safe to replace these code references (and eliminate their associated (mget 1) operations) by the subroutine names. In assembly code, this optimization corresponds to replacing an indirect jump to a subroutine by a direct jump. It is possible for the compiler to perform this optimization automatically, but a sophisticated analysis that tracks control-flow and store-effect information would be required to determine when the optimization can be safely applied.

Exercise 17.37 Formally define the LIprog function sketched above. You will also need to define appropriate functions on other FILcps syntactic domains. For simplicity, you may assume that fresh subroutine names are generated in the order subr0, subr1, . . .; i.e., you need not thread a subroutine name counter through your functions.
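As a warm-up for the formalization requested in the exercise, here is a toy Python sketch of the three-step algorithm. The tuple AST and the in-order numbering scheme are assumptions of this sketch, not part of the LIprog specification:

def lift_program(params, body):
    defs = []                          # accumulates (name, abstraction) pairs
    def lift(e):
        if not isinstance(e, tuple):
            return e                   # identifiers and literals are unchanged
        if e[0] == "abs":              # step 1: claim the next subroutine name
            name = f"subr{len(defs)}"
            defs.append(None)          # reserve the slot to fix the numbering
            slot = len(defs) - 1
            defs[slot] = (name, ("abs", e[1], lift(e[2])))
            return name                # step 2: replace the abs by its name
        return tuple(lift(x) for x in e)
    new_body = lift(body)
    # step 3: the transformed body followed by the global def declarations
    return ("fil", params, new_body, tuple(defs))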
(fil (a.0 b.1 ktop.9)
  (let* ((abs.10 (@mprod subr0))
         (abs.11 (@mprod subr3 b.1))
         (t.14 (@* a.0 7))
         (t.15 (@null))
         (t.13 (@cons t.14 t.15))
         (t.12 (@cons a.0 t.13))
         (code.62 (@mget 1 abs.10)))
    (app code.62 abs.10 abs.11 t.12 ktop.9))
  (def subr0
    (abs (clo.56 f.3 elts.4 k.20)
      (let* ((t.22 (@null))
             (t.21 (@mprod t.22))
             (t.23 (@mprod #u))
             (abs.25 (@mprod subr1 t.21 t.23 f.3))
             (t.24 (@mset! 1 t.23 abs.25))
             (t.26 (@mget 1 t.23))
             (code.54 (@mget 1 t.26)))
        (app code.54 t.26 elts.4 k.20))))
  (def subr1
    (abs (clo.52 xs.7 k.29)
      (let* ((t.21 (@mget 2 clo.52))
             (t.23 (@mget 3 clo.52))
             (f.3 (@mget 4 clo.52))
             (t.31 (@null? xs.7)))
        (if t.31
            (let* ((t.42 (@mget 1 t.21))
                   (code.44 (@mget 1 k.29)))
              (app code.44 k.29 t.42))
            (let* ((t.34 (@car xs.7))
                   (k.41 (@mprod subr2 t.21 t.23 xs.7 k.29))
                   (code.50 (@mget 1 f.3)))
              (app code.50 f.3 t.34 k.41))))))
  (def subr2
    (abs (clo.48 t.35)
      (let* ((t.21 (@mget 2 clo.48))
             (t.23 (@mget 3 clo.48))
             (xs.7 (@mget 4 clo.48))
             (k.29 (@mget 5 clo.48))
             (t.36 (@mget 1 t.21))
             (t.33 (@cons t.35 t.36))
             (t.32 (@mset! 1 t.21 t.33))
             (t.37 (@mget 1 t.23))
             (t.38 (@cdr xs.7))
             (code.46 (@mget 1 t.37)))
        (app code.46 t.37 t.38 k.29))))
  (def subr3
    (abs (clo.60 x.8 k.18)
      (let* ((b.1 (@mget 2 clo.60))
             (t.19 (@> x.8 b.1))
             (code.58 (@mget 1 k.18)))
        (app code.58 k.18 t.19)))))

Figure 17.42  revmap program after lambda lifting.
17.12 Transformation 10: Register Allocation
The goal of the Tortoise compiler is to translate high-level programs to code that can be executed on a register machine. A register machine provides two kinds of storage locations for values: a small number of registers with fast access times and a large number of memory locations with slow access times. It typically has instructions for loading values into registers from memory, storing values from registers to memory, and performing operations whose arguments and results are in registers.

The code generated by the Lifting stage of the Tortoise compiler resembles assembly code for a register machine except for its handling of variable names. Intuitively, each identifier in a FILlift program that is not a subroutine name can be viewed as an abstract register. Because fresh identifiers are introduced by many transformations, there is no bound on the number of abstract registers that a program may use. But any register machine executing the program employs a relatively small number of actual registers. The process of mapping the abstract registers of a program to the actual registers of a register machine is known as register allocation. Register allocation makes the storage locations represented by variable names explicit. Tortoise also uses registers to pass procedure arguments, so register allocation makes the argument-passing mechanism explicit.

We will study a simple approach to register allocation in the context of transforming FILlift to FILreg, the target language of the Tortoise compiler. In Section 17.12.1, we describe FILreg and explain how to view it as the instruction set for a register machine. We then describe how to convert FILlift to FILreg in Sections 17.12.2–17.12.5.
17.12.1 The FILreg Language
FILreg (Figure 17.43) is a language that is designed to be viewed in two very different ways:

1. FILreg is basically a restricted subset of FILlift. A FILreg program can be executed like any other FILlift program.

2. FILreg is the instruction set for a simple register machine. This machine, FRM, is discussed in Section 18.2.

Remarkably, FILreg programs have the same behavior whether we view them as FILlift programs or as register machine programs. This section summarizes
the features of the syntax of FILreg and describes how to view FILreg programs and expressions in terms of the underlying register machine operations they are intended to represent. Later (Section 18.2) we will sketch how FILreg programs are executed on the FRM register machine. (A full description of FRM program execution can be found in the Web Supplement.)

The general identifiers of FILlift have been replaced by a restricted domain Identreg containing only (1) subroutine names S (as in FILlift) and (2) register names R (new in FILreg). Each register name r0, r1, r2, . . . is the concatenation of the name r and a numeral for a natural number between 0 and nmax, where nmax + 1 is the number nreg of registers in the machine. For n ∈ Nat, we use rn to stand for the identifier rn. In FILreg, the formal parameter sequences of programs and abstractions and the operand sequences of applications must be prefixes RS of the register sequence [r0, r1, r2, . . . , rnmax]. That is, abstractions and applications must have the following form:

Number of params/rands   Abstraction            Application
0                        (abs () E)             (app I)
1                        (abs (r0) E)           (app I r0)
2                        (abs (r0 r1) E)        (app I r0 r1)
3                        (abs (r0 r1 r2) E)     (app I r0 r1 r2)
...                      ...                    ...
These restricted forms represent a decision to pass program and procedure arguments in specific registers: the first argument is always passed in register r0, the second argument is always passed in register r1, etc. An abstraction definition (def S (abs (RS) E)) represents the entry point to a subroutine; an application (app S RS) represents a direct jump to the subroutine labeled S; and an application of the form (app R RS) is an indirect jump to the subroutine whose label (address) is stored in register R. From the register machine's point of view, the formal parameter names and argument names are superfluous: the arguments are in the registers and both the caller and the callee know how many arguments there are. The names appear in the syntax so that we can continue to interpret our code from the FIL perspective as well.

In FILreg, all if tests must be register names. (if R Ethen Eelse) is thus an instruction that tests the content of register R and continues with the instructions in Ethen if R contains true and with the instructions in Eelse if R contains false. The FILreg expression (error Ymsg) terminates program execution in an error state that includes the error message Ymsg. The new (halt NT R)
expression terminates program execution with a return code specified by NT; for some return codes, the result of the program is the value in register R. This expression is used in the register-machine implementation of FILreg (see the Web Supplement for details).

The FILreg expression (let ((Rdst LE)) E) loads the value of LE into the destination register Rdst and then proceeds with the instructions in E. The nature of this load depends on the structure of the letable expression LE:

• The case where LE is a literal corresponds to loading the literal value into Rdst.

• The case where LE is a primitive application (prim Oop R*src) corresponds to code that performs an operation on the contents of the source registers R*src and stores the result in the destination register Rdst. Note that the operand registers of primitive applications, unlike those of procedure applications, needn't be a specific sequence, because register machines let you specify arbitrary registers for primitive operations.
• The case where LE is an application (prim copy Rsrc) of the new primitive operator copy, which acts as an identity, represents code that copies the content of register Rsrc to the register Rdst. This cannot be accomplished by just having a register Rsrc as the letable expression, because the [copy-prop] rule will always eliminate a let expression of the form (let ((R1 R2)) E) by substituting R2 for R1 in E.

• The case where LE is the new letable expression (addr S) represents a load of a subroutine address into Rdst. This cannot be accomplished by just using a subroutine name S as the letable expression, because the [copy-prop] rule will always eliminate a let expression of the form (let ((R S)) E) by substituting S for R in E. This case is slightly different from the copy case above: addr, which acts like the identity on subroutine names, cannot be a primitive operator, because all prim operands must be register names, and S is not a register name.

Registers are used to store values that are needed later in the computation. Sometimes the number of values needed by the rest of the computation exceeds the number of registers. In this case, the extra values must be stored in the register machine's memory, a process known as spilling. The spget and spset! primitives are used for spilling. They are explained in Section 17.12.5.

To model the simplicity of register operations and facilitate spilling, all FILreg primitive operators take zero, one, or two arguments.
Kernel Grammar

P ∈ Progreg ::= (fil (RSformals) Ebody (def subr0 AB0) . . . (def subrn ABn))

AB ∈ Abstractionreg ::= (abs (RSformals) Ebody)

E ∈ Expreg ::= (app Irator RSrands) | (if Rtest Ethen Eelse)
             | (let ((Rdst LEdefn)) Ebody) | (error Ymessage)
             | (halt NTreturnCode Rresult)

I ∈ Identreg ::= R | S

LE ∈ LetableExpreg ::= L | (addr S) | (prim Oprimop R*src)

L ∈ Lit = as in full FIL
Y ∈ SymLit = as in full FIL

O ∈ Primopreg ::= . . . FIL primops except mprod . . .
                | copy                        ; register copy
                | (mnew NT)                   ; mutable tuple allocation
                | (spget NT) | (spset! NT)    ; spill get and set

NT ∈ NatLit = {0, 1, 2, . . .}
R ∈ Reg = {r0, r1, . . . , rnmax} ; rn abbreviates the name r followed by
                                  ; the numeral n
RS ∈ RegSeq = any prefix of [r0, r1, . . . , rnmax] ; nmax + 1 = nreg
S ∈ Subr = identifiers of the form subrn ; subrn abbreviates the name subr
                                         ; followed by the numeral n
; For n ∈ Nat, the notation In stands for the identifier that results from
; concatenating the characters of the name I with the digit characters of
; the numeral in NatLit that denotes n.

New Syntactic Sugar
(@mnew NT) ds (prim (mnew NT))
(@spget NT) ds (prim (spget NT))
(@spset! NT R) ds (prim (spset! NT) R)

Figure 17.43  Grammar for FILreg, the result of the register allocation transformation and the language of FRM, the virtual register machine discussed in Section 18.2.
The one primitive in previous FIL dialects that took an arbitrary number of arguments — mprod — is replaced by a combination of the new primitive operator (mnew NT) (which creates a mutable tuple with N[[NT]] slots) and a sequence of mset! operations for filling the slots. For example, the FILlift expression

(let ((Improd (@mprod S Iarg1 17))) Ebody)
can be expressed in FILreg as follows (where Rtemp , Rmprod , and Rarg1 are three distinct registers):
(let* ((Rmprod (@mnew 3))
       (Rtemp (addr S))
       (Rtemp (@mset! 1 Rmprod Rtemp))
       (Rtemp (@mset! 2 Rmprod Rarg1))   {assume Rarg1 corresponds to Iarg1}
       (Rtemp 17)
       (Rtemp (@mset! 3 Rmprod Rtemp)))
  E′body)   {the translation of Ebody, in which Rmprod corresponds to Improd}
Because the operands of a primitive application must be register names, the integer literal 17 and the subroutine label (addr S ) must be stored in temporary registers before they can be used in applications of the primitive operator mset!.
17.12.2 A Register Allocation Algorithm
The Tortoise register allocation transformation RAprog has the following specification:

Preconditions: The input to RAprog is a valid kernel FILlift program in which the only free identifiers of any abstraction are subroutine names.

Postconditions: The output of RAprog is a valid kernel FILreg program in which the only free identifiers of any abstraction are subroutine names.

Register allocation is largely the process of renaming fil-bound, abs-bound, and let-bound identifiers in FILlift to the register names r0, ..., rnmax in FILreg. In Tortoise, register allocation must also ensure that the resulting program respects the other syntactic restrictions of FILreg by naming literals and subroutine names and expanding each mprod into mnew followed by a sequence of mset!s.

Register allocation has been studied intensively, and it is the subject of many elegant and efficient algorithms. (The notes at the end of this chapter provide some references.) Tortoise uses a simple register allocation algorithm that is not particularly efficient but is easy to describe. The algorithm has three phases:

1. The expansion phase takes a FILlift program, ensures that all literals and subroutine names are bound to identifiers in a let before they are used, and converts instances of mprod into sequences of mnew and mset!s. The output is in a language called FILregId, a version of FILreg in which R ∈ Reg is redefined to be any nonsubroutine identifier and RS ∈ RegSeq is redefined to be any sequence of nonsubroutine identifiers.
Domains
Proglift = as defined in Figure 17.41
ProgregId = programs in FILregId, a version of FILreg in which nonsubroutine identifiers are used in place of registers
Progreg∞ = programs in FILreg∞, a version of FILreg supporting an unbounded number of registers
Progreg = as defined in Figure 17.43

Register Allocation Functions
EXprog : Proglift → ProgregId    ; described in Section 17.12.3
RCprog : ProgregId → Progreg∞    ; described in Section 17.12.4
SPprog : Progreg∞ → Progreg      ; described in Section 17.12.5

RAprog : Proglift → Progreg
RAprog[[P]] = SPprog[[RCprog[[EXprog[[P]]]]]]

Figure 17.44 The Tortoise register allocation transformation RAprog is the composition of the expansion transformation EXprog, the register conversion transformation RCprog, and the spilling transformation SPprog.
2. The register conversion phase takes a FILregId program, renames all nonsubroutine identifiers to be register names, and ensures that all formal parameter sequences and operand sequences of procedure applications are prefixes of [r0, r1, r2, ...]. It introduces appropriate register moves (via the copy primitive) to satisfy this requirement. The output is in a language called FILreg∞, a version of FILreg in which R ∈ Reg is redefined to include an unbounded number of register names of the form rn. This phase greedily reuses register names in an attempt to reduce the number of registers needed by the program, but that number may still exceed the fixed number nreg of registers provided by the register machine.

3. The spilling phase guarantees that only nreg registers are used in the final code by moving the contents of some registers to memory if necessary.

Figure 17.44 shows how these three phases are composed to implement the register allocation function RAprog; a small code sketch of this composition appears below. In the following three sections, we sketch each of these phases by providing an English description of how they work along with some examples. The formal details of each phase are fleshed out in the Web Supplement.
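To make the pipeline structure concrete, here is a minimal Python sketch of the composition in Figure 17.44. The function names and the idea of representing programs as S-expression-style data are illustrative assumptions of this sketch, not part of the Tortoise implementation:

def ex_prog(program):
    """Expansion phase: FILlift -> FILregId (Section 17.12.3)."""
    ...

def rc_prog(program):
    """Register conversion phase: FILregId -> FILreg-infinity (Section 17.12.4)."""
    ...

def sp_prog(program):
    """Spilling phase: FILreg-infinity -> FILreg (Section 17.12.5)."""
    ...

def ra_prog(program):
    """RAprog is the composition of the three phases (cf. Figure 17.44)."""
    return sp_prog(rc_prog(ex_prog(program)))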
17.12.3 The Expansion Phase
The expansion phase of the Tortoise register allocator converts FILlift programs to FILregId programs by performing two transformations:

1. It introduces let-bound names for all literals and subroutine names that appear in if tests and in the operands of procedure and primitive applications.

2. It expands each primitive application of mprod into a primitive application of mnew to allocate the mutable tuple followed by a sequence of primitive applications of mset! to fill the slots of the new tuple.

Figure 17.45 illustrates the expansion phase on the body of the revmap program after the Lifting stage. Both mprods in the input are expanded to the mnew/mset! idiom, and new lets are introduced to name the literals and subroutine names in the input.
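The second transformation is a small rewrite on bindings. The following Python sketch (a hypothetical illustration, not the Web Supplement's metalanguage definition) expands a single mprod binding into an mnew binding followed by mset! bindings, let-binding any operand that is not already a register-class identifier; the fresh-name scheme and the subr-prefix test are assumptions of the sketch:

from itertools import count

_counter = count(100)

def fresh():
    # generate a fresh temporary name like "t.100" (hypothetical scheme)
    return f"t.{next(_counter)}"

def expand_mprod(name, args, is_reg):
    """Return the let* bindings replacing (name (@mprod arg1 ... argn)).
    `is_reg` tells whether an operand is already a register-class
    identifier; literals and subroutine names must first be named."""
    bindings = [(name, ["@mnew", len(args)])]
    for i, arg in enumerate(args, start=1):
        if not is_reg(arg):
            t = fresh()
            # subroutine names are loaded with addr; literals stand alone
            rhs = ["addr", arg] if str(arg).startswith("subr") else arg
            bindings.append((t, rhs))
            arg = t
        bindings.append((fresh(), ["@mset!", i, name, arg]))
    return bindings

For instance, expand_mprod("abs.10", ["subr0"], is_reg=lambda x: False) yields bindings of the shape (abs.10 (@mnew 1)), (t.100 (addr subr0)), (t.101 (@mset! 1 abs.10 t.100)) seen in Figure 17.45, up to the choice of fresh names.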
17.12.4 The Register Conversion Phase
The register conversion phase of the Tortoise register allocator converts FILregId programs to FILreg∞ programs by performing three transformations:

1. It converts every formal parameter sequence I^n_{i=0} of the program or its abstractions to an ordered register sequence r^n_{i=0}.

2. It renames every let-bound name to a register name.

3. It guarantees that the operand sequence I^n_{i=0} of every app expression is an ordered register sequence r^n_{i=0}.

We will illustrate each of these transformations in the context of register-converting the following abstraction:

AB0 = (abs (clo.7 x.8 k.9)
        (let* ((t.10 (@mget 2 clo.7))
               (t.11 (@mget 3 clo.7))
               (t.12 (@* x.8 x.8))
               (t.13 (@* t.11 t.12))
               (t.14 (@+ x.8 t.12))
               (code.15 (@mget 1 t.10)))
          (app code.15 t.10 t.14 t.13 t.11 k.9)))
The first transformation renames the formal parameters clo.7, x.8, and k.9 to r0, r1, and r2, respectively:
Body of Lifted revmap Before Expansion Phase
(let* ((abs.10 (@mprod subr0))
       (abs.11 (@mprod subr3 b.1))
       (t.14 (@* a.0 7))
       (t.15 (@null))
       (t.13 (@cons t.14 t.15))
       (t.12 (@cons a.0 t.13))
       (code.62 (@mget 1 abs.10)))
  (app code.62 abs.10 abs.11 t.12 ktop.9))

Body of revmap After Expansion Phase
(let* ((abs.10 (@mnew 1))
       (t.79 (addr subr0))
       (t.78 (@mset! 1 abs.10 t.79))
       (abs.11 (@mnew 2))
       (t.82 (addr subr3))
       (t.80 (@mset! 1 abs.11 t.82))
       (t.81 (@mset! 2 abs.11 b.1))
       (t.83 7)
       (t.14 (@* a.0 t.83))
       (t.15 (@null))
       (t.13 (@cons t.14 t.15))
       (t.12 (@cons a.0 t.13))
       (code.62 (@mget 1 abs.10)))
  (app code.62 abs.10 abs.11 t.12 ktop.9))

Figure 17.45 Illustration of the expansion phase on the body of the lifted revmap program.
AB1 = (abs (r0 r1 r2)
        (let* ((t.10 (@mget 2 r0))
               (t.11 (@mget 3 r0))
               (t.12 (@* r1 r1))
               (t.13 (@* t.11 t.12))
               (t.14 (@+ r1 t.12))
               (code.15 (@mget 1 t.10)))
          (app code.15 t.10 t.14 t.13 t.11 r2)))
We assume that there are enough registers to handle the longest formal parameter sequence. The later spilling phase will handle the case where this assumption is false.

The second transformation renames each identifier I declared in a let expression to a register name R that does not appear free in the body of the let expression. Although it would be safe to use any nonfree register name, the algorithm chooses the "least" one according to the order ri ≤ rj if and only if i ≤ j. This greedy strategy attempts to reduce register usage by reusing low-numbered registers whose values are no longer needed. For example, renaming let-bound identifiers transforms AB1 to

AB2 = (abs (r0 r1 r2)
        (let* ((r3 (@mget 2 r0))    {r0,r1,r2 used later, so use r3 for t.10}
               (r0 (@mget 3 r0))    {r0=clo.7 not used later, so reuse r0 for t.11}
               (r4 (@* r1 r1))      {r0–r3 used later, so use r4 for t.12}
               (r5 (@* r0 r4))      {r0–r4 used later, so use r5 for t.13}
               (r1 (@+ r1 r4))      {r1=x.8 not used later, so reuse r1 for t.14}
               (r4 (@mget 1 r3)))   {r4=t.12 not used later, so reuse r4 for code.15}
          (app r4 r3 r1 r5 r0 r2)))
Note how r0, r1, and r4 are reused when they are no longer mentioned in the rest of the computation.

After the first two transformations are performed, the program satisfies the grammar of FILreg∞ except for app expressions (app Irator R^{n-1}_{i=0}). Although the first two transformations guarantee that all app operands are registers, they are not necessarily the sequence r^{n-1}_{i=0} required by the FILreg∞ grammar. This form can be achieved by a register shuffling process that uses a sequence of copy applications to move the contents of the registers in the source operand sequence R^{n-1}_{i=0} to the corresponding registers in the destination operand sequence r^{n-1}_{i=0}. For example, (app subr5 r2 r0) can be transformed to

(let* ((r1 (@copy r0))
       (r0 (@copy r2)))
  (app subr5 r0 r1))
A simple but very inefficient implementation of shuffling would copy the n operands to n fresh registers not mentioned in (∪^{n-1}_{i=0} {r_i}) ∪ (∪^{n-1}_{i=0} {R_i}) ∪ {Irator}, and then copy the operands from the fresh registers to r^{n-1}_{i=0}. This is expensive both in the number of additional registers used (n) and the number of copy operations performed (2n). Using more registers also increases the need for spilling.

We now sketch a register shuffling algorithm that uses at most two registers in addition to the ones already mentioned in the source and destination register sets. (See the Web Supplement for full details.) The register shuffling algorithm begins by testing whether the operator of the app is a register in the difference of the destination and source register sets. If so, it must be renamed to a name not
in the union of these sets to avoid blocking later copy operations. This is the first additional register that the algorithm may use. For example, in the application (app r4 r3 r1 r5 r0 r2) of AB2, the operator r4 is a destination register not mentioned in the source registers, and so is renamed to the least register not appearing in either set (r6):

(let ((r6 (@copy r4)))
  (app r6 r3 r1 r5 r0 r2))
The rest of the register shuffling algorithm transforms a FILregId application Eapp = (app Irator R^{n-1}_{i=0}) to a FILreg∞ expression of the form

(let* ((Rdst_j (@copy Rsrc_j))^k_{j=1})
  (app Irator r^{n-1}_{i=0}))
that has the same meaning as Eapp. This transformation is guided by a register dependence graph (RDG) that keeps track of the copy operations that still need to be performed. An RDG is a set of edges, where each edge is a pair of register names, written Rdst → Rsrc, that associates a destination register Rdst with a source register Rsrc. Such an edge indicates that the value in Rsrc must be moved into Rdst, and so corresponds to the let binding (Rdst (@copy Rsrc)). The direction of the arrow indicates that the final content of Rdst depends on the current content of Rsrc. For the application (app r6 r3 r1 r5 r0 r2), this graph can be depicted as

r4 → r2 → r5        r0 ⇄ r3

There is no edge involving r1 because it is already in the correct position. There are two connected components in this graph: the acyclic component involving r4, r2, and r5, and the cyclic component involving r0 and r3.

The copy associated with the edge rdst → rsrc can be performed only if the destination register rdst will not be the source of a later copy operation — i.e., only if there is no edge of the form R → rdst in the RDG. Another way of phrasing this condition is that the number of directed edges going into vertex rdst (its in-degree) must be 0. We will call an edge rdst → rsrc a root edge of an RDG if the in-degree of rdst is 0. A root edge appears as an initial edge of an acyclic component of the RDG.

The fundamental strategy of the register shuffling algorithm is to find a root edge EGroot = rdst → rsrc in the RDG (there may be more than one) and perform its corresponding copy operation via the let binding (rdst (@copy rsrc)). The shuffling process then continues after removing EGroot from the RDG
because rdst now contains its final value. For example, processing the first two root edges in the RDG

r4 → r2 → r5        r0 ⇄ r3        for (app r6 r3 r1 r5 r0 r2)

yields

(let* ((r4 (@copy r2))     {move r2 to r4 in (app r6 r3 r1 r5 r0 r2)}
       (r2 (@copy r5)))    {move r5 to r2 in (app r6 r3 r1 r5 r0 r4)}
  (app r6 r3 r1 r2 r0 r4))
When processing root edge rdst → rsrc, if the operator is named rsrc, it is necessary to rename it to rdst. This does not happen in our example.

The RDG for the residual application (app r6 r3 r1 r2 r0 r4) is the cyclic graph r0 ⇄ r3, which contains no root edge. To handle this situation, a temporary register Rtemp is used to break one of the cycles, converting it to an acyclic component. An arbitrary edge EGarb = rdst → rsrc is chosen from one of the cyclic components, and the content of rsrc is stored in Rtemp by the let binding (Rtemp (@copy rsrc)). Replacing EGarb in the RDG by rdst → Rtemp yields an acyclic component rsrc → ... → rdst → Rtemp that allows the root-edge-finding strategy of the algorithm to proceed. The temporary register Rtemp can be the least register that is different from Irator and is not a member of the final destination registers. In the case where an RDG contains multiple cyclic components, a single temporary register can be used to break all of the components. This is the second of the two registers that may be required to perform the register shuffling. (The first additional register was used for potentially renaming the operator of an application.)

In our example, suppose the edge r0 → r3 is chosen to break the cycle r0 ⇄ r3. Since r0 through r4 are reserved for the final operands and the operator is r6, r5 is chosen as the temporary register. The residual application (app r6 r3 r1 r2 r0 r4) is now transformed to

(let* ((r5 (@copy r3)))    {break cycle in (app r6 r3 r1 r2 r0 r4) with r5}
  (app r6 r5 r1 r2 r0 r4))
where the new RDG r3 → r0 → r5 consists of a single acyclic component. Processing the remaining two edges leads to two more let bindings:

(let* ((r3 (@copy r0))     {move r0 to r3 in (app r6 r5 r1 r2 r0 r4)}
       (r0 (@copy r5)))    {move r5 to r0 in (app r6 r5 r1 r2 r3 r4)}
  (app r6 r0 r1 r2 r3 r4))
So the final abstraction AB3 that results from applying our register shuffling algorithm to AB2 is:

AB3 = (abs (r0 r1 r2)
        (let* ((r3 (@mget 2 r0))
               (r0 (@mget 3 r0))
               (r4 (@* r1 r1))
               (r5 (@* r0 r4))
               (r1 (@+ r1 r4))
               (r4 (@mget 1 r3))
               (r6 (@copy r4))
               (r4 (@copy r2))
               (r2 (@copy r5))
               (r5 (@copy r3))
               (r3 (@copy r0))
               (r0 (@copy r5)))
          (app r6 r0 r1 r2 r3 r4)))
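The shuffling procedure just described fits in a few lines of code. Here is a Python sketch of the algorithm in the text (registers are represented by their indices, so i stands for ri; this is an illustration, not the Web Supplement's formal definition):

from itertools import count

def shuffle_app(rator, sources):
    """Transform (app r<rator> r<s0> ... r<s(n-1)>) so that its operands
    are r0..r(n-1). Returns (copies, rator2) where each (d, s) in copies
    stands for the binding (rd (@copy rs))."""
    n = len(sources)
    dests, srcs = set(range(n)), set(sources)
    copies = []
    # Step 1: rename the operator if it is a destination but not a source.
    if rator in dests - srcs:
        fresh = next(i for i in count() if i not in dests | srcs)
        copies.append((fresh, rator))
        rator = fresh
    # The RDG: edge d -> s means rd must receive the current content of rs.
    rdg = {d: s for d, s in enumerate(sources) if d != s}
    while rdg:
        roots = [d for d in rdg if d not in rdg.values()]  # in-degree 0
        if roots:                        # perform a safe copy
            d = roots[0]
            s = rdg.pop(d)
            copies.append((d, s))
            if rator == s:               # the operator's value now lives in rd
                rator = d
        else:                            # only cycles remain: break one
            d, s = next(iter(rdg.items()))
            temp = next(i for i in count()
                        if i not in dests and i != rator)
            copies.append((temp, s))     # save rs in the temporary register
            rdg[d] = temp                # rd now depends on the temporary
    return copies, rator

Running shuffle_app(4, [3, 1, 5, 0, 2]) on the application (app r4 r3 r1 r5 r0 r2) from AB2 reproduces the six copy bindings of AB3, ending with operator r6.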
Applying the expansion and register conversion phases to the revmap example yields the FILreg∞ program in Figure 17.46. Such a program is clearly very close to register machine language; it leaves very little to the imagination! The program uses eight registers (r0 through r7), so no spilling is required as long as the number of machine registers nreg is at least eight.

Although our algorithm is simple and tends to use a small number of registers, it can use more registers and/or perform more copy operations than necessary. For example, here is a register-converted version of AB0 that uses only five registers and one copy operation:

(abs (r0 r1 r2)
  (let* ((r4 (@copy r2))     {moving r2 to r4 right away frees up r2.}
         (r3 (@mget 3 r0))
         (r0 (@mget 2 r0))   {this @mget moved later so r0 free for result.}
         (r5 (@* r1 r1))
         (r2 (@* r3 r5))
         (r1 (@+ r1 r5))
         (r5 (@mget 1 r0)))
    (app r5 r0 r1 r2 r3 r4)))
This version avoids many copy operations by (1) storing results in registers chosen according to their operand position in the app expression and (2) reordering the (@mget 2 r0) and (@mget 3 r0) bindings so that the result of (@mget 2 r0) can be stored directly in r0. Code using fewer registers or register moves (i.e., copy operations) than our algorithm can be obtained with other register allocation algorithms from the literature.
(fil (r0 r1 r2)
  (let* ((r3 (@mnew 1))
         (r4 (addr subr0))
         (r4 (@mset! 1 r3 r4))
         (r4 (@mnew 2))
         (r5 (addr subr3))
         (r5 (@mset! 1 r4 r5))
         (r1 (@mset! 2 r4 r1))
         (r1 7)
         (r1 (@* r0 r1))
         (r5 (@null))
         (r1 (@cons r1 r5))
         (r0 (@cons r0 r1))
         (r1 (@mget 1 r3))
         (r5 (@copy r1))
         (r1 (@copy r4))
         (r4 (@copy r3))
         (r3 (@copy r2))
         (r2 (@copy r0))
         (r0 (@copy r4)))
    (app r5 r0 r1 r2 r3))
  (def subr0
    (abs (r0 r1 r2 r3)
      (let* ((r0 (@null))
             (r4 (@mnew 1))
             (r0 (@mset! 1 r4 r0))
             (r0 (@mnew 1))
             (r5 #u)
             (r5 (@mset! 1 r0 r5))
             (r5 (@mnew 4))
             (r6 (addr subr1))
             (r6 (@mset! 1 r5 r6))
             (r4 (@mset! 2 r5 r4))
             (r4 (@mset! 3 r5 r0))
             (r1 (@mset! 4 r5 r1))
             (r1 (@mset! 1 r0 r5))
             (r0 (@mget 1 r0))
             (r1 (@mget 1 r0))
             (r4 (@copy r1))
             (r1 (@copy r2))
             (r2 (@copy r3)))
        (app r4 r0 r1 r2))))
  (def subr1
    (abs (r0 r1 r2)
      (let* ((r3 (@mget 2 r0))
             (r4 (@mget 3 r0))
             (r0 (@mget 4 r0))
             (r5 (@null? r1)))
        (if r5
            (let* ((r0 (@mget 1 r3))
                   (r1 (@mget 1 r2))
                   (r3 (@copy r1))
                   (r1 (@copy r0))
                   (r0 (@copy r2)))
              (app r3 r0 r1))
            (let* ((r5 (@car r1))
                   (r6 (@mnew 5))
                   (r7 (addr subr2))
                   (r7 (@mset! 1 r6 r7))
                   (r3 (@mset! 2 r6 r3))
                   (r3 (@mset! 3 r6 r4))
                   (r1 (@mset! 4 r6 r1))
                   (r1 (@mset! 5 r6 r2))
                   (r1 (@mget 1 r0))
                   (r3 (@copy r1))
                   (r1 (@copy r5))
                   (r2 (@copy r6)))
              (app r3 r0 r1 r2))))))
  (def subr2
    (abs (r0 r1)
      (let* ((r2 (@mget 2 r0))
             (r3 (@mget 3 r0))
             (r4 (@mget 4 r0))
             (r0 (@mget 5 r0))
             (r5 (@mget 1 r2))
             (r1 (@cons r1 r5))
             (r1 (@mset! 1 r2 r1))
             (r1 (@mget 1 r3))
             (r2 (@cdr r4))
             (r3 (@mget 1 r1))
             (r4 (@copy r1))
             (r1 (@copy r2))
             (r2 (@copy r0))
             (r0 (@copy r4)))
        (app r3 r0 r1 r2))))
  (def subr3
    (abs (r0 r1 r2)
      (let* ((r0 (@mget 2 r0))
             (r0 (@> r1 r0))
             (r1 (@mget 1 r2))
             (r3 (@copy r1))
             (r1 (@copy r0))
             (r0 (@copy r2)))
        (app r3 r0 r1)))))

Figure 17.46 revmap program after expansion and register conversion.
Many of these are based on a classic register-coloring algorithm that uses registers to "color" an interference graph whose vertices are abstract register names and whose edges connect vertices that cannot be the same actual register [CAC+81, Cha82]. These algorithms can be adapted to pass procedure arguments in registers, as required by our approach.

The assumption that all n-argument procedures take their arguments in registers r^{n-1}_{i=0} simplifies our algorithm, but is too restrictive. Algorithms for interprocedural register allocation (e.g., [BWD95]) can reduce the number of copy operations by using different argument registers for different procedures. For example, before register shuffling is performed, the top-level call to the revmap procedure is transformed to (app r1 r3 r4 r0 r2). Since this is the only call to revmap in the program, the register shuffling operations that transform the operand sequence [r3, r4, r0, r2] to [r0, r1, r2, r3] can be eliminated if the subroutine corresponding to the revmap procedure (subr0) is simply modified to expect its arguments in the unshuffled registers:

(def subr0 (abs (r3 r4 r0 r2) ...))
Of course, in order to use specialized argument registers for a particular procedure, the compiler must have access to its definition and all its calls.

Exercise 17.38
a. Write a six-operand application (app Irator R^5_{i=0}) whose RDG has two cyclic components, one acyclic component, and one vertex with in-degree 2.
b. Show the result of using the register shuffling algorithm described in the text on your example.

Exercise 17.39 For an application with n operands, what is the number of copy operations needed by the register shuffling algorithm described in the text in the best case? In the worst case? Write a six-operand application (app Irator R^5_{i=0}) that requires the worst-case number of copies.

Exercise 17.40
a. Consider the following abstraction ABa:

(abs (clo.0 a.1 b.2 k.3)
  (let* ((t.4 (@mget 2 clo.0))
         (t.5 (@mget 3 clo.0))
         (t.6 (@- a.1 t.4))
         (t.7 (@/ b.2 t.4))
         (code.8 (@mget 1 t.5)))
    (app code.8 t.5 t.6 t.7 k.3)))
What is the result of register-converting this abstraction using the algorithm described in the text?
b. Consider the abstraction ABb obtained from ABa by changing the application expression to

(app code.8 t.5 t.4 t.6 t.7 k.3)   {new argument t.4 added before t.6}
What is the result of register-converting ABb?

c. Consider the abstraction ABc obtained from ABb by changing the application expression to

(app code.8 t.5 t.4 t.4 t.6 t.7 k.3)   {second t.4 added before t.6}
What is the result of register-converting ABc?

d. The results of part b and part c use more registers and copy operations than necessary. Show this by register-converting ABb and ABc to FILreg∞ abstractions by hand to use both the minimal number of registers and the minimal number of copy operations. You may reorder let bindings and interleave copy bindings with the existing let bindings as long as you do not change the meaning of the abstractions.
17.12.5 The Spilling Phase
A FILreg∞ register is live if its current value may be accessed from the register later in the computation. When the number of live FILreg∞ registers exceeds the number nreg of registers in the machine, some of the register values must be stored elsewhere in memory. The process of moving values that would otherwise be stored in registers to memory is called spilling. In the Tortoise compiler, we use the name spill memory for the area of memory used to store spilled values. We treat spill memory like a zero-indexed mutable array of slots manipulated via two FILreg primitive operations:

• (prim (spset! NT) R) stores the content of R in the slot at index N[[NT]] in spill memory and returns #u. This is abbreviated (@spset! NT R).

• (prim (spget NT)) returns the value stored in the slot at index N[[NT]] in spill memory. This is abbreviated (@spget NT).

Tortoise uses a simple spilling algorithm that assumes nreg ≥ 2. Given a FILreg∞ program P, the algorithm first determines the largest register rtop used in P. If top < nreg, then P is already a FILreg program, so it is returned. But if top ≥ nreg, then all references to registers of the form ri such that i ≥ nreg must be eliminated to convert the program to FILreg. This is accomplished by dedicating the top two registers, rsp = r(nreg−2) and r(sp+1) = r(nreg−1), to the spilling process and storing the content of every register rj as follows:

• If j < sp, the content of rj continues to be stored in register rj.
• If j ≥ sp, the content of rj is stored in slot (j − sp) of spill memory. In this case we say that rj is a spilled register. We assume that spill memory is large enough to hold the values of all spilled registers.

The spilling phase performs the following spill conversion transformations on the FILreg∞ program, which are illustrated in Figure 17.47. A code sketch of the let-binding case appears after this list.

• The program formal parameter sequence, all abstraction formal parameter sequences, and all application operand sequences are truncated to contain no register larger than r(sp−1). This is because we pass the first sp arguments in registers and any arguments beyond this in spill memory. We assume that the program-invoking mechanism of the operating system "knows" that any arguments beyond those mentioned in the program formal parameters must be passed in spill memory.

• A let expression (let ((Rdst LE)) Ebody) in which rdst is a spilled register is converted to

(let* ((rsp LE′)
       (r(sp+1) (@spset! dst−sp rsp)))
  E′body)

where LE′ is the spill-converted version of LE, E′body is the spill-converted version of Ebody, and dst−sp is a natural number literal NT such that N[[NT]] = (dst − sp). This takes the value that would have been stored in rdst and instead (1) stores it in the dedicated register rsp and (2) uses spset! to move it from rsp to spill memory at index (dst − sp). Storing the unit value resulting from spset! in r(sp+1) rather than in rsp allows the value in rsp to be used later in improved versions of the spilling algorithm.
• Any reference to a spilled register rsrc that appears as a conditional test, as an operator of a procedure application, or as the first argument of a primitive application is converted to a reference to rsp in a context where rsp is let-bound to (@spget src − sp). This takes the value that would have been retrieved directly from rsrc , and instead (1) uses spget to retrieve it from spill memory at index (src − sp), (2) stores it in the dedicated register rsp , and (3) retrieves it from rsp . Similarly, any reference to a spilled register rsrc that appears as the second argument of a primitive application is converted to a reference to r(sp+1 ) in a context where r(sp+1 ) is let-bound to (@spget src − sp). A spilled register in the second argument position is stored in a different register than one in the first position to handle the case where both argument registers are spilled registers.
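The let-binding transformation has a direct implementation flavor. Below is a hedged Python sketch of just that case (registers are represented by their indices; the expression representation and the assumption that the subexpressions have already been converted are choices of this illustration, not the Web Supplement's definition):

def convert_let(dst, le, body, sp):
    """Spill-convert (let ((r<dst> LE)) E), assuming `le` and `body`
    have already been spill-converted recursively."""
    if dst < sp:                       # rdst is a real machine register
        return ["let", [[f"r{dst}", le]], body]
    # rdst is spilled: compute into rsp, then spset! into slot dst - sp,
    # putting the unit result of spset! in r(sp+1) so rsp stays usable.
    return ["let*",
            [[f"r{sp}", le],
             [f"r{sp + 1}", ["@spset!", dst - sp, f"r{sp}"]]],
            body]

For example, with sp = 2, convert_let(2, ["@*", "r0", "r0"], body, 2) produces (let* ((r2 (@* r0 r0)) (r3 (@spset! 0 r2))) body), matching bindings of the kind seen in Figure 17.47.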
In the spilling example in Figure 17.47, where sp = 2, the formal parameter registers r2, r3, and r4 of the abstraction are stored in spill memory and are accessed via (@spget 0), (@spget 1), and (@spget 2), respectively. The global spill conversion transformation guarantees that any invocation of this (or any other) five-parameter subroutine will use spset! to store the third, fourth, and fifth operands in spill memory locations 0, 1, and 2 before control is passed to the subroutine. The example illustrates this parameter spilling for subroutine calls in the application of the six-parameter subroutine stored in r1. The converted code uses spset! to store the value of parameters r2 and r5 in spill memory locations 0 and 3. No explicit spset!s are needed for spilling r3 and r4 to locations 1 and 2 because these values were already placed in spill memory by the caller of the converted abstraction and are not changed in its body.

Our simple spilling algorithm can generate code with some obvious inefficiencies. For example, if sp = 2, it transforms
to (let* ((r2 (@spget 2)) {move content of spilled r4 into r2} (r3 (@spget 2)) {move content of spilled r4 into r3} (r2 (@* r2 r3)) {calculate spilled r4 times spilled r4} (r3 (@spset! 0 r2)) {store content of spilled r2 into memory} (r2 (@spget 0)) {move content of spilled r2 into r2} (r2 (@< r1 r2)) {calculate r1 less than spilled r2} (r3 (@spset! 1 r2)) {move content of spilled r3 into memory} (r2 (@spget 1))) {move content of spilled r3 into r2} (if r2 (app r1 r0) (error wrong)))
when the following much simpler code would work:

(let* ((r2 (@spget 2))   {move content of spilled r4 into r2}
       (r2 (@* r2 r2))   {use r2 for both args and for result; no need to}
                         { spill r2 to memory since only used in next binding}
       (r3 (@< r1 r2)))  {use r2 directly and store result directly in r3; no need}
                         { to spill r3 to memory since only used in if test}
  (if r3 (app r1 r0) (error wrong)))   {use r3 directly}
The Web Supplement explores these inefficiencies and how they can be eliminated.
Abstraction before Spilling
(abs (r0 r1 r2 r3 r4)
  (let* ((r5 (@< r4 r2))
         (r0 (@+ r0 r4)))
    (if r5
        (app r3 r0 r1 r2)
        (let ((r2 (@* r0 r0)))
          (app r1 r0 r1 r2 r3 r4 r5)))))

Abstraction after Spilling (where sp = 2)
(abs (r0 r1)                   {truncate formal parameters}
  (let* ((r2 (@spget 2))       {move content of spilled r4 into r2}
         (r3 (@spget 0))       {move content of spilled r2 into r3}
         (r2 (@< r2 r3))       {calculate spilled r4 less than spilled r2}
         (r3 (@spset! 3 r2))   {store content of spilled r5 into memory}
         (r3 (@spget 2))       {move content of spilled r4 into r3}
         (r0 (@+ r0 r3))       {use r3 for spilled r4}
         (r2 (@spget 3)))      {move content of spilled r5 into r2}
    (if r2                     {use r2 for spilled r5}
        (let ((r2 (@spget 1))) {move content of spilled r3 into r2}
          (app r2 r0 r1))      {use r2 for spilled r3 and truncate operands}
        (let* ((r2 (@* r0 r0))       {calculate content of spilled r2}
               (r3 (@spset! 0 r2)))  {store content of spilled r2 into memory}
          (app r1 r0 r1)))))         {truncate operands}

Figure 17.47 A spilling example.
Some of the simplifications can be made by a peephole optimization phase that performs local transformations on the result of the spilling phase. Other improvements require modifying the spilling algorithm itself. Any approach to spilling based purely on an index threshold is rather crude.12 It would be better to estimate the frequency of register usage and spill the less frequently used registers.

12 But index thresholds for spilling have an interesting precedent. All machines in the IBM 360 line executed uniform machine code assuming the same number of virtual registers. Since hardware registers were expensive, cheaper machines in the line used a small number of hardware registers for the low-numbered virtual registers and main memory locations for high-numbered virtual registers. These machines employed a threshold-based spilling mechanism implemented in hardware!
Notes

The literature on traditional compiler technology is vast. A classic text is the "Dragon book" [ASU86]. More modern treatments are provided by Cooper and Torczon [CT03] and by Appel's textbooks [App98b, App98a, AP02]. Comprehensive coverage of advanced compilation topics, especially optimizations, can be found in Muchnick's text [Muc97]. Inlining is a particularly important but subtle optimization — see especially [CHT91, ASG97, DC00, JM02]. Issues in functional-language compilation are considered by Peyton Jones in [Pey87].

Compiling programs via transformations on an intermediate, lambda calculus-based language was pioneered in the Scheme community through a series of compilers that started with Steele's Rabbit [Ste78] and was followed by many others [Roz84, KKR+86, Cli84, KH89, FL92, CH94]. An extreme version of this idea is the nanopass compiler for Scheme, which is composed of fifty simple transformation stages [SWD04]. The idea (embodied in FILreg) that the final intermediate-language program can also be interpreted directly as a register-machine program is due to Kelsey [Kel89, KH89]. He showed that realistic compiler features like register allocation, instruction selection, and stack-based allocation could be modeled in such a framework and demonstrated that the transformational technique was viable for compiling traditional languages like Pascal and Basic.

The next major innovation along these lines was developing transformation-oriented compilers based on explicitly typed intermediate languages (e.g., [Mor95, TMC+96, Pey96, PM97, Sha97, BKR98, TO98, MWCG99, FKR+00, CJW00, DWM+01]). The type information guides program analyses and transformations, supports run-time operations such as garbage collection, and is an important debugging aid in the compiler development process. In [TMC+96], Tarditi and others explored how to express classical optimizations within a typed intermediate language framework. In some compilers (e.g., [MWCG99]) type information is carried all the way through to a typed assembly language, where types can be used to verify certain safety properties of the code. The notion that untrusted low-level code should carry information that allows safety properties to be verified is the main idea behind proof-carrying code [NL98, AF00].

Early transformation-based compilers typically included a stage converting the program to CPS form. The view that procedure calls can be viewed as jumps that pass arguments was championed by Steele, who observed that a stack discipline in compilation is not implied by the procedure-call mechanism but rather by the evaluation of nested subexpressions [SS76, Ste77].
The Tortoise MCPS transformation is based on a study of CPS conversion by Danvy and Filinski [DF92]. They distinguished so-called static continuations (what we call "metacontinuations") from dynamic continuations and used these notions to derive an efficient form of CPS conversion from the simple but inefficient definition. Appel studied the use of continuations for compiler optimizations in [App92]. In [FSDF93], Flanagan et al. argued that explicit CPS form was not necessary for such optimizations. They showed that transformations performed on CPS code could be expressed directly in a non-CPS form they called A-normal form. Although modern transformation-based compilers tend to use something like A-normal form, we adopted a CPS form in the Tortoise compiler. It is an important illustration of the theme of making implicit structures explicit, and it simplifies the compilation of complex control constructs like nonlocal exits, exceptions, and backtracking. The observation that these constructs use continuations in a nonlinear way is discussed in [Baw93].

Closure conversion is an important stage in a transformation-based compiler. Johnsson's lambda-lifting transformation [Joh85] lifts abstractions to top level after they have been extended with initial parameters for free variables. It uses curried functions that are partially applied to these initial parameters to represent closures. This closure representation is standard in compilers for combinator reduction machines [Hug82, Pey87]. The Tortoise lifting stage also lifts closed abstractions to top level, but uses a different representation for closures: the closure-passing style invented by Appel and Jim in [AJ88]. Defunctionalization (a notion due to Reynolds [Rey72]) has been used as the basis for closure conversion in some ML compilers [TO98, CJW00]. Selective and lightweight closure conversion were studied by Steckler and Wand [WS94, SW97]. The notion of representation pollution was studied by Dimock et al. [DWM+01] in a compiler that chooses the representation of a closure depending on how it is used in a program. Sophisticated closure conversion systems rely on a control flow analysis to determine how procedures are used in a program. In [NNH98], Nielson, Nielson, and Hankin provide excellent coverage of control flow analysis and other program analyses.

[BCT94] summarizes work on register allocation and spilling. The classic approach to register allocation and spilling involves graph-coloring algorithms [CAC+81, Cha82]. See [BWD95] for one approach to managing registers across procedure calls.
18 Garbage Collection

Be you the mean hombre that's a-hankerin' for a heap of trouble, stranger? Well, be ya?
— Yosemite Sam, in "Hare Trigger"
18.1 Why Garbage Collection?
Programming without some form of automatic memory management is dangerous and may lead to run-time type errors. Here is why: A programmer forced to manage memory manually may inadvertently release a block of memory for reuse, yet retain the ability to access the block by holding on to a pointer to the block (a so-called dangling pointer). When this memory block is reused by the system, the value in it will be accessible by two independent pointers, perhaps even under two independent types (the type expected by the logically invalid pointer and that expected by the new pointer). Modifying the value via one pointer will unexpectedly cause the value accessible via the other to be changed as well, leading to insidious bugs that are notoriously difficult to catch. The program is incorrect, and in some cases, type safety can be lost!1

Thus a critical run-time service in type-safe programming language implementations is the safe allocation and deallocation of memory for compound values such as tuples, arrays, lists, and oneofs. Such values are stored in units called blocks in a region of memory called the heap. As described in Chapter 17, the Tortoise compiler generates FILreg code for a simple register machine that uses the primitive operator mnew to allocate a mutable product value and the primitives mget and mset! to manipulate the contents of such products. The job of this chapter is to demonstrate how to implement primitives like these.

This chapter describes a safe storage management system based on a technique for automatic heap deallocation called garbage collection.

1 The same problem arises in languages that do not do array bounds checking, a deficiency exploited by countless security attacks.
The implementations of almost all type-safe languages (e.g., Java, C#, Lisp, Smalltalk, ML, Haskell) use garbage collection. In a system with manual heap deallocation, where programmers must explicitly declare when heap blocks may be reused, it is possible for a sophisticated type checker that models the state of memory to guarantee that there are no dangling pointers [ZX05]. But any such type system necessarily limits expressiveness by rejecting some programs that are actually safe. In contrast, garbage collection guarantees value integrity and type safety without limiting expressiveness.

Garbage collection (GC) is a process that identifies memory blocks that will not be used again and makes their storage available for reuse. A heap block is live in a program state if it will be accessed later in the program execution, and otherwise the block is dead. It is not in general possible to prove which blocks are live in a program state, and thus a garbage collector must identify and reuse only blocks that it can prove are dead. The engineering challenge is to design a garbage collector that efficiently preserves live memory blocks and a minimum of dead blocks.

Garbage collection also reduces memory leaks that arise when a programmer does not deallocate dead blocks so they can be used for something else. Memory leaks can cause a program to abort with an out-of-memory error that could have been avoided if the dead blocks were reused. It is in fact common for long-running programs to crash because of slow memory leaks that exhaust available storage. Memory leaks are notoriously difficult to find and fix, especially in a large and complex program like an operating system. Garbage collectors can also exhibit memory leaks, but they are better equipped than human beings to reason about block liveness, and typically do a better job of efficiently reclaiming dead blocks.

In manual deallocation systems, the programmer is caught between two dangers: Deallocating blocks too early creates dangling pointers, whereas deallocating them too late causes memory leaks. Yet it is often difficult, if not impossible, for the programmer to know when a heap-allocated data structure can no longer be accessed by the program. For example, consider a graphics application in which two-dimensional points are represented as pairs of x and y coordinates and lines are represented as pairs of points. A single point might be shared by many lines. When a line is deleted by the application, it may be safe to deallocate the pair associated with the line, but it is not safe to deallocate the pairs associated with the line's endpoints, since these might be shared by other lines. Without explicitly tracking how many lines a point participates in, the programmer has no idea when to deallocate a point in such a system. In contrast, because a garbage collector "knows" the exact pointer wiring structure for heap blocks in memory, it can determine properties that are difficult for a programmer to keep track of,
such as how many references there are to a particular point. If the answer is 0, the point may be reclaimed.

Manual deallocation also complicates the implementation of data structures and data abstractions. When a compound data structure becomes dead, many of its components become dead as well. The programmer must carefully free all dead components (often recursively) before freeing the storage for the compound structure itself. Manual deallocation complicates data abstractions because allocation and deallocation responsibilities must become part of the interface. Implementers of an abstraction must often provide a mechanism for deallocating abstract data structures. C++ provides this functionality for objects via a destructor function that is called whenever the storage for the object is deallocated. A destructor function typically deallocates storage for components of the object. But the problem is more complex still: Only the client of the data abstraction knows when abstract data values and many of the components used to create them are dead; but only the implementer knows the actual structure of abstract values, including their components, data-sharing properties, and invariants. Choreographing allocations and deallocations for even relatively simple and common abstractions, such as generic linked lists, can prove extremely complex and error-prone.

In the mid-1990s garbage collection came into the mainstream when the implementers of the first widely adopted type-safe programming language, Java, chose to use garbage collection for their implementation of safe storage. Although garbage collection has a rich history in languages like Lisp and Smalltalk, until recently it was considered too inefficient to support in mainstream programming languages like C, Pascal, and Ada, which opted for manual storage deallocation instead. (In fact, the Ada specification allows implementations to perform garbage collection but does not require it: Programmers are effectively required to manually deallocate storage in the many implementations that do not support garbage collection.) Java's type safety was sufficient inducement for programmers to accept a system that uses garbage collection.

The remainder of this chapter explores garbage collection in the context of FRM, the FIL Register Machine that executes the FILreg code generated by the Tortoise compiler presented in the previous chapter. FRM allocates heap blocks for mutable tuples, list nodes, and symbols, and garbage collection will allow reuse of such blocks when it determines that they will never be accessed again. Section 18.2 presents the relevant details of FRM, especially how compound values are laid out in memory. Section 18.3 discusses approximations for block liveness. Section 18.4 lays out a complete design for an FRM garbage collector. Section 18.5 sketches some other approaches to garbage collection, including a conservative GC technique that can be used for languages that traditionally
rely on manual deallocation. Garbage collection is a dynamic approach to automatic heap deallocation; Section 18.6 briefly discusses some static approaches to automatic heap deallocation.

To keep our discussions at a high level, we will explain implementation issues and algorithms using a combination of English and pictures. A complete metalanguage formalization of FRM and several heap management strategies can be found in the Web Supplement.
18.2 FRM: The FIL Register Machine
In order to explain heap management strategies for FRM, we first need to give an overview of the FRM architecture and explain how FRM values are represented.
18.2.1 The FRM Architecture
The fundamental unit of information is the binary digit, commonly called a bit. A bit can be one of two values, 0 or 1. Every FRM value is encoded as a single word, which is a fixed-size sequence of bits. A value that is too big to fit into a single word (such as a mutable tuple, nonempty list node, or symbol) is represented as a single address word (or pointer) that is the address of a block of words in the heap.

Uniformly representing all values in a single-sized word datum greatly simplifies many aspects of the FRM implementation. For example, a word-sized register can hold any value, the ith component of any heap block can be stored in its ith word, and polymorphic functions require no special implementation techniques. Many practical language implementations use nonuniform value sizes (e.g., single-precision floating point numbers requiring one word and double-precision floating point numbers requiring two words) for efficiency reasons, but they would create needless complexity here.

The state of a program running on FRM has four components (a small sketch of this state as a data structure appears after the list):

1. The current FILreg expression E being executed. As noted in Section 17.12.1, each FILreg expression can be viewed as a register machine instruction whose execution updates the state of the machine and specifies the next instruction. (See the Web Supplement for an SOS that specifies the action of each FILreg expression as an FRM instruction.) This component corresponds to the program counter, the address of the currently executing instruction, in traditional architectures. An FRM program executes until it terminates with a halt or error expression.
2. The subroutine memory S, where the definitions of all the program's subroutines are stored. This is the code segment in a traditional architecture. Rather than worry about the address of the start of a subroutine in memory, we will simply refer to each subroutine by an integer index. That is, subroutine n stands for Ebody in the definition (def subrn (abs (RS) Ebody)) in the FILreg program being executed. As observed on page 1099, FRM can ignore the register parameters in an abstraction, because the actual argument values are passed in the registers of the machine.

3. The register memory R, where the contents of the FRM registers are stored. As in FILreg, we assume that there are nreg registers. Each register holds one word. The notation R[n] stands for the word stored in register rn.

4. The heap memory H, where the contents of memory blocks are stored. We assume that the heap is a part of main memory, an array M of words indexed by addresses that are natural numbers. The notation M[naddr] denotes the word stored at address naddr in M. Some of the main memory may be reserved for purposes other than the heap, such as a program's subroutine and/or spill memory.2 We assume that the portion of main memory reserved for the heap uses indices in the range [0 .. (nsize − 1)], where nsize is the number of words reserved for the heap.
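Pulling the four components together, here is a minimal Python sketch of an FRM state. The field types and the particular sizes are illustrative assumptions of this sketch; the SOS in the Web Supplement is the authoritative definition:

from dataclasses import dataclass, field
from typing import Any, List

NREG = 32          # hypothetical value of nreg
NSIZE = 1 << 20    # hypothetical heap size nsize, in words

@dataclass
class FRMState:
    expr: Any                   # current FILreg expression (the program counter)
    subroutines: List[Any]      # S: body of subroutine i is subroutines[i]
    registers: List[int] = field(
        default_factory=lambda: [0] * NREG)    # R: R[n] is the word in rn
    heap: List[int] = field(
        default_factory=lambda: [0] * NSIZE)   # H: words M[0 .. nsize-1]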
18.2.2 FRM Descriptors
We have seen that all FRM values are single words, where some words are pointers to blocks of words in the heap. We will now explore how to represent words and blocks on a typical register machine. This allows us to discuss some of the low-level representation choices that are important in programming language implementations.

A word is represented as an n-tuple of bits. We can define FRM for a machine of any word size, but for concreteness we shall assume that all words consist of 32 bits. Suppose B ranges over bits. Then we abbreviate the word tuple ⟨B1, B2, ..., B31, B32⟩ by the juxtaposition B1B2···B31B32 of its bits. The notation B^n represents n consecutive copies of B. For example, 0^20 1010 1^8 stands for the word that has 20 0s followed by 1010 followed by eight 1s. There are standard ways to represent natural numbers and signed integers using bits, and standard ways to perform arithmetic on these representations. For more information, consult the Web Supplement.

2 The FRM SOS in the Web Supplement shows how to handle spill memory, which we ignore here.
Each FRM value can be represented as a single 32-bit word, which we shall call its descriptor. A value is said to be unboxed when all information about the value fits into the descriptor, in which case its descriptor is said to be immediate. A value is said to be boxed when some of its information is stored in a heap block, in which case its descriptor contains the address of the block and is said to be nonimmediate. We assume that word addresses are specified by 30 bits. This is consistent with the word-alignment restrictions in many 32-bit architectures.3

Descriptors with Type Tags

Each FRM value is encoded as a single word with an unambiguous representation. This unambiguous representation encodes both the type of the value and the value itself. Thus, we can examine the FRM value stored in a register and decode its type and value without additional information. Such explicit type information is necessary for descriptor representations in a dynamically typed language, where it is necessary to check the type of a value at run time. Such type information can also be helpful in a statically typed language, where it can be used by run-time processes (such as garbage collectors, debuggers, and value displayers) to parse memory into values.

The left-hand column of Figure 18.1 shows the way we have chosen to encode type and value information in a descriptor. A descriptor is divided into a type tag — the lower-order bits that specify the type of the value — and the value representation — the remaining bits that distinguish values of a given type. In this particular representation, the lowest-order bit is 0 for immediate values and 1 for nonimmediate values. Since nonimmediate values have only 30 bits of address information, the next-to-last bit is arbitrary; we assume that it is 0. So all pointers have a 30-bit address followed by the type tag 01.

For immediate values, the next-to-last bit distinguishes integers (0) from nonintegers (1). This leaves 30 bits of information to represent a signed integer. For simplicity, we will assume that FRM supports only 30-bit signed integers in this representation. It is possible to represent integers with more bits if we box them in a heap block.4

The third-to-last bit distinguishes subroutine indices (for which this bit is 0) from other values (for which this bit is 1). This leaves 29 bits available to express the subroutine index itself (as an unsigned integer).

3 In many 32-bit architectures, a 32-bit address word specifies a byte address. But data one word wide must be aligned to a word boundary, i.e., its address must have 00 as its lowermost bits. So the information content of a word address is limited to its upper 30 bits.

4 This technique can be used to represent arbitrary-sized integers, known as bignums.
Descriptor with type tags         Value        Descriptor with GC tags only
[30-bit signed integer] 00        integer      [31-bit signed integer] 0
[29-bit subroutine index] 010     subroutine   [31-bit subroutine index] 0
0^27 00110                        unit         0^31 0
0^27 01110                        null         0^31 0
0^26 0 10110                      false        0^30 0 0
0^26 1 10110                      true         0^30 1 0
0^19 [8-bit ASCII code] 11110     character    0^23 [8-bit ASCII code] 0
[30-bit address] 01               pointer      [30-bit address] 01

Figure 18.1 Two layouts for FRM descriptors: one with full type tags and one with garbage collection (GC) tags.
Two additional type bits are used to distinguish the remaining four types of immediate values: unit (00), null list (01), boolean (10), and character5 (11). The unit and null list types have only one value each, so the remaining 27 bits are arbitrary; we assume they are all 0. The boolean type has two values, which are distinguished by the 27th bit: 0 for false and 1 for true. In a character descriptor, the remaining 27 bits can be used to encode the particular character — e.g., as an 8-bit ASCII code or a 16-bit unicode representation.

From the perspective of encoding words as bits, the placement and content of the type tags are arbitrary. For example, we could have put the type tags in the leftmost bits rather than the rightmost bits, and we could have made the integer type tag 01 and the pointer type tag 00. However, the particular choices made for type tags in Figure 18.1 have practical benefits on real architectures:

• Using a rightmost tag of 00 for integers simplifies integer arithmetic. Each 30-bit signed integer i is represented by a word that denotes 4i. Addition, subtraction, and remainder can be performed on these descriptors simply using the standard 32-bit arithmetic operations without using any other bit-level operations, because these operations preserve the rightmost 00 bits. Multiplication and division take slightly more work: one of the multiplication operands must be shifted right (arithmetically) by two bits to eliminate its 00 tag before performing the multiplication; and the result of division must be shifted left by two bits to add the 00 tag. Arithmetic would require more work if leftmost tags were used or a rightmost tag other than 00 were used (see Exercise 18.1).

• Using a nonzero rightmost tag for pointers is efficient on most architectures via an offset addressing mode, which allows direct access to memory at a fixed offset (a small signed integer) from a 32-bit byte address stored in a register or memory location. An address with a 01 pointer tag can effectively be converted into a word-aligned byte address by specifying a −1 offset.

5 Although FIL does not have character literals, FRM uses character values to represent symbols as boxed values whose components are characters.

Exercise 18.1 Assuming the following alternative placement and/or content of type tags, describe how to perform (1) integer arithmetic (+, −, ×, ÷, and %) and (2) accesses for memory addresses.

a. Type tags are the rightmost two bits of a descriptor, 11 is the integer type tag, and 00 is the pointer type tag.

b. Type tags are the leftmost two bits of a descriptor, 00 is the integer type tag, and 01 is the pointer type tag.

c. Type tags are the leftmost two bits of a descriptor, 01 is the integer type tag, and 00 is the pointer type tag.
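The tagged-integer arithmetic described in the first bullet above can be checked concretely. The following Python sketch (an illustration of the 00-tag layout, with 32-bit wraparound made explicit) shows that addition works directly on descriptors, while multiplication needs one untagging shift and division needs a retagging shift:

MASK32 = 0xFFFFFFFF

def tag_int(i):
    """Encode a 30-bit signed integer i as a descriptor: the word denotes 4i."""
    return (i << 2) & MASK32

def untag_int(d):
    """Arithmetic right shift by 2 recovers the signed integer."""
    if d & 0x80000000:              # interpret the 32-bit word as signed
        d -= 1 << 32
    return d >> 2

def add_desc(d1, d2):               # 4a + 4b = 4(a + b): no shifts needed
    return (d1 + d2) & MASK32

def mul_desc(d1, d2):               # (4a >> 2) * 4b = 4(a * b)
    return (untag_int(d1) * d2) & MASK32

def div_desc(d1, d2):               # shift the quotient left to retag it
    return tag_int(untag_int(d1) // untag_int(d2))

assert untag_int(add_desc(tag_int(3), tag_int(4))) == 7
assert untag_int(mul_desc(tag_int(3), tag_int(4))) == 12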
Descriptors with Garbage Collection (GC) Tags

In the run-time system for a statically typed language, descriptors need not carry complete type information, because dynamic type checking is unnecessary. This is true for FILreg programs that are produced by the Tortoise compiler. However, it is still helpful for descriptors to carry information that aids other run-time services, like garbage collection. As we shall see later, a garbage collector needs to distinguish between pointers and nonpointers. This can be accomplished with a one-bit GC tag. The right-hand column of Figure 18.1 shows a descriptor layout that uses the low-order bit as the GC tag. A 0 indicates a nonpointer, a 1 indicates a pointer. Since pointers have only 30 bits of address information, we will use a two-bit 01 tag for pointers, reserving the 11 tag for header words in heap blocks (see Section 18.2.3).

The choice of the placement and values of the GC tags is guided by the same logic used for type tags.

naddr            [Encoding of nslots] [(optional) type] 11   ; header word
naddr + 1        W1                                          ; content of slot 1
naddr + 2        W2                                          ; content of slot 2
...              ...
naddr + nslots   Wnslots                                     ; content of slot nslots

Figure 18.2 The layout of a heap block with contents W1, ..., Wnslots at word address naddr.
Using a rightmost 0 bit for immediate descriptors simplifies integer arithmetic, and the 01 pointer tag can be processed at little or no cost by offset addressing. This layout yields an extra bit of integer precision. Note that because immediate descriptors do not include distinguishing type bits, many different values can have the same bit pattern. For example, the bit pattern 0^32 is used for the integer 0, the unit value, the null list, the boolean false, and the character whose ASCII code is 0.

Tagless Descriptors

Implementations can support garbage collection without using GC tags in descriptors. In these tag-free GC systems (Section 18.5.2), descriptors need not carry any type or GC tags, so all bits can be used for the value representation. For example, a 32-bit descriptor can encode a 32-bit integer or a 32-bit byte address. Tagless descriptors are essential in conservative GC systems (Section 18.5.3) for languages like C/C++ and Pascal, which cannot tolerate GC bits in their descriptors.
18.2.3 FRM Blocks
FRM blocks are allocated from the heap, an area of main memory that is indexed by 30-bit addresses. An FRM block is described by a single word FRM descriptor that includes its 30-bit address. This address naddr points to the header word of the FRM block, which is followed in memory by a variable number of slots, each of which can hold a single word (Figure 18.2). The header word at address naddr indicates the size of the block in words (excluding the header word itself) and possibly the type of the block. To aid in parsing the heap into blocks, header words have a two-bit tag of 11, which distinguishes them from immediate and nonimmediate descriptors (see Figure 18.1). This tag is not strictly necessary, but convenient. The words at addresses [(naddr + 1) .. (naddr + nslots )] are the descriptors for the contents of the nslots slots of the block.
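A sketch of this layout in code may help. The following Python fragment is a simplified illustration (bump allocation in a flat word array, using the header encoding of Figure 18.3's left-hand column; it is not the FRM formalization) that allocates a block with a header word and performs checked slot accesses:

HEADER_TAG = 0b11       # rightmost two bits of every header word
MUTABLE_TUPLE = 0b00    # hypothetical 2-bit type code (cf. Figure 18.3)

class Heap:
    def __init__(self, nsize):
        self.mem = [0] * nsize
        self.free = 0                    # next unused word (no GC yet)

    def mnew(self, nslots, btype=MUTABLE_TUPLE):
        addr = self.free
        # header = [28-bit size][2-bit type][11]
        self.mem[addr] = (nslots << 4) | (btype << 2) | HEADER_TAG
        self.free += 1 + nslots          # header word plus nslots slots
        return addr                      # a descriptor would add the 01 tag

    def nslots(self, addr):
        return self.mem[addr] >> 4       # size field of the header word

    def mget(self, addr, i):             # slots are 1-indexed
        assert 1 <= i <= self.nslots(addr), "out-of-bounds index"
        return self.mem[addr + i]

    def mset(self, addr, i, word):       # models mset!
        assert 1 <= i <= self.nslots(addr), "out-of-bounds index"
        self.mem[addr + i] = word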
Header with type and size       Header Type          Header with size only
[28-bit size] 0011              mutable tuple        [30-bit size] 11
0^26 10 0111                    list node (size 2)   0^28 10 11
[28-bit size] 1011              symbol               [30-bit size] 11
[28-bit size] 1111              closure              [30-bit size] 11

Figure 18.3 Two layouts for FRM header words: one with size and type information and one with size information only.
For a statically typed language with garbage collection (the right-hand column of Figure 18.3), the type of every block is statically known, but the garbage collector needs to know the size of each block. The first 30 bits of the header are used to encode this size. For a dynamically typed language, the header may encode type information in addition to size information. The choices in the left-hand column of Figure 18.3 indicate that there are four types of FILreg values represented as blocks: mutable tuples (created by mnew), nonempty list nodes, symbols, and closures.6 The types of these four values can be distinguished by two type bits in addition to the 11 header tag. More type bits would be needed if FILreg were extended to support additional compound values, such as strings and arrays.

For example, here are two heap block representations for the result of compiling and executing the FLARE/V expression (@pair #t 42):
                       Block with type info    Block with GC info
mutable tuple header   0^26 10 0011            0^28 10 11
true                   0^26 1 10110            0^30 1 0
42                     0^24 101010 00          0^25 101010 0
In this block representation, accessing or changing slot i (1-indexed) of a block at 30-bit address naddr with nslots slots simply manipulates the location at address 6
Although closures are represented as mutable tuples in FILreg , it is helpful to distinguish the types of closure tuples from the types of other mutable tuples. In a compiler that maintains implicit type annotations for all expressions, closure types would be introduced by the closure conversion stage.
18.2.3 FRM Blocks
1129
In the case where the block size is not known at compile time (e.g., for an array of unknown size), it is first necessary to check at run time that 0 < i ≤ nslots, where nslots is determined from the header word at address naddr. Failure to pass this check leads to an out-of-bounds index error. But for data whose size is known statically (e.g., FIL's mutable products), no dynamic index check is necessary.

The simple heap-block layout depicted in Figure 18.2 does not make efficient use of space for heap-allocated products with a small, statically known number of components (in FILreg, list nodes and tuples/closures with a small number of components). Using a header word to encode the size (and possibly type) of these products has a high space overhead. One way to avoid the header word in these cases is to encode the size/type of the block in the pointer to the block rather than in the block itself. For example, reserving three right-hand bits of the pointer next to the 01 tag for this purpose would allow distinguishing eight size/type possibilities, one of which would indicate the standard block-with-header while the other seven would indicate headerless blocks. In addition to eliminating the header word for small blocks, this technique allows the type of a block in a dynamically typed system to be tested without a potentially expensive memory access. An obvious drawback of this approach is that the extra size/type bits reduce the range of the address space that can be expressed with the remaining address bits. Moreover, extra bit-diddling is required to turn these modified pointers into recognizable memory addresses.

Size/type information can be encoded in the pointer without these drawbacks using the Big Bag of Pages (BIBOP) scheme, sketched below. This scheme is based on decomposing memory into pages by viewing the higher-order bits of a word address as a page address and the lower-order bits of a word address as a location within a particular page. In BIBOP, all blocks allocated on a page must have the same size/type information. In the simplest incarnation of BIBOP, each type of object is stored in its own single (large) page; in this case, the page address is the block type tag. It is also possible to decompose memory into many smaller pages and store the size/type information in a table indexed by the page address. BIBOP saves space by effectively using a single header per page rather than per block.

There are other inefficiencies that can be addressed with clever block layouts. For example, the straightforward way to represent an n-character symbol or string as a block is to have a header word with size/type information followed by n character words. But using a 32-bit word to represent a single 8-bit ASCII character or 16-bit Unicode character is wasteful of space. It is possible to employ packed representations in which 4 ASCII characters or 2 Unicode characters are stored in a 32-bit word within a block.
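Here is a sketch in C of the small-page variant of BIBOP. All parameters (page size, table layout, the names page_info and block_info) are assumptions for illustration, not details fixed by the text:

#include <stdint.h>

#define PAGE_BITS 12                        /* assumed: 2^12-word pages */
#define NPAGES (1u << (30 - PAGE_BITS))     /* 30-bit word addresses */

typedef struct { uint32_t nslots; uint8_t type; } page_info;

static page_info page_table[NPAGES];        /* one size/type entry per page */

/* All blocks on a page share that page's table entry, so a block needs
   no header: its size/type comes from the high bits of its address. */
page_info *block_info(uint32_t naddr) {
    return &page_table[naddr >> PAGE_BITS];
}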
Exercise 18.2 C. Hacker doesn't like reserving a bit of every FRM descriptor for a GC tag because then it isn't possible to have full 32-bit integers. Observing that every descriptor is either in the heap or in a register, he proposes an alternative way to store GC tags:

• In a heap block, the header word is followed by one or more GC-tag words that precede the content words of the block and store the GC tags of these content words. For example, the ith bit (1-indexed) of the first GC-tag word is the GC tag of the ith content word, where 1 ≤ i ≤ 32; the ith bit of the second GC-tag word is the GC tag of the (32 + i)th content word; and so on.

• For every 32 registers, a 32-bit GC-tag register is reserved to store the GC tags of the register contents.

a. Describe the benefits and drawbacks of C. Hacker's idea.

b. If FRM were extended to include homogeneous arrays, what would be an efficient way to extend C. Hacker's approach to store the GC tags for the array components?
18.3 A Block Is Dead if It Is Unreachable
A storage system may reuse any dead block. Recall that a heap block is live in an FRM state if it will be accessed later in the program execution and is dead if it will not be accessed later. Unfortunately, liveness is uncomputable in general: no algorithm can decide, for every program and block, whether the program will ever again use a pointer to that block. Therefore, a garbage collector must approximate liveness by reusing only provably dead blocks. A sound garbage collector may classify a dead block as live, but not vice versa. The worst sound approximation is that all blocks are live, in which case GC degenerates to a simple heap manager that allocates blocks but never deallocates them, an approach viable for programs with small storage needs but not suitable for serious programming.

Intuitively, a block is dead if it cannot be reached by following a chain of pointers from the data values currently accessible to the program. Since there is no way the program can access the block, it is provably dead. (This assumes that programs cannot generate new pointers themselves by, for example, performing arbitrary pointer arithmetic.) As we shall see, there are different algorithms for determining which blocks are reachable.

GC algorithms are evaluated over many dimensions: the accuracy of their identification of live and dead blocks, how much time and space they require, whether they maintain locality (i.e., keep blocks that refer to each other close together in memory), whether they can be performed in a separate thread from
the executing program, and how long a pause may be needed to perform GC. Real-time computer systems often cannot tolerate long GC-induced pauses, and thus require incremental GC algorithms that perform only a small amount of work every time the garbage collector is invoked.
18.3.1 Reference Counting
One day a student came to Moon and said: “I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons.” Moon patiently told the student the following story: “One day a student came to Moon and said: ‘I understand how to make a better garbage collector...’ ”
— MIT AI Koan about David Moon, attributed to Danny Hillis

There are two basic techniques for approximating the liveness of a block. The first is reference counting, in which each block has associated with it a reference count that indicates the number of pointers pointing to the block. When the reference count falls to zero, the block is provably dead and can be immediately reclaimed, e.g., by inserting it into a free list of blocks used for allocation.

Reference counting is conceptually simple and is easy to adapt to an incremental algorithm suitable for real-time systems. However, it suffers from numerous drawbacks (a sketch of its bookkeeping appears at the end of this subsection). The run-time system executing a program must carefully increment a block's reference count whenever a pointer to it is copied and decrement its reference count whenever a register or memory slot containing a pointer to it is overwritten, and the time overhead for this bookkeeping can be substantial. Storage must be set aside for the reference counts; e.g., a certain number of bits in the block header word can be reserved for this purpose. When reference counts are modeled by a fixed number of bits, the maximal count must be treated as “infinity” — incrementing or decrementing this count must yield the same count, and blocks with this count can never be reclaimed even when they are actually dead. Dead cyclic structures can never be deallocated by reference counting alone, since each element in the structure has at least one pointer to it.

Like any heap manager that maintains a free list of blocks (regardless of whether it uses manual or automatic deallocation), reference-counting garbage collectors can suffer from memory fragmentation, where unallocated storage consists of many small blocks, none of which are contiguous. This happens when the heap manager reuses the storage of a deallocated block, but needs only part of it. Although a fragmented memory may contain a large amount of unallocated
storage, the largest block that can be allocated may be small, causing programs to abort prematurely because of out-of-memory errors.

Fragmentation can be fixed by memory compaction, a process that moves all live blocks to the beginning of memory to yield a single large unallocated block. Compaction requires rewiring pointers and may change the contents of registers as well as the contents of heap blocks. We shall study a form of compaction in the context of the stop-and-copy garbage collection algorithm presented in Section 18.4.

Reference counting is used in practice in the allocation and deallocation of disk blocks in Unix-like operating systems (including Linux). File deletion actually just removes a pointer, or hard link, and the operating system eventually collects all blocks with zero reference counts. Users are not permitted to make hard links to a directory in these systems, because this can create unreclaimable, cyclic structures on disk.
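Here is a sketch in C of the reference-count bookkeeping just described, with a fixed-width count whose maximum value acts as “infinity.” The block structure and all names are invented for illustration:

#define RC_MAX 255   /* assumed: an 8-bit count field in the header */

typedef struct block {
    unsigned rc;             /* reference count */
    struct block *free_next; /* free-list link, reusing the block's storage */
    /* ... content slots ... */
} block;

static block *free_list = 0;

void incref(block *b) {
    if (b->rc < RC_MAX) b->rc++;   /* RC_MAX is sticky: "infinity" */
}

void decref(block *b) {
    if (b->rc == RC_MAX) return;   /* "infinite" counts are never reclaimed */
    if (--b->rc == 0) {
        /* A full implementation would also decref every pointer in the
           block's slots, possibly reclaiming further blocks. */
        b->free_next = free_list;  /* provably dead: add to the free list */
        free_list = b;
    }
}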
18.3.2 Memory Tracing
The second basic technique for approximating the liveness of a block is memory tracing, in which a block is considered live if it can be reached by a sequence of pointer-following steps from a root set of descriptors. In FRM, the root set for any machine state consists of the set of live registers (i.e., the registers that are mentioned in the current expression).7 In a given machine state, any block that is not reachable from the root set cannot be accessed in a future state and thus is dead and may be safely collected as garbage.

If we imagine that pointers to heap blocks are strings connecting physical boxes, then tracing-based GC may be viewed as a process in which the root-set descriptors are anchored down while a vacuum cleaner is applied to the heap. Any blocks that are not connected by some sequence of strings to the root set are untethered and will be sucked up by the vacuum cleaner.

Memory tracing is a better approximation to liveness than reference counting, because it classifies cyclic structures unreachable from the root set as garbage. Memory tracing can also collect blocks that a reference-counting scheme with fixed-size counts would render uncollectible with a count of “infinity.”

Tracing-based GC imposes two requirements on a language implementation. In order to traverse all reachable blocks, it must be possible at run time to (1) distinguish block pointers from nonpointers (to know which descriptors to follow)
and (2) determine the size of a block (in order to process its components). In the FRM implementation we have discussed, the GC tag of a descriptor satisfies requirement 1, and the size information in a block header word satisfies requirement 2. But there are other ways to satisfy these requirements. For example, the discussion starting on page 1129 shows ways to encode size information in the pointer word itself, and Exercise 18.4 on page 1138 explores how a single header bit per word can be used to encode the size of a heap block without a header word. Some systems specially mark blocks containing no pointers so that GC does not have to examine the components of such a block.

7 The root set also includes spill memory, which we are ignoring here (but see the Web Supplement for details). In language implementations with a run-time stack of procedure invocation frames, all descriptors in the stack are also in the root set. FRM does not have a run-time stack; instead, stack frames are encoded as heap-based continuation closures.
18.4 Stop-and-copy GC
Memory tracing is the basis for a wide variety of GC strategies, including a relatively simple and effective one known as stop-and-copy. The essential idea is familiar to anyone who has moved from one dwelling to another: put everything you want to keep into a moving van, and anything not on the van at the end is garbage. Stop-and-copy garbage collection reclaims memory by copying all live data to a new area of memory and declaring everything left in the old memory space to be garbage. We will first sketch the stop-and-copy algorithm and then describe the details for FRM below.

To distinguish new and old memory spaces, a heap memory of size nsize is divided into two equal-sized areas called semispaces: a lower semispace covering addresses in the range [0 .. ((nsize ÷ 2) − 1)] and an upper semispace covering addresses in the range [(nsize ÷ 2) .. (nsize − 1)].8 At any time, one semispace is active and the other is inactive. The active semispace is used for all allocations, using the simplest possible strategy: allocations start at the beginning (or symmetrically the end) of the active semispace, and each successive allocation request is satisfied with the memory block after the last one allocated. This continues until there is an allocation request for more memory than remains in the active semispace. The inactive semispace is like a field lying fallow; it contains no live blocks and is not used for allocation.

8 For simplicity, assume that nsize is even.

When a request is made to allocate a block that cannot fit in what remains of the active semispace, the program is stopped and the garbage collector is invoked. At this point, the active semispace is called from-space and the inactive semispace is called to-space. Garbage collection begins by copying the root set (the contents of the registers) to the bottom of to-space. It then enters a copy phase in which
it copies into to-space all blocks in from-space that are reachable from the root set. It must preserve the pointer relationships between blocks, so that the graph structure of the copied blocks is isomorphic to the original one. It must also update any pointers in the root set to point to the appropriate copied blocks in to-space. Once the copy phase is complete, all live blocks are in to-space, and the algorithm installs the updated root-set descriptors in the machine state (because all pointers have moved to to-space). At this point, the semispaces are flipped: to-space becomes the new active semispace and from-space becomes the new inactive semispace. An attempt is now made to retry the failed allocation request in the new active space: if it succeeds, the program continues normally; otherwise, program execution fails with an out-of-memory error.
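Before turning to the FRM details below, the overall structure can be summarized by this high-level sketch in C; install_roots, restore_roots, and flip are assumed helpers with the obvious meanings, and scan_step (one step of the copy phase) is sketched later in this section:

#include <stdint.h>

extern uint32_t nscan, nfree;     /* copy-phase state, described below */
extern void install_roots(void);  /* copy R[0..nmax] to the bottom of to-space */
extern void restore_roots(void);  /* install updated root set in the registers */
extern void flip(void);           /* to-space becomes the active semispace */
extern void scan_step(void);      /* process the word at nscan (see below) */

void gc(void) {
    install_roots();              /* roots form the initial unscanned queue */
    while (nscan < nfree)         /* copy phase: run until the queue empties */
        scan_step();
    restore_roots();
    flip();
}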
A Stop-and-copy GC for FRM

Implementing the allocation strategy above requires only a free pointer nfree, which points to the first free word in the active semispace. If the lower semispace is active first, nfree is initially 0, and the semispace is completely full when nfree = (nsize ÷ 2). If the addresses of the active semispace are in the range [nlo .. nhi], then the free pointer partitions the active semispace into two parts: allocated blocks stored in the address range [nlo .. (nfree − 1)] and free memory available for future allocation in the address range [nfree .. nhi]. A request to allocate an n-slot block is handled as follows (a sketch in C appears after these steps):

1. Calculate nfree′ = nfree + n + 1 (the 1 accounts for the header word).

2. If there is enough room to allocate the block (i.e., if nfree′ ≤ nhi + 1):

(a) store a header word for size n in slot M[nfree];
(b) save the value of nfree as nresult;
(c) update the free pointer nfree to nfree′; and
(d) indicate that allocation has succeeded and that nresult is the address of the newly allocated block.

If there is not enough room to perform the allocation, then do a garbage collection (see below) and attempt the allocation again. If it fails a second time, then fail with an out-of-memory error.
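In C, the allocation steps might look like this (a sketch with assumed globals M, nhi, and nfree, and the gc entry point from the earlier sketch):

#include <stdint.h>
#include <stdlib.h>

extern uint32_t M[];       /* main memory, indexed by word address */
extern uint32_t nhi;       /* last address of the active semispace */
extern uint32_t nfree;     /* first free word of the active semispace */
extern void gc(void);

/* Allocate an n-slot block; returns the word address of its header. */
uint32_t alloc(uint32_t n) {
    for (int attempt = 0; attempt < 2; attempt++) {
        uint32_t nfree2 = nfree + n + 1;        /* step 1: +1 for the header */
        if (nfree2 <= nhi + 1) {                /* step 2: enough room? */
            M[nfree] = (n << 2) | 0x3;          /* (a) size-n header, 11 tag */
            uint32_t nresult = nfree;           /* (b) remember block address */
            nfree = nfree2;                     /* (c) advance the free pointer */
            return nresult;                     /* (d) success */
        }
        if (attempt == 0) gc();                 /* collect, then retry once */
    }
    exit(1);                                    /* out-of-memory error */
}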
Figure 18.4 Depictions of initial, intermediate, and final states of the copy-phase iteration in the stop-and-copy garbage collection algorithm. Initially, to-space holds only the root set R[0] ... R[nmax], with nscan at its first slot and nfree just above it. In an intermediate state, the scanned region lies below nscan, the unscanned region lies between nscan and nfree, and the rest of to-space is unallocated. In the final state, nscan = nfree.
In FRM, the copy-phase algorithm is an iteration in three state variables: (1) a scan pointer nscan that keeps track of the blocks that need to be copied from from-space to to-space; (2) a free pointer nfree used to allocate storage in to-space for blocks being copied from from-space; and (3) the main memory M whose heap component is partitioned into from-space and to-space. Figure 18.4 shows initial, intermediate, and final states of the copy phase.

The copy phase begins by installing the root set (the contents of all the registers)9 into the first nreg = nmax + 1 slots of to-space, setting nscan to point to the first slot of the root set, and setting nfree to point to the first slot after the root set.

9 For simplicity, the algorithm includes all registers in the root set. A more precise tracing-based approximation to block liveness would be achieved by including only the live registers — i.e., those registers actually mentioned in the current expression of the FRM state. See Exercise 18.3.

If to-space spans the memory addresses [nlo .. nhi], then every step of the copy-phase iteration maintains the following invariant:

nlo ≤ nscan ≤ nfree ≤ nhi + 1    (18.1)
Indeed, the nscan and nfree pointers partition to-space into three regions:

• The bottom region of to-space, occupying the address range [nlo .. (nscan − 1)], is the scanned region, which contains words that have been successfully processed by the copy phase, so that pointers in the scanned region point to blocks in to-space.

• The middle region of to-space, occupying the address range [nscan .. (nfree − 1)], is the unscanned region, which contains words still to be processed by the copy phase. This region effectively serves as a first-in first-out queue of words to be processed by the copy phase; when a word at the bottom of this region is processed, new words may be added to the top of this region.
• The top region of to-space, occupying the address range [nfree .. nhi], is the unallocated region into which blocks from from-space will be copied.

Two additional invariants hold at each step of the copy-phase iteration:

Pointers in the scanned region point to blocks in to-space    (18.2)

Pointers in the unscanned region point to blocks in from-space    (18.3)
The copy-phase invariants hold in the initial state of the copy-phase iteration: invariant (18.1) clearly holds; the scanned region is initially empty, so invariant (18.2) trivially holds; and the unscanned region initially contains the root set, whose pointers all point to from-space, so invariant (18.3) holds.

Each step of the copy phase is described by one of the pictorial rewrite rules in Figure 18.5. Each rule processes the word at nscan, increments nscan to yield nscan′, and updates nfree to yield nfree′. The rules are distinguished by the type of the first element (the element at nscan) in the queue of unscanned words. If this word is a nonpointer — i.e., it is an immediate descriptor or a header word — then the iteration simply skips it and moves on to the next word in the queue. If the descriptor is a pointer word, by invariant (18.3) it must specify an address nfrom of a from-space block. The first time the copy phase visits the block, it copies the contents of the block (including its header word) into to-space starting at address nfree and updates the free pointer accordingly; since all pointers in the block refer to from-space, invariant (18.3) is preserved for the next iteration. It also changes the descriptor at nscan to point to the new to-space address (nfree) rather than the old from-space address (nfrom), thus preserving invariant (18.2) for the next iteration. Finally, it replaces the header word of the original from-space block with its forwarding address, the new to-space address (nfree) of the block, to indicate that the block has already been moved to to-space. If the copy phase encounters a from-space block with a forwarding address nfwd (distinguishable from a block header by having a tag of 01 instead of 11), it means that the block has already been copied to to-space, and it is only necessary to convert the block pointer to its forwarding address (in order to preserve invariant (18.2)). (A sketch of one such step in C appears below.)

The copy phase eventually processes all block pointers that can be reached from the root set, thus performing a memory trace that approximates liveness by reachability from the root set. Because new blocks are copied to the end of the queue in the unscanned region, blocks are traversed in a breadth-first manner. The copy phase ends when the unscanned region queue becomes empty — i.e., when the scan pointer catches up to the free pointer. At this point, all objects reachable from the root set have been copied from from-space to to-space, and invariant (18.2) guarantees that all pointer descriptors in to-space now point into to-space.
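In C, one step of this copy phase might be sketched as follows, using the 01/11 tag conventions of this chapter (M, nscan, and nfree are the assumed state variables from the earlier sketches):

#include <stdint.h>

extern uint32_t M[];
extern uint32_t nscan, nfree;

static int is_ptr(uint32_t w)       { return (w & 0x3) == 0x1; } /* 01 tag */
static uint32_t addr_of(uint32_t w) { return w >> 2; }

void scan_step(void) {
    uint32_t w = M[nscan];
    if (is_ptr(w)) {
        uint32_t nfrom = addr_of(w);
        if ((M[nfrom] & 0x3) == 0x3) {           /* 11 tag: still a header */
            uint32_t nslots = M[nfrom] >> 2;
            for (uint32_t i = 0; i <= nslots; i++)
                M[nfree + i] = M[nfrom + i];     /* copy header and slots */
            M[nscan] = (nfree << 2) | 0x1;       /* repoint into to-space */
            M[nfrom] = (nfree << 2) | 0x1;       /* leave forwarding address */
            nfree += nslots + 1;
        } else {
            M[nscan] = M[nfrom];                 /* 01 tag: already forwarded */
        }
    }
    nscan++;                                     /* nonpointers are skipped */
}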
Figure 18.5 A pictorial description of the stop-and-copy garbage collection algorithm, as three rewrite rules on the word at nscan. Process a Nonpointer: skip the word (nscan′ = nscan + 1, nfree′ = nfree). Process a Pointer to a Not-Yet-Copied Block: copy the header [nslots] 11 and slots W1 ... Wnslots from from-space address nfrom to to-space address nfree, replace the from-space header with the forwarding address [nfree] 01, and change the descriptor at nscan from [nfrom] 01 to [nfree] 01 (nscan′ = nscan + 1, nfree′ = nfree + nslots + 1). Process a Pointer to an Already Copied Block: replace the descriptor [nfrom] 01 at nscan with the forwarding address [nfwd] 01 stored at nfrom (nscan′ = nscan + 1, nfree′ = nfree).
At the termination of the copy phase, the updated register contents in the first nreg slots of to-space are copied back into the registers, yielding the new register memory R′. Additionally, a semispace flip is performed by making to-space the new active semispace. Subsequent allocations then take place starting at nfree in this new active semispace.

The stop-and-copy algorithm has several nice properties. Unlike reference counting, it can collect cyclic garbage. Stop-and-copy GC compacts live blocks at the bottom of to-space; this avoids memory fragmentation and simplifies block allocation. The time to perform a stop-and-copy GC is proportional to the total size of reachable blocks, so if most of from-space is garbage, very little work is needed to perform a stop-and-copy GC.

However, stop-and-copy has some serious drawbacks as well. Reserving half of heap memory for the inactive semispace wastes a large chunk of potential storage space. The breadth-first nature of the memory trace performed by stop-and-copy does not preserve the locality of blocks, which can seriously degrade memory performance. The block movement of the copy phase causes significantly more memory traffic than in-place approaches like reference counting and the mark-sweep strategy discussed below.

Exercise 18.3 The stop-and-copy GC algorithm presented above has a root set that includes the contents of all registers. However, if a dead register (one that is not in the free variables of the currently executing expression) contains a pointer to a block, then this block will be treated as live by any tracing-based GC algorithm, even though it may be provably dead. Because of this, GC may not collect as much garbage as it could. Fix this problem by making a simple change to the GC algorithm that prevents it from following pointers stored in dead registers.

Exercise 18.4 In a system that requires only GC information (not types), Ben Bitdiddle thinks that encoding the size of a block in a header word wastes too much space. He observes that it is possible to dispense with all block header words if an additional bit (which he calls the header bit) is reserved in every descriptor to indicate whether it is the first word of a block in the heap. So two tag bits are necessary in every descriptor: the header bit and the GC tag. Here is one possible tag encoding:

00  immediate descriptor that is not the first word in a heap block
10  immediate descriptor that is the first word in a heap block
01  nonimmediate descriptor that is not the first word in a heap block
11  nonimmediate descriptor that is the first word in a heap block
The header bit should be 1 only for descriptors stored in the first word of a block in the heap. Any other descriptor (including those stored in registers) should have a header bit of 0.
a. Modify the stop-and-copy GC algorithm to work with Ben’s representation. b. What are the advantages and drawbacks of Ben’s approach? Exercise 18.5 Suppose that the first 20 words of main memory M in an executing FRM program have the contents shown below. (Assume the FRM GC-tag-only descriptor and size-only block representations presented in Sections 18.2.2 and 18.2.3 are being used. The number to the left of each slot is its address. In each slot, a bracketed decimal integer followed by tag bits stands for the binary representation of the integer concatenated with the tag bits.) 0 1 2 3 4
[5] 01 [5] 0 [2] 11 [7] 01 [2] 01
5 6 7 8 9
[1] 11 [10] 01 [2] 11 [2] 0 [2] 01
10 11 12 13 14
[3] 11 [3] 0 [5] 01 [14] 01 [2] 11
15 16 17 18 19
[2] 01 [7] 01 [2] 11 [10] 01 [5] 01
Suppose that the program uses only the first two registers (i.e., nreg = 2), where R[0] = [17] 01 and R[1] = [14] 01, and the program does not spill any registers. Finally, suppose that the currently executing FILreg expression has the form

(let* ((r0 0)             {Set register 0 to 0}
       (r0 (@mnew NT)))   {Set register 0 to the address of a new block with NT slots}
  Erest)
where FrIds[[Erest]] = {r0, r1} (i.e., it refers to both registers).

a. Draw a box-and-pointer diagram depicting the two registers and all the heap blocks in M. You should draw a register as a one-slot box. You should draw a heap block with n content slots as an n-slot box. A slot containing an immediate value should show that value. A slot containing an address should be the source of an arrow that points at the box representing the heap block at that address.

b. Based on your diagram in part a, indicate which heap blocks are live and which are dead when the mnew primitive is executed.

c. Assume that heap memory has 40 slots (so that the first 20 fill one semispace). Show the contents of heap memory after performing the stop-and-copy GC algorithm initiated when the mnew primitive is executed. What is the largest value of NT for which the program will not encounter an out-of-memory error?

Exercise 18.6 Ben Bitdiddle has been hired by the Analog Equipment Corporation to consult on a memory management problem. Analog uses Balsa, a programming language in which heap storage is explicitly managed by programmers using the following two expression constructs:
(malloc E): If the value of E is a positive integer n, returns a pointer to a block of storage that is n + 1 words long. The first word of the returned block is a size header; the other n words are uninitialized. An out-of-memory error is generated if there is insufficient storage to allocate a block of the requested size. An error is generated if the value of E is not a positive integer.

(free E): If the value of E is a pointer to a block of storage, deallocates the storage of that block (allowing it to be reused by malloc) and returns unit. Otherwise, an error is generated.

Analog is having problems with a very large Balsa application (dubbed The Titanic by the development staff) that eventually either mysteriously crashes or runs out of heap space. Ben suspects that the programmers who wrote the application are not properly deallocating storage.

In order to debug Analog's problem, Ben decides to implement a standard stop-and-copy garbage collector for Balsa. He modifies malloc and free to keep track of the total amount of “busy” storage — malloc increments a global *busy* counter with the number of words in the block it creates and free decrements the *busy* counter by the number of words in the block it frees. In Ben's system, free just decrements the *busy* counter and does not actually free any storage. Instead, when storage is exhausted, the garbage collector runs and copies live storage from the old active semispace into the new one.

a. Let live be the number of words copied during a garbage collection and busy be the value of the *busy* counter at the time of the garbage collection. In each of the following situations encountered while executing a Balsa program in Ben's system with garbage collection, describe the implications for executing the same program in the original system without garbage collection:

i. live < busy
ii. live > busy
iii. live = busy
b. How can Ben modify his garbage collector to detect dangling pointers?

c. Ben tests his garbage collector on another very large AEC program called The Brittanic.10 The program uses malloc and free for explicit memory management and works fine with one megabyte of available memory. Ben installs enough extra memory to support two semispaces, each of which has one megabyte of storage beyond the space needed for the garbage collector itself. Ben turns on the garbage collector, and, full of hope, runs the program. To his surprise, The Brittanic encounters an out-of-memory error.

i. How can you explain this behavior?
ii. How can Ben fix the problem?

10 The Brittanic (1914–1916) was an identical sister ship of the Titanic.
18.5 Garbage Collection Variants
Stop-and-copy is just one of many approaches to garbage collection. Here we review some other approaches.
18.5.1 Mark-sweep GC
Another popular tracing-based GC algorithm is mark-sweep, an approach to GC that takes place in two phases. First, the mark phase traverses memory from the root set, marking each reachable block along the way (e.g., by setting a mark bit associated with each block it visits). Then the sweep phase linearly scans through memory and collects all unmarked blocks into a free list. The mark-sweep collector is invoked whenever an allocation request is made and the free list does not have a big enough block. (A sketch of the two phases appears below.)

Mark-sweep has several benefits compared to stop-and-copy. Unlike stop-and-copy, which uses only half of heap memory for allocating blocks, mark-sweep can allocate blocks in all of heap memory. Like reference counting, mark-sweep is an in-place algorithm that does not move blocks, and so it can be used in situations (such as conservative GC, discussed later) where blocks cannot be moved. In-placeness also helps to preserve block locality and reduce memory traffic during GC. But in-placeness has a big downside as well — it implies using a free list for allocation, which leads to memory fragmentation.

There are other drawbacks to mark-sweep. There is space overhead for the mark bits and time overhead for manipulating them. There is also space overhead for controlling the mark-phase traversal, although this can be eliminated using the clever pointer-reversal technique described in [SW67]. Finally, the sweep phase takes time proportional to the size of the heap rather than to the size of live memory. In contrast, stop-and-copy GC takes time proportional to the size of live memory.
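Here is a sketch in C of the two phases over this chapter's block layout; the mark-bit position, the free-list helper, and the pointer test are assumptions for illustration. Note that the recursive mark uses control stack space, which the pointer-reversal technique mentioned above eliminates:

#include <stdint.h>

extern uint32_t M[];
#define MARK_BIT 0x80000000u        /* assumed: high header bit is the mark */

static int is_ptr(uint32_t w)       { return (w & 0x3) == 0x1; }
static uint32_t addr_of(uint32_t w) { return w >> 2; }
extern void free_list_add(uint32_t naddr, uint32_t nslots);

void mark(uint32_t naddr) {          /* recursive traversal from one block */
    if (M[naddr] & MARK_BIT) return; /* already visited */
    M[naddr] |= MARK_BIT;
    uint32_t nslots = (M[naddr] & ~MARK_BIT) >> 2;
    for (uint32_t i = 1; i <= nslots; i++)
        if (is_ptr(M[naddr + i]))
            mark(addr_of(M[naddr + i]));
}

void sweep(uint32_t heap_lo, uint32_t heap_hi) {
    for (uint32_t a = heap_lo; a <= heap_hi; ) { /* headers let us parse blocks */
        uint32_t nslots = (M[a] & ~MARK_BIT) >> 2;
        if (M[a] & MARK_BIT) M[a] &= ~MARK_BIT;  /* live: clear for next GC */
        else free_list_add(a, nslots);           /* unmarked: reclaim */
        a += nslots + 1;
    }
}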
18.5.2 Tag-free GC
A GC algorithm is said to be tag-free if descriptors do not require GC tags. For a statically typed language, it is possible to implement a tag-free GC that can also eliminate header words for blocks whose sizes are statically known. The basic idea is to design the implementation so that the garbage collector is provided with (or can find) run-time type information for every word in the root set. This type information can be used as a “map” that guides GC by distinguishing pointers from nonpointers and indicating the sizes of blocks whose size is statically known.
(Size-bearing header words are still necessary for blocks whose size is known only dynamically, such as arrays.) For example, a descriptor with type (prodof int bool (listof int)) is a pointer to a block with three slots, the first two of which are nonpointers, but the third of which (if nonnull) is a pointer to a two-slot block with one nonpointer and one pointer. Because it has compact descriptions for complex specifications (e.g., (listof int) describes the layout of integer lists of any length), such a type “map” generally requires far less storage than that needed to explicitly annotate every word and block with GC information. But tag-free GC complicates the compiler (which must supply type information to the run-time system), the run-time system (which must preserve the type information), and the GC algorithm (which must find and follow the type “map”; a sketch of such a map-guided traversal appears below).
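As a sketch of how a type “map” can drive tracing, here is one possible representation in C. It is invented for illustration and merely mirrors the (prodof int bool (listof int)) example, with an assumed GC callback visit_block:

#include <stdint.h>

extern uint32_t M[];
extern void visit_block(uint32_t naddr);  /* assumed GC callback */

typedef enum { T_INT, T_BOOL, T_LISTOF, T_PRODOF } tykind;
typedef struct ty { tykind kind; int n; struct ty **elt; } ty;

/* Visit every block reachable from descriptor w, whose layout is
   described by the type map t. */
void trace(uint32_t w, ty *t) {
    switch (t->kind) {
    case T_INT: case T_BOOL:          /* nonpointers: nothing to follow */
        return;
    case T_LISTOF:                    /* null, or a two-slot list node */
        if (w == 0) return;
        visit_block(w);
        trace(M[w], t->elt[0]);       /* head, described by the element type */
        trace(M[w + 1], t);           /* tail is a list of the same type */
        return;
    case T_PRODOF:                    /* pointer to a block of n slots */
        visit_block(w);
        for (int i = 0; i < t->n; i++)
            trace(M[w + i], t->elt[i]);
        return;
    }
}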
18.5.3 Conservative GC
In a tracing-based GC, it is never sound to treat a pointer as a nonpointer, since this could result in classifying a live block as dead. However, it is sound to treat some nonpointers as pointers. For example, in a system where words do not carry GC tags, if an integer in a register happens to have the same bit pattern as a pointer to a heap block, it's OK to consider that block reachable; the only downside is that this block may now cause a memory leak. But if the probability of misinterpreting an integer as a block pointer is low, then this sort of memory leak may not be any worse than leaks due to other liveness approximations.

This is the key idea behind a tag-free approach known as conservative GC [BW88], which can be used for garbage collection in implementations of languages (e.g., C/C++ and Pascal) that cannot tolerate GC tags in their word representations. There are clever techniques for efficiently determining whether an arbitrary word is a possible heap-block address and, if so, for determining the size of the block. Conservative GC must use an in-place algorithm (like mark-sweep) to collect dead blocks because there is no reliable way to distinguish integers from pointers when performing the pointer rewiring required by copying techniques. Empirically, conservative GC appears to work well in many situations, and it is the only GC technique available for many languages.
18.5.4 Other Variations
There are many other variations on the garbage collection approaches that we have discussed. Based on the observation that most blocks in some systems are short-lived, so-called generational collectors partition the heap into regions based on block lifetimes; recently allocated blocks are put in regions where
collection is performed frequently, and older blocks migrate to regions where collection is performed less frequently. There are numerous incremental versions of many approaches that reduce the length of GC pauses or bound them so that GC can be used in real-time systems. There are also many concurrent GC algorithms that can run in a thread separate from the one(s) used for program execution; the challenging problem these algorithms must address is how to handle the fact that some threads are changing the graph of blocks while GC is being performed in a separate thread. Even more challenging is performing garbage collection in distributed environments, where memory is distributed over processing nodes connected by a communication network with an arbitrary topology.

In practice, choosing a garbage collector depends critically on numerous details, such as the typical lifetimes of blocks of particular sizes, the tolerance for memory leaks, the frequency of cyclic data, the acceptability of GC pauses, the necessity of keeping blocks in place, the importance of memory locality, the cost of memory traffic, and various other issues involving time and space resources. Finding a good heap manager can require implementing several solutions, comparing them empirically, and fine-tuning the one that performs best in typical situations. Sometimes it is a good idea to combine several of the strategies we have discussed. For example, a system might require certain blocks to be deallocated manually and use reference counts to automatically deallocate other blocks, relying on a periodic stop-and-copy GC to compact and collect cyclic garbage from the reference-counted storage.

Exercise 18.7 This exercise continues the scenario started in Exercise 18.5 on page 1139.

a. Assume that heap memory has just 20 slots, containing the values shown in Exercise 18.5. Show the contents of heap memory after performing the mark-sweep GC algorithm initiated when the mnew primitive is executed. Assume that one bit of each header word is reserved for the mark bit and that reclaimed blocks are stored in a free list that is used for allocation. Assume that the free list is stored in a special register Rfree initially containing 0 (denoting the empty list) and that a block of size nslots is added to the free list by first setting the first content slot of the block to contain the content of Rfree and then setting Rfree to the address of the block. What is the largest value of NT for which the program will not encounter an out-of-memory error?

b. Again assume that heap memory has just 20 slots and that memory is allocated from a free list as described in part a (where Rfree is initially the empty list). Assume that a reference-counting garbage collector is used, where 3 bits of each header word are reserved for a reference count. Show the contents of heap memory and Rfree (1) after performing the instruction that sets R0 to 0 and (2) after performing the mnew primitive. What is the largest value of NT for which the program will not encounter an out-of-memory error?
Exercise 18.8 Consider the following FLARE/V program P:

(flare (n)
  (recur loop ((p (pair 0 0)))
    (let ((s (snd p)))
      (if (= s n)
          (fst p)
          (loop (pair s (+ s 1)))))))
a. Explain what this program does.

b. Suppose that P is compiled by Tortoise and executed in a version of FRM using a simple heap manager that allocates blocks but never deallocates them. On the ith iteration of the loop, how many pair blocks are live and how many are dead?

c. Suppose that we extend FLARE/V and FIL with a manual deallocation construct (free E). If E denotes a compound value, then free indicates that the heap block representing this value may be reclaimed; otherwise free generates an error. Modify P to a program P′ that uses free in such a way that it will not abort with an out-of-memory error for any natural number input when executed using a heap of reasonable size.

d. Remark on the suitability of each of the following approaches to garbage collection for executing P: (1) stop-and-copy; (2) mark-sweep; and (3) reference counting.

e. Suppose that P is translated to C, where pair is replaced by a call to malloc, C's memory allocator, and loop is replaced by a while loop. What, if any, garbage collection techniques can prevent the program from running out of memory?
18.6 Static Approaches to Automatic Deallocation
We have studied dynamic approaches to automatic deallocation, but there are static approaches as well. In a language implementation with a run-time stack of frames that store information (e.g., arguments, local variables, return addresses) associated with procedure invocations, popping a frame on procedure exit can reclaim a large chunk of storage with low cost (resetting the stack pointer). Languages like C/C++, Pascal, and Ada permit (indeed, encourage) the programmer to allocate data blocks on the stack by declaring compound values that are local to a procedure; these are implicitly deallocated when the procedure returns. Pascal and Ada do not allow creating pointers to stack-allocated data that can outlive the stack frame for the procedure invocation in which they were allocated, so stack deallocation cannot give rise to dangling pointers in these languages. In contrast, C and C++ do allow pointers to stack-allocated data to outlive the stack frame
in which they were allocated, providing yet another way to generate dangling pointers in these languages.

An alternative approach is to rely on a system that statically determines (e.g., using the lifetime analysis in Section 16.3.5) which blocks can safely be allocated on the stack. For example, in the Tortoise compiler, such a system would be able to determine automatically that all closures for continuation procedures introduced by the CPS stage can be stack-allocated — an expected result, since they correspond to traditional stack frames.11 The region-based approach to memory management sketched in Section 16.3.5 generalizes this idea by statically allocating each program value within a particular region of a stack of abstract regions associated with an automatically placed let-region construct. This allows an entire region to be deallocated when the let-region that introduced it is exited.

11 This is true only because FLARE/V does not include constructs that capture control points, such as label/jump or cwcc. Also, note that stack allocation of continuation closures can be performed without sophisticated analysis. For example, Rozas's Liar compiler for Scheme [Roz84] achieved this result by performing a pre-CPS closure-conversion pass that allocated closures on the heap and a post-CPS closure conversion pass that allocated closures on the stack.
Notes

A wealth of information about modern garbage collection techniques can be found in the surveys [Wil92] and [Jon99]. Earlier work is surveyed in [Coh81]. Mark-sweep collection was invented by McCarthy in the first version of Lisp [McC60]. The idea of copying garbage collection originated with Minsky [Min63], who wrote live data to disk and read it back into main memory. Fenichel and Yochelson developed a two-semispace algorithm for list memory in which live list nodes were scanned recursively [FY69]. The recursion implies extra storage for a recursion stack, but this can be eliminated by the pointer-reversal technique described in [SW67]. The iterative scanning algorithm we describe, which uses constant control space for scanning, is due to Cheney [Che70]. There are incremental versions of this algorithm that can limit the duration of a GC pause (e.g., [HGB78, AEL88, NOPH92, NO93, NOG93]).

[App89] sketches how static typing can eliminate the need for almost all tag bits in a garbage-collected language. Many of the details of tag-free GC were worked out in [Gol91]. Determining the types of tagless objects in a statically typed language with polymorphism is a problem. One solution is to dynamically reconstruct the types at run time [AFH94]. Another is to modify the compiler and
run-time system to explicitly pass the types onto which polymorphic functions are projected [Tol94]. Conservative GC [BW88] is a variant of tag-free GC suitable for languages whose data representations cannot contain GC tag bits.

An operational framework for reasoning formally about memory management is presented in [MFH95]. Interestingly, type reconstruction in this system can be used to identify values that, though reachable, will never actually be referenced, and so can be reclaimed as garbage. So-called linear types, which track the number of times a value is used, can be used in such a framework to eagerly reclaim values after their last use [IK00].

Many techniques have been developed to reduce dangling pointer errors in languages with manual heap deallocation. One approach is to insert additional run-time checks before memory operations to guarantee memory safety [NMW02, JMG+02]. Another approach is to prove statically that no dangling pointers can be encountered in a running program. Although this is undecidable in general, it can be done for certain kinds of programs with a sufficiently sophisticated analysis (e.g., [DKAL03, ZX05, Zhu06]).

Heap management is only one of many services provided by the run-time system for a programming language implementation. For a discussion of a more full-featured run-time system, see [App90], which provides an overview of data layout and run-time services (including garbage collection, module loading, input/output, foreign function calls, and execution profiling) for an ML implementation.
Appendix A

A Metalanguage

Man acts as though he were the shaper and master of language, while in fact language remains the master of man.
— Martin Heidegger, “Building Dwelling Thinking,” Poetry, Language, Thought (1971)

This book explores many aspects of programming languages, including their form and their meaning. But we need some language in which to carry out these discussions. A language used for describing other languages is called a metalanguage. This appendix introduces the metalanguage used in the body of the text.

The most obvious choice for a metalanguage is a natural language, such as English, that we use in our everyday lives. When it comes to talking about programming languages, natural language is certainly useful for describing features, explaining concepts at a high level, expressing intuitions, and conveying the big picture. But natural language is too bulky and imprecise to adequately treat the details and subtleties that characterize programming languages. For these we require the precision and conciseness of a mathematical language.

We present our metalanguage as follows. We begin by reviewing the basic mathematics upon which the metalanguage is founded. Next, we explore two concepts at the core of the metalanguage: functions and domains. We conclude with a summary of the metalanguage notation.
A.1 The Basics
The metalanguage we will use is based on set theory. Since set theory serves as the foundation for much of popular mathematics, you are probably already familiar with many of the basics described in this section. However, since some of our notation is nonstandard, we recommend that you at least skim this section in order to familiarize yourself with our conventions.
A.1.1 Sets
A set is an unordered collection of elements. Sets with a finite number of elements are written by enclosing the written representations of the elements within braces and separating them by commas. So {2, 3, 5} denotes the set of the first three primes. Order and duplication don't matter within set notation, so {3, 5, 2} and {3, 2, 5, 5, 2, 2} also denote the set of the first three primes. A set containing one element, such as {19}, is called a singleton. The set containing no elements is called the empty set and is written {}.

We will assume the existence of certain sets:

Unit = {unit}                            ; standard singleton set
Bool = {true, false}                     ; truth values
Int = {. . . , −2, −1, 0, 1, 2, . . .}   ; integers
Pos = {1, 2, 3, . . .}                   ; positive integers
Neg = {−1, −2, −3, . . .}                ; negative integers
Nat = {0, 1, 2, . . .}                   ; natural numbers
Rat = {0, 1, −1, 1/2, −1/2, 2, −2, 1/3, −1/3, 2/3, −2/3, 3, −3, 3/2, −3/2, . . .}   ; rationals
Char = {‘a’, ‘b’, . . . , ‘A’, ‘B’, . . . , ‘1’, ‘2’, . . . , ‘.’, ‘,’, . . .}      ; text characters
String = {“”, “a”, “b”, . . . , “foo”, . . . , “a string”, . . .}                   ; all character strings
(The text in slanted font following the semicolon is just a comment and is not a part of the definition. This is one of two commenting styles used in this book. In the other commenting style, the comments are written in slanted font and are delimited by braces. However, the braces would be confusing in the presence of set notation, so we use the semicolon style in some cases.)

The Unit set is the canonical singleton set; its single element is named unit. Bool is the set of the boolean truth values true and false. Int, Pos, Neg, Nat, and Rat (which contains all ratios of integers) are standard sets of numbers. String is the set of all character strings. Unit and Bool are finite sets, but the other examples are infinite. Since it is impossible to write down all elements of an infinite set, we use ellipses (“. . .”) to stand for the missing elements in standard sets where it is clear what the remaining elements are. We consider the unit value, truth values, numbers, and characters to be primitive elements that cannot be broken down into subparts. Character strings are not primitive because they can be decomposed into their component characters.

Sets can contain any structure, including other sets. For example, the set {Int, Nat, {2, 3, {4, 5}, 6}} contains three elements: the set of integers, the set of natural numbers, and a set of four elements (one of which is itself a set of two numbers). Here the names Int and Nat are used as synonyms for the set structures they denote.
Set membership is specified by the symbol ∈ (pronounced “element of” or “in”). The notation e ∈ S asserts that e is an element of the set S, while e ∉ S asserts that e is not an element of S. (In general, a slash through a symbol indicates the negation of the property denoted by that symbol.) For example,

0 ∈ Nat
0 ∉ Neg
Int ∈ {Int, Nat, {2, 3, {4, 5}, 6}}
Neg ∉ {Int, Nat, {2, 3, {4, 5}, 6}}
2 ∉ {Int, Nat, {2, 3, {4, 5}, 6}}
In the last example, 2 is not an element of the given set even though it is an element of one of that set's elements.

A set A is a subset of a set B (written A ⊆ B) if every element of A is also an element of B. Every set is a subset of itself, and the empty set is trivially a subset of every set. E.g.,

{} ⊆ {1, 2, 3} ⊆ Pos ⊆ Nat ⊆ Int ⊆ Rat
Nat ⊆ Nat
Nat ⊈ Pos

Two sets A and B are equal (written A = B) if they contain the same elements, i.e., if every element of one is an element of the other. Note that A = B if and only if A ⊆ B and B ⊆ A. A is said to be a proper subset of B (written A ⊂ B) if A ⊆ B and A ≠ B.

Sets are often specified by describing a defining property of their elements. The set builder notation {x | Px} (pronounced “the set of all x such that Px”) designates the set of all elements x such that the property Px is true of x. For example, Nat could be defined as {n | n ∈ Int and n ≥ 0}. The sets described by set builder notation are not always well defined. For example, {s | s ∉ s} (the set of all sets that are not elements of themselves) is a famous nonsensical description known as Russell's paradox.

We will use [lo..hi] (pronounced “the integers between lo and hi, inclusive”) as an abbreviation for {n | n ∈ Int and lo ≤ n ≤ hi}; if lo > hi, then [lo..hi] denotes the empty set.

Some common binary operations on sets are defined below using set builder notation:

A ∪ B = {x | x ∈ A or x ∈ B}       ; union
A ∩ B = {x | x ∈ A and x ∈ B}      ; intersection
A − B = {x | x ∈ A and x ∉ B}      ; difference
The notions of union and intersection can be extended to (potentially infinite) collections of sets. If A is a set of sets, then ⋃A denotes the union of all of the component sets of A. That is,

⋃A = {x | there exists an a ∈ A such that x ∈ a}

If Ai is a family of sets indexed by elements i of some given index set I, then

⋃_{i∈I} Ai = ⋃ {Ai | i ∈ I}

denotes the union of all the sets Ai as i ranges over I. Intersections of collections of sets are defined in a similar fashion.

Two sets B and C are said to be disjoint if and only if B ∩ C = {}. A set of sets A = {Ai | i ∈ I} is said to be pairwise disjoint if and only if Ai and Aj are disjoint for any distinct i and j in I. A is said to partition (or be a partition of) a set S if and only if S = ⋃_{i∈I} Ai and A is pairwise disjoint.

The cardinality of a set A (written |A|) is the number of elements in A. The cardinality of an infinite set is said to be infinite. Thus |Int| is infinite, but |{Int, Nat, {2, 3, {4, 5}, 6}}| = 3. Still, there are distinctions between infinities. Informally, two sets are said to be in a one-to-one correspondence if it is possible to pair every element of one set with a unique and distinct element in the other set without having any elements left over. Any set that is either finite or in a one-to-one correspondence with Nat is said to be countable. For instance, the set Int is countable because every nonnegative element n in Int can be paired with 2n in Nat and every negative element n in Int can be paired with 1 − 2 · (n + 1). Clearly Unit, Bool, Pos, Neg, and Char are also countable. It can be shown that Rat and String are countable as well. Informally, all countably infinite sets “have the same size.” On the other hand, any infinite set that is not in a one-to-one correspondence with Int is said to be uncountable. Cantor's celebrated diagonalization proof shows that the real numbers are uncountable.1

The powerset of a set A (written P(A)) is the set of all subsets of A. For example,

P({1, 2, 3}) = {{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}

The cardinality of the powerset of a finite set is given by: |P(A)| = 2^|A|

1 A description of Cantor's method can be found in many books on mathematical analysis and computability. We particularly recommend [Hof80].
In the above example, the powerset has size 2^3 = 8. The set of all subsets of the integers, P(Int), is an uncountable set.
A.1.2 Boolean Operators and Predicates
In our metalanguage, we will often employ standard operators to manipulate expressions that denote the boolean truth values, true and false. Suppose that p, q, and r are any expressions that stand for boolean truth values. Then:

• ¬p, the logical negation of p, is false if p is true and is true if p is false. The notation ¬p is pronounced “not p.” Note that ¬(¬p) = p.

• p ∧ q, the logical conjunction of p and q, is true only if both p and q are true; otherwise it is false. The notation p ∧ q is pronounced “p and q.” It is commutative (p ∧ q = q ∧ p) and associative ((p ∧ q) ∧ r = p ∧ (q ∧ r)).

• p ∨ q, the logical disjunction of p and q, is false only if both p and q are false; otherwise it is true. The notation p ∨ q is pronounced “p or q.” It is commutative and associative.

• The logical implication statements “p implies q,”2 “if p then q,” and “p only if q” are synonymous, and are true only when p is false or q is true; otherwise, they are false. So these statements are equivalent to (¬p) ∨ q. When p is false, these statements are said to be vacuously true.

• The contrapositive of “p implies q” is “not q implies not p.” This is logically equivalent to “p implies q,” which we can see because (¬(¬q)) ∨ (¬p) can be simplified to (¬p) ∨ q.

• The statement “p if q” is equivalent to “if q then p” and thus to p ∨ (¬q).

• The statement “p if and only if q,” usually abbreviated “p iff q,” is true only if both “p implies q” and its converse, “q implies p,” are true; otherwise it is false. It is equivalent to ((¬p) ∨ q) ∧ (p ∨ (¬q)).

For our purposes, a predicate is a metalanguage expression, usually containing variables, that may denote either true or false when the variables are instantiated with values. Some examples are n ∈ Pos, A ⊆ B, and x > y. The first of these examples is a unary predicate, a predicate that mentions one variable (in this case, n).

2 “p implies q” is traditionally written as p → q or p ⇒ q. However, the arrows → and ⇒ are used for other purposes in this book. To avoid confusion, we will always express logical implication in English.
We have already seen predicates in set builder notation; the expression to the right of the | symbol is a predicate over the variables mentioned in the expression to the left. For example, the notation

{x | x ∈ Int and (x ≥ 0 and x ≤ 5)}

denotes the set of integers between 0 and 5, inclusive. In this case, the predicate after the | symbol is built out of three smaller predicates, and we could rewrite it using boolean operators as

(x ∈ Int) ∧ (x ≥ 0 ∧ x ≤ 5)
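For a runnable analogue (our own sketch, not from the text), Haskell's list comprehensions mirror set builder notation, with the predicate written after the comma; the candidate range [-10 .. 10] below is an arbitrary finite stand-in for Int:

    zeroToFive :: [Integer]
    zeroToFive = [x | x <- [-10 .. 10], x >= 0 && x <= 5]
    -- yields [0, 1, 2, 3, 4, 5]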
Suppose that S is a set and P(x) is a unary predicate over the variable x. Then the universal quantification statement ∀x∈S . P(x), pronounced “for all x in S, P(x),” is true iff P is true when x is instantiated to any member of S. If there is some element for which the predicate is false, the universal quantification statement is false. If S is empty, ∀x∈S . P(x) is true for any predicate P(x); in this case, the statement is said to be vacuously true. We use the notation ∀_{i=lo}^{hi} . P(i) as an abbreviation for ∀i∈[lo..hi] . P(i), where [lo..hi] is the set of all integers i such that lo ≤ i ≤ hi.

The existential quantification statement ∃x∈S . P(x), pronounced “there exists an x in S such that P(x),” is true iff there is at least one element x_witness in S such that P(x) is true when x is instantiated to x_witness. If there is no element for which the predicate is true, the existential quantification statement is false. The element x_witness (if it exists) is called a witness for the existential quantification because it provides the evidence that the statement is true.
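Over finite sets, both quantifiers are directly computable. A minimal Haskell sketch (ours, not from the text), with a set modeled as a list and the standard functions all and any doing the work:

    forAll, thereExists :: (a -> Bool) -> [a] -> Bool
    forAll      = all   -- True on the empty list: vacuously true
    thereExists = any   -- False on the empty list: no witness exists

For instance, forAll even [2, 4, 6] is True, thereExists (> 5) [1, 2, 3] is False, and forAll p [] is True for any predicate p, matching the vacuous-truth convention.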
A.1.3 Tuples
A tuple is an ordered collection of elements. A tuple of length n, called an n-tuple, can be envisioned as a structure with n slots arranged in a row, each of which is filled by an element. Tuples with a finite length are written by writing the slot values down in order, separated by commas, and enclosing the result in angle brackets. Thus ⟨2, 3, 5⟩ is a tuple of the first three primes. The number and order of elements in a tuple matter, so ⟨2, 3, 5⟩, ⟨3, 2, 5⟩, and ⟨3, 2, 5, 5, 2, 2⟩ denote three distinct tuples. Tuples of size 2 through 5 are called, respectively, pairs, triples, quadruples, and quintuples. The 0-tuple, ⟨⟩, and 1-tuples also exist.

The element of the ith slot of a tuple t can be obtained by projection, written t ↓ i. For example, if s is the triple ⟨2, 3, 5⟩, then s ↓ 1 = 2, s ↓ 2 = 3, and s ↓ 3 = 5. The notation t ↓ i is well formed only when t is an n-tuple and
1 ≤ i ≤ n. Two tuples s and t are equal if they have the same length n and s ↓ i = t ↓ i for all 1 ≤ i ≤ n.

As with sets, tuples may contain other tuples; e.g., ⟨⟨2, 3, 5, 7⟩, 11, ⟨13, 17⟩⟩ is a tuple of three elements: a quadruple, an integer, and a pair. Moreover, tuples may contain sets and sets may contain tuples. For instance, ⟨2, 3, 5, Int, {{2, 3, 5}, ⟨7, 11⟩}⟩ is a well-formed tuple.

If A and B are sets, then their Cartesian product (written A × B) is the set of all pairs whose first slot holds an element from A and whose second slot holds an element from B. This can be expressed using set builder notation as:

A × B = {⟨a, b⟩ | a ∈ A and b ∈ B}
For example:

{2, 3, 5} × {7, 11} = {⟨2, 7⟩, ⟨2, 11⟩, ⟨3, 7⟩, ⟨3, 11⟩, ⟨5, 7⟩, ⟨5, 11⟩}
Nat × Bool = {⟨0, false⟩, ⟨1, false⟩, ⟨2, false⟩, . . . , ⟨0, true⟩, ⟨1, true⟩, ⟨2, true⟩, . . .}
If A and B are finite, then |A × B| = |A| · |B|.

The product notion extends to families of sets. If A1, . . ., An is a family of sets, then their product (written A1 × A2 × . . . × An or ×_{i=1}^{n} Ai) is the set of all n-tuples ⟨a1, a2, . . ., an⟩ such that ai ∈ Ai. The notation A^n (= ×_{i=1}^{n} A) stands for the n-fold product of the set A.
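A Cartesian product of finite sets is easy to compute. In this minimal Haskell sketch (ours, not from the text), sets are modeled as lists and pairs as Haskell tuples, with the built-in fst and snd playing the roles of the projections ↓ 1 and ↓ 2:

    cartesian :: [a] -> [b] -> [(a, b)]
    cartesian as bs = [(a, b) | a <- as, b <- bs]

Here cartesian [2, 3, 5] [7, 11] yields [(2,7), (2,11), (3,7), (3,11), (5,7), (5,11)], and length (cartesian as bs) equals length as * length bs, matching |A × B| = |A| · |B|.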
A.1.4 Relations
A binary relation on A is a subset of A × A.³ For example, the less-than relation, <, on the integers is the set of all pairs ⟨a, b⟩ in Int × Int such that a is less than b.
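Since a binary relation is just a set of pairs, a finite one can be represented directly. A minimal Haskell sketch (ours, not from the text), using the less-than relation restricted to the range [0 .. 3]:

    lessThan :: [(Integer, Integer)]
    lessThan = [(a, b) | a <- [0 .. 3], b <- [0 .. 3], a < b]

    related :: Eq a => [(a, a)] -> a -> a -> Bool
    related r a b = (a, b) `elem` r

For example, related lessThan 1 2 is True and related lessThan 2 2 is False.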
1273 modularity of, 440, 559, 612 Lazy (CBL) product, 552–555 denotational semantics of, 555 SOS, 553–555 LC ∈ Location, 406 LC, see Lambda calculus leaf (tree constructor), 608, 609 generic type of, 832 leaf? (tree predicate), 451, 608 leaf~ (tree deconstructor) generic type of, 832 least (extended FL construct), 267 Least fixed point, 173, 190 in FLK DS of rec, 286 solution to effect constraint set, 962 Least Fixed Point Theorem, 190 Least upper bound (lub), 176 leaves (iterator example), 509 left (tree selector), 451, 608 Left-hand side (LHS) of transition, 50 length (procedure), 236, 238 length (metalanguage sequence function), 1183 Length of transition path, 50 Less-than-or-equal-to relation (≤), 1163, see also