This research shows that the systematic organization of table-driven methods provides many more benefits than the ad hoc code generation techniques of the past.

Table-Driven Code Generation

Susan L. Graham, University of California, Berkeley

A compiler is a computer program which translates programs, written in a particular source programming language, into executable code for target computers of a particular design. Although the design of compilers has been studied for many years and many compilers have been written, certain design issues have not been satisfactorily solved. One of these is code generation.

The traditional approach to code generation has been to provide, for each kind of operator or operand in the source language, a collection of routines to produce a sequence of target instructions which carry out the computation. Incorporated in the code generation task are storage allocation for intermediate results, register management, code "optimization" (i.e., replacement of longer or slower computations by equivalent shorter or faster ones), and instruction selection. Because of the complexity of the mapping from source language to target machine, and of the need for efficiency of various kinds, code synthesizers are large, complicated programs. Furthermore, because of the ad hoc way in which many of these programs are written, they are difficult to debug, modify, and maintain; hence, their reliability is sometimes questionable.

Since compilers incorporate considerable knowledge about the structure and meaning of both the source language and the target machine, a new compiler must be produced for each new combination of source and target. Because both languages and architectures continue to proliferate, there is a continuing need for new compilers. Researchers have attempted to ease the work of producing compilers by developing methods to automate compiler writing. Computer scientists have had considerable success in automating production of the syntax analysis portion of compilers.1 By using table-driven analysis methods and programs to construct the tables from a syntax description (usually a grammar), that aspect of compiler writing

has become not only easier but more reliable. It is easier to check that a grammar is an accurate syntax description than it is to check the implicit description embodied in the logic of a syntax analysis program. In addition, it is possible to prove once and for all that if the description is accurate, then the table-driven syntax-processing subroutine produced by the syntax-analyzer-generator is correct.

Recently, researchers have turned their attention to the later stages of compilation. In attempting to provide tools to automate code generation, they have again turned to table-driven methods. In subsequent sections, we describe a method for generating instructions algorithmically from tabular information about the functional properties of the target machine. This approach to code generation has many of the same advantages as table-driven syntax analysis: reliability, ease of use, and the flexibility needed for a compiler-writing tool. By using a table-driven syntax analyzer, we can shift the analysis to a new language by giving a new grammar to the table-building program and then providing the new table to the analyzer. Similarly, by presenting a description of a new target machine to the code generator's table-building program, we can hope to retarget the code generator to produce code for the new machine.

Our method depends in part on the way it is embedded in a compiler. In our view, one reason that code generation methods have not been improved more rapidly has been the lack of a modular approach to code synthesis. We will describe for the reader the assumptions we are making about the processing done by other parts of the compiler.

Since the method described here represents research in progress, it may be fruitful to compare it with other researchers' work. In particular, the investigations of compiler-building methods being conducted at Carnegie-Mellon University,2 at IBM Research, Yorktown Heights,3 at the Technical University of Munich,4 and at Bell Laboratories, Murray Hill,5 are providing useful information about approaches to code generation and the other aspects of compilation on which this method depends. A subsequent section relates our work to these projects.

0018-9162/80/0800-0025$00.75 © 1980 IEEE

The setting

We assume that a compiler has an analysis phase, or "front end," in which lexical and syntax analysis are performed, type-checking and other kinds of diagnostics are done, a symbol table is constructed, and other static semantic actions are carried out. The analysis phase yields an intermediate form of the source program in which details of the external representation of the program (comments, identifier spellings, spacing, etc.) have been removed, and in which the phrase structure determined by syntax analysis is apparent. The intermediate form is usually either a sequence of tuples or some sort of tree structure, sometimes termed the abstract syntax tree (Figure 1). In a one-pass compiler there may be no explicitly constructed intermediate form; however, the traversal of such a form is implicit in the sequence of steps of the syntax analyzer.

The elements of the intermediate form usually can be divided into operands and operators. Typically, the operands, and sometimes the operators as well, possess associated attributes such as type, print-name, or scope. This information may appear as decorations on the tree nodes, or as tuple components, or as entries in the symbol table. The choice of representation of the intermediate form, while an important aspect of compiler design, is not germane to our discussion. Our concern here is the issue of what information is available to the code synthesis phase.

It is also outside the scope of this discussion to consider the algorithmic mappings chosen for the source language. For example, the compiler writer or compiler-generating

source language:

    x[i] := a * b + a / b

intermediate form:

    :=
        subscript
            x
            i
        +
            *
                a
                b
            /
                a
                b

Figure 1. An example of the intermediate form.
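As an illustrative aside (not from the article), a tree like the one in Figure 1 can be written as nested tuples, operator first; a postorder walk then yields one legal evaluation order, since it visits operands before the operators that consume them:

```python
# The Figure 1 tree for x[i] := a * b + a / b, as nested tuples:
# each interior node is (operator, operand, ...); leaves are names.
tree = (":=",
        ("subscript", "x", "i"),
        ("+", ("*", "a", "b"), ("/", "a", "b")))

def postorder(node):
    """Yield the operators in one legal evaluation order:
    every operator appears after the operators in its subtrees."""
    if isinstance(node, tuple):
        op, *kids = node
        for kid in kids:
            yield from postorder(kid)
        yield op

print(list(postorder(tree)))
# → ['subscript', '*', '/', '+', ':=']
```

The partial order is visible in the output: the multiplication and division precede the addition, and the subscripting and the addition precede the assignment.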

program must decide what methods of representation and access will be used for source language types such as arrays, boolean values, sets, and records. Run-time support for dynamic storage allocation and reclamation, environment switching caused by procedure and function calls or tasking, and run-time exceptional condition and error handling must be designed. We assume that such decisions have been made.

One of the code synthesis tasks is to allocate storage to program variables and constants, and to compiler-generated entities such as intermediate values, actual parameters, return addresses, and the target code itself. Typically, relative addresses within data blocks, program segments, or stack frames are assigned during compilation; actual locations are chosen by a loader or run-time allocator. We assume that assignment of storage locations (i.e., addresses in the computer memory) is logically separate from other aspects of code synthesis, and that there are compiler routines to carry out this task. However, the more limited resources, e.g., processor registers, are a separate issue. For the present, our target machine model will be an essentially sequential general-register machine such as the PDP-11 or the IBM 370. We will return to the topic of machine architecture toward the end of the article.

In a general-register machine, the registers are a limited and computationally valuable resource; their effective utilization can greatly improve the speed of a target program.* Since optimal register assignment is computationally intractable, it is conventional to use various assignment heuristics. Sometimes, certain registers have dedicated roles, perhaps for stack pointers, base addresses, or subroutine linkage. These conventions simplify, but do not resolve, the register management problems.

Another aspect of the code synthesis problem is determining the order of the operations. In the tree shown in Figure 1, the order of evaluation is only partially specified.
Both the multiplication and the division must precede the addition, and both the subscripting and the addition must precede assignment. The subscripting, multiplication, and division operations, however, may occur in any order. A typical compiler generates code to evaluate operations as soon as possible after they occur, that is, as soon as code has been generated to evaluate their arguments. However, better register utilization and more efficient target programs often can be obtained by rearranging the order of computation. One reason why such a rearrangement helps is that the intermediate results of evaluating subexpressions need not be stored in memory, since these results have no further use after they are used as arguments. (We will defer the issue of common subexpressions.) Consequently, it is better to evaluate subexpressions as close as possible to the use of their values. It is also advantageous to evaluate the arguments of an n-ary operation in decreasing order of complexity, since each result already evaluated will be "tying up" a register until it is used.* Every compiler must include some algorithm to order the computation.

*Storage requirements can also be reduced by intelligent register management.

Having decided the algorithmic mappings, allocated the resources (except, perhaps, for the registers), and determined the order of evaluation, the remaining task is the selection and generation of a sequence of instructions to perform the computation. Even at this stage, instruction selection is nontrivial. On virtually any machine, there is a choice of instructions to carry out the same operation. The choice may concern the location or addressing mode of the operands, the existence of special-purpose instructions, or the use of alternative operations in some situations. In addition, there is normally a large amount of detail. The many choices for the many operators create a complex situation out of individually simple cases.

Let us assume that we are generating code from an intermediate form such as that in Figure 1. Since operators cannot be applied until their operands are evaluated, a simple code-generation strategy might be a bottom-up, left-to-right traversal of the tree in which, as each node is visited, code is generated to evaluate it. If we assume a general-register machine, the code for a leaf might load the indicated value into a register, and the code for an intermediate node might carry out the operation in a register, taking both operands from registers. If we assume a sufficient number of registers, an architecture in which the appropriate register-register instructions exist, and a tree whose operations correspond to single instructions, that algorithm would yield correct code. However, as most experienced programmers will recognize, the algorithm normally would not yield very good code. The deficiencies arise from several sources:

(1) The same computation might be repeated more than once, because so-called common subexpressions had not been detected.
To simplify our discussion, assume that a logically separate code optimization phase will replace all but the first use of a common subexpression with a reference to its computed value.* Assume further that the code optimization phase pulls invariant computations out of loops, propagates constants, and so on. Allen7 provides a summary of typical optimizations.

(2) It is unrealistic to assume that there will always be enough registers. The strategies for handling this problem divide into on-the-fly strategies and preplanning strategies. In an on-the-fly strategy, a register management routine provides a register when requested, perhaps by storing and subsequently reloading its previous value. A preplanning strategy inspects the computation before generating instructions and determines which intermediate values are to be stored in memory or recomputed, in order that the supply of registers not run out.

(3) It may be possible to evaluate some of the operators without having the operands in registers or leaving the results in a register. Such computation may avoid load instructions and reduce the number of registers in use. The strategy outlined above precludes such instructions.

*This assumption is really an oversimplification of the problem.6

(4) There may be complex instructions which carry out more than one of the operations in a particular subtree. (For instance, an "add 1 to memory location" does both an add and a store.) The algorithm will not find such instructions.

(5) It may be necessary to have the operands in certain registers, or to have certain additional registers available. Typical examples are multiplication or division using register pairs, or the CDC 6000 series load and store conventions. If such registers are present, these subgoals can be achieved by register-to-register moves, although such moves are "extra" instructions.

(6) An operation higher in the tree may disregard part of its operand, which consequently need not have been produced. For instance, a conditional branch uses the truth value of the associated conditional expression but perhaps not a bit pattern representing the value. In the expression "a + b < 0," the sign of the sum is needed but its value is not. To store an operand in memory, the address of the destination is used but its (previous) value is not.
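The simple bottom-up, left-to-right strategy whose shortcomings are enumerated above might be sketched as follows; the instruction names and the unlimited register supply are assumptions for illustration:

```python
from itertools import count

def naive_gen(node, code, regs):
    """Bottom-up, left-to-right: load every leaf, operate register-to-register.
    Returns the register number holding the node's value."""
    if isinstance(node, str):              # leaf: load it into a fresh register
        r = next(regs)
        code.append(f"load r{r},{node}")
        return r
    op, left, right = node                 # interior node: operands in registers
    rl = naive_gen(left, code, regs)
    rr = naive_gen(right, code, regs)
    code.append(f"{op} r{rl},r{rr}")       # result replaces the left operand
    return rl

code = []
naive_gen(("+", ("*", "a", "b"), ("/", "a", "b")), code, count())
for line in code:
    print(line)
```

Run on the right-hand side of the Figure 1 expression, the sketch loads a and b twice, which is deficiency (1) in action, and it never uses a memory operand, which is deficiency (3).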

Many of these complications for code generation could be eliminated by using a different repertoire of instructions. However, the compiler writer usually cannot change the target machine. Consequently, a more complicated code generation strategy is needed. We defer for the moment the first two issues. The remaining four limitations all stem from a lack of information about the context in which the local evaluation takes place. Whether an operand needs to be in a register depends in general on the operation and the location of the other operand(s). Similarly, whether an operand must be in a particular set of registers depends on the operation. Clearly, issues (4) and (6) depend on context. The context information can be provided to the code generator in various ways. One is to indicate the context explicitly in each node of the tree. That solution will make the tree larger, of course, and may require additional computation in constructing the intermediate representation. Another way to provide context is to abandon the notion of a strictly bottom-up traversal. If information is propagated down the tree, then local decisions can be based on goals generated by the context. If, for example, the code generator has seen an add operator, the goal for its left subtree might be to ensure that the value represented by the subtree was addressable by one of the add instructions. The possibilities might include registers, stack locations, and indexed memory locations. From another point of view, we could regard the code generator as being in a certain state in which code for a particular node is generated. Using this approach, we can propagate context as the traversal takes place rather than altering the representation in advance.
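The goal-directed alternative described above might look like this in outline; the two goal kinds and the instruction syntax are hypothetical:

```python
from itertools import count

def gen(node, goal, code, regs):
    """goal is 'reg' (value must end up in a register) or 'operand'
    (any form the hypothetical add instructions can address directly)."""
    if isinstance(node, str):
        if goal == "operand":
            return node                    # a memory operand; no load needed
        r = next(regs)                     # goal 'reg': load it
        code.append(f"load r{r},{node}")
        return f"r{r}"
    op, left, right = node
    left_loc = gen(left, "reg", code, regs)        # one operand in a register
    right_loc = gen(right, "operand", code, regs)  # the other may stay in memory
    code.append(f"{op} {left_loc},{right_loc}")
    return left_loc

code = []
gen(("+", "a", "b"), "reg", code, count())
print(code)   # only one load is emitted: b is used in place
```

Here the leaf b satisfies the "operand" goal without a load, exactly the kind of saving the strictly bottom-up strategy precludes.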

Top-down algorithms

Continuing to defer the issues of code optimization and register allocation, let us consider generating code for the

intermediate tree in Figure 2. The tree represents the Pascal statement A := B↑.Y + C, where A and C are integer variables, B is a pointer to a record with an integer field Y, ↑ is the dereferencing operator, and := is the assignment operator. The nodes of the tree are numbered for reference. We use a low-level intermediate language in which the storage mapping is explicit. Thus, r2 designates the display register (stack pointer) and a, b, c the relative addresses for the usual stack implementation of a language supporting recursion.* In the intermediate form, the distinction between addresses and values is explicit; the symbol ↑ again designates the dereferencing, or "contents of," operator.

Figure 3 shows a simplified set of instructions for a hypothetical computer. It gives the assembly language form of each instruction; a tree illustrates the computation carried out by each instruction. For instructions which leave their results in registers, such as "add," a store operation is not explicit in the tree. Rather, the location of the result is indicated by a bracketed label at the root. Thus, the meaning of = is "store the value described by the right subtree in the memory location described by the left subtree." By convention, the first operand in the assembly language instruction is the destination for loads and adds and the source for stores. The constants are all integers. The notation =const denotes a literal value (rather than an address).

If the operation symbols used in the intermediate form have the same meanings as they have in describing the instructions, we can use a top-down traversal of the

*We assume that r2 is a dedicated register assigned by an earlier stage of the compilation.

tree in Figure 2 to derive context information with which to select instructions from Figure 3.

    := (1)
        + (2)
            a (3)
            r2 (4)
        + (5)
            ↑ (6)
                + (7)
                    ↑ (8)
                        + (9)
                            b (10)
                            r2 (11)
                    y (12)
            ↑ (13)
                + (14)
                    c (15)
                    r2 (16)

Figure 2. Input to code generator.

Store instructions (the left subtree describes the destination address, the right subtree the value stored; trees shown in prefix form):

    store reg,const(reg)    = (+ const reg) reg
    store reg,const         = const reg
    store reg,0(reg)        = reg reg

Load instructions ([reg] marks the destination register):

    load reg,=const         const → [reg]
    load reg,const(reg)     ↑ (+ const reg) → [reg]
    move reg,reg            reg → [reg]

Add instructions (the result is left in the destination register [reg]):

    add reg,=const          + const reg → [reg]
    add reg,const(reg)      + ↑ (+ const reg) reg → [reg]
    add reg,reg             + reg reg → [reg]

Figure 3. Instruction descriptions.

By assuming that the meanings of the symbols are the same, we can reduce the search for instructions to symbolic pattern matching. The algorithm would proceed as follows. Starting at the root in Figure 2, we will need to generate an instruction to compute the ":=" operation, namely, a store instruction. As subgoals, the instructions which compute a + r2 must leave that number (an address) in a form usable by one of the store instructions, i.e., in a form which matches a left subtree of one of the store instructions. Also, the instructions to compute "↑(↑(b + r2) + y) + ↑(c + r2)" must leave the result in a register. By continuing the traversal, we discover that "a + r2" already
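One way to make instruction descriptions like those in Figure 3 machine-usable is to encode each computation tree as a pattern with wildcard leaves; the encoding and names below are an illustrative sketch, not the authors' implementation:

```python
# 'reg' and 'const' are wildcard leaves; '^' stands for the dereference
# operator written as an up-arrow in the article.
ADD_PATTERNS = [
    ("add reg,=const",     ("+", "const", "reg")),
    ("add reg,const(reg)", ("+", ("^", ("+", "const", "reg")), "reg")),
    ("add reg,reg",        ("+", "reg", "reg")),
]

def matches(pattern, tree):
    """Structural match of a subtree against an instruction pattern."""
    if pattern == "reg":                   # wildcard: any register leaf
        return isinstance(tree, str) and tree.startswith("r")
    if pattern == "const":                 # wildcard: any constant leaf
        return isinstance(tree, str) and not tree.startswith("r")
    if isinstance(pattern, str):           # a literal operator symbol
        return pattern == tree
    return (isinstance(tree, tuple) and len(tree) == len(pattern)
            and all(matches(p, t) for p, t in zip(pattern, tree)))

# The subtree ^(c + r2) + r3 matches only the indexed-memory add:
subtree = ("+", ("^", ("+", "c", "r2")), "r3")
print([name for name, pat in ADD_PATTERNS if matches(pat, subtree)])
# → ['add reg,const(reg)']
```

Collecting every matching pattern at a node is exactly the subgoal enumeration the walkthrough below performs by hand.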

matches the left subtree of the first store instruction, since a is a constant and r2 is a register. In addition, we might discover that we could generate the first add instruction and then use the third store instruction. However, the register management routine presumably would rule that out because it would destroy r2, the display pointer. Another possibility is to use the first load instruction followed by the third add instruction to put the value of "a + r2" in some other register. The next subgoal is to generate instructions which compute the right subtree of ":=" in Figure 2, leaving the result in a register. We examine all the instructions which put their results in registers. Since node 5, the root of the right subtree of ":=", is a +, we choose as subgoals at node 5 the add instructions, all of which leave their result in a register. In order to use an add instruction, the subgoals at nodes 6 and 13 are to generate instructions to compute the arguments of +, leaving the results in a form (actually an addressing mode) matching one argument of the add instructions. Since add is commutative, either subtree can match either argument, as long as the combination matches an instruction. Looking first at the left subtree, we have two choices (i.e., two subgoals) since node 6 is a ↑. The first possibility is to use the second add instruction by matching its left subtree. The second possibility is to generate instructions which leave the value of the subtree rooted at node 6 in a register. (We can rule out a subgoal destination "const" by observing that no instruction yields a compile-time constant as its destination.) Further inspection shows that there is an instruction which leaves its result in a register and also does an "↑" operation, namely, the second load operation. Moving to node 7, we find that both candidate instructions (the second load and the second add) continue to match.

At node 8, by noticing that node 12 is a constant and again appealing to the commutativity of "+", we again have as a subgoal an instruction which computes ↑ and leaves its value in a register. The second load instruction is now the only possibility and matches the remaining subtree. At this point, we generate the load which computes ↑(b + r2) and puts the result in a register (this register, which we will call r3, is provided by a register allocator). That instruction computes the subtree rooted at node 8. Returning to node 6, we can either generate a load instruction to compute ↑(r3 + y), leaving open the possibility of using any of the add instructions, or indicate a match of the first subtree of the second add instruction. In the former case, the corresponding subgoal at node 13 is any operand of an add instruction; in the latter case, the subgoal is a register. In the remainder of the traversal, we discover that the subtree for ↑(c + r2) rooted at node 13 matches the first subtree of the second add instruction and that it can also be computed by the second load instruction, leaving the result in a register. Thus, if we disregard other possibilities for the assignment destination, the algorithm finds three sequences of instructions (Figure 4).* The first instruction is the same in all three sequences. That was apparent when the instruction was generated,

since there was a single subgoal. A choice at node 6 to generate the load instead of "working on" the second add instruction yields both the worst sequence and the best. The third sequence is worse than the others because it has an additional instruction. When the third load is generated, it is clear from the subgoal information that the add could be performed without the load. Consequently the load need never be generated. The first sequence is better than the second because it has the same set of instructions but uses fewer registers. However, that is true only because the second argument of the addition can be evaluated without using a register, a fact not known until the second operand is scanned. We will return to this example.

Suppose we wish to have a code generator which, for simplicity and efficiency, fixes the order of traversal and chooses instructions as it finds them rather than summarizing all the possibilities first. There must then be a strategy for making choices on-the-fly. One plausible heuristic is not to generate an instruction if it can be avoided. In the example, when the traversal returns to the "↑" at node 6, having traversed its subtree, there is a choice between generating the load or continuing to "work on" the second add instruction. Applying the heuristic would cause the load to be rejected.

No matter what heuristic is used, an algorithm of this kind may block, i.e., prevent possible choices at some later stage because of an earlier choice. If the second operand of the second add instruction in Figure 3 were a constant, for example, the algorithm would block unless the load had been chosen at node 6. Note that the load would be a safe choice in this example, since it would leave open all possibilities for the second operand to add. In general, however, there need not be a safe choice.

There are several ways to deal with this difficulty. One is to modify the algorithm so that if it blocks, it goes back and makes a previous choice differently. This is essentially a backtracking solution. Another possibility is to find a sufficient condition under which blocking

*We also disregard other possible sequences by reordering the same instructions, although this approach could generate them. Note that there would be many more possibilities if the instruction set had a nonindexed load.

    load  r3,b(r2)   {subtree rooted at node 8}
    load  r3,y(r3)   {root at node 6}
    add   r3,c(r2)   {root at node 5}
    store r3,a(r2)   {root at node 1}

    load  r3,b(r2)   {subtree rooted at node 8}
    load  r4,c(r2)   {root at node 13}
    add   r4,y(r3)   {root at node 5}
    store r4,a(r2)   {root at node 1}

    load  r3,b(r2)   {subtree rooted at node 8}
    load  r3,y(r3)   {root at node 6}
    load  r4,c(r2)   {root at node 13}
    add   r3,r4      {root at node 5}
    store r3,a(r2)   {root at node 1}

Figure 4. Possible instruction sequences.

cannot occur for a given instruction set, and to use the algorithm only when that condition is satisfied. Another possibility is to identify those situations in which blocking is possible and to either make a safe choice or use more information (for example, a traversal of subsequent subtrees, i.e., lookahead) to make the decision.

A top-down deterministic algorithm

The preceding discussion motivates the method of code generation we have been studying.8-10 In collaboration with R. S. Glanville, we have developed a code generation algorithm which traverses a sequence of trees representing the low-level intermediate form of a program. The traversal is depth-first, left-to-right, without backup, i.e., a prefix walk of the tree.11 Since there is no backup, each tree can be represented by a linear sequence of nodes in prefix form, i.e., with each operator (interior node) preceding its operands (subtrees). For instance, the nodes in the example in Figure 2 would appear in the order in which they are labeled.*

Conceptually, the choices made by the algorithm occur after a subtree has been traversed. A decision is made either to generate an instruction to compute the subtree (and, if so, which instruction to generate) or to incorporate the subtree computation in a larger subtree of which it is a part. In the current implementation, the first choice is always to avoid generating an instruction (by using an instruction for a larger subtree) and otherwise to choose the "best" instruction. The best instruction is the one which is cheapest by whatever measure. Thus, in the example the choice for the left subtree of ":=" would be to incorporate the computation in the first store instruction (rather than loading "a" and doing a register-register add). The load for the subtree rooted at node 8 would be generated (since there is no choice). The load at node 6 would not be generated; consequently, a load would be generated at node 13, yielding the second code sequence.

Notice that if the right subtree of node 5 were traversed before the left subtree, the first code sequence in Figure 4 would be generated. Thus, as one would expect, the order of traversal can affect the quality of the code that is produced, since instructions are generated on-the-fly. A preplanning strategy for this sort of code generator would be to choose the appropriate traversal order and provide the corresponding sequence of nodes as input to the code generator.

The primary reasons for using a left-to-right, no-backup traversal and a simple rule for making choices are efficiency and automation. The process described above (determining subgoals, seeing which subgoals can be satisfied, and then choosing among them) can be carried out by a table-driven algorithm and an automatically constructed table.

The table construction program takes as input a description of the target machine. We do not give a detailed description here. As an example, the second add instruction of Figure 3 would be described as

    reg.1 ::= + ↑ + const.1 reg.2 reg.1
    add reg.1,const.1(reg.2)

This description carries more information than we saw in Figure 3. The tree describing the computation has been replaced by the sequence of nodes to the right of "::=", which correspond to a prefix tree walk. The instruction destination is to the left of "::=". The qualifications ".1" and ".2" on "const" and "reg" serve two purposes: to show which quantities in the instruction tree correspond to which ones in the assembly language instruction, and to enforce the rule that repetitions of the same qualification on the same kind of node designate the same quantity. Because of the latter, the destination register is the same as the second argument register. Other parts of the machine description would indicate that the operator + is commutative, that the possible constants are integers in a certain range, and that the registers are designated in a particular way (R0, R1, R2, etc.). Other kinds of qualification are also possible. For example, an instruction may incorporate a particular constant such as 1 or a particular register such as R0, may require that two registers be different, or may use register pairs.

Although most instruction descriptions correspond to single machine instructions, it is also possible to have a description with no accompanying instruction (to record a change of state in the code generator) or one with a sequence of instructions. For instance, an instruction description could be associated with an operation of the intermediate language that was not directly represented in the hardware of a given target machine.

The resemblance of the instruction description to a context-free grammar rule is more than coincidental. Many of the ideas behind the table construction and instruction selection algorithms are drawn from LR parsing,1 although the details differ significantly. The subgoal generation and pattern matching described previously are carried out as a form of syntax analysis. The ambiguity stemming from choosing among matches is resolved by fixed rules built into the table construction. Therefore, there is no search or tree walk.

The instruction descriptions could include instructions having identical computation trees. (They would have differing qualifications, however.) Figure 5 shows an add instruction and an increment instruction with similar trees. The increment instruction is a constrained form of add in which the constant has the value 1.

The qualifications on the instructions are checked after an instruction match has occurred, in a fashion analogous to compiler semantic rules (although these "semantics" are automatically generated). If there is more than one rule with the same computation tree (except for qualifications), the rules are ordered by some cost measure. If that tree is selected, the various sets of qualifications are checked in the determined order and the first instruction whose qualifications are satisfied is generated.

The table generator ensures that for each syntactic instruction tree there is some unrestricted instruction or sequence of instructions to generate. For example, the add instruction in Figure 5 is unrestricted, since the qualifications only indicate fields in the assembly

*The return to an interior node after traversing one of its subtrees is handled by the code generator internally and need not be reflected in the input sequence.
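As a sketch of how such a rule-like instruction description might drive code generation over the prefix form, the following hypothetical reducer matches the right-hand side of one rule (modeled on the add reg,const(reg) description, with '^' for the up-arrow) and replaces it by its left-hand side:

```python
def prefix(node):
    """Linearize a tuple tree so each operator precedes its operands."""
    if isinstance(node, tuple):
        return [node[0]] + [t for kid in node[1:] for t in prefix(kid)]
    return [node]

def tclass(tok):
    """Classify a token: register, constant, or a literal operator symbol."""
    if tok.startswith("r") and tok[1:].isdigit():
        return "reg"
    if tok.isalnum():
        return "const"
    return tok

# One rule:  reg.1 ::= + ^ + const.1 reg.2 reg.1  =>  add reg.1,const.1(reg.2)
RHS = ["+", "^", "+", "const", "reg", "reg"]

def reduce_once(toks):
    """Replace one occurrence of the rule's right-hand side by its
    left-hand side (a register) and emit the corresponding instruction."""
    classes = [tclass(t) for t in toks]
    for i in range(len(classes) - len(RHS) + 1):
        if classes[i:i + len(RHS)] == RHS:
            const, base, src = toks[i + 3], toks[i + 4], toks[i + 5]
            asm = f"add {src},{const}({base})"   # destination = second register
            return toks[:i] + [src] + toks[i + len(RHS):], asm
    return toks, None

toks, asm = reduce_once(prefix(("+", ("^", ("+", "c", "r2")), "r3")))
print(asm, toks)
```

A real table-driven generator of this kind drives the matching with automatically built LR-style tables rather than a linear scan; the sketch only shows the rule-and-reduce idea.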

language instruction. The absence of an unrestricted sequence would constitute another kind of blocking. The table generator checks the machine description for a sufficient condition which, if satisfied, guarantees that blocking cannot occur. At present, in the rare case that the no-block condition fails, no code generator is produced. However, in this case it would also be possible to have instead a backtracking or lookahead version of the code generator.

In addition to their speed, the table generator and table-driven code generator have several other advantages. First, we have proven that the code generator never loops and always produces correct code for well-formed input. By "correct code" we mean correct as long as the machine description is accurate. Input is well-formed if it corresponds to a well-formed tree; i.e., the operators have the correct number of operands of the proper kinds. By analyzing the instruction set, the table generator can determine the conditions satisfied by well-formed input. These conditions are easy to express and check. The code generator can easily check, as the input is read, that it is well-formed. Alternatively, the implementer can ensure that the compiler "front end" generates only well-formed input.

Second, due to the nature of the search process, the code generator considers all the instructions. It will exploit addressing hardware and special-purpose instructions when the opportunity arises. It can also find and use machine "idioms" by describing them as pseudoinstructions. For example, multiplication by a power of two can be a description of a shift instruction on some machines.

Third, the code generator appears to be very fast. The quality of the code it produces is very good, given the amount of information available about subsequent input (i.e., the algorithm generates locally good code and finds the more powerful special-purpose instructions).

One can use such an algorithm during compiler development.
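The well-formedness check mentioned above reduces to counting operands against operator arities while the prefix input is read; a minimal sketch, with assumed arities:

```python
# A prefix sequence is well-formed iff, reading left to right, every
# operator finds exactly the operands its arity demands.
ARITY = {"+": 2, "*": 2, "/": 2, ":=": 2, "^": 1, "subscript": 2}

def well_formed(toks):
    need = 1                           # subtrees still owed to the walk
    for tok in toks:
        if need == 0:
            return False               # tokens left over after a full tree
        need += ARITY.get(tok, 0) - 1  # each token fills one owed slot and
    return need == 0                   # opens as many slots as its arity

print(well_formed(["+", "a", "b"]))    # True
print(well_formed(["+", "a"]))         # False: an operand is missing
```

The same count can be maintained incrementally as the input arrives, so malformed input is rejected as soon as a token appears after the count has reached zero.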
Initially, the code generator can be incorporated without any preplanning or optimization. A strategy such as that described by Ammann12 can be used to allocate registers and to choose which ones to spill if there are not enough. The compiler phase which produces input to the code generator can be modified or rewritten to provide optimization, expression reordering, or preplanning. Those modifications might add auxiliary information to the code generator input, such as common expression identification or usage counts, but would not greatly change the code generator. Meanwhile, the code generated without the benefit of the code improvement techniques under development can provide valuable feedback about what sort of code improvements to try to obtain.

Compilers using these code generation techniques can be retargeted in a reasonably straightforward way. Retargeting has several aspects. Changing the code generator to generate instructions for the new machine can be accomplished by writing a new machine description, running the table generator to produce a new table for the code generator, and replacing the old table with the new. The other part of retargeting, changing the source-to-target mappings to reflect the semantics of the source language and to exploit the architecture, involves modification to earlier phases of the compiler.* It has been our experience that an implementer using a machine manual can easily write new instruction descriptions. Hence, although one could develop a program which generates the descriptions automatically from a lower-level representation such as ISP,13 we have not found the need for such a tool.

Figure 5. Similar instruction trees: an add instruction (+ applied to reg.1 and const.1, emitting "add reg.1,=const.1") and an increment instruction (+ applied to reg.1 and a constant qualified to be 1, emitting "incr reg.1").

*For example, subprogram linkage conventions may have to be redesigned.

The portable C compiler

The portable C compiler5 is a translator for the Unix** system's programming language C. It was written by S. C. Johnson with the goals that it be easily retargetable and provide code of reasonable quality. It has been retargeted successfully on over a dozen machines.

The compiler uses a table-driven code generation method somewhat different from the method discussed above. A preplanning strategy is used to avoid having the code generator run out of registers. So-called Sethi-Ullman numbers14 are assigned to the nodes of expression trees to determine which temporary values should be stored. Using those numbers, the compiler generates code for the subtrees rooted at nodes representing stored temporary values before it generates code for the expressions which use those values. Thus, the code generator has as input a tree for which it will not run out of registers.

The code generator uses a table containing a collection of templates. In simplified form, each template contains a pattern to search for in the input tree; some subgoals (i.e., destinations) attained by the code associated with the template; resources, such as registers, used by the associated code; a rewriting rule for the input tree; and an encoded form of the assembly language instructions to be generated. The code generator is driven by a template-matching algorithm which attempts to find a template and an associated goal that matches a portion of the input tree and its associated goal. When a match is found, code is generated and the tree is transformed to reflect the computation.

**Unix is a trademark of Bell Laboratories.
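The Sethi-Ullman labeling used in this preplanning step can be sketched briefly. The Python version below is an illustration under simplifying assumptions (binary expression trees, every leaf costing one register), not Johnson's actual implementation.

```python
# Illustrative sketch of Sethi-Ullman numbering: the number attached to a
# node is the minimum count of registers needed to evaluate that subtree
# without storing intermediate results.

class Node:
    def __init__(self, op=None, left=None, right=None):
        self.op, self.left, self.right = op, left, right

def su_number(n):
    """Registers needed to evaluate n without spilling (leaves cost 1)."""
    if n.left is None and n.right is None:
        return 1
    l, r = su_number(n.left), su_number(n.right)
    # Unequal needs: evaluate the costlier side first, hold its result in
    # one register while the cheaper side runs. Equal needs: one extra
    # register is unavoidable.
    return max(l, r) if l != r else l + 1

# (a + b) * (c + d): each sum needs 2 registers, the product needs 3.
t = Node("*", Node("+", Node("a"), Node("b")), Node("+", Node("c"), Node("d")))
assert su_number(t) == 3
```

Any subtree whose number exceeds the machine's register count is a candidate for an explicit temporary; generating its code first is what guarantees the code generator never runs out of registers.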
Unlike our algorithm, in which an efficient matching process is obtained by preprocessing the instruction descriptions and by requiring that they satisfy a sufficient condition, the portable C compiler uses a more general heuristic search mechanism over a more general class of templates. The templates are written by the implementer rather than being generated automatically. Both the quality of the generated code and the speed of the code generator can be significantly affected by choices in the design of the templates.

The added generality appears to be both a strength and a weakness. The primary strength lies in the handling of situations which might be problematical in our more restricted setting. The weaknesses are slow searching and the lack of any way, other than the usual compiler testing, to ensure that correct code is generated or that the code generator will not block. One can prevent blocking by adding a template to cover the case in question, but the missing template is not discovered unless the input that needs it is provided.

In this code generator, as in our own, there is considerable benefit because the information about the target machine is contained in a table or database rather than being scattered throughout the compiler code. Not only does such an organization facilitate retargeting, but it has also made it much easier for implementers other than Johnson to understand, modify, and debug versions of the compiler.
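The template-matching idea can be shown with a toy table in Python. The patterns below mimic the add/increment pair of Figure 5; the template format, operand kinds, and mnemonics are invented for illustration and are far simpler than the portable C compiler's real templates.

```python
# Toy template table for instruction selection. Each entry pairs a tree
# pattern (operator plus operand kinds) with an assembly format string and
# the kind of the result. Listing the more specific template first mirrors
# how an implementer steers the search toward the better instruction.
TEMPLATES = [
    (("+", "reg", "const1"), "incr {l}",     "reg"),   # increment idiom
    (("+", "reg", "const"),  "add {l},={r}", "reg"),   # general add
]

def select(op, left, right):
    """Return (code, result kind) for the first template that matches.

    left and right are (kind, value) pairs, e.g. ("reg", 1)."""
    lk, rk = left[0], right[0]
    for (p_op, p_l, p_r), fmt, res in TEMPLATES:
        if p_op == op and p_l == lk and (p_r == rk or
                (p_r == "const1" and rk == "const" and right[1] == 1)):
            return fmt.format(l=left[1], r=right[1]), res
    raise LookupError("blocked: no template matches")  # the missing-template case

assert select("+", ("reg", 1), ("const", 1)) == ("incr 1", "reg")
assert select("+", ("reg", 1), ("const", 5)) == ("add 1,=5", "reg")
```

The final `LookupError` is exactly the blocking problem discussed above: nothing in this scheme guarantees ahead of time that every well-formed input reaches some template.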

Code generation in PQCC

The Production-Quality Compiler-Compiler project described elsewhere in this issue (pp. 38-49) is an attempt to design a compiler-generator system which yields compilers capable of generating very good code. Its approach to code generation15 has many similarities to our own. PQCC code generation is, again, table-driven and, like the portable C compiler, relies on a preplanning strategy. The phases prior to actual code generation, notably DELAY and TNBIND, go through a pseudo-code generation process to determine addressing modes and to allocate temporary storage and registers.

Like our generator, the PQCC code generator uses trees as patterns. The patterns, which are part of productions, are derived by a code-generator-generator16,17 from the production-like input/output assertions describing the instructions. As in the portable C compiler, generation is carried out by a goal-directed heuristic search which matches pattern trees to the tree input to the code generator. A match may cause the generation of more than one instruction. In the case of more than one match, a cost criterion is applied.

The code-generator-generator (CGG) is similar to our table generator. The input/output assertions describing each machine instruction are analogous to our instruction descriptions, except that there can be more than one assertion for a given instruction (often indicating a side effect). One of the tasks of the CGG is to produce a separate code generator pattern for each effect of the instruction, producing extra actions to compensate if necessary. An example from Cattell17 is, given an instruction that stores into both memory and a register, to describe the register store alone and precede the generated instruction with allocation of a dummy location. In our method, the separate descriptions would be provided by the implementer.

The other patterns constructed by the CGG augment the input/output assertions by means of patterns which ensure that there are

* productions to transfer between every pair of addressing modes,
* at least one production for each operator of the PQCC intermediate language TCOL, and
* productions for the control operators such as while-do and if-then-else.

The new productions are derived by running the code generation algorithm on "built-in" input trees that describe the situations to be covered, and by applying axioms to exploit the properties of the TCOL operations.

It is interesting to contrast the design of the CGG with that of our table generator. Our test for blocking automatically ensures that it is possible to generate code for all the necessary changes of access mode. It would be easy for the implementer to use an instruction description to describe each transfer requiring more than one machine instruction, but such cases occur rarely in practice. We have not designed a fixed set of operations for an intermediate language. In using a particular machine description and set of intermediate operations, it is easy to check whether each operation occurs in an instruction description. The blocking test then determines whether the possible operands are general enough. The implementer must supply the required additional descriptions, which might correspond to more than one actual instruction. It also would be possible to add the control descriptions, although our present formulation assumes that the intermediate language is already at the level of labels and jumps.
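The operator-coverage check mentioned above, verifying that each intermediate operation occurs in some instruction description, is straightforward to mechanize. A hypothetical Python sketch, with invented operator and description sets:

```python
# Hypothetical intermediate-language operator set and machine description.
# Real descriptions carry much more (operand kinds, qualifications); only
# the operator field matters for this particular check.
IR_OPS = {"+", "-", "*", ":="}
INSTRUCTION_DESCRIPTIONS = {
    "add":   {"op": "+"},
    "sub":   {"op": "-"},
    "store": {"op": ":="},
}

def uncovered_ops(ir_ops, descriptions):
    """Operators for which the implementer must supply extra descriptions."""
    covered = {d["op"] for d in descriptions.values()}
    return sorted(ir_ops - covered)

# This target has no multiply instruction, so "*" needs an added
# description (perhaps one expanding to several actual instructions).
assert uncovered_ops(IR_OPS, INSTRUCTION_DESCRIPTIONS) == ["*"]
```

A report from such a check tells the implementer exactly which additional descriptions to write, after which the blocking test can decide whether the operands those descriptions accept are general enough.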

We have described three different forms of table-driven code generation. In our presentation we have focused on their similarities rather than their differences. One of the chief differences is the role played by the implementer in designing and changing code generators.

The portable C compiler demands the most from the implementer, who must specify the template tables and ensure that their interaction with the code generator is correct. This design is the least automated of the three but, because it has yielded a large number of working compilers, is the most extensively validated.

The Production-Quality Compiler-Compiler demands the least from the implementer. The TCOL input language has been specified and is intended to be source-language-independent. The code-generator-generator has complete responsibility for devising a code generator from an automatically generated machine description. All aspects of the TCOL/code generator interface are handled by the CGG; in particular, the implementer provides no information about how the instructions might best correspond to the intermediate language.

Our approach lies somewhere between these extremes. We have left it to the implementer to provide the instruction descriptions in a way that has proven to be quite easy. The table is produced automatically from those descriptions and is used by an algorithm which generates locally good, correct code. However, the implementer can incorporate special knowledge about uses of the instructions. The table generator computes from the machine description the simple properties that the input to the code generator must satisfy. The implementer can fix discrepancies in the interface either by adding instruction descriptions or by modifying an earlier stage of the compiler. The implementer could exploit a hardware multidimensional array-indexing instruction, for example, either by introducing a high-level indexing operator used both in the intermediate language and in the instruction description, or by providing a low-level description of the instruction. The former solution would be more efficient; the latter would be more general. Both could coexist in the code generator, of course.

In making decisions which trade generality for efficiency, we have been more willing than the others to sacrifice generality for gains in simplicity and efficiency. In part, our motive is to see how far we can push this approach. However, our design goals also stem from a belief that it should not be necessary to design large, complex code generators to produce high-quality code for baroque instruction sets. If we can identify the properties that make an instruction set a natural target for translation of high-level languages, we can provide useful information, beyond mere intuition, on which computer architects can base their designs.

The table-driven approach to code generation appears feasible and worthwhile. However, more work needs to be done. Our methods and those of other researchers can probably be integrated into an even better system. Controlled experiments are needed to identify the remaining problems. Nevertheless, it is already clear that the systematic and modular organization provided by table-driven methods is far better than the ad hoc code generation techniques of the past.

Acknowledgment

We did much of our recent work on code generation in collaboration with Robert Henry, who provided helpful comments during the preparation of this article. This article is based on research sponsored by the National Science Foundation, under grant MCS74-07644-A04.

References

1. A. V. Aho and J. D. Ullman, Principles of Compiler Design, Addison-Wesley, Reading, Mass., 1977.
2. B. W. Leverett et al., "An Overview of the Production-Quality Compiler-Compiler Project," Tech. Report CMU-CS-79-105, Carnegie-Mellon University, Feb. 1979.
3. F. E. Allen et al., "The Experimental Compiling Systems Project," Report RC6718, Computer Sciences Dept., IBM Thomas J. Watson Research Center, Yorktown Heights, N.Y., Sept. 1977.
4. "Introduction to the Compiler Generating System MUG2," Report TUM-Info 7913, Institut für Informatik, Technische Universität München, May 1979.
5. S. C. Johnson, "A Portable Compiler: Theory and Practice," Conf. Record 5th Ann. ACM Symp. Principles of Programming Languages, Jan. 1978.
6. A. V. Aho, S. C. Johnson, and J. D. Ullman, "Code Generation for Expressions with Common Subexpressions," J. ACM, Vol. 24, No. 1, Jan. 1977, pp. 146-160.
7. F. E. Allen and J. Cocke, "A Catalog of Optimizing Transformations," in Design and Optimization of Compilers, Prentice-Hall, Englewood Cliffs, N.J., 1972.
8. R. S. Glanville, "A Machine Independent Algorithm for Code Generation and Its Use in Retargetable Compilers," PhD dissertation, Univ. of California, Berkeley, Dec. 1977.
9. R. S. Glanville and S. L. Graham, "A New Method for Compiler Code Generation," Conf. Record 5th Ann. ACM Symp. Principles of Programming Languages, Jan. 1978.
10. S. L. Graham and R. S. Glanville, "The Use of a Machine Description for Compiler Code Generation," Proc. Third Jerusalem Conf. Information Technology, North-Holland, Aug. 1978.
11. D. E. Knuth, The Art of Computer Programming, Vol. 1: Fundamental Algorithms, Addison-Wesley, Reading, Mass., 1968.
12. U. Ammann, "On Code Generation in a PASCAL Compiler," Software-Practice & Experience, Vol. 7, No. 3, June/July 1977, pp. 391-423.
13. C. G. Bell and A. Newell, Computer Structures: Readings and Examples, McGraw-Hill, New York, 1971.
14. R. Sethi and J. D. Ullman, "The Generation of Optimal Code for Arithmetic Expressions," J. ACM, Vol. 17, No. 4, Oct. 1970, pp. 715-728.
15. R. G. G. Cattell, J. M. Newcomer, and B. W. Leverett, "Code Generation in a Machine-Independent Compiler," Proc. ACM Sigplan Symp. Compiler Construction, Boulder, Colo., Aug. 1979.
16. R. G. G. Cattell, "Formalization and Automatic Derivation of Code Generators," PhD dissertation, Carnegie-Mellon Univ., Pittsburgh, Pa., Apr. 1978.
17. R. G. G. Cattell, "Automatic Derivation of Code Generators from Machine Descriptions," ACM Trans. Programming Languages and Systems, Vol. 2, No. 2, Apr. 1980.

Susan L. Graham is an associate professor of computer science at the University of California, Berkeley. Her research interests include programming language implementation and design and the design of programming tools. Graham received the AB degree in mathematics from Harvard University and the MS and PhD degrees in computer science from Stanford University. A member of ACM and the IEEE Computer Society, she is editor-in-chief of the ACM Transactions on Programming Languages and Systems.

