A systematic approach to advanced debugging: incremental compilation

A SYSTEMATIC APPROACH to ADVANCED DEBUGGING through INCREMENTAL COMPILATION

* * Preliminary Draft * *

Peter Fritzson
Software Systems Research Center
Linköping University
S-581 83 Linköping, Sweden
Phone: 19-13-111700

Extended abstract.

This paper presents two topics: implementation of a debugger through use of an incremental compiler, and techniques for fine-grained incremental compilation. Both the debugger and the compiler are components of the highly integrated programming environment DICE (Distributed, Incremental Compiling Environment), which aims at providing programmer support in the case where the programming environment resides in a host computer and the program is running on a target computer that is connected to the host.

Commands at the debugger command level include all legitimate PASCAL statements. The debugger is machine-independent: it calls the incremental compiler, which generates code for evaluation of commands or modifies the machine code of the target program for insertion of breakpoints etc. Essentially all machine dependencies are isolated inside the code generator of the incremental compiler. Debug code at conditional breakpoints etc. is executed as efficiently as the compiled code of the normal program, since it has also been compiled by the incremental compiler. Essentially no extra code is inserted into the target program in order to be able to debug it.

Incremental compilation at the statement level is sufficiently fast to make it usable by the debugger, since recompilation usually is an order of magnitude faster than procedure-wise recompilation. This will permit good interactivity even after changes to certain global declarations. The extra information needed in the program database to support incremental compilation is almost identical to that needed for debugging. The system allows continued execution after most program changes.

The algorithms for incremental compilation are centered around a tree representation of the source program, and are applicable to a wide class of languages. Editing may be done by a structure editor or a text editor coupled to an incremental parser. The editor marks new or changed nodes in the tree of the current procedure. Recompilation of changed statement nodes is performed during a preorder traversal of the tree. New and old machine code is merged, and branch instructions in the old code are updated. Branch updating can be incremental on the statement level for languages with well-formed control structures on machines with PC-relative goto instructions. Incremental compilation is consistent, i.e. no degeneration of code quality occurs after many edit-recompilation cycles. Dynamic linking is made easy by generating position-independent code and having an extra level of indirection for procedure calls.

This research was supported by the Swedish Board of Technical Development.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1983 ACM 0-89791-111-3/83/007/0130 $00.75


Introduction. Two possible definitions of incremental compilation are given in (Early-72). First, the incremental compiler is to expend an effort during the translation process that is roughly proportional to the size of the change to the source program made by the programmer. A second possible task for an incremental compiler is to allow the programmer to execute his program up to a certain point, stop and edit his source file, and then resume execution where he stopped. In the following we assume that an incremental compiler, or better, an incremental system, has both of these properties. We will see that the capabilities of a symbolic debugger can be greatly extended through usage of such an incremental compiler.

The term "advanced debugging" in the title of this paper refers to a debugging environment where all source language statements can be given as commands, and where a static analysis database is available for query. Such a debugging environment exists in INTERLISP.

Examples of incremental programming environments for block-structured languages with nested statement structure are INTERLISP (Teitelman-78), the Cornell Program Synthesizer, GANDALF, PATHCAL, ECL, etc. The smallest compilation unit in systems like INTERLISP, GANDALF and ECL is the procedure body. INTERLISP, ECL, the Synthesizer and PATHCAL all permit continued execution after program modifications, since these systems contain interpreters. GANDALF is a compiling environment that permits continued execution after local modifications; it does not currently (June 1982) keep track of the machine-code positions of statements after recompilation (Ellison-82).

This paper shows how an incremental compiler can be used to implement an advanced symbolic debugger, and also gives a solution to the problem of incremental compilation on the statement level for a wide range of languages, although it is illustrated by the DICE incremental PASCAL compiler. The method has its greatest value in integrated environments, where Editor, Compiler and Debugger work together. The main components of the DICE system are shown below.

[Figure: Distributed Incremental Compiling Environment. The host (Editor, Compiler, Debugger, Data base) is connected by a data link to the target (program).]

Program database. Most information in the database is packaged in procedure-units. Each unit contains the source tree program representation, machine code, local symbol table, and dependency lists for declarations that are referenced in other procedures. The symbol table is permanent and allows updating.

Implementation status. Currently (April 1983), DICE is implemented in 18000 lines of INTERLISP on a DEC20 computer. The incremental compiler accepts essentially full PASCAL including input-output (currently not including sets and procedural parameters) and produces PDP11 code of reasonable quality. A very rudimentary program database exists. A screen-oriented debugger is partly implemented and currently allows all PASCAL statement types as commands. It also allows setting of breakpoints, single-stepping and pacing for control of execution. The debugger operates on the program in the target computer via DECnet. Implementations of a static analyzer and an extended program database are next to come.

Economy of an integrated system. An integrated programming environment contains tools such as Editor, Compiler, Linker, Debugger, database manager etc. These tools cooperate. For example, the editor informs the incremental compiler which statements need to be recompiled, and the Parser can be used to parse program text and debugging commands, and as a general command-level user interface. This sharing of resources is true also for data structures, as exemplified by the source-machine-code mapping and the cross reference database. The mapping between source code and machine code at statement level is used both for incremental compilation and for debugging purposes like single-stepping and insertion of breakpoints. The cross reference and static analysis database (similar to the Masterscope subsystem in INTERLISP) serves at least three purposes:

It aids the programmer in understanding his program and in finding interesting parts of it.

It makes incremental recompilation possible even after edits to global entities, since the cross reference database contains information about which procedures reference such entities.

It is used by the debugger to implement trace and breaks on variables, since the system contains information about which procedures reference or modify a certain variable. The incremental compiler can be called to insert extra code around relevant statements in such procedures.


An incremental compiler provides a single interface to the low-level machine. Besides the traditional translation task, it can be used instead of a command-level interpreter, or it can be utilized by a machine-independent source-level debugger in implementing breakpoints, single-stepping etc.

A debugger implemented through incremental compilation.

The DICE debugger is oriented towards programs that may execute on small machines under real-time constraints. It is essential that a minimal amount of extra code should be executed in the target program during debugging, compared to ordinary execution. This may of course also be valuable on an ordinary time-shared machine. It is also important that one should be able to invoke the debugger if a run-time error occurs during an ordinary execution. Another requirement is that all legal PASCAL statements should be accepted as legal commands by the debugger; continued execution after source program modifications should also be permitted. Our approach to meeting these requirements is to use an incremental compiler to generate code for complex commands and to make necessary modifications in the object code for conditional breakpoints etc. Since fast response to user commands is important, and single-stepping should be provided, the compiler has been designed to be incremental on the statement level. We will consider some of the more important functions that are normally found in good debuggers, and how they can be implemented in an incremental compiling context.

Aliased watches on variables. The traditional method of checking the contents of a certain memory location after the execution of each statement will detect the 'first' statement (first in the sense of execution order) in the program which changes the value of any of the aliases of a certain variable. We call such watches aliased watches on variables. The DICE debugger normally uses textual watches on variables. The question is: can aliased watches be implemented in our scheme? The answer seems to be yes, at least to a large extent. The information kept in the cross-reference database can be used to identify most of the code positions where aliases might occur, and conditional breakpoints can be inserted at these positions. This will give the same effect as the traditional method without slowing down execution as much, which is important when debugging programs that execute under real-time constraints. One drawback of this method may be the great number of possible aliases and corresponding code insertions in certain situations; consider, for example, pointers to records of a variant record type that is often used.

Single-stepping. An important requirement is that it should be possible to provide single-stepping for compiled programs which have no extra code inserted between each statement for debugging purposes. We use the alternative method of calling the incremental compiler on each step during step-wise execution in order to insert a new breakpoint before the next statement and eliminate the breakpoint that was previously inserted before the current statement.

Textual watches on variables. A variable in a PASCAL program can be modified either by an assignment statement or as a VAR parameter in a function or procedure call. The system can use the cross reference database to find these positions in the source tree, and insert extra code to trace variable values or to generate a break when such a variable changes value. Watches of this kind are called textual watches, since they are coupled to an occurrence of a variable name in the program text. The selective code insertions used for textual watches slow down execution much less than the traditional method common in many PASCAL debuggers, which is to test values of relevant variables at runtime after each executed statement. Ideally one would like special hardware facilities for watches. An approximation to this may be found for computers with a paging virtual memory. The method is to write-protect the page that contains the interesting variable and to use page trapping on write access. Problems may arise if the variable is on the stack (the subroutine call and return instructions on the VAX may not work if the stack is write-protected), or if an inner loop references the write-protected page.
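As a rough illustration of how textual watch positions could be looked up, the sketch below filters a cross-reference table for the references that can modify a variable. The `Ref` record, the `kind` labels and the `watch_sites` function are hypothetical names chosen for this example; they are not taken from the DICE implementation.

```python
from typing import Dict, List, NamedTuple


class Ref(NamedTuple):
    """One cross-reference entry (hypothetical layout)."""
    procedure: str   # procedure containing the reference
    statement: int   # identifier of the statement node in the source tree
    kind: str        # "read", "assign" or "var_param"


def watch_sites(xref: Dict[str, List[Ref]], variable: str) -> List[Ref]:
    """Return the references where `variable` may be modified.

    Only assignments and uses as a VAR parameter can change the value,
    so only those positions need extra trace or break code.
    """
    modifying = ("assign", "var_param")
    return [r for r in xref.get(variable, []) if r.kind in modifying]


# Example cross-reference table (made-up data).
xref = {
    "count": [
        Ref("init", 3, "assign"),
        Ref("report", 7, "read"),
        Ref("update", 12, "var_param"),
    ]
}
sites = watch_sites(xref, "count")
```

In this example only the statements in `init` and `update` would receive inserted code; the read in `report` is left untouched.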


Conditional breakpoints. A conditional breakpoint is written as an ordinary if-statement in PASCAL. It is compiled and inserted into the object code by the incremental compiler. Since the extra code associated with the conditional breakpoint is compiled with the regular PASCAL compiler, it will have the same quality as ordinary compiled code, and it will usually exact only a small amount of extra execution time.

Continued execution after program modifications. One of the important goals in the DICE system is to allow continued execution after program modifications. Continuation after editing of procedures that currently have no activation records on the stack poses no special problems. The same is true if global declarations are added. Edits of declarations of global data structures can cause severe problems in the general case. The system will take care of the code, but existing data structures must be updated according to the changed declarations. In the general case re-initialization of data structures is left to the programmer, but some common simple cases may be identified where the system could be helpful. For example, when changing the maximum number of elements of an array, the system could copy the relevant part of the old data into the new array. When a new field is added to a record type, the system could trace all pointers to data items of that type, re-allocate the data item, copy data from the old data item, and fill the new field with some null-value like NIL or zero.

If an active procedure is edited, parts of its code may be moved. Since this might affect procedure return addresses, the stack must be searched for such addresses and updated if needed. (Both the GANDALF debugger and the INTERLISP code swapper update return addresses on the execution stack.) The code of other procedures is not affected, since procedure calls go one level indirect. If formal parameter declarations for a procedure are edited, all calls to that procedure must be recompiled (and probably edited). If such a procedure is active, one might imagine manipulating argument areas on the stack so that subroutine returns still would cause no problem, but one might ask if this is meaningful. It is probably better to pop a few activation records from the stack.

Finally, we consider modification of the procedure that contains the current stopping-point. The incremental compiler will keep track of the machine-code position of the current statement, even if extensive modifications are done to surrounding statements. This makes it possible to continue from the current statement or from any other statement in the procedure. The user must take care of side-effects himself. Changes to declarations of local variables will be handled by the incremental compiler (see the part about incremental variable allocation). The compiler will also restructure the contents of the current activation record.

Goals for fine-grained statement-wise incremental compilation.

Fast recompilation - in order to get a responsive interactive programming environment.

Consistent compilation - code quality should not degenerate after repeated program changes followed by incremental recompilations.

Preservation of execution state - execution should be able to continue after most program changes.

Good quality of the machine code produced - the code should be good enough for production use, in order to demonstrate the feasibility of incremental compilation concepts in practice.

Separability - the compiled program should be separated from the source code so that it can execute outside of the program development system.

Concerning compilation speed, preliminary measurements of the recompilation time for procedures after making a small modification indicate a speed improvement of a factor of ten to twenty in most cases, compared to recompiling the procedure from scratch. The greater part of the code generator of the DICE PASCAL compiler originates from the Portable C Compiler; it has simply been translated by hand from C to INTERLISP. This causes the quality of code produced by the DICE PASCAL compiler to be almost equivalent to the code produced by the Portable C Compiler, which has been used to compile many UNIX systems (although the C language permits the programmer to do more source-level optimizations than PASCAL does). The current DICE incremental PASCAL compiler is fully consistent in code generation: the code quality does not degenerate after arbitrarily many edit-recompilation cycles. It may not be fully consistent in memory allocation, however, since holes in stack frames may arise after deletion of variable declarations.

An overview of fine-grained incremental compilation.

This compiler is centered around an abstract syntax tree representation of the source program. During an editing session, the editor marks those nodes of the current procedure that represent changed, inserted or deleted statements or declarations. When the user has finished editing, or wants to continue execution after he has made program modifications, the compiler does a preorder traversal of the source code trees of all edited procedures, and recompiles all statement nodes that have been changed or inserted. The compiler achieves flexibility by dividing the translation into three passes: name binding; type analysis and variable allocation; and finally generation of machine code.

Changes to declarations. This implies modifications of old variable or type declarations or insertion of new ones. This may force changes in memory allocation of variables, i.e. their size and position. It also invalidates the compiled code of all statements that contain affected variables. When recompiling a procedure, the incremental compiler will search the tree and recompile the smallest enclosing statement nodes that reference such variables. After a change to a global declaration, the compiler will use dependency information maintained in the program database, which tells which procedures depend on the current declaration.


[Figure: DICE incremental compiler. Labels: Attributed Abstract Syntax Tree; Declarations; Statements; Name binding; Type expansion; Variable allocation; Changed Declarations; Semantics, Code Gen.; Incremental handling; Optimize; Templates; Code size; Code update operations; Code piece buffer; Code update in Host data base; Code update in Target memory.]

Changes to statements.

This kind of program modification is much more local in scope than changes in declarations, and it usually only invalidates the compiled code associated with the changed statements. Sometimes, however, such changes will affect memory allocation of temporaries and constants.

Incremental compilation at the statement level. This problem can be decomposed into two subproblems: to find an appropriate mapping between the source code and the machine code, and to find an efficient method to update branch and goto instructions in the machine code of a procedure body. The mapping between the source level and the machine code level should have the following properties:

Given a source statement, identify the associated machine code. This is for purposes of incremental compilation, and for statement positions for breakpoints during debugging.

Given a machine code address, identify the corresponding source code statement. This is for debugging purposes only. The user should be able to stop the target program when it is running, and the system should identify at which statement in the source code the program was stopped, and generate a breakpoint at that position.

The mapping should be easily updated after program edits. Incremental updating properties are of course nice in an incremental system.

An implementation of the mapping between source code positions and machine code positions. Binary machine code is the only low-level code that is kept in the database and in the target computer. The codesize attribute of each statement node in the tree implements the mapping between source and machine code. The address of a statement can easily be found by adding the codesizes of preceding statements to the procedure start address.
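The two directions of this mapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the DICE code: the `Node` class and the functions `address_of` and `statement_at` are names invented here, and the sizes in the example are made up. A node may own words that belong to none of its sons (for instance the backward branch of a WHILE), which the lookup attributes to the node itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    codesize: int = 0                       # words of code for this subtree
    sons: List["Node"] = field(default_factory=list)


def address_of(root: Node, target: Node, start: int = 0) -> Optional[int]:
    """Source to code: start address of `target`, or None if not found."""
    if root is target:
        return start
    pos = start
    for son in root.sons:
        found = address_of(son, target, pos)
        if found is not None:
            return found
        pos += son.codesize                 # skip code of a preceding sibling
    return None


def statement_at(root: Node, addr: int, start: int = 0) -> Optional[Node]:
    """Code to source: innermost node whose code covers `addr`."""
    if not (start <= addr < start + root.codesize):
        return None
    pos = start
    for son in root.sons:
        hit = statement_at(son, addr, pos)
        if hit is not None:
            return hit
        pos += son.codesize
    return root                             # word owned by the node itself


# Example procedure body starting at address 100 (made-up sizes).
leq, assign = Node("LEQ", 4), Node("ASSIGN", 3)
loop = Node("WHILE", 8, [leq, assign])      # 4 + 3 + 1 word backward branch
body = Node("BEGIN", 15, [Node("S1", 5), loop, Node("S3", 2)])
```

Both lookups walk the tree and sum codesizes, so no separate address table has to be kept consistent after an edit.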


During the edit session, the editor marks changed and deleted statement nodes by modifying an attribute called editmark. A newly inserted node has the editmark value changed and codesize value zero; an unchanged, previously compiled node has editmark value unchanged; an old statement that has been edited is also marked as changed; and finally, a node which is to be deleted is marked deleted. Deleted nodes with nonzero codesize cannot be removed immediately by the editor, because then the old machine code would remain. Such nodes are eventually physically removed by the incremental compiler.

The previously mentioned attributes codesize and editmark are synthesized for compound statements like BEGIN-END, IF-THEN-ELSE, WHILE, REPEAT, FOR, WITH and CASE statements in PASCAL. The editmark attribute is synthesized by the editor, and the codesize attribute is synthesized by the incremental compiler, according to the following rules (we assume that S is a compound statement that has some sons).

Attribute editmark:

If all sons of S are deleted then S is marked as deleted.

If all sons of S are unchanged then S is marked as unchanged; otherwise S is marked as changed.

Attribute codesize:

The codesize of S is the sum of the codesizes of all its sons. (One might regard the backward branch of a WHILE-statement as a virtual son in order to preserve the truth of the above rule for codesize.) Initially, all new nodes have editmark value changed and codesize value zero.

A small example subtree. Below, a subtree corresponding to the statement

while i <= 15 do i := i + 2;

is shown decorated with the attributes (codesize, editmark), together with the PDP11 machine code. The constant 2 has been changed to 1 by an edit, which also has changed the editmark attribute.

Source tree:

WHILE (8,changed)
    LEQ (4,unchanged)
    ASSIGN (3,changed) (change 2 to 1)

PDP11 code:

1: cmp -2(r5), 15
4: bgt +5
5: add 2,-2(r5)
8: br -7

After incremental recompilation:

Source tree:

WHILE (7,unchanged)
    LEQ (4,unchanged)
    ASSIGN (2,unchanged)

PDP11 code:

1: cmp -2(r5), 15
4: bgt +4
5: inc -2(r5)
7: br -6
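The synthesis rules for editmark and codesize can be sketched in a few lines. In the paper the editor synthesizes editmark and the compiler synthesizes codesize; the sketch below combines both in one hypothetical function, `synthesize`, purely for brevity. The `Stmt` class and the explicit node for the backward branch (the "virtual son") are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stmt:
    name: str
    editmark: str = "changed"     # new nodes start as changed ...
    codesize: int = 0             # ... with codesize zero
    sons: List["Stmt"] = field(default_factory=list)


def synthesize(s: Stmt) -> None:
    """Recompute editmark and codesize of compound statements bottom-up."""
    if not s.sons:
        return                    # simple statement: attributes set directly
    for son in s.sons:
        synthesize(son)
    marks = {son.editmark for son in s.sons}
    if marks == {"deleted"}:
        s.editmark = "deleted"
    elif marks == {"unchanged"}:
        s.editmark = "unchanged"
    else:
        s.editmark = "changed"
    s.codesize = sum(son.codesize for son in s.sons)


# The WHILE example: the backward branch is modelled as a virtual son.
leq = Stmt("LEQ", "unchanged", 4)
assign = Stmt("ASSIGN", "changed", 3)          # constant 2 edited to 1
back = Stmt("backward branch", "unchanged", 1)
loop = Stmt("WHILE", sons=[leq, assign, back])
synthesize(loop)
before = (loop.codesize, loop.editmark)

# After recompilation the assignment is an `inc` of 2 words.
assign.codesize, assign.editmark = 2, "unchanged"
synthesize(loop)
after = (loop.codesize, loop.editmark)
```

With these inputs the WHILE node carries (8, changed) before recompilation and (7, unchanged) afterwards, matching the decorated example subtree.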

Merging old and new machine code.

The idea is actually quite simple. To begin with, reserve a memory area where the updated code will be stored. A variable OLDPOS holds the current machine code position in the old code, and a variable CURPOS holds the current position in the merged code. Both location counters are relative to the beginning of the current procedure. Do a preorder traversal of the tree and during that traversal perform the following:

If a node is unchanged, copy its code to the new area and advance OLDPOS and CURPOS by an equal amount, codesize.

If a node is marked as deleted, skip its code, i.e. advance OLDPOS but do not change CURPOS.

If a node is changed, first compile the node and store the generated code into the new memory area. The CURPOS counter is automatically incremented by the code generator during code production. Thereafter, advance OLDPOS by the old value of the attribute codesize, and update codesize to be the size of the new code piece that was generated.

It is important that CURPOS always refers to the current position in the updated machine code, in order that back-patching of label references in new branch instructions shall be done correctly. A more detailed description of the algorithm may be found in (Fritzson-82).

In the current implementation, new machine code is temporarily stored in a code buffer, and the sequence of insert/delete operations is temporarily stored and not executed immediately. This approach permits the removal of some of these update operations, which is described below.

Before the sequence of insert/delete operations is performed, further reduction of the number of update operations is possible if the copy parts of the insert operations are extracted. This enables the system to detect the important special case where the total net expansion is zero. The following example shows the saving of expand/contract operations.

[Figure: old code and merged code side by side, with the location counters OLDPOS and NEWPOS. S1 is unchanged, S2 is deleted, and S1a comes from an inserted node.]

Before reduction:

Update operations:
Insert 4 words at 13
Delete 2 words at 13
Delete 3 words at 15
Insert 6 words at 18

After reduction:

Code move ops:
Expand 5 at 13

Transfer ops:
Copy 4 to new 13
Copy 6 to new 17

If all contractions are performed before the expansions, at most 2N word moves are needed for a code piece of size N words. In the DICE system, machine code is updated simultaneously in the target computer address space and in a code area of the program database. Simple replacement and data movement commands are sent to routines in the target computer over the network link.
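The preorder merge of old and new code described in this section can be sketched as below. This is an illustrative model only: machine words are represented as strings, `compile_stmt` stands in for the code generator, and the names `Stmt` and `merge` are invented for the example. It also assumes that a changed compound node is handled by descending into its sons, and it leaves out branch updating and the deferred insert/delete operations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stmt:
    name: str
    editmark: str                 # "unchanged", "changed" or "deleted"
    codesize: int                 # size of the old code piece, in words
    sons: List["Stmt"] = field(default_factory=list)


def merge(proc: Stmt, old_code: List[str],
          compile_stmt: Callable[[Stmt], List[str]]) -> List[str]:
    """Merge old and freshly compiled code during a preorder traversal."""
    new_code: List[str] = []      # CURPOS is always len(new_code)
    oldpos = 0                    # OLDPOS, relative to the procedure start

    def visit(node: Stmt) -> None:
        nonlocal oldpos
        if node.editmark == "unchanged":
            # Copy the old piece; both counters advance by codesize.
            new_code.extend(old_code[oldpos:oldpos + node.codesize])
            oldpos += node.codesize
        elif node.editmark == "deleted":
            oldpos += node.codesize           # skip old code, CURPOS unchanged
        elif node.sons:
            # Assumption: a changed compound node is merged son by son.
            start = len(new_code)
            for son in node.sons:
                visit(son)
            node.sons = [s for s in node.sons if s.editmark != "deleted"]
            node.codesize = len(new_code) - start
            node.editmark = "unchanged"
        else:
            piece = compile_stmt(node)        # code generator advances CURPOS
            new_code.extend(piece)
            oldpos += node.codesize           # old size (zero for new nodes)
            node.codesize = len(piece)
            node.editmark = "unchanged"

    visit(proc)
    return new_code


old = ["a", "b", "c", "d", "e", "f"]
proc = Stmt("BEGIN", "changed", 6, [
    Stmt("S1", "unchanged", 2),
    Stmt("S2", "deleted", 1),
    Stmt("S1a", "changed", 0),    # newly inserted node
    Stmt("S3", "changed", 2),     # edited node
    Stmt("S4", "unchanged", 1),
])
fresh = {"S1a": ["x", "y", "z"], "S3": ["w"]}
merged = merge(proc, old, lambda n: fresh[n.name])
```

After the call, the deleted node has been physically removed from the tree and every remaining node is marked unchanged with its new codesize, so a second merge with no edits would simply copy the code.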


Incremental updating of GOTO instructions.

Control structures in high-level languages are usually compiled to a number of goto or branch instructions in the machine code. If the code inside a construct, e.g. a while-loop, expands or contracts during incremental recompilation, the addresses of certain goto instructions must be updated. This may be hard or easy to do incrementally depending on the source language and the target machine instruction set. We will consider two methods which are applicable to different classes of languages and machines.

Incremental branch updating on machines with relative goto-instructions.

This method relies on the fact that structured-programming constructs like WHILE-loops transfer control in a local, well-structured manner. The only exception from this rule in the language PASCAL is the GOTO statement - this is yet another argument against goto programming. This updating method also requires the existence of PC-relative branch instructions in the target computer instruction set, which makes it possible to move code associated with compound statements without invalidating the code. Each control structure is compiled in a predefined way; it is known which parts of the code contain branch instructions that may need updating. In the current compiler, a hand-written procedure for each type of control structure takes care of the branch updating. These procedures should not be too hard to generate automatically, e.g. from a denotational description of the control flow aspects of PASCAL. Something similar has been done in a small non-incremental compiler for a subset of C.

An example, the WHILE-statement:

Syntax: WHILE expr DO statement

Record: TYPE whilestm = RECORD pred : expr; body : statement END;

Predefined machine code structure of a WHILE loop:

L1: code for pred
    branch to L2 if pred is false
    code for body
    branch to L1
L2:

The branches to L1 and L2 need be updated only if the code for the statement body changes size. A special procedure scans through the relevant code portion and identifies those branch instructions that must be updated. (Fritzson-82) gives more details on the updating algorithm. GOTO statements must be handled specially; all such instructions in the procedure body must be checked and updated if needed.

Branch updating for a wider range of machines.

On machines that lack PC-relative branch instructions, our previous incremental algorithm for updating branch addresses will not work, because branch instructions in the code for statements like IF statements and WHILE statements will be invalid if the code is moved. A single insertion at the beginning of a procedure body can thus invalidate all branch instructions in its machine code. Thus, if a code expansion or contraction occurs, all branch instructions in the procedure body must be scanned and updated. (A bit-vector marking all branches can be maintained for each procedure body.) This means a slightly increased overhead that is proportional to the size of the procedure body, but the scanning process is still very fast compared to recompilation of the procedure from scratch. This method is equivalent to the traditional loading process, although it is done incrementally for each procedure body.
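The scan over all branch instructions on a machine without PC-relative branches might look roughly like the sketch below. It is a simplified model under several assumptions that do not come from the paper: code words are integers, branch operands are procedure-relative addresses in the old layout, the bit-vector marks only operands in old (not freshly compiled) code, and the function name `update_absolute_branches` is invented here.

```python
from typing import List


def update_absolute_branches(code: List[int], branch_bits: List[bool],
                             old_end: int, delta: int) -> List[int]:
    """Relocate branch operands after a code piece changed size.

    code        -- procedure body in its new layout
    branch_bits -- True where a word holds a branch target address
    old_end     -- old address just past the piece that changed size
    delta       -- net expansion (positive) or contraction (negative)
    """
    for i, is_branch in enumerate(branch_bits):
        # Targets before the changed piece are unaffected; targets at or
        # beyond its old end have moved by `delta` words.
        if is_branch and code[i] >= old_end:
            code[i] += delta
    return code


# Old body: 6 words. The piece ending at old address 3 grew by 2 words,
# so new words 3 and 4 are fresh code and old words 3..5 now sit at 5..7.
code = [10, 4, 11, 90, 91, 12, 13, 0]
bits = [False, True, False, False, False, False, False, True]
update_absolute_branches(code, bits, old_end=3, delta=2)
```

The cost is one pass over the bit-vector, which is the overhead proportional to the procedure size mentioned above.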

Conflicts between incrementality and code optimization.

The current DICE compiler uses the same code generation strategy as the Portable C Compiler, i.e. no intermediate results are kept in temporary registers between statements; optimizations are performed only inside statements. This goes well along with symbolic debugging and incremental compilation, but of course gives lower object code quality. There is a conflict between optimization and incremental compilation: global optimizations especially introduce dependencies between different parts of the program, which will destroy the possibility for incrementality. We are currently starting to investigate whether there exists a reasonable compromise between optimization and incremental compilation for medium-sized code pieces (5 - 10 small statements). The problem of combining some degree of optimization with incremental compilation is similar to the problem of providing source language debugging on optimized code. (Hennessy-82) gives a thorough treatment of part of the debugging-optimization problem.

The impact of incremental compilation on variable allocation.

In this section we will consider variable allocation in the context of incremental compilation. The goal is to choose storage allocation schemes that minimize the amount of recompilation without sacrificing efficiency of compiled code. The following discussion is centered around variable allocation in a procedure activation record; static allocation of global variables and constants can be regarded as a special case.

[Figure: layout of a procedure activation record, listed from top to bottom: Temporaries; Local variables; Static link; Dynamic link (= prev. FP); an unlabeled slot; Parameters.]

If a variable does not increase in size after editing its declarations, only statements referring to that variable need be recompiled. If a variable declaration (not a parameter) is deleted, only statements referring to that variable need be recompiled. This will usually result in an unused hole in the stack frame. This hole can be used to

[Figure: stack frame showing Local variables and Parameters, with the stack pointer SP.]
