Effective Specialization of Realistic Programs via Use Sensitivity

Luke Hornof, Charles Consel
hornof,[email protected]

Irisa, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France

Jacques Noyé
[email protected]

École des Mines de Nantes, 4 rue Alfred Kastler, 44070 Nantes Cedex 03, France

Abstract

In order to exploit specialization opportunities that exist in programs written by researchers outside of the programming language community, a partial evaluator needs to effectively treat existing realistic applications. Our empirical studies have demonstrated that real-sized applications extensively use non-liftable values such as pointers and data structures. Therefore, it is essential that the binding-time analysis accurately treats non-liftable values. To achieve this accuracy, we introduce the notion of use sensitivity, and present a use-sensitive binding-time analysis for C programs which is obtained by a forward analysis followed by a backward analysis. This analysis has been implemented and integrated into our partial evaluator for C, called Tempo. To validate the effectiveness of our analysis and demonstrate that use sensitivity is critical to obtaining highly specialized programs, we have conducted experimental studies on various components of existing operating systems code. Our results clearly demonstrate that, as opposed to use insensitivity, use sensitivity drastically increases the static computations detected by the analysis, which, in practice, leads to successful specialization.

1 Introduction

Partial evaluation has been thoroughly studied for a wide variety of programming languages [4, 9, 13, 36]. Both its theoretical and practical aspects have led to major advances of the technology [10, 22, 35]. Researchers have illustrated the potential of this technology by applying it to various problems such as compiler generation [1, 15, 26, 29, 30], graphics applications [6, 25, 34], and scientific computing [7, 8, 24, 31].

Ultimately, just like a compiler, a partial evaluator is a tool. As such, research in the field should also aim at developing systems capable of exploiting specialization opportunities in applications written by programmers who are not experts in partial evaluation. This goal requires partial evaluators to include analyses and transformations powerful enough to handle programs not specifically structured for this optimization, rather than expecting programmers to intentionally write code that specializes well.

To achieve this goal, we have designed and implemented a partial evaluator in close collaboration with researchers of another field, namely operating systems. Our collaborators have submitted to us potential specialization opportunities taken from existing industrial-strength programs, such as protocol layers and system calls. An empirical study of these programs has revealed some limitations of existing partial-evaluation technology, most importantly, the lack of accuracy of binding-time analyses in dealing with complex data structures. Because binding-time information directly defines the degree of specialization, partial evaluators are unable to take advantage of most specialization opportunities.

This paper presents a new form of binding-time analysis, whose accuracy allows thorough specialization of systems code. This analysis has been implemented in our partial evaluator, named Tempo. Most importantly, this partial evaluator has been used by operating systems researchers to specialize existing programs [37, 38, 48]. This work has shown that our analysis drastically improves the degree of specialization of programs compared to existing analyses, as substantiated by the experimental results presented in this paper.

In the rest of this section, we present in detail one common behavior observed in systems code. We then show how existing analyses would lose critical information, which would subsequently lead to poor specialization. Finally, we explain how to design an analysis so that this information is not lost and specialization is successful.

1.1 A Use-Sensitive Analysis

An empirical study of existing partial-evaluation technology in the context of systems software led us to the conclusion that it cannot be successfully applied to existing code. Most systems programs critically rely on the use of complex data structures that combine such values as pointers, structures, and arrays. These data structures implement a system state, which is interpreted by various system components [17, 18, 41, 48]. In the context of partial evaluation, the state is typically partially static and thus used in static computations as well as dynamic computations. These various uses are usually treated conservatively by existing binding-time analyses [5, 28]. More specifically, when a static value is used in a dynamic context, the static value is lifted into a residual representation during specialization. This operation can be performed for values for which there exists a corresponding textual representation, such as integers. However, other values such as pointers, structures, and arrays cannot be lifted, since they do not have a corresponding textual representation. In these cases, existing binding-time analyses force all uses of such values to be considered dynamic. In this regard, these analyses can be viewed as use insensitive, just as context insensitivity and flow insensitivity force objects to be associated with a unique description regardless of call and assignment contexts, respectively.

In addition to forcing uses to become dynamic, use insensitivity also forces the corresponding definitions to become dynamic as well. This loss of staticness is transitively propagated, since forcing a definition to become dynamic may create new dynamic uses. More importantly, not only do uses and definitions become dynamic, but also the computations which depend on them.

Use insensitivity does not always incur this loss of accuracy. Indeed, programming languages sometimes offer a textual representation for some types of values, most commonly scalars. Such values can remain static even when they are used in a dynamic context, since they can be lifted into their corresponding textual representation during specialization. Unfortunately, realistic applications make extensive use of non-liftable values; they manipulate large nested data structures including pointers and arrays. A use-insensitive binding-time analysis applied to such programs drastically degrades the degree of specialization due to the loss of accuracy mentioned earlier. A common solution to circumvent this problem amounts to thoroughly rewriting the program, carefully separating static and dynamic values, and sometimes duplicating them.

This paper presents a new approach to binding-time analysis which achieves use sensitivity for non-liftable values. As a result, more static values can be exploited, which in turn triggers more computations and ultimately allows realistic applications to be specialized successfully.

1.2 Examples

Let us consider two program fragments which illustrate the need for use sensitivity. In Fig. 1, a pointer variable p is assigned the address of an array and is subsequently used twice. This example is typical of how pointers are used in systems code: a pointer is assigned once and then used multiple times.

int a[10], *p, s, d;
p = a;
a[11] = 1234;
...
... = *((p + s) + 10);
... = *((p + d) + 20);

Figure 1: Pointer program

If this program fragment is analyzed with a use-insensitive binding-time analysis, undesirable results are produced. More specifically, if it is analyzed with a and s (as well as all constants) static and with d dynamic, then the value of pointer p depends on the static value a and is used in both a static and a dynamic context. Since pointer values (in most languages, like C) cannot be lifted during specialization, the use of p in the dynamic context has to be considered dynamic. Further,

since a use-insensitive analysis requires that all uses of a variable have the same binding time, in this example all of the uses of pointer p are forced to become dynamic. Fig. 2 shows how the uses of p (in the last two lines of the program) would be annotated (where overlined means static and underlined means dynamic), and how the program is subsequently specialized with respect to these annotations.

Binding-time annotations        Specialized w.r.t. s = 1
... = *((p + s) + 10);          ... = *((p + 1) + 10);
... = *((p + d) + 20);          ... = *((p + d) + 20);

Figure 2: Use-insensitive binding-time annotations and subsequent specialization

Use sensitivity avoids this problem. All uses of a variable are no longer required to have the same binding-time value, and therefore dynamic uses do not interfere with the static uses. A use-sensitive binding-time analysis therefore annotates the last two lines of the program fragment as given in Fig. 3. Notice how the static use of pointer p is directly exploited in the corresponding specialization.

Binding-time annotations        Specialized w.r.t. s = 1
... = *((p + s) + 10);          ... = 1234;
... = *((p + d) + 20);          ... = *((p + d) + 20);

Figure 3: Use-sensitive binding-time annotations and subsequent specialization

Another example which motivates the need for use sensitivity is shown in Fig. 4, a typical example of how data structures are used. Different fields of structure s1 are assigned different values, which are subsequently used in different contexts.

struct {
  int s;
  int d;
} s1, s2;
s1 = s2;
... = s1.s + 3;
... = s1.d + 4;

Figure 4: Structure program

If this program fragment is analyzed with s2 being partially static (s2.s is static while s2.d is dynamic), then accessing the d field creates a dynamic context which forces structure s1 to become dynamic since, like pointers, it cannot be lifted during specialization. Again, a use-insensitive analysis would then force all uses of s1 to become dynamic, which in turn renders all accesses to this structure dynamic. The resulting binding-time annotations of the last two lines (which include the uses of s1), as well as a specialized version, are as given in Fig. 5.

Binding-time annotations        Specialized w.r.t. s1.s = 10
... = s1.s + 3;                 ... = s1.s + 3;
... = s1.d + 4;                 ... = s1.d + 4;

Figure 5: Use-insensitive binding-time annotations and subsequent specialization

A use-sensitive analysis avoids losing this information in a similar way. The use of s1 in the dynamic context still becomes dynamic (since it is not liftable), but now the use of s1 in the static context remains static. The annotations of the last two lines of the program fragment and the corresponding specialization are given in Fig. 6.

Binding-time annotations        Specialized w.r.t. s1.s = 10
... = s1.s + 3;                 ... = 13;
... = s1.d + 4;                 ... = s1.d + 4;

Figure 6: Use-sensitive binding-time annotations and subsequent specialization

1.3 Implementation and Validation

Our binding-time analysis is performed in two steps. In a first step, a Forward Binding-Time Analysis (FBTA) computes use binding times by propagating forward an initial binding-time context. In a second step, a Backward Binding-Time Analysis (BBTA) propagates backwards the previously computed use binding times. If the results of these analyses indicate that a variable has both static and dynamic uses, its definition has to be both static and dynamic. At specialization time, such a definition is both evaluated and residualized.

We have implemented a use-sensitive binding-time analysis and incorporated it into Tempo, a partial evaluator specifically designed to treat industrial-strength systems code written in C [14]. The results of the analysis are used to drive both Tempo's compile-time and run-time specializers [16]. As reported by this paper, experiments on existing systems programs show that this approach achieves a high degree of specialization.

1.4 Summary

We introduce a new analysis and program transformation in order to obtain better specialization. This work differs from existing work in three ways.

1. We introduce the notion of use sensitivity in order to achieve a more accurate binding-time analysis.

2. We have designed, implemented, and integrated into Tempo a binding-time analysis which obtains use sensitivity by combining a forward and a backward analysis.

3. Existing systems programs have been analyzed with this new analysis and compared to a use-insensitive analysis. The improved accuracy has led to successful specialization of these programs.

In Sect. 2 we present in detail the precision gained through use sensitivity. The forward binding-time analysis is given in Sect. 3.2 and the backward binding-time analysis in Sect. 3.3. Experimental evidence of the benefits of use sensitivity is given in Sect. 4. Related work is addressed in Sect. 5 and final remarks are made in Sect. 6.

2 Use Sensitivity

In order to clarify the idea of use sensitivity, it is important to understand how binding times of variables are computed. First of all, a variable can only be considered static (S) if all of the values on which the variable depends are static, in which case the value of the variable can be computed at specialization time. A variable which occurs as a left-hand side expression does not depend on any value, and is also considered static since its address can be computed at specialization time. Otherwise, the variable is considered dynamic (D). We use the function values(), as shown in Fig. 7, to express the values on which a variable depends. The function values-bt() determines if all of these values are static by taking the least upper bound (⊔) of their binding times (bt()); if a variable does not depend on any value, values-bt() returns static. Secondly, if the value of a variable cannot be lifted, then the context in which the variable is used also affects the binding time of the variable. We use the function context_e() to represent the values in a context, where the program point subscript (e) indicates the specific variable use. Similarly, the binding time of each context is expressed by the function context-bt_e(), which computes the least upper bound of the values in the context. In the first example (Fig. 1), we find that for the pointer p:

values(p) = {a}
context_1(p) = {s, 10}
context_2(p) = {d, 20}

values-bt(p) = bt(a) = S
context-bt_1(p) = bt(s) ⊔ bt(10) = S ⊔ S = S
context-bt_2(p) = bt(d) ⊔ bt(20) = D ⊔ S = D

With this information, the binding time of each variable use and the corresponding variable definition is calculated. A use-insensitive analysis computes one single binding time for all of the variable's uses as well as its definition, as shown in Fig. 8. For a variable whose value can be lifted, this binding time is simply the values-bt as defined above. The context-bts are not taken into account, since the value can be lifted at specialization time if necessary. For non-liftable values, however, the binding time is computed by taking the least upper bound of the values-bt and the least upper bound of the context-bts. This is where individual use information is merged and therefore lost.
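The use-insensitive rule just described can be sketched in a few lines. This is a hypothetical illustration, not Tempo's implementation; the names join, values_bt, and use_insensitive_bt are ours, and binding times are modeled as the strings 'S' and 'D'.

```python
# Sketch of the use-insensitive rule on the pointer example of Fig. 1:
# values-bt and *all* context-bts are merged into one binding time.

S, D = 'S', 'D'

def join(bts):
    """Least upper bound on the binding-time lattice S < D."""
    return D if D in bts else S

def values_bt(values, bt):
    # a variable that depends on no value at all is static
    return join([bt[v] for v in values] or [S])

def use_insensitive_bt(values, contexts, bt, liftable):
    vbt = values_bt(values, bt)
    if liftable:
        return vbt  # dynamic contexts are handled by lifting instead
    # non-liftable: merge every use context into a single binding time
    return join([vbt] + [join([bt[c] for c in ctx]) for ctx in contexts])

# Fig. 1: p depends on a (static); used in contexts {s, 10} and {d, 20}.
bt = {'a': S, 's': S, 'd': D, 10: S, 20: S}
p_bt = use_insensitive_bt(['a'], [['s', 10], ['d', 20]], bt, liftable=False)
print(p_bt)  # the single dynamic use forces p entirely dynamic: D
```

Running the sketch shows how the one dynamic context contaminates every use of p, which is exactly the loss of accuracy the paper targets.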

values(id) = values on which id depends
context_e(id) = values on which the context of use at program point e depends

values-bt(id) = ⊔ ({ bt(c) | c ∈ values(id) } ∪ { S })
context-bt_e(id) = ⊔ { bt(c) | c ∈ context_e(id) }

Figure 7: Value and context binding times

                liftable value     non-liftable value
use-bt(id)      values-bt(id)      values-bt(id) ⊔ (⊔ { context-bt_e(id) | e ∈ uses(id) })
def-bt(id)      use-bt(id)         use-bt(id)

Figure 8: Computing def-bt() and use-bt() (use insensitive)

In the first example (Fig. 1), a use-insensitive analysis calculates the following information for pointer p.

use-bt(p) = S ⊔ (S ⊔ D) = D
def-bt(p) = use-bt(p) = D

A use-sensitive analysis computes a separate binding time for each use, and then combines this information in a novel way to compute the binding time of the definition. This can be seen in Fig. 9. Since binding times are computed separately for each use, we subscript use-bt() with the program point (e) corresponding to the use. For a variable whose value can be lifted, the use and the definition binding times are computed as in a use-insensitive analysis. For non-liftable values, however, the binding time of each use only takes into account the specific context in which the variable is used. This is where use sensitivity is obtained. The binding time of the corresponding definition, computed by the function def-bt(), must be able to reflect the fact that different uses of the same variable may now have different binding times. Therefore, we introduce a new least upper bound operation (⊔_def) which operates over a new domain and lattice, as seen in Fig. 9 and Fig. 10.

The purpose of this new lattice is to determine the binding time for a variable's definition. The new domain, which is the powerset of the normal binding-time domain, expresses all of the possible cases which may arise. A variable which only has static uses only needs a {S} definition, while one with only dynamic uses only needs a {D} definition. A variable which has both static and dynamic uses needs a definition which is both static and dynamic, or {S, D}. The annotation {} corresponds to dead code: a definition of a variable which is never used. With such a use-sensitive analysis, the following information would be calculated for pointer p in the example.

use-bt_1(p) = S ⊔ S = S
use-bt_2(p) = S ⊔ D = D
def-bt(p) = S ⊔_def D = {S, D}

In Fig. 3 we have already seen how uses are annotated and transformed. Now that we have introduced the novel way of annotating definitions, let us see the annotations and subsequent transformations of the corresponding definitions in this example. The code annotated with respect to use-insensitive information is seen in Fig. 11, and with respect to use-sensitive information in Fig. 12 (where overlined means {S}, underlined means {D}, both means {S, D}, and none indicates {}).

                liftable value     non-liftable value
use-bt_e(id)    values-bt(id)      values-bt(id) ⊔ context-bt_e(id)
def-bt(id)      use-bt(id)         ⊔_def { use-bt_e(id) | e ∈ uses(id) }

Figure 9: Computing def-bt() and use-bt() (use sensitive)

                     use binding times    definition binding times
domain               UBt = {S, D}         DBt = {{}, {S}, {D}, {S, D}}

lattice                    D                     {S, D}
                           |                     /    \
                           S                  {S}      {D}
                                                 \    /
                                                  {}

least upper bound          ⊔                    ⊔_def

Figure 10: Use and definition binding times

Binding-time annotations        Specialized w.r.t. s = 1
p = a;                          p = a;

Figure 11: Use-insensitive binding-time annotations and subsequent specialization

In both cases, we see that the definition is residualized, which follows from the fact that both definition annotations contain a dynamic component. In the use-sensitive case, the {S, D} annotation also contains a static component, indicating that the definition must also be evaluated during specialization, which is necessary in order to correctly specialize a subsequent static use.

In the second example (Fig. 4), we compute the following information for structure s1:

values(s1) = {}
context_1(s1) = {s, 3}
context_2(s1) = {d, 4}

values-bt(s1) = S
context-bt_1(s1) = bt(s) ⊔ bt(3) = S ⊔ S = S
context-bt_2(s1) = bt(d) ⊔ bt(4) = D ⊔ S = D

A use-insensitive analysis yields the following information.

use-bt(s1) = S ⊔ (S ⊔ D) = D
def-bt(s1) = use-bt(s1) = D

A use-sensitive analysis computes the following.

use-bt_1(s1) = S ⊔ S = S
use-bt_2(s1) = S ⊔ D = D
def-bt(s1) = S ⊔_def D = {S, D}
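The use-sensitive rules of Fig. 9 and the powerset lattice of Fig. 10 can be sketched as follows. This is a hypothetical illustration with names of our own choosing (join, use_bts, def_bt), not Tempo's code; a definition binding time is modeled as a frozenset over {'S', 'D'}.

```python
# Sketch of the use-sensitive rules: one binding time per use, and a
# definition binding time taken in the powerset lattice DBt.

S, D = 'S', 'D'

def join(bts):
    """Least upper bound on UBt (S < D)."""
    bts = list(bts)
    return D if D in bts else S

def use_bts(values_bt, contexts, bt):
    # one binding time per use: values-bt joined with that use's context only
    return [join([values_bt] + [bt[c] for c in ctx]) for ctx in contexts]

def def_bt(use_bts):
    # lub in DBt = {{}, {S}, {D}, {S, D}}: collect the distinct use binding times
    return frozenset(use_bts)

bt = {'a': S, 's': S, 'd': D, 3: S, 4: S, 10: S, 20: S}

# Pointer p (Fig. 1): one static use {s, 10}, one dynamic use {d, 20}
p_uses = use_bts(S, [['s', 10], ['d', 20]], bt)

# Structure s1 (Fig. 4): one static use {s, 3}, one dynamic use {d, 4}
s1_uses = use_bts(S, [['s', 3], ['d', 4]], bt)

# Both definitions get {S, D}: evaluated and residualized at specialization time.
print(def_bt(p_uses) == frozenset({S, D}), def_bt(s1_uses) == frozenset({S, D}))
```

Note how the dynamic use no longer contaminates the static one: the static use keeps binding time S, and only the definition records both components.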

Binding-time annotations        Specialized w.r.t. s = 1
p = a;                          p = a;

Figure 12: Use-sensitive binding-time annotations and subsequent specialization

The annotated and residualized definitions are found in Fig. 13 and Fig. 14. As with the pointer example, we see here that the dynamic annotations in both cases cause both definitions to be residualized. And in the use-sensitive case, the static component indicates that the definition will also be evaluated at specialization time.

Binding-time annotations        Specialized w.r.t. s1.s = 10
s1 = s2;                        s1 = s2;

Figure 13: Use-insensitive binding-time annotations and subsequent specialization

Binding-time annotations        Specialized w.r.t. s1.s = 10
s1 = s2;                        s1 = s2;

Figure 14: Use-sensitive binding-time annotations and subsequent specialization

3 Intra-Procedural Binding-Time Analysis

We shall illustrate use-sensitive binding-time analysis on the subset of C described in Fig. 15. This subset is small enough so that our presentation can be self-contained¹. We shall describe these analyses using a standard data-flow analysis framework (see, for instance, [2, 32]). We will then sketch how this work has been extended in order to deal with dynamic allocation and function calls.

¹ A number of other C constructs can be translated into this subset, e.g., assignment, comma and conditional expressions, goto statements (via the elimination procedure suggested by Hendren and Erosa [23]), as well as for and while loops. This strategy is followed by Tempo.

3.1 Preliminaries

Locations and States

We shall refer to the sets of values propagated by each binding-time analysis as states. States are elements of Location → Bt, where Bt is one of the previously defined lattice domains. There are two types of locations: variable locations and structure component locations. A location for a variable, if this variable is not a structure, is represented by a plain identifier. A location for a structure type type and component field is denoted type.field. We shall assume that, given a structure type, the function locations() returns the structure component locations associated with that type. Notice that associating locations with structure components on a structure-type and field basis allows different components of a given structure to have different binding times, but forces the same components of different structures of a given type to have the same binding time. This conservative handling has so far been sufficient to treat the existing systems code we have encountered, and can be extended to achieve more precision if needed. We shall denote by \ the binary operator from State × Locations → State resetting to bottom (S or {}, depending on the lattice) a set of locations.

Aliases and Definitions

We assume that, prior to binding-time analysis, an alias analysis and a definition analysis have been executed. The alias analysis gives, for each dereference expression *exp at program point e (superscripting is used to denote program points in the syntax of a program), the set aliases(e) of corresponding aliases, i.e., the set of locations the expression may represent. The definition analysis computes, for each statement at program point s, the set of locations defs(s) which may be defined by the statement (see Fig. 16). For a given assignment, a set containing multiple locations is computed when the location defined at run time cannot be determined statically. These ambiguous definitions are due either to aliasing or, with our representation of structures, to structure field assignment.
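The location model of Sect. 3.1 can be sketched directly. This is a hypothetical illustration (the function name location and the type name 'pair' are ours): a non-struct variable is its own location, while a struct component is located by its structure type and field, so two structs of the same type share component binding times, as the text notes.

```python
# Sketch of the location model: type.field locations are shared per type.

def location(var, var_type, field=None):
    if field is None:
        return var               # plain variable location
    return (var_type, field)     # type.field location, shared by all
                                 # structures of that type

# s1 and s2 (Fig. 4) both have the same structure type, here called 'pair':
loc_s1_s = location('s1', 'pair', 's')
loc_s2_s = location('s2', 'pair', 's')
print(loc_s1_s == loc_s2_s)  # same location, hence same binding time
```

This shows concretely why the handling is conservative: s1.s and s2.s can never receive different binding times under this representation.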

Domains:

const ∈ Integer
id ∈ Identifier
bop ∈ BinaryOperator

Abstract syntax:

exp ::= const                      constant
      | id                         variable reference
      | & lexp                     address of
      | * exp                      dereference
      | exp bop exp                binary expression
      | lexp.id                    component selection

lexp ::= id                        variable
       | * exp                     dereference
       | lexp.id                   component selection

stmt ::= lexp = exp                assignment
       | if ( exp ) stmt else stmt conditional statement
       | do stmt while ( exp )     loop
       | { stmt }                  block
       | return exp                return (one per program)

program ::= main ( id ) stmt

Figure 15: Syntax of C subset

statements:

lexp1^e1 =^s exp2^e2:
  defs(s) = defs-lexp(e1)
  unambiguous-defs(s) = (|defs(s)| = 1) → defs(s) ; {}

if^s (exp^e) stmt1^s1 else stmt2^s2:
  defs(s) = defs(s1) ∪ defs(s2)

do^s stmt0^s0 while (exp^e):
  defs(s) = defs(s0)

{ stmt1^s1 ... stmtn^sn }^s:
  defs(s) = ∪_{1 ≤ i ≤ n} defs(si)

return^s exp^e:
  defs(s) = {}

expressions:

is-a-struct(e): defs-lexp(e) = locations(type(e))
id^e: defs-lexp(e) = {id}
*^e exp0^e0: defs-lexp(e) = aliases(e)
lexp0^e0.^e id: defs-lexp(e) = {type(e0).id}

Figure 16: Definition analysis

If, on the other hand, a static analysis can deduce the location which will be defined at run time by a statement, then the definition is considered unambiguous. In this case, the function unambiguous-defs() returns the unambiguously defined location, which subsequently can be used to produce more accurate results. For example, an unambiguous definition permits a dynamic variable to become static in the forward binding-time analysis.

3.2 Forward Binding-Time Analysis

The FBTA propagates forward use binding-time states, elements of UState = Location → UBt. For any given program, the initial state contains S for input parameters declared as static and D for those declared as dynamic. The join operator ⊔ on use binding-time states is defined as a pointwise application of the least upper bound operator ⊔ on the UState function space range. The data-flow equations relating the state in(s) at the entry point of a statement at program point s and the state out(s) at the output of the same statement are given in Fig. 17, with the basic transfer functions given in Fig. 18.

The transfer function f_s() describes the evolution of the state resulting from an assignment at program point s. It assigns the assignment binding time, given by stmt-use-bt(s) (see Fig. 19), to each location in the set of possible definitions for the assignment. Note that the assignment binding time depends on the input state. If the assignment is ambiguous, a safe approximation has to be taken: the new binding time of each defined location is the least upper bound of its previous binding time and of the assignment binding time. If the assignment is unambiguous, the new binding time of the defined variable is the assignment binding time.

A second transfer function, f_{e,s}(), must be applied to conditional statement branches as well as to loop bodies. Such functions deal with assignments performed under dynamic control, i.e., assignments in the scope of a dynamic test, whose effects must, again, be safely approximated at the corresponding join point. For instance, let us consider the case of a variable which is assigned a static value in a branch of a conditional statement whose test is dynamic. At specialization time, the value of the variable after the join point will remain unknown. At execution time, if the branch is taken, the variable will be assigned a new value; if not, it will keep the value it had before entering the conditional statement. Such a variable has to be considered dynamic at the end of the branch². In general, taking for each location possibly defined in a branch the least upper bound of its use binding time and of the binding time of the test performs the proper safe approximation. In case of a dynamic test, all the locations possibly defined in the scope of the test are raised to dynamic. In case of a static test, the join operation has no effect; the transfer function is the identity function.

² An alternative could be to perform a continuation-based analysis, but code explosion would be a problem.

It is easy to show that (UState, ⊔) is a join semilattice of finite length (see, for instance, [39]) and that the set of transfer functions generated by closure under joins and compositions is a monotone operation space. This means (see, for instance, [2]) that the set of equations corresponding to any given program can be solved using one of the standard iterative algorithms. This produces, at each program point, the use binding times of the locations involved in the computation of the corresponding constructs, as well as the binding times of the constructs.
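The behavior of the two forward transfer functions can be sketched on Location → UBt states. This is a hypothetical illustration in our own notation (f_s, f_es, lub are our names), not the exact definitions of Fig. 18: an unambiguous assignment overwrites the target's binding time, an ambiguous one only joins, and a dynamic test raises every location possibly defined in its scope.

```python
# Sketch of the forward transfer functions for assignments and for
# statements under (possibly dynamic) control.

S, D = 'S', 'D'
lub = lambda a, b: D if D in (a, b) else S  # least upper bound on UBt

def f_s(state, defs, stmt_bt, unambiguous):
    """Assignment: overwrite unambiguous targets, join ambiguous ones."""
    new = dict(state)
    for loc in defs:
        new[loc] = stmt_bt if loc in unambiguous \
                   else lub(state.get(loc, S), stmt_bt)
    return new

def f_es(state, defs, test_bt):
    """Branch/loop body: join each possibly-defined location with the test."""
    return {loc: lub(bt, test_bt) if loc in defs else bt
            for loc, bt in state.items()}

state = {'x': D}
state = f_s(state, defs={'x'}, stmt_bt=S, unambiguous={'x'})
print(state['x'])                  # unambiguous static redefinition: x is S
print(f_es(state, {'x'}, D)['x'])  # same assignment under a dynamic test: D
```

The first call illustrates the accuracy gain the text mentions (a dynamic variable becoming static again); the second shows the safe approximation at a dynamic join point.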

3.3 Backward Binding-Time Analysis

The BBTA propagates backward definition binding-time states, elements of DState = Location → DBt. For any given program, the initial state state0 returns {} for any location, meaning that no variable is used after a return has been executed. The join operator ⊔_def on definition binding-time states is defined as a pointwise application of the least upper bound operator ⊔_def on the DState function space range.

The data-flow equations for the BBTA are given in Fig. 20, with the basic transfer functions given in Fig. 21. The transfer function f_s() computes the input state of an assignment from its output state as follows. In case of an unambiguous assignment, the defined location is reset to {}; there is no use binding time to be propagated up. The function def-bt() computes the definition binding time of the assignment by taking the least upper bound of the definition binding times of all the locations possibly defined by the assignment. The function stmt-use() updates the use summary of the locations used in the assignment, based on the definition binding time of the assignment and the results of the FBTA, retrieved via the function exp-use-bt(). The transfer function f_e() collects uses and updates the corresponding use summaries within the conditional and loop tests.

Again, DState and the set of transfer functions are such that the set of equations corresponding to any given program can be solved using one of the standard iterative algorithms. Solving the equations returns updated binding times combining value binding times and context binding times, together with proper definition binding times for the assignments. The function exp-use() annotates non-liftable static uses in a dynamic context as dynamic, and
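The essence of the backward pass can be sketched on straight-line code. This is a deliberately simplified, hypothetical illustration (the function bbta and its statement encoding are ours, and only unambiguous definitions are handled): walking statements in reverse, each use records its binding time against the used location, and at a definition the definition binding time is read off the accumulated uses, after which the location is reset to the empty use set.

```python
# Sketch of backward propagation of use binding times for straight-line
# code with unambiguous definitions only.

def bbta(stmts):
    """stmts: list, in program order, of ('use', loc, bt) or ('def', loc)."""
    pending = {}   # DState: location -> set of use binding times seen so far
    def_bts = {}
    for kind, loc, *rest in reversed(stmts):
        if kind == 'use':
            pending.setdefault(loc, set()).add(rest[0])
        else:
            # unambiguous definition: its binding time is the lub (in DBt)
            # of the pending uses, which are then consumed (reset to {})
            def_bts[loc] = frozenset(pending.pop(loc, set()))
    return def_bts

# Fig. 1 in miniature: p is defined once, then used statically and dynamically.
result = bbta([('def', 'p'), ('use', 'p', 'S'), ('use', 'p', 'D')])
# p's definition gets {S, D}: it must be both evaluated and residualized.
```

A definition with no subsequent uses comes out as {}, i.e., dead code, matching the interpretation given earlier for the DBt lattice.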

lexp1^e1 =^s exp2^e2:
  out(s) = f_s(in(s))

if^s (exp^e) stmt1^s1 else stmt2^s2:
  in(s1) = in(s)
  in(s2) = in(s)
  out(s) = f_{e,s1}(out(s1)) ⊔ f_{e,s2}(out(s2))

do^s stmt0^s0 while (exp^e):
  in(s0) = in(s) ⊔ f_{e,s0}(out(s0))
  out(s) = out(s0)

{ stmt1^s1 ... stmtn^sn }^s:
  in(s1) = in(s)
  in(s_{i+1}) = out(s_i), 1 ≤ i < n
  out(s) = out(s_n)

return^s exp^e:
  in(s) = out(s)

Figure 17: Forward binding-time analysis (data-flow equations for statements)

f_s(state) = { (loc, stmt-use-bt(stmt(s))) | loc ∈ defs(s) } ⊔ (state \ unambiguous-defs(s))

f_{e,s}(state) = { (loc, exp-use-bt(exp(e), state)) | loc ∈ defs(s) } ⊔ state

Figure 18: Forward binding-time analysis (the transfer functions)

statements:

stmt-use-bt(lexp1^e1 =^s exp2^e2) = lexp-use-bt(lexp1^e1, in(s)) ⊔ exp-use-bt(exp2^e2, in(s))
stmt-use-bt(if^s (exp^e) stmt1^s1 else stmt2^s2) = exp-use-bt(exp^e, in(s)) ⊔ stmt-use-bt(stmt1^s1) ⊔ stmt-use-bt(stmt2^s2)
stmt-use-bt(do^s stmt0^s0 while (exp^e)) = exp-use-bt(exp^e, out(s0)) ⊔ stmt-use-bt(stmt0^s0)
stmt-use-bt({ stmt1^s1 ... stmtn^sn }^s) = ⊔_{1 ≤ i ≤ n} stmt-use-bt(stmti^si)
stmt-use-bt(return^s exp^e) = exp-use-bt(exp^e, in(s))

expressions:

loc-use-bt(loc, state) = is-a-struct-loc(loc) → ⊔_{loc' ∈ locations(type(loc))} state(loc') ; state(loc)

right-hand side expressions:

exp-use-bt(const^e, _) = S
exp-use-bt(id^e, state) = loc-use-bt(id, state)
exp-use-bt(&^e lexp0^e0, state) = lexp-use-bt(lexp0^e0, state)
exp-use-bt(*^e exp0^e0, state) = exp-use-bt(exp0^e0, state) ⊔ (⊔_{loc ∈ aliases(e)} state(loc))
exp-use-bt(exp1^e1 bop^e exp2^e2, state) = exp-use-bt(exp1^e1, state) ⊔ exp-use-bt(exp2^e2, state)
exp-use-bt(lexp0^e0.^e id, state) = lexp-use-bt(lexp0^e0, state) ⊔ loc-use-bt(type(e0).id, state)

left-hand side expressions:

lexp-use-bt(id^e, state) = S
lexp-use-bt(*^e exp0^e0, state) = exp-use-bt(exp0^e0, state)
lexp-use-bt(lexp0^e0.^e id, state) = lexp-use-bt(lexp0^e0, state)

Figure 19: Forward binding-time analysis (binding-time annotation of program)

lexp1^e1 =^s exp2^e2:
  in(s) = f_s(out(s))

if^s (exp^e) stmt1^s1 else stmt2^s2:
  in(s) = f_e(in(s1) ⊔_def in(s2))
  out(s1) = out(s)
  out(s2) = out(s)

do^s stmt0^s0 while (exp^e):
  in(s) = in(s0)
  out(s0) = f_e(out(s)) ⊔_def f_e(in(s0))

{ stmt1^s1 ... stmtn^sn }^s:
  in(s) = in(s1)
  out(s_i) = in(s_{i+1}), 1 ≤ i < n
  out(s_n) = out(s)

return^s exp^e:
  in(s) = f_e(state0)

Figure 20: Backward binding-time analysis (data-flow equations for statements)

f_s(state) = (stmt-use(s, def-bt(s)) ⊔_def state) \ unambiguous-defs(s)
f_e(state) = exp-use(e, exp-use-bt(e)) ⊔_def state

auxiliary functions on statements:
    def-bt(s) = ⊔_{loc ∈ defs(s)} state(loc)
    stmt-use(s, def-bt) = lexp-use(lexp(s), def-bt) ⊔_def exp-use(exp(s), def-bt)

auxiliary functions on expressions:
    loc-use(loc, def-bt) = is-a-struct-loc(loc) → ∪_{loc' ∈ locations(type(loc))} {(loc', def-bt)} ; {(loc, def-bt)}

right-hand side expressions:
    exp-use(const_e, _)               = {}
    exp-use(id_e, S)                  = loc-use(id, S)
    exp-use(id_e, def-bt)             = (exp-use-bt(e) = S ∧ is-liftable(type(e))) → {(id, S)} ; loc-use(id, def-bt)
    exp-use(& lexp'_e, def-bt)        = lexp-use(lexp', def-bt)
    exp-use(* exp'_e, S)              = exp-use(exp', S)
    exp-use(* exp'_e, def-bt)         = (exp-use-bt(e') = S ∧ is-liftable(type(e))) → ∪_{loc ∈ aliases(e)} {(loc, S)} ; ∪_{loc ∈ aliases(e)} {(loc, def-bt)}
    exp-use(exp_1 bop_e exp_2, def-bt) = exp-use(exp_1, def-bt) ⊔_def exp-use(exp_2, def-bt)
    exp-use(lexp'.id_e, def-bt)       = lexp-use(lexp', def-bt) ⊔_def loc-use(type(e').id, exp-use-bt(e))

left-hand side expressions:
    lexp-use(id_e, _)             = {}
    lexp-use(* exp'_e, def-bt)    = exp-use(exp', def-bt)
    lexp-use(lexp'.id_e, def-bt)  = lexp-use(lexp', def-bt)

Figure 21: Backward binding-time analysis, the transfer functions.

stmt-use() annotates definitions with the least upper bound of the definition binding times of the locations defined. These binding times can now be understood in terms of specialization actions: constructs annotated {S} lead to evaluation actions, constructs annotated {D} to residualization actions, and assignments annotated {S, D} to dual evaluation and residualization actions. Assignments annotated {} correspond to dead code and are simply discarded.

3.4 The Complete Analysis

Although we have only presented the FBTA and the BBTA on a subset of C, our implementation treats nearly the full language, including arrays, dynamic memory allocation, break/continue, gotos, function calls, casts, and pre/post increment/decrement. We have chosen two of these features that we find particularly interesting, dynamic allocation and interprocedural aspects, and briefly summarize how each is handled in Tempo. Interprocedural details of the FBTA can be found in [27].

First, our strategy for treating dynamic allocation amounts to determining a binding-time description of allocated objects on a program-point basis. That is, each program point where dynamic allocation occurs is associated with a single location. As a result, the binding times of all the objects allocated at the same site are merged together. Based on our experience, this strategy is accurate enough to effectively specialize systems programs.

Second, our approach to making the analysis interprocedural and polyvariant is as follows. Each different call-site context is taken into account when analyzing the body of a function, which allows a more precise treatment of functions. The FBTA polyvariance is modeled after existing polyvariant binding-time analyses, which had previously only been implemented for functional languages (see, for example, [11]). For an imperative language like C, however, a function-call signature includes the binding times not only of the function's arguments but also of the non-local variables, since these variables, in addition to the actual parameters, determine the context of a function call. To avoid potential overspecialization, only those variables that are used before being defined in the function are taken into account. Also, if a binding-time function cache is used to avoid reanalyzing functions called in identical contexts, the cache must contain not only the already analyzed function but also the binding-time state of the non-local variables defined by the function.

The context that must be taken into account for the polyvariant BBTA likewise depends on the information at each call site that affects the analysis of a function. Use binding times are propagated backward to a variable definition; therefore, the set of uses of each non-local variable defined within a function must be taken into account at each call site. Functions are analyzed with respect to each different set of variable-use information. Bindings between formal and actual parameters are treated as usual in the FBTA: each formal parameter inherits the binding time of its corresponding actual parameter, and the body of the function is analyzed with respect to this value. In the BBTA, however, bindings are more complicated. Definitions of variables may come not only from assignments (as in the intra-procedural analysis) but also from an actual/formal binding. Consequently, the binding time of a parameter definition, like that of an assignment definition, can be static, dynamic, static and dynamic, or {}. Again, as with assignment definitions, this information summarizes the uses of the formal parameter within the function body. And, as with assignments, the specializer must be able to both evaluate and residualize the same actual/formal binding.

4 Experimental Results

We have previously identified operating systems as good candidates for specialization by hand-specializing certain existing operating systems components and achieving significant speedups [41, 48]. Using current partial-evaluation technology to automatically obtain these same speedups is not possible, as existing binding-time analyses are not precise enough to determine the transformations applied by hand. In fact, partial evaluation has never before been applied to existing code: it has always been necessary to write programs from scratch, or to take existing code and adapt it by hand, in order to achieve successful specialization. If partial evaluation is to be successfully applied to existing code, as we have subsequently shown in [37], a use-sensitive binding-time analysis is necessary. In this section, we summarize these results by comparing a use-sensitive analysis with a use-insensitive analysis on a variety of systems programs.

Let us now give a brief description of each program considered and present the key invariants used for its specialization. The first program, copy elim, involves typical message-packet manipulation found in network software [49]. The packets are handled via pointers to data; parts of this data are static (typically, some headers), while other parts are dynamic (the message itself). The second program, minix read, is a fragment of the Minix filesystem implementation [46]. Precisely, we specialized the higher-level routines of the read() system call with respect to a given file and a given size to be read.

Here also, the file descriptor is only partially static (e.g., the file mode is static, while the file offset is dynamic), and this structure is handled via a pointer. The third and fourth programs, client stub and marshalling, are two code fragments coming from Sun's remote procedure call (RPC) implementation [45]. The program client stub contains the client stub layer, and the program marshalling comes from the marshalling layer. We specialized these programs with respect to a given client/server interface, where various descriptors (file descriptors, socket descriptors, protocol descriptors, ...) were partially static.

Our experiment consisted of binding-time analyzing all four programs twice: first with a use-insensitive analysis, and then with a use-sensitive analysis. After each analysis, static statements and expressions were counted. These results are given in Table 1 for each program considered, together with its number of lines, statements, and expressions. The numbers of static statements and expressions are expressed as percentages. The percentages obtained by the use-insensitive and use-sensitive analyses are used to compute the gain produced by the latter. In the use-sensitive case, statements and expressions found to be both static and dynamic (i.e., SD) were not counted as static, since they appear in the residual program.

The main observation to make on Table 1 is that, on all programs, the use-sensitive analysis detects between 10% and 58% more static statements and expressions. Since specialization evaluates static constructs and residualizes dynamic constructs, the higher percentage of static constructs directly translates into a more optimized residual program. Indeed, the specialization of these programs clearly showed that all the invariants mentioned above were exploited as expected. In some cases, the resulting specialized programs were competitive with their manually specialized versions.

                          statements                              expressions
              number   total   % static    % static          total   % static    % static
              of lines number  use-insens. use-sens.  gain   number  use-insens. use-sens.  gain
copy elim       254     108      23%         69%       46%     352     27%         85%       58%
minix read      314     174      26%         46%       20%     378     41%         65%       24%
client stub     960     581      45%         54%       10%    1732     46%         60%       14%
marshalling     910     269      34%         54%       20%     887     38%         78%       40%

Table 1: Percentage gain of static statements and expressions of systems programs due to use sensitivity.

5 Related Work

Work related to use sensitivity has been considered from different perspectives and for different languages. Program transformations and value representations have been proposed to achieve some forms of use sensitivity, and some analyses have been developed to solve similar data-flow problems.

5.1 Static and Dynamic Representations for the Same Value

A number of existing analyses can maintain both a static representation (a concrete value, for static uses) and a dynamic representation (a textual representation, for dynamic uses) of the same value. For example, the partial evaluators FUSE [50] and Schism [11, 12] both carry around two representations of each closure, allowing it to be either applied or residualized depending on the context. Danvy et al. show how program transformations prior to partial evaluation can achieve similar results [19, 21]. These solutions suffice when each value has some dynamic representation, i.e., an appropriate piece of text that can replace the value if it needs to be residualized. This amounts to treating all values as liftable. For imperative languages like C, however, pointers, arrays, and structure values cannot be lifted. In certain cases, pointer values can be given dynamic representations based on the name of a variable (e.g., using &x for the address of x), but this is not always possible. For example, such representations are not valid interprocedurally (when local variable names are passed out of scope) or with dynamic memory allocation (where there is no variable name). Further, no such dynamic representation exists for structures. In these cases, use sensitivity is required to achieve accurate results.

On the other hand, use sensitivity can also be applied to liftable values, which in certain cases produces better specialization. For example, Schism incorporates a form of use sensitivity with respect to data structures. Even though data structures in Schism can always be lifted, which means dynamic uses do not pollute static uses, lifting many copies of the same data structure introduces more code and memory usage in the residual program. When a large, static data structure occurs in many dynamic contexts, it is not desirable to lift it and thus residualize it in many different places. By detecting this situation, Schism residualizes a single instance of the data structure along with multiple references to it, thus avoiding data duplication.

It should also be mentioned that any other analysis based on a similar two-level semantics (i.e., evaluate and residualize) will encounter similar problems. For example, much work has been done on constant propagation, which involves a phase where constant values are propagated, followed by a phase where computations that depend solely on constant values are folded [33]. These works consider only liftable values. If non-liftable values were propagated and used in folding expressions, a similar technique would need to be developed to resolve the problems discussed in this paper.

5.2 Partial Evaluation of C

C-Mix is a partial evaluator for C which handles arbitrary data structures [3, 5, 28]. However, since its binding-time analysis is use-insensitive, dynamic uses of non-liftable values interfere with static uses. As mentioned earlier, one way to circumvent the losses incurred by a use-insensitive analysis is to rewrite the code by hand, carefully separating static uses from dynamic uses. C-Mix attempts to automate such a separation by structure splitting. This technique splits a data structure into separate components by creating a new variable for each structure element. The process can be repeated recursively (on nested structures) until all structures are eliminated. If the fields of the initial structure are liftable values, then all of the corresponding new variables are liftable; and since liftable values do not incur a loss in flow-insensitive analyses, the problem is resolved. Although this approach is currently intra-procedural, Andersen proposes an interprocedural extension to structure splitting which would introduce a new function parameter for each field of a structure [5].

This approach, however, does not appear to scale up to realistic applications. As already mentioned, systems programs typically maintain a system state consisting of numerous nested data structures. For example, in the marshalling application considered in Sect. 4, the system state is represented by struct cu_data, a data structure containing a total of 29 fields. To pass this information interprocedurally, the structure is always passed via a pointer, which avoids copying each field at each function call. Andersen's proposed interprocedural extension defeats this technique, since each new parameter reintroduces the copying. Also, when dealing with large systems programs, it is typically the case that a small piece of the system is extracted and specialized. After specialization, the new, optimized piece must be reinserted into its larger context. For this reason, it is necessary to preserve the interface between the two parts; Andersen's proposed interprocedural extension does not preserve this interface.

5.3 Slicing

Our FBTA and BBTA have similarities with analyses used in program slicing [47]. Forward slicing techniques, which propagate information from variable definitions to variable uses, have even been used to define binding-time analyses for imperative programs [22]. This forward analysis is very similar to the forward part of our binding-time analysis. However, non-liftable values are not addressed by that work; no backward analysis is provided to treat values used in different contexts. On the other hand, there are also backward slicing techniques, similar to the backward part of our binding-time analysis, which propagate information from variable uses to variable definitions [42]. This can be viewed as a form of neededness information. Just as slicing computes which commands are needed in a slice, our backward analysis computes the binding times of definitions. The main difference is that, instead of using a two-point domain (needed, not needed), our analysis is performed over a four-point domain (static; dynamic; static and dynamic; and {}), since certain definitions may need to be both evaluated and residualized.

5.4 Arity Raising

Arity raising has been shown to be useful when specializing functional programs [43, 44]. The motivation for this work is to eliminate unnecessary data constructors and accessors, as well as to reduce function-call overhead. For example, consider a cons cell which is constructed from two values and then passed to a function which uses only one of them. Arity raising transforms the program by passing the two values to the function instead of the cons cell, which eliminates the initial construction and the subsequent cell access. Further, instead of passing both values, only the value used by the function needs to be passed. Arity raising, like our two-stage binding-time analysis, is achieved by combining a forward analysis with a backward analysis. In both cases, the forward phase determines the feasibility of a certain transformation. Our binding-time analysis determines whether a construct can

be evaluated at specialization time, while arity raising determines whether a data structure passed interprocedurally can indeed be split into its subcomponents. Likewise, both backward phases perform a form of neededness analysis: the binding-time analysis collects information about a variable's uses in order to determine the annotation to attach to the variable's definition, while arity raising determines which subcomponents are used by a function and therefore need to be passed. Notice how both require a backward analysis to propagate information from variable uses to variable definitions.

6 Conclusion

We introduce the idea of use sensitivity and show how a use-insensitive binding-time analysis incurs a loss of accuracy on applications that contain non-liftable values, such as operating systems code. We present a binding-time analysis which is use sensitive, consisting of a forward analysis followed by a backward analysis. A new lattice is used to calculate the binding times of definitions, since use sensitivity introduces the idea of both evaluating and residualizing the same definition. By implementing this analysis and testing it on existing systems code, we have found that it produces results with the precision necessary to achieve a high degree of specialization.

Acknowledgements

The authors would like to thank Julia Lawall for her comments on drafts of this paper as well as many fruitful discussions. We are also grateful for the help of Gilles Muller and Eugen-Nicolae Volanschi with the benchmarks. This research is supported in part by France Telecom/SEPT, ARPA grant N00014-94-1-0845, and NSF grant CCR-92243375.

References

[1] S.M. Abramov and N.V. Kondratjev. A compiler based on partial evaluation. In Problems of Applied Mathematics and Software Systems, pages 66-69. Moscow State University, Moscow, USSR, 1982. (In Russian).
[2] A.V. Aho, R. Sethi, and J.D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
[3] L.O. Andersen. Self-applicable C program specialization. In Partial Evaluation and Semantics-Based Program Manipulation, pages 54-61, San Francisco, CA, USA, June 1992. Yale University, New Haven, CT, USA. Technical Report YALEU/DCS/RR-909.
[4] L.O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, Denmark, 1994. DIKU Research Report 94/19.
[5] L.O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, Computer Science Department, University of Copenhagen, May 1994. DIKU Technical Report 94/19.
[6] P.H. Andersen. Partial evaluation applied to ray tracing. DIKU Research Report 95/2, DIKU, University of Copenhagen, Denmark, 1995.
[7] R. Baier, R. Glück, and R. Zöchling. Partial evaluation of numerical programs in Fortran. In Partial Evaluation and Semantics-Based Program Manipulation, Orlando, Florida, June 1994 (Technical Report 94/9, Department of Computer Science, University of Melbourne), pages 119-132, 1994.
[8] A.A. Berlin. Partial evaluation applied to numerical computation. In 1990 ACM Conference on Lisp and Functional Programming, Nice, France, pages 139-150. New York: ACM, 1990.
[9] A. Bondorf and J. Jørgensen. Efficient analyses for realistic off-line partial evaluation. Journal of Functional Programming, 3(3):315-346, July 1993.
[10] C. Consel. Polyvariant binding-time analysis for applicative languages. In Partial Evaluation and Semantics-Based Program Manipulation, Copenhagen, Denmark, June 1993, pages 66-77. New York: ACM, 1993.
[11] C. Consel. Polyvariant binding-time analysis for applicative languages. In PEPM93 [40], pages 66-77.
[12] C. Consel. A tour of Schism. In PEPM93 [40], pages 145-154.
[13] C. Consel. A tour of Schism: A partial evaluation system for higher-order applicative languages. In Partial Evaluation and Semantics-Based Program Manipulation, Copenhagen, Denmark, June 1993, pages 145-154. New York: ACM, 1993.
[14] C. Consel, L. Hornof, F. Noël, J. Noyé, and E.N. Volanschi. A uniform approach for compile-time and run-time specialization. In Danvy et al. [20], pages 54-72.

[15] C. Consel and S.C. Khoo. Semantics-directed generation of a Prolog compiler. In J. Maluszynski and M. Wirsing, editors, Programming Language Implementation and Logic Programming, 3rd International Symposium, PLILP '91, Passau, Germany, August 1991 (Lecture Notes in Computer Science, vol. 528), pages 135-146. Berlin: Springer-Verlag, 1991.
[16] C. Consel and F. Noël. A general approach for run-time specialization and its application to C. In Conference Record of the 23rd Annual ACM SIGPLAN-SIGACT Symposium on Principles Of Programming Languages, pages 145-156, St. Petersburg Beach, FL, USA, January 1996. ACM Press.
[17] C. Consel, C. Pu, and J. Walpole. Incremental specialization: The key to high performance, modularity and portability in operating systems. In PEPM93 [40], pages 44-46. Invited paper.
[18] C. Consel, C. Pu, and J. Walpole. Making production OS kernel adaptive: Incremental specialization in practice. Technical report, Department of Computer Science and Engineering, Oregon Graduate Institute of Science & Technology, 1994.
[19] O. Danvy. Type-directed partial evaluation. Technical Report PB-494, Computer Science Department, Aarhus University, July 1995.
[20] O. Danvy, R. Glück, and P. Thiemann, editors. Partial Evaluation, International Seminar, Dagstuhl Castle, number 1110 in Lecture Notes in Computer Science, February 1996.
[21] O. Danvy, K. Malmkjær, and J. Palsberg. The essence of eta-expansion in partial evaluation. Lisp and Symbolic Computation, 8(3):209-228, September 1995.
[22] M. Das, T. Reps, and P. Van Hentenryck. Semantic foundations of binding-time analysis for imperative programs. In ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, pages 100-110, La Jolla, CA, USA, 1995. ACM Press.
[23] A.M. Erosa and L.J. Hendren. Taming control flow: A structured approach to eliminating goto statements. In Proceedings of the IEEE 1994 International Conference on Computer Languages, May 1994.
[24] R. Glück, R. Nakashige, and R. Zöchling. Binding-time analysis applied to mathematical algorithms. In J. Doležal and J. Fidler, editors, System Modelling and Optimization, pages 137-146. Chapman and Hall, 1995.
[25] C. Goad. Automatic construction of special purpose programs. In D.W. Loveland, editor, 6th Conference on Automated Deduction, New York, USA (Lecture Notes in Computer Science, vol. 138), pages 194-208. Berlin: Springer-Verlag, 1982.
[26] C.K. Gomard and N.D. Jones. Compiler generation by partial evaluation. In G.X. Ritter, editor, Information Processing '89. Proceedings of the IFIP 11th World Computer Congress, pages 1139-1144. IFIP, Amsterdam: North-Holland, 1989.
[27] L. Hornof and J. Noyé. Accurate binding-time analysis for imperative languages: Flow, context, and return sensitivity. In ACM SIGPLAN Conference on Partial Evaluation and Semantics-Based Program Manipulation, Amsterdam, The Netherlands, June 1997. ACM Press. To appear.
[28] N.D. Jones, C. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. International Series in Computer Science. Prentice-Hall, June 1993.
[29] N.D. Jones and D.A. Schmidt. Compiler generation from denotational semantics. In N.D. Jones, editor, Semantics-Directed Compiler Generation, Aarhus, Denmark (Lecture Notes in Computer Science, vol. 94), pages 70-93. Berlin: Springer-Verlag, 1980.
[30] J. Jørgensen. Compiler generation by partial evaluation. Master's thesis, DIKU, University of Copenhagen, Denmark, 1992. Student Project 92-1-4.
[31] P. Kleinrubatscher, A. Kriegshaber, R. Zöchling, and R. Glück. Fortran program specialization. In Workshop Semantikgestützte Analyse, Entwicklung und Generierung von Programmen, Justus-Liebig-Universität Giessen, Germany, 1994. (To appear).
[32] T.J. Marlowe and B.G. Ryder. Properties of data flow frameworks. Acta Informatica, 28(2):121-163, December 1990.
[33] R. Metzger and S. Stroud. Interprocedural constant propagation: An empirical study. ACM Letters on Programming Languages and Systems, 2(1-4):213-232, March-December 1993.
[34] T. Mogensen. The application of partial evaluation to ray-tracing. Master's thesis, DIKU, University of Copenhagen, Denmark, 1986.
[35] T. Mogensen. Partially static structures in a self-applicable partial evaluator. In D. Bjørner, A.P. Ershov, and N.D. Jones, editors, Partial Evaluation and Mixed Computation, pages 325-347. Amsterdam: North-Holland, 1988.
[36] T. Mogensen and A. Bondorf. Logimix: A self-applicable partial evaluator for Prolog. In K.-K. Lau and T. Clement, editors, LOPSTR 92, Workshops in Computing. Berlin: Springer-Verlag, January 1993.
[37] G. Muller, R. Marlet, E.N. Volanschi, C. Consel, C. Pu, and A. Goel. Fast, optimized Sun RPC using automatic program specialization. Publication interne 1094, Irisa, Rennes, France, March 1997.
[38] G. Muller, E.N. Volanschi, and R. Marlet. Scaling up partial evaluation for optimizing a commercial RPC protocol. Rapport de recherche 1068, Irisa, Rennes, France, December 1996. Also published in ACM SIGPLAN Conference on Partial Evaluation and Semantics-Based Program Manipulation, 1997.
[39] H.R. Nielson and F. Nielson. Semantics with Applications: A Formal Introduction. Wiley Professional Computing. John Wiley & Sons, 1991.
[40] Partial Evaluation and Semantics-Based Program Manipulation, Copenhagen, Denmark, June 1993. ACM Press.
[41] C. Pu, T. Autrey, A. Black, C. Consel, C. Cowan, J. Inouye, L. Kethana, J. Walpole, and K. Zhang. Optimistic incremental specialization: Streamlining a commercial operating system. In Proceedings of the 1995 ACM Symposium on Operating Systems Principles, pages 314-324, Copper Mountain Resort, CO, USA, December 1995. ACM Operating Systems Review, 29(5). ACM Press.
[42] T. Reps and T. Turnidge. Program specialization via program slicing. In Danvy et al. [20], pages 409-429.
[43] S.A. Romanenko. A compiler generator produced by a self-applicable specializer can have a surprisingly natural and understandable structure. In D. Bjørner, A.P. Ershov, and N.D. Jones, editors, Partial Evaluation and Mixed Computation, pages 445-463. Amsterdam: North-Holland, 1988.
[44] S.A. Romanenko. Arity raiser and its use in program specialization. In N. Jones, editor, ESOP '90, 3rd European Symposium on Programming, Copenhagen, Denmark, May 1990 (Lecture Notes in Computer Science, vol. 432), pages 341-360. Berlin: Springer-Verlag, 1990.
[45] Sun Microsystems. Network Programming Guide, March 1990.
[46] A.S. Tanenbaum. Operating Systems: Design and Implementation. Prentice-Hall, 1987.
[47] F. Tip. A survey of program slicing techniques. Report CS-R9438, Computer Science, Centrum voor Wiskunde en Informatica, 1994.
[48] E.N. Volanschi, G. Muller, and C. Consel. Safe operating system specialization: the RPC case study. In Workshop Record of WCSSS'96, The Inaugural Workshop on Compiler Support for Systems Software, pages 24-28, Tucson, AZ, USA, February 1996.
[49] E.N. Volanschi, G. Muller, C. Consel, L. Hornof, J. Noyé, and C. Pu. A uniform automatic approach to copy elimination in system extensions via program specialization. Rapport de recherche 2903, Inria, Rennes, France, June 1996.
[50] D. Weise, R. Conybeare, E. Ruf, and S. Seligman. Automatic online partial evaluation. In J. Hughes, editor, Functional Programming Languages and Computer Architecture, volume 523 of Lecture Notes in Computer Science, pages 165-191, Cambridge, MA, USA, August 1991. Springer-Verlag.