
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-13, NO. 6, JUNE 1987

A Space-Efficient Optimization of Call-by-Need

F. WARREN BURTON, DIETER MAURER, HANS-GEORG OBERHAUSER, AND REINHARD WILHELM

Abstract-Call-by-need is widely regarded as an optimal (to within a constant factor) parameter passing mechanism for functional programming languages. Except for certain special cases involving higher order functions, call-by-need is optimal with respect to time. However, call-by-need is far from optimal with respect to space. We examine some of the space problems which can arise with call-by-need and other parameter passing mechanisms. A simple optimizing technique, based on work by Mycroft [1], is proposed. If it can be determined both that an expression must be evaluated eventually and that the evaluation of the expression is likely to reduce the space required by the program, then the evaluation is performed as soon as possible. This optimization does not result in optimal space performance in all cases. However, in most of the common cases where call-by-need causes a problem the proposed optimization avoids the problem. Since our technique is not always optimal, it is likely to be of greatest advantage in situations where efficiency is important but not critical. For example, functional languages with call-by-name semantics are increasingly being used as specification languages. Since such a specification is runnable, it may be used as a prototype. This makes it possible to experiment with a program and refine the specification before the implementation in the target language is started.

Index Terms-Call-by-need, functional programming, optimization, space, strictness.

Manuscript received October 31, 1984; revised July 31, 1985. This work was supported in part by the National Science Foundation under Grants ECS-8312748 and DCM-8514946. F. W. Burton was supported in part and H.-G. Oberhauser was supported by SFB124 "VLSI-Design-Methods and Parallelism." F. W. Burton is with the Department of Computer Science, University of Utah, Salt Lake City, UT 84112. D. Maurer, H.-G. Oberhauser, and R. Wilhelm are with the Fachbereich 1-Informatik, Universitaet des Saarlandes, D-6600 Saarbruecken, West Germany. IEEE Log Number 8714445.

I. INTRODUCTION

CALL-BY-NEED [2] is used as a universal parameter passing mechanism in numerous functional language implementations. Call-by-need combines the advantages of call-by-name and call-by-value. Whenever a function application of the form f(arg) has to be evaluated, the evaluation of f is started before the argument expression arg is computed. The argument is computed only if its value is needed to continue the computation in the course of evaluating f. Once evaluated, the value of arg is kept for future use. If overhead is ignored, then the amount of work required to evaluate a program using call-by-need is as small as possible using any evaluation mechanism, except for certain special cases involving higher order functions. (We will describe such a special case in Section VI-C.)

Implementing call-by-need generally leads to some overhead, thus increasing the running time of programs. Implementations based on lambda calculus style reduction form a closure whenever an argument is passed using call-by-need. This provides a suitable environment in case the argument has to be evaluated later. (This overhead can be avoided in combinator based implementations [3].) Mycroft [1] has discussed the problem of converting call-by-need to call-by-value in certain "safe" cases to avoid the overhead of call-by-need. If call-by-value is used, arguments are evaluated before the corresponding function body is computed. Therefore no closures are needed. The conversion is regarded as safe if it can be determined by the compiler that the parameter must eventually be evaluated, provided that the program terminates.

We are going to use Mycroft's methods to point out the usefulness of call-by-need to call-by-value transformations with respect to space. We are not interested in the constant factor overhead in certain implementations mentioned above. We regard the evaluation of a functional program as a reduction process on expressions. The program together with its input is repeatedly transformed according to reduction rules until a normal form (the result) is produced. We are interested in the space requirements of the intermediate expressions which are calculated in the course of this reduction. It does not matter whether a lambda calculus or combinator reduction scheme is used to implement the language. We will use a reduction process at the level of the functional language itself instead of worrying about low level representations (i.e., lambda expressions or combinators).

With many programs, the size of these intermediate expressions, and therefore the storage requirements, will grow linearly with the execution time of the program if call-by-need is used.
In some of these cases, use of a different parameter passing mechanism may result in a program which will run in constant space and will not require any increase in execution time (except for a constant factor increase or decrease due to the overhead of parameter passing).

One solution to the above problem is to give the programmer explicit control over parameter passing and evaluation order. Burton [4] has proposed annotations for doing this. In a serious production system where efficiency is critical, this may be the most promising way to proceed.

At present, functional languages are suitable for small problems where constant factor efficiency is not overly important. In addition, functional programming languages are increasingly being used as formal specification languages. One advantage of this is that the specification is runnable. The specification immediately gives a slow but complete prototype. This makes it possible to debug and refine a specification before the implementation of a production program is started. In applications such as these, the primary concern in the design of the language is ease of use. In particular, as far as possible the programmer should not have to worry about operational details such as choice of the parameter passing mechanism.

We will consider the problem of automatically selecting a suitable parameter passing mechanism, taking into account both time and space. The optimal choice will not always be made. For example, in some cases there will be a time-space tradeoff (e.g., space requirements may be reduced if some work is performed early whether it will be used or not). In such cases no single right choice exists. In general, even when one parameter passing mechanism is strictly better than another, the problem of determining this is undecidable. More commonly, the problem of determining the best parameter passing mechanism is intractable. However, a relatively simple strategy will capture a lot of the possible optimizations while avoiding any transformation which would decrease the performance, so a programmer who expects call-by-need can only be pleasantly surprised.

In Section II we will give an informal sketch of Mycroft's algorithm. In Section III we will look at several examples of what can go wrong with respect to the storage requirements of a functional program. The proposed solution is presented in Section IV. The examples of Section III are reconsidered in Section V, and shown to perform well with respect to space with the proposed optimization. Section VI considers the limitations of the approach and examines situations where the proposed optimization fails to produce the desired results. Section VII is the conclusion.

0098-5589/87/0600-0636$01.00 © 1987 IEEE
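The call-by-need mechanism described in this section (delay the argument, evaluate it at most once, keep the value) can be illustrated with a minimal Python sketch. The names `Thunk`, `force`, `expensive`, `uses_twice`, and `ignores` are hypothetical and chosen only for illustration; they are not taken from the paper, and real implementations differ in how closures are represented.

```python
class Thunk:
    """A delayed argument: evaluated at most once, then cached (call-by-need)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function standing in for the closure
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._compute = None  # drop the closure so its space can be reclaimed
            self._done = True
        return self._value


calls = []  # records every real evaluation of the argument expression


def expensive(x):
    calls.append(x)
    return x * x


def uses_twice(arg):
    # The argument is needed twice but computed only once.
    return arg.force() + arg.force()


def ignores(arg):
    # The argument is never needed, so it is never computed.
    return 0


result_twice = uses_twice(Thunk(lambda: expensive(7)))
result_ignored = ignores(Thunk(lambda: expensive(9)))
```

Until `force` is called, the thunk keeps its closure alive; that retained closure is the kind of storage the rest of the paper is concerned with.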

II. MYCROFT'S APPROACH

Following Mycroft [1] we may use an abstract interpretation to determine which call-by-need parameters may be safely passed by value. We will give an informal description of this approach. We will assume that a program is a collection of first order recursive equations of the form:

$f_1(x_1, \ldots, x_{m_1}) \Leftarrow \mathit{exp}_1$
$\vdots$
$f_n(x_1, \ldots, x_{m_n}) \Leftarrow \mathit{exp}_n$

We may pass the ith parameter of a function f by value if in any function call $f(e_1, \ldots, e_n)$, $e_i$ is eventually evaluated provided the evaluation of $f(e_1, \ldots, e_n)$ terminates (i.e., if the evaluation of a function call $f(e_1, \ldots, e_n)$ does not terminate whenever the evaluation of $e_i$ does not terminate). Therefore all values of terminating computations are identified in the abstract interpretation and are called t (for terminating). We also assign a "value" n (for nonterminating) to all nonterminating computations and we investigate the effect of n, when interpreted over the domain $B := \{n, t\}$, ordered by $n < t$.

We assume knowledge of a monotonic function $N[s]: B^n \to B$ for each system defined (primitive) function s that describes whether a call $s(e_1, \ldots, e_n)$ will terminate depending on the termination of the $e_i$. More precisely, for each function f we want $N[f]$ to have the following property:

1) $N[f]$ is monotonic, and
2) let $\{i_1, \ldots, i_k\}$ be any set of positions in $\{1, \ldots, n\}$ such that $N[f](x_1, \ldots, x_n) = n$ if $x_{i_j} = n$ for $j = 1, \ldots, k$. Then the evaluation of $f(e_1, \ldots, e_n)$ will not terminate provided the evaluation of $e_{i_j}$ does not terminate for $j = 1, \ldots, k$.

Starting with this information about system defined functions we can compute $N[f]$ for a user defined (nonprimitive) function f. Having computed $N[f]$ we can easily determine whether the ith parameter of f may be passed by value by testing whether $N[f](x_1, \ldots, x_n) = n$ for $x_i = n$ and $x_j = t$ for $j \neq i$.

To compute $N[f]$ we associate a monotonic function $N_n[T]: B^n \to B$ with each subterm T of expr in a function definition $f(x_1, \ldots, x_n) = \mathit{expr}$. $N_n[T]$ is defined inductively on the structure of expressions by:

$N_n[g(e_1, \ldots, e_m)](x) = N[g](N_n[e_1](x), \ldots, N_n[e_m](x))$
$N_n[k](x) = t$, for constant k
$N_n[x_i](x) = x_i$

where x is a shorthand for $(x_1, \ldots, x_n)$.

We assume here that a program is first order, i.e., functions are not allowed as function parameters or results of function applications. For a first order program we get an equation system

$N[f_1] = N_{m_1}[\mathit{exp}_1]$
$\vdots$
$N[f_n] = N_{m_n}[\mathit{exp}_n]$

with variables $N[f_1], \ldots, N[f_n]$ over $[B^{m_1} \to B], \ldots, [B^{m_n} \to B]$. Any solution of this system can be used to determine which parameter may be passed by value, as shown above. The least fixed point yields the best results and can be computed iteratively by starting with $N[f_i]^0$, the constant function of the respective arity yielding always n.
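As an illustration of the iteration to the least fixed point, the following Python sketch applies the abstract interpretation to the function g(a, m) <= if m = 0 then a else g(f(m) + a, m - 1) used as an example in Section III-A. It is a sketch under stated assumptions, not the authors' implementation: the abstract functions chosen for the primitives (`strict` for arithmetic, comparison, and f; `n_if` for if-then-else) are assumptions made here in the spirit of Mycroft's analysis, and all Python names are hypothetical. The domain B = {n, t} is encoded as `False < True`.

```python
from itertools import product

N, T = False, True  # the two-point domain B = {n, t}, ordered n < t


def strict(*args):
    """Assumed N[s] for strict primitives (+, -, =, f): n if any argument is n."""
    return all(args)


def n_if(cond, then_branch, else_branch):
    """Assumed N[if]: the test must terminate, and so must at least one branch."""
    return cond and (then_branch or else_branch)


POINTS = list(product((N, T), repeat=2))  # all abstract argument pairs (a, m)


def step(table):
    """Interpret the body of g abstractly, using `table` as the current N[g]."""
    new = {}
    for a, m in POINTS:
        test = strict(m, T)            # m = 0   (the constant 0 maps to t)
        first = strict(strict(m), a)   # f(m) + a
        second = strict(m, T)          # m - 1   (the constant 1 maps to t)
        new[(a, m)] = n_if(test, a, table[(first, second)])
    return new


def least_fixed_point():
    table = {point: N for point in POINTS}  # N[g]^0: constantly n
    while True:
        nxt = step(table)
        if nxt == table:
            return table
        table = nxt


n_g = least_fixed_point()
# Parameter i may be passed by value if N[g] yields n when x_i = n and the rest are t.
pass_a_by_value = n_g[(N, T)] == N
pass_m_by_value = n_g[(T, N)] == N
```

Under these assumptions the iteration stabilizes after two steps, and both parameters of g come out as safe to pass by value.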

III. STORAGE PROBLEMS

The functional programming language we are going to use throughout the paper is based on the lambda calculus but allows the definition of named functions and provides the usual features including arithmetic, list operators, if-then-else, etc. The syntax and semantics of our example programs should be clear. Furthermore we will assume that tail recursion can be removed by the compiler and that any data object which becomes inaccessible may be immediately reclaimed by the storage management system.

The size of expressions is defined inductively on their structure in a natural way:

size(x) = 1, x variable
size(n) = 1, n natural number
size(f(x1, ..., xn)) = size(f) + size(x1) + ... + size(xn)
size(if b then t else f) = 1 + size(b) + size(t) + size(f)
size(a op b) = size(a) + size(b) + 1

where op is some binary operator such as +, -, *, etc.

Example: A list of n items a1, ..., an of size one each is represented as cons(a1, cons(a2, ..., cons(an, NIL) ...)) and requires n + n + 1 = O(n) space.

A. Problems with Call-by-Need

In this section we give an example of the problems with space if call-by-need is used as the parameter passing mechanism. Consider the problem of computing f(1) + f(2) + ... + f(n) for any given n. Suppose f(i) can be computed in O(1) time and space. Clearly g(0, n) will produce the desired result, where g is defined as follows:

g(a, m) <= if m = 0 then a else g(f(m) + a, m - 1).

With call-by-value, this tail recursive function will execute in O(n) time and O(1) space. However, with call-by-need the computation of g(0, 4) would proceed as follows:
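The growth of the first argument of g can be made concrete with a small Python sketch. It is not the authors' trace: it represents the unevaluated accumulator as a nested tuple, measures it with the size rules given above (assuming size(f) = 1), and uses a hypothetical stand-in `f(i) = i * i`. The names `g_trace`, `size`, and `evaluate` are introduced here for illustration only.

```python
def f(i):
    # Hypothetical stand-in for an O(1) function f.
    return i * i


def size(e):
    """Size of an expression, following the inductive definition in the text."""
    if isinstance(e, int):
        return 1                                # a number has size 1
    if e[0] == "+":
        return size(e[1]) + size(e[2]) + 1      # size(a op b)
    if e[0] == "f":
        return 1 + size(e[1])                   # size(f(x)), assuming size(f) = 1
    raise ValueError("unknown expression")


def evaluate(e):
    """Reduce an expression tree to a number."""
    if isinstance(e, int):
        return e
    if e[0] == "+":
        return evaluate(e[1]) + evaluate(e[2])
    return f(evaluate(e[1]))


def g_trace(n, by_value):
    """Run g(0, n) and record the size of the accumulator at each call."""
    a, m = 0, n
    sizes = [size(a)]
    while m != 0:                # tail recursion written as a loop
        a = ("+", ("f", m), a)   # the argument expression f(m) + a
        if by_value:
            a = evaluate(a)      # call-by-value: reduce before the call
        m -= 1                   # m is needed by the test m = 0, so it is evaluated
        sizes.append(size(a))
    return evaluate(a), sizes


need_result, need_sizes = g_trace(4, by_value=False)
value_result, value_sizes = g_trace(4, by_value=True)
```

With the accumulator left unevaluated, its size increases by a constant at every call, which is the linear growth in space the text describes; with the accumulator reduced before each call, its size stays at 1.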

In this case, the second parameter of select will grow at each level of recursion, if it is passed using call-by-need. For example

select(5, count(1, 10)) will generate the intermediate result

head(tail(tail(tail(tail(count(1, 10)))))); so this algorithm requires O(m) space. The reader should note that the storage requirements vary linearly with the execution time in these examples. Many real functional programs contain functions having structures similar to the ones in these examples.
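The definitions of select and count are not reproduced in the text above, so the following Python sketch assumes them: select(n, x) is taken to be "if n = 1 then head(x) else select(n - 1, tail(x))", and count(a, b) to be the list a, a + 1, ..., b. Under those assumptions it rebuilds the intermediate expression quoted above and contrasts it with a version in which tail(x) is evaluated before each recursive call. The function names are hypothetical.

```python
def select_need(n, x):
    """Call-by-need view: the second argument stays an unevaluated expression.

    `x` is the text of the list expression; each level of recursion wraps it
    in one more pending tail(...).
    """
    while n != 1:
        x = f"tail({x})"
        n -= 1
    return f"head({x})"


def count(a, b):
    """Assumed meaning of count(a, b): the elements a..b, produced on demand."""
    return iter(range(a, b + 1))


def select_value(n, xs):
    """Second argument passed by value: tail(x) is evaluated before each call."""
    while n != 1:
        next(xs)   # evaluating tail(x) discards the head immediately
        n -= 1
    return next(xs)


pending = select_need(5, "count(1, 10)")
selected = select_value(5, count(1, 10))
```

In the first version the pending expression grows by one `tail(...)` per level of recursion; in the second nothing accumulates.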

The key observation in this section is: call-by-need causes storage problems because necessary reduction steps which reduce space requirements were delayed too long.

B. Problems with Call-by-Value

The opposite problem to that of the previous examples may arise with call-by-value. That is, evaluation steps which cause an increase in storage requirements may be done too soon. Consider the expression

tail_sum(0, map(f, count(1, n)))

where

tail_sum(a, x) <= if x = nil then a else tail_sum(a + head(x), tail(x))

and

map(f, x) <= if x = nil then nil else cons(f(head(x)), map(f, tail(x)))

This expression is equivalent in value to g(0, n) in our previous example. If call-by-value is used, then a list of length n will be produced by count, requiring O(n) storage. However, if call-by-need is used except that the first parameter of tail_sum is passed by value then this
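The contrast between the two evaluation orders for this expression can be sketched in Python, with generators standing in for lists built by call-by-need and an ordinary list standing in for call-by-value. This is an analogy under stated assumptions rather than the paper's reduction model: `lazy_map` and the stand-in `f(i) = i * i` are hypothetical, and `count(a, b)` is assumed to enumerate a, a + 1, ..., b.

```python
def count(a, b):
    """Produce a, a + 1, ..., b on demand; only one element exists at a time."""
    i = a
    while i <= b:
        yield i
        i += 1


def lazy_map(fn, xs):
    """Apply fn to each element only when the consumer asks for it."""
    for x in xs:
        yield fn(x)


def tail_sum(a, xs):
    """Accumulator evaluated immediately, as if passed by value."""
    for x in xs:
        a = a + x
    return a


def f(i):
    # Hypothetical stand-in for an O(1) function f.
    return i * i


n = 1000
# List produced on demand, accumulator reduced at once: no list is ever stored.
lazy_total = tail_sum(0, lazy_map(f, count(1, n)))
# Everything evaluated up front: the whole list of length n exists before summing.
eager_list = [f(i) for i in range(1, n + 1)]
eager_total = tail_sum(0, eager_list)
```

Both variants compute the same sum; they differ only in whether a list of length n is materialized before the summation starts.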
