Experiments with the Call-by-Value Partial Evaluator

M. Alpuente†    M. Falaschi‡    G. Vidal*†

March 4, 1998
Abstract. This paper summarizes our experience gained using the Indy system, a narrowing-driven partial evaluator for functional logic programs which combines in a useful and effective way the propagation of partial data structures, by means of logical variables and unification, with better opportunities for optimization thanks to the functional dimension. Indy allows the user to select either a call-by-value (innermost, eager) or a call-by-name (outside-in, lazy) narrowing strategy to construct local narrowing trees. The trees are stopped according to an unfolding rule, which is another parameter of the implementation, and then resultants are extracted from the root-to-leaf paths of the tree. A short description of the system is given and some experimental results are presented for the call-by-value (innermost) evaluation strategy. The experiments show in practice that Indy can give significant performance improvements and is quite effective on several interesting transformation problems.
1
Introduction
Partial evaluation (PE) is a semantics-preserving optimization technique for computer programs which consists of the specialization of the program w.r.t. parts of its input [10, 19]. Recently, a narrowing-driven PE method (NPE) for functional logic programs has been proposed in [4]. The use of narrowing allows NPE to handle expressions containing partial information (by means of logical variables and unification) in a natural way, as well as to consider efficient evaluation strategies and to include a deterministic simplification phase which provides better opportunities for optimization and improves both the overall specialization and the efficiency of the method [5]. Using the terminology of [16], NPE produces both polyvariant and polygenetic specializations, i.e. it can produce different specializations for the same function definition, and it can also combine distinct original function definitions into a comprehensive specialized function. The NPE framework follows a structure similar to the framework developed by Martens and Gallagher [25] for partial deduction of logic programs, where a clear distinction between local and global control is made. Roughly speaking, local control concerns the construction of partial narrowing trees for single terms, while global control is dedicated to ensuring the closedness of the partially evaluated program, producing sufficiently many specialized versions of each original function definition without risking nontermination. The algorithm starts by partially evaluating the initial set of terms, and then it recursively specializes the terms which are introduced dynamically during this process. Appropriate unfolding and abstraction (generalization) operators are needed to ensure that infinite unfolding is not attempted (local termination) and that the set of (to be) partially evaluated terms is kept finite throughout the specialization process (global termination). The framework is parametric w.r.t.
the narrowing strategy which is used for the automatic construction of (finite) narrowing trees from which partial evaluations are extracted. A call-by-value instance of the generic algorithm is formalized in [5] using innermost (normalizing)

* This work has been partially supported by CICYT under grant TIC 95-0433-C03-03 and by HCM project CONSOLE.
† DSIC, U. P. Valencia, Camino de Vera s/n, 46022 Valencia, Spain ({alpuente,gvidal}@dsic.upv.es).
‡ Dip. Matematica e Informatica, U. Udine, Via delle Scienze 206, 33100 Udine, Italy ([email protected]).
narrowing, while [3] formalizes a call-by-name instance based on lazy narrowing. The Indy system [2] (Integrated Narrowing-Driven specialization system) is a rather concise implementation of the narrowing-driven partial evaluator of [3, 4] and allows the user to select either an innermost or a lazy narrowing strategy. The system is written in SICStus Prolog v3.6 and is publicly available. In this paper, a short description of the Indy system is given and some points are worked out which may be of general interest for the future development of similar systems: the termination criteria, the kind of optimizations that can be achieved, and the advantages of using normalization. Some experimental results are also presented for the call-by-value (innermost) evaluation strategy which show in practice that Indy can give significant performance improvements and is quite effective on several interesting transformation problems. Actually, our experiments show that Indy combines some good features of deforestation [30], PE [19], and partial deduction [24], similarly to Turchin's supercompilation [29]: the system passes the so-called "KMP-test" (which neither standard PE nor deforestation pass), and it is also able to do deforestation (which cannot be done by means of standard PE or partial deduction [16]) without any ad-hoc artifice. The paper is organized as follows. Section 2 briefly recalls the call-by-value PE algorithm used by the system. Section 3 reports on some experiments with the call-by-value partial evaluator, while Section 4 contains some theses of general interest resulting from our experiments, each of them illustrated by representative examples. Finally, Section 5 concludes the paper and gives some directions for further developments. 2
The Call-by-Value Partial Evaluator
The Indy system has been constructed at the Technical University of Valencia as a part of the CPD project sponsored by CICYT¹. The complete implementation consists of about 400 clauses (2000 lines of code). The partial evaluator is expressed by 95 clauses, the metainterpreter by 115 clauses (including the code needed to handle the ground representation), the parser and other utilities by 95 clauses, the user interface by 50 clauses, and the post-processing renaming by 45 clauses. The implementation only considers unconditional programs. Conditional programs are treated by using the predefined functions and, if_then_else, and case_of, which are reduced by standard defining rules (see, e.g., [26]). The current implementation allows the user to select either a call-by-name (lazy) or a call-by-value (innermost) evaluation strategy. Normalization between narrowing steps is also allowed (using a terminating subset of the program rules). The specializer also performs a post-processing renaming phase which is useful to automatically fulfill the independence condition as well as to ensure that the left-hand sides of the specialized rules are patterns [3]. In this section, we recall the essentials of the call-by-value narrowing-driven PE algorithm introduced in [5], an instance of the generic NPE algorithm of [4] which uses the normalizing innermost narrowing relation of [13] to build the local narrowing trees. The requirements that programs are constructor-based, completely-defined, canonical TRSs are necessary for the completeness of this strategy and are thus assumed in the remainder of this section. In the following, ⇝ stands for the normalizing innermost narrowing relation which formalizes the computation steps. Specialized program rules are extracted from narrowing derivations using the notion of resultant.

Definition 2.1 (resultant) Let s be a term and R be a program. Given a narrowing derivation s ⇝* t computing the substitution σ, the associated resultant is the rewrite rule σ(s) → t.
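As a toy executable illustration of resultant extraction (our own sketch, not taken from the Indy sources; the term representation is hypothetical: variables as Python strings, an application f(t1,...,tn) as a tuple):

```python
# Our own toy illustration of Definition 2.1: a resultant is the rule
# sigma(s) -> t obtained from a narrowing derivation s ~>*_sigma t.
# Hypothetical term representation: variables are Python strings,
# an application f(t1,...,tn) is the tuple (f, t1, ..., tn).

def apply_subst(term, sigma):
    """Apply the substitution sigma (a dict from variables to terms)."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return (term[0],) + tuple(apply_subst(a, sigma) for a in term[1:])

def resultant(s, t, sigma):
    """The rewrite rule sigma(s) -> t associated with s ~>*_sigma t."""
    return (apply_subst(s, sigma), t)

# e.g. narrowing append(Xs,Ys) with the rule append([],Y) -> Y binds
# Xs to [] and yields Ys; the resultant is append([],Ys) -> Ys.
lhs, rhs = resultant(('append', 'Xs', 'Ys'), 'Ys', {'Xs': ('nil',)})
```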
The partial evaluation of a term s is obtained by constructing a (possibly incomplete) narrowing tree for s in R, and then extracting the specialized definitions (the resultants) from the root-to-leaf paths of the tree.

Definition 2.2 (partial evaluation) Let R be a program and s be a term. Let τ be a finite (possibly incomplete) narrowing tree for s in R. Let {t1, ..., tn} be the terms in the leaves of τ.

¹ CICYT is the Spanish National Funding Agency.
Then, the set of resultants associated with the narrowing sequences {s ⇝*_σi ti | i = 1, ..., n} is called a partial evaluation of s in R. The partial evaluation of a set of terms S in R is defined as the union of the partial evaluations of the terms s in S.
Appropriate independence and closedness conditions are needed to ensure the correctness of partial evaluations. Essentially, the independence condition ensures that no additional answers are produced in the specialized program, while the closedness condition is aimed at ensuring that the resultants form a complete description covering all calls that may occur at run-time. Formal definitions can be found in [5]. An unfolding rule is used to guarantee that no infinite unfolding will be attempted. We now recall the nonembedding unfolding rule of [5]. The following definition extends the homeomorphic embedding ("syntactically simpler") relation [12] to nonground terms.
Definition 2.3 (embedding relation) The homeomorphic embedding relation ⊴ on terms is defined as the smallest relation satisfying: x ⊴ y for all variables x, y, and s = f(s1, ..., sm) ⊴ g(t1, ..., tn) = t if and only if:

1. f = g (with m = n) and si ⊴ ti for all i = 1, ..., n, or
2. s ⊴ tj, for some j, 1 ≤ j ≤ n.
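As an executable illustration (our own sketch, not part of the paper), the embedding test of Definition 2.3 can be coded directly, under a hypothetical term representation: a variable is a Python string, and an application f(t1,...,tn) is the tuple (f, t1, ..., tn).

```python
# Our own sketch of the homeomorphic embedding test of Definition 2.3.
# Hypothetical term representation: variables are Python strings, an
# application f(t1,...,tn) is the tuple (f, t1, ..., tn).

def embeds(small, big):
    """True iff small is embedded in big (small "syntactically simpler")."""
    s_var, b_var = isinstance(small, str), isinstance(big, str)
    if s_var and b_var:
        return True                  # x is embedded in y for all variables
    if b_var:
        return False                 # a non-variable never embeds into a variable
    # big = g(t1,...,tn); try the coupling rule first:
    if (not s_var and small[0] == big[0] and len(small) == len(big)
            and all(embeds(s, t) for s, t in zip(small[1:], big[1:]))):
        return True                  # same symbol/arity, argumentwise embedding
    # diving rule: small is embedded in some argument t_j of big
    return any(embeds(small, t) for t in big[1:])

# e.g. f(X) is embedded in f(g(X)), but f(X) is not embedded in g(X)
```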
The following criterion makes use of the embedding relation in a constructive way to produce finite narrowing trees. We say that two terms are comparable if they have the same outermost function symbol. A narrowing derivation t0 ⇝ t1 ⇝ ... ⇝ tn is admissible if there are no two comparable terms ti, tj (i < j) in the derivation such that ti ⊴ tj.

Definition 2.4 (nonembedding narrowing tree) Given a term t0 and a program R, the nonembedding narrowing tree τ⊴(t0, R) for t0 in R is formed by the narrowing derivations t0 ⇝ ... ⇝ tn ⇝ tn+1 such that the following conditions hold:

1. the derivation (t0 ⇝ ... ⇝ tn) is admissible, and
2. (a) the leaf tn+1 only contains constructor symbols (a successful derivation), or
   (b) the leaf tn+1 is not a constructor term and it does not contain any reducible expression (a failing derivation), or
   (c) tn+1 embeds some previous term ti, i ≤ n, in the derivation (an incomplete derivation, which is cut off because there is a risk of nontermination).
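The admissibility check amounts to a pairwise embedding test along the derivation. A self-contained illustration (our own Python sketch, with a hypothetical term representation: variables as strings, f(t1,...,tn) as tuples):

```python
# Our own sketch of the admissibility check of Definition 2.4.
# Hypothetical term representation: variables are strings, an
# application f(t1,...,tn) is the tuple (f, t1, ..., tn).

def embeds(s, t):
    """Homeomorphic embedding of s in t (Definition 2.3)."""
    sv, tv = isinstance(s, str), isinstance(t, str)
    if sv and tv:
        return True
    if tv:
        return False
    return ((not sv and s[0] == t[0] and len(s) == len(t)
             and all(embeds(a, b) for a, b in zip(s[1:], t[1:])))
            or any(embeds(s, b) for b in t[1:]))

def comparable(s, t):
    """Two terms are comparable if they share the outermost symbol."""
    return (not isinstance(s, str) and not isinstance(t, str)
            and s[0] == t[0])

def admissible(derivation):
    """No earlier term may be embedded in a later comparable term."""
    return not any(comparable(ti, tj) and embeds(ti, tj)
                   for j, tj in enumerate(derivation)
                   for ti in derivation[:j])
```

Note that, e.g., a derivation f(X) ⇝ g(f(X)) is still admissible: although f(X) is embedded in g(f(X)), the two terms are not comparable.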
Nonembedding narrowing trees are finite, as shown in [5]. We define the nonembedding unfolding rule U⊴(s, R) as the set of resultants associated with the derivations in τ⊴(s, R). The last parameter of the NPE algorithm is the abstraction operator. This operator is used to guarantee that the set of terms obtained during PE is kept finite, while still providing the right amount of specialization. In [5] we introduced a nonembedding abstraction operator which uses a simple kind of structure (configuration), consisting of sequences of terms q = (t1, ..., tn), that we manipulate in such a way that termination of the specialization algorithm is guaranteed. The abstraction operator makes use of the notion of "most specific generalization". A generalization of the nonempty set of terms {t1, ..., tn} is a pair ⟨t, {θ1, ..., θn}⟩ such that, for all i = 1, ..., n, θi(t) = ti. The pair ⟨t, Θ⟩ is the most specific generalization (msg) of a set of terms S, written msg(S), if (1) ⟨t, Θ⟩ is a generalization of S, and (2) for every other generalization ⟨t′, Θ′⟩ of S, t′ is more general than t. The msg of a set of terms is unique up to variable renaming [21]. Given a configuration q, in order to add a new term t (headed by a defined function symbol), the function abstract proceeds as follows.
If t does not embed any comparable term in q, then t is simply added to q. If t embeds several comparable terms of q, then it considers the rightmost one in the sequence, say t′, and distinguishes two cases:
- if t is an instance of t′ with substitution θ, then it recursively attempts to add the terms in θ to q;
- otherwise, the msg of t′ and t is computed, say ⟨w, {θ1, θ2}⟩, and then it attempts to add w as well as the terms in θ1 and θ2 to the configuration resulting from removing t′ from q.

Below we sketch the instance of the NPE algorithm for call-by-value specialization. To simplify the presentation, we consider the specialization w.r.t. an initial term t (the extension to sets of terms is straightforward). We let R_calls denote the rhs's of the rules in R.
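The msg used by abstract can be computed by classical first-order anti-unification. A rough, self-contained sketch (our own, with a hypothetical term representation: variables as strings, f(t1,...,tn) as tuples; the fresh-variable naming scheme is our choice, not prescribed by the paper):

```python
import itertools

# Our own sketch of msg computation by first-order anti-unification.
# Variables are strings; an application f(t1,...,tn) is (f, t1, ..., tn).

def msg(s, t, table=None, fresh=None):
    """Most specific generalization of two terms (unique up to renaming)."""
    table = {} if table is None else table          # disagreement pairs seen so far
    fresh = itertools.count() if fresh is None else fresh
    if (not isinstance(s, str) and not isinstance(t, str)
            and s[0] == t[0] and len(s) == len(t)):
        # same outermost symbol and arity: generalize argumentwise
        return (s[0],) + tuple(msg(a, b, table, fresh)
                               for a, b in zip(s[1:], t[1:]))
    if s == t:
        return s
    # disagreement: the same pair always maps to the same fresh variable,
    # which is what makes the generalization most specific
    if (s, t) not in table:
        table[(s, t)] = '_G%d' % next(fresh)
    return table[(s, t)]

# e.g. msg(f(s(X), 0), f(s(0), s(Y))) = f(s(_G0), _G1)
```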
Algorithm 2.5
Input: a program R and a term t
Output: a set of terms S
Initialization: i := 0; q_0 := (t)
Repeat
  1. R′ := U⊴(q_i, R);
  2. q_{i+1} := abstract(q_i, R′_calls);
  3. i := i + 1;
Until q_i = q_{i-1}
Return S := q_i

The output of the algorithm is not a partial evaluation, but a set of terms S from which the partial evaluation U⊴(S, R) is easily derived. Algorithm 2.5 incorporates only the scheme of a complete call-by-value partial evaluator. The resulting partial evaluations might be further optimized by eliminating redundant functors and unnecessary repetition of variables. Also, since we allow the specialization of terms containing nested function symbols, the residual program might also contain nested function symbols in the lhs's of program rules. Then, a post-processing renaming transformation can be useful to restore the constructor discipline. The renaming phase may also serve to remove any remaining lack of independence in the set of specialized terms. The formal definition of the post-processing renaming transformation can be found in [3]. Roughly speaking, the renaming transformation proceeds as follows. First, given the set of terms S which results from the execution of Algorithm 2.5, an independent renaming S′ is computed. Essentially, S′ contains a pair of terms ⟨s, s′⟩ for each term s ∈ S, such that s′ consists of a fresh function symbol whose arguments are the distinct variables in s. Then, each resultant in R′ = U⊴(S, R) is renamed according to S′, i.e. we perform a fold on every function call in R′ (replacing the original term by a call to the newly defined function) using the corresponding renaming in S′. The following section reports on some experiments with the Indy system which confirm that our approach pays off in practice and gives significant speedups for standard problems. 3
Experimental Results
In order to assess the practicality of the call-by-value partial evaluator, we have benchmarked the speed and the specialization achieved by the Indy system. Further information about the system can be found in [2]. We consider a set of benchmark programs which cover a range of (functional as well as logic) program transformation problems. The benchmarks used for the analysis are:
ackermann, the classical Ackermann's function; allones, which transforms all elements of a list into 1; applast, which appends an element to the end of a given list and returns the last element of the resulting list; double app, which concatenates three lists by performing two (nested) calls to append; double flip, which flips a tree structure twice, then returning the original tree back; fibonacci, Fibonacci's function; kmp, a semi-naive string pattern matcher; length app, which appends two lists and then computes the length of the resulting list; max length, which returns the maximum and the length of a list; palindrome, a program to check whether a given list is a palindrome; reverse, the well-known reverse with accumulating parameter; rev acc type, equal to reverse but with an extra type-check; sumprod, which obtains the sum and the product of the elements of a list. Some of the examples are typical partial deduction benchmarks (see [20, 22]) adapted to a functional syntax, while others come from the literature on functional program transformations, such as positive supercompilation [28], fold/unfold transformations [8, 11], and deforestation [30]. Tables 1 and 2 reproduce the source code of the benchmark programs as well as the specialized calls.

double app:
    append([],Y) -> Y
    append([X|R],Y) -> [X|append(R,Y)]
    call: append(append(X,Y),Z)

double flip:
    double_flip(T) -> flip(flip(T))
    flip(leaf(N)) -> leaf(N)
    flip(tree(L,N,R)) -> tree(flip(R),N,flip(L))
    call: double_flip(T)

applast:
    applast(L,X) -> last(append(L,[X]))
    last([X]) -> X
    last([X|R]) -> last(R)
    append([],Y) -> Y
    append([X|R],Y) -> [X|append(R,Y)]
    call: applast(L,X)

ackermann:
    ackermann(N) -> ack(s(s(0)),N)
    ack(0,N) -> s(N)
    ack(s(M),0) -> ack(M,s(0))
    ack(s(M),s(N)) -> ack(M,ack(s(M),N))
    call: ackermann(N)

allones:
    f(L) -> allones(length(L))
    allones(0) -> []
    allones(s(N)) -> [1|allones(N)]
    length([]) -> 0
    length([H|T]) -> sum(s(0),length(T))
    sum(0,Y) -> Y
    sum(s(X),Y) -> s(sum(X,Y))
    call: f(L)

length app:
    lengthapp(L1,L2) -> length(append(L1,L2))
    length([]) -> 0
    length([H|T]) -> sum(s(0),length(T))
    sum(0,Y) -> Y
    sum(s(X),Y) -> s(sum(X,Y))
    append([],X) -> X
    append([H|T],X) -> [H|append(T,X)]
    call: lengthapp(L1,L2)

Table 1: Benchmark programs and specialized calls (I).
fibonacci:
    fib(0) -> s(0)
    fib(s(0)) -> s(0)
    fib(s(s(N))) -> sum(fib(s(N)),fib(N))
    sum(0,Y) -> Y
    sum(s(X),Y) -> s(sum(X,Y))
    call: fib(N)

reverse:
    reverse(L) -> rev(L,[])
    rev([],A) -> A
    rev([H|T],A) -> rev(T,[H|A])
    call: reverse(L)

sumprod:
    sumprod(L) -> sum(sumlist(L),prodlist(L))
    sumlist([]) -> 0
    sumlist([H|T]) -> sum(H,sumlist(T))
    prodlist([]) -> s(0)
    prodlist([H|T]) -> prod(H,prodlist(T))
    sum(0,Y) -> Y
    sum(s(X),Y) -> s(sum(X,Y))
    prod(0,Y) -> 0
    prod(s(X),Y) -> sum(prod(X,Y),Y)
    call: sumprod(L)

rev acc type:
    rev([],A) -> A
    rev([H|T],A) -> cond(islist(A),rev(T,[H|A]))
    islist([]) -> true
    islist([H|T]) -> islist(T)
    cond(true,A) -> A
    call: rev(L,[])

kmp:
    match(P,S) -> loop(P,S,P,S)
    loop([],SS,OP,OS) -> true
    loop([P|PP],[],OP,OS) -> false
    loop([P|PP],[S|SS],OP,OS) -> if(eq(P,S),loop(PP,SS,OP,OS),next(OP,OS))
    next(OP,[]) -> false
    next(OP,[S|SS]) -> loop(OP,SS,OP,SS)
    if(true,A,B) -> A
    if(false,A,B) -> B
    eq(a,a) -> true
    eq(b,b) -> true
    eq(a,b) -> false
    eq(b,a) -> false
    call: match([a,a,b],S)

max length:
    maxlen(L) -> pair(max(L,0),length(L))
    length([]) -> 0
    length([X|R]) -> s(length(R))
    max([],M) -> M
    max([X|R],N) -> if(leq(X,N),max(R,N),max(R,X))
    if(true,A,B) -> A
    if(false,A,B) -> B
    leq(0,0) -> true
    leq(0,s(M)) -> true
    leq(s(N),0) -> false
    leq(s(N),s(M)) -> leq(N,M)
    call: maxlen(L)

palindrome:
    palindrome(L) -> eqlist(reverse(L),L)
    reverse(L) -> rev(L,[])
    rev([],L) -> L
    rev([X|L],Y) -> rev(L,[X|Y])
    eqlist([],[]) -> true
    eqlist([A|RA],[B|RB]) -> if(eq(A,B),eqlist(RA,RB),false)
    if(true,A,B) -> A
    if(false,A,B) -> B
    eq(0,0) -> true
    eq(0,s(M)) -> false
    eq(s(N),0) -> false
    eq(s(N),s(M)) -> eq(N,M)
    call: palindrome([s(0),s(s(0))|R])

Table 2: Benchmark programs and specialized calls (II).
3.1
The Experiments
Table 3 contains the considered runtime calls, where t stands for the call executed in the initial program and t' for the (renamed) call executed in the specialized program.

ackermann:
    t  = ackermann(s(s(s(s(s(s(0))))))) = N
    t' = ackermann1_1(s(s(s(s(s(s(0))))))) = N
allones:
    t  = f([0,0,0,0,0,0,0,0,0]) = L
    t' = f1_1([0,0,0,0,0,0,0,0,0]) = L
applast:
    t  = applast([1,2,3,4,5,6,7,8,9,1,2,3,4,5],6) = N
    t' = applast2_1([1,2,3,4,5,6,7,8,9,1,2,3,4,5],6) = N
double app:
    t  = append(append([1,2,3,4,5,6,7,8,9],[1,2]),L) = [1,2,3,4,5,6,7,8,9,1,2,3]
    t' = append3_1([1,2,3,4,5,6,7,8,9],[1,2],L) = [1,2,3,4,5,6,7,8,9,1,2,3]
double flip:
    t  = double_flip(tree(tree(leaf(1),2,tree(leaf(1),2,leaf(3))),2,tree(leaf(1),2,tree(leaf(1),2,tree(leaf(1),2,leaf(3)))))) = T
    t' = double_flip1_1(tree(tree(leaf(1),2,tree(leaf(1),2,leaf(3))),2,tree(leaf(1),2,tree(leaf(1),2,tree(leaf(1),2,leaf(3)))))) = T
fibonacci:
    t  = fib(s(s(s(s(s(s(s(s(0))))))))) = N
    t' = fib1_1(s(s(s(s(s(s(s(s(0))))))))) = N
kmp:
    t  = match([a,a,b],[a,a,a,a,b]) = W
    t' = match1_1([a,a,a,a,b]) = W
length app:
    t  = lengthapp([s(0),0,s(s(0)),s(0)],[s(s(0)),s(0),0,s(0),0]) = N
    t' = lengthapp2_1([s(0),0,s(s(0)),s(0)],[s(s(0)),s(0),0,s(0),0]) = N
max length:
    t  = maxlen([s(0),s(s(0)),0,s(s(s(0))),s(0),s(s(0))]) = T
    t' = maxlen1_1([s(0),s(s(0)),0,s(s(s(0))),s(0),s(s(0))]) = T
palindrome:
    t  = palindrome([s(0),s(s(0)),0,s(0),s(s(0)),s(0),0,s(s(0)),s(0)]) = B
    t' = palindrome1_1([0,s(0),s(s(0)),s(0),0,s(s(0)),s(0)]) = B
reverse:
    t  = reverse([1,2,3,4,5,6,7,1,2,3,4,5,6,7]) = L
    t' = reverse1_1([1,2,3,4,5,6,7,1,2,3,4,5,6,7]) = L
rev acc type:
    t  = rev([1,2,3,4,5,6],[]) = L
    t' = rev1_1([1,2,3,4,5,6]) = L
sumprod:
    t  = sumprod([s(0),s(s(0)),s(s(s(0))),s(0),s(0),s(s(0))]) = N
    t' = sumprod1_1([s(0),s(s(0)),s(s(s(0))),s(0),s(0),s(s(0))]) = N
Table 3: Runtime calls for the original and specialized programs.

Table 4 summarizes our timing results. Specialization times were measured on a HP 712/60 workstation, running under HP Unix v10.01 (column MixTime). Speedups were computed by running the original (R) and specialized (R_Mix) programs under the publicly available innermost FL system LPG in order to be fair and have realistic evaluations. For technical reasons, execution times were measured on a SUN SparcStation IPX/40, running under SUN OS v5.2 (columns Time R and Time R_Mix, respectively). Times are expressed in milliseconds and are not normalized. All times are the average of 10 executions. The fourth column indicates the speedups. Unfortunately, we have not been able to compare the compiled code sizes, since programs are interpreted in LPG. 4
Analyzing the Results
Let us briefly analyze our results. Some benchmarks are common examples used to illustrate the ability of a program transformer to perform deforestation [30]. This is a test that neither standard PE nor partial deduction can pass. Essentially, the aim of deforestation is the elimination of useless intermediate data structures, thus reducing the number of passes over the data. For instance, in the benchmarks double app and double flip, an intermediate data structure (a list and a tree, respectively) is created during the computation. After specialization, the program is completely deforested.

Benchmark       MixTime    Time R    Time R_Mix    Speedup
ackermann         430        975        966          1,01
allones            40        275        217          1,27
applast            40      1.015        416          2,44
double app         30        708        464          1,53
double flip        30        252        160          1,57
fibonacci          90      1.464      1.464          1
kmp             1.090        934         14         66,71
length app        110        287        168          1,71
max length      1.120      2.432        365          6,66
palindrome      1.630        746        541          1,38
reverse            50        893        825          1,08
rev acc type       90        720      1.087          0,66
sumprod         1.190      1.498      1.389          1,08

Table 4: Runtimes and speedups.

(LPG is a functional logic language implemented at LSR-IMAG, Grenoble, France, and is publicly available at http://www-lsr.imag.fr/Les.Groupes/scop/f-logiciel.html.)

The specialized code we obtain for double flip is:

double_flip1_1(leaf(A)) -> leaf(A)
double_flip1_1(tree(A,B,C)) -> tree(flip1_1(A),B,flip1_1(C))
flip1_1(leaf(A)) -> leaf(A)
flip1_1(tree(A,B,C)) -> tree(flip1_1(A),B,flip1_1(C))
which runs 1,57 times faster than the original program (a similar speedup has been achieved for double app, see Table 4). This seems to indicate that narrowing-driven PE is especially well-suited for performing deforestation. Deforestation algorithms, however, do not recognize when an expression contains two or more functions that consume the same data structure. A further source of improvement can therefore originate from the elimination of (unnecessary) multiple traversals of the same data structure. This is shown by the benchmarks allones, applast, and length app. For instance, the program length app traverses the input lists twice: once for appending them and once for counting the number of elements they contain. Indy produces the following specialized program for length app, which traverses the arguments only once:

lengthapp2_1([],[]) -> 0
lengthapp2_1([],[A|B]) -> s(length1_1(B))
lengthapp2_1([A|B],C) -> s(length2_1(B,C))
length1_1([]) -> 0
length1_1([A|B]) -> s(length1_1(B))
length2_1([],[]) -> 0
length2_1([],[A|B]) -> s(length1_1(B))
length2_1([A|B],C) -> s(length2_1(B,C))
and which runs 1,71 times faster than the original program. The benchmarks ackermann, fibonacci, max length, and sumprod are classical programs where tupling [9] can produce a significant improvement. The tupling transformation eliminates parallel traversals of identical data structures by merging loops together into a new recursive function defined on tuples (pairs), which traverses the data structures only once. The transformation can also eliminate some unnecessary recursive calls. Table 4 shows that Indy is not able to perform all tupling automatically. The partial evaluation of Fibonacci's function actually gives back the original program. Only max length has been sped up. Further investigation is needed to study how this kind of optimization can be achieved. Generally, tupling is very complicated, and automatic tupling algorithms either result in high runtime cost (which prevents them from being employed in a real system), or they succeed only for a restricted class of programs. It would be interesting to investigate the approach of Leuschel et al. [23] in our setting. They have recently shown that
conjunctive partial deduction is able to achieve most of the tupling of the fold/unfold approach, with lower complexity and an easier understanding of control issues. The benchmarks reverse and rev acc type are difficult due to the presence of an accumulating parameter and a checklist (in the case of rev acc type). This makes it difficult to produce a more efficient program without jeopardizing termination, and most PE systems either do not terminate or fail to achieve any specialization. For these benchmarks, Indy obtains a speedup factor of about 1,08 and a slowdown of 0,66, respectively. This is mainly due to the fact that some intermediate calls appear which are not covered by previous definitions. Namely, we obtain a sequence of calls of the form rev(L,[]), rev(L',[X']), rev(L'',[X'',X']), etc. To avoid nontermination, the partial evaluator is forced to generalize, thus losing all possible specialization. A similar situation arises with palindrome, since this program makes use of the reverse function. More sophisticated techniques are required for satisfactory specialization. Our last example is a kind of standard test in PE and similar techniques: the specialization of a semi-naive pattern matcher for a fixed pattern into an efficient algorithm (sometimes called "the KMP-test" [28]). This example is particularly interesting because it provides a kind of optimization that neither (conventional) PE nor deforestation can achieve. The specialization of the program kmp to the pattern [a,a,b] is the almost optimal program:

match1_1([]) -> false
match1_1([a]) -> false
match1_1([b|A]) -> loop1_1(A)
match1_1([a,a]) -> false
match1_1([a,b|A]) -> loop1_1(A)
match1_1([a,a,a|A]) -> if1_1(A)
match1_1([a,a,b|A]) -> true
loop1_1([]) -> false
loop1_1([a]) -> false
loop1_1([b|A]) -> loop1_1(A)
loop1_1([a,a]) -> false
loop1_1([a,b|A]) -> loop1_1(A)
loop1_1([a,a,a|A]) -> if1_1(A)
loop1_1([a,a,b|A]) -> true
if1_1([]) -> false
if1_1([b|A]) -> true
if1_1([a|A]) -> if1_1(A)
which never backs up on the subject string when a mismatch is detected: whenever three consecutive a's are found, the algorithm looks for one b, instead of attempting to match the whole pattern [a,a,b] from scratch. This is essentially a KMP-style pattern matcher which runs 66 times faster than the original program for the considered input string. However, the cost of unification with the lhs's of the rules is not irrelevant, and performance could still be improved. We have obtained the best specialization by using the unfolding rule which expands derivations while outermost function symbols are predefined (setting ostrans): Indy gave the desired optimal KMP specialized matcher:

match1_1(A) -> loop1_3(A)
loop1_3([]) -> false
loop1_3([a|A]) -> loop1_2(A)
loop1_3([b|A]) -> loop1_3(A)
loop1_2([]) -> false
loop1_2([a|A]) -> loop1_1(A)
loop1_2([b|A]) -> loop1_3(A)
loop1_1([]) -> false
loop1_1([a|A]) -> loop1_1(A)
loop1_1([b|A]) -> true

which is really a KMP-style pattern matcher for the pattern [a,a,b]. We note that the success of the specialization highly depends upon the use of bit-strings, i.e. strings containing only a's and b's.
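Transliterated into Python (our own rendering, for illustration only), each loop1_i function plays the role of a DFA state recording how much of the pattern [a,a,b] is currently matched, so the subject string is scanned exactly once, with no backing up:

```python
# Our own Python rendering (for illustration) of the optimal specialized
# matcher for the pattern [a,a,b]. Following the bit-string assumption
# in the text, any character other than 'a' is treated as 'b'.

def match1_1(s):
    return loop1_3(s)

def loop1_3(s):        # no prefix of the pattern matched yet
    if not s:
        return False
    return loop1_2(s[1:]) if s[0] == 'a' else loop1_3(s[1:])

def loop1_2(s):        # one 'a' matched
    if not s:
        return False
    return loop1_1(s[1:]) if s[0] == 'a' else loop1_3(s[1:])

def loop1_1(s):        # two consecutive 'a's matched
    if not s:
        return False
    return loop1_1(s[1:]) if s[0] == 'a' else True
```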
Propagation of negative information could be helpful to achieve a similar effect for a general alphabet [28]. The code for all the specialized benchmarks can be obtained by running the system Indy, whose distribution includes all the example programs shown here. Our experiments show that the partial evaluator combines in a useful and effective way the propagation of partial data structures (by means of logical variables and unification) with better opportunities for optimization (thanks to the functional dimension). In general, the inclusion of deterministic simplification steps (normalization) has improved both the overall specialization and the efficiency of the method. We have also observed that normalization is essential for succeeding (with the call-by-value evaluation strategy) in benchmarks such as double app, double flip, and length app, where deforestation (or elimination of multiple traversals of the same data structure) is the main optimization to be done. Indy combines some good features of deforestation, PE, and PD, similarly to Turchin's supercompilation: the system passes the so-called "KMP-test" (which neither standard PE nor deforestation pass), and it is also able to do deforestation (which cannot be done by means of standard PE or PD [16]) without any ad-hoc artifice. 5
Conclusions and Further Research
Few attempts have been made to study the relationship between PE techniques used in logic and functional languages (see e.g. [6, 15, 27]). On the contrary, there is a parallel development of program transformation ideas within the functional programming and the logic programming communities, with too little discussion between them. This separation has the negative consequence of duplicated work, since developments are not shared and many similarities are overlooked. As pointed out by Sørensen et al. [28], a direct comparison of methods is often blurred due to different language paradigms and different perspectives. We think that the unified (functional-logic) treatment of the problem lays the ground for comparisons and brings the different methodologies closer, which could generate new insights for further developments in both fields. In [4] we have presented an automatic, on-line PE algorithm for functional logic programs whose behavior does not depend on the eager or lazy nature of the narrower, and we have shown that it guarantees closedness of the residual program as well as termination of the transformation. Useful instances of this generic algorithm can easily be defined by considering a fixed narrowing strategy. In this work, we have considered the case of normalizing innermost narrowing, since it has been shown that this strategy is a reasonable improvement over pure SLD resolution [13, 17]. We think that our results can give support and induce new research in hybrid approaches to specialization. To provide empirical evidence of the practicality of our approach, we have presented some experimental results obtained using the system Indy. The results from this preliminary implementation are quite reasonable and show that efficiency as well as specialization gains can be achieved for several interesting program transformation problems. We are currently working on the extension of the framework for specializing complex expressions containing conjunctions [1].
We expect to further improve performance in program specialization by introducing befitting partitioning techniques similar to those in [14, 23], which in our context can benefit from the fact that the language treats terms of arbitrary complexity naturally. We also mention the investigation of the application of our framework to optimal versions of lazy narrowing strategies, such as needed narrowing [7] (and its extension to a higher-order framework), which has been proposed as the basic operational principle of Curry [18], a language which is intended to become a standard in the functional logic programming community.

References
[1] E. Albert, M. Alpuente, M. Falaschi, P. Julian, and G. Vidal. Improving Control in Functional Logic Program Specialization. Technical Report DSIC-II/2/97, UPV, 1998. Available from URL: http://www.dsic.upv.es/users/elp/papers.html.
[2] E. Albert, M. Alpuente, M. Falaschi, and G. Vidal. Indy User's Manual. Technical Report DSIC-II/12/98, UPV, 1998. Available from URL: http://www.dsic.upv.es/users/elp/papers.html.
[3] M. Alpuente, M. Falaschi, P. Julian, and G. Vidal. Specialization of Lazy Functional Logic Programs. In Proc. of the ACM SIGPLAN Conf. on Partial Evaluation and Semantics-Based Program Manipulation, pages 151–162. ACM, New York, 1997.
[4] M. Alpuente, M. Falaschi, and G. Vidal. Narrowing-driven Partial Evaluation of Functional Logic Programs. In H. Riis Nielson, editor, Proc. of the 6th European Symp. on Programming, ESOP'96, pages 45–61. Springer LNCS 1058, 1996.
[5] M. Alpuente, M. Falaschi, and G. Vidal. Partial Evaluation of Functional Logic Programs. Technical Report DSIC-II/11/98, UPV, 1998.
[6] M. Alpuente, M. Falaschi, and G. Vidal. A Unifying View of Functional and Logic Program Specialization. ACM Computing Surveys, 1998. To appear.
[7] S. Antoy, R. Echahed, and M. Hanus. A Needed Narrowing Strategy. In Proc. 21st ACM Symp. on Principles of Programming Languages, Portland, pages 268–279, 1994.
[8] R.M. Burstall and J. Darlington. A Transformation System for Developing Recursive Programs. Journal of the ACM, 24(1):44–67, 1977.
[9] W. Chin. Towards an Automated Tupling Strategy. In Proc. of Partial Evaluation and Semantics-Based Program Manipulation, Copenhagen, Denmark, June 1993, pages 119–132. ACM, New York, 1993.
[10] C. Consel and O. Danvy. Tutorial Notes on Partial Evaluation. In Proc. of 20th Annual ACM Symp. on Principles of Programming Languages, pages 493–501. ACM, New York, 1993.
[11] J. Darlington. Program Transformation. In J. Darlington, P. Henderson, and D.A. Turner, editors, Functional Programming and its Applications, pages 193–215. Cambridge University Press, 1982.
[12] N. Dershowitz and J.-P. Jouannaud. Rewrite Systems. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B: Formal Models and Semantics, pages 243–320. Elsevier, Amsterdam, 1990.
[13] L. Fribourg. SLOG: A Logic Programming Language Interpreter Based on Clausal Superposition and Rewriting. In Proc. of Second IEEE Int'l Symp. on Logic Programming, pages 172–185. IEEE, New York, 1985.
[14] R. Glück, J. Jørgensen, B. Martens, and M.H. Sørensen. Controlling Conjunctive Partial Deduction of Definite Logic Programs. In Proc. Int'l Symp. on Programming Languages: Implementations, Logics and Programs, PLILP'96, pages 152–166. Springer LNCS 1140, 1996.
[15] R. Glück and M.H. Sørensen. Partial Deduction and Driving are Equivalent. In Proc. Int'l Symp. on Programming Language Implementation and Logic Programming, PLILP'94, pages 165–181. Springer LNCS 844, 1994.
[16] R. Glück and M.H. Sørensen. A Roadmap to Metacomputation by Supercompilation. In O. Danvy, R. Glück, and P. Thiemann, editors, Partial Evaluation, Int'l Seminar, Dagstuhl Castle, Germany, pages 137–160. Springer LNCS 1110, February 1996.
[17] M. Hanus. The Integration of Functions into Logic Programming: From Theory to Practice. Journal of Logic Programming, 19&20:583–628, 1994.
[18] M. Hanus, H. Kuchen, and J.J. Moreno-Navarro. Curry: A Truly Functional Logic Language. In Proc. ILPS'95 Workshop on Visions for the Future of Logic Programming, pages 95–107, 1995.
[19] N.D. Jones, C.K. Gomard, and P. Sestoft. Partial Evaluation and Automatic Program Generation. Prentice-Hall, Englewood Cliffs, NJ, 1993.
[20] J. Lam and A. Kusalik. A Comparative Analysis of Partial Deductors for Pure Prolog. Technical report, Department of Computational Science, University of Saskatchewan, Canada, May 1991. Revised April 1991.
[21] J.-L. Lassez, M.J. Maher, and K. Marriott. Unification Revisited. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 587–625. Morgan Kaufmann, Los Altos, CA, 1988.
[22] M. Leuschel. The ecce Partial Deduction System and the dppd Library of Benchmarks. Technical report, accessible via http://www.cs.kuleuven.ac.be/~lpai, 1998.
[23] M. Leuschel, D. De Schreye, and A. de Waal. A Conceptual Embedding of Folding into Partial Deduction: Towards a Maximal Integration. In M. Maher, editor, Proc. of the Joint International Conference and Symposium on Logic Programming, JICSLP'96, pages 319–332. The MIT Press, Cambridge, MA, 1996.
[24] J.W. Lloyd and J.C. Shepherdson. Partial Evaluation in Logic Programming. Journal of Logic Programming, 11:217–242, 1991.
[25] B. Martens and J. Gallagher. Ensuring Global Termination of Partial Deduction while Allowing Flexible Polyvariance. In L. Sterling, editor, Proc. of ICLP'95, pages 597–611. MIT Press, 1995.
[26] J.J. Moreno-Navarro and M. Rodríguez-Artalejo. Logic Programming with Functions and Predicates: The Language Babel. Journal of Logic Programming, 12(3):191–224, 1992.
[27] A. Pettorossi and M. Proietti. A Comparative Revisitation of Some Program Transformation Techniques. In O. Danvy, R. Glück, and P. Thiemann, editors, Partial Evaluation, Int'l Seminar, Dagstuhl Castle, Germany, pages 355–385. Springer LNCS 1110, 1996.
[28] M.H. Sørensen, R. Glück, and N.D. Jones. A Positive Supercompiler. Journal of Functional Programming, 6(6):811–838, 1996.
[29] V.F. Turchin. Program Transformation by Supercompilation. In H. Ganzinger and N.D. Jones, editors, Programs as Data Objects, 1985, pages 257–281. Springer LNCS 217, 1986.
[30] P.L. Wadler. Deforestation: Transforming Programs to Eliminate Trees. Theoretical Computer Science, 73:231–248, 1990.