An Optimizing Compiler for CLP(R)

Andrew D. Kelly1, Andrew Macdonald2, Kim Marriott1, Harald Søndergaard2, Peter J. Stuckey2, Roland H.C. Yap1,2

1 Dept. of Computer Science, Monash University, Clayton 3168, Australia.
2 Dept. of Computer Science, The University of Melbourne, Parkville 3052, Australia.
Abstract. The considerable expressive power and flexibility gained by combining constraint programming with logic programming is not without cost. Implementations of constraint logic programming (CLP) languages must include expensive constraint solving algorithms tailored to specific domains, such as trees, Booleans, or real numbers. The performance of many current CLP compilers and interpreters does not encourage the widespread use of CLP. We outline an optimizing compiler for CLP(R), a CLP language which extends Prolog by allowing linear arithmetic constraints. The compiler uses sophisticated global analyses to determine the applicability of different program transformations. Two important components of the compiler, the analyzer and the optimizer, work in continual interaction in order to apply semantics-preserving transformations to the source program. A large suite of transformations is planned. Currently the compiler applies three powerful transformations, namely "solver bypass", "dead variable elimination" and "nofail constraint detection". We explain these optimizations and their place in the overall compiler design, and show how they lead to performance improvements on a set of benchmark programs.
1 Introduction

CLP(R) is an extension of Prolog incorporating arithmetic constraints over the real numbers, with applications in a variety of areas, for example financial planning, options trading, electrical circuit analysis, synthesis and diagnosis, civil engineering, and genetics. As with many other constraint logic programming languages, CLP(R) has so far mainly been considered a research system, useful for rapid prototyping, but not really competitive with more conventional programming languages when performance is crucial. In general, performance is probably the main current obstacle to the widespread use of CLP. This situation is not surprising, as current CLP systems are offshoots of first-generation research systems. In this paper we present an optimizing compiler for CLP(R). We are working towards a second generation implementation which overcomes the efficiency problems of current technology. The main innovation in the compiler is the incorporation of powerful program transformations and associated sophisticated global analysis which determines information about various kinds of interaction among constraints. Our earlier studies [10, 8, 15, 12, 14] have indicated
that a suite of transformation techniques can lead to an order of magnitude improvement in execution time or space for particular classes of programs. Our implementation verifies this. Our compiler also continues a line of experimental Prolog compilers which have made use of global program analysis to great advantage, see Taylor [18] and van Roy [19]. However, we achieve even larger performance improvements because linear arithmetic constraint solving is significantly more expensive than unification, and the scope for improvement in the handling of constraints is correspondingly greater. The most powerful analyses and program transformations employed by our CLP(R) compiler have no parallel in other programming language paradigms but are specific to linear real number constraint solving. We assume the reader is familiar with constraint logic programming in general and with constraint logic programming over real arithmetic constraints in particular. A good introduction to CLP is to be found in [7] and a detailed introduction to CLP(R) can be found in [9]. In Section 2 we discuss the existing compiler and abstract machine CLAM. Section 3 presents three program transformations, "solver bypass," "dead variable elimination," and "nofail constraint detection." Section 4 covers the structure of the highly optimizing compiler and its three major components: the optimizer, the analyzer, and the code generator. In Section 5 we present our empirical results on a set of benchmarks. Section 6 contains a summary.
2 Compilation of CLP(R)

Execution of CLP(R) programs involves repeatedly adding a constraint to the constraint store and checking that the store remains satisfiable. Thus a successful execution path involves a growing number of constraints in the store, and the key issues in implementation are incremental constraint solving and dealing with a growing constraint store. While the constraint solvers in CLP(R) are specialized for incremental solving, adding a single new constraint may in the worst case require processing virtually the entire constraint store. The number of constraints in the constraint store can therefore have a major impact on run-time speed. The first CLP(R) implementation was based on an interpreter which consisted of a Prolog-like rewriting engine and a set of constraint solvers: a unification solver, a linear equation solver, a linear inequality solver, and a non-linear solver, together with an interface which translates constraints into a canonical form suitable for the constraint solvers. The constraint solvers are organized in a hierarchy: unification solver, direct evaluation/testing, linear equation solver and linear inequality solver, where later solvers are more expensive to invoke than earlier ones. The interface sends constraints to the earliest solver in the hierarchy that can deal with the constraint. Thus a more expensive solver is only invoked when the previous solver in the hierarchy is not applicable. For example, when solving a linear equation, the linear equation solver is often sufficient, and only when all the variables in the equation are involved in inequalities is it necessary to also use the inequality solver.
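This dispatch strategy can be sketched as follows. The Python below is an illustrative model only: the solver names, the shape of a constraint, and the store representation are our own assumptions, not the actual CLP(R) data structures.

```python
# Sketch of the solver hierarchy: each constraint is sent to the cheapest
# applicable solver. A constraint is modelled as (variables, relation), and
# the store records which variables already occur in inequalities.

def classify(constraint, store):
    """Pick the cheapest solver in the hierarchy that can handle `constraint`."""
    vars_, relation = constraint
    if not vars_:                                # all arguments are numbers
        return "direct evaluation/testing"
    if relation == "=" and all(v not in store["ineq_vars"] for v in vars_):
        return "linear equation solver"          # no inequalities involved
    if relation == "=":
        return "linear equation + inequality solver"
    return "linear inequality solver"

store = {"ineq_vars": {"W"}}
print(classify(([], "="), store))            # ground constraint: evaluate directly
print(classify((["X", "Y"], "="), store))    # the equation solver suffices
print(classify((["X", "W"], "="), store))    # W occurs in an inequality
```

The point of the hierarchy is visible in the last call: the inequality solver is only brought in because a variable of the equation already occurs in an inequality.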
The current implementation of CLP(R) is a compiler. It translates into code for an abstract machine, the CLAM. This machine is an extension of the Prolog WAM architecture (see for example Aït-Kaci [1]) to deal with arithmetic constraints. Because the constraint solvers deal principally with linear constraints, the main arithmetic instructions in the CLAM construct the data structures representing linear arithmetic forms. These data structures are in a form which can be used directly by the constraint solvers. The constraint solving hierarchy used in the interpreter is retained but is more effective since some run-time decisions in the interpreter can now be shifted to compile-time. To give a flavor of the CLAM (see [8] for details), let us describe the compilation of the constraint 5 + X = Y. Assume that X is a new variable and the equation store contains Y = Z + 3.14. The following CLAM code could be generated:

    initpf 5          lf: 5
    addpf var 1,X     lf: 5 + X
    addpf val -1,Y    lf: 1.86 + X - Z
    solve eq0         solve 1.86 + X - Z = 0
On the left are the CLAM instructions, while the right shows the effect on the constructed "linear form" (lf). To compile, the original constraint is rewritten into a linear canonical form, 5 + X - Y = 0. The CLAM code executes as follows: first a new linear form is initialized to 5, and X, being a new variable, is added directly. Then Y is added, which entails adding its linear form, Z + 3.14. After the first three instructions have constructed a linear form, the last solve eq0 instruction represents the equation lf = 0. In general, solve eq0 may reduce to an assignment, a test or a call to the equation solver. CLAM instructions operate below the level of a single constraint and can be optimized and combined in various ways. The highly optimizing compiler extends the earlier work on a core CLAM instruction set which is sufficient to execute a CLP(R) program, together with some peephole optimizations and some rule-level optimizations. For example, when we know that a constraint is always satisfiable, it may be possible to decide at compile-time how the constraint is to be represented in the constraint store at run-time and simply add that data structure.
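The behaviour of these instructions can be mimicked with a toy model. The representation below (a linear form as a constant plus a coefficient map) is our own illustrative choice; the real CLAM data structures differ.

```python
# Toy simulation of the CLAM instructions above. The "equation store" maps
# solver variables to their linear forms; here Y is bound to Z + 3.14.

store = {"Y": (3.14, {"Z": 1.0})}

def initpf(const):
    return (float(const), {})

def addpf(lf, coeff, var):
    const, coeffs = lf
    if var in store:                 # old variable: add coeff * its linear form
        c2, vs2 = store[var]
        const += coeff * c2
        for v, c in vs2.items():
            coeffs[v] = coeffs.get(v, 0.0) + coeff * c
    else:                            # new variable: add it directly
        coeffs[var] = coeffs.get(var, 0.0) + coeff
    return (const, {v: c for v, c in coeffs.items() if c != 0.0})

lf = initpf(5)                       # lf: 5
lf = addpf(lf, 1, "X")               # lf: 5 + X        (X is new)
lf = addpf(lf, -1, "Y")              # lf: 1.86 + X - Z (Y's form substituted)
print(lf)
```

Note how adding the old variable Y substitutes its stored linear form, which is exactly why the constructed form ends up mentioning Z rather than Y.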
3 Three Optimizations

In this section we present three powerful optimizations for CLP(R) programs using a worked example. Though we only informally justify the correctness of each of the optimization methods, global analysis methods (see Section 4) can be used to determine the applicability of each method. Consider the following CLP(R) program defining the relation mortgage, where mortgage(p, t, r, mp, b) holds when p, t, r, mp, b respectively represent the principal, number of time periods, interest rate, payment and final balance of a loan.
(MG)  mortgage(P, T, R, MP, B) :-
          T = 0, P = B.
      mortgage(P, T, R, MP, B) :-
          T >= 1,
          NT = T - 1,
          I = P * R,
          NP = P + I - MP,
          mortgage(NP, NT, R, MP, B).
Solver Bypass. Often, by the time a constraint is encountered, it is a simple Boolean test or assignment. In that case a call to the solver can be replaced by the appropriate test or assignment. This both decreases the size of the constraint store and removes expensive calls to the solver. The optimization requires determining when variables are constrained to a unique value and when they are new to the solver. Consider the execution of the goal mortgage(p, t, r, mp, B) where p, t, r, mp are all fixed values. By replacing constraints with tests and assignments we can in fact remove all access to the constraint solver. This results in the following program:
(BYP) mortgage(P, T, R, MP, B) :-
          test(T = 0),
          assign(B = P).
      mortgage(P, T, R, MP, B) :-
          test(T >= 1),
          assign(NT = T - 1),
          assign(I = P * R),
          assign(NP = P + I - MP),
          mortgage(NP, NT, R, MP, B).
Here the test wrapper causes the generation of code to evaluate the constraint, given values for all the variables, and to check whether the appropriate relationship holds, while assign causes evaluation of the right-hand side as well as its assignment to the variable on the left. This may require equations to be re-arranged; here P = B became B = P.

Removal of Dead Variables. A common source of redundancy in the constraint solver is variables which will never be referred to again, so-called dead variables. Execution can be improved by adding instructions that remove such variables from the constraints currently in the store. This optimization requires determining which variables are still alive, that is, can be accessed later in the computation. One feature of the optimization is that usually it will not apply for all calling patterns of a predicate, and thus the predicate definition has to be split to utilize the benefits of variable removal. Consider the original program (MG) and the variable P in the second clause. After the constraint NP = P + I - MP, the variable is never again referred to within the clause. If it is not required after the call, it can be removed at this point. Examining the recursive call mortgage(NP, NT, R, MP, B), we note that the variable NP is never referred to again. Similar arguments apply to the variable NT. Hence we can optimize the program by giving two versions of the clauses: one for when the variables P and T may be referred to later (mortgage), and one when they are not (mortgage_1). For the second set of clauses we can remove the variables P and T from the solver after their last occurrence. This is indicated by wrappers of the form dead(var). This reduces the number of variables and constraints in the solver.
(REV) mortgage(P, T, R, MP, B) :-
          T = 0, P = B.
      mortgage(P, T, R, MP, B) :-
          T >= 1,
          NT = T - 1,
          I = P * R,
          NP = P + I - MP,
          mortgage_1(NP, NT, R, MP, B).

      mortgage_1(P, T, R, MP, B) :-
          dead(T)(T = 0),
          dead(P)(P = B).
      mortgage_1(P, T, R, MP, B) :-
          T >= 1,
          dead(T)(NT = T - 1),
          I = P * R,
          dead(P)(NP = P + I - MP),
          mortgage_1(NP, NT, R, MP, B).
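What dead(var) buys at run time can be sketched as projecting the variable out of the store, for instance by one step of Gaussian elimination. The representation and algorithm below are illustrative only, not the actual solver's.

```python
# Eliminate a dead variable from a store of linear equations. Each equation
# is a dict of variable coefficients, read as "sum of coeff*var == constant
# terms", so {"B": 1, "NP": -1} stands for B - NP = 0.

def eliminate(store, var):
    pivot = next((eq for eq in store if eq.get(var)), None)
    if pivot is None:
        return store
    rest = []
    for eq in store:
        if eq is pivot:
            continue                         # the pivot equation is dropped
        c = eq.get(var, 0)
        if c:                                # substitute var away using pivot
            f = c / pivot[var]
            eq = {k: eq.get(k, 0) - f * pivot.get(k, 0)
                  for k in set(eq) | set(pivot)}
            eq = {k: v for k, v in eq.items() if v}
        rest.append(eq)
    return rest

# Store: NP = P + I - MP and B = NP; once NP is dead it can be removed.
store = [{"NP": 1, "P": -1, "I": -1, "MP": 1},
         {"B": 1, "NP": -1}]
print(eliminate(store, "NP"))   # B now constrained directly over P, I, MP
```

The store shrinks by one variable and one equation, which is exactly the saving the dead(var) wrapper is after.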
Nofail Constraint Detection. Sometimes, when a new constraint is encountered, it can be guaranteed not to fail because of the presence of new variables. The solver can use this to solve the constraint quickly, but there is still an overhead in detecting the possibility and manipulating the constraint into the required form. If the information about new variables is collected at compile-time, we can produce specialized CLAM instructions that reduce the overhead. Consider the example equation from Section 2, 5 + X = Y. If X is new, the equation can never cause failure and we can produce more efficient code. For the mortgage program there are several cases of "nofail" behaviour. Adding nofail wrappers for these cases yields the following program, optimized for any possible goal:
(NOF) mortgage(P, T, R, MP, B) :-
          T = 0, P = B.
      mortgage(P, T, R, MP, B) :-
          T >= 1,
          nofail(NT = T - 1),
          nofail(I = P * R),
          nofail(NP = P + I - MP),
          mortgage(NP, NT, R, MP, B).
Combining the Optimizations. The three optimizations may interact. Sometimes two optimizations may be available on a single constraint, so an overall strategy is needed for their application. As an example, when a variable has a fixed value then its removal from the constraint solver is less important, since there is no overhead in further constraint solving. Nofail optimizations can be replaced by assignments when the appropriate variables are fixed. Similarly, the splitting caused by dead variable removal can often allow more solver bypass optimizations.
[Figure: the source program is fed to the optimizing compiler, whose optimizer, analyzer and code generator produce CLAM code (extended CLAM and core CLAM), which runs on the CLAM emulator.]

Fig. 1. The optimizing compiler
4 The Compiler

In the previous section we indicated how the key to faster execution of constraint logic languages lies with sophisticated compile-time optimizations. We now sketch the implementation of the optimizing CLP(R) compiler which performs these optimizations. The system currently consists of about 45,000 lines of C++ code. The optimizing compiler has four distinct components: a parser which performs normalization and syntax analysis, an optimizer which takes a program as input and performs high-level optimizations, an analyzer which is used by the optimizer to determine applicability of the various optimizations, and a code generator which takes the output of the optimizer and translates it into CLAM code. One reason for this architecture is that it allows us to leverage existing technologies and software. In particular, the CLAM emulator is an extension of that used in the first-generation CLP(R) compiler. We now discuss the three main components of the compiler (see also Figure 1).
The Analyzer

Analysis is formalized in terms of abstract interpretation [4]. Consider the idea of a constraint logic program interpreter which answers goals by returning not only a set of answer constraints, but also a thoroughly annotated version of the program: for each program point it lists the current constraint store for each time that point was reached during evaluation of the given query. Since control
may return to a program point many times during evaluation, each annotation is naturally a (possibly infinite) set of constraints. Properly formalized, this idea leads to the notion of a collecting semantics which gives very precise dataflow information, but which is not finitely computable in general. However, if we replace the possibly infinite sets of constraints by conservative "approximations" or "descriptions" then we may obtain a dataflow analysis which works in finite time. This is the idea behind abstract interpretation of logic programs and constraint logic programs [5, 13]. As an example consider the mortgage program from Section 3. If this program is analyzed for the class of calls in which the first three arguments are bound to a number then the following annotated program results. The constraint description {X, Y, Z, ...} is read as: at this point the constraint store constrains the variables X, Y, Z, ... to take a unique value, that is, they are bound to a number. The two columns to the right give the set of variables which are ground after the corresponding statement in the program. The different columns occur because of different calling patterns to mortgage. The first annotation column results from the initial call.

                                       First Call       Other Calls
    mortgage(P, T, R, MP, B) :-        {P,T,R}          {T,R}
        T = 0,                         {P,T,R}          {T,R}
        P = B.                         {P,T,R,B}        {T,R}
    mortgage(P, T, R, MP, B) :-        {P,T,R}          {T,R}
        T >= 1,                        {P,T,R}          {T,R}
        NT = T - 1,                    {P,T,R,NT}       {T,R,NT}
        I = P * R,                     {P,T,R,NT,I}     {T,R,NT}
        NP = P + I - MP,               {P,T,R,NT,I}     {T,R,NT}
        mortgage(NP, NT, R, MP, B).    {P,T,R,NT,I}     {T,R,NT}
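This groundness propagation can be sketched as follows (illustrative Python over sets of definitely ground variables; the real analyzer uses a richer domain):

```python
# Groundness propagation over the body of the second mortgage clause: each
# arithmetic equation grounds its left-hand variable once all right-hand
# variables are ground; the test T >= 1 grounds nothing new.

BODY = [(["T"], None),             # T >= 1
        (["T"], "NT"),             # NT = T - 1
        (["P", "R"], "I"),         # I = P * R
        (["P", "I", "MP"], "NP")]  # NP = P + I - MP

def analyze(ground):
    points = []
    for rhs, lhs in BODY:
        if lhs is not None and all(v in ground for v in rhs):
            ground = ground | {lhs}   # rebuilt, so each point keeps its set
        points.append(ground)
    return points

first_call = analyze({"P", "T", "R"})   # initial calling pattern
other_calls = analyze({"T", "R"})       # recursive calling pattern
print(first_call[-1], other_calls[-1])
```

Running both calling patterns reproduces the two annotation columns: MP never becomes ground, so NP stays non-ground in both.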
For example, initially, when the second clause is entered, P, T, and R are ground. The statement T >= 1 does not change this. After the statement NT = T - 1, as T is ground, NT becomes ground. Similarly, after I = P * R, I becomes ground. Nothing is changed by executing NP = P + I - MP. To analyze the effect of mortgage(NP,NT,R,MP,B) we need to know its effect for the new calling pattern in which only the second and third arguments are ground. This is detailed in the second column. This second calling pattern leads to a call with the same calling pattern, so the analysis stops. The analyzer implicitly returns this information as a calling pattern graph. The analyzer used in the compiler actually uses a more powerful description domain for determining groundness, but we have used this simple domain to clarify the exposition. To facilitate the rich variety of analyses required in the compiler, the analyzer is a generic tool similar to other analysis engines, such as PLAI [16] and GAIA [11] developed for Prolog. The core of the analyzer is an algorithm for efficient fixpoint computation. Efficiency is obtained by keeping track of which parts of a program must be re-examined when a success pattern is updated. The role of the analyzer is to provide information which determines applicability of optimizations such as solver bypass, dead variable removal and nofail
constraint removal. The main interface to the optimizer is by way of five functions which provide information for a given program point in some clause, either for a single calling pattern to that clause or for all such calling patterns. The functions respectively return: the list of ground variables (those constrained to take a unique value); the list of variables which are definitely free, in the sense that they are only constrained to be equal to another variable; the list of variables which are nofail for a particular constraint, in the sense that if this constraint is added to the solver then it cannot cause failure (they are free and not aliased to any other variable in the following constraint); the list of variables which are dead, in the sense that they will not be referenced directly or indirectly in the future after the next constraint; and the list of variables that are possibly Herbrand, and therefore not candidates for arithmetic constraint optimizations. Groundness and freeness information is for determining applicability of the solver bypass optimization, while nofail and dead variable information is used for nofail detection and dead variable removal respectively. Another function in the interface of the analyzer is to split rules. For example, after the preceding analysis, the optimizer might split mortgage into two different versions, one for each calling pattern. An important feature of the analyzer is that it is incremental, in the sense that when a rule is split, the program is not reanalyzed from scratch; rather, analysis information for the original rule is used to give information for the new rule. This is unlike most generic Prolog analyzers, which are non-incremental. Details of the description domains used by the analyzer to provide information about freeness, groundness, nofailness and deadness are deliberately kept insulated from the optimizer, so as to make it easier to change these.
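The per-program-point information the optimizer queries might be modelled as follows. This is a hypothetical Python rendering; the actual compiler is written in C++ and the paper does not give concrete signatures.

```python
# Hypothetical rendering of the analyzer's per-program-point answers: the
# five lists described in the text. Names and types are our own invention.

from dataclasses import dataclass, field

@dataclass
class PointInfo:
    ground: set = field(default_factory=set)    # unique value in the store
    free: set = field(default_factory=set)      # only equated to variables
    nofail: set = field(default_factory=set)    # next constraint cannot fail
    dead: set = field(default_factory=set)      # never referenced again
    herbrand: set = field(default_factory=set)  # possibly non-arithmetic

# Information before B = P under a recursive calling pattern:
info = PointInfo(ground={"T", "R"}, free={"B"}, nofail={"B"}, dead={"P"})
can_assign = "B" in info.free and "P" in info.ground  # bypass needs P ground
print(can_assign)   # P is not ground here, so no assignment: False
```

The optimizer would then fall back to the nofail and dead information, which is exactly the choice made for this program point in the worked example below.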
Currently the analyzer works on descriptions which are tuples of six different domains. The first description domain, Pos, consists of Boolean functions which capture groundness information about variables and definite dependencies among variables, see for example [2]. For example the function X ∧ (Y → Z) indicates that the variable X is constrained to take a unique value, and that if Y is ever constrained to a unique value, then so is Z. Our implementation uses ROBDDs to represent the Boolean functions. The second description domain, Free, captures information about "freeness". It is based on the descriptions used by Debray [6] for the optimization of Prolog unification. Each description consists of a list of definitely free variables and an equivalence relation which captures possible "aliasing" between free variables. For example, the description after processing the constraints X = Y ∧ Z = W is that X, Y, Z and W are free, and that X and Y are possibly aliases and Z and W are possibly aliases. If Z = f(Y) is now encountered, the description is that X and Y are free and that they are possibly aliases. The implementation uses a Boolean matrix to represent the alias relation. The third description domain, CallAlive, consists of lists of variables which are possibly still "textually alive". For example, in the goal X = 1, p(X, Y), q(Y), both X and Y are initially textually alive; for the call p(X, Y) only Y is alive, and for the call q(Y) no variables are textually alive.
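The Pos component can be modelled very naively as follows, representing a Boolean function by its set of models rather than by an ROBDD as in the real implementation (an illustrative sketch only):

```python
# Pos domain sketch: a description is a Boolean function over the program
# variables; a variable is definitely ground iff it is true in every model.

from itertools import product

VARS = ("X", "Y", "Z")

def models(f):
    return {m for m in product([False, True], repeat=len(VARS))
            if f(dict(zip(VARS, m)))}

def ground_vars(ms):
    return {v for i, v in enumerate(VARS) if all(m[i] for m in ms)}

# X ∧ (Y → Z): X is ground, and Z becomes ground whenever Y does.
d = models(lambda e: e["X"] and (not e["Y"] or e["Z"]))
print(ground_vars(d))              # only X is definitely ground

# Conjoin a constraint that grounds Y (e.g. Y = 3):
d2 = d & models(lambda e: e["Y"])
print(ground_vars(d2))             # now X, Y and Z are all ground
```

The second step shows the "definite dependency" at work: grounding Y makes Z ground for free, without re-running any analysis of the earlier constraints.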
Unfortunately, not being textually alive may not mean a variable is dead, for two reasons. The first reason is "structure sharing" between terms. If a textually non-alive variable shares with a term that contains textually alive variables, then it may not be dead. For example, consider the variable X in the goal Y = f(X), p(X), Y = f(Z), q(Z). Here the variable X may be accessed in the call to q(Z) by means of Z. The second reason is that "hard" constraints such as non-linear equations are delayed by the constraint solver until they become simple enough to solve. This means that variables in hard constraints may be alive until the time we can guarantee the constraint is simple enough to be solved in the linear constraint solver. For example, consider X in the goal X = Y * Z, Y = 1. Even though X does not appear after X = Y * Z, it will be accessed in the linear constraint solver after processing Y = 1, since when Y is grounded, essentially the constraint X = 1 * Z is added to the linear solver. For these reasons the analyzer has two more description domains which provide information about dead variables. The fourth description domain, Shar, captures information about possible sharing of variables between Prolog terms. It is based on descriptions introduced by Søndergaard [17] for eliminating occur checks in Prolog. The description consists of a possible sharing relation for variables. Consider again the goal Y = f(X), p(X), Y = f(Z), q(Z). After Y = f(X), X and Y possibly share, but Z does not share with anything else. After Y = f(Z), all variables possibly share. Note that arithmetic constraints do not cause sharing to take place. A special variable ⊥ represents "hidden" alive variables; thus if a variable shares with ⊥ it cannot be dead. The fifth description domain, NonLin, consists of lists of variables which are possibly contained in a delayed non-linear constraint. When a non-linear constraint is first encountered, groundness information is used to check if it is definitely linear.
If it is not, then the analyzer assumes it is non-linear and adds its variables to the NonLin list. For instance, in mortgage, when the statement I = P * R is reached with R ground, the analyzer will determine that I = P * R is linear, so no variables will be added. If none of I, P or R were ground then all three variables would be added to the NonLin list. A variable in the NonLin list is never dead. The sixth description domain, Type, captures information about whether a variable is definitely arithmetic or possibly Herbrand. The implementation maintains two lists of variables. More accurate type information can be obtained by keeping track of aliasing between variables for cases like X = Y when no type information is known about X or Y. It is expected that this can be done efficiently by integrating type analysis with freeness analysis. As a real example of the analyzer's use, consider the analysis of the mortgage program for queries of the form
    P = ..., T = ..., R = ..., mortgage(P, T, R, MP, B).

[Annotated program: each point of the goal and of both mortgage clauses is annotated with a description tuple ⟨Pos, Free, CallAlive, Shar, NonLin⟩. For the initial calling pattern the clauses are entered with the description ⟨P ∧ T ∧ R, {B}, {P, T, R, MP, B}, ∅, ∅⟩; after NP = P + I - MP the Pos component has grown to P ∧ T ∧ R ∧ NT ∧ I ∧ (MP ↔ NP).]
At the end of a clause, local variables are uninteresting, so we have restricted the last annotation accordingly. From the annotation immediately before the last, we see that the calling pattern CP0 generates a new calling pattern CP1: ⟨T ∧ R ∧ (MP ↔ P), {B}, {R, MP, B}, ∅, ∅⟩. In turn, CP1 generates the calling pattern CP2: ⟨T ∧ R ∧ (MP → P), {B}, {R, MP, B}, ∅, ∅⟩, which generates itself. We will not go through the details of the analysis for these subsequent calling patterns. After the analysis terminates it stores information about the relationship between calling patterns for the head of a clause and the calling patterns for each body atom in the clause. This defines a calling pattern graph similar to the predicate call graph of the program, but containing calling pattern information. The optimizer will use this graph to order the processing of predicates and calling patterns. In particular, the optimizer will examine entire strongly connected components (SCCs) of this graph one by one. The call graph and calling pattern graph for the example are shown in Figure 2.
The Optimizer

The optimizer examines the clauses of the program and where possible performs optimizations upon them. It finds where this is possible by querying the analyzer about which variables are free, ground, dead, and nofail at the various program points for the calling patterns of interest. The optimizations are performed by rewriting the clauses to include specialized rather than generic primitives.
[Figure: the program call graph consists of the single node mortgage with a self-loop. The calling pattern graph has three nodes, mortgage:⟨P ∧ T ∧ R, {B}, {P, T, R, MP, B}, ∅, ∅⟩, mortgage:⟨T ∧ R ∧ (MP ↔ P), {B}, {R, MP, B}, ∅, ∅⟩ and mortgage:⟨T ∧ R ∧ (MP → P), {B}, {R, MP, B}, ∅, ∅⟩, linked in that order, with a self-loop on the last.]
Fig. 2. Call graph and calling pattern graph for mortgage

The modified program is written in an intermediate language called CLIC, for Constraint Logic Intermediate Code. This language is a superset of CLP(R) which allows for non-logical commands, including the wrappers already introduced. Thus, the compiler is in fact a CLIC source-to-source transformer with a CLAM-generating back-end. CLIC can be considered a hybrid imperative-logic programming language which could be useful (if dangerous) for writing efficient constraint programs. The optimizer examines and optimizes each strongly connected component (SCC) in the program call graph in turn. SCCs are examined from the bottom up, so that predicates at the bottom of the call graph are optimized first. This decision implements the heuristic that optimizations at a lower level are likely to be more important than at a higher level, since low-level predicates are (in general) executed more frequently. In general, optimizations made for one predicate may prevent optimizations in other predicates, but for the optimizations currently implemented in the compiler this is not the case, since none of the optimizations can change calling patterns. When optimizing the clauses in a single SCC, the optimizer first performs optimizations which are valid for all calling patterns of each predicate. It scans each clause, examining each constraint for possible optimizations. Also, if an atom (defined in a lower SCC) is present in the clause which is always called using a calling pattern for which the optimizer has created a specialized version, the atom is replaced by a call to the specialized version. Where competing optimizations are available they are currently handled as follows: assign and test are preferred to nofail and dead. Thus, for example, a dead wrapper will not be added in cases where the dead variable is always ground. This decision is based on the effects of the optimizations in the run-time system.
Removing a fixed dead variable will not increase the speed of future constraint solving, but will cause extra overhead if backtracking occurs, and using assignment is preferable to nofail since assignment does not involve the solver at all. We now detail how mortgage is optimized for queries of the form P = ..., T = ..., R = ..., mortgage(P, T, R, MP, B). The constraint T = 0 can be made a test. In general, any constraint in which all variables are ground can be made into a test. Next the constraint NT = T - 1 is considered. It can be made into an assignment because the analyzer gives the information that NT is free in all calling patterns and T is ground in all calling patterns. In general, for an equation to be made an assignment it must have one free variable that can be put on the left-hand side, and all other variables ground. The optimizer can re-arrange linear equalities, so the optimization is applicable when any one variable is free. For multiplication, the variable on the left must be free to start with, as multiplications have the standard form Y = X * Z. The analyzer gives the information that NP is nofail for the constraint NP = P + I - MP. This is because NP is free and shares with no other variables in the constraint. Similarly, the constraint B = P is found to be nofail⁴. The resulting program is
(OPT-ALL) mortgage(P, T, R, MP, B) :-
              test(T = 0),
              nofail(B = P).
          mortgage(P, T, R, MP, B) :-
              test(T >= 1),
              assign(NT = T - 1),
              I = P * R,
              nofail(NP = P + I - MP),
              mortgage(NP, NT, R, MP, B).
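The bottom-up SCC traversal that drives all of this processing can be sketched as follows, on a hypothetical call graph (the real optimizer applies the same ordering to the calling pattern graph):

```python
# Tarjan's algorithm emits strongly connected components in reverse
# topological order of the condensation, i.e. callees before callers,
# which is exactly the bottom-up order the optimizer wants.

def sccs_bottom_up(graph):
    index, low, on, stack, out, n = {}, {}, set(), [], [], [0]

    def visit(v):
        index[v] = low[v] = n[0]; n[0] += 1
        stack.append(v); on.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w); low[v] = min(low[v], low[w])
            elif w in on:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop(); on.discard(w); scc.add(w)
                if w == v:
                    break
            out.append(scc)              # emitted callees-first

    for v in graph:
        if v not in index:
            visit(v)
    return out

# main calls mortgage, which calls itself and a lower-level helper:
graph = {"main": ["mortgage"], "mortgage": ["mortgage", "helper"], "helper": []}
print(sccs_bottom_up(graph))
```

Here the helper's SCC is optimized first, then the recursive mortgage SCC, and the caller last, matching the heuristic that lower-level predicates are executed more frequently.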
Next the calling pattern graph is examined. If there are multiple calling patterns for any predicates in the SCC of the call graph under consideration, the optimizer examines each SCC of the calling pattern graph in turn, again bottom up. If, for the particular calling pattern, new optimizations are available, either because a new constraint optimization is possible or because an atom may be replaced with a more specific version, the optimizer creates a new version of the predicate for this specific calling pattern. This operation will be performed whenever any new optimization is made possible by splitting. This means optimization of one constraint can cause many splits, as the divergence in calling patterns that allows the optimization may be some way up the call graph. Once the optimizer has determined that a predicate must be split, it constructs a new copy of the code (containing the optimizations already made for all calling patterns) and optimizes this code further. Analysis information for the new code is extracted from the original. In general some reanalysis may be required, but for the optimizations dealt with by the current compiler, no new analysis is required. New split predicates may allow new optimizations of other predicates, so for maximum improvement we should continue the process of splitting and optimizing until a fixpoint is reached. However, for simplicity, the optimizer presently examines each calling pattern once only. In our example, the initial call has given the calling pattern SCCs detailed in the last subsection. Examining the SCC for the lowest calling pattern and the first clause, the analyzer will give the information that, for this calling pattern, P is dead. This means the optimizer can modify the constraint nofail(B = P) to dead(P)(nofail(B = P)). Hence it will split off a new copy of the code for this calling pattern. After optimization, the optimizer gives

⁴ Note that the equation I = P * R could be made a nofail constraint, except that the run-time system does not currently support nofail non-linear constraints.
(OPT-CP2) mortgage$CP2(P, T, R, MP, B) :-
              test(T = 0),
              dead(P)(nofail(B = P)).
          mortgage$CP2(P, T, R, MP, B) :-
              test(T >= 1),
              assign(NT = T - 1),
              I = P * R,
              dead(P)(nofail(NP = P + I - MP)),
              mortgage$CP2(NP, NT, R, MP, B).
Note that the recursive call is updated to call mortgage$CP2 since, by the time it is examined, the more specific version has been created. Next the SCC for the middle calling pattern, CP1, is examined. It gives a new version identical to (OPT-CP2), except that the head predicates are called mortgage$CP1. At present the compiler will not collapse these two versions; however, in the future it will do so. Now the SCC for the top calling pattern is examined. The analyzer gives the information that, for the top calling pattern, P is ground. Therefore the optimizer further optimizes both B = P and I = P * R. Similarly, the call to mortgage in the body now has a specialized version that can be used. The resulting code is
(OPT-CP0) mortgage$CP0(P, T, R, MP, B) :-
              test(T = 0),
              assign(B = P).
          mortgage$CP0(P, T, R, MP, B) :-
              test(T >= 1),
              assign(NT = T - 1),
              assign(I = P * R),
              nofail(NP = P + I - MP),
              mortgage$CP1(NP, NT, R, MP, B).
Note that nofail and dead wrappers have been replaced by assign in the first clause, and a dead declaration removed in the second clause, since the variable P is now always ground. The final step in the optimization of the program is to examine the goal, which can now be specialized to execute using mortgage$CP0.