The terms of the algebra A(R) are built from the relations of R with the operations ×, ⁻¹, o, ∪, ∩, φ( ), ψ( ), and the binding construction [z : t1, t2].
Each term t of the algebra represents a binary relation on O for any given database instance d, written d(t). For instance, the semantics of the last construction introduced above is: d([z : t1, t2]) = d(t2), under the extension d(z) = d(t1). A logic assertion and an algebraic term are equivalent if they represent the same relation for every database instance. We have shown that each assertion can be translated into an equivalent algebraic term, and conversely, which means:

Theorem [Ca91a]: For any assertion a(x,y) with two free variables, there exists a term T of A(R) such that (x,y) ∈ d(T) ⇔ |a(x,y)|d = true. Conversely, there exists such an assertion of L3 for any term T of A(R).

The actual translation of an L3 query into an algebraic query is interesting because, among the many possible algebraic translations, there is usually one optimal solution. Translation into algebraic form is based on rewriting and involves a lot of knowledge about object functions. The principle is to solve the equation assertion(x,y), considering that x is known and y is searched for. The result of the resolution is a relational algorithm that explains how to get y from x, represented as a term of our relational algebra.

We can now take each LAURE constraint c = (s, ((R1, ƒ1), ..., (Rm, ƒm)), a(x, v1, ..., vm)) and translate it into m algebraic constraints {Ri ⊆ Ti} [Ca91a], where Ti = t o ƒi⁻¹ and t is equivalent to the assertion a(x, R1(ƒ1(x)), ..., Ri−1(ƒi−1(x)), y, Ri+1(ƒi+1(x)), ..., Rm(ƒm(x))). The interest of this translation is given by: I(c)((Ri, x)) = {y | (x,y) ∈ d(Ti)}.
We say that the algebraic constraint (Ri ⊆ Ti) is derived from the initial constraint set C. A technique for efficiently compiling the computation of {y | (x,y) ∈ d(T)} was developed for rule resolution [Ca91a] and can be reused here. Because the relational algebra is a good framework for compiling, we now extend our algebraic representation to all the logic constructions defined in Section 2.3.
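To make the algebraic reading concrete, here is a minimal Python sketch (illustrative code, not LAURE; all class and function names are our own) of a fragment of A(R): terms evaluate to sets of pairs under a database instance d, and the compiled query {y | (x,y) ∈ d(T)} is the image of x under d(T).

# Minimal sketch of a fragment of the relational algebra A(R).
# Terms are evaluated against a database instance d mapping relation
# names to sets of pairs. Illustrative names, not LAURE syntax.

class Rel:
    def __init__(self, name): self.name = name
    def eval(self, d): return d[self.name]

class Comp:                      # (t1 o t2)(x,y) iff exists z: t1(x,z) and t2(z,y)
    def __init__(self, t1, t2): self.t1, self.t2 = t1, t2
    def eval(self, d):
        r1, r2 = self.t1.eval(d), self.t2.eval(d)
        return {(x, y) for (x, z) in r1 for (z2, y) in r2 if z == z2}

class Union:
    def __init__(self, t1, t2): self.t1, self.t2 = t1, t2
    def eval(self, d): return self.t1.eval(d) | self.t2.eval(d)

class Inv:                       # t^-1
    def __init__(self, t): self.t = t
    def eval(self, d): return {(y, x) for (x, y) in self.t.eval(d)}

def image(term, d, x):
    """The set {y | (x,y) in d(term)}: solving assertion(x,y) for y, x known."""
    return {y for (x2, y) in term.eval(d) if x2 == x}

d = {"friend": {("ann", "bob"), ("bob", "eve")}}
t = Comp(Rel("friend"), Rel("friend"))     # friend o friend
print(image(t, d, "ann"))                  # {'eve'}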
3.3 Combined Semantics for Rules and Constraints

We use the algebra to represent all conditions of the L3 language. If Ri is multi-valued (Ri ∈ R*), a rule

rule[if a(x,y) then Ri(x y)]

is translated into an algebraic rule (t ⊆ Ri), where t is equivalent to the L3 assertion a(x,y).

Definition: A rule is a formula (Ti ⊆ Ri) where Ti ∈ A(R) and Ri ∈ R*. A database instance d satisfies a rule (Ti ⊆ Ri) if and only if d(Ti) ⊆ d(Ri).

Example: The rule
rule[if [z exists friend(x z) friend(z y)] then friend(x y)]

is transformed into (friend o friend ⊆ friend).
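Anticipating the fixpoint semantics stated just below, a small Python sketch (illustrative, not LAURE) of what this rule means: satisfaction is the inclusion d(friend o friend) ⊆ d(friend), and the minimal satisfying instance is reached by iterating the rule.

# Sketch: satisfaction and minimal-fixpoint semantics of the rule
# (friend o friend <= friend). Illustrative Python, not LAURE.

def comp(r1, r2):
    return {(x, y) for (x, z) in r1 for (z2, y) in r2 if z == z2}

def satisfies(friend):
    return comp(friend, friend) <= friend        # d(T) included in d(R)

def closure(friend):
    """Minimal instance containing d0 and satisfying the rule."""
    friend = set(friend)
    while not satisfies(friend):
        friend |= comp(friend, friend)
    return friend

d0 = {("ann", "bob"), ("bob", "eve")}
print(satisfies(d0))      # False
print(closure(d0))        # adds ('ann', 'eve')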
The semantics of rule resolution is based on a minimal fixpoint, as usual:

Theorem [Ta55]: For any initial database instance d0, there exists a unique minimal database instance which contains d0 and satisfies a given set of rules.

Rules are evaluated top-down, using the Query/Subquery with Differentiation algorithm described in [Ca91a]. If Ri is mono-valued, a similar rule (rule[if a(x,y) then Ri(x y)]) is translated into an algebraic constraint stating that the value Ri(x) must be one of those given by a(x,y)³. Algebraic constraints are also derived from object constraints (previous section) by considering each goal Ri(ƒi(x)) and generating a solved form of the condition, a term Ti that gives the value of Ri(ƒi(x)) once all other goals are solved. Thus, we represent constraints and rules with a family of algebraic constraints, which have the converse form of a rule:

Definition: A constraint is a formula (Ri ⊆ Ti) where Ti ∈ A(R) and Ri ∈ R1. A database instance d satisfies a constraint (Ri ⊆ Ti) if and only if d(Ri) ⊆ d(Ti).

Example: The constraint

constraint[ for_all (x rectangle)
            if [l = length(x)] [w = width(x)] [a = area(x)]
            then [[l * w] = a] ]

is transformed into (length ⊆ ψ(/, area, width), area ⊆ ψ(*, length, width), ...).
³ Thus, if there is a unique y for an object x, we get Ri(x) = y.
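Returning to the rectangle example, the solved forms can be pictured as follows (hypothetical Python, not LAURE output): the single assertion l * w = a yields one solved form per goal, each usable once the other goals are known.

# Sketch: the constraint "length * width = area" compiled into one
# solved form per goal, mirroring the terms psi(*, length, width),
# psi(/, area, width), psi(/, area, length). Illustrative code only.

solved_forms = {
    "area":   lambda v: v["length"] * v["width"],
    "length": lambda v: v["area"] / v["width"],
    "width":  lambda v: v["area"] / v["length"],
}

def solve_goal(goal, known):
    """Compute the value of one goal once all other goals are solved."""
    return solved_forms[goal](known)

print(solve_goal("length", {"area": 12, "width": 3}))   # 4.0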
Constraints are more interesting, since we do not simply search for a database which satisfies all constraints (the empty database would do); instead, we look for a complete database instance, which associates one value with each resolution goal. Because of the correctness of the translation, solutions can be characterized as follows:

Definition: A database instance d is a solution for the constraints (Ri ⊆ Ti) if and only if d is complete and d satisfies the constraints.
We now introduce production rules to take care of propagation rules on mono-valued relations, negative constraints, and integrity constraints. An algebraic production rule (Ti ⇒ ƒ) is made of a condition term Ti and a conclusion action ƒ.

Definition: A production rule (Ti ⇒ ƒ) is made from a term Ti ∈ A(R) and an action ƒ. Its semantics is that ƒ(x,y) is executed each time the database instance changes from d to d' and (x,y) ∈ d'(Ti) ∧ (x,y) ∉ d(Ti).

In general, since we have given no order to production rules, this is a non-deterministic operational semantics (the order in which rules are triggered may be important). In this paper, we only consider three sorts of production rules, for which we can give a more precise semantics:

• Propagation rules were introduced with the syntax:

axiom[ if a(x,y) then Ri(x y)].

The conclusion operation of such a rule is a definite update: d(Ri)(x) ← y. The assertion a is translated into the equivalent term Ti and the algebraic rule is written (Ti ⇒ Ri). All propagation rules on a given relation Ri are combined with the ∪ operator (T1 ⇒ Ri and T2 ⇒ Ri are combined into (T1 ∪ T2) ⇒ Ri). We also suppose that R1 is partitioned into a set of relations defined with constraints and a set of relations defined with propagation rules (no intersection, to avoid conflicts). In practice this is the case, since we use propagation rules to build and maintain additional information useful for the constraint resolution.
• Integrity constraints were introduced with the syntax:

constraint[ if a(x,y) check a'(x,y)].

The assertion a is translated into t, and the assertion a' into t'. The conclusion operation of such a rule is to test whether (x,y) ∈ d(t'), and to raise a contradiction if (x,y) ∉ d(t'). We shall write those production rules (t ⇒ t').
• Negative constraints were introduced with the syntax:

constraint[ if a(x,y) then [Ri no x y]].

The conclusion operation of such a rule is to remove y from the set of possible values: d(Ri)(x) ← d(Ri)(x) − {y}, assuming that no choice was made on Ri(x) (in which case the rule is ignored). This implies that the database can distinguish between Ri(x) = {y} because a choice was made and Ri(x) = {y} because y is the unique possible value. A negative constraint, therefore, specifies which value should not be taken for a goal when a choice is made, and is implemented as an algebraic production rule. We shall write them (Ti ⇒ ·Ri).

Since propagation rules have the same expressive power as a Turing machine, it is undecidable whether the program defined by a set of production rules halts or characterizes its solution uniquely⁴. This is the price to pay for introducing a powerful paradigm to describe heuristics. In the rest of the paper, we simply call F(Pr, d) the result of applying to d a set of production rules (in a given order), which is supposed to terminate and to capture the intended heuristic. Further research is needed to see whether we can identify a smaller class of production rules with a deterministic semantics, in which useful strategies can still be written.

If we now mix production rules with constraints, we use the previous characterization of all solutions to define the semantics of a program made of an initial database d0, some rules, some constraints, and some production rules:

Definition: Given a set of constraints C and a set of production rules Pr, a computation sequence is a sequence d0, d1, ..., dn such that: ∀i, |di| = |di−1| − 1, |dn| = 0, di > ε(F(Pr, di−1)). A solution of a program (d0, C, Pr) is a complete database instance which is the last member of a computation sequence.

Because we have restricted propagation rules (partition of R1), we can show that a solution of a program (d0, C, Pr) is a solution of (d0, C). Rules on mono-valued relations are supposed to be evaluated implicitly each time a mono-valued relation appears in a logic or algebraic expression, according to [Ca91a]. Therefore, the deductive rule resolution is a slave of the constraint resolution.
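The conclusions of the three sorts of production rules described above can be pictured with a small interpreter sketch (illustrative Python; the domain table and the chosen set are our own assumptions about the state kept by the solver).

# Sketch of the three production-rule conclusions (illustrative only).
# domains[R][x] is the set of remaining candidates for goal (R, x);
# chosen marks goals whose value results from a choice, not a deduction.

class Contradiction(Exception): pass

domains = {"color": {"a": {1, 2, 3}}}
chosen = set()

def propagate(R, x, y):                 # axiom[if a(x,y) then R(x y)]
    domains[R][x] = {y}                 # definite update d(R)(x) <- y

def check(holds, R, x, y):              # constraint[if a(x,y) check a'(x,y)]
    if not holds(x, y):
        raise Contradiction((R, x, y))  # (x,y) not in d(t'): contradiction

def forbid(R, x, y):                    # constraint[if a(x,y) then [R no x y]]
    if (R, x) not in chosen:            # ignored once a choice was made on R(x)
        domains[R][x].discard(y)
        if not domains[R][x]:
            raise Contradiction((R, x))

forbid("color", "a", 2)
print(domains["color"]["a"])            # {1, 3}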
4. Resolution

4.1 Abstract Interpretation
In this section we give an overview of the general method proposed in [Ca91b], which is itself an application of [CC77] to the resolution of constraints. More details and proofs of the results may be found in these two papers. Since the ε computation is done on sets of P(O), the first step is to build an abstraction P#(O) [Ca91b] of P(O). We call D# the sub-lattice induced by P#(O): D# = {d ∈ D | ∀Ri ∈ R1, ∀o ∈ O, d(Ri)(o) ∈ P#(O)}.
⁴ Consider the following (legal) program: r(x) = 1 ⇒ r(x) = 2, r(x) = 2 ⇒ r(x) = 1.
Following [MMS86], we define the abstraction (α) and concretization (γ) functions as follows:

Definition: We define the abstraction function α by: ∀d ∈ D, ∀(Ri, x) ∈ R1 × O, α(d)(Ri)(x) = ∩{y | y ∈ P#(O) ∧ d(Ri)(x) ⊆ y}; we take the identity on D# → D as the concretization function γ.

In order to build an abstract approximation of ε, we first build an abstract approximation of the relational calculus on A(R). If a binary relation on O is represented by a function in ℜ = (O → P(O)) (a functional view of relations), the algebra A(R) is generated by operations on ℜ such as ∪, ∩, and so on. For each of these operations, we can define an abstract operation ∪#, ∩#, ... such that:

(r1# ∪# r2#)(x) = ∩{y | y ∈ P#(O) ∧ (r1#(x) ∪ r2#(x)) ⊆ y}, ...

To keep this feasible, we only require that the abstract operator return an abstract set which is larger than the abstraction of the correct result. Obviously, the closer the abstraction, the better the practical results; therefore, we use as many mathematical properties as we can to improve the prediction. The result is an abstract evaluation d#(T) of any relational term T in the abstract representation d# = α(d) of the database instance d, such that:

∀d ∈ D, ∀T ∈ A(R), d#(T) ∈ (O → P#(O)) and ∀x ∈ O, d(T)(x) ⊆ d#(T)(x).

We can now define the abstract deduction function:

Definition: ∀d# ∈ D#, ∀Ri ∈ R1, ∀x ∈ O, ε#(d#)((Ri, x)) = d#(Ri)(x) ∩# {∩ d#(Ti)(x) | (Ri ⊆ Ti) is derived from C}.

We have defined a consistent abstract interpretation [CC77]. By combining this result with the property of Y(ε)(d0), we get:

Theorem [Ca91b]: ε# has a lower fixpoint operator such that Y(ε#)(α(d0)) is smaller than any solution to the problem (d0, C).
This means that we can consistently reduce the possible domains before starting the enumeration of possible valuations (prediction of the possible domains). Since we have identified an application of the fixpoint Y(ε) to build solutions through approximation sequences, we can use abstract interpretation in a more general manner than for simple domain prediction. We define the notion of abstract approximation sequence:

Definition: An abstract approximation sequence is a sequence d0, d1, ..., dn such that: ∀i, di ∈ D#, |di| = |di−1| − 1, |dn| = 0, di > ε#(di−1).
Abstract approximation sequences are easier to generate because the cost of the abstract computation is independent of the database size. However, this is still a sound and complete procedure:

Theorem [Ca91b]: A complete database instance v is a solution of (d0, C) iff there exists an abstract approximation sequence (di) such that dn = v.
We can also define abstract fixpoint approximation sequences, which converge faster (there is a smaller set of abstract fixpoint approximation sequences) but at a higher computational cost (computing Y(ε#) is more expensive).
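As a toy illustration of the prediction step, the following Python sketch (all sets and names are made up) draws abstract domains from a small finite family P#(O) and applies ε#-style reductions until the fixpoint is reached.

# Toy domain prediction: P#(O) is a small finite family of abstract sets,
# alpha(s) is the least element of P# containing s, and reduction steps
# intersect each goal's domain with the abstract value of its constraint
# terms until a fixpoint is reached. Everything here is illustrative.

P_SHARP = [frozenset(), frozenset({1, 2}), frozenset({3, 4}),
           frozenset({1, 2, 3, 4})]            # closed under intersection

def alpha(s):
    """Least abstract set containing s (the abstraction function)."""
    return min((a for a in P_SHARP if s <= a), key=len)

def predict(domains, constraints):
    """Iterate eps#: domains[g] <- domains[g] & alpha(T(g)) up to fixpoint."""
    changed = True
    while changed:
        changed = False
        for goal, term in constraints:          # (Ri <= Ti) derived from C
            new = domains[goal] & alpha(term(domains))
            if new != domains[goal]:
                domains[goal], changed = new, True
    return domains

# Toy constraint: the value of goal "x" must be one of the values of "y".
doms = {"x": frozenset({1, 2, 3, 4}), "y": frozenset({1, 2})}
cons = [("x", lambda d: set(d["y"]))]
print(predict(doms, cons))    # x reduced to frozenset({1, 2})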
4.2 Lazy Evaluation vs. Propagation

In the next section, we shall describe an algorithm that builds an exhaustive enumeration of abstract computation sequences; thus, it is sound and complete. There are still two degrees of freedom upon which the efficiency of the resolution depends:

• Goal ordering: some problems (for instance, the n-queens problem with a large number of queens [VH89]) demand the application of the first-fail principle, which states that the goal with the smallest domain should be tried first; other problems, such as placement problems, admit a better order derived from the object topology.

• Balance between propagation and evaluation: each constraint can be evaluated lazily just before a value is chosen for a goal, or it can be propagated so as to maintain the domains of all goals as soon as any hypothetical assignment is made. The first-fail principle requires some propagation, so that the choice made according to cardinality is significant.

In this paper we describe an algorithm that uses the first-fail principle, because we have found it to be the most commonly useful, but we have also used variations (other orders) for other problems such as [GGN90]. The cardinality of each goal (Ri, xi) is the cardinal of the abstract set d#(Ri)(xi) (see the sketch after this list). Those sets are maintained by active propagation, for the following reasons:

• Active constraints: each constraint has a mode, which is either specified by the user or inferred from some general declarations. A constraint can be lazy (it is used for an ε-reduction step at the "last minute"), abstract-lazy (it is also used at the last minute, for an ε#-reduction step), or active. When a constraint is active, the abstract sets are dynamically reduced so that the current database instance is a fixpoint for the ε#-reduction function associated with this constraint.

• Negative constraints can be lazy or active. When a negative constraint is active, it is implemented as an abstract propagation rule (cf. Section 3.3).
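The first-fail choice itself is then a one-line selection over the maintained abstract sets, as in this illustrative sketch:

# First-fail principle: pick the unsolved goal with the smallest
# abstract domain (illustrative; goals map to their candidate sets).

def first_fail(goals):
    """Return the goal whose domain d#(Ri)(xi) has minimal cardinality."""
    return min((g for g, dom in goals.items() if len(dom) > 1),
               key=lambda g: len(goals[g]))

goals = {("queen", 1): {3, 7}, ("queen", 2): {1, 4, 6}, ("queen", 3): {5}}
print(first_fail(goals))   # ('queen', 1)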
Active constraints and propagation rules rely on the ability to efficiently propagate an update in the database. We need to know which new pairs satisfy a rule condition (an algebraic term) when an update Ri(a, b) is made. As was noticed in [FU76] and detailed in [PS77], incrementally computing a set of objects satisfying a given specification is analogous to mathematical differentiation. Differentiation rules have been developed for database relational algebra ([BR86] or [SKGB87]). A nice property of this relational algebra A(R) is that differentiation can be introduced as a higher-order operation [Ca91a]. If we define the induced functional algebra F(R1, ..., Rn) as A(0, 1, R1, ..., Rn), where 0 and 1 are reserved names, each term ƒ of this algebra represents a function from O × O to P(O × O) for each database instance d. By extension, we write this function d(ƒ); it is defined by evaluating ƒ as a term of A(0, 1, R1, ..., Rn): ∀o1, o2 ∈ O, d(ƒ)(o1, o2) is the value of ƒ in the instance where d(0) = ∅, d(1) = {(o1, o2)}, and each Ri keeps its value d(Ri).

The key property of this algebra is the existence of a formal operation ∂/∂, called differentiation, from A(R) × R to F(R), defined by formal rules. We write ∂t/∂Ri for the derivative of the term t with respect to Ri. The interest of differentiation lies in this result:

Theorem:
• ∀d ∈ D, ∀Ri ∈ R, ∀(o1, o2) ∈ O × O, ∀t ∈ A(R), if (o1, o2) does not belong to d(Ri) and if we define a database instance d' by d'(Ri) = d(Ri) ∪ {(o1, o2)} and d'(Rj) = d(Rj) for all other j, then: d'(t) = d(t) ∪ d'(∂t/∂Ri)(o1, o2).
• ∂t/∂Ri is the smallest term of F(R) which satisfies the previous equation (any other such term represents a function that always contains that of ∂t/∂Ri).

The idea of differentiation can be found in the RETE algorithm [Fo82], where it is a graph operation, or in relational databases, where it is defined by a database computation [BR86]. In this model, we obtain a formal differentiation (on abstract functions instead of database instances), which provides a better implementation. More details and correctness proofs may be found in [Ca91a]. As explained in [Ca89], the differentiated terms can in turn be compiled into efficient low-level functions.
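For instance, for the term friend o friend of Section 3.3, the derivative with respect to friend can be written by hand; the following Python sketch (not the formal rules of [Ca91a]) returns exactly the new pairs created by one update, without recomputing the join.

# Sketch: incremental evaluation of t = R o R when a pair (a, b) is
# added to R. The derivative dt/dR returns exactly the pairs that
# appear in d(t); the join is never recomputed from scratch.

def d_comp_dR(R_after, a, b):
    """New pairs of R o R after adding (a, b); R_after includes (a, b)."""
    new = {(a, y) for (z, y) in R_after if z == b}     # (a,b) as first step
    new |= {(x, b) for (x, z) in R_after if z == a}    # (a,b) as second step
    return new

friend = {("ann", "bob")}
friend.add(("bob", "eve"))                  # update R(bob, eve)
print(d_comp_dR(friend, "bob", "eve"))      # {('ann', 'eve')}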
4.3 Resolution Algorithm
We shall now describe a resolution algorithm that produces one (possible) solution to a set of constraints, rules, and production rules. The first step is to compute an approximation of the fixpoint, using the abstract interpretation. We then start the enumeration of all completions, using the first-fail principle. The propagation is based on two operations:

• The function obtained by differentiation, ∂T/∂Ri(x, y), returns the exact set of pairs that appear in d(T) when (x, y) is added to d(Ri) (cf. previous section).

• Similarly, ∂#T/∂Ri(x, S) returns a set of pairs (x', S'), where S' is an abstract interpretation of d(T)(x') that uses the new value S given to d(Ri)(x). Notice that this is just a convenient notation (there is no real "differentiation" with abstract interpretation, since we must have S' = d#(T)(x'): the new value S is not a positive update).

We use an exception-handling mechanism described in [Don90], which catches contradictions raised either by the detection of an empty set of possible values or by the violation of an integrity constraint. We may now describe the algorithm (the database instance d, the constraints, rules, and production rules are global resources), which solves a list of given goals. The resolution algorithm [Ca91c] uses two steps: Predict(L) and Enumerate(L). Predict(L) computes (by semi-naive iteration) the fixpoint Y(ε#) [Ca91b] for the goals in L: we apply each reduction step (ε#) for each relevant rule until no further reduction can be performed. Enumerate(L) builds all the possible approximation sequences, using an ε-reduction step for lazy constraints, an ε#-reduction step for abstract-lazy constraints, and a Y(ε#)-reduction step for active constraints. The backtrack mechanism relies on the ability to make copies of the database and return to previously stored states; fortunately, this is supported efficiently in the LAURE system [Ca91a]. Whenever a new fact Ri(x, y) is obtained (for instance, a choice Ri(x) = y is made by the constraint solver), it is propagated using differentiation, to activate all relevant production rules, and using the abstract computation, to reduce the domains of the current goals.

This algorithm is sound and returns one possible solution. With a minor modification, we can use it to build the set of all solutions. The completeness relies on the fact that the algorithm builds all computation sequences, because of the respective properties of abstract interpretation and differentiation. Since solutions are, by definition, the last members of complete computation sequences, the result follows. Notice that, because of the characterization of Section 3.1, a corollary is that the algorithm finds the exact set of all solutions for a "pure" constraint problem (d0, C) with no production rules.
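A schematic version of this loop (illustrative Python; predict stands for the Y(ε#) computation of Section 4.1, and deep copies play the role of the stored database states):

import copy

# Schematic Predict/Enumerate loop (illustrative only). `predict`
# reduces the domains to the abstract fixpoint and raises Contradiction
# on an empty domain; choices are undone by restoring a copied state.

class Contradiction(Exception): pass

def enumerate_goals(domains, predict):
    domains = predict(domains)
    open_goals = [g for g, dom in domains.items() if len(dom) > 1]
    if not open_goals:
        return domains                                     # complete instance
    goal = min(open_goals, key=lambda g: len(domains[g]))  # first-fail
    for value in sorted(domains[goal]):
        saved = copy.deepcopy(domains)                     # store state
        try:
            domains[goal] = {value}                        # hypothesis Ri(x) = value
            return enumerate_goals(domains, predict)
        except Contradiction:
            domains = saved                                # backtrack
    raise Contradiction(goal)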
The implementation has been described in more detail in other papers [Ca89, Ca91a]; here we just give some principles that may explain the good results presented in the next section. Each rule or constraint is transformed into an equivalent algebraic form which holds no logic variables and can be compiled into low-level (C) code. This ability of direct procedural compilation is one great advantage of combinatory logic. The translation is performed before run-time, and the algebraic form is actually the representation of an imperative computation. The practical application of this property is presented in [Ca89], which shows how compiled demons are produced from logical rules. Each rule, axiom, or constraint is actually stored as a set of equivalent demons, kept in the demon attributes of the relations. The logic language L3 is implemented as an extension of the LAURE programming language. The resolution algorithms reduce to triggering the right compiled functions, with very little overhead, and the unification (top-down) or pattern-matching (bottom-up) work is entirely performed during compilation.

The current implementation uses LAURE's own main-memory object management. This makes LAURE a deductive database language rather than a true database. However, we have tried large problems using virtual memory and obtained good results (task assignment problems, large transitive closures). We believe that our resolution techniques are well suited to a large volume of information. The main reason is the tight coupling with the object-oriented model, which supports the efficient handling of a large domain through set organization. Our current work is to use C++ as a common object layer, to interface LAURE with a commercial OODBMS. This will add persistence to LAURE objects and complete our database system.
5. Results

5.1 Constraint Classics

Comparing LAURE with other languages is difficult. When all solutions are needed, our experiments have found that ad-hoc (non-declarative) PROLOG programs were faster than most constraint solvers. For finite-domain constraints, when one solution is needed, the CHIP system [VH89] is a better candidate. Since we used LAURE and (compiled Quintus 3.0) PROLOG on a SUN Sparc1, we divided the numbers given in [VH89] by 10, to take into account the better hardware and the better performance that new implementations of CHIP-like systems now achieve, and we also adjusted the times reported with other machines in [VHD91] and [AB91] (Figure 1). Since this is an empirical approximation, the results are only significant by their order of magnitude.

For instance, constraint solvers are usually not very fast at finding all the solutions of the 8-queens problem (an ad-hoc PROLOG program is faster, with 330 ms for all solutions). The result in the first example of Figure 1 says that a naive LAURE program (three small constraints) is faster than the complex PROLOG program. When the number of queens grows, a constraint-resolution strategy starts to pay (the previous PROLOG program fails). The second example is the classic cryptarithmetic puzzle SEND + MORE = MONEY. The domain reduction methods used in the CHIP system [VH89] reduce the sets of possible values in such a way that only one backtracking step is necessary to find the solution. This result is also obtained with LAURE, which was expected, since our abstract interpretation captures the dual representation of the CHIP system. The total computation time for LAURE is 12 ms with active constraints and 6 ms with abstract-lazy constraints, which compares correctly to the CHIP approach. All the execution time for active constraints is spent in the abstract fixpoint computation, but compilation gives performance that is almost comparable to a hand-coded implementation (such as CHIP's domain reduction procedure). On some other constraint problems for which there is a good strategy, LAURE performs even better when compared to other constraint solvers. For instance, LAURE solved the 8th magic series problem [VH89] in 170 ms, whereas newer compiled CHIP implementations are in the 1 s range on a SUN4.
Problem                                   LAURE     Other
8 queens (all solutions)                  200 ms    330 ms (PROLOG)
64 queens (one solution)                  200 ms    250 ms (CHIP* [AB91])
SEND+MORE=MONEY (one solution) [VH89]     6 ms      6 ms (CHIP/10 [VH89])
8th magic series (one solution)           170 ms    1.3 s (cc(fd) [VHD91])
House problem (all solutions)             50 ms     120 ms (CHIP/10 [VH89])

Figure 1: Classics (SUN SPARC1 workstation)
5.2 Real-Life Problems

Our conclusion from these small problems is that LAURE is an efficient constraint solver, at least in the same range as the newer and faster implementations of the CHIP system. This is obtained with a more general system, which supports constraints on an object hierarchy and can be extended with production rules. When production rules are used to help the constraint resolution (as for the magic series), performance can be increased by an order of magnitude. Similar results were also obtained with the PECOS language [PA91], which also combines constraints and objects.

Real-life (or larger) problems are more interesting, but it is also more difficult to set up a fair comparison. In the rare cases where the problem is fully exposed, there are still many tricks used in test programs which modify performance significantly. A good example is the building of a five-segment bridge, taken from Bartusch's PhD thesis and reported by P. Van Hentenryck in [VH89]. This problem is made of 46 tasks, with precedence, sharing, and domain-specific constraints. Its interest is to be both a representative problem and a well-described one. We first wrote a simple, declarative LAURE program, with five generic constraints that were instantiated into 250 objects. Following the indications in [VH89], we used an extra attribute (an ordering on shared tasks), so as to take care of mutual exclusion first. It is a well-known strategy that, once each pair of exclusive tasks has been ordered, the optimal schedule is found through a graph computation of the atleast and atmost dates for each task [VH89] (a sketch of this computation appears at the end of this section). Although LAURE has no specific knowledge about how to solve inequations, abstract interpretation of the scheduling constraints (incremental updates of the abstract domains as soon as any new hypothesis is made) mimics the deterministic computation of the atleast and atmost dates, with an efficiency similar to the original CHIP implementation. To extend LAURE with some knowledge about scheduling problems, we added a set of four axioms that incrementally compute the two extra attributes atleast and atmost, and we ordered the tasks according to their durations. This new program is much faster: it finds the optimal solution in 200 ms and its proof of optimality in 2 s on a SUN Sparc1. This is slightly faster than the 3 s reported in [AB91] or the 5 s obtained with cc(fd). However, using redundant constraints and a better organization (different from the one in [VH89]), a PECOS program needs only 1.5 s. For this problem, LAURE is also in the same range of efficiency, and we reserve a definite conclusion for a more detailed analysis.

LAURE is now being tested on a very large scheduling-assignment telecommunications problem (many thousands of tasks). In this problem, the complexity barrier is too high, and we need to make some simplifications, using heuristics. These heuristics involve various additional pieces of information that can be computed with production rules. In this large example, we actually use the production-rule paradigm in its full generality [Ca89] (the conclusion of a production rule can be any expression of the host language), which is quite versatile but brings us even further from a clean declarative semantics. The production rules are the interface between the constraint resolution and a library of algorithms described with methods in the LAURE object-oriented language. The efficiency of this hybrid approach is guaranteed by the compilation of production rules through differentiation. We have also used LAURE for automatic layout problems [GGN90], where the disjunctive abstract interpretation gives very good results. These two last examples show the real problems for which LAURE is eventually used. Unfortunately, it is more difficult to evaluate performance, both because those two problems do not fit readily into existing constraint solvers, and because the comparison should include the time spent in development. Ultimately, the goal of a constraint solver is to provide performance comparable to a low-level implementation, at a much lower cost.
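For reference, the deterministic atleast computation mentioned for the bridge problem is essentially a longest-path traversal of the precedence graph; a standard sketch (illustrative Python with made-up tasks, not the LAURE axioms):

# Critical-path sketch: once exclusive tasks are ordered, the earliest
# start date ("atleast") of each task is a longest path over the
# precedence graph. Illustrative code; tasks and durations are made up.

def atleast_dates(duration, preds):
    """Earliest start per task; preds[t] = tasks that must finish before t."""
    dates = {}
    def earliest(t):
        if t not in dates:
            dates[t] = max((earliest(p) + duration[p] for p in preds[t]),
                           default=0)
        return dates[t]
    for t in duration:
        earliest(t)
    return dates

duration = {"A": 4, "B": 2, "C": 3}
preds = {"A": [], "B": ["A"], "C": ["A", "B"]}
print(atleast_dates(duration, preds))   # {'A': 0, 'B': 4, 'C': 6}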
5.3 Comparison with Related Work

LAURE is a general constraint solver for order-sorted finite domains. Although it may be seen as a Constraint Logic Programming language, LAURE is very different from CLP(R) [CLP], PROLOG-III, or other systems intensively based on Horn-clause resolution, because it uses no logic variables and deals with global constraints. The LAURE resolution strategy, including the prediction phase, is directly inspired by work performed at ECRC on the CHIP system [DSVH87] [VH89]. The main contribution is to extend the integer techniques related in [VH89] to arbitrary order-sorted domains, using abstract interpretation. Two other important contributions are the algebraic framework, which supports efficient compilation, and the integration with production rules, which permits domain-specific extensions.

A lot of work has been done to integrate constraint resolution into logic programming languages: real-number constraints into Horn clauses (the CLP(R) family); integer constraints into order-sorted logic (LIFE [AKP90]); finite (flat) domain (actually integer) constraints into PROLOG (the CHIP family [VH89]); or various constraint theories into DATALOG [KKR90]. Our work is at a lower level, since we deal with the actual resolution of constraints, not with their integration into another language. For instance, if we use this work as a theory of constraints over an order-sorted domain, it makes perfect sense to study CLP(O) [JL87], where O is our order-sorted domain. This is actually a subject for further research, such as using this constraint theory for the constraint scheme of LIFE [AKP90]. As a contribution to the theory of constraints over an order-sorted domain, this approach is very general and could be adapted to any object-oriented constraint solver. It is our intuition that most efficient constraint solvers on object-oriented domains use techniques to reduce the search space that could easily be described with an abstract interpretation. Representing a domain reduction algorithm as an abstract interpretation has two advantages: it gives a consistency result [CC77], and it gives tools to compare and combine techniques using the lattice of abstract interpretations.

It is still an open question whether introducing production rules is overkill. Because production rules have no simple semantics, some of the declarativeness is lost when they are introduced. On the other hand, we have found that most heuristics translate easily into production rules, because of their expressive power (we have also found that trying to mimic constraint resolution with production rules is very inefficient). Our current research is to identify a smaller production-rule language (only the propagation rules cause problems). There are many possible syntactic restrictions that would yield a nicer semantics, but we do not know yet whether they would still cover real-life situations.

From a practical perspective, LAURE is close to commercial expert-system shells such as ART or ProKappa. However, LAURE includes a sophisticated constraint solver, which we think is mandatory in an object-oriented deductive system. In addition, we have found LAURE to be much faster on pure production-rule applications, such as small expert-system benchmarks. Because differentiation produces only the new relevant objects, propagation is performed with the minimal number of computations [BR86], whereas the graph of nodes built by RETE is only an approximation of this property. In addition, differentiation produces a sequence of computations that can be executed without any special data structure, whereas the RETE algorithm requires a complex graph structure to be maintained.

This work shares similarities with many theoretical approaches to merging object-oriented and logic programming [Ku85] [AK89] [AKP90]. The closest example is the revisited O-logic [KW89], which is also intended as a framework for AI programming. The LAURE model is a strict subset of this more ambitious model, but is able to propose realistic implementation techniques from these restrictions [Ca91a]. In addition, the LAURE model allows explicit representation of disjunctive information, which we think is a very useful feature for deductive object-oriented database applications (cf. Section 1.1). There is some similarity (in the motivations) between the LAURE data model and OR-objects for representing disjunctive information [INV90], although the hierarchical aspect (order-sorted domain) plays a critical role in the LAURE model [Ca91a]. Another subject for further research is the adaptation of the constraint resolution theory upon which LAURE is based [Ca91b] to OR-objects.
6. Conclusion

This paper has identified a research domain, investigated three technical issues, and proposed a proven solution. We have motivated the integration of global constraints into object-oriented databases from an application point of view. We have proposed a semantics for object-oriented constraints which translates easily into a relational algebra, thus giving tools for optimization and compilation. We have developed an abstract interpretation technique that extends previously known integer domain reduction methods to any order-sorted domain. We have argued that some room should be left for domain-specific heuristics to solve realistic large problems, and that production rules are a natural way to provide it. We have proposed, based on the above-mentioned techniques, a resolution algorithm that has proven to be efficient on practical examples.
Acknowledgment

This work has benefited from fruitful discussions with Hassan Aït-Kaci and Pascal Van Hentenryck. I would like to thank Drew Adams, Francois Monnet, and Diane Hoffoss for their support and encouragement. I am also grateful to the members of our database research group at Bellcore, including Sam Epstein, Madhur Kohli, Shamim Naqvi, and Yatin Saraya.
References

[AB91] A. Aggoun, N. Beldiceanu. Overview of the CHIP Compiler. Proc. of the 8th ICLP, Paris, 1991.
[AKG87] S. Abiteboul, P. Kanellakis, G. Grahne. On the Representation and Querying of Sets of Possible Worlds. Proc. of ACM SIGMOD, 1987.
[AK89] S. Abiteboul, P. Kanellakis. Object Identity as a Query Language Primitive. Proc. ACM Conf. on Management of Data, 1989.
[AKP90] H. Aït-Kaci, A. Podelski. The Meaning of Life. PRL Research Report, DEC, 1990.
[BMM89] A. Borning, M. Maher, A. Martindale, M. Wilson. Constraint Hierarchies and Logic Programming. Proc. of the 6th ICLP, Lisbon, June 1989.
[BR86] F. Bancilhon, R. Ramakrishnan. An Amateur's Introduction to Recursive Query Processing Strategies. Proc. ACM SIGMOD Conf. on the Management of Data, Washington, May 1986.
[Ca89] Y. Caseau. A Formal System for Producing Demons from Rules. Proc. of DOOD89, Kyoto, 1989.
[Ca91a] Y. Caseau. A Deductive Object-Oriented Language. Annals of Mathematics and Artificial Intelligence, Special Issue on Deductive Databases, February 1991.
[Ca91b] Y. Caseau. Abstract Interpretation of Constraints over an Order-Sorted Domain. Proc. of ILPS, San Diego, October 1991.
[Ca91c] Y. Caseau. Rule-Aided Constraint Resolution in LAURE. PDK91, to appear in Lecture Notes in Computer Science, 1991.
[CC77] P. Cousot, R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. Proc. Fourth ACM Symposium on Principles of Programming Languages, 1977.
[CLP] N. Heintze et al. Constraint Logic Programming: A Reader. Proc. 4th IEEE Symposium on Logic Programming, San Francisco, 1987.
[Don90] C. Dony. Exception Handling and Object-Oriented Programming: Towards a Synthesis. Proc. of OOPSLA'90, Ottawa, 1990.
[DSVH87] M. Dincbas, H. Simonis, P. Van Hentenryck. Extending Equation Solving and Constraint Handling in Logic Programming. Colloquium on Resolution of Equations in Algebraic Structures, Austin, May 1987.
[Fo82] C.L. Forgy. RETE: A Fast Algorithm for the Many Pattern/Many Object Pattern Matching Problem. Artificial Intelligence, n° 19, 1982.
[FU76] A. Fong, J. Ullman. Induction Variables in Very High Level Languages. Proc. Third ACM Symposium on Principles of Programming Languages, 1976.
[GGN90] M. Ganti, P. Goyal, R. Nassif, P. Sunil. An Object-Oriented Development Environment. COMPCON, February 1990.
[HS89] R. Hull, J. Su. Untyped Sets, Invention, and Computable Queries. Proc. of PODS-89, Philadelphia, 1989.
[INV90] T. Imielinski, S. Naqvi, K. Vadaparty. Querying Design and Planning Databases. Proc. of DOOD91, Munich, 1991.
[JL87] J. Jaffar, J.-L. Lassez. Constraint Logic Programming. Proc. ACM Symp. on Principles of Programming Languages, San Francisco, 1987.
[KKR90] P. Kanellakis, G. Kuper, P. Revesz. Constraint Query Languages. Proc. of the 9th ACM PODS, 1990.
[KW89] M. Kifer, J. Wu. A Logic for Object-Oriented Logic Programming (Maier's O-Logic Revisited). Proc. of PODS-89, Philadelphia, 1989.
[Ku85] G. M. Kuper. The Logical Data Model: A New Approach to Database Logic. PhD Dissertation, Stanford University, 1985.
[Ma77] A. Mackworth. Consistency in Networks of Relations. Artificial Intelligence, vol. 8, 1977.
[McL81] B.J. MacLennan. Programming With A Relational Calculus. Report n° NPS52-81-013, Naval Postgraduate School, September 1981.
[Mel86] C.S. Mellish. Abstract Interpretation of Prolog Programs. Proc. Third Int. Conf. on Logic Programming, 1986.
[MS90] K. Marriott, H. Sondergaard. Analysis of Constraint Logic Programs. Proc. of the NACLP, Austin, 1990.
[PA91] J.F. Puget, P. Albert. PECOS: programmation par contraintes orientée objets. Génie Logiciel et Systèmes Experts, vol. 23, 1991.
[PS77] B. Paige, J.T. Schwartz. Expression Continuity and the Formal Differentiation of Algorithms. Proc. Fourth ACM Symposium on Principles of Programming Languages, 1977.
[SKGB87] H. Schmidt, W. Kiessling, U. Guntzer, R. Bayer. Compiling Exploratory and Goal-Directed Deduction into Sloppy Delta-Iteration. Proc. of the Symposium on Logic Programming, San Francisco, 1987.
[Ta55] A. Tarski. A Lattice-Theoretical Fixpoint Theorem and its Applications. Pacific Journal of Mathematics, n° 5, 1955.
[VHD91] P. Van Hentenryck, Y. Deville. The Cardinality Operator: A New Logical Connective for Constraint Logic Programming. Proc. of the 8th ICLP, Paris, 1991.
[VH89] P. Van Hentenryck. Constraint Satisfaction in Logic Programming. The MIT Press, Cambridge, 1989.
[Vi86] L. Vieille. Recursive Axioms in Deductive Databases: The Query/Subquery Approach. Proc. First Intl. Conference on Expert Database Systems, Charleston, 1986.