Constraints in an Object-Oriented Deductive Database

Yves Caseau
Bellcore, 445 South Street, Morristown NJ 07962, USA.
e-mail: [email protected]

Abstract. This paper relates our experience in integrating constraint resolution into an object-oriented deductive system. We motivate this work by showing that disjunctive information and global constraints fit naturally in an object-oriented model and are actually necessary to perform common tasks. We identify three difficulties in developing such an extended object-oriented system (namely, compilation, domain reduction, and heuristics) and propose a solution for each. We provide a formal semantics and prove the correctness of those three techniques. We illustrate the performance results with the implementation we have built in LAURE.

1. Introduction

Non-traditional database applications, such as design and planning (scheduling/resource assignment), try to use object-oriented databases, because the complex structures (objects) that arise naturally when describing the design of a telephone switch, or when assigning tasks to a pool of technicians, fit the object-oriented paradigm more naturally. However, there is an implicit amount of non-determinism in these problems (e.g., the design is only partially specified, or the tasks are not yet assigned), which is difficult to represent in either a commercial object-oriented database or a relational database. Therefore, a natural extension to a data model [AKG87] [INV90] [Ca91a] is to introduce disjunctive (or incomplete) information, which leads to the notion of global constraints to restrict the possible completions of the incomplete information. For instance, the budget for a design will be seen as a global constraint on all the design choices (the total cost should be less than a given value). An object-oriented model with disjunctive information and global constraints may be the perfect environment in which to specify scheduling or planning problems, but we still have to solve the constraints. Because of the complexity (disjunctive constraint problems are non-polynomial), large problems are very hard to solve. Among all the technical difficulties, we have identified three issues that must be resolved in the implementation of a realistic constraint solver:

International Conference on Deductive and Object-Oriented Databases, Munich, December 1991, LNCS 566, p. 292-311, Springer-Verlag.

• We need some techniques for domain reduction [DSVH87]. The constraint solver that has been most successful when applied to real-life problems is CHIP [VH89], which is based (among other things) on smart techniques for reducing the domain of each unknown variable by constraint propagation before starting an enumeration (backtracking). One of our goals was to generalize and extend these methods to general order-sorted domains [Ca91b] (as opposed to flat integer domains).

• When the complexity barrier is too high, problems are usually simplified by the use of domain-dependent heuristics. It seems unlikely that a generic constraint-resolution strategy could replace the various methods that have been developed for each class of problems. The difficulty is to find a paradigm for integrating heuristics in a manner as declarative as possible (otherwise the advantages of constraint logic programming over ad hoc programming are lost).

• We need to support compilation of constraints into simpler, more manageable units. Based on our experience, an interpreted constraint solver simply cannot be used in a large application.

Here we describe some techniques that were implemented in the LAURE language and have been used successfully in large applications. We use a relational algebra to represent constraints, which supports efficient compilation using techniques developed for query optimization [Ca89]. We allow the mixing of production rules with constraints, which we have found to be a natural and efficient way to write heuristics. We perform domain reduction with an abstract interpretation of the relational algebra, which generalizes previous domain-reduction techniques.

The paper is organized as follows. Section 2 describes the framework (an object-oriented deductive database) and the motivations (non-determinism) for introducing constraints as a paradigm for object-oriented logic programming.
We show why we need to integrate different forms of logic programming to achieve a better resolution of constraints. Section 3 gives a semantics for global constraints and for database completions. We show how the algebraic framework developed for deductive rules in [Ca89] may be reused here, leading to the notion of algebraic constraints. We introduce a limited set of production rules, similar to expert-system rules, to enhance constraint resolution. Section 4 describes the resolution strategy we use in LAURE and some of the tools that we have developed: we use an abstract interpretation technique [Ca91b] to perform domain reduction before and during resolution; we compile rules and constraints using the relational algebra; and we introduce heuristics through production rules. Section 5 illustrates the practical results we have obtained by implementing those techniques in the LAURE system.

2. Motivations

In this section we first recall the features of the object-oriented deductive language [Ca89] that we use as a framework to introduce constraints. We show that although local constraints are integrated in our logical language, global constraints are necessary to introduce disjunctive information into the object-oriented database [INV90]. We then investigate the necessary relationship between constraints and rules. A more complete description of the LAURE data model (and a comparison with similar work [Ku85] [AK89] [KW89]) may be found in [Ca91a].

2.1 Object-Oriented Deduction

In this paper we assume the existence of an object domain O with a taxonomy (a lattice S of classes), a type system, and some features (instance variables or methods), to which we attach binary relations [Ca91a]. The object domain O is closed under finite enumeration (all finite subsets of O belong to O) through a recursive definition. Features (members of the set F) are seen as functions from O^n → O. Adding extensional relations on top of O is similar to [AK89], but we limit ourselves to binary relations, and we make a distinction between mono-valued and multi-valued relations [KW89]. The limitation to binary relations is traditional in object-oriented programming, and we use classes to group many binary relations into a large n-ary relation¹. An object model is made from such an object domain O and a set R = {R1, ..., Rn} of relation names, which is divided into R* and R1. A database instance is defined (next section) as an assignment d of a binary relation on O to each Ri ∈ R, such that relations from R1 are mono-valued (for each x in O, the cardinality of the set of objects bound to x by the relation is at most 1).

In [Ca89], we described a logic language for performing deduction in such a database. Here, we give a brief overview of the LAURE logic language (L3) so that we can motivate the introduction of constraints. We first start with DATALOG, restricted to the relations that exist in the object-oriented database, which are all binary relations. We can, for instance, write a rule computing the equivalence classes of the connection defined by the base relation bound:

rule[ if [or [bound(x y) or connected(y x)]
             [z exists connected(x z) connected(z y)]]
      then connected(x y)]
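The bottom-up reading of this rule can be sketched as a naive fixpoint computation over finite sets of pairs (a Python illustration, not LAURE code; the names bound and connected mirror the rule above):

```python
def connected_closure(bound):
    # rule: connected(x y) if bound(x y), or connected(y x),
    #       or exists z: connected(x z) and connected(z y)
    connected = set(bound)
    while True:
        new = {(y, x) for (x, y) in connected}              # symmetry
        new |= {(x, w) for (x, z) in connected
                       for (z2, w) in connected if z == z2}  # transitivity
        if new <= connected:                                # fixpoint reached
            return connected
        connected |= new

# 1-2-3 are connected through bound edges; 4-5 form a separate component
edges = {(1, 2), (2, 3), (4, 5)}
cc = connected_closure(edges)
```

Real deductive engines (including LAURE, Section 4) avoid this naive recomputation by differentiation, but the fixpoint it reaches is the same.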

A first aspect specific to object-oriented databases is that some relations are known to be mono-valued. We also want to compare objects using order relations defined as methods on objects (such as …). Last, some non-binary relations are represented as methods (interpreted functions)

¹ A relation R(x1, …, xn) is represented by a class and a set of n binary relations. Although equivalent, from a theoretical point of view, to the relational database model, this representation leads to a different implementation and resolution strategy, since tuple creation becomes object invention [HS89].

in LAURE, and we want to integrate them into the logic language (we actually only allow ternary functional relations, which we expect to have some mathematical properties, such as associativity inside a monoid structure, the existence of an inverse in a group structure, commutativity, or distributivity)². Introducing mono-valued relations (r(x)), comparison methods ([x = y]), or operations ([x + y]) inside the logic language leads to expressions that are often called local constraints ([CLP], [KKR90]). Local constraints can be merged with logic predicates to build relations that cannot be described with pure DATALOG. An example, taken from [KKR90], is the computation of an ascending transitive closure, which links two points x and y if there is a path in the base relation of points joining x to y, such that the heights of the intermediate points make an increasing sequence:

rule[ if [or [bound(x y) and [height(y) >= height(x)]]
             [z exists bound(x z) ascending(z y) [height(z) >= height(x)]]]
      then ascending(x y)]

So far, we cannot create, inside a rule, new objects that would be necessary to represent tuples from a DATALOG relation with arity larger than 2. We propose to allow, inside the rule body, a limited form of object invention [HS89], which can be used to simulate both predicates with higher arity and un-interpreted functions (functional terms) [Ca91a]. In the following example, we want to compute the distance between two cities, which is a ternary relation. We use a functional term pair(x,y) to create a composite object, such that distance(pair(x,y), d) means that there is a path with length d between x and y. The expression [pair of x y] is introduced in our language to allow the invention of a pair object with slots first = x and second = y:

rule[ for_all (x pair)(y integer)
      if [r:road exists [start(r) = first(x)]
          [or [[end(r) = second(x)] and [y = length(r)]]
              [[end(r) != second(x)] and
               [y = [length(r) + distance([pair of end(r) second(x)])]]]]]
      then distance(x y)]

We may now define the L3 language more formally. We identify four subsets F1, Fc, Fo and Fp of F containing features such that (respectively) their arity is 1 (O → O); they represent a comparison (O × O → Boolean); they represent an operation (O × O → O); or they represent a bijection. An L3 assertion with free variables in the set V is described by the following grammar:

<assertion> ::= <R*>(<expr> <expr>) | [<expr> <Fc> <expr>] | [<assertion> ∧ <assertion>] |
                [<assertion> ∨ <assertion>] | [∃ <variable> <assertion>*]
<expr>      ::= V | O | <F1>(<expr>) | [<expr> <Fo> <expr>] | [<class> of <expr>*]

² Each of these properties is used as a rewriting rule [Ca91a] that permits the translation from the logic form into the relational algebra.

Given the notion of an assignment function ν ∈ (V → O), the semantics of this language is straightforward to define. Given a database instance d and an assignment function ν, if E is an expression(V), [E]d,ν is an object of O; if A is an assertion(V), [A]d,ν is a boolean value.

2.2 Disjunctive Information and Global Constraints

LAURE supports disjunctive information, also seen as incomplete information [AKG87]. Disjunctive information is used to represent choices that are not made yet in a design or a planning problem. In the LAURE model, we attach disjunctive information to mono-valued attributes; another solution would be to introduce OR-objects [INV90], but using relations simplifies the data model. Each mono-valued relation may be given a set of possible values for a given object x, either directly, as in:

size(John) ∈ {162, 177, 190};  color(my_ball) ∈ {blue, green}

or for a set of given objects, as in the following declaration:

∀ x ∈ Toy, color(x) ∈ {yellow, red, purple, orange, brown}
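One way to picture such declarations is a table of possible-value sets indexed by (relation, object) pairs; the dictionary layout and helper names below are illustrative, not the LAURE implementation:

```python
# A database instance maps each mono-valued relation name and object to a
# set of possible values; a singleton set means the value is known.
instance = {
    ("size", "John"): {162, 177, 190},
    ("color", "my_ball"): {"blue", "green"},
}

# class-wide declaration: every Toy may only take one of these colors
toys = ["teddy", "kite"]
for t in toys:
    instance[("color", t)] = {"yellow", "red", "purple", "orange", "brown"}

def is_known(inst, rel, x):
    # the value of rel(x) is determined once a single possibility remains
    return len(inst[(rel, x)]) == 1

def goals(inst):
    # unknown values: pairs (relation, object) with more than one possibility
    return {k for k, v in inst.items() if len(v) > 1}
```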

Therefore, this model does not support arbitrary disjunctive information, but only disjunctions about the same object and the same relation. This covers the kind of incompleteness that is frequent in object-oriented applications: if a certain value of the relation for an object is not known, we represent a set of possible values instead. We therefore define a database instance as an assignment from R to (O → P(O)).

Definition: A database instance is a function of D = (R → (O → P(O))). For each x ∈ O, if Ri ∈ R*, d(Ri)(x) represents the class of x according to the relation denoted by Ri (the set of objects bound to x). If Ri ∈ R1, d(Ri)(x) represents the set of possible values for Ri(x). If this set has one unique member y, then Ri(x) = y.

There is a lattice structure on database instances, derived from the following order:

∀ d1, d2 ∈ D, d1 < d2 ⇔ (∀ Ri ∈ R*, ∀ x ∈ O, d1(Ri)(x) ⊆ d2(Ri)(x)) ∧ (∀ Ri ∈ R1, ∀ x ∈ O, d2(Ri)(x) ⊆ d1(Ri)(x))

The intuitive meaning of this order is that if d2 > d1, then d2 contains the knowledge in d1 plus some additional information (hence the reversed inclusion for mono-valued relations). A database instance d usually contains some incomplete information through the values of relations from R1. This representation leads to the notion of unknown value (or goal for the resolution), which is a pair (Ri, x) such that Ri(x) is not known (i.e., |d(Ri)(x)| > 1).

Incomplete information generally comes with rules that define how it can be completed. A natural paradigm is the notion of global constraints: equations on the objects of the database that restrict the possible completions by requiring specific properties (e.g., the total budget is less than $100K). Compared with the local constraints presented in the previous section, a local constraint is an equation introduced in the body of a rule, which restricts the unification, whereas a global constraint is an equation on the database, which restricts the completion process.

A constraint is made of an L3 assertion a(v1, ..., vm) and some goals {Ri(xi), i = 1 .. m}, and states that the assertion a should be satisfied for the valuation vi = Ri(xi) once each goal Ri(xi) has received a unique value. To introduce some factorization (the ability to write constraints that apply to multiple objects using the class taxonomy), we choose either xi = x or xi = fi(x), and we quantify universally on x over some class s ∈ S (fi represents some instance variable of the object x). We can therefore represent global constraints over the database instance d of the form:

∀ x ∈ s, ∀ v1, ..., vm ∈ O, (∀ i ∈ {1..m}, vi = d(Ri)(fi(x))) ⇒ |a(x, v1, ..., vm)|d = true

In the LAURE syntax (see the next examples), the domain s is introduced with a for_all declaration, the goal declarations follow an if keyword, and the L3 assertion follows the then keyword. For instance, we define a set of points (a plane) and a subset (a line) defined by the equation [X + 2Y = 10]:

;; we define a new class with two coordinates
[plane :: class with (slot cx -> integer) (slot cy -> integer)]
;; a line is a subset of a plane
[line1 :: class superset (plane)]
;; we define the line by an equation
constraint[ for_all (p:line1)
  if [X = cx(p)] [Y = cy(p)]
  then [[X + [2 * Y]] = 10]]
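The effect of this constraint on finite coordinate domains can be sketched as one domain-reduction pass (reduce_line is a made-up helper name; LAURE compiles such reductions, as Section 4 explains):

```python
# Keep only coordinate values that participate in some solution of
# cx + 2*cy = 10 (a hand-rolled arc-consistency pass over finite domains).
def reduce_line(cx_dom, cy_dom):
    cx2 = {x for x in cx_dom if any(x + 2 * y == 10 for y in cy_dom)}
    cy2 = {y for y in cy_dom if any(x + 2 * y == 10 for x in cx2)}
    return cx2, cy2

# starting from coordinates 0..10, only even cx values can satisfy the line
cx, cy = reduce_line(set(range(11)), set(range(11)))
```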

Constraints do not necessarily apply to one simple object, as in the previous example. We can use composite objects to apply a constraint to multiple receivers. Here we create a class of objects above, such that the existence of such an object X that points to two graphic objects a and b must imply that b is on top of a in the layout:

;; a class of complex objects
[above :: class with (slot top -> object) (slot bottom -> object)]
;; if X belongs to above, top(X) should be on top of bottom(X)
constraint[ for_all (X above)
  if [h1 = y(top(X))] [h2 = y(bottom(X))]
  then [h1 > [h2 + h(bottom(X))]]]

2.3 Rules and Constraints

In this section we present the various kinds of rules and constraints that we have found necessary to represent complex scheduling and planning problems. A formal semantics for each of these logic constructions will be given in Section 3; here we only describe what we would like to introduce in our deductive object-oriented database.

Deductive rules (such as those presented in Section 2.1) and constraints play two very different roles, and both are necessary. Whereas a rule states that we can deduce something if a condition is true, a constraint only says that if we choose some values, they have to satisfy a condition. A rule (rule[ if a(x,y) then r(x y)]) is made of a condition assertion a(x,y) ∈ L3 and a conclusion relation r ∈ R. Informally, a rule is satisfied if all pairs for which the condition is true are in the conclusion relation r. This, of course, creates a problem if r is mono-valued and too many values can be deduced. This is why we see each rule on a mono-valued relation as a global constraint (if the assertion a(x,y) is functional in x, the constraint captures the same meaning):

rule[ if a(x,y) then r(x y)]

is replaced by:

constraint[ if [y = r(x)] then a(x,y)]

Production rules are the basic tools for building expert systems. A production rule is made of a condition and a conclusion expression. Its semantics is purely operational: if the condition is satisfied by some objects at some point in time, the conclusion expression should be evaluated. We have investigated various strategies for helping constraint resolution with domain-dependent heuristics and have found that most heuristics are easily represented by a set of production rules, as related in [Ca91c]. We shall not consider the LAURE production rule paradigm in its full generality in this paper, but rather the practical subset that we need for improving constraint resolution and to which we can give "clean" semantics. All production rules in LAURE have a similar form: they are defined with a for_all statement that gives their range of validity; an if L3 condition on one or two variables; and a then statement, which represents the conclusion expression to be evaluated each time a new pair satisfies the L3 condition. The implementation of production rules is done by efficient propagation (e.g., the RETE algorithm [Fo82]). In LAURE, we use formal differentiation (cf. Section 4.2 and [Ca91a]), which supports complete compilation into C code.

Propagation rules have the same form as deductive rules (a condition assertion and a conclusion relation) but are evaluated in a bottom-up

manner (Section 4.2). The conclusion is simply Ri(x y). For instance, here is a production rule that maintains the maximum salary in a department:

axiom[ for_all (d department)(y integer)
  if [e:employee exists [works(e) = d] [salary(e) = y] [y > max(d)]]
  then max(d y)]
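The incremental behaviour of such a propagation rule can be mimicked by a trigger that fires only when its condition becomes newly true; a minimal Python sketch with illustrative names, not LAURE's compiled propagation:

```python
# Whenever a salary fact is added, the department maximum is updated
# incrementally; the update fires only if the condition [y > max(d)] holds.
max_salary = {}          # department -> running maximum

def add_salary(department, salary):
    if salary > max_salary.get(department, float("-inf")):
        max_salary[department] = salary   # conclusion: max(d y)

add_salary("dev", 50)
add_salary("dev", 70)
add_salary("dev", 60)    # does not fire: 60 <= current maximum
```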

Negative constraints are used to forbid some values for a given goal (R, x). The conclusion is [R no x y], and it means that y is no longer a possible value for R(x). Here is an example that says that two queens should not be on the same line in the famous queens problem (how to place eight queens on a chessboard):

constraint[ for_all (q queen)(c case)
  if [z:queen exists [line(place(z)) = line(c)]]
  then [place no q c]]
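The operational effect of a negative constraint — removing values from a goal's domain — can be sketched as follows, numbering the chessboard squares 0..63 row-major (an illustrative encoding, not the paper's case objects):

```python
# Once a queen is placed, remove every square on the same row from the
# other queens' possible placements; the row of square s is s // 8.
def prune_same_row(domains, placed_queen, square):
    row = square // 8
    for q, dom in domains.items():
        if q != placed_queen:
            dom -= {s for s in dom if s // 8 == row}   # [place no q s]

domains = {q: set(range(64)) for q in range(8)}
domains[0] = {0}                 # choice: queen 0 placed on square 0 (row 0)
prune_same_row(domains, 0, 0)
```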

Integrity constraints can be used to check the integrity of a completion. The conclusion (introduced by the keyword check) is another L3 expression. The associated semantics is that if the conclusion condition is not verified, an error (contradiction) is raised. Here is the classic database example (an employee shouldn't make more than his manager):

constraint[ for_all (x integer)(y integer)
  if [z:employee exists salary(z x) [y = salary(manager(z))]]
  check [x < y]]

Integrity constraints play a very important part in the development of a knowledge system, independently of the constraint resolution studied in this paper. They are implemented as production rules whose conclusion is to check a given assertion. Using differentiation, this allows fast and minimal checking of the condition each time a relevant fact is added to the database.

3. Semantics

3.1 Logic Constraint Satisfaction

A LAURE constraint is a syntactical object that restricts the possible completions of the database instance. It is also a generic constraint, since it applies to possibly many LAURE objects. In this paper we assume that we have an order-sorted domain O (see [Ca91a] for details) with a class taxonomy (S, ≤). A constraint is written as a triple (s, ((R1 ƒ1), ..., (Rm ƒm)), a(x, v1, ..., vm)); for instance, the layout constraint of Section 2.2 becomes (above, ((y top), (y bottom)), [v1 > [v2 + h(bottom(x))]]).

We implicitly have a set of unknown values (called goals) ⊆ R1 × O, on which the completion will operate. For each goal (Ri, x), we have a possible domain, which is d(Ri)(x), and a value y if |d(Ri)(x)| = 1 and d(Ri)(x) = {y} (also written d(Ri)(x) = y). A database instance is complete if it contains no disjunctive information. In this paper, we suppose that the set of goals is finite, and we write |d| = |{(Ri, x) ∈ R1 × O, |d(Ri)(x)| > 1}| (thus d is complete if |d| = 0). We call D the set of all database instances (R → (O → P(O))), and Dc the set of all complete database instances. A solution to a constraint is a complete database instance such that all assertions are satisfied:

Definition: A constraint (s, ((R1 ƒ1), ..., (Rm ƒm)), a(x, v1, ..., vm)) is satisfied by a complete database instance d if and only if:
∀ o ∈ O, o ∈ s ⇒ |a(o, R1(ƒ1(o)), ..., Rm(ƒm(o)))|d = true

A constraint satisfaction problem is made of an initial domain valuation d0 and a set C of constraints. A solution is a complete database instance v that satisfies all the constraints in C and such that, for all x in O, v(Ri)(x) ∈ d0(Ri)(x). A given constraint problem can have many solutions or none. The semantics we have given is general in the sense that it applies to any form of constraint assertion [Ca91b]. However, there usually is (partially) a way to go from the possible values (d0) to a solution using constraint propagation. We shall now build a more explicit characterization by using the lattice structure on D.
For a given database instance d, we now define an arc-consistency function [Ma77] Id(c) ∈ R1 × O → P(O) for all c ∈ C by:

∀ c = (s, ((R1 ƒ1), ..., (Rm ƒm)), a(x, v1, ..., vm)), ∀ i ∈ {1, ..., m},
Id(c)(Ri, x) = { y ∈ O | ∀ o ∈ ƒi⁻¹(x), ∃ v ∈ Dc, v > d, v(Ri)(x) = y ∧ |a(o, R1(ƒ1(o)), ..., Rm(ƒm(o)))|v = true }

Id(c)(Ri, x) is the set of values that can be given to Ri(x) in one of the solutions of the constraint c (ƒi⁻¹(x) is the set of objects o such that ƒi(o) = x). We assume that Id(c) can be computed directly (not by a simple enumeration of all valuations), so we can define a deduction step with the function ε from D → D:

Definition: The deduction function ε is defined by the application of each constraint to each unknown value:
(ε(d)(Ri))(x) = d(Ri)(x) ∩ ⋂ { Id(c)(Ri, x) | c = (s, (..., (Ri ƒi), ...), a) ∈ C }
ε is isotone (order-preserving): d1 > d2 ⇒ ε(d1) > ε(d2).

This function represents a complete propagation of the set of possible values through the constraints (which is not always practical). Any
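Under the assumption of finite domains, ε can be approximated by explicit enumeration (the very thing a direct computation of Id(c) avoids); the sketch below iterates such a step to its fixpoint, with illustrative names:

```python
from itertools import product

# One deduction step: for each unknown, keep only the values supported by
# some assignment of the other variables that satisfies every constraint
# mentioning it. Constraints are (variable-list, predicate) pairs.
def eps(domains, constraints):
    new = {v: set(d) for v, d in domains.items()}
    for var in domains:
        for vars_, pred in constraints:
            if var not in vars_:
                continue
            others = [v for v in vars_ if v != var]
            keep = {y for y in new[var]
                    if any(pred(**dict(zip(others, a), **{var: y}))
                           for a in product(*(domains[o] for o in others)))}
            new[var] &= keep
    return new

def least_fixpoint(domains, constraints):
    # iterate eps until nothing changes
    while True:
        nxt = eps(domains, constraints)
        if nxt == domains:
            return domains
        domains = nxt

cons = [(["x", "y"], lambda x, y: x + y == 4),
        (["x"], lambda x: x != 2)]
reduced = least_fixpoint({"x": {1, 2, 3}, "y": {1, 2, 3}}, cons)
```

Removing 2 from x's domain also removes 2 from y's, since y = 2 is only supported by x = 2: propagation is what makes the fixpoint smaller than d0.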

solution of the set of constraints C is a fixpoint of ε by definition, but the converse is not necessarily true. The deduction function has a least fixpoint YD(ε), which is not (always) a solution but verifies:

Theorem [Ta55]: YD(ε)(d0) is smaller than any solution v of the constraint problem (d0, C). In particular, if YD(ε)(d0)(R)(x) = ∅ for some (R, x), there are no solutions.

The idea is to use YD(ε)(d0) instead of d0 to start the enumeration of possible valuations, because it represents a smaller set. The goal of Section 4.1 is to provide approximations of YD(ε), because its exact computation is too expensive. We can now give a characterization of constraint solutions:

Definition: An approximation sequence is a sequence d0, d1, ..., dn such that:
∀ i, |di| = |di-1| − 1, |dn| = 0, di > ε(di-1).
A fixpoint approximation sequence is an approximation sequence such that:
∀ i, di > YD(ε)(di-1).

Theorem: A complete database instance v is a solution of (d0, C) iff either:
• there exists an approximation sequence (di) such that dn = v, or
• there exists a fixpoint approximation sequence (di) such that dn = v.

Proof: An approximation sequence is equally defined by a sequence of choices (Ri(xi) = yi, i = 1 ... n), such that di(Rj) = di-1(Rj) for Rj ≠ Ri, and di(Ri)(x) = di-1(Ri)(x) for x ≠ xi. If v is a solution, we can take an arbitrary ordering of the goals of d0, which yields a sequence of choices (Ri(xi) = v(Ri)(xi)). Since v is a solution, it follows that v > YD(ε)(di) for any i, and thus the sequence (di) is both a fixpoint approximation sequence and an approximation sequence. Conversely, consider an approximation sequence (di), a given constraint c = (s, ((R1 ƒ1), ..., (Rm ƒm)), a(x, v1, ..., vm)), and a given object o of s, and take the smallest i such that all goals (Rj, ƒj(o)) are reduced (a unique value has been assigned). By construction, the choice (Ri(xi) = yi) for di is such that there exists j with xi = ƒj(o). Since yi ∈ ε(di-1)(Ri)(xi), it follows that |a(o, R1(ƒ1(o)), ..., Rm(ƒm(o)))|di = true, and thus dn satisfies the constraint c for the object o. Thus dn satisfies all constraints and is a solution of the initial problem.

This theorem (and its variations used in Section 4.1) is the basis for showing that constraint resolution algorithms are sound and complete.
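The constructive direction of the proof suggests a solver: propagate, pick an unreduced goal, try each value, and recurse; every successful branch is an approximation sequence. A minimal sketch over finite domains and predicate constraints (illustrative names, not LAURE's resolution strategy of Section 4):

```python
from itertools import product

def propagate(domains, constraints):
    # repeat until stable: drop any value with no supporting assignment
    changed = True
    while changed:
        changed = False
        for vars_, pred in constraints:
            for var in vars_:
                others = [v for v in vars_ if v != var]
                keep = {y for y in domains[var]
                        if any(pred(**dict(zip(others, a), **{var: y}))
                               for a in product(*(domains[o] for o in others)))}
                if keep != domains[var]:
                    domains[var] = keep
                    changed = True
    return domains

def solve(domains, constraints):
    # each recursive call extends an approximation sequence by one choice
    domains = propagate({v: set(d) for v, d in domains.items()}, constraints)
    if any(not d for d in domains.values()):
        return None                       # empty domain: dead branch
    unknowns = [v for v, d in domains.items() if len(d) > 1]
    if not unknowns:
        return {v: next(iter(d)) for v, d in domains.items()}
    var = min(unknowns, key=lambda v: len(domains[v]))  # first-fail heuristic
    for y in sorted(domains[var]):        # choice: Ri(xi) = yi
        result = solve(dict(domains, **{var: {y}}), constraints)
        if result is not None:
            return result
    return None

cons = [(["x", "y"], lambda x, y: x + y == 5),
        (["x", "y"], lambda x, y: x < y)]
answer = solve({"x": set(range(6)), "y": set(range(6))}, cons)
```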

3.2 Algebraic Representation

A relational algebra can be formed from some classical operations on binary relations [McL81]. For our model, these operations are composition (written o), which is the binary case of the relational join; intersection (written ∩); union (written ∪); inversion (written ⁻¹); and cartesian product (A × B is the graph of a binary relation). An important property is that each operation can be evaluated on a "set-at-a-time" basis. Moreover, efficient compilation techniques relieve us from physically computing the sets involved in intermediate computations [Ca89]. The algebra A(R) is made from the set of variables R and the relational operations. Some other operations are introduced to capture the object functions. Precisely, we define:

∀ c ∈ Fc, φ(r1 c r2)(x y) ⇔ ∃ x1, x2, r1(x x1) ∧ r2(x x2) ∧ c(x1, x2) = true ∧ x = y
∀ ƒ ∈ Fo ∪ Fp, ψ(r1 ƒ r2)(x y) ⇔ ∃ x1, x2, r1(x x1) ∧ r2(x x2) ∧ ƒ(x1, x2) = y
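With binary relations represented extensionally as sets of pairs, each of these operations is a one-line set comprehension; a sketch of such an evaluator (illustrative, not LAURE's compiled evaluation, which avoids materializing intermediate sets):

```python
import operator

def compose(r1, r2):                          # r1 o r2
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def inverse(r):                               # r^-1
    return {(y, x) for (x, y) in r}

def phi(r1, cmp, r2):                         # phi(r1 c r2): pairs (x, x)
    return {(x, x) for (x, y1) in r1 for (x2, y2) in r2
            if x == x2 and cmp(y1, y2)}

def psi(r1, op, r2):                          # psi(r1 f r2): (x, f(x1, x2))
    return {(x, op(y1, y2)) for (x, y1) in r1
            for (x2, y2) in r2 if x == x2}

# two rectangles with their length and width relations
length = {("r1", 4), ("r2", 2)}
width = {("r1", 3), ("r2", 5)}
area = psi(length, operator.mul, width)       # area as a derived relation
```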

We add the notion of term variable to the induced algebra and obtain the query algebra A(R), defined by:

<term> ::= <R> | <variable> | <term> × <term> | <term>⁻¹ | <term> o <term> |
           <term> ∪ <term> | <term> ∩ <term> | φ(<term> <Fc> <term>) |
           ψ(<term> <Fo> <term>) | [<variable>:<term>, <term>]

Each term t of the algebra represents a binary relation on O for any given database instance, written d(t). For instance, the semantics of the last construction introduced above is: d([z:t1, t2]) = d(t2), with the extension d(z) = d(t1). A logic assertion and an algebraic term are equivalent if they represent the same relation for every database instance. We have shown that each assertion can be translated into an equivalent algebraic term, and reciprocally:

Theorem [Ca91a]: For any assertion a(x,y) with two free variables, there exists a term T of A(R) such that (x,y) ∈ d(T) ⇔ |a(x,y)|d = true. Conversely, there exists such an assertion of L3 for any term T of A(R).

The actual translation of an L3 query into an algebraic query is interesting because, among the many possible algebraic translations, there is usually one optimal solution. Translation into the algebraic form is based on rewriting and involves a lot of knowledge about object functions. The principle is to solve the equation assertion(x,y) while considering that x is known and y is searched for. The result of the resolution is a relational algorithm that explains how to get y from x, represented as a term in our relational algebra. We can now take each LAURE constraint c = (s, ((R1 ƒ1), ..., (Rm ƒm)), a(x, v1, ..., vm)) and translate it into m algebraic constraints {Ri ⊆ Ti} [Ca91a], where Ti = t o ƒi⁻¹ and t is equivalent to the assertion a(x, R1(ƒ1(x)), ..., Ri-1(ƒi-1(x)), y, Ri+1(ƒi+1(x)), ..., Rm(ƒm(x))). The interest of this translation is given by: I(c)(Ri, x) = {y | (x, y) ∈ d(Ti)}.

We say that the algebraic constraint (Ri ⊆ Ti) is derived from the initial constraint set C. A technique for efficiently compiling the computation of {y | (x, y) ∈ d(T)} was developed for rule resolution [Ca91a] and can be reused here. Because the relational algebra is a good framework for compiling, we will now extend our algebraic representation to all the logic constructions defined in Section 2.3.

3.3 Combined Semantics for Rules and Constraints

We use the algebra to represent all conditions from the L3 language. If Ri is multi-valued (Ri ∈ R*), a rule

rule[ if a(x,y) then Ri(x y)]

is translated into an algebraic rule (t ⊆ Ri), where t is equivalent to the L3 assertion a(x,y).

Definition: A rule is a formula (Ti ⊆ Ri) where Ti ∈ A(R) and Ri ∈ R*. A database instance d satisfies a rule (Ti ⊆ Ri) if and only if d(Ti) ⊆ d(Ri).

Example: The rule

rule[ if [z exists friend(x z) friend(z y)] then friend(x y)]

is transformed into (friend o friend ⊆ friend).
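With relations as sets of pairs, checking that a database instance satisfies this algebraic rule is a subset test (a Python sketch; compose is the extensional evaluation of o):

```python
# d satisfies (friend o friend ⊆ friend) iff composing friend with itself
# yields no pair outside friend, i.e. the relation is transitively closed.
def compose(r1, r2):
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

friend = {(1, 2), (2, 3), (1, 3)}
satisfied = compose(friend, friend) <= friend        # closed under the rule
# dropping (1, 3) breaks the rule: (1,2) o (2,3) deduces the missing pair
violated = compose(friend - {(1, 3)}, friend - {(1, 3)}) <= friend - {(1, 3)}
```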

The semantics of rule resolution is based on a minimal fixpoint, as usual:

Theorem [Ta55]: For any initial database instance d0, there exists a unique minimal database instance that contains d0 and satisfies a given set of rules.

Rules will be evaluated top-down, using the query/subquery with differentiation algorithm described in [Ca91a]. If Ri is mono-valued, a similar rule (rule[ if a(x,y) then Ri(x y)]) is translated into an algebraic constraint that states that the value Ri(x) must be one of those given by a(x,y)³. Algebraic constraints are also derived from object constraints (previous section) by considering each goal Ri(ƒi(x)) and generating a solved form of the condition Ti, which gives the value of Ri(ƒi(x)) when all other goals are solved. Thus, we represent constraints and rules with a family of algebraic constraints, which have the converse form of a rule:

Definition: A constraint is a formula (Ri ⊆ Ti) where Ti ∈ A(R) and Ri ∈ R1. A database instance d satisfies a constraint (Ri ⊆ Ti) if and only if d(Ri) ⊆ d(Ti).

Example: The constraint

constraint[ for_all (x rectangle)
  if [l = length(x)] [w = width(x)] [a = area(x)]
  then [[l * w] = a]]

is transformed into (length ⊆ ψ(/, area, width), area ⊆ ψ(*, length, width), …).

³ Thus, if there is a unique y for an object x, we get Ri(x) = y.

Constraints are more interesting, since we do not simply search for a database instance that satisfies all constraints (the empty database would do); instead, we look for a complete database instance, which associates one value with each resolution goal. Because of the correctness of the translation, we can express constraint solutions as follows:

Definition: A database instance d is a solution for the constraint (Ri ⊆ Ti) if and only if d is complete and d satisfies the constraint.

We now introduce production rules to take care of propagation rules on mono-valued relations, negative constraints, and integrity constraints. An algebraic production rule (Ti ⇒ ƒ) is made of a condition term Ti and a conclusion action ƒ.

Definition: A production rule (Ti ⇒ ƒ) is made from a term T ∈ A(R) and an action ƒ. Its semantics is that ƒ(x,y) is executed each time the database instance changes from d to d' and (x,y) ∈ d'(Ti) ∧ (x,y) ∉ d(Ti).

In general, since we have given no order to production rules, this is a non-deterministic operational semantics (the order in which rules are triggered may be important). In this paper, we only consider three sorts of production rules for which we can give a more precise semantics:

• Propagation rules were introduced with the syntax: axiom[ if a(x,y) then Ri(x y)]. The conclusion operation of such a rule is a definite update: d(Ri)(x) ← y. The assertion a is translated into the equivalent term Ti and the algebraic rule is written (Ti ⇒ Ri). All propagation rules on a given relation Ri are combined with the ∪ operator (T1 ⇒ Ri and T2 ⇒ Ri are combined into (T1 ∪ T2) ⇒ Ri). We also suppose that R1 is divided into a set of relations defined with constraints and a set of relations defined with propagation rules (no intersection, to avoid conflicts). In practice, this occurs naturally, since we use propagation rules to build and maintain additional information useful for the constraint resolution.
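The combination of several propagation rules on the same relation can be sketched as follows (a hedged Python illustration; the relations a, b and the condition terms are assumptions, not part of LAURE):

```python
# d(T1): pairs derived from the first rule's condition (here, the relation a itself).
def d_T1(db):
    return set(db["a"])

# d(T2): pairs derived from the second rule's condition (here, the inverse of b).
def d_T2(db):
    return {(x, y) for (y, x) in db["b"]}

def fire(db):
    """Combined rule (T1 ∪ T2) ⇒ Ri: one definite update d(Ri)(x) ← y
    per pair produced by the union of the two condition terms."""
    for (x, y) in d_T1(db) | d_T2(db):
        db["Ri"][x] = y
    return db

db = {"a": {(1, 10)}, "b": {(20, 2)}, "Ri": {}}
fire(db)
# the mono-valued relation Ri now maps 1 -> 10 (from T1) and 2 -> 20 (from T2)
```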

• Integrity constraints were introduced with the syntax: constraint[ if a(x,y) check a'(x,y)]. The assertion a is translated into t, and the assertion a' into t'. The conclusion operation of such a rule is to test if (x,y) ∈ d(t') and raise a contradiction if (x,y) ∉ d(t'). We shall write those production rules (Ti ⇒ t').

• Negative constraints were introduced with the syntax: constraint[ if a(x,y) then [Ri no x y]]. The conclusion operation of such a rule is to remove y from the set of possible values (d(Ri)(x) → d(Ri)(x) - {y}), assuming that no choice was made on Ri(x) (in which case the rule is ignored). This implies that the database must distinguish between Ri(x) = {y} because a choice was made and Ri(x) = {y} because y is the unique possible value. A negative constraint, therefore, specifies which value should not be taken for a goal when a choice is made, and is implemented as an algebraic production rule. We shall write them (Ti ⇒ ·Ri).

Since propagation rules have the same expressive power as a Turing machine, it is undecidable whether a program defined as a set of production rules halts or characterizes its solution uniquely (footnote 4). This is the price to pay for introducing a powerful paradigm to describe heuristics. In the rest of the paper, we simply call F(Pr, d) the result of applying to d a set of production rules (in a given order) that is supposed to terminate and to capture the intended heuristic. Further research is needed to see if we can identify a smaller class of production rules with a deterministic semantics, in which useful strategies can still be written.

If we now mix production rules with constraints, we use the previous characterization of all solutions to define the semantics of a program made of an initial database d0, some rules, some constraints, and some production rules:

Definition: Given a set of constraints C and a set of production rules Pr, a computation sequence is a sequence d0, d1, ..., dn such that: ∀i, |di| = |di-1| - 1, |dn| = 0, di > ε(F(Pr, di-1)). A solution of a program (d0, C, Pr) is a complete database instance which is the last member of a computation sequence.

Because we have restricted propagation rules (partition on R1), we can show that a solution of a program (d0, C, Pr) is a solution of (d0, C). Rules on mono-valued relations are supposed to be evaluated implicitly each time a mono-valued relation appears in a logic or algebraic expression, according to [Ca91a].
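The distinction between a committed choice and a singleton domain, on which negative constraints rely, can be sketched as follows (a minimal Python illustration; the Goal class and its flag are assumptions, not LAURE data structures):

```python
class Goal:
    """A resolution goal Ri(x): a set of possible values plus a flag telling
    whether the current singleton is a committed choice."""
    def __init__(self, values):
        self.values = set(values)
        self.chosen = False

def apply_negative(goal, y):
    """Conclusion of (Ti => .Ri): discard y, unless a choice was committed
    on this goal (the rule is then ignored)."""
    if goal.chosen:
        return
    goal.values.discard(y)
    if not goal.values:
        raise ValueError("contradiction: empty set of possible values")

g = Goal({1, 2, 3})
apply_negative(g, 2)     # no choice yet: 2 is removed from the possible values

h = Goal({2})
h.chosen = True          # Ri(x) = {2} because a choice was made...
apply_negative(h, 2)     # ...so the negative constraint is ignored
```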
Therefore, the deductive rule resolution is a slave of the constraint resolution.

4. Resolution

4.1 Abstract Interpretation

In this section we give an overview of the general method proposed in [Ca91b], which is itself an application of [CC77] to the resolution of constraints. More details and the proofs of the results may be found in these two papers. Since the ε computation is done on sets of P(O), the first step is to build an abstraction P#(O) [Ca91b] of P(O). We call D# the sub-lattice induced by P#(O): D# = {d ∈ D, ∀Ri ∈ R1, ∀o ∈ O, d(Ri)(o) ∈ P#(O)}.

(Footnote 4) Consider the following (legal) program: r(x) = 1 ⇒ r(x) = 2, r(x) = 2 ⇒ r(x) = 1.

Following [MMS86], we define the abstraction (α) and concretization (γ) functions as follows:

Definition: We define the abstraction function α by: ∀d ∈ D, ∀(Ri, x) ∈ R1 × O, α(d)(Ri)(x) = ∩{y, y ∈ P#(O) ∧ d(Ri)(x) Ê y}; we take the identity on D# → D as the concretization function γ.

In order to build an abstract approximation of ε, we first build an abstract approximation of the relational calculus on A(R). If a binary relation on O is represented by a function from ℜ = (O → P(O)) (a functional view on relations), the algebra A(R) is generated by some operations on ℜ, such as ∪, ∩, and so on. For each of these operations, we can define an abstract operation ∪#, ∩#, ... such that: (r1# ∪# r2#)(x) = ∩{y, y ∈ P#(O) ∧ (r1#(x) ∪ r2#(x)) Ê y}, ... To keep this feasible, we only require that the abstract operator return an abstract set which is larger than the abstraction of the correct result. Obviously, the tighter the abstraction, the better the practical results; therefore, we use as many mathematical properties as we can to improve the prediction. The result is an abstract evaluation d#(T) of any relational term T in the abstract representation d# = α(d) of the database instance d, such that: ∀d ∈ D, ∀T ∈ A(R), d#(T) ∈ (O → P#(O)) and ∀x ∈ O, d(T)(x) Ê d#(T)(x).

We can now define the abstract deduction function:

Definition: ∀d# ∈ D#, ∀Ri ∈ R1, ∀x ∈ O, ε#(d#)((Ri, x)) = d#((Ri, x)) ∩# {∩ d#(Ti)(x), (Ri Ê Ti) is derived from C}.

We have defined a consistent abstract interpretation [CC77]. By combining this result with the property of Y(ε)(d0), we get:

Theorem [Ca91b]: ε# has a lower fixpoint operator such that Y(ε#)(α(d0)) is smaller than any solution to the problem (d0, C).
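A minimal sketch of the abstraction and the abstract operators, assuming the abstract lattice P#(O) is the set of integer intervals (the paper's P#(O) is more general, being built from the order-sorted object hierarchy):

```python
def alpha(concrete):
    """Abstraction: the smallest interval (abstract set) containing a concrete set."""
    return (min(concrete), max(concrete))

def union_sharp(a, b):
    """Abstract union: the smallest interval containing the union of two intervals.
    It may be strictly larger than the abstraction of the exact union."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def inter_sharp(a, b):
    """Abstract intersection: intervals are closed under intersection, so this
    operator is exact; an empty result is a contradiction."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("contradiction: empty abstract set")
    return (lo, hi)

# Soundness: the abstract result always contains the concrete one,
# e.g. {1, 2} union {5} is contained in the interval (1, 5).
hull = union_sharp(alpha({1, 2}), alpha({5}))   # (1, 5)
meet = inter_sharp((1, 5), (3, 8))              # (3, 5)
```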

This means that we can consistently reduce the possible domains before starting the enumeration of possible valuations (prediction of the possible domains). Since we have identified an application of the fixpoint Y(ε) to build solutions through approximation sequences, we can use abstract interpretation in a more general manner than for simple domain prediction. We define the notion of abstract approximation sequence:

Definition: An abstract approximation sequence is a sequence d0, d1, ..., dn such that: ∀i, di ∈ D#, |di| = |di-1| - 1, |dn| = 0, di > ε#(di-1).

Abstract approximation sequences are easier to generate because the cost of the abstract computation is independent of the database size. However, this is still a sound and complete procedure:

Theorem [Ca91b]: A completed database instance v is a solution of (d0, C) iff there exists an abstract approximation sequence (di) such that dn = v.

We can also define abstract fixpoint approximation sequences, which converge faster (there is a smaller set of abstract fixpoint approximation sequences) but at a higher computational cost (computing Y(ε#) is more complex).

4.2 Lazy Evaluation vs. Propagation

In the next section, we shall describe an algorithm that builds an exhaustive enumeration of abstract computation sequences; thus, it is sound and complete. There are still two degrees of freedom upon which the efficiency of the resolution will depend:

• Goal ordering: some problems (for instance, n-queens with a large number of queens [VH89]) demand the application of the first-fail principle, which states that the goal with the smallest domain should be tried first; other problems, such as placement problems, admit a better order derived from the object topology.

• Balance between propagation and evaluation: each constraint can be evaluated lazily just before a value is chosen for a goal, or it can be propagated so as to maintain the domains of all goals as soon as any hypothetical assignment is made. The previously mentioned first-fail principle requires some propagation so that the choice made according to cardinality is significant.

In this paper we describe an algorithm that uses the first-fail principle, because we have found it to be the most commonly useful, but we have also used some variations (other orders) for other problems such as [GGN90]. The cardinality of each goal (Ri, xi) is the cardinal of the abstract set d#(Ri)(xi). Those sets are maintained by active propagation, as follows.

• Active constraints: each constraint has a mode, either specified by the user or inferred from some general declarations. A constraint can be lazy (it will be used for an ε-reduction step at the "last minute"), abstract-lazy (it will also be used at the last minute, for an ε#-reduction step), or active. When a constraint is active, the abstract sets are dynamically reduced so that the current database instance is a fixpoint for the ε#-reduction function associated with this constraint.

• Negative constraints can be lazy or active. When a negative constraint is active, it is implemented as an abstract propagation rule (cf. Section 3.3).
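The first-fail ordering itself is a one-line selection over the maintained abstract sets; a hedged sketch (the goal names and domains are illustrative assumptions):

```python
def first_fail(domains):
    """Return the unsolved goal whose abstract domain has the smallest
    cardinality (ties broken arbitrarily); solved goals (singletons) are skipped."""
    open_goals = {g: d for g, d in domains.items() if len(d) > 1}
    return min(open_goals, key=lambda g: len(open_goals[g]))

domains = {"x": {1, 2, 3, 4}, "y": {7}, "z": {2, 5}}
choice = first_fail(domains)   # "z": the smallest open domain is tried first
```

This is only meaningful when the cardinalities are kept up to date, which is why the first-fail principle calls for active propagation.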

Active constraints and propagation rules rely on the ability to efficiently propagate an update in the database. We need to know which new pairs satisfy a rule condition (an algebraic term) when an update Ri(a, b) is made. As was noticed in [FU76] and detailed in [PS77], incrementally computing a set of objects satisfying a given specification is analogous to mathematical differentiation. Differentiation rules have been developed for the database relational algebra ([BR86] or [SKGB87]). A nice property of this relational algebra (A(R)) is that differentiation can be introduced as a higher-order operation [Ca91a]. If we define the induced functional algebra F(R1, ..., Rn) as A(0, 1, R1, ..., Rn), where 0 and 1 are reserved names, each term ƒ of this algebra represents a function from O × O to P(O × O) for each database instance d. By extension, we write this function d(ƒ), which is defined by: ∀o1, o2 ∈ O, d(ƒ)(o1, o2) = d(ƒ) in A(0, 1, R1, ..., Rn), where d(0) = ∅, d(1) = {(o1, o2)} and ∀i, d(Ri) = d(Ri).

The key property of this algebra is the existence of a formal operation ∂/∂, called differentiation, on A(R) × R → F(R), defined by formal rules. We write ∂t/∂Ri for the differentiate of the term t according to Ri. The interest of differentiation lies in this result:

Theorem:
• ∀d ∈ D, ∀Ri ∈ R, ∀(o1, o2) ∈ O × O, ∀t ∈ A(R), if (o1, o2) does not belong to d(Ri) and if we define a database instance d' by d'(Ri) = d(Ri) ∪ {(o1, o2)} and d'(Rj) = d(Rj) for all other j, then: d'(t) = d(t) ∪ d'(∂t/∂Ri)(o1, o2).
• ∂t/∂Ri is the smallest term from F(R) which satisfies the previous equation (any other such term represents a function that always contains ∂t/∂Ri).

The idea of differentiation can be found in the RETE algorithm [Fo82], where it is a graph operation, or for relational databases, where it is defined by a database computation [BR86]. In this model, we obtain a formal differentiation (on abstract functions instead of database instances), which provides a better implementation. More details and correctness proofs may be found in [Ca91a]. As explained in [Ca89], the differentiated terms can in turn be compiled into efficient low-level functions.
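A small concrete instance of the theorem, assuming the term T = R1 . R2 (relation composition): adding (a, b) to R1 contributes exactly the pairs (a, c) for c ∈ R2(b), which is the differentiated term evaluated at (a, b). A hedged Python sketch (the relations are toy data, not LAURE terms):

```python
def compose(r1, r2):
    """d(R1 . R2): full (naive) evaluation of the composed relation."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def d_compose_dR1(r2, a, b):
    """The differentiate of (R1 . R2) with respect to R1, evaluated at the
    new pair (a, b): exactly the pairs this update adds to d(T)."""
    return {(a, z) for (y, z) in r2 if y == b}

r1 = {(1, 2)}
r2 = {(2, 9), (3, 7)}
old = compose(r1, r2)                  # {(1, 9)}
delta = d_compose_dR1(r2, 1, 3)        # update R1 with (1, 3): adds {(1, 7)}
new = compose(r1 | {(1, 3)}, r2)
# the theorem's equation: d'(T) = d(T) union d'(dT/dR1)(a, b)
assert new == old | delta
```

The point is that `delta` is computed without re-evaluating the whole term, which is what makes active propagation affordable.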

4.3 Resolution Algorithm

We shall now describe a resolution algorithm that produces one (possible) solution to a set of constraints, rules, and production rules. The first step is to compute an approximation of the fixpoint, using the abstract interpretation. We then start the enumeration of all completions, using the first-fail principle. The propagation is based on two operations:

• The function obtained by differentiation, ∂T/∂Ri(x, y), returns the exact set of pairs that appear in d(T) when (x, y) is added to d(Ri) (cf. previous section).

• Similarly, ∂#T/∂Ri(x, S) returns a set of pairs (x', S') where S' is an abstract interpretation of d(T)(x') which uses the new value S given to d(Ri)(x). Notice that this is just a convenient notation (there is no "differentiation" with abstract interpretation, since we must have S' = d#(T)(x'), the new value S not being a positive update).

We use an exception-handling mechanism described in [Don90], which catches contradictions raised either by the detection of an empty set of possible values or by the violation of an integrity constraint. We may now describe the algorithm (the database instance d, the constraints, rules, and production rules are global resources), which solves a list of given goals. The resolution algorithm [Ca91c] uses two steps: Predict(L) and Enumerate(L). Predict(L) computes (by semi-naive iteration) the fixpoint Y(ε#) [Ca91b] for the goals in L: we apply each reduction step (ε#) for each relevant rule until no further reduction can be performed. Enumerate(L) builds all the possible approximation sequences, using an ε-reduction step for lazy constraints, an ε#-reduction step for abstract-lazy constraints, and a Y(ε#)-reduction step for active constraints. The backtrack mechanism relies on the ability to make copies of the database and return to previously stored states; fortunately, this is supported efficiently in the LAURE system [Ca91a].
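A hedged sketch of the Predict/Enumerate loop over plain finite domains (the two toy constraints x + y = 5 and x < y, and the propagator functions, are illustrative assumptions standing in for LAURE's compiled ε#-reduction steps):

```python
import copy

def predict(domains, propagators):
    """Predict(L): apply every reduction step until a fixpoint is reached
    (naive iteration stands in here for LAURE's semi-naive iteration)."""
    changed = True
    while changed:
        before = copy.deepcopy(domains)
        for p in propagators:
            p(domains)                          # each propagator shrinks domains
        changed = domains != before
    return domains

def enumerate_goals(domains, propagators, goals):
    """Enumerate(L): first-fail choice, propagate, and backtrack on a
    contradiction by restoring a stored copy of the domains."""
    if all(len(domains[g]) == 1 for g in goals):
        return {g: next(iter(domains[g])) for g in goals}
    g = min((x for x in goals if len(domains[x]) > 1),
            key=lambda x: len(domains[x]))      # first-fail: smallest domain
    for v in sorted(domains[g]):
        saved = copy.deepcopy(domains)          # copy of the database state
        domains[g] = {v}
        try:
            predict(domains, propagators)       # propagate the hypothesis
            sol = enumerate_goals(domains, propagators, goals)
            if sol is not None:
                return sol
        except ValueError:
            pass                                # contradiction: empty domain
        domains.clear()
        domains.update(saved)                   # backtrack to the stored state
    return None

def p_sum(d):                                   # toy constraint: x + y = 5
    d["x"] = {x for x in d["x"] if any(x + y == 5 for y in d["y"])}
    d["y"] = {y for y in d["y"] if any(x + y == 5 for x in d["x"])}
    if not d["x"] or not d["y"]:
        raise ValueError("contradiction")

def p_less(d):                                  # toy constraint: x < y
    d["x"] = {x for x in d["x"] if any(x < y for y in d["y"])}
    d["y"] = {y for y in d["y"] if any(x < y for x in d["x"])}
    if not d["x"] or not d["y"]:
        raise ValueError("contradiction")

doms = {"x": set(range(5)), "y": set(range(5))}
predict(doms, [p_sum, p_less])                  # Predict: initial domain reduction
sol = enumerate_goals(doms, [p_sum, p_less], ["x", "y"])
```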
Whenever a new fact Ri(x, y) is obtained (for instance, a choice Ri(x) = y is made by the constraint solver), it is propagated using differentiation to activate all relevant production rules, and using the abstract computation to reduce the domains of the current goals. This algorithm is sound and returns one possible solution; with a minor modification, we can use it to build the set of all solutions. The completeness relies on the fact that the algorithm builds all computation sequences, because of the respective properties of abstract interpretation and differentiation. Since solutions are complete computation sequences by definition, the result follows. Notice that, because of the characterization of Section 3.1, a corollary is that the algorithm finds the exact set of all solutions for a "pure" constraint problem (d0, C) with no production rules.

The implementation has been described in other papers [Ca89, Ca91a] in more detail; here we just give some principles that may explain the good results presented in the next section. Each rule or constraint is transformed into an equivalent algebraic form which holds no logic variables and can be compiled into low-level (C) code. This ability of direct procedural compilation is one great advantage of combinatory logic. The translation is performed before run-time, and the algebraic form is actually the representation of an imperative computation. The practical application of this property is presented in [Ca89], which shows how compiled demons are produced from logical rules. Each rule, axiom, or constraint is actually stored as a set of equivalent demons, stored in the demon attributes of the relations. The logic language L3 is implemented as an extension of the LAURE programming language. Resolution algorithms are reduced to triggering the right compiled functions, with very little overhead, and the unification (top-down) or pattern-matching (bottom-up) work is performed entirely during compilation.

The current implementation uses LAURE's own main-memory object management. This makes LAURE a deductive database language rather than a true database. However, we have tried large problems using virtual memory and obtained good results (task assignment problems, large transitive closures). We believe that our resolution techniques are well suited to a large volume of information. The main reason is the tight coupling with the object-oriented model, which supports the efficient handling of a large domain through set organization. Our current work is to use C++ as a common object layer, to interface LAURE with a commercial OODBMS. This will add persistence to LAURE objects and complete our database system.

5. Results

5.1 Constraint Classics

Comparing LAURE with other languages is difficult. When all solutions are needed, our experiments have found that ad-hoc (non-declarative) PROLOG programs are faster than most constraint solvers. For finite-domain constraints, when one solution is needed, the CHIP [VH89] system is a better candidate. Since we ran LAURE and (compiled Quintus 3.0) PROLOG on a SUN Sparc1, we divided the numbers given in [VH89] by 10, to take into account the better hardware and the better performance that new implementations of CHIP-like systems now achieve, and we also adjusted the times reported with other machines in [VD91] and [AB91] (Figure 1). Since this is an empirical approximation, the results are only significant by their order of magnitude. For instance, constraint solvers are usually not very fast at finding all the solutions of the 8-queens problem (an ad-hoc PROLOG program is faster, with 330 ms for all solutions). The result in the first example of Figure 1 says that a naive LAURE program (three small constraints) is faster than the complex PROLOG program. When the number of queens grows, a constraint-resolution strategy starts to pay off (the previous PROLOG program fails).

The second example is the classic cryptarithmetic puzzle SEND + MORE = MONEY. The domain reduction methods used in the CHIP system [VH89] reduce the sets of possible values in such a way that only one backtracking step is necessary to find the solution. This result is also obtained with LAURE, which was expected since our abstract interpretation captures the dual representation of the CHIP system. The total computation time for LAURE is 12 ms with active constraints and 6 ms with abstract-lazy constraints, which compares well to the CHIP approach. All the execution time for active constraints is spent in the abstract fixpoint computation, but compilation gives performance almost comparable to a hand-coded implementation (such as CHIP's domain reduction procedure). On some other constraint problems for which there is a good strategy, LAURE performs even better compared to other constraint solvers. For instance, LAURE solved the 8th magic series problem [VH89] in 170 ms, whereas newer CHIP compiled implementations are in the 1 s range on a SUN4.

Problem                                   LAURE     Other
8 queens (all solutions)                  200 ms    330 ms   Prolog
64 queens (one solution)                  200 ms    250 ms   CHIP* [AB91]
SEND+MORE=MONEY (one solution) [VH89]     6 ms      6 ms     CHIP/10 [VH89]
8th magic series (one solution)           170 ms    1.3 s    cc(fd) [VD91]
House problem (all solutions)             50 ms     120 ms   CHIP/10 [VH89]

Figure 1: Classics (SUN SPARC1 Workstation)

5.2 Real-Life Problems

Our conclusion from these small problems is that LAURE is an efficient constraint solver, in (at least) the same range as the newer and faster implementations of the CHIP system. This is obtained with a more general system, which supports constraints on the object hierarchy and can be extended with production rules. When production rules are used to help the constraint resolution (as for the magic series), performance can be increased by an order of magnitude. Similar results were also obtained with the PECOS language [PA91], which also combines constraints and objects.

Real-life (or larger) problems are more interesting, but it is also more difficult to set up a fair comparison. In the rare cases where the problem is fully disclosed, there are still many tricks used in the test programs which modify performance significantly. A good example is the building of a five-segment bridge, taken from Bartusch's PhD thesis and reported by P. Van Hentenryck in [VH89]. This problem is made of 46 tasks, with precedence, sharing, and domain-specific constraints. Its interest is to be both a representative problem and a well-described one. We first wrote a simple, declarative LAURE program, with five generic constraints that were instantiated into 250 objects.

Following the indications in [VH89], we used an extra attribute (an ordering on shared tasks), so as to take care of mutual exclusion first. It is a well-known strategy that, once each pair of exclusive tasks has been ordered, the optimal schedule is found through a graph computation of the atleast and atmost dates for each task [VH89]. Although LAURE has no specific knowledge about how to solve inequations, abstract interpretation of the scheduling constraints (incremental updates of the abstract domains as soon as any new hypothesis is made) mimics the deterministic computation of the atleast and atmost dates. This is done with efficiency similar to the original CHIP implementation. To extend LAURE with some knowledge about scheduling problems, we added a set of four axioms that incrementally compute the two extra attributes atleast and atmost, and we ordered the tasks according to their durations. This new program is much faster: it finds the optimal solution in 200 ms and its proof of optimality in 2 s on a SUN Sparc1. This is slightly faster than the 3 s reported in [AB91] or the 5 s obtained with cc(fd). However, using redundant constraints and a better organization (different from the one in [VH89]), a PECOS program needs only 1.5 s. For this problem, LAURE is in the same range of efficiency, and we reserve a definite conclusion for a more detailed analysis.

LAURE is now being tested on a very large scheduling-assignment telecommunications problem (many thousands of tasks). In this problem, the complexity barrier is too high, and we need to make some simplifications, using heuristics. These heuristics involve various additional pieces of information that can be computed with production rules.
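The atleast/atmost computation mentioned above is the classic forward/backward pass over the precedence graph (critical-path style); a hedged sketch on a three-task toy example (the tasks, durations, and horizon are assumptions, not the bridge data):

```python
def atleast_atmost(duration, precedes, horizon):
    """Forward pass: earliest start (atleast) of each task under precedence.
    Backward pass: latest start (atmost) that still meets the horizon.
    Assumes the task list is already in topological order."""
    tasks = list(duration)
    atleast = {t: 0 for t in tasks}
    for t in tasks:
        for s in precedes.get(t, ()):           # s must start after t finishes
            atleast[s] = max(atleast[s], atleast[t] + duration[t])
    atmost = {t: horizon - duration[t] for t in tasks}
    for t in reversed(tasks):
        for s in precedes.get(t, ()):           # t must finish before s starts
            atmost[t] = min(atmost[t], atmost[s] - duration[t])
    return atleast, atmost

dur = {"a": 3, "b": 2, "c": 4}
prec = {"a": ["b", "c"], "b": ["c"]}            # a before b and c; b before c
early, late = atleast_atmost(dur, prec, horizon=10)
# early: a=0, b=3, c=5; late: a=1, b=4, c=6
```

In LAURE this deterministic computation is not coded explicitly: the abstract interpretation of the scheduling constraints (or, in the faster version, the four incremental axioms) converges to the same dates.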
In this large example, we actually use the production-rule paradigm in its full generality [Ca89] (the conclusion of a production rule can be any expression of the host language), which is quite versatile but brings us even further from a clean declarative semantics. The production rules are the interface between the constraint resolution and a library of algorithms described with methods in the LAURE object-oriented language. The efficiency of this hybrid approach is guaranteed by the compilation of production rules through differentiation. We have also used LAURE for automatic layout problems [GGN90], where the disjunctive abstract interpretation gives very good results.

These last two examples show the real problems for which LAURE is eventually used. Unfortunately, it is more difficult to evaluate performance, both because those two problems do not fit readily into existing constraint solvers, and because the comparison should include the time spent in development. Ultimately, the goal of a constraint solver is to provide performance comparable to a low-level implementation at a much lower cost.

5.3 Comparison with Related Work

LAURE is a general constraint solver for order-sorted finite domains. Although it may be seen as a Constraint Logic Programming language,

LAURE is very different from CLP(R) [CLP], PROLOG-III, or other systems intensively based on Horn-clause resolution, because it uses no logic variables and deals with global constraints. The LAURE resolution strategy, including the prediction phase, is directly inspired by work performed at ECRC on the CHIP system [DSVH87] [VH89]. The main contribution is to extend the integer techniques related in [VH89] to arbitrary order-sorted domains, using abstract interpretation. Two other important contributions are the algebraic framework, which supports efficient compilation, and the integration with production rules, which permits domain-specific extensions.

A lot of work has been done to integrate constraint resolution into logic programming languages: real-number constraints into Horn clauses (the CLP(R) family); integer constraints into order-sorted logic (LIFE [AKP90]); finite (flat) domain (actually integer) constraints into PROLOG (the CHIP family [VH89]); or various constraint theories into DATALOG [KKR90]. Our work is at a lower level, since we deal with the actual resolution of constraints, not with their integration into another language. For instance, if we use this work as a theory of constraints over an order-sorted domain, it makes perfect sense to study CLP(O) [JL87], where O is our order-sorted domain. This is actually a subject for further research, such as using this constraint theory for the constraint scheme used in LIFE [AKP90]. As a contribution to the theory of constraints over an order-sorted domain, this approach is very general and could be adapted to any object-oriented constraint solver. It is our intuition that most efficient constraint solvers on object-oriented domains use techniques to reduce the search space that could easily be described with an abstract interpretation.
Representing a domain reduction algorithm as an abstract interpretation has two advantages: it gives a consistency result [CC77], and it gives tools to compare and combine techniques using the lattice of abstract interpretations. It is still an open question whether introducing production rules is overkill. Because there is no simple semantics for production rules, some declarativeness is lost when we introduce them. On the other hand, we have found that most heuristics translate easily into production rules, because of their expressive power (we have also found that trying to mimic constraint resolution with production rules is very inefficient). Our current research is to identify a smaller production-rule language (only the propagation rules cause problems). There are many possible syntactic restrictions that would yield a nicer semantics, but we do not yet know if we would still be able to use them in real-life situations.

From a practical perspective, LAURE is close to commercial expert-system shells such as ART or ProKappa. However, LAURE includes a sophisticated constraint solver, which we think is mandatory in an object-oriented deductive system. In addition, we have found LAURE to be much faster on pure production-rule applications, such as small expert-system benchmarks. Because differentiation produces only the new relevant objects, propagation is performed with the minimal number of

computations [BR86], whereas the graph of nodes built by RETE only approximates this property. In addition, differentiation produces a sequence of computations that can be executed without any special data structure, whereas the RETE algorithm requires a complex graph structure to be maintained.

This work shares many similarities with theoretical approaches to merging object-oriented and logic programming [Ku85] [AK89] [AKP90]. The closest example is the revisited O-logic [KW89], which is also intended as a framework for AI programming. The LAURE model is a strict subset of this more ambitious model, but is able to propose realistic implementation techniques from these restrictions [Ca91a]. In addition, the LAURE model allows the explicit representation of disjunctive information, which we think is a very useful feature for deductive object-oriented database applications (cf. Section 1.1). There is some similarity (in the motivations) between the LAURE data model and OR-objects for representing disjunctive information [INV90], although the hierarchical aspect (order-sorted domain) plays a critical role in the LAURE model [Ca91a]. Another subject for further research is the adaptation of the constraint resolution theory upon which LAURE is based [Ca91b] to OR-objects.

6. Conclusion

This paper has identified a research domain, investigated three technical issues, and proposed a proven solution for each. We have motivated the integration of global constraints into object-oriented databases from an application point of view. We have proposed a semantics for object-oriented constraints which translates easily into a relational algebra, thus giving tools for optimization and compilation. We have developed an abstract interpretation technique that extends previously known integer domain reduction methods to any order-sorted domain. We have argued that some room should be left for domain-specific heuristics to solve realistic large problems, and that production rules are a natural way to do so. We have proposed, based on the above-mentioned techniques, a resolution algorithm that has proven efficient in practical examples.

Acknowledgment

This work has benefited from fruitful discussions with Hassan Aït-Kaci and Pascal Van Hentenryck. I would like to thank Drew Adams, Francois Monnet, and Diane Hoffoss for their support and encouragement. I am also grateful to the members of our database research group at Bellcore, including Sam Epstein, Madhur Kohli, Shamim Naqvi, and Yatin Saraya.

References

[AB91] A. Agoun, N. Beldiceanu. Overview of the CHIP Compiler. Proc. of the 8th ICLP, Paris, 1991.
[AKG87] S. Abiteboul, P. Kanellakis, G. Grahne. On the Representation and Querying of Sets of Possible Worlds. Proc. of ACM SIGMOD, 1987.
[AK89] S. Abiteboul, P. Kanellakis. Object Identity as a Query Language Primitive. Proc. ACM Conf. on Management of Data, 1989.
[AKP90] H. Aït-Kaci, A. Podelski. The Meaning of Life. PRL Research Report, DEC, 1990.
[BMM89] A. Borning, M. Maher, A. Martindale, M. Wilson. Constraint Hierarchies and Logic Programming. Proc. of the 6th ICLP, Lisbon, June 1989.
[BR86] F. Bancilhon, R. Ramakrishnan. An Amateur's Introduction to Recursive Query Processing Strategies. Proc. ACM SIGMOD Conf. on the Management of Data, Washington, May 1986.
[Ca89] Y. Caseau. A Formal System for Producing Demons from Rules. Proc. of DOOD89, Kyoto, 1989.
[Ca91a] Y. Caseau. A Deductive Object-Oriented Language. Annals of Mathematics and Artificial Intelligence, Special Issue on Deductive Databases, February 1991.
[Ca91b] Y. Caseau. Abstract Interpretation of Constraints over an Order-Sorted Domain. Proc. of ILPS, San Diego, October 1991.
[Ca91c] Y. Caseau. Rule-Aided Constraint Resolution in LAURE. PDK91, to appear in Lecture Notes in Computer Science, 1991.
[CC77] P. Cousot, R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. Proc. Fourth ACM Symposium on Principles of Programming Languages, 1977.
[CLP] N. Heintze et al. Constraint Logic Programming: A Reader. Proc. 4th IEEE Symposium on Logic Programming, San Francisco, 1987.
[Don90] C. Dony. Exception Handling and Object-Oriented Programming: Towards a Synthesis. Proc. of OOPSLA'90, Ottawa, 1990.
[DSVH87] M. Dincbas, H. Simonis, P. Van Hentenryck. Extending Equation Solving and Constraint Handling in Logic Programming. Colloquium on Resolution of Equations in Algebraic Structures, Austin, May 1987.
[Fo82] C.L. Forgy. RETE: A Fast Algorithm for the Many Pattern/Many Object Pattern Matching Problem. Artificial Intelligence, no. 19, 1982.
[FU76] A. Fong, J. Ullman. Induction Variables in Very High Level Languages. Proc. Third ACM Symposium on Principles of Programming Languages, 1976.
[GGN90] M. Ganti, P. Goyal, R. Nassif, P. Sunil. An Object-Oriented Development Environment. COMPCON, February 1990.
[HS89] R. Hull, J. Su. Untyped Sets, Invention, and Computable Queries. Proc. of PODS-89, Philadelphia, 1989.
[INV90] T. Imielinski, S. Naqvi, K. Vadaparty. Querying Design and Planning Databases. Proc. of DOOD91, Munich, 1991.
[JL87] J. Jaffar, J.-L. Lassez. Constraint Logic Programming. Proc. ACM Symp. on Principles of Programming Languages, San Francisco, 1987.
[KKR90] P. Kanellakis, G. Kuper, P. Revesz. Constraint Query Languages. Proc. of 9th ACM PODS, 1990.
[KW89] M. Kifer, J. Wu. A Logic for Object-Oriented Logic Programming (Maier's O-Logic Revisited). Proc. of PODS-89, Philadelphia, 1989.
[Ku85] G.M. Kuper. The Logical Data Model: A New Approach to Database Logic. PhD Dissertation, Stanford University, 1985.
[Ma77] A. Mackworth. Consistency in Networks of Relations. Artificial Intelligence, vol. 8, 1977.
[McL81] B.J. MacLennan. Programming With A Relational Calculus. Report no. NPS52-81-013, Naval Postgraduate School, September 1981.
[Mel86] C.S. Mellish. Abstract Interpretation of Prolog Programs. Proc. Third Int. Conf. on Logic Programming, 1986.
[MS90] K. Marriott, H. Sondergaard. Analysis of Constraint Logic Programs. Proc. of the NACLP, Austin, 1990.
[PA91] J.-F. Puget, P. Albert. PECOS: programmation par contraintes orientée objets. Génie Logiciel et Systèmes Experts, vol. 23, 1991.
[PS77] B. Paige, J.T. Schwartz. Expression Continuity and the Formal Differentiation of Algorithms. Proc. Fourth ACM Symposium on Principles of Programming Languages, 1977.
[SKGB87] H. Schmidt, W. Kiessling, V. Guntzer, R. Bayer. Compiling Exploratory and Goal-Directed Deduction into Sloppy Delta-Iteration. Proc. of the Symposium on Logic Programming, San Francisco, 1987.
[Ta55] A. Tarski. A Lattice-Theoretical Fixpoint Theorem and its Applications. Pacific Journal of Mathematics, no. 5, 1955.
[VD91] P. Van Hentenryck, Y. Deville. The Cardinality Operator: A New Logical Connective for Constraint Logic Programming. Proc. of the 8th ICLP, Paris, 1991.
[VH89] P. Van Hentenryck. Constraint Satisfaction in Logic Programming. The MIT Press, Cambridge, 1989.
[Vi86] L. Vieille. Recursive Axioms in Deductive Databases: The Query/Subquery Approach. Proc. First Intl. Conference on Expert Database Systems, Charleston, 1986.