
Answering queries addressed to a rule base

Laurence Cholvy

ONERA/CERT, 2 avenue Edouard Belin, 31055 Toulouse, France

Abstract. This article deals with answer generation when querying a rule base in the context of a deductive database. A deductive database involves explicit facts, integrity constraints and general rules that allow new facts to be derived. Usually, answering a query addressed to a deductive database involves the use of both the explicit facts and the rules. We consider here the problem of providing answers which are independent of the set of explicit facts, that is, answers which are valid in any database state. In this case, the answer can be considered as a set of formulas defining sufficient conditions characterizing the tuples of individuals which satisfy the query.

Résumé. The problem addressed in this paper is that of generating answers to a query addressed to the rule base of a deductive database. A deductive database is defined by a set of explicit facts, integrity constraints and general rules, not restricted to Horn clauses, which allow new facts to be derived. Generally, answering a query addressed to a database requires the use of the facts as well as the rules. We examine here the case where one wants to provide answers which do not depend on a particular state of the database, i.e., which do not depend on a particular set of facts, and which are therefore valid at any time. In this case, the answer can be considered as a set of formulas characterizing sufficient conditions for tuples of individuals to satisfy the query.


1 Introduction

Our aim is to study query answering in deductive databases. By deductive databases, we mean databases containing explicit facts, general rules that allow non-explicit (deduced) facts to be derived, and integrity constraints [GMN84]. In other words, we consider a deductive database as being composed of a fact base, also called the extensional database (EDB), and a rule base, also called the intensional database (IDB). The intensional part of a deductive database is usually assumed to be more or less static: it rarely changes, because the stored rules represent the assumptions which govern the real world. The extensional part, on the other hand, is expected to be highly dynamic, because it represents snapshots of the real world.

The most usual way of answering a query addressed to a rule base is to use the general rules to derive new facts from the elementary facts explicitly stored. The answer is the set of tuples which satisfy the query; we call these extensional answers. For example, assume that the deductive database contains a rule stating that employees who have been working in the company for ten years or more are given a bonus, and a rule stating that employees who work in lab A are also given a bonus. Assume that the EDB contains the facts: Paul has been working for 11 years, Peter for 2 years, Phil for 15 years, Ann for 12 years, and Joe works in lab A. The usual answer to the query "who receives a bonus?" is {Paul, Phil, Ann, Joe}. This means that Paul, Phil, Ann and Joe all receive a bonus, but it gives no way to distinguish Paul, Phil and Ann on the one hand from Joe on the other. Yet there is a difference between these persons: the first three receive a bonus because they have been working for ten years or more; the fourth receives a bonus because he works in lab A. Furthermore, the answer depends on the current EDB, i.e., on the current database state.
It will probably no longer be valid after an update. For example, after adding the fact that Helen works in lab A, the previous set {Paul, Phil, Ann, Joe} is no longer the answer. We think that a more informative way of answering a query is to provide answers that define sufficient conditions for tuples to satisfy the query [CD86]. We call them intensional answers. In the previous example, such an approach would lead to the intensional answer: employees who have been working for ten years or more, or who work in lab A, receive a bonus.

Together, the two conditions (working for ten years or more, and working in lab A) characterize exactly the known conditions under which an employee receives a bonus. Defined in this way, intensional answers do not depend on a particular database state and remain valid as long as the IDB is not modified. In this sense, they are more general than extensional answers. If a user is interested in the meaning of an answer rather than in its extension, this is the approach to use. For example, in the context of a set of rules defining personnel management in a company, it is useful to provide answers which depend not on a particular state of the company but only on the legislation. A similar example arises in student enrolment, when someone wants to know under what conditions he can register a student. The aim of our work is to study this new kind of answer. In section 2, we set out our assumptions and give the formal definition of intensional answers. In section 3, we define a correct algorithm for generating them. Section 4 is devoted to examples. Finally, in section 5, we compare our approach with similar studies.
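The bonus example of the introduction can be made concrete with a short Python sketch. The dictionaries `years` and `lab`, the function names, and the encoding of each condition as a (description, test) pair are assumptions made for illustration only. The extensional answer is recomputed from the EDB and changes after an update, whereas the intensional answer is a fixed list of conditions read off the two IDB rules, which also records why each person qualifies.

```python
# Hypothetical EDB for the bonus example: years of service and lab membership.
years = {"Paul": 11, "Peter": 2, "Phil": 15, "Ann": 12}
lab = {"Joe": "A"}

# Intensional answer: the sufficient conditions given by the two IDB rules.
# It names no individual, so it stays valid whatever the EDB contains.
INTENSIONAL_ANSWER = [
    ("has been working for ten years or more", lambda e: years.get(e, 0) >= 10),
    ("works in lab A", lambda e: lab.get(e) == "A"),
]

def extensional_answer():
    """Employees satisfying at least one condition in the current EDB state."""
    employees = set(years) | set(lab)
    return {e for e in employees
            if any(test(e) for _, test in INTENSIONAL_ANSWER)}

def explain(employee):
    """The conditions that justify this employee's bonus."""
    return [text for text, test in INTENSIONAL_ANSWER if test(employee)]

before = extensional_answer()  # == {"Paul", "Phil", "Ann", "Joe"}
lab["Helen"] = "A"             # an update of the EDB
after = extensional_answer()   # Helen now appears as well
print(sorted(before), sorted(after))
print(explain("Paul"), explain("Joe"))
```

The update changes `after` but leaves `INTENSIONAL_ANSWER` untouched, which is the sense in which intensional answers survive changes to the fact base.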

2 Definitions

In this section, we give a formal definition of intensional answers without reference to any method for generating them. We use the predicate calculus [CC73] [Lov78] to represent the database information and the inferences drawn from it. We define a first-order language L by constant symbols (which represent the constants used in the database), predicate symbols (which represent the relations) and function symbols (which represent the functions, if any, used in the database).

2.1 Definition of the database

As usual, the database is defined by $DDB = IDB \cup EDB$, where:

- IDB is a finite set of formulas of L representing the rules of the deductive database and the integrity constraints. In order to consider only meaningful formulas, we suppose that they are range-restricted (see the definition in the appendix) [Dem90] [Nic79]. Indeed, this class of formulas is one of the largest decidable classes of domain-independent formulas, and [Dem90] has shown that only domain-independent formulas are meaningful in the context of deductive databases. Notice that IDB formulas are not restricted to Horn clauses, i.e., their clausal forms are general clauses.
- EDB is a set of ground atomic formulas of L, which represent the tuples stored in the database.

2.2 Definition of the query and the intensional answers

Let us consider an open query, i.e., an open formula of L, Q(X), where X is a tuple of free variable symbols. In more usual approaches [GMN84], the answer is defined by:

$\|Q(X)\| = \{A : A \text{ is a tuple of constant symbols of } L \text{ and } IDB \cup EDB \vdash Q(A)\}$

An equivalent definition, in the case of Horn clauses, is [DR86]:

$\|Q(X)\| = \{A : A \text{ is a tuple of constant symbols of } L \text{ and } Q(A) \text{ is true in the least model of } IDB \cup EDB\}$
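For Horn clauses, the least-model definition corresponds to the familiar bottom-up evaluation: apply the rules to the facts until nothing new is derived, then read off the tuples satisfying the query. The Python sketch below is a naive fixpoint evaluator for function-free Horn rules, not a method from the paper. The encoding of a rule as (head, body), the convention that variables are capitalized strings, and the sample EDB facts are assumptions for illustration; the rules are those of the parent/grandparent example in section 4.1.

```python
def is_var(term):
    # Assumed convention: variables are capitalized strings, constants lowercase.
    return term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom, instantiated by subst, equals fact; else None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    subst = dict(subst)
    for a, f in zip(args, fargs):
        if is_var(a):
            if subst.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return subst

def body_matches(body, facts, subst):
    """Yield every substitution making all atoms of body known facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        extended = match(body[0], fact, subst)
        if extended is not None:
            yield from body_matches(body[1:], facts, extended)

def least_model(rules, edb):
    """Naive bottom-up evaluation of function-free Horn rules up to a fixpoint."""
    facts = set(edb)
    while True:
        new = set()
        for (pred, args), body in rules:
            for s in body_matches(body, facts, {}):
                new.add((pred, tuple(s.get(a, a) for a in args)))
        if new <= facts:
            return facts
        facts |= new

def answer(query, rules, edb):
    """||Q(X)|| for an atomic query whose arguments are all variables."""
    model = least_model(rules, edb)
    return {tuple(s[a] for a in query[1]) for s in body_matches([query], model, {})}

RULES = [
    (("parent", ("X", "Y")), [("mother", ("X", "Y"))]),
    (("parent", ("X", "Y")), [("father", ("X", "Y"))]),
    (("grandparent", ("X", "Z")), [("parent", ("X", "Y")), ("parent", ("Y", "Z"))]),
]
EDB = {("mother", ("ann", "bob")), ("father", ("bob", "carl")),
       ("father", ("carl", "dora"))}
print(answer(("grandparent", ("X", "Y")), RULES, EDB))
```

Because the head variables of range-restricted rules all occur in the body, every derived fact is ground; the result here depends on `EDB`, which is exactly what intensional answers avoid.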

In our approach, we would like to define the answer as a disjunction of elementary conditions, each elementary condition being a sufficient condition for a tuple A of individuals to satisfy the query. Thus, each elementary condition ans(X) is an open formula of L which satisfies:

(1) $IDB \vdash \forall X\, (ans(X) \rightarrow Q(X))$

The formula $\forall X\, (ans(X) \rightarrow Q(X))$ states that the set of tuples satisfying ans(X) is a subset of the set of tuples satisfying Q(X); it thus characterizes the conditions ans(X) that individuals A may satisfy in order to satisfy Q: if ans(A) is true, so is Q(A). Moreover, we want this formula to be deducible from IDB alone, because we want the answer to be computed without taking EDB into account. However, many formulas ans(X) which satisfy (1) are not interesting and must be discarded. They are listed below:

- Firstly, the query Q(X) and its variants (see the definition in the appendix) satisfy (1). However, these possible answers are not interesting because they do not provide new conditions for characterizing the individuals satisfying the query.

  For example, "employees who work in lab B and who earn more than 10000" is a possible answer to the query "who are the employees who earn more than 10000 and work in lab B?". It is not, however, interesting.
- Secondly, any formula which characterizes an empty set of individuals (i.e., such that no tuple of individuals satisfies it) satisfies (1) vacuously. Such formulas are characterized by:

  (2) $IDB \vdash \forall X\, \neg ans(X)$

  Any formula which satisfies (2) satisfies (1) too. Such formulas are not interesting answers: they are supposed to define conditions which individuals may satisfy in order to satisfy Q (condition (1)), but, because of (2), it is known that no individual satisfies them. For example, suppose that IDB contains information about the employees of a company, such as: a technician is an employee, and the highest wage of a technician is 10000. Then "technicians who earn more than 15000" is a possible answer to the query "who are the employees who earn more than 15000?"; however, it is not an interesting answer, because no technician earns more than 15000.
- Finally, many formulas which satisfy (1) are syntactically redundant. (We consider that a formula ans1 is syntactically redundant with a formula ans2 if $\vdash \forall X\, (ans_1(X) \rightarrow ans_2(X))$.) For example, if "technicians" is an answer to a given query, then the formulas "technicians who earn more than 8000", "technicians who work in lab B", ... are also answers, but they are less general than the first one. Notice that we do not want to discard answers which are redundant only with respect to IDB, i.e., such that $IDB \vdash \forall X\, (ans_1(X) \rightarrow ans_2(X))$. Indeed, this condition characterizes semantically redundant formulas (in the sense that the redundancy depends on the application). Such formulas are interesting for the intensional answers problem because they represent answers at different levels of precision.


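We can now summarize the general definition of intensional answers:

$\|Q(X)\| = ans_1(X) \lor ans_2(X) \lor \dots$ such that, for every $i$:
- $IDB \vdash \forall X\, (ans_i(X) \rightarrow Q(X))$
- $IDB \nvdash \forall X\, \neg ans_i(X)$
- $ans_i$ is not a variant of $Q(X)$
- there is no formula $ans'(X)$ which satisfies the three previous conditions and is strictly more general than $ans_i$, i.e., such that $\vdash \forall X\, (ans_i(X) \rightarrow ans'(X))$ but not conversely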

In the following section, we consider a particular case for which we provide a correct algorithm generating intensional answers. Notice that other conditions could be added to refine the definition. For example, we could assume that the user has defined a particular language which represents his domain of interest. In that case, we could require the formulas $ans_i$ to belong to this language, so that intensional answers would be defined relative to the user's language. We could also parametrize the definition of intensional answers so that the user defines the level of detail of the answer. These features are not studied here.

3 A method for generating intensional answers

The assumptions we make are described in section 3.1. We then describe a correct algorithm which generates the intensional answers.

3.1 Assumptions

Firstly, we suppose that queries have the form $Q(X) = \exists Y\, (l_1 \land \dots \land l_m)(X, Y)$, such that:
- X is a tuple of free variable symbols;
- Y is the (possibly empty) set of bound variable symbols, all existentially quantified;
- $l_1, \dots, l_m$ are literals;
- Q(X) is a range-restricted formula.

Note that this definition corresponds to queries expressing (in database terminology):
- intersections: $R_1(X) \land R_2(X)$
- projections: $\exists Y\, R_1(X, Y)$
- cartesian products: $R_1(X_1) \land R_2(X_2)$
- selections: $\exists Y\, (R_1(X, Y) \land R_2(Y))$
- joins: $\exists Y\, (R_1(X_1, Y) \land R_2(Y, X_2))$
- differences: $R_1(X) \land \neg R_2(X)$

In this case, it is natural to expect the elementary conditions in the intensional answer to have the same form as the query, i.e., existentially quantified conjunctions of literals. Indeed, we must remember that the formulas of the intensional answer are conditions, like the query. This leads us to define, in this particular case, the intensional answer by:

$\|Q(X)\| = ans_1(X) \lor ans_2(X) \lor \dots$ such that, for every $i$:
(1) $IDB \vdash \forall X\, (ans_i(X) \rightarrow Q(X))$
(2) $IDB \nvdash \forall X\, \neg ans_i(X)$
(3) $ans_i$ is not a variant of $Q$
(4) there is no formula $ans'$ which satisfies (1), (2) and (3) and is strictly more general than $ans_i$, i.e., such that $\vdash \forall X\, (ans_i(X) \rightarrow ans'(X))$ but not conversely
(5) $ans_i(X) = \exists Y\, (l_{i1} \land \dots \land l_{in_i})(X, Y)$, where the $l_{ij}$ are literals
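The query forms listed above (intersections, projections, cartesian products, selections, joins, differences) can each be evaluated over finite relations with a set comprehension in which the existential variable Y is simply a variable that is not returned. The Python sketch below uses hypothetical toy relations stored as sets of tuples; the variable names are assumptions chosen to mirror the formulas.

```python
# Hypothetical stored relations, as sets of tuples.
r1_x = {("a",), ("b",), ("c",)}          # R1(X)
r2_x = {("b",), ("c",), ("d",)}          # R2(X)
r1_xy = {("a", 1), ("b", 2), ("c", 3)}   # R1(X, Y)
r2_y = {(1,), (3,)}                      # R2(Y)
r2_yx = {(1, "u"), (2, "v")}             # R2(Y, X2)

intersection = {t for t in r1_x if t in r2_x}             # R1(X) ^ R2(X)
projection = {(x,) for (x, y) in r1_xy}                   # exists Y R1(X,Y)
cartesian = {t1 + t2 for t1 in r1_x for t2 in r2_x}       # R1(X1) ^ R2(X2)
selection = {(x,) for (x, y) in r1_xy if (y,) in r2_y}    # exists Y R1(X,Y) ^ R2(Y)
join = {(x1, x2) for (x1, y) in r1_xy
        for (y2, x2) in r2_yx if y == y2}                 # exists Y R1(X1,Y) ^ R2(Y,X2)
difference = {t for t in r1_x if t not in r2_x}           # R1(X) ^ not R2(X)

print(sorted(intersection), sorted(selection), sorted(join), sorted(difference))
```

In the difference, the negative literal only filters values already produced by the positive literal R1(X); this is what range restriction guarantees, and it keeps every answer finite.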

Secondly, we assume that $IDB \cup \{\neg Q(X)\}$ is not recursive (see the definition in the appendix). This assumption is added because, for some recursive sets, an infinite number of formulas $ans_1, ans_2, \dots$ may satisfy the previous five criteria, as illustrated by the following example. If

$IDB = \{\forall x \forall y\, (father(x,y) \rightarrow ancestor(x,y)),\ \forall x \forall y \forall z\, (father(x,y) \land ancestor(y,z) \rightarrow ancestor(x,z))\}$ and $Q(x,y) = ancestor(x,y)$,

then every formula of the form

$\exists y_1 \dots \exists y_n\, (father(x, y_1) \land father(y_1, y_2) \land \dots \land father(y_n, y))$

belongs to the intensional answer of Q(x,y). Although intensional answers are not infinite in all recursive cases, we felt that the recursive case should be studied separately, and we preferred to focus on non-recursive cases. This explains our assumption of discarding queries such that $IDB \cup \{\neg Q(X)\}$ is recursive. Finally, note that discarding recursion obliges us to restrict ourselves to a predicate calculus without equality: indeed, some of the axioms defining equality are recursive.
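For any fixed n, the claim that the n-th chain formula is an answer can be checked mechanically: replace x, y and the existential variables by fresh constants (as Skolemizing the negated implication would), close the father facts under the two IDB rules, and test whether ancestor(x0, y0) is derived. The Python sketch below does this; the helper names and the constant names x0, c1, ..., y0 are hypothetical. Each n yields a different formula that passes the test, which is why the answer cannot be finite here.

```python
def chain_formula(n):
    """Text of the n-th answer: exists y1..yn father(x,y1) ^ ... ^ father(yn,y)."""
    nodes = ["x"] + [f"y{i}" for i in range(1, n + 1)] + ["y"]
    atoms = [f"father({a},{b})" for a, b in zip(nodes, nodes[1:])]
    quantifiers = "".join(f"exists y{i} " for i in range(1, n + 1))
    return quantifiers + " ^ ".join(atoms)

def chain_entails_ancestor(n):
    """Assert the chain with fresh constants and apply the two IDB rules."""
    nodes = ["x0"] + [f"c{i}" for i in range(1, n + 1)] + ["y0"]
    father = set(zip(nodes, nodes[1:]))
    ancestor = set(father)  # rule: father(x,y) -> ancestor(x,y)
    while True:
        # rule: father(x,y) ^ ancestor(y,z) -> ancestor(x,z)
        new = {(x, z) for (x, y) in father for (y2, z) in ancestor if y == y2}
        if new <= ancestor:
            return ("x0", "y0") in ancestor
        ancestor |= new

for n in range(4):
    print(chain_entails_ancestor(n), chain_formula(n))
```

Since the rules are Horn, membership of ancestor(x0, y0) in this closure is exactly entailment, so each printed formula satisfies condition (1).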

3.2 Presentation of the algorithm

We assume that the reader is familiar with the notions used in mechanical theorem proving, such as clauses, resolution rules, resolution strategies and subsumed clauses. These notions are defined in [CC73] and [Lov78]. We call C the Skolemization function which associates with a set of first-order formulas one of its clausal forms, and $X_0$ a tuple of new constant symbols (Skolem constants). The algorithm is mainly based on the following equivalence:

$IDB \vdash \forall X\, (ans(X) \rightarrow Q(X)) \iff C(IDB) \cup C(\neg Q(X_0)) \cup C(ans(X_0))$ is refutable by resolution.

The initial idea was to generate the resolvents of $C(IDB) \cup C(\neg Q(X_0))$: any set of clauses which refutes one of them is a potential answer, in the sense that it satisfies the previous condition. However, this idea has been refined in the following way:

- we have proved that it is sufficient to consider the resolvents of $C(IDB) \cup C(\neg Q(X_0))$ which are not resolvents of C(IDB) alone. We call these resolvents pertinent clauses and denote them $R(X_0, Y)$.
- we have shown that, instead of considering all the formulas which refute a pertinent clause $R(X_0, Y)$, it is sufficient to consider $\exists Y\, \neg R(X, Y)$.
- a pertinent clause $R_1(X_0, Y)$ subsumes another one $R_2(X_0, Y)$ if and only if the potential answer provided by the latter is redundant with the potential answer provided by the former, i.e., $\vdash \forall X\, (\exists Y\, \neg R_2(X, Y) \rightarrow \exists Y\, \neg R_1(X, Y))$. So, because of condition (4), removing subsumed pertinent clauses is necessary and sufficient to discard redundant answers.
- pertinent clauses $R(X_0, Y)$ which are tautologies must also be eliminated, because the formulas $\exists Y\, \neg R(X, Y)$ are then contradictory and do not satisfy condition (2). However, tautologies are not the only resolvents providing formulas which do not satisfy (2), so an explicit test remains to be done.
- finally, the correctness of the algorithm is not affected by eliminating pertinent clauses $R(X_0, Y)$ which are tautologies, or which are subsumed by another pertinent clause, or such that $\exists Y\, \neg R(X, Y)$ does not satisfy (2).
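The two syntactic filters used above, tautology deletion and subsumption, are standard. The Python sketch below tests θ-subsumption (clause C1 subsumes C2 if some substitution θ maps every literal of C1 onto a literal of C2) for function-free clauses by backtracking. The literal encoding (sign, predicate, arguments) and the convention that variables are capitalized strings while Skolem constants such as x0 are lowercase are assumptions of this sketch; the first test pair is taken from the second example of section 4.

```python
def is_var(term):
    # Assumed convention: variables are capitalized; Skolem constants (x0, ...) are not.
    return term[:1].isupper()

def match_literal(lit, target, theta):
    """Extend theta so that lit instantiated by theta equals target, else None."""
    (sign, pred, args), (tsign, tpred, targs) = lit, target
    if sign != tsign or pred != tpred or len(args) != len(targs):
        return None
    theta = dict(theta)
    for a, t in zip(args, targs):
        if is_var(a):
            if theta.setdefault(a, t) != t:
                return None
        elif a != t:
            return None
    return theta

def subsumes(c1, c2, theta=None):
    """Theta-subsumption: one substitution maps every literal of c1 into c2."""
    theta = {} if theta is None else theta
    if not c1:
        return True
    first, rest = c1[0], c1[1:]
    for target in c2:
        extended = match_literal(first, target, theta)
        if extended is not None and subsumes(rest, c2, extended):
            return True
    return False

def is_tautology(clause):
    """A clause containing some literal together with its complement."""
    return any((not sign, pred, args) in clause for sign, pred, args in clause)

# Pertinent clauses of the second example: the first subsumes the second.
general = [(False, "animal", ("x0",)), (True, "white", ("x0",))]
special = [(False, "animal", ("x0",)), (False, "cat", ("x0",)), (True, "white", ("x0",))]
print(subsumes(general, special), subsumes(special, general))
```

Variables of the subsumed clause are treated as constants, which is the usual reading of θ-subsumption; clauses with nested function terms would need a full matching procedure.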

We first based our algorithm on a saturation strategy with a set of support. It is a quite simple strategy, but rather expensive. We finally adopted another algorithm, which is still a kind of saturation, augmented with a strategy which avoids storing intermediate resolvents (see the ordered linear strategy in [CC73]). Clauses are generated by developing, from $C(\neg Q(X_0))$, the tree of all the possible linear resolutions. The completeness of this strategy has been proved. However, we do not obtain a major improvement in performance, since these two kinds of resolution strategies are of the same order of complexity. The main functions are:

    function intensional-answers(Q)
        RES ← resolvents(C(IDB), C(¬Q(X0)))
        if RES = unsat then
            print("any formula belongs to the intensional answer of Q")
            return(nil)
        else
            answer ← nil
            for any clause R(X0, Y) belonging to RES do
                ans ← ∃Y ¬R(X, Y)
                if IDB ⊬ ∀X ¬ans(X) then
                    answer ← answer ∪ {ans}
            return(answer)

The function which generates all the pertinent resolvents is:

    function resolvents(C(IDB), C(¬Q(X0)))
        clist ← C(¬Q(X0))
        i ← 1
        unsat ← false
        while (i ≤ length(clist) and ¬unsat) do
            cl ← clist(i)
            res ← all clauses obtained:
                - by resolving any permutation of cl on its last literal with a clause of C(IDB)
                - by reducing it with the reduction rules of the ordered linear resolution strategy
                - by deleting tautologies
            if the empty clause does not belong to res then
                clist ← list of clauses obtained:
                    - by adding the clauses of res to clist
                    - by deleting the subsumed clauses
                i ← i + 1
            else
                unsat ← true
        if unsat then return(unsat) else return(clist)

The intensional answers are provided by calling the function intensional-answers. This algorithm terminates since $C(IDB) \cup \{\neg Q(X)\}$ is not recursive [Lew]. Notice that the test $IDB \nvdash \forall X\, \neg ans(X)$ is decidable, since $C(IDB) \cup C(ans(X_0))$ is not recursive: indeed, C(IDB) is not recursive and $C(ans(X_0))$ is a set of literals.
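A runnable miniature of the procedure can be given for the special case where every derived clause mentions only the Skolem constants $X_0$, so that clauses behave propositionally; this holds for the monadic rules of the second example in section 4. The Python sketch below substitutes plain set-of-support saturation (the strategy the paper reports trying first) for the ordered linear strategy, removes tautologies and subsumed clauses, drops the clause coming from the query itself (condition (3)), and keeps a negated clause only if it is consistent with IDB (condition (2)), checked by brute-force truth tables. The encoding of literals as strings with a leading "~" for negation and all function names are assumptions.

```python
from itertools import product

def neg(lit):
    """Complement of a literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def is_tautology(clause):
    return any(neg(lit) in clause for lit in clause)

def binary_resolvents(c1, c2):
    """All binary resolvents of two propositional clauses (frozensets)."""
    return [(c1 - {lit}) | (c2 - {neg(lit)}) for lit in c1 if neg(lit) in c2]

def saturate(idb, neg_query):
    """Set-of-support saturation: every kept clause descends from neg_query."""
    support, frontier = {neg_query}, [neg_query]
    while frontier:
        clause = frontier.pop()
        for other in list(idb) + list(support):
            for r in binary_resolvents(clause, other):
                if not r:
                    return None  # empty clause: IDB alone entails Q
                if not is_tautology(r) and r not in support:
                    support.add(r)
                    frontier.append(r)
    return support

def satisfiable(clauses):
    """Brute-force truth-table test for a small propositional clause set."""
    atoms = sorted({lit.lstrip("~") for c in clauses for lit in c})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[l.lstrip("~")] != l.startswith("~") for l in c)
               for c in clauses):
            return True
    return False

def intensional_answers(idb, query):
    """Conjunctions of literals intended to meet conditions (1) to (4)."""
    query = frozenset(query)
    pertinent = saturate(idb, frozenset(neg(l) for l in query))
    if pertinent is None:
        return None  # the paper's "any formula belongs to the answer" case
    minimal = [c for c in pertinent if not any(d < c for d in pertinent)]
    answers = set()
    for clause in minimal:
        ans = frozenset(neg(l) for l in clause)  # negation of the clause
        if ans == query:                          # condition (3)
            continue
        if satisfiable(list(idb) + [frozenset([l]) for l in ans]):  # condition (2)
            answers.add(ans)
    return answers

IDB = [
    frozenset({"~animal", "cat", "dog"}),   # animal -> cat v dog
    frozenset({"~cat", "black", "white"}),  # cat -> black v white
    frozenset({"~dog", "black"}),           # dog -> black
]
for ans in sorted(map(sorted, intensional_answers(IDB, {"animal", "black"}))):
    print(" ^ ".join(ans))
```

On these rules the sketch prints the three answers listed in section 4.2 (animal ^ dog, animal ^ ~cat, animal ^ ~white); subset testing plays the role of subsumption because all clauses are ground.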

4 Examples

4.1 First example (Horn clauses)

Let us consider a first-order language whose predicate symbols are mother(x,y), father(x,y), parent(x,y) and grandparent(x,y), and the following set of rules:

$\forall x \forall y\, (mother(x,y) \rightarrow parent(x,y))$
$\forall x \forall y\, (father(x,y) \rightarrow parent(x,y))$
$\forall x \forall y \forall z\, (parent(x,y) \land parent(y,z) \rightarrow grandparent(x,z))$

They state that a mother is a parent, a father is a parent, the parent of a parent is a grandparent, and nobody can be his own mother or his own father. Let us consider the question $Q(x,y) = grandparent(x,y)$. Then the algorithm provides the answers (the nine combinations obtained by keeping each of the two parent literals of the grandparent rule or replacing it by mother or father):

- $\exists z\, (parent(x,z) \land parent(z,y))$
- $\exists z\, (parent(x,z) \land mother(z,y))$
- $\exists z\, (mother(x,z) \land parent(z,y))$
- $\exists z\, (parent(x,z) \land father(z,y))$
- $\exists z\, (father(x,z) \land parent(z,y))$
- $\exists z\, (mother(x,z) \land mother(z,y))$
- $\exists z\, (father(x,z) \land mother(z,y))$
- $\exists z\, (mother(x,z) \land father(z,y))$
- $\exists z\, (father(x,z) \land father(z,y))$

which respectively state:

- if x is the parent of z, who is the parent of y, then x is the grandparent of y;
- if x is the parent of z, who is the mother of y, then x is the grandparent of y;
- and so on.

4.2 Second example (non-Horn clauses)

Let us consider a first-order language whose predicate symbols are cat(x), dog(x), animal(x), black(x) and white(x), and the following set of rules:

$\forall x\, (animal(x) \rightarrow cat(x) \lor dog(x))$
$\forall x\, (cat(x) \rightarrow black(x) \lor white(x))$
$\forall x\, (dog(x) \rightarrow black(x))$

They state that an animal is a cat or a dog, a cat is black or white, and a dog is black. Let us consider the question $Q(x) = animal(x) \land black(x)$, i.e., which are the black animals? The algorithm provides the following answers:

- $animal(x) \land \neg cat(x)$
- $animal(x) \land dog(x)$
- $animal(x) \land \neg white(x)$

which respectively mean:

- animals which are not cats are black animals;
- animals which are dogs are black animals;
- animals which are not white are black animals.

In this example, some resolvents have been deleted because they were subsumed by others. For instance, $\neg animal(x_0) \lor \neg cat(x_0) \lor white(x_0)$ has been deleted, since it is subsumed by $\neg animal(x_0) \lor white(x_0)$. It would have given the answer "animals which are cats and which are not white", which is redundant with "animals which are not white".
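Because the rules of this example are monadic and each mentions a single variable, conditions (1) and (2) can be checked for one individual by truth tables over the five predicates. The Python sketch below, with hypothetical helper names, checks that the three answers satisfy both conditions, that the deleted formula animal ∧ cat ∧ ¬white also satisfies (1) (it is merely redundant), and that animal ∧ cat alone is not an answer, since a white cat is a non-black animal.

```python
from itertools import product

PREDICATES = ["animal", "cat", "dog", "black", "white"]

def idb(m):
    """The three rules of the example, instantiated for one individual."""
    return ((not m["animal"] or m["cat"] or m["dog"])
            and (not m["cat"] or m["black"] or m["white"])
            and (not m["dog"] or m["black"]))

def query(m):
    return m["animal"] and m["black"]

def interpretations():
    for values in product([False, True], repeat=len(PREDICATES)):
        yield dict(zip(PREDICATES, values))

def condition_1(ans):
    """IDB |- forall x (ans(x) -> Q(x)): no model of IDB satisfies ans but not Q."""
    return all(query(m) for m in interpretations() if idb(m) and ans(m))

def condition_2(ans):
    """IDB does not entail forall x not ans(x): some model of IDB satisfies ans."""
    return any(idb(m) and ans(m) for m in interpretations())

answers = {
    "animal ^ ~cat": lambda m: m["animal"] and not m["cat"],
    "animal ^ dog": lambda m: m["animal"] and m["dog"],
    "animal ^ ~white": lambda m: m["animal"] and not m["white"],
}

def redundant(m):
    return m["animal"] and m["cat"] and not m["white"]

def not_an_answer(m):
    return m["animal"] and m["cat"]

for name, ans in answers.items():
    print(name, condition_1(ans), condition_2(ans))
print(condition_1(redundant), condition_1(not_an_answer))
```

The one-individual check is exact here because the only constant involved is the Skolem constant, so the Herbrand universe has a single element.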

4.3 Third example (function symbols)

Let us consider the rules:

$\forall x\, (employee(x) \rightarrow wage(x, w(x)))$
$\forall x\, (manager(x) \rightarrow employee(x))$
$\forall x \forall y\, (manager(x) \land wage(x, y) \rightarrow gt(y, 20))$

They mean that employees have a wage, managers are employees, and managers' wages are greater than 20. We consider the query $Q(x) = employee(x) \land gt(w(x), 20)$, i.e., who are the employees whose wage is greater than 20? The algorithm gives the following answers:

- $employee(x) \land \exists y\, (manager(y) \land wage(y, w(x)))$
- $manager(x)$

which can be interpreted as:

- employees who have a manager's wage earn more than 20;
- managers earn more than 20.

In this case, some potential answers have been eliminated because they were redundant. For example, "employees who are managers" has been eliminated, since it is redundant with "managers".
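With function symbols, the Skolem-constant test of section 3.2 still applies: to check condition (1) for a candidate answer, assert it for a fresh constant x0 (and a fresh constant for each existential variable) and see whether Q(x0) follows. Since the three rules are Horn, this amounts to computing the least model of the asserted facts. The Python sketch below hard-codes the three rules and encodes the term w(x0) as the tuple ("w", "x0"); these encodings and the constant names are assumptions of the sketch.

```python
def close(facts):
    """Apply the three rules of the example until nothing new is derived."""
    facts = set(facts)
    while True:
        new = set()
        for f in facts:
            if f[0] == "employee":      # employee(x) -> wage(x, w(x))
                new.add(("wage", f[1], ("w", f[1])))
            if f[0] == "manager":       # manager(x) -> employee(x)
                new.add(("employee", f[1]))
        for m in facts:
            for g in facts:
                # manager(x) ^ wage(x, y) -> gt(y, 20)
                if m[0] == "manager" and g[0] == "wage" and g[1] == m[1]:
                    new.add(("gt", g[2], 20))
        if new <= facts:
            return facts
        facts |= new

def satisfies_query(facts, x):
    """Q(x) = employee(x) ^ gt(w(x), 20) holds in the least model of facts."""
    model = close(facts)
    return ("employee", x) in model and ("gt", ("w", x), 20) in model

# First answer, with c1 standing for the existential manager y.
answer_1 = {("employee", "x0"), ("manager", "c1"), ("wage", "c1", ("w", "x0"))}
answer_2 = {("manager", "x0")}
print(satisfies_query(answer_1, "x0"), satisfies_query(answer_2, "x0"))
print(satisfies_query({("employee", "x0")}, "x0"))  # an employee alone is not enough
```

Termination is guaranteed here because no rule produces a new employee from a wage, so the term w(x) is never nested further.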

5 Comparison with other studies

As far as we know, the problem of providing intensional answers has never been studied. However, many studies aim to provide formulas as answers, for different reasons. We list some of them below.

5.1 Informative answers

This term is used in [Jan81] and [GM87] to denote explanations which can be provided when the answer to a query is empty. For example, if no employee in the database is older than 55, then the query "who are the employees of the Computer Science department who are older than 60?" has an empty answer, because the subquery "who are the employees who are older than 60?" has an empty answer. This last sentence provides an informative answer. These explanations are generated from the initial query according to a syntactic criterion (subformulas of the query) and, possibly, simplified by integrity constraints. Although both problems aim to create more cooperative systems, the problem of informative answering and the problem of intensional answering are different: in the first case, answers are explanations of a failure; in the second, answers are sufficient conditions for success.

5.2 Intelligent query answering

The problem posed by Imielinski in [Imi87] is to optimize a query addressed to a deductive database. The answer to a query Q addressed to the deductive database $IDB \cup EDB$ is, theoretically, defined as the answer of Q addressed to the classical database obtained by saturating EDB with the rules of IDB (we denote it IDB(EDB)). Since this computation must be avoided, one must find a way to compute the answer efficiently. For this, Imielinski suggests deriving new rules (we denote them IDB') from the IDB rules and from the query, such that, if they are used to saturate the answer to the query computed on EDB only, they provide the same answer. In the general case, not all IDB rules can be transformed using the query, and some of them must first be used to saturate EDB. Let us denote by R the rules of IDB which cannot be transformed, and by R' the others. As the answer to a query Q, Imielinski suggests providing both an extensional part, R(EDB) (i.e., the set obtained by saturating EDB with the rules which cannot be transformed), and an intensional one, R' (i.e., the rules which are transformed). Although they seem similar, the problems of intelligent query answering and of intensional answers are different: their aims differ, and so do the answers they provide. Intelligent query answering is introduced for performance optimization, in order to avoid long computations; this is not the aim of intensional answers. Furthermore, the answers provided by intelligent query answering are defined in terms of the query predicate and have a particular form.

5.3 Query compilation

The term "query compilation" refers to studies whose aim is the following: given a set of axioms (rules) and a query, transform the query into an equivalent set of queries whose evaluation will be less expensive. The comparison between query compilation and intensional answers concerns the techniques used in both studies.

5.3.1 Reiter's work

The problem defined by Reiter [Rei78] is a problem of query compilation: given a set of deductive rules and a query, generate a set of queries such that the union of their answers, when evaluated on the database state, is the answer of the initial query. The queries obtained after compilation are supposed to be expressed in terms of basic relations (i.e., relations stored in the base), so their evaluation can be performed efficiently by the evaluator of a DBMS. The context of this study is a first-order language without function symbols but with equality. The deductive rules may be non-Horn. The method which Reiter uses for generating such queries is based on a refutation of $IDB \cup \{\neg Q(X)\}$, developing a tree of linear resolutions which provides the queries. Because he does not discard equality, he has to keep the substitutions on the variables X. Let us illustrate this method with the following example:

$IDB = \{\forall x\, (calculus(x) \rightarrow teach(A, x)),\ \forall x \forall y \forall z\, (enrolled(x,y) \land teach(z,y) \rightarrow \text{teacher-of}(x,z))\}$
$Q(x) = \text{teacher-of}(a, x)$ (the question is: who are a's teachers?)

The refutation of $C(IDB) \cup C(\neg Q(x))$ leads to the following queries:
- $Q_1(x) = \exists y\, (enrolled(a,y) \land teach(x,y))$
- $Q_2(x) = (x = A) \land \exists y\, (enrolled(a,y) \land calculus(y))$

This means that the union of the answers to Q1 and Q2 (i.e., the set of persons who teach courses a is enrolled in and, possibly, the individual A, if a is enrolled in a calculus course) is the answer to Q. (Q2(x) has been obtained by keeping the substitution applied to x.) The queries generated in this approach could be seen as intensional answers. However, not all intensional answers are provided by this algorithm. Indeed, intensional answers are defined from pertinent clauses, and the development of the tree of linear resolutions does not provide all the pertinent clauses (because clauses are resolved only on their last literal). We are presently working on adapting this kind of strategy for generating intensional answers by developing, from $C(\neg Q(X_0))$, the tree of all the possible ordered linear resolutions. This means that, at each level, a clause is resolved not only on its last literal: all its permutations are resolved on their last literals. The completeness of this strategy remains to be proved. Notice that, as far as we know, Reiter's method for compiling queries does not address the problem of optimizing the compilation by removing redundant and empty queries, as is done in our algorithm. The same kind of elimination strategy should improve his method.

5.3.2 Partial evaluation

The aim of partial evaluation in logic programming [Ven] [TF86] is to optimize computation. Given a logic program (a set of Prolog sentences) and an open query (goal), the problem is to find another program (generated from the initial set of clauses and from the query) such that computing it on the set of facts will be less expensive than computing the initial query. The main techniques used to partially evaluate a program are based on resolution, so the method used for generating intensional answers is similar to the one used in partial evaluation. However, as far as we know, partial evaluation deals only with logic programs, i.e., sets of Horn clauses (possibly extended with cut and built-in predicates), and does not suggest a solution for non-Horn clauses. This restriction is not assumed in intensional answer generation.


5.4 Hypothesis generation

A work which seems similar to ours is that of Morgan [Mor71] [Mor75]. In these papers, he dealt with the problem of hypothesis generation: given a set of formulas S and a formula F, find the formulas H which must be added to S in order to deduce F. For example, find the formulas which must be added to {dogs are mammals} in order to deduce "dogs are hairy". Once they are formalized, the problems of hypothesis generation and of querying a schema are similar, as shown below. In the case of hypothesis generation, given a set S of closed formulas and a closed formula F, the problem is to find closed formulas H such that $S \cup \{H\} \vdash F$. In the case of intensional answers, given a set of closed formulas IDB and an open formula Q(X), the main problem is to find open formulas ans(X) such that $IDB \vdash \forall X\, (ans(X) \rightarrow Q(X))$. Letting $X_0$ be a tuple of new constant symbols, this leads to the problem: find closed formulas $ans(X_0)$ such that $IDB \cup \{ans(X_0)\} \vdash Q(X_0)$. This shows that the formal problems are similar: if we disregard the other conditions the answers must satisfy, the problem of intensional answer generation is a particular case of the problem of hypothesis generation. In [Mor71], Morgan indicates a way of solving this problem. He transforms the initial problem into the following one: find H such that $F \lor \neg S \models_f H$, where $\models_f$ is a dual notion of logical consequence (usually denoted $\models$) defined by: $F_1 \models_f F_2$ iff every interpretation which falsifies $F_1$ falsifies $F_2$. He defines a notion of f-resolution (f-clauses, f-resolvents, ...) such that every f-resolvent f-res(c1, c2) of two f-clauses c1, c2 satisfies $c_1 \lor c_2 \models_f \text{f-res}(c_1, c_2)$. Thus, applying this inference rule to $F \lor \neg S$ produces all the hypotheses. The remaining problem is to select, among these formulas, the "adequate" ones.
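Since $\models_f$ reverses the direction of $\models$, Morgan's reformulation leads back to the hypothesis-generation problem stated at the beginning of this section. The derivation below is a sketch based only on the definition of $\models_f$ given above, reading $S$ as the conjunction of its formulas and using the completeness of first-order logic for the last step:

```latex
\begin{aligned}
F_1 \models_f F_2
  &\iff \text{every interpretation that falsifies } F_1 \text{ falsifies } F_2 \\
  &\iff \neg F_1 \models \neg F_2
  \iff F_2 \models F_1, \\[4pt]
F \lor \neg S \models_f H
  &\iff H \models F \lor \neg S
  \iff S \land H \models F
  \iff S \cup \{H\} \vdash F .
\end{aligned}
```

Each formula H obtained as an f-consequence of $F \lor \neg S$ is therefore a hypothesis in the sense above, which is why the remaining difficulty is one of selection.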

5.5 Negative explanations

Another related problem concerns the generation of negative explanations [Sau87]. Its context is the area of deduction systems (expert systems, ...), and it is defined as follows: the user of a system is sometimes surprised by the solution (or answer) provided and may wonder why the solution he expected has not been provided. In such a case, the system has to explain why the user's solution has not been found (the "why-not" problem). This problem is very similar to that of hypothesis generation: indeed, explaining why a solution has not been found (i.e., why a formula was not deducible) amounts to providing the missing formulas (i.e., hypotheses) needed to find it.

6 Further work and conclusion

In conclusion, we would like to stress the interest of our approach for queries addressed to the rules of a deductive database or, more generally, to a rule base. Such an approach is interesting when we intend to provide answers which do not depend on a particular state of the fact base. The algorithm we presented is based on a theoretical study and provides all the answers under the assumptions we made. As already noted, no assumption is made on the form of the clauses, i.e., the algorithm is defined for general clauses. However, we have also studied the particular case where all the clauses are Horn clauses: in that case, we based our algorithm on the Prolog resolution strategy [PC88].

Our work has left many problems unsolved and has opened many directions of investigation. Indeed, any extension which would generalize the context of intensional answers would be interesting. For example, it would be interesting to take equality into account, and also to extend the approach to recursive cases. It would also be interesting to stratify the intensional answers into "levels" corresponding to more and more detailed conditions. At each level, the user could stop the process of answer generation if he considered the formulas obtained detailed and explicit enough. Finally, this problem could benefit from studies which aim to take into account the user's "domain of interest" in a deduction process. In our case, this leads us to consider what interests the user (which notions does he want to deal with in his application?), in order to provide only answers which are pertinent to him. This could be another way to limit the size of the answers.


(a) This work was supported by the Programme de recherches coordonnées and by the ESPRIT program, ESTEAM project. The first part of this work was done with R. Demolombe, who suggested considering range-restricted formulas and who introduced the term intensional answers.

References

[CC73] C. Chang and R. Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, 1973.

[CD86] L. Cholvy and R. Demolombe. Querying a rule base. In Proc. of the 1st Int. Conf. on Expert Database Systems, 1986.

[Dem90] R. Demolombe. Syntactical characterization of a subset of domain independent formulas. Journal of the ACM, 1990.

[DR86] R. Demolombe and V. Royer. Evaluation strategies for recursive axioms: a uniform approach. Technical report, ESPRIT Project ESTEAM-316, 1986.

[GM87] A. Gal and J. Minker. Informative and cooperative answers in databases using integrity constraints. Technical report, University of Maryland, 1987.

[GMN84] H. Gallaire, J. Minker, and J. M. Nicolas. Logic and databases: a deductive approach. ACM Computing Surveys, 16(2), 1984.

[Imi87] T. Imielinski. Intelligent query answering in rule based systems. Journal of Logic Programming, December 1987.

[Jan81] J. M. Janas. On the feasibility of informative answers. In Advances in Database Theory. Plenum Press, 1981.

[Lew] H. R. Lewis. Cycles of unifiability and decidability by resolution. Technical report, Harvard University.

[Lov78] D. Loveland. Automated Theorem Proving. North-Holland, 1978.

[Mor71] C. Morgan. Hypothesis generation by machine. Artificial Intelligence, 2, 1971.

[Mor75] C. Morgan. Automated hypothesis generation using extended inductive resolution. In Proc. of IJCAI, 1975.

[Nic79] J. M. Nicolas. Contributions à l'étude théorique des bases de données. PhD thesis, Université Paul Sabatier, Toulouse, France, 1979.

[PC88] E. Pascual and L. Cholvy. Answering queries addressed to the rule base of a deductive database. In IPMU Conference, 1988.

[Rei78] R. Reiter. Deductive question-answering on relational database. In Logic and Data Bases. Plenum Press, New York, 1978.

[Sau87] C. Saurel. Explineg1: une méthode de génération d'explications négatives dans les systèmes à base de connaissances. In Actes des Journées de Programmation en Logique (CNET), 1987.

[TF86] A. Takeuchi and K. Furukawa. Partial evaluation of Prolog programs and its application to meta-programming. In Proc. of IFIP. North-Holland, 1986.

[Ven] R. Venken. A Prolog meta-interpreter for partial evaluation and its application to source to source transformation and query optimization. In Proc. of ECAI. North-Holland.

Appendix

Variant formulas

A clause is a variant of another one if it is obtained from it by renaming variables, Skolem functions or constants, and by permuting literals. A set of clauses S1 is a variant of another set of clauses S2 if the clauses of S1 are variants of the clauses of S2. A formula F1 is a variant of another formula F2 if there is a clausal form of F1 which is a variant of a clausal form of F2.

Range-restricted formulas

Let F be a formula in prenex conjunctive normal form. F is range-restricted if:

1. a free variable appears in a positive literal of every disjunction in which it appears;
2. if an existentially quantified variable appears in a negative literal, then there is a disjunction, all of whose literals are positive, such that this variable appears in all these literals;
3. a universally quantified variable appears in a negative literal of every disjunction in which it appears.

Recursive sets of clauses

Our notion of recursion is based on a binary relation R, defined as follows. Let us consider a set S of clauses, and let $P_1(\dots) \lor \dots \lor P_n(\dots) \lor \neg Q_1(\dots) \lor \dots \lor \neg Q_m(\dots)$ be a clause in S (the $P_i$ and $Q_j$ are predicate symbols, and the $P_i(\dots)$ and $\neg Q_j(\dots)$ are literals). Then we define the following pairs:

- $\neg P_i \mathrel{R} P_j$ for every pair
- $\neg P_i \mathrel{R} \neg Q_j$ for every pair

S is recursive iff there is a predicate symbol P such that $P \mathrel{R^*} P$ or $\neg P \mathrel{R^*} \neg P$ ($R^*$ is the transitive closure of R).
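The three range-restriction conditions above are purely syntactic and can be checked directly on a prenex conjunctive normal form. The Python sketch below assumes a hypothetical encoding: a formula is a list of disjunctions, each literal is a (positive, predicate, variables) triple, and a separate map gives each variable's status ("free", "exists" or "forall"). Constants and function symbols are left out for simplicity.

```python
def range_restricted(clauses, kind):
    """Check conditions 1 to 3 of the appendix on a prenex CNF.

    clauses: list of disjunctions; each literal is (positive, predicate, variables).
    kind: maps each variable to "free", "exists" or "forall".
    """
    def occurs(var, clause, positive=None):
        return any(var in vs and (positive is None or pos == positive)
                   for pos, _, vs in clause)

    for var, k in kind.items():
        containing = [c for c in clauses if occurs(var, c)]
        if k == "free" and not all(occurs(var, c, True) for c in containing):
            return False  # condition 1
        if k == "forall" and not all(occurs(var, c, False) for c in containing):
            return False  # condition 3
        if k == "exists" and any(occurs(var, c, False) for c in containing):
            # condition 2: an all-positive disjunction mentions var in every literal
            if not any(all(pos and var in vs for pos, _, vs in c) for c in clauses):
                return False
    return True

# forall x forall y father(x,y) -> parent(x,y)
rule = [[(False, "father", ("x", "y")), (True, "parent", ("x", "y"))]]
# Q(x) = exists y R1(x,y) ^ not R2(y)
query = [[(True, "R1", ("x", "y"))], [(False, "R2", ("y",))]]
print(range_restricted(rule, {"x": "forall", "y": "forall"}),
      range_restricted(query, {"x": "free", "y": "exists"}))
```

A formula such as $\forall x\, p(x)$ fails condition 3, since x never occurs in a negative literal; such a formula is not domain-independent.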
